* [Intel-gfx] [PATCH 01/28] drm/i915: Adjust the sentinel assert to match implementation
From: Chris Wilson @ 2020-06-07 22:20 UTC
  To: intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Sentinels are supposed to be the last request in the ELSP queue, not the
only one, so adjust the assert accordingly.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 14 +++-----------
 1 file changed, 3 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index d55a5e0466e5..db8a170b0e5c 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1635,9 +1635,9 @@ assert_pending_valid(const struct intel_engine_execlists *execlists,
 		ccid = ce->lrc.ccid;
 
 		/*
-		 * Sentinels are supposed to be lonely so they flush the
-		 * current exection off the HW. Check that they are the
-		 * only request in the pending submission.
+		 * Sentinels are supposed to be the last request so they flush
+		 * the current execution off the HW. Check that they are the only
+		 * request in the pending submission.
 		 */
 		if (sentinel) {
 			GEM_TRACE_ERR("%s: context:%llx after sentinel in pending[%zd]\n",
@@ -1646,15 +1646,7 @@ assert_pending_valid(const struct intel_engine_execlists *execlists,
 				      port - execlists->pending);
 			return false;
 		}
-
 		sentinel = i915_request_has_sentinel(rq);
-		if (sentinel && port != execlists->pending) {
-			GEM_TRACE_ERR("%s: sentinel context:%llx not in prime position[%zd]\n",
-				      engine->name,
-				      ce->timeline->fence_context,
-				      port - execlists->pending);
-			return false;
-		}
 
 		/* Hold tightly onto the lock to prevent concurrent retires! */
 		if (!spin_trylock_irqsave(&rq->lock, flags))
-- 
2.20.1
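
For illustration, the invariant the adjusted assert now enforces is that a
sentinel may only ever be the last request in the pending[] submission,
with nothing following it. A minimal sketch of that check (simplified, not
the driver's exact code):

	/* Sketch: a sentinel must terminate the pending[] submission */
	static bool sentinel_is_last(struct i915_request * const *port)
	{
		bool sentinel = false;

		for (; *port; port++) {
			if (sentinel)
				return false; /* a request follows a sentinel */

			sentinel = i915_request_has_sentinel(*port);
		}

		return true;
	}

With the old assert, a sentinel also had to occupy ELSP[0]; that extra
restriction is what this patch drops to match the implementation.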


* [Intel-gfx] [PATCH 02/28] drm/i915/selftests: Make the hanging request non-preemptible
From: Chris Wilson @ 2020-06-07 22:20 UTC
  To: intel-gfx; +Cc: Chris Wilson

In some of our hangtests, we try to reset an active engine while it is
spinning inside the recursive spinner. However, we also try to flood the
engine with requests that would preempt the hang, and so we should disable
preemption in the spinner to be sure that we reset the right request.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 36 ++++++++++++++------
 1 file changed, 26 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
index 4aa4cc917d8b..035f363fb0f8 100644
--- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
+++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
@@ -203,12 +203,12 @@ hang_create_request(struct hang *h, struct intel_engine_cs *engine)
 		*batch++ = lower_32_bits(hws_address(hws, rq));
 		*batch++ = upper_32_bits(hws_address(hws, rq));
 		*batch++ = rq->fence.seqno;
-		*batch++ = MI_ARB_CHECK;
+		*batch++ = MI_NOOP;
 
 		memset(batch, 0, 1024);
 		batch += 1024 / sizeof(*batch);
 
-		*batch++ = MI_ARB_CHECK;
+		*batch++ = MI_NOOP;
 		*batch++ = MI_BATCH_BUFFER_START | 1 << 8 | 1;
 		*batch++ = lower_32_bits(vma->node.start);
 		*batch++ = upper_32_bits(vma->node.start);
@@ -217,12 +217,12 @@ hang_create_request(struct hang *h, struct intel_engine_cs *engine)
 		*batch++ = 0;
 		*batch++ = lower_32_bits(hws_address(hws, rq));
 		*batch++ = rq->fence.seqno;
-		*batch++ = MI_ARB_CHECK;
+		*batch++ = MI_NOOP;
 
 		memset(batch, 0, 1024);
 		batch += 1024 / sizeof(*batch);
 
-		*batch++ = MI_ARB_CHECK;
+		*batch++ = MI_NOOP;
 		*batch++ = MI_BATCH_BUFFER_START | 1 << 8;
 		*batch++ = lower_32_bits(vma->node.start);
 	} else if (INTEL_GEN(gt->i915) >= 4) {
@@ -230,24 +230,24 @@ hang_create_request(struct hang *h, struct intel_engine_cs *engine)
 		*batch++ = 0;
 		*batch++ = lower_32_bits(hws_address(hws, rq));
 		*batch++ = rq->fence.seqno;
-		*batch++ = MI_ARB_CHECK;
+		*batch++ = MI_NOOP;
 
 		memset(batch, 0, 1024);
 		batch += 1024 / sizeof(*batch);
 
-		*batch++ = MI_ARB_CHECK;
+		*batch++ = MI_NOOP;
 		*batch++ = MI_BATCH_BUFFER_START | 2 << 6;
 		*batch++ = lower_32_bits(vma->node.start);
 	} else {
 		*batch++ = MI_STORE_DWORD_IMM | MI_MEM_VIRTUAL;
 		*batch++ = lower_32_bits(hws_address(hws, rq));
 		*batch++ = rq->fence.seqno;
-		*batch++ = MI_ARB_CHECK;
+		*batch++ = MI_NOOP;
 
 		memset(batch, 0, 1024);
 		batch += 1024 / sizeof(*batch);
 
-		*batch++ = MI_ARB_CHECK;
+		*batch++ = MI_NOOP;
 		*batch++ = MI_BATCH_BUFFER_START | 2 << 6;
 		*batch++ = lower_32_bits(vma->node.start);
 	}
@@ -866,13 +866,29 @@ static int __igt_reset_engines(struct intel_gt *gt,
 			count++;
 
 			if (rq) {
+				if (rq->fence.error != -EIO) {
+					pr_err("i915_reset_engine(%s:%s):"
+					       " failed to reset request %llx:%lld\n",
+					       engine->name, test_name,
+					       rq->fence.context,
+					       rq->fence.seqno);
+					i915_request_put(rq);
+
+					GEM_TRACE_DUMP();
+					intel_gt_set_wedged(gt);
+					err = -EIO;
+					break;
+				}
+
 				if (i915_request_wait(rq, 0, HZ / 5) < 0) {
 					struct drm_printer p =
 						drm_info_printer(gt->i915->drm.dev);
 
 					pr_err("i915_reset_engine(%s:%s):"
-					       " failed to complete request after reset\n",
-					       engine->name, test_name);
+					       " failed to complete request %llx:%lld after reset\n",
+					       engine->name, test_name,
+					       rq->fence.context,
+					       rq->fence.seqno);
 					intel_engine_dump(engine, &p,
 							  "%s\n", engine->name);
 					i915_request_put(rq);
-- 
2.20.1
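
The mechanics here: MI_ARB_CHECK emits an arbitration point at which the
command streamer may preempt the batch, whereas MI_NOOP does nothing, so
replacing the arbitration points leaves the spinner with no opportunity to
be preempted. A sketch of the distinction (hypothetical helper, not taken
from the selftest):

	/* Sketch: emit the tail of a spin loop, optionally preemptible */
	static u32 *emit_spin_tail(u32 *batch, u64 loop, bool preemptible)
	{
		/* only an arbitration point allows a context switch here */
		*batch++ = preemptible ? MI_ARB_CHECK : MI_NOOP;

		/* jump back to the start of the loop, forever */
		*batch++ = MI_BATCH_BUFFER_START | 1 << 8 | 1;
		*batch++ = lower_32_bits(loop);
		*batch++ = upper_32_bits(loop);

		return batch;
	}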


* [Intel-gfx] [PATCH 03/28] drm/i915/selftests: Teach hang-self to target only itself
From: Chris Wilson @ 2020-06-07 22:20 UTC
  To: intel-gfx; +Cc: Chris Wilson

We have a test case to exercise resetting an engine while the other
engines are busy; all that TEST_SELF adds on top is that the target
engine also has background activity. It is useful to be able to first
test resetting an engine while only it has background activity, so make
TEST_SELF a separate flag from exercising all the others.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
index 035f363fb0f8..2af66f8ffbd2 100644
--- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
+++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
@@ -805,10 +805,10 @@ static int __igt_reset_engines(struct intel_gt *gt,
 			threads[tmp].resets =
 				i915_reset_engine_count(global, other);
 
-			if (!(flags & TEST_OTHERS))
+			if (other == engine && !(flags & TEST_SELF))
 				continue;
 
-			if (other == engine && !(flags & TEST_SELF))
+			if (other != engine && !(flags & TEST_OTHERS))
 				continue;
 
 			threads[tmp].engine = other;
@@ -999,7 +999,7 @@ static int igt_reset_engines(void *arg)
 		},
 		{
 			"self-priority",
-			TEST_OTHERS | TEST_ACTIVE | TEST_PRIORITY | TEST_SELF,
+			TEST_ACTIVE | TEST_PRIORITY | TEST_SELF,
 		},
 		{ }
 	};
-- 
2.20.1
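
After this change the filter reads: spawn background activity on the
target engine only when TEST_SELF is set, and on every other engine only
when TEST_OTHERS is set, making the two flags independent. The combined
predicate, equivalent to the pair of continue statements above (sketch
only):

	/* Sketch: should 'other' receive a background hang thread? */
	static bool use_engine(const struct intel_engine_cs *other,
			       const struct intel_engine_cs *engine,
			       unsigned int flags)
	{
		if (other == engine)
			return flags & TEST_SELF;

		return flags & TEST_OTHERS;
	}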


* [Intel-gfx] [PATCH 04/28] drm/i915/selftests: Remove live_suppress_wait_preempt
From: Chris Wilson @ 2020-06-07 22:20 UTC
  To: intel-gfx; +Cc: Chris Wilson

With the removal of the internal wait-priority boosting, we can also
remove the selftest to ensure that those waits were being suppressed
from causing preemptions.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/selftest_lrc.c | 178 -------------------------
 1 file changed, 178 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c
index 67d74e6432a8..e838e38a262c 100644
--- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
+++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
@@ -2379,183 +2379,6 @@ static int live_suppress_self_preempt(void *arg)
 	goto err_client_b;
 }
 
-static int __i915_sw_fence_call
-dummy_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
-{
-	return NOTIFY_DONE;
-}
-
-static struct i915_request *dummy_request(struct intel_engine_cs *engine)
-{
-	struct i915_request *rq;
-
-	rq = kzalloc(sizeof(*rq), GFP_KERNEL);
-	if (!rq)
-		return NULL;
-
-	rq->engine = engine;
-
-	spin_lock_init(&rq->lock);
-	INIT_LIST_HEAD(&rq->fence.cb_list);
-	rq->fence.lock = &rq->lock;
-	rq->fence.ops = &i915_fence_ops;
-
-	i915_sched_node_init(&rq->sched);
-
-	/* mark this request as permanently incomplete */
-	rq->fence.seqno = 1;
-	BUILD_BUG_ON(sizeof(rq->fence.seqno) != 8); /* upper 32b == 0 */
-	rq->hwsp_seqno = (u32 *)&rq->fence.seqno + 1;
-	GEM_BUG_ON(i915_request_completed(rq));
-
-	i915_sw_fence_init(&rq->submit, dummy_notify);
-	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
-
-	spin_lock_init(&rq->lock);
-	rq->fence.lock = &rq->lock;
-	INIT_LIST_HEAD(&rq->fence.cb_list);
-
-	return rq;
-}
-
-static void dummy_request_free(struct i915_request *dummy)
-{
-	/* We have to fake the CS interrupt to kick the next request */
-	i915_sw_fence_commit(&dummy->submit);
-
-	i915_request_mark_complete(dummy);
-	dma_fence_signal(&dummy->fence);
-
-	i915_sched_node_fini(&dummy->sched);
-	i915_sw_fence_fini(&dummy->submit);
-
-	dma_fence_free(&dummy->fence);
-}
-
-static int live_suppress_wait_preempt(void *arg)
-{
-	struct intel_gt *gt = arg;
-	struct preempt_client client[4];
-	struct i915_request *rq[ARRAY_SIZE(client)] = {};
-	struct intel_engine_cs *engine;
-	enum intel_engine_id id;
-	int err = -ENOMEM;
-	int i;
-
-	/*
-	 * Waiters are given a little priority nudge, but not enough
-	 * to actually cause any preemption. Double check that we do
-	 * not needlessly generate preempt-to-idle cycles.
-	 */
-
-	if (!HAS_LOGICAL_RING_PREEMPTION(gt->i915))
-		return 0;
-
-	if (preempt_client_init(gt, &client[0])) /* ELSP[0] */
-		return -ENOMEM;
-	if (preempt_client_init(gt, &client[1])) /* ELSP[1] */
-		goto err_client_0;
-	if (preempt_client_init(gt, &client[2])) /* head of queue */
-		goto err_client_1;
-	if (preempt_client_init(gt, &client[3])) /* bystander */
-		goto err_client_2;
-
-	for_each_engine(engine, gt, id) {
-		int depth;
-
-		if (!intel_engine_has_preemption(engine))
-			continue;
-
-		if (!engine->emit_init_breadcrumb)
-			continue;
-
-		for (depth = 0; depth < ARRAY_SIZE(client); depth++) {
-			struct i915_request *dummy;
-
-			engine->execlists.preempt_hang.count = 0;
-
-			dummy = dummy_request(engine);
-			if (!dummy)
-				goto err_client_3;
-
-			for (i = 0; i < ARRAY_SIZE(client); i++) {
-				struct i915_request *this;
-
-				this = spinner_create_request(&client[i].spin,
-							      client[i].ctx, engine,
-							      MI_NOOP);
-				if (IS_ERR(this)) {
-					err = PTR_ERR(this);
-					goto err_wedged;
-				}
-
-				/* Disable NEWCLIENT promotion */
-				__i915_active_fence_set(&i915_request_timeline(this)->last_request,
-							&dummy->fence);
-
-				rq[i] = i915_request_get(this);
-				i915_request_add(this);
-			}
-
-			dummy_request_free(dummy);
-
-			GEM_BUG_ON(i915_request_completed(rq[0]));
-			if (!igt_wait_for_spinner(&client[0].spin, rq[0])) {
-				pr_err("%s: First client failed to start\n",
-				       engine->name);
-				goto err_wedged;
-			}
-			GEM_BUG_ON(!i915_request_started(rq[0]));
-
-			if (i915_request_wait(rq[depth],
-					      I915_WAIT_PRIORITY,
-					      1) != -ETIME) {
-				pr_err("%s: Waiter depth:%d completed!\n",
-				       engine->name, depth);
-				goto err_wedged;
-			}
-
-			for (i = 0; i < ARRAY_SIZE(client); i++) {
-				igt_spinner_end(&client[i].spin);
-				i915_request_put(rq[i]);
-				rq[i] = NULL;
-			}
-
-			if (igt_flush_test(gt->i915))
-				goto err_wedged;
-
-			if (engine->execlists.preempt_hang.count) {
-				pr_err("%s: Preemption recorded x%d, depth %d; should have been suppressed!\n",
-				       engine->name,
-				       engine->execlists.preempt_hang.count,
-				       depth);
-				err = -EINVAL;
-				goto err_client_3;
-			}
-		}
-	}
-
-	err = 0;
-err_client_3:
-	preempt_client_fini(&client[3]);
-err_client_2:
-	preempt_client_fini(&client[2]);
-err_client_1:
-	preempt_client_fini(&client[1]);
-err_client_0:
-	preempt_client_fini(&client[0]);
-	return err;
-
-err_wedged:
-	for (i = 0; i < ARRAY_SIZE(client); i++) {
-		igt_spinner_end(&client[i].spin);
-		i915_request_put(rq[i]);
-	}
-	intel_gt_set_wedged(gt);
-	err = -EIO;
-	goto err_client_3;
-}
-
 static int live_chain_preempt(void *arg)
 {
 	struct intel_gt *gt = arg;
@@ -4592,7 +4415,6 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_nopreempt),
 		SUBTEST(live_preempt_cancel),
 		SUBTEST(live_suppress_self_preempt),
-		SUBTEST(live_suppress_wait_preempt),
 		SUBTEST(live_chain_preempt),
 		SUBTEST(live_preempt_gang),
 		SUBTEST(live_preempt_timeout),
-- 
2.20.1


* [Intel-gfx] [PATCH 05/28] drm/i915/selftests: Trim execlists runtime
From: Chris Wilson @ 2020-06-07 22:20 UTC
  To: intel-gfx; +Cc: Chris Wilson

Reduce the smoke depth by trimming the number of contexts, repetitions
and wait times. This is in preparation for a less greedy scheduler that
tries to be fair across contexts, resulting in a great many more context
switches. A thousand context switches, at roughly 50-100us each, may take
50-100ms, causing us to time out as the HW is not fast enough to complete
the deep smoketests.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/selftest_lrc.c       | 66 ++++++--------------
 drivers/gpu/drm/i915/selftests/igt_spinner.c |  4 +-
 2 files changed, 21 insertions(+), 49 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c
index e838e38a262c..f651bdf7f191 100644
--- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
+++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
@@ -845,10 +845,11 @@ static int live_timeslice_preempt(void *arg)
 {
 	struct intel_gt *gt = arg;
 	struct drm_i915_gem_object *obj;
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
 	struct i915_vma *vma;
 	void *vaddr;
 	int err = 0;
-	int count;
 
 	/*
 	 * If a request takes too long, we would like to give other users
@@ -885,26 +886,21 @@ static int live_timeslice_preempt(void *arg)
 	if (err)
 		goto err_pin;
 
-	for_each_prime_number_from(count, 1, 16) {
-		struct intel_engine_cs *engine;
-		enum intel_engine_id id;
-
-		for_each_engine(engine, gt, id) {
-			if (!intel_engine_has_preemption(engine))
-				continue;
+	for_each_engine(engine, gt, id) {
+		if (!intel_engine_has_preemption(engine))
+			continue;
 
-			memset(vaddr, 0, PAGE_SIZE);
+		memset(vaddr, 0, PAGE_SIZE);
 
-			engine_heartbeat_disable(engine);
-			err = slice_semaphore_queue(engine, vma, count);
-			engine_heartbeat_enable(engine);
-			if (err)
-				goto err_pin;
+		engine_heartbeat_disable(engine);
+		err = slice_semaphore_queue(engine, vma, 5);
+		engine_heartbeat_enable(engine);
+		if (err)
+			goto err_pin;
 
-			if (igt_flush_test(gt->i915)) {
-				err = -EIO;
-				goto err_pin;
-			}
+		if (igt_flush_test(gt->i915)) {
+			err = -EIO;
+			goto err_pin;
 		}
 	}
 
@@ -1251,22 +1247,6 @@ static int live_timeslice_queue(void *arg)
 			intel_engine_flush_submission(engine);
 		} while (READ_ONCE(engine->execlists.pending[0]));
 
-		if (!READ_ONCE(engine->execlists.timer.expires) &&
-		    execlists_active(&engine->execlists) == rq &&
-		    !i915_request_completed(rq)) {
-			struct drm_printer p =
-				drm_info_printer(gt->i915->drm.dev);
-
-			GEM_TRACE_ERR("%s: Failed to enable timeslicing!\n",
-				      engine->name);
-			intel_engine_dump(engine, &p,
-					  "%s\n", engine->name);
-			GEM_TRACE_DUMP();
-
-			memset(vaddr, 0xff, PAGE_SIZE);
-			err = -EINVAL;
-		}
-
 		/* Timeslice every jiffy, so within 2 we should signal */
 		if (i915_request_wait(rq, 0, slice_timeout(engine)) < 0) {
 			struct drm_printer p =
@@ -2671,16 +2651,8 @@ static int live_preempt_gang(void *arg)
 
 			/* Submit each spinner at increasing priority */
 			engine->schedule(rq, &attr);
-
-			if (prio < attr.priority)
-				break;
-
-			if (prio <= I915_PRIORITY_MAX)
-				continue;
-
-			if (__igt_timeout(end_time, NULL))
-				break;
-		} while (1);
+		} while (prio <= I915_PRIORITY_MAX &&
+			 !__igt_timeout(end_time, NULL));
 		pr_debug("%s: Preempt chain of %d requests\n",
 			 engine->name, prio);
 
@@ -3248,7 +3220,7 @@ static int smoke_crescendo_thread(void *arg)
 			return err;
 
 		count++;
-	} while (!__igt_timeout(end_time, NULL));
+	} while (count < smoke->ncontext && !__igt_timeout(end_time, NULL));
 
 	smoke->count = count;
 	return 0;
@@ -3324,7 +3296,7 @@ static int smoke_random(struct preempt_smoke *smoke, unsigned int flags)
 
 			count++;
 		}
-	} while (!__igt_timeout(end_time, NULL));
+	} while (count < smoke->ncontext && !__igt_timeout(end_time, NULL));
 
 	pr_info("Submitted %lu random:%x requests across %d engines and %d contexts\n",
 		count, flags,
@@ -3337,7 +3309,7 @@ static int live_preempt_smoke(void *arg)
 	struct preempt_smoke smoke = {
 		.gt = arg,
 		.prng = I915_RND_STATE_INITIALIZER(i915_selftest.random_seed),
-		.ncontext = 1024,
+		.ncontext = 256,
 	};
 	const unsigned int phase[] = { 0, BATCH };
 	struct igt_live_test t;
diff --git a/drivers/gpu/drm/i915/selftests/igt_spinner.c b/drivers/gpu/drm/i915/selftests/igt_spinner.c
index 699bfe0328fb..ec0ecb4e4ca6 100644
--- a/drivers/gpu/drm/i915/selftests/igt_spinner.c
+++ b/drivers/gpu/drm/i915/selftests/igt_spinner.c
@@ -221,8 +221,8 @@ bool igt_wait_for_spinner(struct igt_spinner *spin, struct i915_request *rq)
 {
 	return !(wait_for_us(i915_seqno_passed(hws_seqno(spin, rq),
 					       rq->fence.seqno),
-			     10) &&
+			     100) &&
 		 wait_for(i915_seqno_passed(hws_seqno(spin, rq),
 					    rq->fence.seqno),
-			  1000));
+			  50));
 }
-- 
2.20.1
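
For context, igt_wait_for_spinner() is a two-stage wait: wait_for_us()
busy-polls for a few microseconds to catch the common fast case, and only
if that misses does wait_for() fall back to a sleeping poll measured in
milliseconds. The retuning above trades a longer cheap poll (10us to
100us) for a much shorter sleeping fallback (1000ms to 50ms). The same
pattern in outline (hypothetical helpers, assuming both return true on
success):

	/* Sketch: fast busy-poll first, bounded sleeping wait second */
	static bool wait_two_stage(bool (*done)(void *data), void *data)
	{
		if (busy_poll_us(done, data, 100))	/* hot path */
			return true;

		return sleep_poll_ms(done, data, 50);	/* slow fallback */
	}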


* [Intel-gfx] [PATCH 06/28] drm/i915/gt: Use virtual_engine during execlists_dequeue
From: Chris Wilson @ 2020-06-07 22:20 UTC
  To: intel-gfx; +Cc: Chris Wilson

Rather than going back and forth between the rb_node entry and the
virtual_engine type, store the ve in a local variable and reuse it. As
the container_of conversion from rb_node to virtual_engine requires a
variable offset, performing that conversion just once shaves off a bit
of code.

v2: Keep a single virtual engine lookup, for typical use.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 217 +++++++++++++---------------
 1 file changed, 104 insertions(+), 113 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index db8a170b0e5c..ab1f3131b357 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -454,7 +454,7 @@ static int queue_prio(const struct intel_engine_execlists *execlists)
 
 static inline bool need_preempt(const struct intel_engine_cs *engine,
 				const struct i915_request *rq,
-				struct rb_node *rb)
+				struct virtual_engine *ve)
 {
 	int last_prio;
 
@@ -491,9 +491,7 @@ static inline bool need_preempt(const struct intel_engine_cs *engine,
 	    rq_prio(list_next_entry(rq, sched.link)) > last_prio)
 		return true;
 
-	if (rb) {
-		struct virtual_engine *ve =
-			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
+	if (ve) {
 		bool preempt = false;
 
 		if (engine == ve->siblings[0]) { /* only preempt one sibling */
@@ -1811,6 +1809,35 @@ static bool virtual_matches(const struct virtual_engine *ve,
 	return true;
 }
 
+static struct virtual_engine *
+first_virtual_engine(struct intel_engine_cs *engine)
+{
+	struct intel_engine_execlists *el = &engine->execlists;
+	struct rb_node *rb = rb_first_cached(&el->virtual);
+
+	while (rb) {
+		struct virtual_engine *ve =
+			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
+		struct i915_request *rq = READ_ONCE(ve->request);
+
+		if (!rq) { /* lazily cleanup after another engine handled rq */
+			rb_erase_cached(rb, &el->virtual);
+			RB_CLEAR_NODE(rb);
+			rb = rb_first_cached(&el->virtual);
+			continue;
+		}
+
+		if (!virtual_matches(ve, rq, engine)) {
+			rb = rb_next(rb);
+			continue;
+		}
+
+		return ve;
+	}
+
+	return NULL;
+}
+
 static void virtual_xfer_breadcrumbs(struct virtual_engine *ve)
 {
 	/*
@@ -1895,7 +1922,7 @@ static void defer_active(struct intel_engine_cs *engine)
 static bool
 need_timeslice(const struct intel_engine_cs *engine,
 	       const struct i915_request *rq,
-	       const struct rb_node *rb)
+	       struct virtual_engine *ve)
 {
 	int hint;
 
@@ -1904,9 +1931,7 @@ need_timeslice(const struct intel_engine_cs *engine,
 
 	hint = engine->execlists.queue_priority_hint;
 
-	if (rb) {
-		const struct virtual_engine *ve =
-			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
+	if (ve) {
 		const struct intel_engine_cs *inflight =
 			intel_context_inflight(&ve->context);
 
@@ -2057,7 +2082,8 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	struct intel_engine_execlists * const execlists = &engine->execlists;
 	struct i915_request **port = execlists->pending;
 	struct i915_request ** const last_port = port + execlists->port_mask;
-	struct i915_request * const *active;
+	struct i915_request * const *active = READ_ONCE(execlists->active);
+	struct virtual_engine *ve = first_virtual_engine(engine);
 	struct i915_request *last;
 	struct rb_node *rb;
 	bool submit = false;
@@ -2084,26 +2110,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	 * and context switches) submission.
 	 */
 
-	for (rb = rb_first_cached(&execlists->virtual); rb; ) {
-		struct virtual_engine *ve =
-			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
-		struct i915_request *rq = READ_ONCE(ve->request);
-
-		if (!rq) { /* lazily cleanup after another engine handled rq */
-			rb_erase_cached(rb, &execlists->virtual);
-			RB_CLEAR_NODE(rb);
-			rb = rb_first_cached(&execlists->virtual);
-			continue;
-		}
-
-		if (!virtual_matches(ve, rq, engine)) {
-			rb = rb_next(rb);
-			continue;
-		}
-
-		break;
-	}
-
 	/*
 	 * If the queue is higher priority than the last
 	 * request in the currently active context, submit afresh.
@@ -2111,10 +2117,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	 * the active context to interject the preemption request,
 	 * i.e. we will retrigger preemption following the ack in case
 	 * of trouble.
-	 */
-	active = READ_ONCE(execlists->active);
-
-	/*
+	 *
 	 * In theory we can skip over completed contexts that have not
 	 * yet been processed by events (as those events are in flight):
 	 *
@@ -2125,9 +2128,8 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	 * find itself trying to jump back into a context it has just
 	 * completed and barf.
 	 */
-
 	if ((last = *active)) {
-		if (need_preempt(engine, last, rb)) {
+		if (need_preempt(engine, last, ve)) {
 			if (i915_request_completed(last)) {
 				tasklet_hi_schedule(&execlists->tasklet);
 				return;
@@ -2158,7 +2160,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 			__unwind_incomplete_requests(engine);
 
 			last = NULL;
-		} else if (need_timeslice(engine, last, rb) &&
+		} else if (need_timeslice(engine, last, ve) &&
 			   timeslice_expired(execlists, last)) {
 			if (i915_request_completed(last)) {
 				tasklet_hi_schedule(&execlists->tasklet);
@@ -2212,110 +2214,99 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		}
 	}
 
-	while (rb) { /* XXX virtual is always taking precedence */
-		struct virtual_engine *ve =
-			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
+	while (ve) { /* XXX virtual is always taking precedence */
 		struct i915_request *rq;
 
 		spin_lock(&ve->base.active.lock);
 
 		rq = ve->request;
-		if (unlikely(!rq)) { /* lost the race to a sibling */
-			spin_unlock(&ve->base.active.lock);
-			rb_erase_cached(rb, &execlists->virtual);
-			RB_CLEAR_NODE(rb);
-			rb = rb_first_cached(&execlists->virtual);
-			continue;
-		}
+		if (unlikely(!rq)) /* lost the race to a sibling */
+			goto unlock;
 
 		GEM_BUG_ON(rq != ve->request);
 		GEM_BUG_ON(rq->engine != &ve->base);
 		GEM_BUG_ON(rq->context != &ve->context);
 
-		if (rq_prio(rq) >= queue_prio(execlists)) {
-			if (!virtual_matches(ve, rq, engine)) {
-				spin_unlock(&ve->base.active.lock);
-				rb = rb_next(rb);
-				continue;
-			}
+		if (unlikely(rq_prio(rq) < queue_prio(execlists))) {
+			spin_unlock(&ve->base.active.lock);
+			break;
+		}
 
-			if (last && !can_merge_rq(last, rq)) {
-				spin_unlock(&ve->base.active.lock);
-				start_timeslice(engine, rq_prio(rq));
-				return; /* leave this for another sibling */
-			}
+		GEM_BUG_ON(!virtual_matches(ve, rq, engine));
 
-			ENGINE_TRACE(engine,
-				     "virtual rq=%llx:%lld%s, new engine? %s\n",
-				     rq->fence.context,
-				     rq->fence.seqno,
-				     i915_request_completed(rq) ? "!" :
-				     i915_request_started(rq) ? "*" :
-				     "",
-				     yesno(engine != ve->siblings[0]));
-
-			WRITE_ONCE(ve->request, NULL);
-			WRITE_ONCE(ve->base.execlists.queue_priority_hint,
-				   INT_MIN);
-			rb_erase_cached(rb, &execlists->virtual);
-			RB_CLEAR_NODE(rb);
+		if (last && !can_merge_rq(last, rq)) {
+			spin_unlock(&ve->base.active.lock);
+			start_timeslice(engine, rq_prio(rq));
+			return; /* leave this for another sibling */
+		}
 
-			GEM_BUG_ON(!(rq->execution_mask & engine->mask));
-			WRITE_ONCE(rq->engine, engine);
+		ENGINE_TRACE(engine,
+			     "virtual rq=%llx:%lld%s, new engine? %s\n",
+			     rq->fence.context,
+			     rq->fence.seqno,
+			     i915_request_completed(rq) ? "!" :
+			     i915_request_started(rq) ? "*" :
+			     "",
+			     yesno(engine != ve->siblings[0]));
 
-			if (engine != ve->siblings[0]) {
-				u32 *regs = ve->context.lrc_reg_state;
-				unsigned int n;
+		WRITE_ONCE(ve->request, NULL);
+		WRITE_ONCE(ve->base.execlists.queue_priority_hint,
+			   INT_MIN);
 
-				GEM_BUG_ON(READ_ONCE(ve->context.inflight));
+		rb = &ve->nodes[engine->id].rb;
+		rb_erase_cached(rb, &execlists->virtual);
+		RB_CLEAR_NODE(rb);
 
-				if (!intel_engine_has_relative_mmio(engine))
-					virtual_update_register_offsets(regs,
-									engine);
+		GEM_BUG_ON(!(rq->execution_mask & engine->mask));
+		WRITE_ONCE(rq->engine, engine);
 
-				if (!list_empty(&ve->context.signals))
-					virtual_xfer_breadcrumbs(ve);
+		if (engine != ve->siblings[0]) {
+			u32 *regs = ve->context.lrc_reg_state;
+			unsigned int n;
 
-				/*
-				 * Move the bound engine to the top of the list
-				 * for future execution. We then kick this
-				 * tasklet first before checking others, so that
-				 * we preferentially reuse this set of bound
-				 * registers.
-				 */
-				for (n = 1; n < ve->num_siblings; n++) {
-					if (ve->siblings[n] == engine) {
-						swap(ve->siblings[n],
-						     ve->siblings[0]);
-						break;
-					}
-				}
+			GEM_BUG_ON(READ_ONCE(ve->context.inflight));
 
-				GEM_BUG_ON(ve->siblings[0] != engine);
-			}
+			if (!intel_engine_has_relative_mmio(engine))
+				virtual_update_register_offsets(regs,
+								engine);
 
-			if (__i915_request_submit(rq)) {
-				submit = true;
-				last = rq;
-			}
-			i915_request_put(rq);
+			if (!list_empty(&ve->context.signals))
+				virtual_xfer_breadcrumbs(ve);
 
 			/*
-			 * Hmm, we have a bunch of virtual engine requests,
-			 * but the first one was already completed (thanks
-			 * preempt-to-busy!). Keep looking at the veng queue
-			 * until we have no more relevant requests (i.e.
-			 * the normal submit queue has higher priority).
+			 * Move the bound engine to the top of the list for
+			 * future execution. We then kick this tasklet first
+			 * before checking others, so that we preferentially
+			 * reuse this set of bound registers.
 			 */
-			if (!submit) {
-				spin_unlock(&ve->base.active.lock);
-				rb = rb_first_cached(&execlists->virtual);
-				continue;
+			for (n = 1; n < ve->num_siblings; n++) {
+				if (ve->siblings[n] == engine) {
+					swap(ve->siblings[n],
+					     ve->siblings[0]);
+					break;
+				}
 			}
+
+			GEM_BUG_ON(ve->siblings[0] != engine);
 		}
 
+		if (__i915_request_submit(rq)) {
+			submit = true;
+			last = rq;
+		}
+
+		i915_request_put(rq);
+unlock:
 		spin_unlock(&ve->base.active.lock);
-		break;
+
+		/*
+		 * Hmm, we have a bunch of virtual engine requests,
+		 * but the first one was already completed (thanks
+		 * preempt-to-busy!). Keep looking at the veng queue
+		 * until we have no more relevant requests (i.e.
+		 * the normal submit queue has higher priority).
+		 */
+		ve = submit ? NULL : first_virtual_engine(engine);
 	}
 
 	while ((rb = rb_first_cached(&execlists->queue))) {
-- 
2.20.1
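
The detail that makes hoisting worthwhile: the rb_node lives in a
per-sibling array inside the virtual engine, so the rb_entry() here is a
container_of() whose offset depends on engine->id and must be computed at
runtime. In outline (paraphrasing the existing layout, not a new
definition):

	struct virtual_engine {
		/* ... */
		/* one tree node per physical sibling */
		struct ve_node {
			struct rb_node rb;
			int prio;
		} nodes[I915_NUM_ENGINES];
	};

	/* the offset of nodes[engine->id].rb varies with engine->id */
	ve = rb_entry(rb, typeof(*ve), nodes[engine->id].rb);

Doing that conversion once in first_virtual_engine() and passing the
virtual_engine pointer around avoids repeating the offset arithmetic at
each use site.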


* [Intel-gfx] [PATCH 07/28] drm/i915/gt: Decouple inflight virtual engines
From: Chris Wilson @ 2020-06-07 22:20 UTC
  To: intel-gfx; +Cc: Chris Wilson

Once a virtual engine has been bound to a sibling, it will remain bound
until we finally schedule out the last active request. We cannot rebind
the context to a new sibling while it is inflight, as the context save
would conflict, hence we wait. As we cannot then use any other sibling
while the context is inflight, only kick the bound sibling while it is
inflight and, upon scheduling out, kick the rest (so that we can swap
engines on timeslicing if the previously bound engine becomes
oversubscribed).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 30 +++++++++++++----------------
 1 file changed, 13 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index ab1f3131b357..d98e37900171 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1403,9 +1403,8 @@ execlists_schedule_in(struct i915_request *rq, int idx)
 static void kick_siblings(struct i915_request *rq, struct intel_context *ce)
 {
 	struct virtual_engine *ve = container_of(ce, typeof(*ve), context);
-	struct i915_request *next = READ_ONCE(ve->request);
 
-	if (next == rq || (next && next->execution_mask & ~rq->execution_mask))
+	if (READ_ONCE(ve->request))
 		tasklet_hi_schedule(&ve->base.execlists.tasklet);
 }
 
@@ -1820,18 +1819,14 @@ first_virtual_engine(struct intel_engine_cs *engine)
 			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
 		struct i915_request *rq = READ_ONCE(ve->request);
 
-		if (!rq) { /* lazily cleanup after another engine handled rq */
+		/* lazily cleanup after another engine handled rq */
+		if (!rq || !virtual_matches(ve, rq, engine)) {
 			rb_erase_cached(rb, &el->virtual);
 			RB_CLEAR_NODE(rb);
 			rb = rb_first_cached(&el->virtual);
 			continue;
 		}
 
-		if (!virtual_matches(ve, rq, engine)) {
-			rb = rb_next(rb);
-			continue;
-		}
-
 		return ve;
 	}
 
@@ -5469,7 +5464,6 @@ static void virtual_submission_tasklet(unsigned long data)
 	if (unlikely(!mask))
 		return;
 
-	local_irq_disable();
 	for (n = 0; n < ve->num_siblings; n++) {
 		struct intel_engine_cs *sibling = READ_ONCE(ve->siblings[n]);
 		struct ve_node * const node = &ve->nodes[sibling->id];
@@ -5479,20 +5473,19 @@ static void virtual_submission_tasklet(unsigned long data)
 		if (!READ_ONCE(ve->request))
 			break; /* already handled by a sibling's tasklet */
 
+		spin_lock_irq(&sibling->active.lock);
+
 		if (unlikely(!(mask & sibling->mask))) {
 			if (!RB_EMPTY_NODE(&node->rb)) {
-				spin_lock(&sibling->active.lock);
 				rb_erase_cached(&node->rb,
 						&sibling->execlists.virtual);
 				RB_CLEAR_NODE(&node->rb);
-				spin_unlock(&sibling->active.lock);
 			}
-			continue;
-		}
 
-		spin_lock(&sibling->active.lock);
+			goto unlock_engine;
+		}
 
-		if (!RB_EMPTY_NODE(&node->rb)) {
+		if (unlikely(!RB_EMPTY_NODE(&node->rb))) {
 			/*
 			 * Cheat and avoid rebalancing the tree if we can
 			 * reuse this node in situ.
@@ -5532,9 +5525,12 @@ static void virtual_submission_tasklet(unsigned long data)
 		if (first && prio > sibling->execlists.queue_priority_hint)
 			tasklet_hi_schedule(&sibling->execlists.tasklet);
 
-		spin_unlock(&sibling->active.lock);
+unlock_engine:
+		spin_unlock_irq(&sibling->active.lock);
+
+		if (intel_context_inflight(&ve->context))
+			break;
 	}
-	local_irq_enable();
 }
 
 static void virtual_submit_request(struct i915_request *rq)
-- 
2.20.1


* [Intel-gfx] [PATCH 08/28] drm/i915/gt: Resubmit the virtual engine on schedule-out
From: Chris Wilson @ 2020-06-07 22:20 UTC
  To: intel-gfx; +Cc: Chris Wilson

Having recognised that we do not change the sibling until we schedule
out, we can then defer the decision to resubmit the virtual engine from
the unwind of the active queue to scheduling out of the virtual context.

By keeping the unwind order intact on the local engine, we can preserve
data dependency ordering while doing a preempt-to-busy pass until we
have determined the new ELSP. This means that if we try to timeslice
between a virtual engine and a data-dependent ordinary request, the pair
will maintain their relative ordering and we will avoid the
resubmission, cancelling the timeslicing until further change.

The dilemma, though, is that we may then end up in a situation where the
'demotion' of the virtual request to an ordinary request in the engine
queue results in filling the ELSP[] with virtual requests instead of
spreading the load across the engines. To compensate for this, we mark
each virtual request and refuse to submit a virtual request into the
secondary ELSP slots, thus forcing subsequent virtual requests to be
scheduled out after timeslicing. By delaying the decision until we
schedule out, we avoid unnecessary resubmission.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c    | 133 ++++++++++++++++---------
 drivers/gpu/drm/i915/gt/selftest_lrc.c |   2 +-
 2 files changed, 89 insertions(+), 46 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index d98e37900171..cbcbe694f931 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1117,46 +1117,17 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
 
 		__i915_request_unsubmit(rq);
 
-		/*
-		 * Push the request back into the queue for later resubmission.
-		 * If this request is not native to this physical engine (i.e.
-		 * it came from a virtual source), push it back onto the virtual
-		 * engine so that it can be moved across onto another physical
-		 * engine as load dictates.
-		 */
-		if (likely(rq->execution_mask == engine->mask)) {
-			GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
-			if (rq_prio(rq) != prio) {
-				prio = rq_prio(rq);
-				pl = i915_sched_lookup_priolist(engine, prio);
-			}
-			GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
-
-			list_move(&rq->sched.link, pl);
-			set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+		GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
+		if (rq_prio(rq) != prio) {
+			prio = rq_prio(rq);
+			pl = i915_sched_lookup_priolist(engine, prio);
+		}
+		GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
 
-			active = rq;
-		} else {
-			struct intel_engine_cs *owner = rq->context->engine;
+		list_move(&rq->sched.link, pl);
+		set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
 
-			/*
-			 * Decouple the virtual breadcrumb before moving it
-			 * back to the virtual engine -- we don't want the
-			 * request to complete in the background and try
-			 * and cancel the breadcrumb on the virtual engine
-			 * (instead of the old engine where it is linked)!
-			 */
-			if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
-				     &rq->fence.flags)) {
-				spin_lock_nested(&rq->lock,
-						 SINGLE_DEPTH_NESTING);
-				i915_request_cancel_breadcrumb(rq);
-				spin_unlock(&rq->lock);
-			}
-			WRITE_ONCE(rq->engine, owner);
-			owner->submit_request(rq);
-			active = NULL;
-		}
+		active = rq;
 	}
 
 	return active;
@@ -1400,12 +1371,54 @@ execlists_schedule_in(struct i915_request *rq, int idx)
 	return i915_request_get(rq);
 }
 
+static void
+resubmit_virtual_request(struct i915_request *rq, struct virtual_engine *ve)
+{
+	struct intel_engine_cs *engine = rq->engine;
+
+	/*
+	 * Note that although __execlists_schedule_out() may be called from
+	 * inside execlists_dequeue (under the spinlock), it can only do so
+	 * as a result of request completion, and a completed request is
+	 * not resubmitted.
+	 */
+	spin_lock_irq(&engine->active.lock);
+
+	/*
+	 * Decouple the virtual breadcrumb before moving it back to the virtual
+	 * engine -- we don't want the request to complete in the background
+	 * and then try and cancel the breadcrumb on the virtual engine
+	 * (instead of the old engine where it is linked)!
+	 */
+	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &rq->fence.flags)) {
+		spin_lock_nested(&rq->lock, SINGLE_DEPTH_NESTING);
+		i915_request_cancel_breadcrumb(rq);
+		spin_unlock(&rq->lock);
+	}
+
+	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+	WRITE_ONCE(rq->engine, &ve->base);
+	ve->base.submit_request(rq);
+
+	spin_unlock_irq(&engine->active.lock);
+}
+
 static void kick_siblings(struct i915_request *rq, struct intel_context *ce)
 {
 	struct virtual_engine *ve = container_of(ce, typeof(*ve), context);
 
 	if (READ_ONCE(ve->request))
 		tasklet_hi_schedule(&ve->base.execlists.tasklet);
+
+	/*
+	 * This engine is now too busy to run this virtual request, so
+	 * see if we can find an alternative engine for it to execute on.
+	 * Once a request has become bonded to this engine, we treat it the
+	 * same as any other native request.
+	 */
+	if (i915_request_in_priority_queue(rq) &&
+	    rq->execution_mask != rq->engine->mask)
+		resubmit_virtual_request(rq, ve);
 }
 
 static inline void
@@ -1645,6 +1658,20 @@ assert_pending_valid(const struct intel_engine_execlists *execlists,
 		}
 		sentinel = i915_request_has_sentinel(rq);
 
+		/*
+		 * We want virtual requests to only be in the first slot so
+		 * that they are never stuck behind a hog and can be immediately
+		 * transferred onto the next idle engine.
+		 */
+		if (rq->execution_mask != engine->mask &&
+		    port != execlists->pending) {
+			GEM_TRACE_ERR("%s: virtual engine:%llx not in prime position[%zd]\n",
+				      engine->name,
+				      ce->timeline->fence_context,
+				      port - execlists->pending);
+			return false;
+		}
+
 		/* Hold tightly onto the lock to prevent concurrent retires! */
 		if (!spin_trylock_irqsave(&rq->lock, flags))
 			continue;
@@ -2343,6 +2370,15 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 				if (i915_request_has_sentinel(last))
 					goto done;
 
+				/*
+				 * We avoid submitting virtual requests into
+				 * the secondary ports so that we can migrate
+				 * the request immediately to another engine
+				 * rather than wait for the primary request.
+				 */
+				if (rq->execution_mask != engine->mask)
+					goto done;
+
 				/*
 				 * If GVT overrides us we only ever submit
 				 * port[0], leaving port[1] empty. Note that we
@@ -3148,13 +3184,6 @@ static void __submit_queue_imm(struct intel_engine_cs *engine)
 	if (reset_in_progress(execlists))
 		return; /* defer until we restart the engine following reset */
 
-	/* Hopefully we clear execlists->pending[] to let us through */
-	if (READ_ONCE(execlists->pending[0]) &&
-	    tasklet_trylock(&execlists->tasklet)) {
-		process_csb(engine);
-		tasklet_unlock(&execlists->tasklet);
-	}
-
 	__execlists_submission_tasklet(engine);
 }
 
@@ -3177,11 +3206,25 @@ static bool ancestor_on_hold(const struct intel_engine_cs *engine,
 	return !list_empty(&engine->active.hold) && hold_request(rq);
 }
 
+static void flush_csb(struct intel_engine_cs *engine)
+{
+	struct intel_engine_execlists *el = &engine->execlists;
+
+	if (READ_ONCE(el->pending[0]) && tasklet_trylock(&el->tasklet)) {
+		if (!reset_in_progress(el))
+			process_csb(engine);
+		tasklet_unlock(&el->tasklet);
+	}
+}
+
 static void execlists_submit_request(struct i915_request *request)
 {
 	struct intel_engine_cs *engine = request->engine;
 	unsigned long flags;
 
+	/* Hopefully we clear execlists->pending[] to let us through */
+	flush_csb(engine);
+
 	/* Will be called from irq-context when using foreign fences. */
 	spin_lock_irqsave(&engine->active.lock, flags);
 
diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c
index f651bdf7f191..a8bcea8aa1b4 100644
--- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
+++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
@@ -4289,7 +4289,7 @@ static int reset_virtual_engine(struct intel_gt *gt,
 	spin_lock_irq(&engine->active.lock);
 	__unwind_incomplete_requests(engine);
 	spin_unlock_irq(&engine->active.lock);
-	GEM_BUG_ON(rq->engine != ve->engine);
+	GEM_BUG_ON(rq->engine != engine);
 
 	/* Reset the engine while keeping our active request on hold */
 	execlists_hold(engine, rq);
-- 
2.20.1
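
Condensed, the new schedule-out decision is: if the virtual request is
still sitting in this engine's priority queue and is not restricted to
this engine alone, pull it back to the virtual engine so the next idle
sibling can pick it up. A sketch of the predicate used in kick_siblings()
above:

	/* Sketch: may this request migrate back to the virtual engine? */
	static bool should_resubmit_virtual(const struct i915_request *rq)
	{
		/* still queued, i.e. not inflight on the HW */
		if (!i915_request_in_priority_queue(rq))
			return false;

		/* eligible to run on more engines than just this one */
		return rq->execution_mask != rq->engine->mask;
	}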


* [Intel-gfx] [PATCH 09/28] drm/i915: Add list_for_each_entry_safe_continue_reverse
From: Chris Wilson @ 2020-06-07 22:20 UTC
  To: intel-gfx; +Cc: Chris Wilson

One more list iterator variant, for when we want to unwind from inside
one list iterator with the intention of restarting from the current
entry as the new head of the list.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_utils.h | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_utils.h b/drivers/gpu/drm/i915/i915_utils.h
index 03a73d2bd50d..6ebccdd12d4c 100644
--- a/drivers/gpu/drm/i915/i915_utils.h
+++ b/drivers/gpu/drm/i915/i915_utils.h
@@ -266,6 +266,12 @@ static inline int list_is_last_rcu(const struct list_head *list,
 	return READ_ONCE(list->next) == head;
 }
 
+#define list_for_each_entry_safe_continue_reverse(pos, n, head, member)	\
+	for (pos = list_prev_entry(pos, member),			\
+	     n = list_prev_entry(pos, member);				\
+	     &pos->member != (head);					\
+	     pos = n, n = list_prev_entry(n, member))
+
 /*
  * Wait until the work is finally complete, even if it tries to postpone
  * by requeueing itself. Note, that if the worker never cancels itself,
-- 
2.20.1
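
A usage sketch for the new iterator: having failed partway through a
forward walk, unwind the entries already processed, in reverse order,
while remaining safe against deleting them as we go (hypothetical types
and helpers, not from the driver):

	struct item {
		struct list_head link;
	};

	static int process_all(struct list_head *head)
	{
		struct item *pos, *n;

		list_for_each_entry(pos, head, link) {
			if (process(pos) < 0)	/* hypothetical helper */
				goto unwind;
		}

		return 0;

	unwind:
		/* walk back from the entry before the failure */
		list_for_each_entry_safe_continue_reverse(pos, n, head, link)
			unprocess(pos);	/* hypothetical helper */

		return -EIO;
	}

Note the iteration starts at the entry before pos, matching the
'continue' family of list iterators.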


* [Intel-gfx] [PATCH 10/28] drm/i915/gem: Separate reloc validation into an earlier step
From: Chris Wilson @ 2020-06-07 22:20 UTC
  To: intel-gfx; +Cc: Chris Wilson

Over the next couple of patches, we will want to lock all the modified
vma for relocation processing under a single ww_mutex. We neither want
to include the vma that are skipped (as no modification is required),
nor do we want them to be marked as written to. So separate out the
reloc validation into an earlier step, which we can use both to reject
the execbuf before committing to making our changes, and to filter out
the unmodified vma.

This does introduce a second pass through the reloc[], but only if we
need to emit relocations.

v2: reuse the outer loop, not cut'n'paste.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 145 +++++++++++-------
 1 file changed, 86 insertions(+), 59 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 23db79b806db..01ab1e15a142 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -911,9 +911,9 @@ static void eb_destroy(const struct i915_execbuffer *eb)
 
 static inline u64
 relocation_target(const struct drm_i915_gem_relocation_entry *reloc,
-		  const struct i915_vma *target)
+		  u64 target)
 {
-	return gen8_canonical_addr((int)reloc->delta + target->node.start);
+	return gen8_canonical_addr((int)reloc->delta + target);
 }
 
 static void reloc_cache_init(struct reloc_cache *cache,
@@ -1292,26 +1292,11 @@ static int __reloc_entry_gpu(struct i915_execbuffer *eb,
 	return 0;
 }
 
-static u64
-relocate_entry(struct i915_execbuffer *eb,
-	       struct i915_vma *vma,
-	       const struct drm_i915_gem_relocation_entry *reloc,
-	       const struct i915_vma *target)
-{
-	u64 target_addr = relocation_target(reloc, target);
-	int err;
-
-	err = __reloc_entry_gpu(eb, vma, reloc->offset, target_addr);
-	if (err)
-		return err;
-
-	return target->node.start | UPDATE;
-}
-
-static u64
-eb_relocate_entry(struct i915_execbuffer *eb,
-		  struct eb_vma *ev,
-		  const struct drm_i915_gem_relocation_entry *reloc)
+static int
+eb_reloc_prepare(struct i915_execbuffer *eb,
+		 struct eb_vma *ev,
+		 const struct drm_i915_gem_relocation_entry *reloc,
+		 struct drm_i915_gem_relocation_entry __user *user)
 {
 	struct drm_i915_private *i915 = eb->i915;
 	struct eb_vma *target;
@@ -1389,6 +1374,32 @@ eb_relocate_entry(struct i915_execbuffer *eb,
 		return -EINVAL;
 	}
 
+	return 1;
+}
+
+static int
+eb_reloc_entry(struct i915_execbuffer *eb,
+	       struct eb_vma *ev,
+	       const struct drm_i915_gem_relocation_entry *reloc,
+	       struct drm_i915_gem_relocation_entry __user *user)
+{
+	struct eb_vma *target;
+	u64 offset;
+	int err;
+
+	/* we already hold a reference to all valid objects */
+	target = eb_get_vma(eb, reloc->target_handle);
+	if (unlikely(!target))
+		return -ENOENT;
+
+	/*
+	 * If the relocation already has the right value in it, no
+	 * more work needs to be done.
+	 */
+	offset = gen8_canonical_addr(target->vma->node.start);
+	if (offset == reloc->presumed_offset)
+		return 0;
+
 	/*
 	 * If we write into the object, we need to force the synchronisation
 	 * barrier, either with an asynchronous clflush or if we executed the
@@ -1399,11 +1410,41 @@ eb_relocate_entry(struct i915_execbuffer *eb,
 	 */
 	ev->flags &= ~EXEC_OBJECT_ASYNC;
 
-	/* and update the user's relocation entry */
-	return relocate_entry(eb, ev->vma, reloc, target->vma);
+	err = __reloc_entry_gpu(eb, ev->vma, reloc->offset,
+				relocation_target(reloc, offset));
+	if (err)
+		return err;
+
+	/*
+	 * Note that reporting an error now
+	 * leaves everything in an inconsistent
+	 * state as we have *already* changed
+	 * the relocation value inside the
+	 * object. As we have not changed the
+	 * reloc.presumed_offset or will not
+	 * change the execobject.offset, on the
+	 * call we may not rewrite the value
+	 * inside the object, leaving it
+	 * dangling and causing a GPU hang. Unless
+	 * userspace dynamically rebuilds the
+	 * relocations on each execbuf rather than
+	 * presume a static tree.
+	 *
+	 * We did previously check if the relocations
+	 * were writable (access_ok), an error now
+	 * would be a strange race with mprotect,
+	 * having already demonstrated that we
+	 * can read from this userspace address.
+	 */
+	__put_user(offset, &user->presumed_offset);
+	return 0;
 }
 
-static int eb_relocate_vma(struct i915_execbuffer *eb, struct eb_vma *ev)
+static long eb_reloc_vma(struct i915_execbuffer *eb, struct eb_vma *ev,
+			 int (*fn)(struct i915_execbuffer *eb,
+				   struct eb_vma *ev,
+				   const struct drm_i915_gem_relocation_entry *reloc,
+				   struct drm_i915_gem_relocation_entry __user *user))
 {
 #define N_RELOC(x) ((x) / sizeof(struct drm_i915_gem_relocation_entry))
 	struct drm_i915_gem_relocation_entry stack[N_RELOC(512)];
@@ -1411,6 +1452,7 @@ static int eb_relocate_vma(struct i915_execbuffer *eb, struct eb_vma *ev)
 	struct drm_i915_gem_relocation_entry __user *urelocs =
 		u64_to_user_ptr(entry->relocs_ptr);
 	unsigned long remain = entry->relocation_count;
+	int required = 0;
 
 	if (unlikely(remain > N_RELOC(ULONG_MAX)))
 		return -EINVAL;
@@ -1443,42 +1485,18 @@ static int eb_relocate_vma(struct i915_execbuffer *eb, struct eb_vma *ev)
 
 		remain -= count;
 		do {
-			u64 offset = eb_relocate_entry(eb, ev, r);
+			int ret;
 
-			if (likely(offset == 0)) {
-			} else if ((s64)offset < 0) {
-				return (int)offset;
-			} else {
-				/*
-				 * Note that reporting an error now
-				 * leaves everything in an inconsistent
-				 * state as we have *already* changed
-				 * the relocation value inside the
-				 * object. As we have not changed the
-				 * reloc.presumed_offset or will not
-				 * change the execobject.offset, on the
-				 * call we may not rewrite the value
-				 * inside the object, leaving it
-				 * dangling and causing a GPU hang. Unless
-				 * userspace dynamically rebuilds the
-				 * relocations on each execbuf rather than
-				 * presume a static tree.
-				 *
-				 * We did previously check if the relocations
-				 * were writable (access_ok), an error now
-				 * would be a strange race with mprotect,
-				 * having already demonstrated that we
-				 * can read from this userspace address.
-				 */
-				offset = gen8_canonical_addr(offset & ~UPDATE);
-				__put_user(offset,
-					   &urelocs[r - stack].presumed_offset);
-			}
+			ret = fn(eb, ev, r, &urelocs[r - stack]);
+			if (ret < 0)
+				return ret;
+
+			required |= ret;
 		} while (r++, --count);
 		urelocs += ARRAY_SIZE(stack);
 	} while (remain);
 
-	return 0;
+	return required;
 }
 
 static int eb_relocate(struct i915_execbuffer *eb)
@@ -1497,12 +1515,21 @@ static int eb_relocate(struct i915_execbuffer *eb)
 
 	/* The objects are in their final locations, apply the relocations. */
 	if (eb->args->flags & __EXEC_HAS_RELOC) {
-		struct eb_vma *ev;
+		struct eb_vma *ev, *en;
 		int flush;
 
+		list_for_each_entry_safe(ev, en, &eb->relocs, reloc_link) {
+			err = eb_reloc_vma(eb, ev, eb_reloc_prepare);
+			if (err < 0)
+				return err;
+
+			if (err == 0)
+				list_del_init(&ev->reloc_link);
+		}
+
 		list_for_each_entry(ev, &eb->relocs, reloc_link) {
-			err = eb_relocate_vma(eb, ev);
-			if (err)
+			err = eb_reloc_vma(eb, ev, eb_reloc_entry);
+			if (err < 0)
 				break;
 		}
 
-- 
2.20.1
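
Both passes share eb_reloc_vma(), which chunks the user relocation array
through a stack buffer and hands each entry to a callback. The callback
convention is: negative for error, 0 for nothing to do, 1 for relocation
required; the loop OR-folds the results so the caller knows whether the
vma can be pruned before the mutating second pass. The folding, in
miniature (sketch only):

	/* Sketch: fold per-entry results; <0 error, 0 no-op, 1 work */
	static int fold_results(const int *ret, unsigned long count)
	{
		int required = 0;

		while (count--) {
			if (*ret < 0)
				return *ret;	/* abort on first error */

			required |= *ret++;
		}

		return required;	/* 0 lets the vma be pruned */
	}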


* [Intel-gfx] [PATCH 11/28] drm/i915/gem: Lift GPU relocation allocation
  2020-06-07 22:20 [Intel-gfx] [PATCH 01/28] drm/i915: Adjust the sentinel assert to match implementation Chris Wilson
                   ` (8 preceding siblings ...)
  2020-06-07 22:20 ` [Intel-gfx] [PATCH 10/28] drm/i915/gem: Separate reloc validation into an earlier step Chris Wilson
@ 2020-06-07 22:20 ` Chris Wilson
  2020-06-07 22:20 ` [Intel-gfx] [PATCH 12/28] drm/i915/gem: Build the reloc request first Chris Wilson
                   ` (22 subsequent siblings)
  32 siblings, 0 replies; 52+ messages in thread
From: Chris Wilson @ 2020-06-07 22:20 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Since we have reduced the relocation paths to just use the async GPU,
we can lift the request allocation to the start of the relocations.
Knowing that we use one request for all relocations will simplify
tracking the relocation fence.
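
The resulting flow, sketched from reloc_gpu() as added below (names are
taken from this patch; error handling trimmed):

	err = reloc_gpu_alloc(eb);	/* one request for all relocations */
	if (err)
		return err;

	list_for_each_entry(ev, &eb->relocs, reloc_link)
		err = eb_reloc_vma(eb, ev, eb_reloc_entry);

	flush = reloc_gpu_flush(&eb->reloc_cache); /* submit once, at the end */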

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 100 ++++++++++--------
 .../i915/gem/selftests/i915_gem_execbuffer.c  |   5 +-
 2 files changed, 57 insertions(+), 48 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 01ab1e15a142..e012857be129 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -900,8 +900,6 @@ eb_get_vma(const struct i915_execbuffer *eb, unsigned long handle)
 
 static void eb_destroy(const struct i915_execbuffer *eb)
 {
-	GEM_BUG_ON(eb->reloc_cache.rq);
-
 	if (eb->array)
 		eb_vma_array_put(eb->array);
 
@@ -926,7 +924,6 @@ static void reloc_cache_init(struct reloc_cache *cache,
 	cache->has_fence = cache->gen < 4;
 	cache->needs_unfenced = INTEL_INFO(i915)->unfenced_needs_alignment;
 	cache->node.flags = 0;
-	cache->rq = NULL;
 	cache->target = NULL;
 }
 
@@ -1007,13 +1004,9 @@ static unsigned int reloc_bb_flags(const struct reloc_cache *cache)
 
 static int reloc_gpu_flush(struct reloc_cache *cache)
 {
-	struct i915_request *rq;
+	struct i915_request *rq = cache->rq;
 	int err;
 
-	rq = fetch_and_zero(&cache->rq);
-	if (!rq)
-		return 0;
-
 	if (cache->rq_vma) {
 		struct drm_i915_gem_object *obj = cache->rq_vma->obj;
 
@@ -1062,9 +1055,8 @@ static int reloc_move_to_gpu(struct i915_request *rq, struct i915_vma *vma)
 	return err;
 }
 
-static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
-			     struct intel_engine_cs *engine,
-			     unsigned int len)
+static int
+__reloc_gpu_alloc(struct i915_execbuffer *eb, struct intel_engine_cs *engine)
 {
 	struct reloc_cache *cache = &eb->reloc_cache;
 	struct intel_gt_buffer_pool_node *pool;
@@ -1154,33 +1146,14 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
 	return err;
 }
 
-static bool reloc_can_use_engine(const struct intel_engine_cs *engine)
-{
-	return engine->class != VIDEO_DECODE_CLASS || !IS_GEN(engine->i915, 6);
-}
-
-static u32 *reloc_gpu(struct i915_execbuffer *eb,
-		      struct i915_vma *vma,
-		      unsigned int len)
+static u32 *reloc_batch_grow(struct i915_execbuffer *eb,
+			     struct i915_vma *vma,
+			     unsigned int len)
 {
 	struct reloc_cache *cache = &eb->reloc_cache;
 	u32 *cmd;
 	int err;
 
-	if (unlikely(!cache->rq)) {
-		struct intel_engine_cs *engine = eb->engine;
-
-		if (!reloc_can_use_engine(engine)) {
-			engine = engine->gt->engine_class[COPY_ENGINE_CLASS][0];
-			if (!engine)
-				return ERR_PTR(-ENODEV);
-		}
-
-		err = __reloc_gpu_alloc(eb, engine, len);
-		if (unlikely(err))
-			return ERR_PTR(err);
-	}
-
 	if (vma != cache->target) {
 		err = reloc_move_to_gpu(cache->rq, vma);
 		if (unlikely(err)) {
@@ -1238,7 +1211,7 @@ static int __reloc_entry_gpu(struct i915_execbuffer *eb,
 	else
 		len = 3;
 
-	batch = reloc_gpu(eb, vma, len);
+	batch = reloc_batch_grow(eb, vma, len);
 	if (IS_ERR(batch))
 		return PTR_ERR(batch);
 
@@ -1499,6 +1472,47 @@ static long eb_reloc_vma(struct i915_execbuffer *eb, struct eb_vma *ev,
 	return required;
 }
 
+static bool reloc_can_use_engine(const struct intel_engine_cs *engine)
+{
+	return engine->class != VIDEO_DECODE_CLASS || !IS_GEN(engine->i915, 6);
+}
+
+static int reloc_gpu_alloc(struct i915_execbuffer *eb)
+{
+	struct intel_engine_cs *engine = eb->engine;
+
+	if (!reloc_can_use_engine(engine)) {
+		engine = engine->gt->engine_class[COPY_ENGINE_CLASS][0];
+		if (!engine)
+			return -ENODEV;
+	}
+
+	return __reloc_gpu_alloc(eb, engine);
+}
+
+static int reloc_gpu(struct i915_execbuffer *eb)
+{
+	struct eb_vma *ev;
+	int flush, err;
+
+	err = reloc_gpu_alloc(eb);
+	if (err)
+		return err;
+	GEM_BUG_ON(!eb->reloc_cache.rq);
+
+	list_for_each_entry(ev, &eb->relocs, reloc_link) {
+		err = eb_reloc_vma(eb, ev, eb_reloc_entry);
+		if (err < 0)
+			goto out;
+	}
+
+out:
+	flush = reloc_gpu_flush(&eb->reloc_cache);
+	if (!err)
+		err = flush;
+	return err;
+}
+
 static int eb_relocate(struct i915_execbuffer *eb)
 {
 	int err;
@@ -1516,7 +1530,6 @@ static int eb_relocate(struct i915_execbuffer *eb)
 	/* The objects are in their final locations, apply the relocations. */
 	if (eb->args->flags & __EXEC_HAS_RELOC) {
 		struct eb_vma *ev, *en;
-		int flush;
 
 		list_for_each_entry_safe(ev, en, &eb->relocs, reloc_link) {
 			err = eb_reloc_vma(eb, ev, eb_reloc_prepare);
@@ -1527,18 +1540,14 @@ static int eb_relocate(struct i915_execbuffer *eb)
 				list_del_init(&ev->reloc_link);
 		}
 
-		list_for_each_entry(ev, &eb->relocs, reloc_link) {
-			err = eb_reloc_vma(eb, ev, eb_reloc_entry);
-			if (err < 0)
-				break;
+		if (!list_empty(&eb->relocs)) {
+			err = reloc_gpu(eb);
+			if (err)
+				return err;
 		}
-
-		flush = reloc_gpu_flush(&eb->reloc_cache);
-		if (!err)
-			err = flush;
 	}
 
-	return err;
+	return 0;
 }
 
 static int eb_move_to_gpu(struct i915_execbuffer *eb)
@@ -2538,9 +2547,6 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 		batch = vma;
 	}
 
-	/* All GPU relocation batches must be submitted prior to the user rq */
-	GEM_BUG_ON(eb.reloc_cache.rq);
-
 	/* Allocate a request for this batch buffer nice and early. */
 	eb.request = i915_request_create(eb.context);
 	if (IS_ERR(eb.request)) {
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
index 57c14d3340cd..50fe22d87ae1 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
@@ -36,6 +36,10 @@ static int __igt_gpu_reloc(struct i915_execbuffer *eb,
 	if (err)
 		return err;
 
+	err = reloc_gpu_alloc(eb);
+	if (err)
+		goto unpin_vma;
+
 	/* 8-Byte aligned */
 	err = __reloc_entry_gpu(eb, vma, offsets[0] * sizeof(u32), 0);
 	if (err)
@@ -63,7 +67,6 @@ static int __igt_gpu_reloc(struct i915_execbuffer *eb,
 	err = reloc_gpu_flush(&eb->reloc_cache);
 	if (err)
 		goto put_rq;
-	GEM_BUG_ON(eb->reloc_cache.rq);
 
 	err = i915_gem_object_wait(obj, I915_WAIT_INTERRUPTIBLE, HZ / 2);
 	if (err) {
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Intel-gfx] [PATCH 12/28] drm/i915/gem: Build the reloc request first
  2020-06-07 22:20 [Intel-gfx] [PATCH 01/28] drm/i915: Adjust the sentinel assert to match implementation Chris Wilson
                   ` (9 preceding siblings ...)
  2020-06-07 22:20 ` [Intel-gfx] [PATCH 11/28] drm/i915/gem: Lift GPU relocation allocation Chris Wilson
@ 2020-06-07 22:20 ` Chris Wilson
  2020-06-07 22:20 ` [Intel-gfx] [PATCH 13/28] drm/i915/gem: Add all GPU reloc awaits/signals en masse Chris Wilson
                   ` (21 subsequent siblings)
  32 siblings, 0 replies; 52+ messages in thread
From: Chris Wilson @ 2020-06-07 22:20 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

If we get interrupted in the middle of chaining up the relocation
entries, we will fail to submit the relocation batch. However, we will
report having already completed some of the relocations, and so the
reloc.presumed_offset will no longer match the batch contents, causing
confusion and invalid future batches. If we build the relocation request
packet first, we can always emit however far we got along the relocation
chain.
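
In outline, a sketch of the new ordering using the helpers introduced
below:

	err = reloc_gpu_alloc(eb);		/* create the request */
	if (err)
		return err;

	err = reloc_gpu_emit(&eb->reloc_cache);	/* breadcrumb + bb_start first */
	if (err)
		goto out;

	list_for_each_entry(ev, &eb->relocs, reloc_link)
		err = eb_reloc_vma(eb, ev, eb_reloc_entry); /* fill the entries */
out:
	reloc_gpu_flush(&eb->reloc_cache);	/* always submit what was built */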

Fixes: 0e97fbb08055 ("drm/i915/gem: Use a single chained reloc batches for a single execbuf")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 49 ++++++++++---------
 .../i915/gem/selftests/i915_gem_execbuffer.c  |  8 +--
 2 files changed, 30 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index e012857be129..83cea2ea7c61 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -1002,11 +1002,27 @@ static unsigned int reloc_bb_flags(const struct reloc_cache *cache)
 	return cache->gen > 5 ? 0 : I915_DISPATCH_SECURE;
 }
 
-static int reloc_gpu_flush(struct reloc_cache *cache)
+static int reloc_gpu_emit(struct reloc_cache *cache)
 {
 	struct i915_request *rq = cache->rq;
 	int err;
 
+	err = 0;
+	if (rq->engine->emit_init_breadcrumb)
+		err = rq->engine->emit_init_breadcrumb(rq);
+	if (!err)
+		err = rq->engine->emit_bb_start(rq,
+						rq->batch->node.start,
+						PAGE_SIZE,
+						reloc_bb_flags(cache));
+
+	return err;
+}
+
+static void reloc_gpu_flush(struct reloc_cache *cache)
+{
+	struct i915_request *rq = cache->rq;
+
 	if (cache->rq_vma) {
 		struct drm_i915_gem_object *obj = cache->rq_vma->obj;
 
@@ -1018,21 +1034,8 @@ static int reloc_gpu_flush(struct reloc_cache *cache)
 		i915_gem_object_unpin_map(obj);
 	}
 
-	err = 0;
-	if (rq->engine->emit_init_breadcrumb)
-		err = rq->engine->emit_init_breadcrumb(rq);
-	if (!err)
-		err = rq->engine->emit_bb_start(rq,
-						rq->batch->node.start,
-						PAGE_SIZE,
-						reloc_bb_flags(cache));
-	if (err)
-		i915_request_set_error_once(rq, err);
-
 	intel_gt_chipset_flush(rq->engine->gt);
 	i915_request_add(rq);
-
-	return err;
 }
 
 static int reloc_move_to_gpu(struct i915_request *rq, struct i915_vma *vma)
@@ -1120,7 +1123,7 @@ __reloc_gpu_alloc(struct i915_execbuffer *eb, struct intel_engine_cs *engine)
 		err = i915_vma_move_to_active(batch, rq, 0);
 	i915_vma_unlock(batch);
 	if (err)
-		goto skip_request;
+		goto err_request;
 
 	rq->batch = batch;
 	i915_vma_unpin(batch);
@@ -1133,8 +1136,6 @@ __reloc_gpu_alloc(struct i915_execbuffer *eb, struct intel_engine_cs *engine)
 	/* Return with batch mapping (cmd) still pinned */
 	goto out_pool;
 
-skip_request:
-	i915_request_set_error_once(rq, err);
 err_request:
 	i915_request_add(rq);
 err_unpin:
@@ -1167,10 +1168,8 @@ static u32 *reloc_batch_grow(struct i915_execbuffer *eb,
 	if (unlikely(cache->rq_size + len >
 		     PAGE_SIZE / sizeof(u32) - RELOC_TAIL)) {
 		err = reloc_gpu_chain(cache);
-		if (unlikely(err)) {
-			i915_request_set_error_once(cache->rq, err);
+		if (unlikely(err))
 			return ERR_PTR(err);
-		}
 	}
 
 	GEM_BUG_ON(cache->rq_size + len >= PAGE_SIZE  / sizeof(u32));
@@ -1493,13 +1492,17 @@ static int reloc_gpu_alloc(struct i915_execbuffer *eb)
 static int reloc_gpu(struct i915_execbuffer *eb)
 {
 	struct eb_vma *ev;
-	int flush, err;
+	int err;
 
 	err = reloc_gpu_alloc(eb);
 	if (err)
 		return err;
 	GEM_BUG_ON(!eb->reloc_cache.rq);
 
+	err = reloc_gpu_emit(&eb->reloc_cache);
+	if (err)
+		goto out;
+
 	list_for_each_entry(ev, &eb->relocs, reloc_link) {
 		err = eb_reloc_vma(eb, ev, eb_reloc_entry);
 		if (err < 0)
@@ -1507,9 +1510,7 @@ static int reloc_gpu(struct i915_execbuffer *eb)
 	}
 
 out:
-	flush = reloc_gpu_flush(&eb->reloc_cache);
-	if (!err)
-		err = flush;
+	reloc_gpu_flush(&eb->reloc_cache);
 	return err;
 }
 
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
index 50fe22d87ae1..faed6480a792 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
@@ -40,6 +40,10 @@ static int __igt_gpu_reloc(struct i915_execbuffer *eb,
 	if (err)
 		goto unpin_vma;
 
+	err = reloc_gpu_emit(&eb->reloc_cache);
+	if (err)
+		goto unpin_vma;
+
 	/* 8-Byte aligned */
 	err = __reloc_entry_gpu(eb, vma, offsets[0] * sizeof(u32), 0);
 	if (err)
@@ -64,9 +68,7 @@ static int __igt_gpu_reloc(struct i915_execbuffer *eb,
 
 	GEM_BUG_ON(!eb->reloc_cache.rq);
 	rq = i915_request_get(eb->reloc_cache.rq);
-	err = reloc_gpu_flush(&eb->reloc_cache);
-	if (err)
-		goto put_rq;
+	reloc_gpu_flush(&eb->reloc_cache);
 
 	err = i915_gem_object_wait(obj, I915_WAIT_INTERRUPTIBLE, HZ / 2);
 	if (err) {
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Intel-gfx] [PATCH 13/28] drm/i915/gem: Add all GPU reloc awaits/signals en masse
  2020-06-07 22:20 [Intel-gfx] [PATCH 01/28] drm/i915: Adjust the sentinel assert to match implementation Chris Wilson
                   ` (10 preceding siblings ...)
  2020-06-07 22:20 ` [Intel-gfx] [PATCH 12/28] drm/i915/gem: Build the reloc request first Chris Wilson
@ 2020-06-07 22:20 ` Chris Wilson
  2020-06-07 22:20 ` [Intel-gfx] [PATCH 14/28] dma-buf: Proxy fence, an unsignaled fence placeholder Chris Wilson
                   ` (20 subsequent siblings)
  32 siblings, 0 replies; 52+ messages in thread
From: Chris Wilson @ 2020-06-07 22:20 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Asynchronous waits and signaling form a traditional semaphore with all
the usual ordering problems of taking multiple locks. If we want to
add more than one wait on a shared resource by the GPU, we must ensure
that all the associated timelines are advanced atomically, ergo we must
lock all the timelines en masse.
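
The locking follows the standard ww_mutex acquire pattern, a condensed
sketch of lock_relocs() below (on -EDEADLK every lock already taken must
first be released before sleeping on the contended lock):

	struct ww_acquire_ctx acquire;

	ww_acquire_init(&acquire, &reservation_ww_class);
	list_for_each_entry(ev, &eb->relocs, reloc_link) {
		err = ww_mutex_lock_interruptible(&ev->vma->resv->lock,
						  &acquire);
		if (err == -EDEADLK) {
			/* backoff: unlock all held locks, then sleep on
			 * the contended lock before retrying the loop */
			err = ww_mutex_lock_slow_interruptible(&ev->vma->resv->lock,
							       &acquire);
		}
		if (err)
			break;
	}
	ww_acquire_done(&acquire); /* all locks held; advance the timelines */
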

Testcase: igt/gem_exec_reloc/basic-concurrent16
Fixes: 0e97fbb08055 ("drm/i915/gem: Use a single chained reloc batches for a single execbuf")
References: https://gitlab.freedesktop.org/drm/intel/-/issues/1889
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 114 ++++++++++++------
 .../i915/gem/selftests/i915_gem_execbuffer.c  |  24 ++--
 2 files changed, 93 insertions(+), 45 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 83cea2ea7c61..8f3c1cf5af31 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -259,7 +259,6 @@ struct i915_execbuffer {
 		bool has_fence : 1;
 		bool needs_unfenced : 1;
 
-		struct i915_vma *target;
 		struct i915_request *rq;
 		struct i915_vma *rq_vma;
 		u32 *rq_cmd;
@@ -924,7 +923,6 @@ static void reloc_cache_init(struct reloc_cache *cache,
 	cache->has_fence = cache->gen < 4;
 	cache->needs_unfenced = INTEL_INFO(i915)->unfenced_needs_alignment;
 	cache->node.flags = 0;
-	cache->target = NULL;
 }
 
 #define RELOC_TAIL 4
@@ -1038,26 +1036,6 @@ static void reloc_gpu_flush(struct reloc_cache *cache)
 	i915_request_add(rq);
 }
 
-static int reloc_move_to_gpu(struct i915_request *rq, struct i915_vma *vma)
-{
-	struct drm_i915_gem_object *obj = vma->obj;
-	int err;
-
-	i915_vma_lock(vma);
-
-	if (obj->cache_dirty & ~obj->cache_coherent)
-		i915_gem_clflush_object(obj, 0);
-	obj->write_domain = 0;
-
-	err = i915_request_await_object(rq, vma->obj, true);
-	if (err == 0)
-		err = i915_vma_move_to_active(vma, rq, EXEC_OBJECT_WRITE);
-
-	i915_vma_unlock(vma);
-
-	return err;
-}
-
 static int
 __reloc_gpu_alloc(struct i915_execbuffer *eb, struct intel_engine_cs *engine)
 {
@@ -1147,24 +1125,12 @@ __reloc_gpu_alloc(struct i915_execbuffer *eb, struct intel_engine_cs *engine)
 	return err;
 }
 
-static u32 *reloc_batch_grow(struct i915_execbuffer *eb,
-			     struct i915_vma *vma,
-			     unsigned int len)
+static u32 *reloc_batch_grow(struct i915_execbuffer *eb, unsigned int len)
 {
 	struct reloc_cache *cache = &eb->reloc_cache;
 	u32 *cmd;
 	int err;
 
-	if (vma != cache->target) {
-		err = reloc_move_to_gpu(cache->rq, vma);
-		if (unlikely(err)) {
-			i915_request_set_error_once(cache->rq, err);
-			return ERR_PTR(err);
-		}
-
-		cache->target = vma;
-	}
-
 	if (unlikely(cache->rq_size + len >
 		     PAGE_SIZE / sizeof(u32) - RELOC_TAIL)) {
 		err = reloc_gpu_chain(cache);
@@ -1210,7 +1176,7 @@ static int __reloc_entry_gpu(struct i915_execbuffer *eb,
 	else
 		len = 3;
 
-	batch = reloc_batch_grow(eb, vma, len);
+	batch = reloc_batch_grow(eb, len);
 	if (IS_ERR(batch))
 		return PTR_ERR(batch);
 
@@ -1471,6 +1437,78 @@ static long eb_reloc_vma(struct i915_execbuffer *eb, struct eb_vma *ev,
 	return required;
 }
 
+static int reloc_move_to_gpu(struct reloc_cache *cache, struct eb_vma *ev)
+{
+	struct i915_request *rq = cache->rq;
+	struct i915_vma *vma = ev->vma;
+	struct drm_i915_gem_object *obj = vma->obj;
+	int err;
+
+	if (obj->cache_dirty & ~obj->cache_coherent)
+		i915_gem_clflush_object(obj, 0);
+
+	obj->write_domain = I915_GEM_DOMAIN_RENDER;
+	obj->read_domains = I915_GEM_DOMAIN_RENDER;
+
+	err = i915_request_await_object(rq, obj, true);
+	if (err)
+		return err;
+
+	err = __i915_vma_move_to_active(vma, rq);
+	if (err)
+		return err;
+
+	dma_resv_add_excl_fence(vma->resv, &rq->fence);
+
+	return 0;
+}
+
+static int
+lock_relocs(struct i915_execbuffer *eb)
+{
+	struct ww_acquire_ctx acquire;
+	struct eb_vma *ev;
+	int err = 0;
+
+	ww_acquire_init(&acquire, &reservation_ww_class);
+
+	list_for_each_entry(ev, &eb->relocs, reloc_link) {
+		struct i915_vma *vma = ev->vma;
+
+		err = ww_mutex_lock_interruptible(&vma->resv->lock, &acquire);
+		if (err == -EDEADLK) {
+			struct eb_vma *unlock = ev, *en;
+
+			list_for_each_entry_safe_continue_reverse(unlock, en,
+								  &eb->relocs,
+								  reloc_link) {
+				ww_mutex_unlock(&unlock->vma->resv->lock);
+				list_move_tail(&unlock->reloc_link,
+					       &eb->relocs);
+			}
+
+			GEM_BUG_ON(!list_is_first(&ev->reloc_link,
+						  &eb->relocs));
+			err = ww_mutex_lock_slow_interruptible(&vma->resv->lock,
+							       &acquire);
+		}
+		if (err)
+			break;
+	}
+
+	ww_acquire_done(&acquire);
+
+	list_for_each_entry_continue_reverse(ev, &eb->relocs, reloc_link) {
+		if (err == 0)
+			err = reloc_move_to_gpu(&eb->reloc_cache, ev);
+		ww_mutex_unlock(&ev->vma->resv->lock);
+	}
+
+	ww_acquire_fini(&acquire);
+
+	return err;
+}
+
 static bool reloc_can_use_engine(const struct intel_engine_cs *engine)
 {
 	return engine->class != VIDEO_DECODE_CLASS || !IS_GEN(engine->i915, 6);
@@ -1499,6 +1537,10 @@ static int reloc_gpu(struct i915_execbuffer *eb)
 		return err;
 	GEM_BUG_ON(!eb->reloc_cache.rq);
 
+	err = lock_relocs(eb);
+	if (err)
+		goto out;
+
 	err = reloc_gpu_emit(&eb->reloc_cache);
 	if (err)
 		goto out;
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
index faed6480a792..4f10b51f9a7e 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
@@ -24,15 +24,15 @@ static int __igt_gpu_reloc(struct i915_execbuffer *eb,
 		GENMASK_ULL(eb->reloc_cache.use_64bit_reloc ? 63 : 31, 0);
 	const u32 *map = page_mask_bits(obj->mm.mapping);
 	struct i915_request *rq;
-	struct i915_vma *vma;
+	struct eb_vma ev;
 	int err;
 	int i;
 
-	vma = i915_vma_instance(obj, eb->context->vm, NULL);
-	if (IS_ERR(vma))
-		return PTR_ERR(vma);
+	ev.vma = i915_vma_instance(obj, eb->context->vm, NULL);
+	if (IS_ERR(ev.vma))
+		return PTR_ERR(ev.vma);
 
-	err = i915_vma_pin(vma, 0, 0, PIN_USER | PIN_HIGH);
+	err = i915_vma_pin(ev.vma, 0, 0, PIN_USER | PIN_HIGH);
 	if (err)
 		return err;
 
@@ -40,17 +40,22 @@ static int __igt_gpu_reloc(struct i915_execbuffer *eb,
 	if (err)
 		goto unpin_vma;
 
+	list_add(&ev.reloc_link, &eb->relocs);
+	err = lock_relocs(eb);
+	if (err)
+		goto unpin_vma;
+
 	err = reloc_gpu_emit(&eb->reloc_cache);
 	if (err)
 		goto unpin_vma;
 
 	/* 8-Byte aligned */
-	err = __reloc_entry_gpu(eb, vma, offsets[0] * sizeof(u32), 0);
+	err = __reloc_entry_gpu(eb, ev.vma, offsets[0] * sizeof(u32), 0);
 	if (err)
 		goto unpin_vma;
 
 	/* !8-Byte aligned */
-	err = __reloc_entry_gpu(eb, vma, offsets[1] * sizeof(u32), 1);
+	err = __reloc_entry_gpu(eb, ev.vma, offsets[1] * sizeof(u32), 1);
 	if (err)
 		goto unpin_vma;
 
@@ -62,7 +67,7 @@ static int __igt_gpu_reloc(struct i915_execbuffer *eb,
 	eb->reloc_cache.rq_size += i;
 
 	/* Force batch chaining */
-	err = __reloc_entry_gpu(eb, vma, offsets[2] * sizeof(u32), 2);
+	err = __reloc_entry_gpu(eb, ev.vma, offsets[2] * sizeof(u32), 2);
 	if (err)
 		goto unpin_vma;
 
@@ -97,7 +102,7 @@ static int __igt_gpu_reloc(struct i915_execbuffer *eb,
 put_rq:
 	i915_request_put(rq);
 unpin_vma:
-	i915_vma_unpin(vma);
+	i915_vma_unpin(ev.vma);
 	return err;
 }
 
@@ -121,6 +126,7 @@ static int igt_gpu_reloc(void *arg)
 	}
 
 	for_each_uabi_engine(eb.engine, eb.i915) {
+		INIT_LIST_HEAD(&eb.relocs);
 		reloc_cache_init(&eb.reloc_cache, eb.i915);
 		memset(map, POISON_INUSE, 4096);
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Intel-gfx] [PATCH 14/28] dma-buf: Proxy fence, an unsignaled fence placeholder
  2020-06-07 22:20 [Intel-gfx] [PATCH 01/28] drm/i915: Adjust the sentinel assert to match implementation Chris Wilson
                   ` (11 preceding siblings ...)
  2020-06-07 22:20 ` [Intel-gfx] [PATCH 13/28] drm/i915/gem: Add all GPU reloc awaits/signals en masse Chris Wilson
@ 2020-06-07 22:20 ` Chris Wilson
  2020-06-07 22:20 ` [Intel-gfx] [PATCH 15/28] drm/i915: Lift waiter/signaler iterators Chris Wilson
                   ` (19 subsequent siblings)
  32 siblings, 0 replies; 52+ messages in thread
From: Chris Wilson @ 2020-06-07 22:20 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Often we need to create a fence for a future event that has not yet been
associated with a real fence. We can store a proxy fence, a placeholder,
in the timeline and replace it later when the real fence is known. Any
listeners that attach to the proxy fence will automatically be signaled
when the real fence completes, and any future listeners will instead
attach directly to the real fence, avoiding any indirection overhead.
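
A minimal usage sketch, assuming the API added below (the callback and
error handling are omitted):

	struct dma_fence __rcu *slot;
	struct dma_fence *proxy;

	proxy = dma_fence_create_proxy();	/* placeholder, unsignaled */
	RCU_INIT_POINTER(slot, proxy);

	/* listeners may attach before the real fence exists */
	dma_fence_add_callback(proxy, &cb, callback);

	/* later, once the real fence is known */
	dma_fence_put(dma_fence_replace_proxy(&slot, real));
	dma_fence_signal(real);			/* also signals the proxy */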

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Lionel Landwerlin <lionel.g.landwerlin@intel.com>
---
 drivers/dma-buf/Makefile             |  13 +-
 drivers/dma-buf/dma-fence-private.h  |  20 +
 drivers/dma-buf/dma-fence-proxy.c    | 306 +++++++++++
 drivers/dma-buf/dma-fence.c          |   4 +-
 drivers/dma-buf/selftests.h          |   1 +
 drivers/dma-buf/st-dma-fence-proxy.c | 752 +++++++++++++++++++++++++++
 include/linux/dma-fence-proxy.h      |  38 ++
 7 files changed, 1130 insertions(+), 4 deletions(-)
 create mode 100644 drivers/dma-buf/dma-fence-private.h
 create mode 100644 drivers/dma-buf/dma-fence-proxy.c
 create mode 100644 drivers/dma-buf/st-dma-fence-proxy.c
 create mode 100644 include/linux/dma-fence-proxy.h

diff --git a/drivers/dma-buf/Makefile b/drivers/dma-buf/Makefile
index 995e05f609ff..afaf6dadd9a3 100644
--- a/drivers/dma-buf/Makefile
+++ b/drivers/dma-buf/Makefile
@@ -1,6 +1,12 @@
 # SPDX-License-Identifier: GPL-2.0-only
-obj-y := dma-buf.o dma-fence.o dma-fence-array.o dma-fence-chain.o \
-	 dma-resv.o seqno-fence.o
+obj-y := \
+	dma-buf.o \
+	dma-fence.o \
+	dma-fence-array.o \
+	dma-fence-chain.o \
+	dma-fence-proxy.o \
+	dma-resv.o \
+	seqno-fence.o
 obj-$(CONFIG_DMABUF_HEAPS)	+= dma-heap.o
 obj-$(CONFIG_DMABUF_HEAPS)	+= heaps/
 obj-$(CONFIG_SYNC_FILE)		+= sync_file.o
@@ -10,6 +16,7 @@ obj-$(CONFIG_UDMABUF)		+= udmabuf.o
 dmabuf_selftests-y := \
 	selftest.o \
 	st-dma-fence.o \
-	st-dma-fence-chain.o
+	st-dma-fence-chain.o \
+	st-dma-fence-proxy.o
 
 obj-$(CONFIG_DMABUF_SELFTESTS)	+= dmabuf_selftests.o
diff --git a/drivers/dma-buf/dma-fence-private.h b/drivers/dma-buf/dma-fence-private.h
new file mode 100644
index 000000000000..6924d28af0fa
--- /dev/null
+++ b/drivers/dma-buf/dma-fence-private.h
@@ -0,0 +1,20 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Fence mechanism for dma-buf and to allow for asynchronous dma access
+ *
+ * Copyright (C) 2012 Canonical Ltd
+ * Copyright (C) 2012 Texas Instruments
+ *
+ * Authors:
+ * Rob Clark <robdclark@gmail.com>
+ * Maarten Lankhorst <maarten.lankhorst@canonical.com>
+ */
+
+#ifndef DMA_FENCE_PRIVATE_H
+#define DMA_FENCE_PRIVATE_H
+
+struct dma_fence;
+
+bool __dma_fence_enable_signaling(struct dma_fence *fence);
+
+#endif /* DMA_FENCE_PRIVATE_H */
diff --git a/drivers/dma-buf/dma-fence-proxy.c b/drivers/dma-buf/dma-fence-proxy.c
new file mode 100644
index 000000000000..42674e92b0f9
--- /dev/null
+++ b/drivers/dma-buf/dma-fence-proxy.c
@@ -0,0 +1,306 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * dma-fence-proxy: placeholder unsignaled fence
+ *
+ * Copyright (C) 2017-2019 Intel Corporation
+ */
+
+#include <linux/dma-fence.h>
+#include <linux/dma-fence-proxy.h>
+#include <linux/export.h>
+#include <linux/irq_work.h>
+#include <linux/slab.h>
+
+#include "dma-fence-private.h"
+
+struct dma_fence_proxy {
+	struct dma_fence base;
+
+	struct dma_fence *real;
+	struct dma_fence_cb cb;
+	struct irq_work work;
+
+	wait_queue_head_t wq;
+};
+
+#ifdef CONFIG_DEBUG_LOCK_ALLOC
+#define same_lockclass(A, B) ((A)->dep_map.key == (B)->dep_map.key)
+#else
+#define same_lockclass(A, B) 0
+#endif
+
+static const char *proxy_get_driver_name(struct dma_fence *fence)
+{
+	struct dma_fence_proxy *p = container_of(fence, typeof(*p), base);
+	struct dma_fence *real = READ_ONCE(p->real);
+
+	return real ? real->ops->get_driver_name(real) : "proxy";
+}
+
+static const char *proxy_get_timeline_name(struct dma_fence *fence)
+{
+	struct dma_fence_proxy *p = container_of(fence, typeof(*p), base);
+	struct dma_fence *real = READ_ONCE(p->real);
+
+	return real ? real->ops->get_timeline_name(real) : "unset";
+}
+
+static void proxy_irq_work(struct irq_work *work)
+{
+	struct dma_fence_proxy *p = container_of(work, typeof(*p), work);
+
+	dma_fence_signal(&p->base);
+	dma_fence_put(&p->base);
+}
+
+static void proxy_callback(struct dma_fence *real, struct dma_fence_cb *cb)
+{
+	struct dma_fence_proxy *p = container_of(cb, typeof(*p), cb);
+
+	/* Signaled before enabling signalling callbacks? */
+	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &p->base.flags)) {
+		dma_fence_put(&p->base);
+		return;
+	}
+
+	if (real->error)
+		dma_fence_set_error(&p->base, real->error);
+
+	/* Lower the height of the proxy chain -> single stack frame */
+	irq_work_queue(&p->work);
+}
+
+static bool proxy_enable_signaling(struct dma_fence *fence)
+{
+	struct dma_fence_proxy *p = container_of(fence, typeof(*p), base);
+	struct dma_fence *real = READ_ONCE(p->real);
+	bool ret = true;
+
+	if (real) {
+		spin_lock_nested(real->lock,
+				 same_lockclass(&p->wq.lock, real->lock));
+		ret = __dma_fence_enable_signaling(real);
+		if (!ret && real->error)
+			dma_fence_set_error(&p->base, real->error);
+		spin_unlock(real->lock);
+	}
+
+	return ret;
+}
+
+static void proxy_release(struct dma_fence *fence)
+{
+	struct dma_fence_proxy *p = container_of(fence, typeof(*p), base);
+
+	dma_fence_put(p->real);
+	dma_fence_free(&p->base);
+}
+
+const struct dma_fence_ops dma_fence_proxy_ops = {
+	.get_driver_name = proxy_get_driver_name,
+	.get_timeline_name = proxy_get_timeline_name,
+	.enable_signaling = proxy_enable_signaling,
+	.wait = dma_fence_default_wait,
+	.release = proxy_release,
+};
+EXPORT_SYMBOL_GPL(dma_fence_proxy_ops);
+
+/**
+ * __dma_fence_create_proxy - Create an unset dma-fence
+ * @context: context number to use for proxy fence
+ * @seqno: sequence number to use for proxy fence
+ *
+ * __dma_fence_create_proxy() creates a new dma_fence stub that is initially
+ * unsignaled and may later be replaced with a real fence. Any listeners
+ * to the proxy fence will be signaled when the target fence signals its
+ * completion.
+ */
+struct dma_fence *__dma_fence_create_proxy(u64 context, u64 seqno)
+{
+	struct dma_fence_proxy *p;
+
+	p = kzalloc(sizeof(*p), GFP_KERNEL);
+	if (!p)
+		return NULL;
+
+	init_waitqueue_head(&p->wq);
+	dma_fence_init(&p->base, &dma_fence_proxy_ops, &p->wq.lock,
+		       context, seqno);
+	init_irq_work(&p->work, proxy_irq_work);
+
+	return &p->base;
+}
+EXPORT_SYMBOL(__dma_fence_create_proxy);
+
+/**
+ * dma_fence_create_proxy - Create an unset dma-fence
+ *
+ * Wraps __dma_fence_create_proxy() to create a new proxy fence with the
+ * next available (unique) context id.
+ */
+struct dma_fence *dma_fence_create_proxy(void)
+{
+	return __dma_fence_create_proxy(dma_fence_context_alloc(1), 0);
+}
+EXPORT_SYMBOL(dma_fence_create_proxy);
+
+static void __wake_up_listeners(struct dma_fence_proxy *p)
+{
+	struct wait_queue_entry *wait, *next;
+
+	list_for_each_entry_safe(wait, next, &p->wq.head, entry) {
+		INIT_LIST_HEAD(&wait->entry);
+		wait->func(wait, TASK_NORMAL, 0, p->real);
+	}
+}
+
+static void set_proxy_callback(struct dma_fence *real, struct dma_fence_cb *cb)
+{
+	cb->func = proxy_callback;
+	list_add_tail(&cb->node, &real->cb_list);
+}
+
+static void proxy_assign(struct dma_fence *fence, struct dma_fence *real)
+{
+	struct dma_fence_proxy *p = container_of(fence, typeof(*p), base);
+	unsigned long flags;
+
+	if (WARN_ON(fence == real))
+		return;
+
+	if (WARN_ON(test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags)))
+		return;
+
+	if (WARN_ON(p->real))
+		return;
+
+	spin_lock_irqsave(&p->wq.lock, flags);
+
+	if (unlikely(!real)) {
+		dma_fence_signal_locked(&p->base);
+		goto unlock;
+	}
+
+	p->real = dma_fence_get(real);
+
+	dma_fence_get(&p->base);
+	spin_lock_nested(real->lock, same_lockclass(&p->wq.lock, real->lock));
+	if (dma_fence_is_signaled_locked(real))
+		proxy_callback(real, &p->cb);
+	else if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &p->base.flags) &&
+		 !__dma_fence_enable_signaling(real))
+		proxy_callback(real, &p->cb);
+	else
+		set_proxy_callback(real, &p->cb);
+	spin_unlock(real->lock);
+
+unlock:
+	__wake_up_listeners(p);
+	spin_unlock_irqrestore(&p->wq.lock, flags);
+}
+
+/**
+ * dma_fence_replace_proxy - Replace the proxy fence with the real target
+ * @slot: pointer to location of fence to update
+ * @fence: the new fence to store in @slot
+ *
+ * Once the real dma_fence is known, we can replace the proxy fence holder
+ * with a pointer to the real dma fence. Future listeners will attach to
+ * the real fence, avoiding any indirection overhead. Previous listeners
+ * will remain attached to the proxy fence, and be signaled in turn when
+ * the target fence completes.
+ */
+struct dma_fence *
+dma_fence_replace_proxy(struct dma_fence __rcu **slot, struct dma_fence *fence)
+{
+	struct dma_fence *old;
+
+	if (fence)
+		dma_fence_get(fence);
+
+	old = rcu_replace_pointer(*slot, fence, true);
+	if (old && dma_fence_is_proxy(old))
+		proxy_assign(old, fence);
+
+	return old;
+}
+EXPORT_SYMBOL(dma_fence_replace_proxy);
+
+/**
+ * dma_fence_proxy_set_real - Set the target of a proxy fence
+ * @fence: the proxy fence
+ * @real: the target fence
+ *
+ * Assign @real as the target of the proxy @fence.
+ */
+void dma_fence_proxy_set_real(struct dma_fence *fence, struct dma_fence *real)
+{
+	if (dma_fence_is_proxy(fence))
+		proxy_assign(fence, real);
+}
+EXPORT_SYMBOL(dma_fence_proxy_set_real);
+
+/**
+ * dma_fence_proxy_get_real - Query the target of a proxy fence
+ * @fence: the proxy fence
+ *
+ * Unpeel the proxy fence to see if it has been replaced with a real fence.
+ */
+struct dma_fence *dma_fence_proxy_get_real(struct dma_fence *fence)
+{
+	if (dma_fence_is_proxy(fence)) {
+		struct dma_fence_proxy *p =
+			container_of(fence, typeof(*p), base);
+
+		if (p->real)
+			fence = p->real;
+	}
+
+	return fence;
+}
+EXPORT_SYMBOL(dma_fence_proxy_get_real);
+
+void dma_fence_add_proxy_listener(struct dma_fence *fence,
+				  struct wait_queue_entry *wait)
+{
+	if (dma_fence_is_proxy(fence)) {
+		struct dma_fence_proxy *p =
+			container_of(fence, typeof(*p), base);
+		unsigned long flags;
+
+		spin_lock_irqsave(&p->wq.lock, flags);
+		if (!p->real) {
+			list_add_tail(&wait->entry, &p->wq.head);
+			wait = NULL;
+		}
+		fence = p->real;
+		spin_unlock_irqrestore(&p->wq.lock, flags);
+	}
+
+	if (wait) {
+		INIT_LIST_HEAD(&wait->entry);
+		wait->func(wait, TASK_NORMAL, 0, fence);
+	}
+}
+EXPORT_SYMBOL(dma_fence_add_proxy_listener);
+
+bool dma_fence_remove_proxy_listener(struct dma_fence *fence,
+				     struct wait_queue_entry *wait)
+{
+	bool ret = false;
+
+	if (dma_fence_is_proxy(fence)) {
+		struct dma_fence_proxy *p =
+			container_of(fence, typeof(*p), base);
+		unsigned long flags;
+
+		spin_lock_irqsave(&p->wq.lock, flags);
+		if (!list_empty(&wait->entry)) {
+			list_del_init(&wait->entry);
+			ret = true;
+		}
+		spin_unlock_irqrestore(&p->wq.lock, flags);
+	}
+
+	return ret;
+}
+EXPORT_SYMBOL(dma_fence_remove_proxy_listener);
diff --git a/drivers/dma-buf/dma-fence.c b/drivers/dma-buf/dma-fence.c
index 656e9ac2d028..329bd033059f 100644
--- a/drivers/dma-buf/dma-fence.c
+++ b/drivers/dma-buf/dma-fence.c
@@ -19,6 +19,8 @@
 #define CREATE_TRACE_POINTS
 #include <trace/events/dma_fence.h>
 
+#include "dma-fence-private.h"
+
 EXPORT_TRACEPOINT_SYMBOL(dma_fence_emit);
 EXPORT_TRACEPOINT_SYMBOL(dma_fence_enable_signal);
 EXPORT_TRACEPOINT_SYMBOL(dma_fence_signaled);
@@ -275,7 +277,7 @@ void dma_fence_free(struct dma_fence *fence)
 }
 EXPORT_SYMBOL(dma_fence_free);
 
-static bool __dma_fence_enable_signaling(struct dma_fence *fence)
+bool __dma_fence_enable_signaling(struct dma_fence *fence)
 {
 	bool was_set;
 
diff --git a/drivers/dma-buf/selftests.h b/drivers/dma-buf/selftests.h
index bc8cea67bf1e..92d15bf50f64 100644
--- a/drivers/dma-buf/selftests.h
+++ b/drivers/dma-buf/selftests.h
@@ -12,3 +12,4 @@
 selftest(sanitycheck, __sanitycheck__) /* keep first (igt selfcheck) */
 selftest(dma_fence, dma_fence)
 selftest(dma_fence_chain, dma_fence_chain)
+selftest(dma_fence_proxy, dma_fence_proxy)
diff --git a/drivers/dma-buf/st-dma-fence-proxy.c b/drivers/dma-buf/st-dma-fence-proxy.c
new file mode 100644
index 000000000000..c3f210bc4e60
--- /dev/null
+++ b/drivers/dma-buf/st-dma-fence-proxy.c
@@ -0,0 +1,752 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2019 Intel Corporation
+ */
+
+#include <linux/delay.h>
+#include <linux/dma-fence.h>
+#include <linux/dma-fence-proxy.h>
+#include <linux/kernel.h>
+#include <linux/sched/signal.h>
+#include <linux/slab.h>
+#include <linux/spinlock.h>
+
+#include "selftest.h"
+
+static struct kmem_cache *slab_fences;
+
+static struct mock_fence {
+	struct dma_fence base;
+	spinlock_t lock;
+} *to_mock_fence(struct dma_fence *f) {
+	return container_of(f, struct mock_fence, base);
+}
+
+static const char *mock_name(struct dma_fence *f)
+{
+	return "mock";
+}
+
+static void mock_fence_release(struct dma_fence *f)
+{
+	kmem_cache_free(slab_fences, to_mock_fence(f));
+}
+
+static const struct dma_fence_ops mock_ops = {
+	.get_driver_name = mock_name,
+	.get_timeline_name = mock_name,
+	.release = mock_fence_release,
+};
+
+static struct dma_fence *mock_fence(void)
+{
+	struct mock_fence *f;
+
+	f = kmem_cache_alloc(slab_fences, GFP_KERNEL);
+	if (!f)
+		return NULL;
+
+	spin_lock_init(&f->lock);
+	dma_fence_init(&f->base, &mock_ops, &f->lock, 0, 0);
+
+	return &f->base;
+}
+
+static int sanitycheck(void *arg)
+{
+	struct dma_fence *f;
+
+	f = dma_fence_create_proxy();
+	if (!f)
+		return -ENOMEM;
+
+	dma_fence_signal(f);
+	dma_fence_put(f);
+
+	return 0;
+}
+
+struct fences {
+	struct dma_fence *real;
+	struct dma_fence *proxy;
+	struct dma_fence __rcu *slot;
+};
+
+static int create_fences(struct fences *f, bool attach)
+{
+	f->proxy = dma_fence_create_proxy();
+	if (!f->proxy)
+		return -ENOMEM;
+
+	RCU_INIT_POINTER(f->slot, f->proxy);
+
+	f->real = mock_fence();
+	if (!f->real) {
+		dma_fence_put(f->proxy);
+		return -ENOMEM;
+	}
+
+	if (attach)
+		dma_fence_replace_proxy(&f->slot, f->real);
+
+	return 0;
+}
+
+static void free_fences(struct fences *f)
+{
+	dma_fence_put(dma_fence_replace_proxy(&f->slot, NULL));
+
+	dma_fence_signal(f->real);
+	dma_fence_put(f->real);
+
+	dma_fence_signal(f->proxy);
+	dma_fence_put(f->proxy);
+}
+
+static int wrap_target(void *arg)
+{
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, false))
+		return -ENOMEM;
+
+	if (dma_fence_proxy_get_real(f.proxy) != f.proxy) {
+		pr_err("Unwrapped proxy fence reported a target fence!\n");
+		goto err_free;
+	}
+
+	dma_fence_proxy_set_real(f.proxy, f.real);
+	rcu_assign_pointer(f.slot, dma_fence_get(f.real)); /* free_fences() */
+
+	if (dma_fence_proxy_get_real(f.proxy) != f.real) {
+		pr_err("Wrapped proxy fence did not report the target fence!\n");
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	free_fences(&f);
+	return err;
+}
+
+static int wrap_proxy(void *arg)
+{
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, true))
+		return -ENOMEM;
+
+	if (dma_fence_proxy_get_real(f.proxy) != f.real) {
+		pr_err("Wrapped proxy fence did not report the target fence!\n");
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	free_fences(&f);
+	return err;
+}
+
+static int wrap_signaling(void *arg)
+{
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, true))
+		return -ENOMEM;
+
+	if (dma_fence_is_signaled(f.proxy)) {
+		pr_err("Fence unexpectedly signaled on creation\n");
+		goto err_free;
+	}
+
+	if (dma_fence_signal(f.real)) {
+		pr_err("Fence reported being already signaled\n");
+		goto err_free;
+	}
+
+	if (!dma_fence_is_signaled(f.proxy)) {
+		pr_err("Fence not reporting signaled\n");
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	free_fences(&f);
+	return err;
+}
+
+static int wrap_signaling_recurse(void *arg)
+{
+	struct fences f;
+	struct dma_fence *chain;
+	int err = -EINVAL;
+
+	if (create_fences(&f, false))
+		return -ENOMEM;
+
+	chain = dma_fence_create_proxy();
+	if (!chain) {
+		err = -ENOMEM;
+		goto err_free;
+	}
+
+	dma_fence_replace_proxy(&f.slot, chain);
+	dma_fence_put(dma_fence_replace_proxy(&f.slot, f.real));
+	dma_fence_put(chain);
+
+	/* f.real <- chain <- f.proxy */
+
+	if (dma_fence_is_signaled(f.proxy)) {
+		pr_err("Fence unexpectedly signaled on creation\n");
+		goto err_free;
+	}
+
+	if (dma_fence_signal(f.real)) {
+		pr_err("Fence reported being already signaled\n");
+		goto err_free;
+	}
+
+	if (!dma_fence_is_signaled(f.proxy)) {
+		pr_err("Fence not reporting signaled\n");
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	free_fences(&f);
+	return err;
+}
+
+struct simple_cb {
+	struct dma_fence_cb cb;
+	bool seen;
+};
+
+static void simple_callback(struct dma_fence *f, struct dma_fence_cb *cb)
+{
+	/* Ensure the callback marker is visible, no excuses for missing it! */
+	smp_store_mb(container_of(cb, struct simple_cb, cb)->seen, true);
+}
+
+static int wrap_add_callback(void *arg)
+{
+	struct simple_cb cb = {};
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, true))
+		return -ENOMEM;
+
+	if (dma_fence_add_callback(f.proxy, &cb.cb, simple_callback)) {
+		pr_err("Failed to add callback, fence already signaled!\n");
+		goto err_free;
+	}
+
+	dma_fence_signal(f.real);
+	if (!cb.seen) {
+		pr_err("Callback failed!\n");
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	free_fences(&f);
+	return err;
+}
+
+static int wrap_add_callback_recurse(void *arg)
+{
+	struct simple_cb cb = {};
+	struct dma_fence *chain;
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, false))
+		return -ENOMEM;
+
+	chain = dma_fence_create_proxy();
+	if (!chain) {
+		err = -ENOMEM;
+		goto err_free;
+	}
+
+	dma_fence_replace_proxy(&f.slot, chain);
+	dma_fence_put(dma_fence_replace_proxy(&f.slot, f.real));
+	dma_fence_put(chain);
+
+	/* f.real <- chain <- f.proxy */
+
+	if (dma_fence_add_callback(f.proxy, &cb.cb, simple_callback)) {
+		pr_err("Failed to add callback, fence already signaled!\n");
+		goto err_free;
+	}
+
+	dma_fence_signal(f.real);
+	if (!cb.seen) {
+		pr_err("Callback failed!\n");
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	free_fences(&f);
+	return err;
+}
+
+static int wrap_late_add_callback(void *arg)
+{
+	struct simple_cb cb = {};
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, true))
+		return -ENOMEM;
+
+	dma_fence_signal(f.real);
+
+	if (!dma_fence_add_callback(f.proxy, &cb.cb, simple_callback)) {
+		pr_err("Added callback, but fence was already signaled!\n");
+		goto err_free;
+	}
+
+	dma_fence_signal(f.real);
+	if (cb.seen) {
+		pr_err("Callback called after failed attachment!\n");
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	free_fences(&f);
+	return err;
+}
+
+static int wrap_early_add_callback(void *arg)
+{
+	struct simple_cb cb = {};
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, false))
+		return -ENOMEM;
+
+	if (dma_fence_add_callback(f.proxy, &cb.cb, simple_callback)) {
+		pr_err("Failed to add callback, fence already signaled!\n");
+		goto err_free;
+	}
+
+	dma_fence_replace_proxy(&f.slot, f.real);
+	dma_fence_signal(f.real);
+	if (!cb.seen) {
+		pr_err("Callback failed!\n");
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	free_fences(&f);
+	return err;
+}
+
+static int wrap_early_add_callback_late(void *arg)
+{
+	struct simple_cb cb = {};
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, false))
+		return -ENOMEM;
+
+	dma_fence_signal(f.real);
+
+	if (dma_fence_add_callback(f.proxy, &cb.cb, simple_callback)) {
+		pr_err("Failed to add callback, fence already signaled!\n");
+		goto err_free;
+	}
+
+	dma_fence_replace_proxy(&f.slot, f.real);
+	dma_fence_signal(f.real);
+	if (!cb.seen) {
+		pr_err("Callback failed!\n");
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	free_fences(&f);
+	return err;
+}
+
+static int wrap_early_add_callback_early(void *arg)
+{
+	struct simple_cb cb = {};
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, false))
+		return -ENOMEM;
+
+	if (dma_fence_add_callback(f.proxy, &cb.cb, simple_callback)) {
+		pr_err("Failed to add callback, fence already signaled!\n");
+		goto err_free;
+	}
+
+	dma_fence_replace_proxy(&f.slot, f.real);
+	dma_fence_signal(f.real);
+	if (!cb.seen) {
+		pr_err("Callback failed!\n");
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	free_fences(&f);
+	return err;
+}
+
+static int wrap_rm_callback(void *arg)
+{
+	struct simple_cb cb = {};
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, true))
+		return -ENOMEM;
+
+	if (dma_fence_add_callback(f.proxy, &cb.cb, simple_callback)) {
+		pr_err("Failed to add callback, fence already signaled!\n");
+		goto err_free;
+	}
+
+	if (!dma_fence_remove_callback(f.proxy, &cb.cb)) {
+		pr_err("Failed to remove callback!\n");
+		goto err_free;
+	}
+
+	dma_fence_signal(f.real);
+	if (cb.seen) {
+		pr_err("Callback still signaled after removal!\n");
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	free_fences(&f);
+	return err;
+}
+
+static int wrap_late_rm_callback(void *arg)
+{
+	struct simple_cb cb = {};
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, true))
+		return -ENOMEM;
+
+	if (dma_fence_add_callback(f.proxy, &cb.cb, simple_callback)) {
+		pr_err("Failed to add callback, fence already signaled!\n");
+		goto err_free;
+	}
+
+	dma_fence_signal(f.real);
+	if (!cb.seen) {
+		pr_err("Callback failed!\n");
+		goto err_free;
+	}
+
+	if (dma_fence_remove_callback(f.proxy, &cb.cb)) {
+		pr_err("Callback removal succeeded after being executed!\n");
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	free_fences(&f);
+	return err;
+}
+
+static int wrap_status(void *arg)
+{
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, true))
+		return -ENOMEM;
+
+	if (dma_fence_get_status(f.proxy)) {
+		pr_err("Fence unexpectedly has signaled status on creation\n");
+		goto err_free;
+	}
+
+	dma_fence_signal(f.real);
+	if (!dma_fence_get_status(f.proxy)) {
+		pr_err("Fence not reporting signaled status\n");
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	free_fences(&f);
+	return err;
+}
+
+static int wrap_error(void *arg)
+{
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, true))
+		return -ENOMEM;
+
+	dma_fence_set_error(f.real, -EIO);
+
+	if (dma_fence_get_status(f.proxy)) {
+		pr_err("Fence unexpectedly has error status before signal\n");
+		goto err_free;
+	}
+
+	dma_fence_signal(f.real);
+	if (dma_fence_get_status(f.proxy) != -EIO) {
+		pr_err("Fence not reporting error status, got %d\n",
+		       dma_fence_get_status(f.proxy));
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	free_fences(&f);
+	return err;
+}
+
+static int wrap_wait(void *arg)
+{
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, true))
+		return -ENOMEM;
+
+	if (dma_fence_wait_timeout(f.proxy, false, 0) != 0) {
+		pr_err("Wait reported complete before being signaled\n");
+		goto err_free;
+	}
+
+	dma_fence_signal(f.real);
+
+	if (dma_fence_wait_timeout(f.proxy, false, 0) == 0) {
+		pr_err("Wait reported incomplete after being signaled\n");
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	dma_fence_signal(f.real);
+	free_fences(&f);
+	return err;
+}
+
+struct wait_timer {
+	struct timer_list timer;
+	struct fences f;
+};
+
+static void wait_timer(struct timer_list *timer)
+{
+	struct wait_timer *wt = from_timer(wt, timer, timer);
+
+	dma_fence_signal(wt->f.real);
+}
+
+static int wrap_wait_timeout(void *arg)
+{
+	struct wait_timer wt;
+	int err = -EINVAL;
+
+	if (create_fences(&wt.f, true))
+		return -ENOMEM;
+
+	timer_setup_on_stack(&wt.timer, wait_timer, 0);
+
+	if (dma_fence_wait_timeout(wt.f.proxy, false, 1) != 0) {
+		pr_err("Wait reported complete before being signaled\n");
+		goto err_free;
+	}
+
+	mod_timer(&wt.timer, jiffies + 1);
+
+	if (dma_fence_wait_timeout(wt.f.proxy, false, 2) != 0) {
+		if (timer_pending(&wt.timer)) {
+			pr_notice("Timer did not fire within the jiffie!\n");
+			err = 0; /* not our fault! */
+		} else {
+			pr_err("Wait reported incomplete after timeout\n");
+		}
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	del_timer_sync(&wt.timer);
+	destroy_timer_on_stack(&wt.timer);
+	dma_fence_signal(wt.f.real);
+	free_fences(&wt.f);
+	return err;
+}
+
+struct proxy_wait {
+	struct wait_queue_entry base;
+	struct dma_fence *fence;
+	bool seen;
+};
+
+static int proxy_wait_cb(struct wait_queue_entry *entry,
+			 unsigned int mode, int flags, void *key)
+{
+	struct proxy_wait *p = container_of(entry, typeof(*p), base);
+
+	p->fence = key;
+	p->seen = true;
+
+	return 0;
+}
+
+static int wrap_listen_early(void *arg)
+{
+	struct proxy_wait wait = { .base.func = proxy_wait_cb };
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, false))
+		return -ENOMEM;
+
+	dma_fence_replace_proxy(&f.slot, f.real);
+	dma_fence_add_proxy_listener(f.proxy, &wait.base);
+
+	if (!wait.seen) {
+		pr_err("Proxy listener was not called after replace!\n");
+		err = -EINVAL;
+		goto err_free;
+	}
+
+	if (wait.fence != f.real) {
+		pr_err("Proxy listener was not passed the real fence!\n");
+		err = -EINVAL;
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	dma_fence_signal(f.real);
+	free_fences(&f);
+	return err;
+}
+
+static int wrap_listen_late(void *arg)
+{
+	struct proxy_wait wait = { .base.func = proxy_wait_cb };
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, false))
+		return -ENOMEM;
+
+	dma_fence_add_proxy_listener(f.proxy, &wait.base);
+	dma_fence_replace_proxy(&f.slot, f.real);
+
+	if (!wait.seen) {
+		pr_err("Proxy listener was not called on replace!\n");
+		err = -EINVAL;
+		goto err_free;
+	}
+
+	if (wait.fence != f.real) {
+		pr_err("Proxy listener was not passed the real fence!\n");
+		err = -EINVAL;
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	dma_fence_signal(f.real);
+	free_fences(&f);
+	return err;
+}
+
+static int wrap_listen_cancel(void *arg)
+{
+	struct proxy_wait wait = { .base.func = proxy_wait_cb };
+	struct fences f;
+	int err = -EINVAL;
+
+	if (create_fences(&f, false))
+		return -ENOMEM;
+
+	dma_fence_add_proxy_listener(f.proxy, &wait.base);
+	if (!dma_fence_remove_proxy_listener(f.proxy, &wait.base)) {
+		pr_err("Cancelling listener, already detached?\n");
+		err = -EINVAL;
+		goto err_free;
+	}
+	dma_fence_replace_proxy(&f.slot, f.real);
+
+	if (wait.seen) {
+		pr_err("Proxy listener was called after being removed!\n");
+		err = -EINVAL;
+		goto err_free;
+	}
+
+	if (dma_fence_remove_proxy_listener(f.proxy, &wait.base)) {
+		pr_err("Double listener cancellation!\n");
+		err = -EINVAL;
+		goto err_free;
+	}
+
+	err = 0;
+err_free:
+	dma_fence_signal(f.real);
+	free_fences(&f);
+	return err;
+}
+
+int dma_fence_proxy(void)
+{
+	static const struct subtest tests[] = {
+		SUBTEST(sanitycheck),
+		SUBTEST(wrap_target),
+		SUBTEST(wrap_proxy),
+		SUBTEST(wrap_signaling),
+		SUBTEST(wrap_signaling_recurse),
+		SUBTEST(wrap_add_callback),
+		SUBTEST(wrap_add_callback_recurse),
+		SUBTEST(wrap_late_add_callback),
+		SUBTEST(wrap_early_add_callback),
+		SUBTEST(wrap_early_add_callback_late),
+		SUBTEST(wrap_early_add_callback_early),
+		SUBTEST(wrap_rm_callback),
+		SUBTEST(wrap_late_rm_callback),
+		SUBTEST(wrap_status),
+		SUBTEST(wrap_error),
+		SUBTEST(wrap_wait),
+		SUBTEST(wrap_wait_timeout),
+		SUBTEST(wrap_listen_early),
+		SUBTEST(wrap_listen_late),
+		SUBTEST(wrap_listen_cancel),
+	};
+	int ret;
+
+	slab_fences = KMEM_CACHE(mock_fence,
+				 SLAB_TYPESAFE_BY_RCU |
+				 SLAB_HWCACHE_ALIGN);
+	if (!slab_fences)
+		return -ENOMEM;
+
+	ret = subtests(tests, NULL);
+
+	kmem_cache_destroy(slab_fences);
+
+	return ret;
+}
diff --git a/include/linux/dma-fence-proxy.h b/include/linux/dma-fence-proxy.h
new file mode 100644
index 000000000000..6a986b5bb009
--- /dev/null
+++ b/include/linux/dma-fence-proxy.h
@@ -0,0 +1,38 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * dma-fence-proxy: allows waiting upon unset and future fences
+ *
+ * Copyright (C) 2017 Intel Corporation
+ */
+
+#ifndef __LINUX_DMA_FENCE_PROXY_H
+#define __LINUX_DMA_FENCE_PROXY_H
+
+#include <linux/kernel.h>
+#include <linux/dma-fence.h>
+
+struct wait_queue_entry;
+
+extern const struct dma_fence_ops dma_fence_proxy_ops;
+
+struct dma_fence *__dma_fence_create_proxy(u64 context, u64 seqno);
+struct dma_fence *dma_fence_create_proxy(void);
+
+static inline bool dma_fence_is_proxy(struct dma_fence *fence)
+{
+	return fence->ops == &dma_fence_proxy_ops;
+}
+
+void dma_fence_proxy_set_real(struct dma_fence *fence, struct dma_fence *real);
+struct dma_fence *dma_fence_proxy_get_real(struct dma_fence *fence);
+
+struct dma_fence *
+dma_fence_replace_proxy(struct dma_fence __rcu **slot,
+			struct dma_fence *fence);
+
+void dma_fence_add_proxy_listener(struct dma_fence *fence,
+				  struct wait_queue_entry *wait);
+bool dma_fence_remove_proxy_listener(struct dma_fence *fence,
+				     struct wait_queue_entry *wait);
+
+#endif /* __LINUX_DMA_FENCE_PROXY_H */
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Intel-gfx] [PATCH 15/28] drm/i915: Lift waiter/signaler iterators
  2020-06-07 22:20 [Intel-gfx] [PATCH 01/28] drm/i915: Adjust the sentinel assert to match implementation Chris Wilson
                   ` (12 preceding siblings ...)
  2020-06-07 22:20 ` [Intel-gfx] [PATCH 14/28] dma-buf: Proxy fence, an unsignaled fence placeholder Chris Wilson
@ 2020-06-07 22:20 ` Chris Wilson
  2020-06-07 22:20 ` [Intel-gfx] [PATCH 16/28] drm/i915: Unpeel awaits on a proxy fence Chris Wilson
                   ` (18 subsequent siblings)
  32 siblings, 0 replies; 52+ messages in thread
From: Chris Wilson @ 2020-06-07 22:20 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Lift the list iteration defines for traversing the signaler/waiter lists
into i915_scheduler_types.h for reuse.
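
For example, a sketch of the intended reuse (await_proxy_work() in a later
patch of this series walks the waiters like this):

	struct i915_dependency *p;

	for_each_waiter(p, rq) {
		/* visit each i915_dependency waiting upon rq */
	}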

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c         | 10 ----------
 drivers/gpu/drm/i915/i915_scheduler_types.h | 10 ++++++++++
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index cbcbe694f931..5f5ac05ccbe4 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1872,16 +1872,6 @@ static void virtual_xfer_breadcrumbs(struct virtual_engine *ve)
 	intel_engine_transfer_stale_breadcrumbs(ve->siblings[0], &ve->context);
 }
 
-#define for_each_waiter(p__, rq__) \
-	list_for_each_entry_lockless(p__, \
-				     &(rq__)->sched.waiters_list, \
-				     wait_link)
-
-#define for_each_signaler(p__, rq__) \
-	list_for_each_entry_rcu(p__, \
-				&(rq__)->sched.signalers_list, \
-				signal_link)
-
 static void defer_request(struct i915_request *rq, struct list_head * const pl)
 {
 	LIST_HEAD(list);
diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h
index f72e6c397b08..343ed44d5ed4 100644
--- a/drivers/gpu/drm/i915/i915_scheduler_types.h
+++ b/drivers/gpu/drm/i915/i915_scheduler_types.h
@@ -81,4 +81,14 @@ struct i915_dependency {
 #define I915_DEPENDENCY_WEAK		BIT(2)
 };
 
+#define for_each_waiter(p__, rq__) \
+	list_for_each_entry_lockless(p__, \
+				     &(rq__)->sched.waiters_list, \
+				     wait_link)
+
+#define for_each_signaler(p__, rq__) \
+	list_for_each_entry_rcu(p__, \
+				&(rq__)->sched.signalers_list, \
+				signal_link)
+
 #endif /* _I915_SCHEDULER_TYPES_H_ */
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Intel-gfx] [PATCH 16/28] drm/i915: Unpeel awaits on a proxy fence
  2020-06-07 22:20 [Intel-gfx] [PATCH 01/28] drm/i915: Adjust the sentinel assert to match implementation Chris Wilson
                   ` (13 preceding siblings ...)
  2020-06-07 22:20 ` [Intel-gfx] [PATCH 15/28] drm/i915: Lift waiter/signaler iterators Chris Wilson
@ 2020-06-07 22:20 ` Chris Wilson
  2020-06-07 22:20 ` [Intel-gfx] [PATCH 17/28] drm/i915/gem: Make relocations atomic within execbuf Chris Wilson
                   ` (17 subsequent siblings)
  32 siblings, 0 replies; 52+ messages in thread
From: Chris Wilson @ 2020-06-07 22:20 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

If the real target for a proxy fence is known at the time we are
attaching our awaits, use the real target in preference to hooking up to
the proxy. If we use the real target instead, we can optimize the awaits,
e.g. if it is on the same engine, we can order the submission and avoid
the wait-for-completion.
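
In outline, a hypothetical sketch (the actual change lands in
i915_request_await_dma_fence() below):

	fence = dma_fence_proxy_get_real(fence); /* unpeel if already known */
	if (dma_fence_is_proxy(fence))
		err = i915_request_await_proxy(rq, fence); /* still unset */
	else
		err = i915_request_await_dma_fence(rq, fence);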

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_request.c   | 155 ++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_scheduler.c |  41 +++++++
 drivers/gpu/drm/i915/i915_scheduler.h |   3 +
 3 files changed, 199 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 3bb7320249ae..f04f91b4d879 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -24,6 +24,7 @@
 
 #include <linux/dma-fence-array.h>
 #include <linux/dma-fence-chain.h>
+#include <linux/dma-fence-proxy.h>
 #include <linux/irq_work.h>
 #include <linux/prefetch.h>
 #include <linux/sched.h>
@@ -461,6 +462,7 @@ static bool fatal_error(int error)
 	case 0: /* not an error! */
 	case -EAGAIN: /* innocent victim of a GT reset (__i915_request_reset) */
 	case -ETIMEDOUT: /* waiting for Godot (timer_i915_sw_fence_wake) */
+	case -EDEADLK: /* cyclic fence lockup (await_proxy)  */
 		return false;
 	default:
 		return true;
@@ -1241,6 +1243,136 @@ i915_request_await_external(struct i915_request *rq, struct dma_fence *fence)
 	return err;
 }
 
+struct await_proxy {
+	struct wait_queue_entry base;
+	struct i915_request *request;
+	struct dma_fence *fence;
+	struct timer_list timer;
+	struct work_struct work;
+	int (*attach)(struct await_proxy *ap);
+	void *data;
+};
+
+static void await_proxy_work(struct work_struct *work)
+{
+	struct await_proxy *ap = container_of(work, typeof(*ap), work);
+	struct i915_request *rq = ap->request;
+
+	del_timer_sync(&ap->timer);
+
+	if (ap->fence) {
+		int err = 0;
+
+		/*
+		 * If the fence is external, we impose a 10s timeout.
+		 * However, if the fence is internal, we skip a timeout in
+		 * the belief that all fences are in-order (DAG, no cycles)
+		 * and we can enforce forward progress by resetting the GPU
+		 * if necessary. A future fence, provided by userspace, can
+		 * trivially generate a cycle in the dependency graph, and so
+		 * cause the entire cycle to deadlock, with no forward
+		 * progress being made and the driver kept eternally
+		 * awake.
+		 */
+		if (dma_fence_is_i915(ap->fence) &&
+		    !i915_sched_node_verify_dag(&rq->sched,
+						&to_request(ap->fence)->sched))
+			err = -EDEADLK;
+
+		if (!err) {
+			mutex_lock(&rq->context->timeline->mutex);
+			err = ap->attach(ap);
+			mutex_unlock(&rq->context->timeline->mutex);
+		}
+
+		/* Don't flag an error for co-dependent scheduling */
+		if (err == -EDEADLK) {
+			struct i915_sched_node *waiter =
+				&to_request(ap->fence)->sched;
+			struct i915_dependency *p;
+
+			for_each_waiter(p, rq) {
+				if (p->waiter == waiter &&
+				    p->flags & I915_DEPENDENCY_WEAK) {
+					err = 0;
+					break;
+				}
+			}
+		}
+
+		if (err < 0)
+			i915_sw_fence_set_error_once(&rq->submit, err);
+	}
+
+	i915_sw_fence_complete(&rq->submit);
+
+	dma_fence_put(ap->fence);
+	kfree(ap);
+}
+
+static int
+await_proxy_wake(struct wait_queue_entry *entry,
+		 unsigned int mode,
+		 int flags,
+		 void *fence)
+{
+	struct await_proxy *ap = container_of(entry, typeof(*ap), base);
+
+	ap->fence = dma_fence_get(fence);
+	schedule_work(&ap->work);
+
+	return 0;
+}
+
+static void
+await_proxy_timer(struct timer_list *t)
+{
+	struct await_proxy *ap = container_of(t, typeof(*ap), timer);
+
+	if (dma_fence_remove_proxy_listener(ap->base.private, &ap->base)) {
+		struct i915_request *rq = ap->request;
+
+		pr_notice("Asynchronous wait on unset proxy fence by %s:%s:%llx timed out\n",
+			  rq->fence.ops->get_driver_name(&rq->fence),
+			  rq->fence.ops->get_timeline_name(&rq->fence),
+			  rq->fence.seqno);
+		i915_sw_fence_set_error_once(&rq->submit, -ETIMEDOUT);
+
+		schedule_work(&ap->work);
+	}
+}
+
+static int
+__i915_request_await_proxy(struct i915_request *rq,
+			   struct dma_fence *fence,
+			   unsigned long timeout,
+			   int (*attach)(struct await_proxy *ap),
+			   void *data)
+{
+	struct await_proxy *ap;
+
+	ap = kzalloc(sizeof(*ap), I915_FENCE_GFP);
+	if (!ap)
+		return -ENOMEM;
+
+	i915_sw_fence_await(&rq->submit);
+	mark_external(rq);
+
+	ap->base.private = fence;
+	ap->base.func = await_proxy_wake;
+	ap->request = rq;
+	INIT_WORK(&ap->work, await_proxy_work);
+	ap->attach = attach;
+	ap->data = data;
+
+	timer_setup(&ap->timer, await_proxy_timer, 0);
+	if (timeout)
+		mod_timer(&ap->timer, round_jiffies_up(jiffies + timeout));
+
+	dma_fence_add_proxy_listener(fence, &ap->base);
+	return 0;
+}
+
 int
 i915_request_await_execution(struct i915_request *rq,
 			     struct dma_fence *fence,
@@ -1339,6 +1471,24 @@ i915_request_await_request(struct i915_request *to, struct i915_request *from)
 	return 0;
 }
 
+static int await_proxy(struct await_proxy *ap)
+{
+	return i915_request_await_dma_fence(ap->request, ap->fence);
+}
+
+static int
+i915_request_await_proxy(struct i915_request *rq, struct dma_fence *fence)
+{
+	/*
+	 * Wait until we know the real fence so that we can optimise the
+	 * inter-fence synchronisation.
+	 */
+	return __i915_request_await_proxy(rq, fence,
+					  i915_fence_context_timeout(rq->engine->i915,
+								     fence->context),
+					  await_proxy, NULL);
+}
+
 int
 i915_request_await_dma_fence(struct i915_request *rq, struct dma_fence *fence)
 {
@@ -1346,6 +1496,9 @@ i915_request_await_dma_fence(struct i915_request *rq, struct dma_fence *fence)
 	unsigned int nchild = 1;
 	int ret;
 
+	/* Unpeel the proxy fence if the real target is already known */
+	fence = dma_fence_proxy_get_real(fence);
+
 	/*
 	 * Note that if the fence-array was created in signal-on-any mode,
 	 * we should *not* decompose it into its individual fences. However,
@@ -1385,6 +1538,8 @@ i915_request_await_dma_fence(struct i915_request *rq, struct dma_fence *fence)
 
 		if (dma_fence_is_i915(fence))
 			ret = i915_request_await_request(rq, to_request(fence));
+		else if (dma_fence_is_proxy(fence))
+			ret = i915_request_await_proxy(rq, fence);
 		else
 			ret = i915_request_await_external(rq, fence);
 		if (ret < 0)
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index cbb880b10c65..250832768279 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -469,6 +469,47 @@ int i915_sched_node_add_dependency(struct i915_sched_node *node,
 	return 0;
 }
 
+bool i915_sched_node_verify_dag(struct i915_sched_node *waiter,
+				struct i915_sched_node *signaler)
+{
+	struct i915_dependency *dep, *p;
+	struct i915_dependency stack;
+	bool result = false;
+	LIST_HEAD(dfs);
+
+	if (list_empty(&waiter->waiters_list))
+		return true;
+
+	spin_lock_irq(&schedule_lock);
+
+	stack.signaler = signaler;
+	list_add(&stack.dfs_link, &dfs);
+
+	list_for_each_entry(dep, &dfs, dfs_link) {
+		struct i915_sched_node *node = dep->signaler;
+
+		if (node_signaled(node))
+			continue;
+
+		list_for_each_entry(p, &node->signalers_list, signal_link) {
+			if (p->signaler == waiter)
+				goto out;
+
+			if (list_empty(&p->dfs_link))
+				list_add_tail(&p->dfs_link, &dfs);
+		}
+	}
+
+	result = true;
+out:
+	list_for_each_entry_safe(dep, p, &dfs, dfs_link)
+		INIT_LIST_HEAD(&dep->dfs_link);
+
+	spin_unlock_irq(&schedule_lock);
+
+	return result;
+}
+
 void i915_sched_node_fini(struct i915_sched_node *node)
 {
 	struct i915_dependency *dep, *tmp;
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 6f0bf00fc569..13432add8929 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -28,6 +28,9 @@
 void i915_sched_node_init(struct i915_sched_node *node);
 void i915_sched_node_reinit(struct i915_sched_node *node);
 
+bool i915_sched_node_verify_dag(struct i915_sched_node *waiter,
+				struct i915_sched_node *signal);
+
 bool __i915_sched_node_add_dependency(struct i915_sched_node *node,
 				      struct i915_sched_node *signal,
 				      struct i915_dependency *dep,
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Intel-gfx] [PATCH 17/28] drm/i915/gem: Make relocations atomic within execbuf
  2020-06-07 22:20 [Intel-gfx] [PATCH 01/28] drm/i915: Adjust the sentinel assert to match implementation Chris Wilson
                   ` (14 preceding siblings ...)
  2020-06-07 22:20 ` [Intel-gfx] [PATCH 16/28] drm/i915: Unpeel awaits on a proxy fence Chris Wilson
@ 2020-06-07 22:20 ` Chris Wilson
  2020-06-07 22:20 ` [Intel-gfx] [PATCH 18/28] drm/i915: Strip out internal priorities Chris Wilson
                   ` (16 subsequent siblings)
  32 siblings, 0 replies; 52+ messages in thread
From: Chris Wilson @ 2020-06-07 22:20 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Although we may chide userspace for reusing the same batches
concurrently from multiple threads, at the same time we must be very
careful to only execute the batch and its relocations as supplied by the
user. If we are not careful, we may allow another thread to rewrite the
current batch with its own relocations. We must order the relocations
and their batch such that they are an atomic pair on the GPU, and that
the ioctl itself appears atomic to userspace. The order of execution may
be undetermined, but it will not be subverted.

We could do this by moving the relocations into the main request, if it
were not for the situation where we need a second engine to perform the
relocations for us. Instead, we use the dependency tracking to only
publish the write fence on the main request and not on the relocation
request, so that concurrent updates are queued after the batch has
consumed its relocations.
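
A sketch of the resulting fence dance, using the helpers from this
patch (error handling and locking trimmed):

	/* Third parties see only a proxy in the shared slots ... */
	eb->reloc_cache.fence = __dma_fence_create_proxy(0, 0);

	/* ... the relocation request stays in the exclusive slot */
	dma_resv_add_excl_fence(vma->resv, &rq->fence);
	dma_resv_add_shared_fence(vma->resv, eb->reloc_cache.fence);

	/* On submit, resolve the proxy to the main request instead */
	dma_fence_proxy_set_real(eb->reloc_cache.fence, &eb->request->fence);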

Testcase: igt/gem_exec_reloc/basic-concurrent
Fixes: ef398881d27d ("drm/i915/gem: Limit struct_mutex to eb_reserve")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 92 ++++++++++++++-----
 .../i915/gem/selftests/i915_gem_execbuffer.c  | 11 ++-
 2 files changed, 73 insertions(+), 30 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
index 8f3c1cf5af31..4a50371fe6e5 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
@@ -5,6 +5,7 @@
  */
 
 #include <linux/intel-iommu.h>
+#include <linux/dma-fence-proxy.h>
 #include <linux/dma-resv.h>
 #include <linux/sync_file.h>
 #include <linux/uaccess.h>
@@ -259,6 +260,8 @@ struct i915_execbuffer {
 		bool has_fence : 1;
 		bool needs_unfenced : 1;
 
+		struct dma_fence *fence;
+
 		struct i915_request *rq;
 		struct i915_vma *rq_vma;
 		u32 *rq_cmd;
@@ -555,16 +558,6 @@ eb_add_vma(struct i915_execbuffer *eb,
 	ev->exec = entry;
 	ev->flags = entry->flags;
 
-	if (eb->lut_size > 0) {
-		ev->handle = entry->handle;
-		hlist_add_head(&ev->node,
-			       &eb->buckets[hash_32(entry->handle,
-						    eb->lut_size)]);
-	}
-
-	if (entry->relocation_count)
-		list_add_tail(&ev->reloc_link, &eb->relocs);
-
 	/*
 	 * SNA is doing fancy tricks with compressing batch buffers, which leads
 	 * to negative relocation deltas. Usually that works out ok since the
@@ -581,9 +574,21 @@ eb_add_vma(struct i915_execbuffer *eb,
 		if (eb->reloc_cache.has_fence)
 			ev->flags |= EXEC_OBJECT_NEEDS_FENCE;
 
+		INIT_LIST_HEAD(&ev->reloc_link);
+
 		eb->batch = ev;
 	}
 
+	if (entry->relocation_count)
+		list_add_tail(&ev->reloc_link, &eb->relocs);
+
+	if (eb->lut_size > 0) {
+		ev->handle = entry->handle;
+		hlist_add_head(&ev->node,
+			       &eb->buckets[hash_32(entry->handle,
+						    eb->lut_size)]);
+	}
+
 	if (eb_pin_vma(eb, entry, ev)) {
 		if (entry->offset != vma->node.start) {
 			entry->offset = vma->node.start | UPDATE;
@@ -923,6 +928,7 @@ static void reloc_cache_init(struct reloc_cache *cache,
 	cache->has_fence = cache->gen < 4;
 	cache->needs_unfenced = INTEL_INFO(i915)->unfenced_needs_alignment;
 	cache->node.flags = 0;
+	cache->fence = NULL;
 }
 
 #define RELOC_TAIL 4
@@ -1033,6 +1039,7 @@ static void reloc_gpu_flush(struct reloc_cache *cache)
 	}
 
 	intel_gt_chipset_flush(rq->engine->gt);
+	i915_request_get(rq);
 	i915_request_add(rq);
 }
 
@@ -1338,16 +1345,6 @@ eb_reloc_entry(struct i915_execbuffer *eb,
 	if (offset == reloc->presumed_offset)
 		return 0;
 
-	/*
-	 * If we write into the object, we need to force the synchronisation
-	 * barrier, either with an asynchronous clflush or if we executed the
-	 * patching using the GPU (though that should be serialised by the
-	 * timeline). To be completely sure, and since we are required to
-	 * do relocations we are already stalling, disable the user's opt
-	 * out of our synchronisation.
-	 */
-	ev->flags &= ~EXEC_OBJECT_ASYNC;
-
 	err = __reloc_entry_gpu(eb, ev->vma, reloc->offset,
 				relocation_target(reloc, offset));
 	if (err)
@@ -1449,6 +1446,11 @@ static int reloc_move_to_gpu(struct reloc_cache *cache, struct eb_vma *ev)
 
 	obj->write_domain = I915_GEM_DOMAIN_RENDER;
 	obj->read_domains = I915_GEM_DOMAIN_RENDER;
+	ev->flags |= EXEC_OBJECT_ASYNC;
+
+	err = dma_resv_reserve_shared(vma->resv, 1);
+	if (err)
+		return err;
 
 	err = i915_request_await_object(rq, obj, true);
 	if (err)
@@ -1459,6 +1461,7 @@ static int reloc_move_to_gpu(struct reloc_cache *cache, struct eb_vma *ev)
 		return err;
 
 	dma_resv_add_excl_fence(vma->resv, &rq->fence);
+	dma_resv_add_shared_fence(vma->resv, cache->fence);
 
 	return 0;
 }
@@ -1527,14 +1530,28 @@ static int reloc_gpu_alloc(struct i915_execbuffer *eb)
 	return __reloc_gpu_alloc(eb, engine);
 }
 
+static void free_reloc_fence(struct i915_execbuffer *eb)
+{
+	struct dma_fence *f = fetch_and_zero(&eb->reloc_cache.fence);
+
+	dma_fence_signal(f);
+	dma_fence_put(f);
+}
+
 static int reloc_gpu(struct i915_execbuffer *eb)
 {
 	struct eb_vma *ev;
 	int err;
 
+	eb->reloc_cache.fence = __dma_fence_create_proxy(0, 0);
+	if (!eb->reloc_cache.fence)
+		return -ENOMEM;
+
 	err = reloc_gpu_alloc(eb);
-	if (err)
+	if (err) {
+		free_reloc_fence(eb);
 		return err;
+	}
 	GEM_BUG_ON(!eb->reloc_cache.rq);
 
 	err = lock_relocs(eb);
@@ -1593,6 +1610,15 @@ static int eb_relocate(struct i915_execbuffer *eb)
 	return 0;
 }
 
+static void eb_reloc_signal(struct i915_execbuffer *eb, struct i915_request *rq)
+{
+	dma_fence_proxy_set_real(eb->reloc_cache.fence, &rq->fence);
+	i915_request_put(eb->reloc_cache.rq);
+
+	dma_fence_put(eb->reloc_cache.fence);
+	eb->reloc_cache.fence = NULL;
+}
+
 static int eb_move_to_gpu(struct i915_execbuffer *eb)
 {
 	const unsigned int count = eb->buffer_count;
@@ -1873,10 +1899,15 @@ static int eb_parse_pipeline(struct i915_execbuffer *eb,
 	if (err)
 		goto err_commit_unlock;
 
-	/* Wait for all writes (and relocs) into the batch to complete */
-	err = i915_sw_fence_await_reservation(&pw->base.chain,
-					      pw->batch->resv, NULL, false,
-					      0, I915_FENCE_GFP);
+	/* Wait for all writes (or relocs) into the batch to complete */
+	if (!eb->reloc_cache.fence || list_empty(&eb->batch->reloc_link))
+		err = i915_sw_fence_await_reservation(&pw->base.chain,
+						      pw->batch->resv, NULL,
+						      false, 0, I915_FENCE_GFP);
+	else
+		err = i915_sw_fence_await_dma_fence(&pw->base.chain,
+						    &eb->reloc_cache.rq->fence,
+						    0, I915_FENCE_GFP);
 	if (err < 0)
 		goto err_commit_unlock;
 
@@ -2004,6 +2035,15 @@ static int eb_submit(struct i915_execbuffer *eb, struct i915_vma *batch)
 {
 	int err;
 
+	if (eb->reloc_cache.fence) {
+		err = i915_request_await_dma_fence(eb->request,
+						   &eb->reloc_cache.rq->fence);
+		if (err)
+			return err;
+
+		eb_reloc_signal(eb, eb->request);
+	}
+
 	err = eb_move_to_gpu(eb);
 	if (err)
 		return err;
@@ -2663,6 +2703,8 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	if (batch->private)
 		intel_gt_buffer_pool_put(batch->private);
 err_vma:
+	if (eb.reloc_cache.fence)
+		eb_reloc_signal(&eb, eb.reloc_cache.rq);
 	if (eb.trampoline)
 		i915_vma_unpin(eb.trampoline);
 	eb_unpin_engine(&eb);
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
index 4f10b51f9a7e..62bba179b455 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_execbuffer.c
@@ -23,7 +23,6 @@ static int __igt_gpu_reloc(struct i915_execbuffer *eb,
 	const u64 mask =
 		GENMASK_ULL(eb->reloc_cache.use_64bit_reloc ? 63 : 31, 0);
 	const u32 *map = page_mask_bits(obj->mm.mapping);
-	struct i915_request *rq;
 	struct eb_vma ev;
 	int err;
 	int i;
@@ -40,6 +39,9 @@ static int __igt_gpu_reloc(struct i915_execbuffer *eb,
 	if (err)
 		goto unpin_vma;
 
+	/* Single stage pipeline in the selftest */
+	eb->reloc_cache.fence = &eb->reloc_cache.rq->fence;
+
 	list_add(&ev.reloc_link, &eb->relocs);
 	err = lock_relocs(eb);
 	if (err)
@@ -71,8 +73,6 @@ static int __igt_gpu_reloc(struct i915_execbuffer *eb,
 	if (err)
 		goto unpin_vma;
 
-	GEM_BUG_ON(!eb->reloc_cache.rq);
-	rq = i915_request_get(eb->reloc_cache.rq);
 	reloc_gpu_flush(&eb->reloc_cache);
 
 	err = i915_gem_object_wait(obj, I915_WAIT_INTERRUPTIBLE, HZ / 2);
@@ -81,7 +81,7 @@ static int __igt_gpu_reloc(struct i915_execbuffer *eb,
 		goto put_rq;
 	}
 
-	if (!i915_request_completed(rq)) {
+	if (!i915_request_completed(eb->reloc_cache.rq)) {
 		pr_err("%s: did not wait for relocations!\n", eb->engine->name);
 		err = -EINVAL;
 		goto put_rq;
@@ -100,7 +100,8 @@ static int __igt_gpu_reloc(struct i915_execbuffer *eb,
 		igt_hexdump(map, 4096);
 
 put_rq:
-	i915_request_put(rq);
+	i915_request_put(eb->reloc_cache.rq);
+	eb->reloc_cache.rq = NULL;
 unpin_vma:
 	i915_vma_unpin(ev.vma);
 	return err;
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Intel-gfx] [PATCH 18/28] drm/i915: Strip out internal priorities
  2020-06-07 22:20 [Intel-gfx] [PATCH 01/28] drm/i915: Adjust the sentinel assert to match implementation Chris Wilson
                   ` (15 preceding siblings ...)
  2020-06-07 22:20 ` [Intel-gfx] [PATCH 17/28] drm/i915/gem: Make relocations atomic within execbuf Chris Wilson
@ 2020-06-07 22:20 ` Chris Wilson
  2020-06-07 22:20 ` [Intel-gfx] [PATCH 19/28] drm/i915: Remove I915_USER_PRIORITY_SHIFT Chris Wilson
                   ` (15 subsequent siblings)
  32 siblings, 0 replies; 52+ messages in thread
From: Chris Wilson @ 2020-06-07 22:20 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Since we are not using any internal priority levels, and in the next few
patches will introduce a new index for which the optimisation is not so
clear cut, discard the small table within the priolist.
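
The net effect on the priolist itself (taken from the hunks below) is
that the per-level bucket array and its usage bitmask collapse into a
single request list:

	struct i915_priolist {
		struct list_head requests;
		struct rb_node node;
		int priority;
	};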

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/gt/intel_engine_heartbeat.c  |  2 +-
 drivers/gpu/drm/i915/gt/intel_lrc.c           | 22 ++------
 drivers/gpu/drm/i915/gt/selftest_lrc.c        |  2 -
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |  6 +--
 drivers/gpu/drm/i915/i915_priolist_types.h    |  8 +--
 drivers/gpu/drm/i915/i915_scheduler.c         | 51 +++----------------
 drivers/gpu/drm/i915/i915_scheduler.h         | 18 ++-----
 7 files changed, 21 insertions(+), 88 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
index f67ad937eefb..eecf666c772d 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
@@ -97,7 +97,7 @@ static void heartbeat(struct work_struct *wrk)
 			 * low latency and no jitter] the chance to naturally
 			 * complete before being preempted.
 			 */
-			attr.priority = I915_PRIORITY_MASK;
+			attr.priority = 0;
 			if (rq->sched.attr.priority >= attr.priority)
 				attr.priority |= I915_USER_PRIORITY(I915_PRIORITY_HEARTBEAT);
 			if (rq->sched.attr.priority >= attr.priority)
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 5f5ac05ccbe4..0ca3604ab846 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -434,22 +434,13 @@ static int effective_prio(const struct i915_request *rq)
 
 static int queue_prio(const struct intel_engine_execlists *execlists)
 {
-	struct i915_priolist *p;
 	struct rb_node *rb;
 
 	rb = rb_first_cached(&execlists->queue);
 	if (!rb)
 		return INT_MIN;
 
-	/*
-	 * As the priolist[] are inverted, with the highest priority in [0],
-	 * we have to flip the index value to become priority.
-	 */
-	p = to_priolist(rb);
-	if (!I915_USER_PRIORITY_SHIFT)
-		return p->priority;
-
-	return ((p->priority + 1) << I915_USER_PRIORITY_SHIFT) - ffs(p->used);
+	return to_priolist(rb)->priority;
 }
 
 static inline bool need_preempt(const struct intel_engine_cs *engine,
@@ -2324,9 +2315,8 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	while ((rb = rb_first_cached(&execlists->queue))) {
 		struct i915_priolist *p = to_priolist(rb);
 		struct i915_request *rq, *rn;
-		int i;
 
-		priolist_for_each_request_consume(rq, rn, p, i) {
+		priolist_for_each_request_consume(rq, rn, p) {
 			bool merge = true;
 
 			/*
@@ -4323,9 +4313,8 @@ static void execlists_reset_cancel(struct intel_engine_cs *engine)
 	/* Flush the queued requests to the timeline list (for retiring). */
 	while ((rb = rb_first_cached(&execlists->queue))) {
 		struct i915_priolist *p = to_priolist(rb);
-		int i;
 
-		priolist_for_each_request_consume(rq, rn, p, i) {
+		priolist_for_each_request_consume(rq, rn, p) {
 			mark_eio(rq);
 			__i915_request_submit(rq);
 		}
@@ -5337,7 +5326,7 @@ static int __execlists_context_alloc(struct intel_context *ce,
 
 static struct list_head *virtual_queue(struct virtual_engine *ve)
 {
-	return &ve->base.execlists.default_priolist.requests[0];
+	return &ve->base.execlists.default_priolist.requests;
 }
 
 static void virtual_context_destroy(struct kref *kref)
@@ -5896,9 +5885,8 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine,
 	count = 0;
 	for (rb = rb_first_cached(&execlists->queue); rb; rb = rb_next(rb)) {
 		struct i915_priolist *p = rb_entry(rb, typeof(*p), node);
-		int i;
 
-		priolist_for_each_request(rq, p, i) {
+		priolist_for_each_request(rq, p) {
 			if (count++ < max - 1)
 				show_request(m, rq, "\t\tQ ");
 			else
diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c
index a8bcea8aa1b4..a0248c47d7bd 100644
--- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
+++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
@@ -964,7 +964,6 @@ create_rewinder(struct intel_context *ce,
 
 	intel_ring_advance(rq, cs);
 
-	rq->sched.attr.priority = I915_PRIORITY_MASK;
 	err = 0;
 err:
 	i915_request_get(rq);
@@ -5059,7 +5058,6 @@ create_timestamp(struct intel_context *ce, void *slot, int idx)
 
 	intel_ring_advance(rq, cs);
 
-	rq->sched.attr.priority = I915_PRIORITY_MASK;
 	err = 0;
 err:
 	i915_request_get(rq);
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 94eb63f309ce..0c42e8b0c211 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -312,9 +312,8 @@ static void __guc_dequeue(struct intel_engine_cs *engine)
 	while ((rb = rb_first_cached(&execlists->queue))) {
 		struct i915_priolist *p = to_priolist(rb);
 		struct i915_request *rq, *rn;
-		int i;
 
-		priolist_for_each_request_consume(rq, rn, p, i) {
+		priolist_for_each_request_consume(rq, rn, p) {
 			if (last && rq->context != last->context) {
 				if (port == last_port)
 					goto done;
@@ -463,9 +462,8 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
 	/* Flush the queued requests to the timeline list (for retiring). */
 	while ((rb = rb_first_cached(&execlists->queue))) {
 		struct i915_priolist *p = to_priolist(rb);
-		int i;
 
-		priolist_for_each_request_consume(rq, rn, p, i) {
+		priolist_for_each_request_consume(rq, rn, p) {
 			list_del_init(&rq->sched.link);
 			__i915_request_submit(rq);
 			dma_fence_set_error(&rq->fence, -EIO);
diff --git a/drivers/gpu/drm/i915/i915_priolist_types.h b/drivers/gpu/drm/i915/i915_priolist_types.h
index 8aa7866ec6b6..9a7657bb002e 100644
--- a/drivers/gpu/drm/i915/i915_priolist_types.h
+++ b/drivers/gpu/drm/i915/i915_priolist_types.h
@@ -27,11 +27,8 @@ enum {
 #define I915_USER_PRIORITY_SHIFT 0
 #define I915_USER_PRIORITY(x) ((x) << I915_USER_PRIORITY_SHIFT)
 
-#define I915_PRIORITY_COUNT BIT(I915_USER_PRIORITY_SHIFT)
-#define I915_PRIORITY_MASK (I915_PRIORITY_COUNT - 1)
-
 /* Smallest priority value that cannot be bumped. */
-#define I915_PRIORITY_INVALID (INT_MIN | (u8)I915_PRIORITY_MASK)
+#define I915_PRIORITY_INVALID (INT_MIN)
 
 /*
  * Requests containing performance queries must not be preempted by
@@ -45,9 +42,8 @@ enum {
 #define I915_PRIORITY_BARRIER (I915_PRIORITY_UNPREEMPTABLE - 1)
 
 struct i915_priolist {
-	struct list_head requests[I915_PRIORITY_COUNT];
+	struct list_head requests;
 	struct rb_node node;
-	unsigned long used;
 	int priority;
 };
 
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 250832768279..7945cc161a12 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -43,7 +43,7 @@ static inline struct i915_priolist *to_priolist(struct rb_node *rb)
 static void assert_priolists(struct intel_engine_execlists * const execlists)
 {
 	struct rb_node *rb;
-	long last_prio, i;
+	long last_prio;
 
 	if (!IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
 		return;
@@ -57,14 +57,6 @@ static void assert_priolists(struct intel_engine_execlists * const execlists)
 
 		GEM_BUG_ON(p->priority > last_prio);
 		last_prio = p->priority;
-
-		GEM_BUG_ON(!p->used);
-		for (i = 0; i < ARRAY_SIZE(p->requests); i++) {
-			if (list_empty(&p->requests[i]))
-				continue;
-
-			GEM_BUG_ON(!(p->used & BIT(i)));
-		}
 	}
 }
 
@@ -75,13 +67,10 @@ i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio)
 	struct i915_priolist *p;
 	struct rb_node **parent, *rb;
 	bool first = true;
-	int idx, i;
 
 	lockdep_assert_held(&engine->active.lock);
 	assert_priolists(execlists);
 
-	/* buckets sorted from highest [in slot 0] to lowest priority */
-	idx = I915_PRIORITY_COUNT - (prio & I915_PRIORITY_MASK) - 1;
 	prio >>= I915_USER_PRIORITY_SHIFT;
 	if (unlikely(execlists->no_priolist))
 		prio = I915_PRIORITY_NORMAL;
@@ -99,7 +88,7 @@ i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio)
 			parent = &rb->rb_right;
 			first = false;
 		} else {
-			goto out;
+			return &p->requests;
 		}
 	}
 
@@ -125,15 +114,12 @@ i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio)
 	}
 
 	p->priority = prio;
-	for (i = 0; i < ARRAY_SIZE(p->requests); i++)
-		INIT_LIST_HEAD(&p->requests[i]);
+	INIT_LIST_HEAD(&p->requests);
+
 	rb_link_node(&p->node, rb, parent);
 	rb_insert_color_cached(&p->node, &execlists->queue, first);
-	p->used = 0;
 
-out:
-	p->used |= BIT(idx);
-	return &p->requests[idx];
+	return &p->requests;
 }
 
 void __i915_priolist_free(struct i915_priolist *p)
@@ -363,30 +349,6 @@ void i915_schedule(struct i915_request *rq, const struct i915_sched_attr *attr)
 	spin_unlock_irq(&schedule_lock);
 }
 
-static void __bump_priority(struct i915_sched_node *node, unsigned int bump)
-{
-	struct i915_sched_attr attr = node->attr;
-
-	if (attr.priority & bump)
-		return;
-
-	attr.priority |= bump;
-	__i915_schedule(node, &attr);
-}
-
-void i915_schedule_bump_priority(struct i915_request *rq, unsigned int bump)
-{
-	unsigned long flags;
-
-	GEM_BUG_ON(bump & ~I915_PRIORITY_MASK);
-	if (READ_ONCE(rq->sched.attr.priority) & bump)
-		return;
-
-	spin_lock_irqsave(&schedule_lock, flags);
-	__bump_priority(&rq->sched, bump);
-	spin_unlock_irqrestore(&schedule_lock, flags);
-}
-
 void i915_sched_node_init(struct i915_sched_node *node)
 {
 	INIT_LIST_HEAD(&node->signalers_list);
@@ -570,8 +532,7 @@ int __init i915_global_scheduler_init(void)
 	if (!global.slab_dependencies)
 		return -ENOMEM;
 
-	global.slab_priorities = KMEM_CACHE(i915_priolist,
-					    SLAB_HWCACHE_ALIGN);
+	global.slab_priorities = KMEM_CACHE(i915_priolist, 0);
 	if (!global.slab_priorities)
 		goto err_priorities;
 
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 13432add8929..1b3c1e1a6ec5 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -13,17 +13,11 @@
 
 #include "i915_scheduler_types.h"
 
-#define priolist_for_each_request(it, plist, idx) \
-	for (idx = 0; idx < ARRAY_SIZE((plist)->requests); idx++) \
-		list_for_each_entry(it, &(plist)->requests[idx], sched.link)
-
-#define priolist_for_each_request_consume(it, n, plist, idx) \
-	for (; \
-	     (plist)->used ? (idx = __ffs((plist)->used)), 1 : 0; \
-	     (plist)->used &= ~BIT(idx)) \
-		list_for_each_entry_safe(it, n, \
-					 &(plist)->requests[idx], \
-					 sched.link)
+#define priolist_for_each_request(it, plist) \
+	list_for_each_entry(it, &(plist)->requests, sched.link)
+
+#define priolist_for_each_request_consume(it, n, plist) \
+	list_for_each_entry_safe(it, n, &(plist)->requests, sched.link)
 
 void i915_sched_node_init(struct i915_sched_node *node);
 void i915_sched_node_reinit(struct i915_sched_node *node);
@@ -45,8 +39,6 @@ void i915_sched_node_fini(struct i915_sched_node *node);
 void i915_schedule(struct i915_request *request,
 		   const struct i915_sched_attr *attr);
 
-void i915_schedule_bump_priority(struct i915_request *rq, unsigned int bump);
-
 struct list_head *
 i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio);
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Intel-gfx] [PATCH 19/28] drm/i915: Remove I915_USER_PRIORITY_SHIFT
  2020-06-07 22:20 [Intel-gfx] [PATCH 01/28] drm/i915: Adjust the sentinel assert to match implementation Chris Wilson
                   ` (16 preceding siblings ...)
  2020-06-07 22:20 ` [Intel-gfx] [PATCH 18/28] drm/i915: Strip out internal priorities Chris Wilson
@ 2020-06-07 22:20 ` Chris Wilson
  2020-06-07 22:21 ` [Intel-gfx] [PATCH 20/28] drm/i915: Replace engine->schedule() with a known request operation Chris Wilson
                   ` (14 subsequent siblings)
  32 siblings, 0 replies; 52+ messages in thread
From: Chris Wilson @ 2020-06-07 22:20 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

As we do not have any internal priority levels, the priority can be set
directly from the user values.
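
A before/after sketch drawn from the conversions below:

	/* before: the user value was encoded via the (now zero) shift */
	ctx->sched.priority = I915_USER_PRIORITY(I915_PRIORITY_NORMAL);

	/* after: the user value is stored directly */
	ctx->sched.priority = I915_PRIORITY_NORMAL;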

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/display/intel_display.c  |  4 +-
 drivers/gpu/drm/i915/gem/i915_gem_context.c   |  6 +--
 .../i915/gem/selftests/i915_gem_object_blt.c  |  4 +-
 .../gpu/drm/i915/gt/intel_engine_heartbeat.c  |  6 +--
 drivers/gpu/drm/i915/gt/selftest_lrc.c        | 44 +++++++------------
 drivers/gpu/drm/i915/i915_priolist_types.h    |  3 --
 drivers/gpu/drm/i915/i915_scheduler.c         |  1 -
 7 files changed, 23 insertions(+), 45 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
index b16aca0fe5f0..511555d444e5 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -15890,9 +15890,7 @@ static void intel_plane_unpin_fb(struct intel_plane_state *old_plane_state)
 
 static void fb_obj_bump_render_priority(struct drm_i915_gem_object *obj)
 {
-	struct i915_sched_attr attr = {
-		.priority = I915_USER_PRIORITY(I915_PRIORITY_DISPLAY),
-	};
+	struct i915_sched_attr attr = { .priority = I915_PRIORITY_DISPLAY };
 
 	i915_gem_object_wait_priority(obj, 0, &attr);
 }
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index f5d59d18cd5b..ef76dff0e255 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -712,7 +712,7 @@ __create_context(struct drm_i915_private *i915)
 
 	kref_init(&ctx->ref);
 	ctx->i915 = i915;
-	ctx->sched.priority = I915_USER_PRIORITY(I915_PRIORITY_NORMAL);
+	ctx->sched.priority = I915_PRIORITY_NORMAL;
 	mutex_init(&ctx->mutex);
 
 	spin_lock_init(&ctx->stale.lock);
@@ -1999,7 +1999,7 @@ static int set_priority(struct i915_gem_context *ctx,
 	    !capable(CAP_SYS_NICE))
 		return -EPERM;
 
-	ctx->sched.priority = I915_USER_PRIORITY(priority);
+	ctx->sched.priority = priority;
 	context_apply_all(ctx, __apply_priority, ctx);
 
 	return 0;
@@ -2502,7 +2502,7 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 
 	case I915_CONTEXT_PARAM_PRIORITY:
 		args->size = 0;
-		args->value = ctx->sched.priority >> I915_USER_PRIORITY_SHIFT;
+		args->value = ctx->sched.priority;
 		break;
 
 	case I915_CONTEXT_PARAM_SSEU:
diff --git a/drivers/gpu/drm/i915/gem/selftests/i915_gem_object_blt.c b/drivers/gpu/drm/i915/gem/selftests/i915_gem_object_blt.c
index 23b6e11bbc3e..c4c04fb97d14 100644
--- a/drivers/gpu/drm/i915/gem/selftests/i915_gem_object_blt.c
+++ b/drivers/gpu/drm/i915/gem/selftests/i915_gem_object_blt.c
@@ -220,7 +220,7 @@ static int igt_fill_blt_thread(void *arg)
 			return PTR_ERR(ctx);
 
 		prio = i915_prandom_u32_max_state(I915_PRIORITY_MAX, prng);
-		ctx->sched.priority = I915_USER_PRIORITY(prio);
+		ctx->sched.priority = prio;
 	}
 
 	ce = i915_gem_context_get_engine(ctx, 0);
@@ -338,7 +338,7 @@ static int igt_copy_blt_thread(void *arg)
 			return PTR_ERR(ctx);
 
 		prio = i915_prandom_u32_max_state(I915_PRIORITY_MAX, prng);
-		ctx->sched.priority = I915_USER_PRIORITY(prio);
+		ctx->sched.priority = prio;
 	}
 
 	ce = i915_gem_context_get_engine(ctx, 0);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
index eecf666c772d..ee002eb796cb 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
@@ -57,9 +57,7 @@ static void show_heartbeat(const struct i915_request *rq,
 
 static void heartbeat(struct work_struct *wrk)
 {
-	struct i915_sched_attr attr = {
-		.priority = I915_USER_PRIORITY(I915_PRIORITY_MIN),
-	};
+	struct i915_sched_attr attr = { .priority = I915_PRIORITY_MIN };
 	struct intel_engine_cs *engine =
 		container_of(wrk, typeof(*engine), heartbeat.work.work);
 	struct intel_context *ce = engine->kernel_context;
@@ -99,7 +97,7 @@ static void heartbeat(struct work_struct *wrk)
 			 */
 			attr.priority = 0;
 			if (rq->sched.attr.priority >= attr.priority)
-				attr.priority |= I915_USER_PRIORITY(I915_PRIORITY_HEARTBEAT);
+				attr.priority = I915_PRIORITY_HEARTBEAT;
 			if (rq->sched.attr.priority >= attr.priority)
 				attr.priority = I915_PRIORITY_BARRIER;
 
diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c
index a0248c47d7bd..15aaa1bf8943 100644
--- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
+++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
@@ -360,7 +360,7 @@ static int live_unlite_switch(void *arg)
 
 static int live_unlite_preempt(void *arg)
 {
-	return live_unlite_restore(arg, I915_USER_PRIORITY(I915_PRIORITY_MAX));
+	return live_unlite_restore(arg, I915_PRIORITY_MAX);
 }
 
 static int live_pin_rewind(void *arg)
@@ -1193,9 +1193,7 @@ static int live_timeslice_queue(void *arg)
 		goto err_pin;
 
 	for_each_engine(engine, gt, id) {
-		struct i915_sched_attr attr = {
-			.priority = I915_USER_PRIORITY(I915_PRIORITY_MAX),
-		};
+		struct i915_sched_attr attr = { .priority = I915_PRIORITY_MAX };
 		struct i915_request *rq, *nop;
 
 		if (!intel_engine_has_preemption(engine))
@@ -1410,14 +1408,12 @@ static int live_busywait_preempt(void *arg)
 	ctx_hi = kernel_context(gt->i915);
 	if (!ctx_hi)
 		return -ENOMEM;
-	ctx_hi->sched.priority =
-		I915_USER_PRIORITY(I915_CONTEXT_MAX_USER_PRIORITY);
+	ctx_hi->sched.priority = I915_CONTEXT_MAX_USER_PRIORITY;
 
 	ctx_lo = kernel_context(gt->i915);
 	if (!ctx_lo)
 		goto err_ctx_hi;
-	ctx_lo->sched.priority =
-		I915_USER_PRIORITY(I915_CONTEXT_MIN_USER_PRIORITY);
+	ctx_lo->sched.priority = I915_CONTEXT_MIN_USER_PRIORITY;
 
 	obj = i915_gem_object_create_internal(gt->i915, PAGE_SIZE);
 	if (IS_ERR(obj)) {
@@ -1620,14 +1616,12 @@ static int live_preempt(void *arg)
 	ctx_hi = kernel_context(gt->i915);
 	if (!ctx_hi)
 		goto err_spin_lo;
-	ctx_hi->sched.priority =
-		I915_USER_PRIORITY(I915_CONTEXT_MAX_USER_PRIORITY);
+	ctx_hi->sched.priority = I915_CONTEXT_MAX_USER_PRIORITY;
 
 	ctx_lo = kernel_context(gt->i915);
 	if (!ctx_lo)
 		goto err_ctx_hi;
-	ctx_lo->sched.priority =
-		I915_USER_PRIORITY(I915_CONTEXT_MIN_USER_PRIORITY);
+	ctx_lo->sched.priority = I915_CONTEXT_MIN_USER_PRIORITY;
 
 	for_each_engine(engine, gt, id) {
 		struct igt_live_test t;
@@ -1723,7 +1717,7 @@ static int live_late_preempt(void *arg)
 		goto err_ctx_hi;
 
 	/* Make sure ctx_lo stays before ctx_hi until we trigger preemption. */
-	ctx_lo->sched.priority = I915_USER_PRIORITY(1);
+	ctx_lo->sched.priority = 1;
 
 	for_each_engine(engine, gt, id) {
 		struct igt_live_test t;
@@ -1764,7 +1758,7 @@ static int live_late_preempt(void *arg)
 			goto err_wedged;
 		}
 
-		attr.priority = I915_USER_PRIORITY(I915_PRIORITY_MAX);
+		attr.priority = I915_PRIORITY_MAX;
 		engine->schedule(rq, &attr);
 
 		if (!igt_wait_for_spinner(&spin_hi, rq)) {
@@ -1848,7 +1842,7 @@ static int live_nopreempt(void *arg)
 		return -ENOMEM;
 	if (preempt_client_init(gt, &b))
 		goto err_client_a;
-	b.ctx->sched.priority = I915_USER_PRIORITY(I915_PRIORITY_MAX);
+	b.ctx->sched.priority = I915_PRIORITY_MAX;
 
 	for_each_engine(engine, gt, id) {
 		struct i915_request *rq_a, *rq_b;
@@ -2241,11 +2235,9 @@ static int live_preempt_cancel(void *arg)
 
 static int live_suppress_self_preempt(void *arg)
 {
+	struct i915_sched_attr attr = { .priority = I915_PRIORITY_MAX };
 	struct intel_gt *gt = arg;
 	struct intel_engine_cs *engine;
-	struct i915_sched_attr attr = {
-		.priority = I915_USER_PRIORITY(I915_PRIORITY_MAX)
-	};
 	struct preempt_client a, b;
 	enum intel_engine_id id;
 	int err = -ENOMEM;
@@ -2382,9 +2374,7 @@ static int live_chain_preempt(void *arg)
 		goto err_client_hi;
 
 	for_each_engine(engine, gt, id) {
-		struct i915_sched_attr attr = {
-			.priority = I915_USER_PRIORITY(I915_PRIORITY_MAX),
-		};
+		struct i915_sched_attr attr = { .priority = I915_PRIORITY_MAX };
 		struct igt_live_test t;
 		struct i915_request *rq;
 		int ring_size, count, i;
@@ -2640,9 +2630,7 @@ static int live_preempt_gang(void *arg)
 			return -EIO;
 
 		do {
-			struct i915_sched_attr attr = {
-				.priority = I915_USER_PRIORITY(prio++),
-			};
+			struct i915_sched_attr attr = { .priority = prio++ };
 
 			err = create_gang(engine, &rq);
 			if (err)
@@ -2679,7 +2667,7 @@ static int live_preempt_gang(void *arg)
 					drm_info_printer(engine->i915->drm.dev);
 
 				pr_err("Failed to flush chain of %d requests, at %d\n",
-				       prio, rq_prio(rq) >> I915_USER_PRIORITY_SHIFT);
+				       prio, rq_prio(rq));
 				intel_engine_dump(engine, &p,
 						  "%s\n", engine->name);
 
@@ -3053,14 +3041,12 @@ static int live_preempt_timeout(void *arg)
 	ctx_hi = kernel_context(gt->i915);
 	if (!ctx_hi)
 		goto err_spin_lo;
-	ctx_hi->sched.priority =
-		I915_USER_PRIORITY(I915_CONTEXT_MAX_USER_PRIORITY);
+	ctx_hi->sched.priority = I915_CONTEXT_MAX_USER_PRIORITY;
 
 	ctx_lo = kernel_context(gt->i915);
 	if (!ctx_lo)
 		goto err_ctx_hi;
-	ctx_lo->sched.priority =
-		I915_USER_PRIORITY(I915_CONTEXT_MIN_USER_PRIORITY);
+	ctx_lo->sched.priority = I915_CONTEXT_MIN_USER_PRIORITY;
 
 	for_each_engine(engine, gt, id) {
 		unsigned long saved_timeout;
diff --git a/drivers/gpu/drm/i915/i915_priolist_types.h b/drivers/gpu/drm/i915/i915_priolist_types.h
index 9a7657bb002e..bc2fa84f98a8 100644
--- a/drivers/gpu/drm/i915/i915_priolist_types.h
+++ b/drivers/gpu/drm/i915/i915_priolist_types.h
@@ -24,9 +24,6 @@ enum {
 	I915_PRIORITY_DISPLAY,
 };
 
-#define I915_USER_PRIORITY_SHIFT 0
-#define I915_USER_PRIORITY(x) ((x) << I915_USER_PRIORITY_SHIFT)
-
 /* Smallest priority value that cannot be bumped. */
 #define I915_PRIORITY_INVALID (INT_MIN)
 
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 7945cc161a12..7246ffbb3e33 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -71,7 +71,6 @@ i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio)
 	lockdep_assert_held(&engine->active.lock);
 	assert_priolists(execlists);
 
-	prio >>= I915_USER_PRIORITY_SHIFT;
 	if (unlikely(execlists->no_priolist))
 		prio = I915_PRIORITY_NORMAL;
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Intel-gfx] [PATCH 20/28] drm/i915: Replace engine->schedule() with a known request operation
  2020-06-07 22:20 [Intel-gfx] [PATCH 01/28] drm/i915: Adjust the sentinel assert to match implementation Chris Wilson
                   ` (17 preceding siblings ...)
  2020-06-07 22:20 ` [Intel-gfx] [PATCH 19/28] drm/i915: Remove I915_USER_PRIORITY_SHIFT Chris Wilson
@ 2020-06-07 22:21 ` Chris Wilson
  2020-06-07 22:21 ` [Intel-gfx] [PATCH 21/28] drm/i915/gt: Do not suspend bonded requests if one hangs Chris Wilson
                   ` (13 subsequent siblings)
  32 siblings, 0 replies; 52+ messages in thread
From: Chris Wilson @ 2020-06-07 22:21 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Looking to the future, we want to set the scheduling attributes
explicitly and so replace the generic engine->schedule() with the more
direct i915_request_set_priority().

What it loses in removing the 'schedule' name from the function, it
gains in having an explicit entry point with a stated goal.
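
A typical caller changes like so (a sketch based on the conversions
below):

	/* before: indirect through the engine vfunc, if present */
	struct i915_sched_attr attr = { .priority = I915_PRIORITY_MAX };
	if (engine->schedule)
		engine->schedule(rq, &attr);

	/* after: one explicit, known entry point */
	i915_request_set_priority(rq, I915_PRIORITY_MAX);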

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/display/intel_display.c  |  9 +----
 drivers/gpu/drm/i915/gem/i915_gem_object.h    |  2 +-
 drivers/gpu/drm/i915/gem/i915_gem_wait.c      | 27 +++++----------
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  3 --
 .../gpu/drm/i915/gt/intel_engine_heartbeat.c  |  4 +--
 drivers/gpu/drm/i915/gt/intel_engine_types.h  | 29 ++++++++--------
 drivers/gpu/drm/i915/gt/intel_engine_user.c   |  2 +-
 drivers/gpu/drm/i915/gt/intel_lrc.c           |  3 +-
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  | 11 +++----
 drivers/gpu/drm/i915/gt/selftest_lrc.c        | 33 +++++--------------
 drivers/gpu/drm/i915/i915_request.c           | 11 ++++---
 drivers/gpu/drm/i915/i915_scheduler.c         | 15 +++++----
 drivers/gpu/drm/i915/i915_scheduler.h         |  3 +-
 13 files changed, 57 insertions(+), 95 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
index 511555d444e5..797e3573d392 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -15888,13 +15888,6 @@ static void intel_plane_unpin_fb(struct intel_plane_state *old_plane_state)
 		intel_unpin_fb_vma(vma, old_plane_state->flags);
 }
 
-static void fb_obj_bump_render_priority(struct drm_i915_gem_object *obj)
-{
-	struct i915_sched_attr attr = { .priority = I915_PRIORITY_DISPLAY };
-
-	i915_gem_object_wait_priority(obj, 0, &attr);
-}
-
 /**
  * intel_prepare_plane_fb - Prepare fb for usage on plane
  * @_plane: drm plane to prepare for
@@ -15971,7 +15964,7 @@ intel_prepare_plane_fb(struct drm_plane *_plane,
 	if (ret)
 		return ret;
 
-	fb_obj_bump_render_priority(obj);
+	i915_gem_object_wait_priority(obj, 0, I915_PRIORITY_DISPLAY);
 	i915_gem_object_flush_frontbuffer(obj, ORIGIN_DIRTYFB);
 
 	if (!new_plane_state->uapi.fence) { /* implicit fencing */
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
index 2faa481cc18f..876c34982555 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
@@ -476,7 +476,7 @@ int i915_gem_object_wait(struct drm_i915_gem_object *obj,
 			 long timeout);
 int i915_gem_object_wait_priority(struct drm_i915_gem_object *obj,
 				  unsigned int flags,
-				  const struct i915_sched_attr *attr);
+				  int prio);
 
 void __i915_gem_object_flush_frontbuffer(struct drm_i915_gem_object *obj,
 					 enum fb_op_origin origin);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_wait.c b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
index 8af55cd3e690..cefbbb3d9b52 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_wait.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
@@ -93,28 +93,17 @@ i915_gem_object_wait_reservation(struct dma_resv *resv,
 	return timeout;
 }
 
-static void __fence_set_priority(struct dma_fence *fence,
-				 const struct i915_sched_attr *attr)
+static void __fence_set_priority(struct dma_fence *fence, int prio)
 {
-	struct i915_request *rq;
-	struct intel_engine_cs *engine;
-
 	if (dma_fence_is_signaled(fence) || !dma_fence_is_i915(fence))
 		return;
 
-	rq = to_request(fence);
-	engine = rq->engine;
-
 	local_bh_disable();
-	rcu_read_lock(); /* RCU serialisation for set-wedged protection */
-	if (engine->schedule)
-		engine->schedule(rq, attr);
-	rcu_read_unlock();
+	i915_request_set_priority(to_request(fence), prio);
 	local_bh_enable(); /* kick the tasklets if queues were reprioritised */
 }
 
-static void fence_set_priority(struct dma_fence *fence,
-			       const struct i915_sched_attr *attr)
+static void fence_set_priority(struct dma_fence *fence, int prio)
 {
 	/* Recurse once into a fence-array */
 	if (dma_fence_is_array(fence)) {
@@ -122,16 +111,16 @@ static void fence_set_priority(struct dma_fence *fence,
 		int i;
 
 		for (i = 0; i < array->num_fences; i++)
-			__fence_set_priority(array->fences[i], attr);
+			__fence_set_priority(array->fences[i], prio);
 	} else {
-		__fence_set_priority(fence, attr);
+		__fence_set_priority(fence, prio);
 	}
 }
 
 int
 i915_gem_object_wait_priority(struct drm_i915_gem_object *obj,
 			      unsigned int flags,
-			      const struct i915_sched_attr *attr)
+			      int prio)
 {
 	struct dma_fence *excl;
 
@@ -146,7 +135,7 @@ i915_gem_object_wait_priority(struct drm_i915_gem_object *obj,
 			return ret;
 
 		for (i = 0; i < count; i++) {
-			fence_set_priority(shared[i], attr);
+			fence_set_priority(shared[i], prio);
 			dma_fence_put(shared[i]);
 		}
 
@@ -156,7 +145,7 @@ i915_gem_object_wait_priority(struct drm_i915_gem_object *obj,
 	}
 
 	if (excl) {
-		fence_set_priority(excl, attr);
+		fence_set_priority(excl, prio);
 		dma_fence_put(excl);
 	}
 	return 0;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index e5141a897786..d79307d790da 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -334,9 +334,6 @@ static int intel_engine_setup(struct intel_gt *gt, enum intel_engine_id id)
 	if (engine->context_size)
 		DRIVER_CAPS(i915)->has_logical_contexts = true;
 
-	/* Nothing to do here, execute in order of dependencies */
-	engine->schedule = NULL;
-
 	ewma__engine_latency_init(&engine->latency);
 	seqlock_init(&engine->stats.lock);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
index ee002eb796cb..5251860e952d 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
@@ -87,7 +87,7 @@ static void heartbeat(struct work_struct *wrk)
 			 * but all other contexts, including the kernel
 			 * context are stuck waiting for the signal.
 			 */
-		} else if (engine->schedule &&
+		} else if (intel_engine_has_scheduler(engine) &&
 			   rq->sched.attr.priority < I915_PRIORITY_BARRIER) {
 			/*
 			 * Gradually raise the priority of the heartbeat to
@@ -102,7 +102,7 @@ static void heartbeat(struct work_struct *wrk)
 				attr.priority = I915_PRIORITY_BARRIER;
 
 			local_bh_disable();
-			engine->schedule(rq, &attr);
+			i915_request_set_priority(rq, attr.priority);
 			local_bh_enable();
 		} else {
 			if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 073c3769e8cc..48e111f16dc5 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -482,14 +482,6 @@ struct intel_engine_cs {
 	void            (*bond_execute)(struct i915_request *rq,
 					struct dma_fence *signal);
 
-	/*
-	 * Call when the priority on a request has changed and it and its
-	 * dependencies may need rescheduling. Note the request itself may
-	 * not be ready to run!
-	 */
-	void		(*schedule)(struct i915_request *request,
-				    const struct i915_sched_attr *attr);
-
 	void		(*release)(struct intel_engine_cs *engine);
 
 	struct intel_engine_execlists execlists;
@@ -507,13 +499,14 @@ struct intel_engine_cs {
 
 #define I915_ENGINE_USING_CMD_PARSER BIT(0)
 #define I915_ENGINE_SUPPORTS_STATS   BIT(1)
-#define I915_ENGINE_HAS_PREEMPTION   BIT(2)
-#define I915_ENGINE_HAS_SEMAPHORES   BIT(3)
-#define I915_ENGINE_HAS_TIMESLICES   BIT(4)
-#define I915_ENGINE_NEEDS_BREADCRUMB_TASKLET BIT(5)
-#define I915_ENGINE_IS_VIRTUAL       BIT(6)
-#define I915_ENGINE_HAS_RELATIVE_MMIO BIT(7)
-#define I915_ENGINE_REQUIRES_CMD_PARSER BIT(8)
+#define I915_ENGINE_HAS_SCHEDULER    BIT(2)
+#define I915_ENGINE_HAS_PREEMPTION   BIT(3)
+#define I915_ENGINE_HAS_SEMAPHORES   BIT(4)
+#define I915_ENGINE_HAS_TIMESLICES   BIT(5)
+#define I915_ENGINE_NEEDS_BREADCRUMB_TASKLET BIT(6)
+#define I915_ENGINE_IS_VIRTUAL       BIT(7)
+#define I915_ENGINE_HAS_RELATIVE_MMIO BIT(8)
+#define I915_ENGINE_REQUIRES_CMD_PARSER BIT(9)
 	unsigned int flags;
 
 	/*
@@ -599,6 +592,12 @@ intel_engine_supports_stats(const struct intel_engine_cs *engine)
 	return engine->flags & I915_ENGINE_SUPPORTS_STATS;
 }
 
+static inline bool
+intel_engine_has_scheduler(const struct intel_engine_cs *engine)
+{
+	return engine->flags & I915_ENGINE_HAS_SCHEDULER;
+}
+
 static inline bool
 intel_engine_has_preemption(const struct intel_engine_cs *engine)
 {
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_user.c b/drivers/gpu/drm/i915/gt/intel_engine_user.c
index 848decee9066..1c0a7f3ec0bd 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_user.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_user.c
@@ -108,7 +108,7 @@ static void set_scheduler_caps(struct drm_i915_private *i915)
 	for_each_uabi_engine(engine, i915) { /* all engines must agree! */
 		int i;
 
-		if (engine->schedule)
+		if (intel_engine_has_scheduler(engine))
 			enabled |= (I915_SCHEDULER_CAP_ENABLED |
 				    I915_SCHEDULER_CAP_PRIORITY);
 		else
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 0ca3604ab846..3199c65fa7e8 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -4938,7 +4938,6 @@ static void execlists_park(struct intel_engine_cs *engine)
 void intel_execlists_set_default_submission(struct intel_engine_cs *engine)
 {
 	engine->submit_request = execlists_submit_request;
-	engine->schedule = i915_schedule;
 	engine->execlists.tasklet.func = execlists_submission_tasklet;
 
 	engine->reset.prepare = execlists_reset_prepare;
@@ -4949,6 +4948,7 @@ void intel_execlists_set_default_submission(struct intel_engine_cs *engine)
 	engine->park = execlists_park;
 	engine->unpark = NULL;
 
+	engine->flags |= I915_ENGINE_HAS_SCHEDULER;
 	engine->flags |= I915_ENGINE_SUPPORTS_STATS;
 	if (!intel_vgpu_active(engine->i915)) {
 		engine->flags |= I915_ENGINE_HAS_SEMAPHORES;
@@ -5682,7 +5682,6 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings,
 	ve->base.cops = &virtual_context_ops;
 	ve->base.request_alloc = execlists_request_alloc;
 
-	ve->base.schedule = i915_schedule;
 	ve->base.submit_request = virtual_submit_request;
 	ve->base.bond_execute = virtual_bond_execute;
 
diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
index 2af66f8ffbd2..afa4f88035ac 100644
--- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
+++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
@@ -727,12 +727,11 @@ static int active_engine(void *data)
 		rq[idx] = i915_request_get(new);
 		i915_request_add(new);
 
-		if (engine->schedule && arg->flags & TEST_PRIORITY) {
-			struct i915_sched_attr attr = {
-				.priority =
-					i915_prandom_u32_max_state(512, &prng),
-			};
-			engine->schedule(rq[idx], &attr);
+		if (intel_engine_has_scheduler(engine) &&
+		    arg->flags & TEST_PRIORITY) {
+			int prio = i915_prandom_u32_max_state(512, &prng);
+
+			i915_request_set_priority(rq[idx], prio);
 		}
 
 		err = active_request_put(old);
diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c
index 15aaa1bf8943..052dcc59fcc5 100644
--- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
+++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
@@ -308,12 +308,8 @@ static int live_unlite_restore(struct intel_gt *gt, int prio)
 		i915_request_put(rq[0]);
 
 		if (prio) {
-			struct i915_sched_attr attr = {
-				.priority = prio,
-			};
-
 			/* Alternatively preempt the spinner with ce[1] */
-			engine->schedule(rq[1], &attr);
+			i915_request_set_priority(rq[1], prio);
 		}
 
 		/* And switch back to ce[0] for good measure */
@@ -760,9 +756,6 @@ release_queue(struct intel_engine_cs *engine,
 	      struct i915_vma *vma,
 	      int idx, int prio)
 {
-	struct i915_sched_attr attr = {
-		.priority = prio,
-	};
 	struct i915_request *rq;
 	u32 *cs;
 
@@ -787,7 +780,7 @@ release_queue(struct intel_engine_cs *engine,
 	i915_request_add(rq);
 
 	local_bh_disable();
-	engine->schedule(rq, &attr);
+	i915_request_set_priority(rq, prio);
 	local_bh_enable(); /* kick tasklet */
 
 	i915_request_put(rq);
@@ -1193,7 +1186,6 @@ static int live_timeslice_queue(void *arg)
 		goto err_pin;
 
 	for_each_engine(engine, gt, id) {
-		struct i915_sched_attr attr = { .priority = I915_PRIORITY_MAX };
 		struct i915_request *rq, *nop;
 
 		if (!intel_engine_has_preemption(engine))
@@ -1208,7 +1200,7 @@ static int live_timeslice_queue(void *arg)
 			err = PTR_ERR(rq);
 			goto err_heartbeat;
 		}
-		engine->schedule(rq, &attr);
+		i915_request_set_priority(rq, I915_PRIORITY_MAX);
 		err = wait_for_submit(engine, rq, HZ / 2);
 		if (err) {
 			pr_err("%s: Timed out trying to submit semaphores\n",
@@ -1695,7 +1687,6 @@ static int live_late_preempt(void *arg)
 	struct i915_gem_context *ctx_hi, *ctx_lo;
 	struct igt_spinner spin_hi, spin_lo;
 	struct intel_engine_cs *engine;
-	struct i915_sched_attr attr = {};
 	enum intel_engine_id id;
 	int err = -ENOMEM;
 
@@ -1758,8 +1749,7 @@ static int live_late_preempt(void *arg)
 			goto err_wedged;
 		}
 
-		attr.priority = I915_PRIORITY_MAX;
-		engine->schedule(rq, &attr);
+		i915_request_set_priority(rq, I915_PRIORITY_MAX);
 
 		if (!igt_wait_for_spinner(&spin_hi, rq)) {
 			pr_err("High priority context failed to preempt the low priority context\n");
@@ -2235,7 +2225,6 @@ static int live_preempt_cancel(void *arg)
 
 static int live_suppress_self_preempt(void *arg)
 {
-	struct i915_sched_attr attr = { .priority = I915_PRIORITY_MAX };
 	struct intel_gt *gt = arg;
 	struct intel_engine_cs *engine;
 	struct preempt_client a, b;
@@ -2306,7 +2295,7 @@ static int live_suppress_self_preempt(void *arg)
 			i915_request_add(rq_b);
 
 			GEM_BUG_ON(i915_request_completed(rq_a));
-			engine->schedule(rq_a, &attr);
+			i915_request_set_priority(rq_a, I915_PRIORITY_MAX);
 			igt_spinner_end(&a.spin);
 
 			if (!igt_wait_for_spinner(&b.spin, rq_b)) {
@@ -2374,7 +2363,6 @@ static int live_chain_preempt(void *arg)
 		goto err_client_hi;
 
 	for_each_engine(engine, gt, id) {
-		struct i915_sched_attr attr = { .priority = I915_PRIORITY_MAX };
 		struct igt_live_test t;
 		struct i915_request *rq;
 		int ring_size, count, i;
@@ -2441,7 +2429,7 @@ static int live_chain_preempt(void *arg)
 
 			i915_request_get(rq);
 			i915_request_add(rq);
-			engine->schedule(rq, &attr);
+			i915_request_set_priority(rq, I915_PRIORITY_MAX);
 
 			igt_spinner_end(&hi.spin);
 			if (i915_request_wait(rq, 0, HZ / 5) < 0) {
@@ -2630,14 +2618,12 @@ static int live_preempt_gang(void *arg)
 			return -EIO;
 
 		do {
-			struct i915_sched_attr attr = { .priority = prio++ };
-
 			err = create_gang(engine, &rq);
 			if (err)
 				break;
 
 			/* Submit each spinner at increasing priority */
-			engine->schedule(rq, &attr);
+			i915_request_set_priority(rq, prio++);
 		} while (prio <= I915_PRIORITY_MAX &&
 			 !__igt_timeout(end_time, NULL));
 		pr_debug("%s: Preempt chain of %d requests\n",
@@ -2859,9 +2845,6 @@ static int preempt_user(struct intel_engine_cs *engine,
 			struct i915_vma *global,
 			int id)
 {
-	struct i915_sched_attr attr = {
-		.priority = I915_PRIORITY_MAX
-	};
 	struct i915_request *rq;
 	int err = 0;
 	u32 *cs;
@@ -2886,7 +2869,7 @@ static int preempt_user(struct intel_engine_cs *engine,
 	i915_request_get(rq);
 	i915_request_add(rq);
 
-	engine->schedule(rq, &attr);
+	i915_request_set_priority(rq, I915_PRIORITY_MAX);
 
 	if (i915_request_wait(rq, 0, HZ / 2) < 0)
 		err = -ETIME;
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index f04f91b4d879..6c602b29026d 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1182,7 +1182,7 @@ __i915_request_await_execution(struct i915_request *to,
 	}
 
 	/* Couple the dependency tree for PI on this exposed to->fence */
-	if (to->engine->schedule) {
+	if (intel_engine_has_scheduler(to->engine)) {
 		err = i915_sched_node_add_dependency(&to->sched,
 						     &from->sched,
 						     I915_DEPENDENCY_WEAK);
@@ -1453,7 +1453,7 @@ i915_request_await_request(struct i915_request *to, struct i915_request *from)
 		return 0;
 	}
 
-	if (to->engine->schedule) {
+	if (intel_engine_has_scheduler(to->engine)) {
 		ret = i915_sched_node_add_dependency(&to->sched,
 						     &from->sched,
 						     I915_DEPENDENCY_EXTERNAL);
@@ -1663,7 +1663,7 @@ __i915_request_add_to_timeline(struct i915_request *rq)
 			__i915_sw_fence_await_dma_fence(&rq->submit,
 							&prev->fence,
 							&rq->dmaq);
-		if (rq->engine->schedule)
+		if (intel_engine_has_scheduler(rq->engine))
 			__i915_sched_node_add_dependency(&rq->sched,
 							 &prev->sched,
 							 &rq->dep,
@@ -1729,8 +1729,9 @@ void __i915_request_queue(struct i915_request *rq,
 	 * decide whether to preempt the entire chain so that it is ready to
 	 * run at the earliest possible convenience.
 	 */
-	if (attr && rq->engine->schedule)
-		rq->engine->schedule(rq, attr);
+	if (attr)
+		i915_request_set_priority(rq, attr->priority);
+
 	i915_sw_fence_commit(&rq->semaphore);
 	i915_sw_fence_commit(&rq->submit);
 }
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 7246ffbb3e33..9437e9d1d445 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -216,10 +216,8 @@ static void kick_submission(struct intel_engine_cs *engine,
 	rcu_read_unlock();
 }
 
-static void __i915_schedule(struct i915_sched_node *node,
-			    const struct i915_sched_attr *attr)
+static void __i915_schedule(struct i915_sched_node *node, int prio)
 {
-	const int prio = max(attr->priority, node->attr.priority);
 	struct intel_engine_cs *engine;
 	struct i915_dependency *dep, *p;
 	struct i915_dependency stack;
@@ -233,6 +231,8 @@ static void __i915_schedule(struct i915_sched_node *node,
 	if (node_signaled(node))
 		return;
 
+	prio = max(prio, node->attr.priority);
+
 	stack.signaler = node;
 	list_add(&stack.dfs_link, &dfs);
 
@@ -286,7 +286,7 @@ static void __i915_schedule(struct i915_sched_node *node,
 	 */
 	if (node->attr.priority == I915_PRIORITY_INVALID) {
 		GEM_BUG_ON(!list_empty(&node->link));
-		node->attr = *attr;
+		node->attr.priority = prio;
 
 		if (stack.dfs_link.next == stack.dfs_link.prev)
 			return;
@@ -341,10 +341,13 @@ static void __i915_schedule(struct i915_sched_node *node,
 	spin_unlock(&engine->active.lock);
 }
 
-void i915_schedule(struct i915_request *rq, const struct i915_sched_attr *attr)
+void i915_request_set_priority(struct i915_request *rq, int prio)
 {
+	if (!intel_engine_has_scheduler(rq->engine))
+		return;
+
 	spin_lock_irq(&schedule_lock);
-	__i915_schedule(&rq->sched, attr);
+	__i915_schedule(&rq->sched, prio);
 	spin_unlock_irq(&schedule_lock);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 1b3c1e1a6ec5..b8696edef446 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -36,8 +36,7 @@ int i915_sched_node_add_dependency(struct i915_sched_node *node,
 
 void i915_sched_node_fini(struct i915_sched_node *node);
 
-void i915_schedule(struct i915_request *request,
-		   const struct i915_sched_attr *attr);
+void i915_request_set_priority(struct i915_request *request, int prio);
 
 struct list_head *
 i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio);
-- 
2.20.1


* [Intel-gfx] [PATCH 21/28] drm/i915/gt: Do not suspend bonded requests if one hangs
  2020-06-07 22:20 [Intel-gfx] [PATCH 01/28] drm/i915: Adjust the sentinel assert to match implementation Chris Wilson
                   ` (18 preceding siblings ...)
  2020-06-07 22:21 ` [Intel-gfx] [PATCH 20/28] drm/i915: Replace engine->schedule() with a known request operation Chris Wilson
@ 2020-06-07 22:21 ` Chris Wilson
  2020-06-07 22:21 ` [Intel-gfx] [PATCH 22/28] drm/i915: Teach the i915_dependency to use a double-lock Chris Wilson
                   ` (12 subsequent siblings)
  32 siblings, 0 replies; 52+ messages in thread
From: Chris Wilson @ 2020-06-07 22:21 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Treat the dependency between bonded requests as weak and leave the
remainder of the pair on the GPU if one hangs.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 3199c65fa7e8..af6f78eca9ad 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -2726,6 +2726,9 @@ static void __execlists_hold(struct i915_request *rq)
 			struct i915_request *w =
 				container_of(p->waiter, typeof(*w), sched);
 
+			if (p->flags & I915_DEPENDENCY_WEAK)
+				continue;
+
 			/* Leave semaphores spinning on the other engines */
 			if (w->engine != rq->engine)
 				continue;
@@ -2850,6 +2853,9 @@ static void __execlists_unhold(struct i915_request *rq)
 			struct i915_request *w =
 				container_of(p->waiter, typeof(*w), sched);
 
+			if (p->flags & I915_DEPENDENCY_WEAK)
+				continue;
+
 			/* Propagate any change in error status */
 			if (rq->fence.error)
 				i915_request_set_error_once(w, rq->fence.error);
-- 
2.20.1


* [Intel-gfx] [PATCH 22/28] drm/i915: Teach the i915_dependency to use a double-lock
  2020-06-07 22:20 [Intel-gfx] [PATCH 01/28] drm/i915: Adjust the sentinel assert to match implementation Chris Wilson
                   ` (19 preceding siblings ...)
  2020-06-07 22:21 ` [Intel-gfx] [PATCH 21/28] drm/i915/gt: Do not suspend bonded requests if one hangs Chris Wilson
@ 2020-06-07 22:21 ` Chris Wilson
  2020-06-07 22:21 ` [Intel-gfx] [PATCH 23/28] drm/i915: Restructure priority inheritance Chris Wilson
                   ` (11 subsequent siblings)
  32 siblings, 0 replies; 52+ messages in thread
From: Chris Wilson @ 2020-06-07 22:21 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Currently, we construct and tear down the i915_dependency chains using a
global spinlock. As the lists are entirely local, it should be possible
to use a double-lock with an explicit nesting [signaler -> waiter,
always] and so avoid the costly convenience of a global spinlock.
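
The nesting convention can be summarised with a minimal sketch,
condensed from the hunks below (the struct definitions live in
i915_scheduler_types.h; allocation and error handling are elided):

    /*
     * Sketch of the [signaler -> waiter] double-lock: the signaler's
     * lock is always the outer lock, and the waiter's lock is taken
     * with SINGLE_DEPTH_NESTING so that lockdep tolerates holding two
     * locks of the same class at once.
     */
    static void sketch_add_dependency(struct i915_sched_node *waiter,
                                      struct i915_sched_node *signal,
                                      struct i915_dependency *dep)
    {
        spin_lock_irq(&signal->lock); /* outer */
        dep->signaler = signal;
        dep->waiter = waiter;

        /* All set, now publish. Beware the lockless walkers. */
        spin_lock_nested(&waiter->lock, SINGLE_DEPTH_NESTING); /* inner */
        list_add_rcu(&dep->signal_link, &waiter->signalers_list);
        list_add_rcu(&dep->wait_link, &signal->waiters_list);
        spin_unlock(&waiter->lock);

        spin_unlock_irq(&signal->lock);
    }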

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c         |  6 +--
 drivers/gpu/drm/i915/i915_request.c         |  2 +-
 drivers/gpu/drm/i915/i915_scheduler.c       | 44 +++++++++++++--------
 drivers/gpu/drm/i915/i915_scheduler.h       |  2 +-
 drivers/gpu/drm/i915/i915_scheduler_types.h |  1 +
 5 files changed, 34 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index af6f78eca9ad..3fb1b4c67adb 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1884,7 +1884,7 @@ static void defer_request(struct i915_request *rq, struct list_head * const pl)
 			struct i915_request *w =
 				container_of(p->waiter, typeof(*w), sched);
 
-			if (p->flags & I915_DEPENDENCY_WEAK)
+			if (!p->waiter || p->flags & I915_DEPENDENCY_WEAK)
 				continue;
 
 			/* Leave semaphores spinning on the other engines */
@@ -2726,7 +2726,7 @@ static void __execlists_hold(struct i915_request *rq)
 			struct i915_request *w =
 				container_of(p->waiter, typeof(*w), sched);
 
-			if (p->flags & I915_DEPENDENCY_WEAK)
+			if (!p->waiter || p->flags & I915_DEPENDENCY_WEAK)
 				continue;
 
 			/* Leave semaphores spinning on the other engines */
@@ -2853,7 +2853,7 @@ static void __execlists_unhold(struct i915_request *rq)
 			struct i915_request *w =
 				container_of(p->waiter, typeof(*w), sched);
 
-			if (p->flags & I915_DEPENDENCY_WEAK)
+			if (!p->waiter || p->flags & I915_DEPENDENCY_WEAK)
 				continue;
 
 			/* Propagate any change in error status */
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 6c602b29026d..a09fe74bb818 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -338,7 +338,7 @@ bool i915_request_retire(struct i915_request *rq)
 	intel_context_unpin(rq->context);
 
 	free_capture_list(rq);
-	i915_sched_node_fini(&rq->sched);
+	i915_sched_node_retire(&rq->sched);
 	i915_request_put(rq);
 
 	return true;
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 9437e9d1d445..f9cd8baaefcd 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -353,6 +353,8 @@ void i915_request_set_priority(struct i915_request *rq, int prio)
 
 void i915_sched_node_init(struct i915_sched_node *node)
 {
+	spin_lock_init(&node->lock);
+
 	INIT_LIST_HEAD(&node->signalers_list);
 	INIT_LIST_HEAD(&node->waiters_list);
 	INIT_LIST_HEAD(&node->link);
@@ -390,7 +392,8 @@ bool __i915_sched_node_add_dependency(struct i915_sched_node *node,
 {
 	bool ret = false;
 
-	spin_lock_irq(&schedule_lock);
+	/* The signal->lock is always the outer lock in this double-lock. */
+	spin_lock_irq(&signal->lock);
 
 	if (!node_signaled(signal)) {
 		INIT_LIST_HEAD(&dep->dfs_link);
@@ -399,15 +402,17 @@ bool __i915_sched_node_add_dependency(struct i915_sched_node *node,
 		dep->flags = flags;
 
 		/* All set, now publish. Beware the lockless walkers. */
+		spin_lock_nested(&node->lock, SINGLE_DEPTH_NESTING);
 		list_add_rcu(&dep->signal_link, &node->signalers_list);
 		list_add_rcu(&dep->wait_link, &signal->waiters_list);
+		spin_unlock(&node->lock);
 
 		/* Propagate the chains */
 		node->flags |= signal->flags;
 		ret = true;
 	}
 
-	spin_unlock_irq(&schedule_lock);
+	spin_unlock_irq(&signal->lock);
 
 	return ret;
 }
@@ -474,39 +479,46 @@ bool i915_sched_node_verify_dag(struct i915_sched_node *waiter,
 	return result;
 }
 
-void i915_sched_node_fini(struct i915_sched_node *node)
+void i915_sched_node_retire(struct i915_sched_node *node)
 {
 	struct i915_dependency *dep, *tmp;
 
-	spin_lock_irq(&schedule_lock);
+	spin_lock_irq(&node->lock);
 
 	/*
 	 * Everyone we depended upon (the fences we wait to be signaled)
 	 * should retire before us and remove themselves from our list.
 	 * However, retirement is run independently on each timeline and
-	 * so we may be called out-of-order.
+	 * so we may be called out-of-order. As we need to avoid taking
+	 * the signaler's lock, just mark up our completion and be wary
+	 * in traversing the signalers->waiters_list.
 	 */
-	list_for_each_entry_safe(dep, tmp, &node->signalers_list, signal_link) {
-		GEM_BUG_ON(!list_empty(&dep->dfs_link));
-
-		list_del_rcu(&dep->wait_link);
-		if (dep->flags & I915_DEPENDENCY_ALLOC)
-			i915_dependency_free(dep);
+	list_for_each_entry(dep, &node->signalers_list, signal_link) {
+		GEM_BUG_ON(dep->waiter != node);
+		WRITE_ONCE(dep->waiter, NULL);
 	}
-	INIT_LIST_HEAD(&node->signalers_list);
+	INIT_LIST_HEAD_RCU(&node->signalers_list);
 
 	/* Remove ourselves from everyone who depends upon us */
 	list_for_each_entry_safe(dep, tmp, &node->waiters_list, wait_link) {
+		struct i915_sched_node *w;
+
 		GEM_BUG_ON(dep->signaler != node);
-		GEM_BUG_ON(!list_empty(&dep->dfs_link));
 
-		list_del_rcu(&dep->signal_link);
+		w = READ_ONCE(dep->waiter);
+		if (w) {
+			spin_lock_nested(&w->lock, SINGLE_DEPTH_NESTING);
+			if (READ_ONCE(dep->waiter))
+				list_del_rcu(&dep->signal_link);
+			spin_unlock(&w->lock);
+		}
+
 		if (dep->flags & I915_DEPENDENCY_ALLOC)
 			i915_dependency_free(dep);
 	}
-	INIT_LIST_HEAD(&node->waiters_list);
+	INIT_LIST_HEAD_RCU(&node->waiters_list);
 
-	spin_unlock_irq(&schedule_lock);
+	spin_unlock_irq(&node->lock);
 }
 
 static void i915_global_scheduler_shrink(void)
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index b8696edef446..b26a13ef6feb 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -34,7 +34,7 @@ int i915_sched_node_add_dependency(struct i915_sched_node *node,
 				   struct i915_sched_node *signal,
 				   unsigned long flags);
 
-void i915_sched_node_fini(struct i915_sched_node *node);
+void i915_sched_node_retire(struct i915_sched_node *node);
 
 void i915_request_set_priority(struct i915_request *request, int prio);
 
diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h
index 343ed44d5ed4..3246430eb1c1 100644
--- a/drivers/gpu/drm/i915/i915_scheduler_types.h
+++ b/drivers/gpu/drm/i915/i915_scheduler_types.h
@@ -60,6 +60,7 @@ struct i915_sched_attr {
  * others.
  */
 struct i915_sched_node {
+	spinlock_t lock; /* protect the lists */
 	struct list_head signalers_list; /* those before us, we depend upon */
 	struct list_head waiters_list; /* those after us, they depend upon us */
 	struct list_head link;
-- 
2.20.1


* [Intel-gfx] [PATCH 23/28] drm/i915: Restructure priority inheritance
  2020-06-07 22:20 [Intel-gfx] [PATCH 01/28] drm/i915: Adjust the sentinel assert to match implementation Chris Wilson
                   ` (20 preceding siblings ...)
  2020-06-07 22:21 ` [Intel-gfx] [PATCH 22/28] drm/i915: Teach the i915_dependency to use a double-lock Chris Wilson
@ 2020-06-07 22:21 ` Chris Wilson
  2020-06-07 22:21 ` [Intel-gfx] [PATCH 24/28] ipi-dag Chris Wilson
                   ` (10 subsequent siblings)
  32 siblings, 0 replies; 52+ messages in thread
From: Chris Wilson @ 2020-06-07 22:21 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

In anticipation of wanting to be able to call PI (priority inheritance)
from underneath an engine's active.lock, rework the priority inheritance
to primarily work along an engine's priority queue, delegating any other
engines that the chain may traverse to a worker. This reduces the global
spinlock from governing the entire priority inheritance depth-first
search to a small lock around a single list.
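
In outline, the new walk looks like the sketch below, condensed from
the patch (the completion and priority-level checks are elided, and
defer_to_ipi() is a hypothetical stand-in for the ipi_list/ipi_work
bookkeeping introduced here):

    static void sketch_set_priority(struct i915_request *rq, int prio)
    {
        LIST_HEAD(dfs);

        /* The entire DFS now runs under a single engine lock */
        lockdep_assert_held(&rq->engine->active.lock);
        list_add(&rq->sched.dfs, &dfs);

        list_for_each_entry(rq, &dfs, sched.dfs) {
            struct i915_dependency *p;

            for_each_signaler(p, rq) {
                struct i915_request *s =
                    container_of(p->signaler, typeof(*s), sched);

                if (s->engine == rq->engine)
                    list_move_tail(&s->sched.dfs, &dfs);
                else
                    /* hypothetical: queue on ipi_list, kick ipi_work */
                    defer_to_ipi(p, prio);
            }
        }

        /* ...then requeue everything gathered on this engine at prio */
    }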

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_scheduler.c       | 236 ++++++++++----------
 drivers/gpu/drm/i915/i915_scheduler_types.h |   6 +-
 2 files changed, 121 insertions(+), 121 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index f9cd8baaefcd..320d3720ba34 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -17,7 +17,46 @@ static struct i915_global_scheduler {
 	struct kmem_cache *slab_priorities;
 } global;
 
-static DEFINE_SPINLOCK(schedule_lock);
+static DEFINE_SPINLOCK(ipi_lock);
+static LIST_HEAD(ipi_list);
+
+static inline int rq_prio(const struct i915_request *rq)
+{
+	return READ_ONCE(rq->sched.attr.priority);
+}
+
+static void ipi_schedule(struct irq_work *wrk)
+{
+	rcu_read_lock();
+	do {
+		struct i915_dependency *p;
+		struct i915_request *rq;
+		unsigned long flags;
+		int prio;
+
+		spin_lock_irqsave(&ipi_lock, flags);
+		p = list_first_entry_or_null(&ipi_list, typeof(*p), ipi_link);
+		if (p) {
+			rq = container_of(p->signaler, typeof(*rq), sched);
+			list_del_init(&p->ipi_link);
+
+			prio = p->ipi_priority;
+			p->ipi_priority = I915_PRIORITY_INVALID;
+		}
+		spin_unlock_irqrestore(&ipi_lock, flags);
+		if (!p)
+			break;
+
+		if (i915_request_completed(rq))
+			continue;
+
+		if (prio > rq_prio(rq))
+			i915_request_set_priority(rq, prio);
+	} while (1);
+	rcu_read_unlock();
+}
+
+static DEFINE_IRQ_WORK(ipi_work, ipi_schedule);
 
 static const struct i915_request *
 node_to_request(const struct i915_sched_node *node)
@@ -126,42 +165,6 @@ void __i915_priolist_free(struct i915_priolist *p)
 	kmem_cache_free(global.slab_priorities, p);
 }
 
-struct sched_cache {
-	struct list_head *priolist;
-};
-
-static struct intel_engine_cs *
-sched_lock_engine(const struct i915_sched_node *node,
-		  struct intel_engine_cs *locked,
-		  struct sched_cache *cache)
-{
-	const struct i915_request *rq = node_to_request(node);
-	struct intel_engine_cs *engine;
-
-	GEM_BUG_ON(!locked);
-
-	/*
-	 * Virtual engines complicate acquiring the engine timeline lock,
-	 * as their rq->engine pointer is not stable until under that
-	 * engine lock. The simple ploy we use is to take the lock then
-	 * check that the rq still belongs to the newly locked engine.
-	 */
-	while (locked != (engine = READ_ONCE(rq->engine))) {
-		spin_unlock(&locked->active.lock);
-		memset(cache, 0, sizeof(*cache));
-		spin_lock(&engine->active.lock);
-		locked = engine;
-	}
-
-	GEM_BUG_ON(locked != engine);
-	return locked;
-}
-
-static inline int rq_prio(const struct i915_request *rq)
-{
-	return rq->sched.attr.priority;
-}
-
 static inline bool need_preempt(int prio, int active)
 {
 	/*
@@ -216,25 +219,15 @@ static void kick_submission(struct intel_engine_cs *engine,
 	rcu_read_unlock();
 }
 
-static void __i915_schedule(struct i915_sched_node *node, int prio)
+static void __i915_request_set_priority(struct i915_request *rq, int prio)
 {
-	struct intel_engine_cs *engine;
-	struct i915_dependency *dep, *p;
-	struct i915_dependency stack;
-	struct sched_cache cache;
+	struct intel_engine_cs *engine = rq->engine;
+	struct i915_request *rn;
+	struct list_head *plist;
 	LIST_HEAD(dfs);
 
-	/* Needed in order to use the temporary link inside i915_dependency */
-	lockdep_assert_held(&schedule_lock);
-	GEM_BUG_ON(prio == I915_PRIORITY_INVALID);
-
-	if (node_signaled(node))
-		return;
-
-	prio = max(prio, node->attr.priority);
-
-	stack.signaler = node;
-	list_add(&stack.dfs_link, &dfs);
+	lockdep_assert_held(&engine->active.lock);
+	list_add(&rq->sched.dfs, &dfs);
 
 	/*
 	 * Recursively bump all dependent priorities to match the new request.
@@ -254,66 +247,47 @@ static void __i915_schedule(struct i915_sched_node *node, int prio)
 	 * end result is a topological list of requests in reverse order, the
 	 * last element in the list is the request we must execute first.
 	 */
-	list_for_each_entry(dep, &dfs, dfs_link) {
-		struct i915_sched_node *node = dep->signaler;
+	list_for_each_entry(rq, &dfs, sched.dfs) {
+		struct i915_dependency *p;
 
-		/* If we are already flying, we know we have no signalers */
-		if (node_started(node))
-			continue;
+		/* Also release any children on this engine that are ready */
+		GEM_BUG_ON(rq->engine != engine);
 
-		/*
-		 * Within an engine, there can be no cycle, but we may
-		 * refer to the same dependency chain multiple times
-		 * (redundant dependencies are not eliminated) and across
-		 * engines.
-		 */
-		list_for_each_entry(p, &node->signalers_list, signal_link) {
-			GEM_BUG_ON(p == dep); /* no cycles! */
+		for_each_signaler(p, rq) {
+			struct i915_request *s =
+				container_of(p->signaler, typeof(*s), sched);
 
-			if (node_signaled(p->signaler))
-				continue;
+			GEM_BUG_ON(s == rq);
 
-			if (prio > READ_ONCE(p->signaler->attr.priority))
-				list_move_tail(&p->dfs_link, &dfs);
-		}
-	}
+			if (rq_prio(s) >= prio)
+				continue;
 
-	/*
-	 * If we didn't need to bump any existing priorities, and we haven't
-	 * yet submitted this request (i.e. there is no potential race with
-	 * execlists_submit_request()), we can set our own priority and skip
-	 * acquiring the engine locks.
-	 */
-	if (node->attr.priority == I915_PRIORITY_INVALID) {
-		GEM_BUG_ON(!list_empty(&node->link));
-		node->attr.priority = prio;
+			if (i915_request_completed(s))
+				continue;
 
-		if (stack.dfs_link.next == stack.dfs_link.prev)
-			return;
+			if (s->engine != rq->engine) {
+				spin_lock(&ipi_lock);
+				if (prio > p->ipi_priority) {
+					p->ipi_priority = prio;
+					list_move(&p->ipi_link, &ipi_list);
+					irq_work_queue(&ipi_work);
+				}
+				spin_unlock(&ipi_lock);
+				continue;
+			}
 
-		__list_del_entry(&stack.dfs_link);
+			list_move_tail(&s->sched.dfs, &dfs);
+		}
 	}
 
-	memset(&cache, 0, sizeof(cache));
-	engine = node_to_request(node)->engine;
-	spin_lock(&engine->active.lock);
-
-	/* Fifo and depth-first replacement ensure our deps execute before us */
-	engine = sched_lock_engine(node, engine, &cache);
-	list_for_each_entry_safe_reverse(dep, p, &dfs, dfs_link) {
-		INIT_LIST_HEAD(&dep->dfs_link);
-
-		node = dep->signaler;
-		engine = sched_lock_engine(node, engine, &cache);
-		lockdep_assert_held(&engine->active.lock);
-
-		/* Recheck after acquiring the engine->timeline.lock */
-		if (prio <= node->attr.priority || node_signaled(node))
-			continue;
+	plist = i915_sched_lookup_priolist(engine, prio);
 
-		GEM_BUG_ON(node_to_request(node)->engine != engine);
+	/* Fifo and depth-first replacement ensure our deps execute first */
+	list_for_each_entry_safe_reverse(rq, rn, &dfs, sched.dfs) {
+		GEM_BUG_ON(rq->engine != engine);
 
-		WRITE_ONCE(node->attr.priority, prio);
+		INIT_LIST_HEAD(&rq->sched.dfs);
+		WRITE_ONCE(rq->sched.attr.priority, prio);
 
 		/*
 		 * Once the request is ready, it will be placed into the
@@ -323,32 +297,48 @@ static void __i915_schedule(struct i915_sched_node *node, int prio)
 		 * any preemption required, be dealt with upon submission.
 		 * See engine->submit_request()
 		 */
-		if (list_empty(&node->link))
+		if (!i915_request_is_ready(rq))
 			continue;
 
-		if (i915_request_in_priority_queue(node_to_request(node))) {
-			if (!cache.priolist)
-				cache.priolist =
-					i915_sched_lookup_priolist(engine,
-								   prio);
-			list_move_tail(&node->link, cache.priolist);
-		}
+		if (i915_request_in_priority_queue(rq))
+			list_move_tail(&rq->sched.link, plist);
 
-		/* Defer (tasklet) submission until after all of our updates. */
-		kick_submission(engine, node_to_request(node), prio);
+		/* Defer (tasklet) submission until after all updates. */
+		kick_submission(engine, rq, prio);
 	}
-
-	spin_unlock(&engine->active.lock);
 }
 
 void i915_request_set_priority(struct i915_request *rq, int prio)
 {
-	if (!intel_engine_has_scheduler(rq->engine))
+	struct intel_engine_cs *engine = READ_ONCE(rq->engine);
+	unsigned long flags;
+
+	if (!intel_engine_has_scheduler(engine))
 		return;
 
-	spin_lock_irq(&schedule_lock);
-	__i915_schedule(&rq->sched, prio);
-	spin_unlock_irq(&schedule_lock);
+	/*
+	 * Virtual engines complicate acquiring the engine timeline lock,
+	 * as their rq->engine pointer is not stable until under that
+	 * engine lock. The simple ploy we use is to take the lock then
+	 * check that the rq still belongs to the newly locked engine.
+	 */
+	spin_lock_irqsave(&engine->active.lock, flags);
+	while (engine != READ_ONCE(rq->engine)) {
+		spin_unlock(&engine->active.lock);
+		engine = READ_ONCE(rq->engine);
+		spin_lock(&engine->active.lock);
+	}
+
+	if (i915_request_completed(rq))
+		goto unlock;
+
+	if (prio <= rq_prio(rq))
+		goto unlock;
+
+	__i915_request_set_priority(rq, prio);
+
+unlock:
+	spin_unlock_irqrestore(&engine->active.lock, flags);
 }
 
 void i915_sched_node_init(struct i915_sched_node *node)
@@ -358,6 +348,7 @@ void i915_sched_node_init(struct i915_sched_node *node)
 	INIT_LIST_HEAD(&node->signalers_list);
 	INIT_LIST_HEAD(&node->waiters_list);
 	INIT_LIST_HEAD(&node->link);
+	INIT_LIST_HEAD(&node->dfs);
 
 	i915_sched_node_reinit(node);
 }
@@ -396,7 +387,8 @@ bool __i915_sched_node_add_dependency(struct i915_sched_node *node,
 	spin_lock_irq(&signal->lock);
 
 	if (!node_signaled(signal)) {
-		INIT_LIST_HEAD(&dep->dfs_link);
+		INIT_LIST_HEAD(&dep->ipi_link);
+		dep->ipi_priority = I915_PRIORITY_INVALID;
 		dep->signaler = signal;
 		dep->waiter = node;
 		dep->flags = flags;
@@ -505,6 +497,12 @@ void i915_sched_node_retire(struct i915_sched_node *node)
 
 		GEM_BUG_ON(dep->signaler != node);
 
+		if (unlikely(!list_empty(&dep->ipi_link))) {
+			spin_lock(&ipi_lock);
+			list_del(&dep->ipi_link);
+			spin_unlock(&ipi_lock);
+		}
+
 		w = READ_ONCE(dep->waiter);
 		if (w) {
 			spin_lock_nested(&w->lock, SINGLE_DEPTH_NESTING);
diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h
index 3246430eb1c1..ce60577df2bf 100644
--- a/drivers/gpu/drm/i915/i915_scheduler_types.h
+++ b/drivers/gpu/drm/i915/i915_scheduler_types.h
@@ -63,7 +63,8 @@ struct i915_sched_node {
 	spinlock_t lock; /* protect the lists */
 	struct list_head signalers_list; /* those before us, we depend upon */
 	struct list_head waiters_list; /* those after us, they depend upon us */
-	struct list_head link;
+	struct list_head link; /* guarded by engine->active.lock */
+	struct list_head dfs; /* guarded by engine->active.lock */
 	struct i915_sched_attr attr;
 	unsigned int flags;
 #define I915_SCHED_HAS_EXTERNAL_CHAIN	BIT(0)
@@ -75,11 +76,12 @@ struct i915_dependency {
 	struct i915_sched_node *waiter;
 	struct list_head signal_link;
 	struct list_head wait_link;
-	struct list_head dfs_link;
+	struct list_head ipi_link;
 	unsigned long flags;
 #define I915_DEPENDENCY_ALLOC		BIT(0)
 #define I915_DEPENDENCY_EXTERNAL	BIT(1)
 #define I915_DEPENDENCY_WEAK		BIT(2)
+	int ipi_priority;
 };
 
 #define for_each_waiter(p__, rq__) \
-- 
2.20.1


* [Intel-gfx] [PATCH 24/28] ipi-dag
  2020-06-07 22:20 [Intel-gfx] [PATCH 01/28] drm/i915: Adjust the sentinel assert to match implementation Chris Wilson
                   ` (21 preceding siblings ...)
  2020-06-07 22:21 ` [Intel-gfx] [PATCH 23/28] drm/i915: Restructure priority inheritance Chris Wilson
@ 2020-06-07 22:21 ` Chris Wilson
  2020-06-07 22:21 ` [Intel-gfx] [PATCH 25/28] drm/i915/gt: Check for a completed last request once Chris Wilson
                   ` (9 subsequent siblings)
  32 siblings, 0 replies; 52+ messages in thread
From: Chris Wilson @ 2020-06-07 22:21 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

---
 drivers/gpu/drm/i915/i915_scheduler.c | 18 +++++++++---------
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 320d3720ba34..4c189b81cc62 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -436,17 +436,17 @@ bool i915_sched_node_verify_dag(struct i915_sched_node *waiter,
 	struct i915_dependency *dep, *p;
 	struct i915_dependency stack;
 	bool result = false;
-	LIST_HEAD(dfs);
+	LIST_HEAD(ipi);
 
 	if (list_empty(&waiter->waiters_list))
 		return true;
 
-	spin_lock_irq(&schedule_lock);
+	spin_lock_irq(&ipi_lock);
 
 	stack.signaler = signaler;
-	list_add(&stack.dfs_link, &dfs);
+	list_add(&stack.ipi_link, &ipi);
 
-	list_for_each_entry(dep, &dfs, dfs_link) {
+	list_for_each_entry(dep, &ipi, ipi_link) {
 		struct i915_sched_node *node = dep->signaler;
 
 		if (node_signaled(node))
@@ -456,17 +456,17 @@ bool i915_sched_node_verify_dag(struct i915_sched_node *waiter,
 			if (p->signaler == waiter)
 				goto out;
 
-			if (list_empty(&p->dfs_link))
-				list_add_tail(&p->dfs_link, &dfs);
+			if (list_empty(&p->ipi_link))
+				list_add_tail(&p->ipi_link, &ipi);
 		}
 	}
 
 	result = true;
 out:
-	list_for_each_entry_safe(dep, p, &dfs, dfs_link)
-		INIT_LIST_HEAD(&dep->dfs_link);
+	list_for_each_entry_safe(dep, p, &ipi, ipi_link)
+		INIT_LIST_HEAD(&dep->ipi_link);
 
-	spin_unlock_irq(&schedule_lock);
+	spin_unlock_irq(&ipi_lock);
 
 	return result;
 }
-- 
2.20.1


* [Intel-gfx] [PATCH 25/28] drm/i915/gt: Check for a completed last request once
  2020-06-07 22:20 [Intel-gfx] [PATCH 01/28] drm/i915: Adjust the sentinel assert to match implementation Chris Wilson
                   ` (22 preceding siblings ...)
  2020-06-07 22:21 ` [Intel-gfx] [PATCH 24/28] ipi-dag Chris Wilson
@ 2020-06-07 22:21 ` Chris Wilson
  2020-06-07 22:21 ` [Intel-gfx] [PATCH 26/28] drm/i915: Fair low-latency scheduling Chris Wilson
                   ` (8 subsequent siblings)
  32 siblings, 0 replies; 52+ messages in thread
From: Chris Wilson @ 2020-06-07 22:21 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Pull the repeated check for the last active request being completed
into a single spot, used when deciding whether or not execlists
preemption is required.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 14 ++++----------
 1 file changed, 4 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 3fb1b4c67adb..f9c095c79874 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -2132,12 +2132,11 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	 * completed and barf.
 	 */
 	if ((last = *active)) {
-		if (need_preempt(engine, last, ve)) {
-			if (i915_request_completed(last)) {
-				tasklet_hi_schedule(&execlists->tasklet);
-				return;
-			}
+		if (i915_request_completed(last) &&
+		    !list_is_last(&last->sched.link, &engine->active.requests))
+			return;
 
+		if (need_preempt(engine, last, ve)) {
 			ENGINE_TRACE(engine,
 				     "preempting last=%llx:%lld, prio=%d, hint=%d\n",
 				     last->fence.context,
@@ -2165,11 +2164,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 			last = NULL;
 		} else if (need_timeslice(engine, last, ve) &&
 			   timeslice_expired(execlists, last)) {
-			if (i915_request_completed(last)) {
-				tasklet_hi_schedule(&execlists->tasklet);
-				return;
-			}
-
 			ENGINE_TRACE(engine,
 				     "expired last=%llx:%lld, prio=%d, hint=%d, yield?=%s\n",
 				     last->fence.context,
-- 
2.20.1


* [Intel-gfx] [PATCH 26/28] drm/i915: Fair low-latency scheduling
  2020-06-07 22:20 [Intel-gfx] [PATCH 01/28] drm/i915: Adjust the sentinel assert to match implementation Chris Wilson
                   ` (23 preceding siblings ...)
  2020-06-07 22:21 ` [Intel-gfx] [PATCH 25/28] drm/i915/gt: Check for a completed last request once Chris Wilson
@ 2020-06-07 22:21 ` Chris Wilson
  2020-06-16  9:07   ` Thomas Hellström (Intel)
  2020-06-07 22:21 ` [Intel-gfx] [PATCH 27/28] drm/i915/gt: Specify a deadline for the heartbeat Chris Wilson
                   ` (7 subsequent siblings)
  32 siblings, 1 reply; 52+ messages in thread
From: Chris Wilson @ 2020-06-07 22:21 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

The first "scheduler" was a topological sorting of requests into
priority order. The execution order was deterministic: the earliest
submitted, highest priority request was executed first. Priority
inheritance ensured that inversions were kept at bay, and allowed us to
dynamically boost priorities (e.g. for interactive pageflips).

The minimalistic timeslicing scheme was an attempt to introduce fairness
between long-running requests, by evicting the active request at the end
of its timeslice and moving it to the back of its priority queue (while
ensuring that dependencies were kept in order). For short-running
requests from many clients of equal priority, the scheme is still very
much FIFO submission ordering, and as unfair as before.

To impose fairness, we need an external metric that ensures clients are
interspersed, so that we do not execute one long chain from client A
before executing any of client B. This could be imposed by the clients
themselves by using fences based on an external clock; that is, they
only submit work for a "frame" at frame-interval, instead of submitting
as much work as they are able to. The standard SwapBuffers approach is
akin to double buffering, where, as one frame is being executed, the
next is being submitted, such that there is always a maximum of two
frames per client in the pipeline. Even this scheme exhibits unfairness
under load, as a single client will execute two frames back to back
before the next, and with enough clients, deadlines will be missed.

The idea introduced by BFS/MuQSS is that fairness is imposed by
metering with an external clock. Every request, when it becomes ready to
execute, is assigned a virtual deadline, and execution order is then
determined by earliest deadline. Priority is used as a hint, rather than
strict ordering, where high priority requests have earlier deadlines,
but not necessarily earlier than outstanding work. Thus work is executed
in order of 'readiness', with timeslicing to demote long-running work.
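
As a rough illustration of the deadline assignment (the exact mapping
from priority to deadline lives in i915_scheduler.c and is not
reproduced here; prio_to_slice() is a hypothetical stand-in):

    /*
     * Hypothetical sketch: a request becoming ready receives a
     * deadline of "now" plus a slice that shrinks as its priority
     * grows; dequeue order is then simply earliest-deadline-first.
     */
    static u64 sketch_assign_deadline(const struct i915_request *rq)
    {
        u64 now = i915_sched_to_ticks(ktime_get());

        return now + prio_to_slice(rq->sched.attr.priority);
    }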

The Achilles' heel of this scheduler is its strong preference for
low latency and its favouring of new queues. Whereas it was easy to
dominate the old scheduler by flooding it with many requests over a
short period of time, the new scheduler can be dominated by a
'synchronous' client that waits for each of its requests to complete
before submitting the next. As such a client has no history, it is
always considered ready-to-run and receives an earlier deadline than the
long-running requests.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  12 +-
 .../gpu/drm/i915/gt/intel_engine_heartbeat.c  |   1 +
 drivers/gpu/drm/i915/gt/intel_engine_pm.c     |   4 +-
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |  24 --
 drivers/gpu/drm/i915/gt/intel_lrc.c           | 328 +++++++-----------
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |   5 +-
 drivers/gpu/drm/i915/gt/selftest_lrc.c        |  43 ++-
 .../gpu/drm/i915/gt/uc/intel_guc_submission.c |   6 +-
 drivers/gpu/drm/i915/i915_priolist_types.h    |   7 +-
 drivers/gpu/drm/i915/i915_request.h           |   4 +-
 drivers/gpu/drm/i915/i915_scheduler.c         | 322 ++++++++++++-----
 drivers/gpu/drm/i915/i915_scheduler.h         |  22 +-
 drivers/gpu/drm/i915/i915_scheduler_types.h   |  17 +
 .../drm/i915/selftests/i915_mock_selftests.h  |   1 +
 drivers/gpu/drm/i915/selftests/i915_request.c |   1 +
 .../gpu/drm/i915/selftests/i915_scheduler.c   |  49 +++
 16 files changed, 484 insertions(+), 362 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/selftests/i915_scheduler.c

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index d79307d790da..b99b3332467d 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -513,7 +513,6 @@ void intel_engine_init_execlists(struct intel_engine_cs *engine)
 	execlists->active =
 		memset(execlists->inflight, 0, sizeof(execlists->inflight));
 
-	execlists->queue_priority_hint = INT_MIN;
 	execlists->queue = RB_ROOT_CACHED;
 }
 
@@ -1188,14 +1187,15 @@ bool intel_engine_can_store_dword(struct intel_engine_cs *engine)
 	}
 }
 
-static int print_sched_attr(const struct i915_sched_attr *attr,
-			    char *buf, int x, int len)
+static int print_sched(const struct i915_sched_node *node,
+		       char *buf, int x, int len)
 {
-	if (attr->priority == I915_PRIORITY_INVALID)
+	if (node->attr.priority == I915_PRIORITY_INVALID)
 		return x;
 
 	x += snprintf(buf + x, len - x,
-		      " prio=%d", attr->priority);
+		      " prio=%d, dl=%llu",
+		      node->attr.priority, node->deadline);
 
 	return x;
 }
@@ -1208,7 +1208,7 @@ static void print_request(struct drm_printer *m,
 	char buf[80] = "";
 	int x = 0;
 
-	x = print_sched_attr(&rq->sched.attr, buf, x, sizeof(buf));
+	x = print_sched(&rq->sched, buf, x, sizeof(buf));
 
 	drm_printf(m, "%s %llx:%llx%s%s %s @ %dms: %s\n",
 		   prefix,
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
index 5251860e952d..ba778c7b5d2b 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
@@ -214,6 +214,7 @@ int intel_engine_pulse(struct intel_engine_cs *engine)
 
 	__set_bit(I915_FENCE_FLAG_SENTINEL, &rq->fence.flags);
 	idle_pulse(engine, rq);
+	rq->sched.deadline = 0;
 
 	__i915_request_commit(rq);
 	__i915_request_queue(rq, &attr);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
index d0a1078ef632..ac9c777a6592 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
@@ -188,6 +188,7 @@ static bool switch_to_kernel_context(struct intel_engine_cs *engine)
 	i915_request_add_active_barriers(rq);
 
 	/* Install ourselves as a preemption barrier */
+	rq->sched.deadline = 0;
 	rq->sched.attr.priority = I915_PRIORITY_BARRIER;
 	if (likely(!__i915_request_commit(rq))) { /* engine should be idle! */
 		/*
@@ -248,9 +249,6 @@ static int __engine_park(struct intel_wakeref *wf)
 	intel_engine_park_heartbeat(engine);
 	intel_engine_disarm_breadcrumbs(engine);
 
-	/* Must be reset upon idling, or we may miss the busy wakeup. */
-	GEM_BUG_ON(engine->execlists.queue_priority_hint != INT_MIN);
-
 	if (engine->park)
 		engine->park(engine);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 48e111f16dc5..a3c60038244c 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -231,30 +231,6 @@ struct intel_engine_execlists {
 	 */
 	unsigned int port_mask;
 
-	/**
-	 * @switch_priority_hint: Second context priority.
-	 *
-	 * We submit multiple contexts to the HW simultaneously and would
-	 * like to occasionally switch between them to emulate timeslicing.
-	 * To know when timeslicing is suitable, we track the priority of
-	 * the context submitted second.
-	 */
-	int switch_priority_hint;
-
-	/**
-	 * @queue_priority_hint: Highest pending priority.
-	 *
-	 * When we add requests into the queue, or adjust the priority of
-	 * executing requests, we compute the maximum priority of those
-	 * pending requests. We can then use this value to determine if
-	 * we need to preempt the executing requests to service the queue.
-	 * However, since the we may have recorded the priority of an inflight
-	 * request we wanted to preempt but since completed, at the time of
-	 * dequeuing the priority hint may no longer may match the highest
-	 * available request priority.
-	 */
-	int queue_priority_hint;
-
 	/**
 	 * @queue: queue of requests, in priority lists
 	 */
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index f9c095c79874..0678dbb9b9fc 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -200,7 +200,7 @@ struct virtual_engine {
 	 */
 	struct ve_node {
 		struct rb_node rb;
-		int prio;
+		u64 deadline;
 	} nodes[I915_NUM_ENGINES];
 
 	/*
@@ -411,12 +411,17 @@ static inline struct i915_priolist *to_priolist(struct rb_node *rb)
 
 static inline int rq_prio(const struct i915_request *rq)
 {
-	return READ_ONCE(rq->sched.attr.priority);
+	return rq->sched.attr.priority;
 }
 
-static int effective_prio(const struct i915_request *rq)
+static inline u64 rq_deadline(const struct i915_request *rq)
 {
-	int prio = rq_prio(rq);
+	return rq->sched.deadline;
+}
+
+static u64 effective_deadline(const struct i915_request *rq)
+{
+	u64 deadline = rq_deadline(rq);
 
 	/*
 	 * If this request is special and must not be interrupted at any
@@ -427,27 +432,27 @@ static int effective_prio(const struct i915_request *rq)
 	 * nopreempt for as long as desired).
 	 */
 	if (i915_request_has_nopreempt(rq))
-		prio = I915_PRIORITY_UNPREEMPTABLE;
+		deadline = 0;
 
-	return prio;
+	return deadline;
 }
 
-static int queue_prio(const struct intel_engine_execlists *execlists)
+static u64 queue_deadline(const struct intel_engine_execlists *execlists)
 {
 	struct rb_node *rb;
 
 	rb = rb_first_cached(&execlists->queue);
 	if (!rb)
-		return INT_MIN;
+		return I915_DEADLINE_NEVER;
 
-	return to_priolist(rb)->priority;
+	return to_priolist(rb)->deadline;
 }
 
 static inline bool need_preempt(const struct intel_engine_cs *engine,
 				const struct i915_request *rq,
 				struct virtual_engine *ve)
 {
-	int last_prio;
+	u64 last_deadline;
 
 	if (!intel_engine_has_semaphores(engine))
 		return false;
@@ -470,16 +475,14 @@ static inline bool need_preempt(const struct intel_engine_cs *engine,
 	 * priority level: the task that is running should remain running
 	 * to preserve FIFO ordering of dependencies.
 	 */
-	last_prio = max(effective_prio(rq), I915_PRIORITY_NORMAL - 1);
-	if (engine->execlists.queue_priority_hint <= last_prio)
-		return false;
+	last_deadline = effective_deadline(rq);
 
 	/*
 	 * Check against the first request in ELSP[1], it will, thanks to the
 	 * power of PI, be the highest priority of that context.
 	 */
 	if (!list_is_last(&rq->sched.link, &engine->active.requests) &&
-	    rq_prio(list_next_entry(rq, sched.link)) > last_prio)
+	    rq_deadline(list_next_entry(rq, sched.link)) < last_deadline)
 		return true;
 
 	if (ve) {
@@ -491,7 +494,7 @@ static inline bool need_preempt(const struct intel_engine_cs *engine,
 			rcu_read_lock();
 			next = READ_ONCE(ve->request);
 			if (next)
-				preempt = rq_prio(next) > last_prio;
+				preempt = rq_deadline(next) < last_deadline;
 			rcu_read_unlock();
 		}
 
@@ -509,7 +512,7 @@ static inline bool need_preempt(const struct intel_engine_cs *engine,
 	 * ELSP[0] or ELSP[1] as, thanks again to PI, if it was the same
 	 * context, it's priority would not exceed ELSP[0] aka last_prio.
 	 */
-	return queue_prio(&engine->execlists) > last_prio;
+	return queue_deadline(&engine->execlists) < last_deadline;
 }
 
 __maybe_unused static inline bool
@@ -526,7 +529,7 @@ assert_priority_queue(const struct i915_request *prev,
 	if (i915_request_is_active(prev))
 		return true;
 
-	return rq_prio(prev) >= rq_prio(next);
+	return rq_deadline(prev) <= rq_deadline(next);
 }
 
 /*
@@ -1096,22 +1099,30 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
 {
 	struct i915_request *rq, *rn, *active = NULL;
 	struct list_head *uninitialized_var(pl);
-	int prio = I915_PRIORITY_INVALID;
+	u64 deadline = I915_DEADLINE_NEVER;
 
 	lockdep_assert_held(&engine->active.lock);
 
 	list_for_each_entry_safe_reverse(rq, rn,
 					 &engine->active.requests,
 					 sched.link) {
-		if (i915_request_completed(rq))
+		if (i915_request_completed(rq)) {
+			list_del_init(&rq->sched.link);
 			continue; /* XXX */
+		}
 
 		__i915_request_unsubmit(rq);
 
-		GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
-		if (rq_prio(rq) != prio) {
-			prio = rq_prio(rq);
-			pl = i915_sched_lookup_priolist(engine, prio);
+		if (i915_request_started(rq)) {
+			u64 deadline =
+				i915_scheduler_next_virtual_deadline(rq_prio(rq));
+			rq->sched.deadline = min(rq_deadline(rq), deadline);
+		}
+
+		if (rq_deadline(rq) != deadline) {
+			deadline = rq_deadline(rq);
+			pl = i915_sched_lookup_priolist(engine, deadline);
+
 		}
 		GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
 
@@ -1546,14 +1557,14 @@ dump_port(char *buf, int buflen, const char *prefix, struct i915_request *rq)
 	if (!rq)
 		return "";
 
-	snprintf(buf, buflen, "%sccid:%x %llx:%lld%s prio %d",
+	snprintf(buf, buflen, "%sccid:%x %llx:%lld%s dl %llu",
 		 prefix,
 		 rq->context->lrc.ccid,
 		 rq->fence.context, rq->fence.seqno,
 		 i915_request_completed(rq) ? "!" :
 		 i915_request_started(rq) ? "*" :
 		 "",
-		 rq_prio(rq));
+		 rq_deadline(rq));
 
 	return buf;
 }
@@ -1863,7 +1874,9 @@ static void virtual_xfer_breadcrumbs(struct virtual_engine *ve)
 	intel_engine_transfer_stale_breadcrumbs(ve->siblings[0], &ve->context);
 }
 
-static void defer_request(struct i915_request *rq, struct list_head * const pl)
+static void defer_request(struct i915_request *rq,
+			  struct list_head * const pl,
+			  u64 deadline)
 {
 	LIST_HEAD(list);
 
@@ -1878,6 +1891,7 @@ static void defer_request(struct i915_request *rq, struct list_head * const pl)
 		struct i915_dependency *p;
 
 		GEM_BUG_ON(i915_request_is_active(rq));
+		rq->sched.deadline = deadline;
 		list_move_tail(&rq->sched.link, pl);
 
 		for_each_waiter(p, rq) {
@@ -1900,10 +1914,9 @@ static void defer_request(struct i915_request *rq, struct list_head * const pl)
 			if (!i915_request_is_ready(w))
 				continue;
 
-			if (rq_prio(w) < rq_prio(rq))
+			if (rq_deadline(w) > deadline)
 				continue;
 
-			GEM_BUG_ON(rq_prio(w) > rq_prio(rq));
 			list_move_tail(&w->sched.link, &list);
 		}
 
@@ -1914,46 +1927,21 @@ static void defer_request(struct i915_request *rq, struct list_head * const pl)
 static void defer_active(struct intel_engine_cs *engine)
 {
 	struct i915_request *rq;
+	u64 deadline;
 
 	rq = __unwind_incomplete_requests(engine);
 	if (!rq)
 		return;
 
-	defer_request(rq, i915_sched_lookup_priolist(engine, rq_prio(rq)));
-}
-
-static bool
-need_timeslice(const struct intel_engine_cs *engine,
-	       const struct i915_request *rq,
-	       struct virtual_engine *ve)
-{
-	int hint;
-
-	if (!intel_engine_has_timeslices(engine))
-		return false;
-
-	hint = engine->execlists.queue_priority_hint;
-
-	if (ve) {
-		const struct intel_engine_cs *inflight =
-			intel_context_inflight(&ve->context);
-
-		if (!inflight || inflight == engine) {
-			struct i915_request *next;
-
-			rcu_read_lock();
-			next = READ_ONCE(ve->request);
-			if (next)
-				hint = max(hint, rq_prio(next));
-			rcu_read_unlock();
-		}
-	}
-
-	if (!list_is_last(&rq->sched.link, &engine->active.requests))
-		hint = max(hint, rq_prio(list_next_entry(rq, sched.link)));
+	deadline = max(rq_deadline(rq),
+		       i915_scheduler_next_virtual_deadline(rq_prio(rq)));
+	ENGINE_TRACE(engine, "defer %llx:%lld, dl:%llu -> %llu\n",
+		     rq->fence.context, rq->fence.seqno,
+		     rq_deadline(rq), deadline);
 
-	GEM_BUG_ON(hint >= I915_PRIORITY_UNPREEMPTABLE);
-	return hint >= effective_prio(rq);
+	defer_request(rq,
+		      i915_sched_lookup_priolist(engine, deadline),
+		      deadline);
 }
 
 static bool
@@ -1976,42 +1964,56 @@ timeslice_yield(const struct intel_engine_execlists *el,
 }
 
 static bool
-timeslice_expired(const struct intel_engine_execlists *el,
-		  const struct i915_request *rq)
+timeslice_expired(struct intel_engine_cs *engine, const struct i915_request *rq)
 {
-	return timer_expired(&el->timer) || timeslice_yield(el, rq);
-}
+	const struct intel_engine_execlists *el = &engine->execlists;
 
-static int
-switch_prio(struct intel_engine_cs *engine, const struct i915_request *rq)
-{
-	if (list_is_last(&rq->sched.link, &engine->active.requests))
-		return engine->execlists.queue_priority_hint;
+	if (!intel_engine_has_timeslices(engine))
+		return false;
+
+	if (i915_request_has_nopreempt(rq) && i915_request_started(rq))
+		return false;
 
-	return rq_prio(list_next_entry(rq, sched.link));
+	return timer_expired(&el->timer) || timeslice_yield(el, rq);
 }
 
-static inline unsigned long
-timeslice(const struct intel_engine_cs *engine)
+static unsigned long timeslice(const struct intel_engine_cs *engine)
 {
 	return READ_ONCE(engine->props.timeslice_duration_ms);
 }
 
-static unsigned long active_timeslice(const struct intel_engine_cs *engine)
+static bool needs_timeslice(const struct intel_engine_cs *engine,
+			    const struct i915_request *rq)
 {
-	const struct intel_engine_execlists *execlists = &engine->execlists;
-	const struct i915_request *rq = *execlists->active;
-
+	/* If not currently active, or about to switch, wait for next event */
 	if (!rq || i915_request_completed(rq))
-		return 0;
+		return false;
+
+	/* We do not need to start the timeslice until after the ACK */
+	if (READ_ONCE(engine->execlists.pending[0]))
+		return false;
+
+	/* If ELSP[1] is occupied, always check to see if worth slicing */
+	if (!list_is_last(&rq->sched.link, &engine->active.requests))
+		return true;
+
+	/* Otherwise, ELSP[0] is by itself, but may be waiting in the queue */
+	if (rb_first_cached(&engine->execlists.queue))
+		return true;
 
-	if (READ_ONCE(execlists->switch_priority_hint) < effective_prio(rq))
+	return rb_first_cached(&engine->execlists.virtual);
+}
+
+static unsigned long active_timeslice(const struct intel_engine_cs *engine)
+{
+	/* Disable the timer if there is nothing to switch to */
+	if (!needs_timeslice(engine, execlists_active(&engine->execlists)))
 		return 0;
 
 	return timeslice(engine);
 }
 
-static void set_timeslice(struct intel_engine_cs *engine)
+static void start_timeslice(struct intel_engine_cs *engine)
 {
 	unsigned long duration;
 
@@ -2024,29 +2026,6 @@ static void set_timeslice(struct intel_engine_cs *engine)
 	set_timer_ms(&engine->execlists.timer, duration);
 }
 
-static void start_timeslice(struct intel_engine_cs *engine, int prio)
-{
-	struct intel_engine_execlists *execlists = &engine->execlists;
-	unsigned long duration;
-
-	if (!intel_engine_has_timeslices(engine))
-		return;
-
-	WRITE_ONCE(execlists->switch_priority_hint, prio);
-	if (prio == INT_MIN)
-		return;
-
-	if (timer_pending(&execlists->timer))
-		return;
-
-	duration = timeslice(engine);
-	ENGINE_TRACE(engine,
-		     "start timeslicing, prio:%d, interval:%lu",
-		     prio, duration);
-
-	set_timer_ms(&execlists->timer, duration);
-}
-
 static void record_preemption(struct intel_engine_execlists *execlists)
 {
 	(void)I915_SELFTEST_ONLY(execlists->preempt_hang.count++);
@@ -2138,11 +2117,10 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 
 		if (need_preempt(engine, last, ve)) {
 			ENGINE_TRACE(engine,
-				     "preempting last=%llx:%lld, prio=%d, hint=%d\n",
+				     "preempting last=%llx:%llu, dl=%llu\n",
 				     last->fence.context,
 				     last->fence.seqno,
-				     last->sched.attr.priority,
-				     execlists->queue_priority_hint);
+				     rq_deadline(last));
 			record_preemption(execlists);
 
 			/*
@@ -2162,14 +2140,13 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 			__unwind_incomplete_requests(engine);
 
 			last = NULL;
-		} else if (need_timeslice(engine, last, ve) &&
-			   timeslice_expired(execlists, last)) {
+		} else if (timeslice_expired(engine, last)) {
 			ENGINE_TRACE(engine,
-				     "expired last=%llx:%lld, prio=%d, hint=%d, yield?=%s\n",
-				     last->fence.context,
-				     last->fence.seqno,
-				     last->sched.attr.priority,
-				     execlists->queue_priority_hint,
+				     "expired:%s last=%llx:%llu, deadline=%llu, now=%llu, yield?=%s\n",
+				     yesno(timer_expired(&execlists->timer)),
+				     last->fence.context, last->fence.seqno,
+				     rq_deadline(last),
+				     i915_sched_to_ticks(ktime_get()),
 				     yesno(timeslice_yield(execlists, last)));
 
 			ring_set_paused(engine, 1);
@@ -2205,7 +2182,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 				 * Even if ELSP[1] is occupied and not worthy
 				 * of timeslices, our queue might be.
 				 */
-				start_timeslice(engine, queue_prio(execlists));
 				return;
 			}
 		}
@@ -2224,7 +2200,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		GEM_BUG_ON(rq->engine != &ve->base);
 		GEM_BUG_ON(rq->context != &ve->context);
 
-		if (unlikely(rq_prio(rq) < queue_prio(execlists))) {
+		if (unlikely(rq_deadline(rq) > queue_deadline(execlists))) {
 			spin_unlock(&ve->base.active.lock);
 			break;
 		}
@@ -2233,7 +2209,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 
 		if (last && !can_merge_rq(last, rq)) {
 			spin_unlock(&ve->base.active.lock);
-			start_timeslice(engine, rq_prio(rq));
 			return; /* leave this for another sibling */
 		}
 
@@ -2245,10 +2220,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 			     i915_request_started(rq) ? "*" :
 			     "",
 			     yesno(engine != ve->siblings[0]));
-
 		WRITE_ONCE(ve->request, NULL);
-		WRITE_ONCE(ve->base.execlists.queue_priority_hint,
-			   INT_MIN);
 
 		rb = &ve->nodes[engine->id].rb;
 		rb_erase_cached(rb, &execlists->virtual);
@@ -2391,28 +2363,8 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	}
 
 done:
-	/*
-	 * Here be a bit of magic! Or sleight-of-hand, whichever you prefer.
-	 *
-	 * We choose the priority hint such that if we add a request of greater
-	 * priority than this, we kick the submission tasklet to decide on
-	 * the right order of submitting the requests to hardware. We must
-	 * also be prepared to reorder requests as they are in-flight on the
-	 * HW. We derive the priority hint then as the first "hole" in
-	 * the HW submission ports and if there are no available slots,
-	 * the priority of the lowest executing request, i.e. last.
-	 *
-	 * When we do receive a higher priority request ready to run from the
-	 * user, see queue_request(), the priority hint is bumped to that
-	 * request triggering preemption on the next dequeue (or subsequent
-	 * interrupt for secondary ports).
-	 */
-	execlists->queue_priority_hint = queue_prio(execlists);
-
 	if (submit) {
 		*port = execlists_schedule_in(last, port - execlists->pending);
-		execlists->switch_priority_hint =
-			switch_prio(engine, *execlists->pending);
 
 		/*
 		 * Skip if we ended up with exactly the same set of requests,
@@ -2432,7 +2384,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		set_preempt_timeout(engine, *active);
 		execlists_submit_ports(engine);
 	} else {
-		start_timeslice(engine, execlists->queue_priority_hint);
 skip_submit:
 		ring_set_paused(engine, 0);
 	}
@@ -2675,7 +2626,6 @@ static void process_csb(struct intel_engine_cs *engine)
 	} while (head != tail);
 
 	execlists->csb_head = head;
-	set_timeslice(engine);
 
 	/*
 	 * Gen11 has proven to fail wrt global observation point between
@@ -2824,9 +2774,10 @@ static bool hold_request(const struct i915_request *rq)
 	return result;
 }
 
-static void __execlists_unhold(struct i915_request *rq)
+static bool __execlists_unhold(struct i915_request *rq)
 {
 	LIST_HEAD(list);
+	bool submit = false;
 
 	do {
 		struct i915_dependency *p;
@@ -2837,10 +2788,7 @@ static void __execlists_unhold(struct i915_request *rq)
 		GEM_BUG_ON(!i915_sw_fence_signaled(&rq->submit));
 
 		i915_request_clear_hold(rq);
-		list_move_tail(&rq->sched.link,
-			       i915_sched_lookup_priolist(rq->engine,
-							  rq_prio(rq)));
-		set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+		submit |= intel_engine_queue_request(rq->engine, rq);
 
 		/* Also release any children on this engine that are ready */
 		for_each_waiter(p, rq) {
@@ -2869,6 +2817,8 @@ static void __execlists_unhold(struct i915_request *rq)
 
 		rq = list_first_entry_or_null(&list, typeof(*rq), sched.link);
 	} while (rq);
+
+	return submit;
 }
 
 static void execlists_unhold(struct intel_engine_cs *engine,
@@ -2880,12 +2830,8 @@ static void execlists_unhold(struct intel_engine_cs *engine,
 	 * Move this request back to the priority queue, and all of its
 	 * children and grandchildren that were suspended along with it.
 	 */
-	__execlists_unhold(rq);
-
-	if (rq_prio(rq) > engine->execlists.queue_priority_hint) {
-		engine->execlists.queue_priority_hint = rq_prio(rq);
+	if (__execlists_unhold(rq))
 		tasklet_hi_schedule(&engine->execlists.tasklet);
-	}
 
 	spin_unlock_irq(&engine->active.lock);
 }
@@ -3127,6 +3073,8 @@ static void execlists_submission_tasklet(unsigned long data)
 		if (unlikely(timeout && preempt_timeout(engine)))
 			execlists_reset(engine, "preemption time out");
 	}
+
+	start_timeslice(engine);
 }
 
 static void __execlists_kick(struct intel_engine_execlists *execlists)
@@ -3148,15 +3096,6 @@ static void execlists_preempt(struct timer_list *timer)
 	execlists_kick(timer, preempt);
 }
 
-static void queue_request(struct intel_engine_cs *engine,
-			  struct i915_request *rq)
-{
-	GEM_BUG_ON(!list_empty(&rq->sched.link));
-	list_add_tail(&rq->sched.link,
-		      i915_sched_lookup_priolist(engine, rq_prio(rq)));
-	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
-}
-
 static void __submit_queue_imm(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
@@ -3167,18 +3106,6 @@ static void __submit_queue_imm(struct intel_engine_cs *engine)
 	__execlists_submission_tasklet(engine);
 }
 
-static void submit_queue(struct intel_engine_cs *engine,
-			 const struct i915_request *rq)
-{
-	struct intel_engine_execlists *execlists = &engine->execlists;
-
-	if (rq_prio(rq) <= execlists->queue_priority_hint)
-		return;
-
-	execlists->queue_priority_hint = rq_prio(rq);
-	__submit_queue_imm(engine);
-}
-
 static bool ancestor_on_hold(const struct intel_engine_cs *engine,
 			     const struct i915_request *rq)
 {
@@ -3213,12 +3140,9 @@ static void execlists_submit_request(struct i915_request *request)
 		list_add_tail(&request->sched.link, &engine->active.hold);
 		i915_request_set_hold(request);
 	} else {
-		queue_request(engine, request);
-
-		GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
-		GEM_BUG_ON(list_empty(&request->sched.link));
-
-		submit_queue(engine, request);
+		if (intel_engine_queue_request(engine, request))
+			__submit_queue_imm(engine);
+		start_timeslice(engine);
 	}
 
 	spin_unlock_irqrestore(&engine->active.lock, flags);
@@ -4273,10 +4197,6 @@ static void execlists_reset_rewind(struct intel_engine_cs *engine, bool stalled)
 
 static void nop_submission_tasklet(unsigned long data)
 {
-	struct intel_engine_cs * const engine = (struct intel_engine_cs *)data;
-
-	/* The driver is wedged; don't process any more events. */
-	WRITE_ONCE(engine->execlists.queue_priority_hint, INT_MIN);
 }
 
 static void execlists_reset_cancel(struct intel_engine_cs *engine)
@@ -4322,6 +4242,7 @@ static void execlists_reset_cancel(struct intel_engine_cs *engine)
 		rb_erase_cached(&p->node, &execlists->queue);
 		i915_priolist_free(p);
 	}
+	GEM_BUG_ON(!RB_EMPTY_ROOT(&execlists->queue.rb_root));
 
 	/* On-hold requests will be flushed to timeline upon their release */
 	list_for_each_entry(rq, &engine->active.hold, sched.link)
@@ -4343,17 +4264,12 @@ static void execlists_reset_cancel(struct intel_engine_cs *engine)
 			rq->engine = engine;
 			__i915_request_submit(rq);
 			i915_request_put(rq);
-
-			ve->base.execlists.queue_priority_hint = INT_MIN;
 		}
 		spin_unlock(&ve->base.active.lock);
 	}
 
 	/* Remaining _unready_ requests will be nop'ed when submitted */
 
-	execlists->queue_priority_hint = INT_MIN;
-	execlists->queue = RB_ROOT_CACHED;
-
 	GEM_BUG_ON(__tasklet_is_enabled(&execlists->tasklet));
 	execlists->tasklet.func = nop_submission_tasklet;
 
@@ -5449,7 +5365,8 @@ static const struct intel_context_ops virtual_context_ops = {
 	.destroy = virtual_context_destroy,
 };
 
-static intel_engine_mask_t virtual_submission_mask(struct virtual_engine *ve)
+static intel_engine_mask_t
+virtual_submission_mask(struct virtual_engine *ve, u64 *deadline)
 {
 	struct i915_request *rq;
 	intel_engine_mask_t mask;
@@ -5466,9 +5383,11 @@ static intel_engine_mask_t virtual_submission_mask(struct virtual_engine *ve)
 		mask = ve->siblings[0]->mask;
 	}
 
-	ENGINE_TRACE(&ve->base, "rq=%llx:%lld, mask=%x, prio=%d\n",
+	*deadline = rq_deadline(rq);
+
+	ENGINE_TRACE(&ve->base, "rq=%llx:%llu, mask=%x, dl=%llu\n",
 		     rq->fence.context, rq->fence.seqno,
-		     mask, ve->base.execlists.queue_priority_hint);
+		     mask, *deadline);
 
 	return mask;
 }
@@ -5476,12 +5395,12 @@ static intel_engine_mask_t virtual_submission_mask(struct virtual_engine *ve)
 static void virtual_submission_tasklet(unsigned long data)
 {
 	struct virtual_engine * const ve = (struct virtual_engine *)data;
-	const int prio = READ_ONCE(ve->base.execlists.queue_priority_hint);
 	intel_engine_mask_t mask;
+	u64 deadline;
 	unsigned int n;
 
 	rcu_read_lock();
-	mask = virtual_submission_mask(ve);
+	mask = virtual_submission_mask(ve, &deadline);
 	rcu_read_unlock();
 	if (unlikely(!mask))
 		return;
@@ -5514,7 +5433,8 @@ static void virtual_submission_tasklet(unsigned long data)
 			 */
 			first = rb_first_cached(&sibling->execlists.virtual) ==
 				&node->rb;
-			if (prio == node->prio || (prio > node->prio && first))
+			if (deadline == node->deadline ||
+			    (deadline < node->deadline && first))
 				goto submit_engine;
 
 			rb_erase_cached(&node->rb, &sibling->execlists.virtual);
@@ -5528,7 +5448,7 @@ static void virtual_submission_tasklet(unsigned long data)
 
 			rb = *parent;
 			other = rb_entry(rb, typeof(*other), rb);
-			if (prio > other->prio) {
+			if (deadline < other->deadline) {
 				parent = &rb->rb_left;
 			} else {
 				parent = &rb->rb_right;
@@ -5543,8 +5463,8 @@ static void virtual_submission_tasklet(unsigned long data)
 
 submit_engine:
 		GEM_BUG_ON(RB_EMPTY_NODE(&node->rb));
-		node->prio = prio;
-		if (first && prio > sibling->execlists.queue_priority_hint)
+		node->deadline = deadline;
+		if (first)
 			tasklet_hi_schedule(&sibling->execlists.tasklet);
 
 unlock_engine:
@@ -5578,11 +5498,11 @@ static void virtual_submit_request(struct i915_request *rq)
 
 	if (i915_request_completed(rq)) {
 		__i915_request_submit(rq);
-
-		ve->base.execlists.queue_priority_hint = INT_MIN;
 		ve->request = NULL;
 	} else {
-		ve->base.execlists.queue_priority_hint = rq_prio(rq);
+		rq->sched.deadline =
+			min(rq->sched.deadline,
+			    i915_scheduler_next_virtual_deadline(rq_prio(rq)));
 		ve->request = i915_request_get(rq);
 
 		GEM_BUG_ON(!list_empty(virtual_queue(ve)));
@@ -5686,7 +5606,6 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings,
 	ve->base.bond_execute = virtual_bond_execute;
 
 	INIT_LIST_HEAD(virtual_queue(ve));
-	ve->base.execlists.queue_priority_hint = INT_MIN;
 	tasklet_init(&ve->base.execlists.tasklet,
 		     virtual_submission_tasklet,
 		     (unsigned long)ve);
@@ -5873,13 +5792,6 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine,
 		show_request(m, last, "\t\tE ");
 	}
 
-	if (execlists->switch_priority_hint != INT_MIN)
-		drm_printf(m, "\t\tSwitch priority hint: %d\n",
-			   READ_ONCE(execlists->switch_priority_hint));
-	if (execlists->queue_priority_hint != INT_MIN)
-		drm_printf(m, "\t\tQueue priority hint: %d\n",
-			   READ_ONCE(execlists->queue_priority_hint));
-
 	last = NULL;
 	count = 0;
 	for (rb = rb_first_cached(&execlists->queue); rb; rb = rb_next(rb)) {
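The virtual-engine paths above drop the cached queue_priority_hint and compare deadlines directly: when resubmitting, a node keeps its slot in a sibling's tree only if its deadline is unchanged, or if it became more urgent while already sitting leftmost. A standalone model of that predicate (keep_position() is a name invented here; virtual_submission_tasklet() open-codes the test):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Earliest-deadline-first: smaller deadlines are more urgent. */
static bool keep_position(uint64_t new_dl, uint64_t node_dl, bool first)
{
	return new_dl == node_dl || (new_dl < node_dl && first);
}

int main(void)
{
	/* unchanged deadline: leave the node where it is */
	printf("%d\n", keep_position(100, 100, false)); /* 1 */
	/* more urgent and already leftmost: no re-sort needed */
	printf("%d\n", keep_position(50, 100, true));   /* 1 */
	/* more urgent but not leftmost: erase and re-insert to sort */
	printf("%d\n", keep_position(50, 100, false));  /* 0 */
	return 0;
}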
diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
index afa4f88035ac..01fca8acd4c4 100644
--- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
+++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
@@ -879,7 +879,10 @@ static int __igt_reset_engines(struct intel_gt *gt,
 					break;
 				}
 
-				if (i915_request_wait(rq, 0, HZ / 5) < 0) {
+				/* With deadlines, no strict priority */
+				i915_request_set_deadline(rq, 0);
+
+				if (i915_request_wait(rq, 0, HZ / 2) < 0) {
 					struct drm_printer p =
 						drm_info_printer(gt->i915->drm.dev);
 
diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c
index 052dcc59fcc5..b18276cf30ed 100644
--- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
+++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
@@ -85,6 +85,9 @@ static int wait_for_submit(struct intel_engine_cs *engine,
 			   struct i915_request *rq,
 			   unsigned long timeout)
 {
+	/* Ignore our own attempts to suppress excess tasklets */
+	tasklet_hi_schedule(&engine->execlists.tasklet);
+
 	timeout += jiffies;
 	do {
 		bool done = time_after(jiffies, timeout);
@@ -754,7 +757,7 @@ semaphore_queue(struct intel_engine_cs *engine, struct i915_vma *vma, int idx)
 static int
 release_queue(struct intel_engine_cs *engine,
 	      struct i915_vma *vma,
-	      int idx, int prio)
+	      int idx, u64 deadline)
 {
 	struct i915_request *rq;
 	u32 *cs;
@@ -779,10 +782,7 @@ release_queue(struct intel_engine_cs *engine,
 	i915_request_get(rq);
 	i915_request_add(rq);
 
-	local_bh_disable();
-	i915_request_set_priority(rq, prio);
-	local_bh_enable(); /* kick tasklet */
-
+	i915_request_set_deadline(rq, deadline);
 	i915_request_put(rq);
 
 	return 0;
@@ -796,6 +796,7 @@ slice_semaphore_queue(struct intel_engine_cs *outer,
 	struct intel_engine_cs *engine;
 	struct i915_request *head;
 	enum intel_engine_id id;
+	long timeout;
 	int err, i, n = 0;
 
 	head = semaphore_queue(outer, vma, n++);
@@ -816,12 +817,16 @@ slice_semaphore_queue(struct intel_engine_cs *outer,
 		}
 	}
 
-	err = release_queue(outer, vma, n, I915_PRIORITY_BARRIER);
+	err = release_queue(outer, vma, n, 0);
 	if (err)
 		goto out;
 
-	if (i915_request_wait(head, 0,
-			      2 * RUNTIME_INFO(outer->i915)->num_engines * (count + 2) * (count + 3)) < 0) {
+	/* Expected number of pessimal slices required */
+	timeout = RUNTIME_INFO(outer->i915)->num_engines * (count + 2) * (count + 3);
+	timeout *= 4; /* safety factor, including bucketing */
+	timeout += HZ / 2; /* and include the request completion */
+
+	if (i915_request_wait(head, 0, timeout) < 0) {
 		pr_err("Failed to slice along semaphore chain of length (%d, %d)!\n",
 		       count, n);
 		GEM_TRACE_DUMP();
@@ -926,6 +931,8 @@ create_rewinder(struct intel_context *ce,
 		err = i915_request_await_dma_fence(rq, &wait->fence);
 		if (err)
 			goto err;
+
+		i915_request_set_deadline(rq, rq_deadline(wait));
 	}
 
 	cs = intel_ring_begin(rq, 14);
@@ -1200,7 +1207,7 @@ static int live_timeslice_queue(void *arg)
 			err = PTR_ERR(rq);
 			goto err_heartbeat;
 		}
-		i915_request_set_priority(rq, I915_PRIORITY_MAX);
+		i915_request_set_deadline(rq, 0);
 		err = wait_for_submit(engine, rq, HZ / 2);
 		if (err) {
 			pr_err("%s: Timed out trying to submit semaphores\n",
@@ -1223,10 +1230,9 @@ static int live_timeslice_queue(void *arg)
 		}
 
 		GEM_BUG_ON(i915_request_completed(rq));
-		GEM_BUG_ON(execlists_active(&engine->execlists) != rq);
 
 		/* Queue: semaphore signal, matching priority as semaphore */
-		err = release_queue(engine, vma, 1, effective_prio(rq));
+		err = release_queue(engine, vma, 1, effective_deadline(rq));
 		if (err)
 			goto err_rq;
 
@@ -1326,7 +1332,7 @@ static int live_timeslice_nopreempt(void *arg)
 
 		ce = intel_context_create(engine);
 		if (IS_ERR(ce)) {
-			err = PTR_ERR(rq);
+			err = PTR_ERR(ce);
 			goto out_spin;
 		}
 
@@ -1337,6 +1343,7 @@ static int live_timeslice_nopreempt(void *arg)
 			goto out_spin;
 		}
 
+		rq->sched.deadline = 0;
 		rq->sched.attr.priority = I915_PRIORITY_BARRIER;
 		i915_request_get(rq);
 		i915_request_add(rq);
@@ -1709,6 +1716,7 @@ static int live_late_preempt(void *arg)
 
 	/* Make sure ctx_lo stays before ctx_hi until we trigger preemption. */
 	ctx_lo->sched.priority = 1;
+	ctx_hi->sched.priority = I915_PRIORITY_MIN;
 
 	for_each_engine(engine, gt, id) {
 		struct igt_live_test t;
@@ -2648,6 +2656,9 @@ static int live_preempt_gang(void *arg)
 			struct i915_request *n =
 				list_next_entry(rq, client_link);
 
+			/* With deadlines, no strict priority ordering */
+			i915_request_set_deadline(rq, 0);
+
 			if (err == 0 && i915_request_wait(rq, 0, HZ / 5) < 0) {
 				struct drm_printer p =
 					drm_info_printer(engine->i915->drm.dev);
@@ -2869,7 +2880,7 @@ static int preempt_user(struct intel_engine_cs *engine,
 	i915_request_get(rq);
 	i915_request_add(rq);
 
-	i915_request_set_priority(rq, I915_PRIORITY_MAX);
+	i915_request_set_deadline(rq, 0);
 
 	if (i915_request_wait(rq, 0, HZ / 2) < 0)
 		err = -ETIME;
@@ -4402,6 +4413,7 @@ static int emit_semaphore_signal(struct intel_context *ce, void *slot)
 
 	intel_ring_advance(rq, cs);
 
+	rq->sched.deadline = 0;
 	rq->sched.attr.priority = I915_PRIORITY_BARRIER;
 	i915_request_add(rq);
 	return 0;
@@ -4911,6 +4923,10 @@ static int __live_lrc_gpr(struct intel_engine_cs *engine,
 		err = emit_semaphore_signal(engine->kernel_context, slot);
 		if (err)
 			goto err_rq;
+
+		err = wait_for_submit(engine, rq, HZ / 2);
+		if (err)
+			goto err_rq;
 	} else {
 		slot[0] = 1;
 		wmb();
@@ -5468,6 +5484,7 @@ static int poison_registers(struct intel_context *ce, u32 poison, u32 *sema)
 
 	intel_ring_advance(rq, cs);
 
+	rq->sched.deadline = 0;
 	rq->sched.attr.priority = I915_PRIORITY_BARRIER;
 err_rq:
 	i915_request_add(rq);
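The reworked timeout in slice_semaphore_queue() above is derived from the expected number of timeslices rather than guessed. As a worked example, with purely illustrative numbers (8 engines, count=64, HZ=250; none of these are fixed by the test):

#include <stdio.h>

int main(void)
{
	const long HZ = 250;        /* assumed tick rate */
	const int num_engines = 8;  /* assumed platform */
	const int count = 64;       /* ring fill count from the test */
	long timeout;

	/* expected number of pessimal slices required */
	timeout = num_engines * (count + 2) * (count + 3);
	timeout *= 4;               /* safety factor, including bucketing */
	timeout += HZ / 2;          /* and include the request completion */

	printf("timeout = %ld jiffies\n", timeout); /* 141629 */
	return 0;
}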
diff --git a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
index 0c42e8b0c211..6da465c7c4f5 100644
--- a/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/gt/uc/intel_guc_submission.c
@@ -333,8 +333,6 @@ static void __guc_dequeue(struct intel_engine_cs *engine)
 		i915_priolist_free(p);
 	}
 done:
-	execlists->queue_priority_hint =
-		rb ? to_priolist(rb)->priority : INT_MIN;
 	if (submit) {
 		*port = schedule_in(last, port - execlists->inflight);
 		*++port = NULL;
@@ -473,12 +471,10 @@ static void guc_reset_cancel(struct intel_engine_cs *engine)
 		rb_erase_cached(&p->node, &execlists->queue);
 		i915_priolist_free(p);
 	}
+	GEM_BUG_ON(!RB_EMPTY_ROOT(&execlists->queue.rb_root));
 
 	/* Remaining _unready_ requests will be nop'ed when submitted */
 
-	execlists->queue_priority_hint = INT_MIN;
-	execlists->queue = RB_ROOT_CACHED;
-
 	spin_unlock_irqrestore(&engine->active.lock, flags);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_priolist_types.h b/drivers/gpu/drm/i915/i915_priolist_types.h
index bc2fa84f98a8..43a0ac45295f 100644
--- a/drivers/gpu/drm/i915/i915_priolist_types.h
+++ b/drivers/gpu/drm/i915/i915_priolist_types.h
@@ -22,6 +22,8 @@ enum {
 
 	/* Interactive workload, scheduled for immediate pageflipping */
 	I915_PRIORITY_DISPLAY,
+
+	__I915_PRIORITY_KERNEL__
 };
 
 /* Smallest priority value that cannot be bumped. */
@@ -35,13 +37,12 @@ enum {
  * i.e. nothing can have higher priority and force us to usurp the
  * active request.
  */
-#define I915_PRIORITY_UNPREEMPTABLE INT_MAX
-#define I915_PRIORITY_BARRIER (I915_PRIORITY_UNPREEMPTABLE - 1)
+#define I915_PRIORITY_BARRIER INT_MAX
 
 struct i915_priolist {
 	struct list_head requests;
 	struct rb_node node;
-	int priority;
+	u64 deadline;
 };
 
 #endif /* _I915_PRIOLIST_TYPES_H_ */
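With struct i915_priolist now keyed by a u64 deadline, each rbtree node becomes a FIFO bucket of requests sharing one (quantised) deadline, and the tree is walked earliest-first. A minimal userspace model of those semantics, using a sorted singly-linked list instead of an rbtree (all names here are invented for illustration):

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

struct bucket {
	uint64_t deadline;	/* sort key: smaller executes sooner */
	struct bucket *next;
	/* the kernel keeps a list_head of requests; we just count */
	int nr_requests;
};

/* Find-or-insert the bucket for @deadline, keeping the list sorted.
 * Equal deadlines share a bucket, so requests within it stay FIFO,
 * the same guarantee i915_sched_lookup_priolist() gives via the rbtree. */
static struct bucket *lookup_bucket(struct bucket **head, uint64_t deadline)
{
	struct bucket **p = head, *b;

	while (*p && (*p)->deadline < deadline)
		p = &(*p)->next;
	if (*p && (*p)->deadline == deadline)
		return *p;

	b = calloc(1, sizeof(*b));
	b->deadline = deadline;
	b->next = *p;
	*p = b;
	return b;
}

int main(void)
{
	struct bucket *q = NULL, *b;

	lookup_bucket(&q, 200)->nr_requests++;
	lookup_bucket(&q, 100)->nr_requests++;
	lookup_bucket(&q, 100)->nr_requests++; /* joins the 100 bucket */

	for (b = q; b; b = b->next)
		printf("deadline=%llu requests=%d\n",
		       (unsigned long long)b->deadline, b->nr_requests);
	/* prints the 100 bucket (2 requests) before 200 (1 request) */
	return 0;
}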
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 118ab6650d1f..23594e712292 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -561,7 +561,7 @@ static inline void i915_request_clear_hold(struct i915_request *rq)
 }
 
 static inline struct intel_timeline *
-i915_request_timeline(struct i915_request *rq)
+i915_request_timeline(const struct i915_request *rq)
 {
 	/* Valid only while the request is being constructed (or retired). */
 	return rcu_dereference_protected(rq->timeline,
@@ -576,7 +576,7 @@ i915_request_gem_context(struct i915_request *rq)
 }
 
 static inline struct intel_timeline *
-i915_request_active_timeline(struct i915_request *rq)
+i915_request_active_timeline(const struct i915_request *rq)
 {
 	/*
 	 * When in use during submission, we are protected by a guarantee that
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 4c189b81cc62..30bcb6f9d99f 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -20,6 +20,11 @@ static struct i915_global_scheduler {
 static DEFINE_SPINLOCK(ipi_lock);
 static LIST_HEAD(ipi_list);
 
+static inline u64 rq_deadline(const struct i915_request *rq)
+{
+	return READ_ONCE(rq->sched.deadline);
+}
+
 static inline int rq_prio(const struct i915_request *rq)
 {
 	return READ_ONCE(rq->sched.attr.priority);
@@ -32,6 +37,7 @@ static void ipi_schedule(struct irq_work *wrk)
 		struct i915_dependency *p;
 		struct i915_request *rq;
 		unsigned long flags;
+		u64 deadline;
 		int prio;
 
 		spin_lock_irqsave(&ipi_lock, flags);
@@ -40,7 +46,10 @@ static void ipi_schedule(struct irq_work *wrk)
 			rq = container_of(p->signaler, typeof(*rq), sched);
 			list_del_init(&p->ipi_link);
 
+			deadline = p->ipi_deadline;
 			prio = p->ipi_priority;
+
+			p->ipi_deadline = I915_DEADLINE_NEVER;
 			p->ipi_priority = I915_PRIORITY_INVALID;
 		}
 		spin_unlock_irqrestore(&ipi_lock, flags);
@@ -52,6 +61,8 @@ static void ipi_schedule(struct irq_work *wrk)
 
 		if (prio > rq_prio(rq))
 			i915_request_set_priority(rq, prio);
+		if (deadline < rq_deadline(rq))
+			i915_request_set_deadline(rq, deadline);
 	} while (1);
 	rcu_read_unlock();
 }
@@ -79,28 +90,8 @@ static inline struct i915_priolist *to_priolist(struct rb_node *rb)
 	return rb_entry(rb, struct i915_priolist, node);
 }
 
-static void assert_priolists(struct intel_engine_execlists * const execlists)
-{
-	struct rb_node *rb;
-	long last_prio;
-
-	if (!IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
-		return;
-
-	GEM_BUG_ON(rb_first_cached(&execlists->queue) !=
-		   rb_first(&execlists->queue.rb_root));
-
-	last_prio = INT_MAX;
-	for (rb = rb_first_cached(&execlists->queue); rb; rb = rb_next(rb)) {
-		const struct i915_priolist *p = to_priolist(rb);
-
-		GEM_BUG_ON(p->priority > last_prio);
-		last_prio = p->priority;
-	}
-}
-
 struct list_head *
-i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio)
+i915_sched_lookup_priolist(struct intel_engine_cs *engine, u64 deadline)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
 	struct i915_priolist *p;
@@ -108,10 +99,9 @@ i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio)
 	bool first = true;
 
 	lockdep_assert_held(&engine->active.lock);
-	assert_priolists(execlists);
 
 	if (unlikely(execlists->no_priolist))
-		prio = I915_PRIORITY_NORMAL;
+		deadline = 0;
 
 find_priolist:
 	/* most positive priority is scheduled first, equal priorities fifo */
@@ -120,9 +110,9 @@ i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio)
 	while (*parent) {
 		rb = *parent;
 		p = to_priolist(rb);
-		if (prio > p->priority) {
+		if (deadline < p->deadline) {
 			parent = &rb->rb_left;
-		} else if (prio < p->priority) {
+		} else if (deadline > p->deadline) {
 			parent = &rb->rb_right;
 			first = false;
 		} else {
@@ -130,13 +120,13 @@ i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio)
 		}
 	}
 
-	if (prio == I915_PRIORITY_NORMAL) {
+	if (!deadline) {
 		p = &execlists->default_priolist;
 	} else {
 		p = kmem_cache_alloc(global.slab_priorities, GFP_ATOMIC);
 		/* Convert an allocation failure to a priority bump */
 		if (unlikely(!p)) {
-			prio = I915_PRIORITY_NORMAL; /* recurses just once */
+			deadline = 0; /* recurses just once */
 
 			/* To maintain ordering with all rendering, after an
 			 * allocation failure we have to disable all scheduling.
@@ -151,7 +141,7 @@ i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio)
 		}
 	}
 
-	p->priority = prio;
+	p->deadline = deadline;
 	INIT_LIST_HEAD(&p->requests);
 
 	rb_link_node(&p->node, rb, parent);
@@ -160,70 +150,221 @@ i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio)
 	return &p->requests;
 }
 
-void __i915_priolist_free(struct i915_priolist *p)
+void i915_priolist_free(struct i915_priolist *p)
+{
+	if (p->deadline)
+		kmem_cache_free(global.slab_priorities, p);
+}
+
+static bool kick_submission(const struct intel_engine_cs *engine, u64 deadline)
 {
-	kmem_cache_free(global.slab_priorities, p);
+	const struct i915_request *inflight;
+	bool kick = true;
+
+	rcu_read_lock();
+	inflight = execlists_active(&engine->execlists);
+	if (inflight)
+		kick = deadline < rq_deadline(inflight);
+	rcu_read_unlock();
+
+	return kick;
+}
+
+static bool __i915_request_set_deadline(struct i915_request *rq, u64 deadline)
+{
+	struct intel_engine_cs *engine = rq->engine;
+	struct i915_request *rn;
+	struct list_head *plist;
+	LIST_HEAD(dfs);
+
+	lockdep_assert_held(&engine->active.lock);
+	list_add(&rq->sched.dfs, &dfs);
+
+	list_for_each_entry(rq, &dfs, sched.dfs) {
+		struct i915_dependency *p;
+
+		GEM_BUG_ON(rq->engine != engine);
+
+		for_each_signaler(p, rq) {
+			struct i915_request *s =
+				container_of(p->signaler, typeof(*s), sched);
+
+			GEM_BUG_ON(s == rq);
+
+			if (rq_deadline(s) <= deadline)
+				continue;
+
+			if (i915_request_completed(s))
+				continue;
+
+			if (s->engine != rq->engine) {
+				spin_lock(&ipi_lock);
+				if (deadline < p->ipi_deadline) {
+					p->ipi_deadline = deadline;
+					list_move(&p->ipi_link, &ipi_list);
+					irq_work_queue(&ipi_work);
+				}
+				spin_unlock(&ipi_lock);
+				continue;
+			}
+
+			list_move_tail(&s->sched.dfs, &dfs);
+		}
+	}
+
+	plist = i915_sched_lookup_priolist(engine, deadline);
+
+	/* Fifo and depth-first replacement ensure our deps execute first */
+	list_for_each_entry_safe_reverse(rq, rn, &dfs, sched.dfs) {
+		GEM_BUG_ON(rq->engine != engine);
+		GEM_BUG_ON(deadline > rq_deadline(rq));
+
+		INIT_LIST_HEAD(&rq->sched.dfs);
+		WRITE_ONCE(rq->sched.deadline, deadline);
+		RQ_TRACE(rq, "set-deadline:%llu\n", deadline);
+
+		/*
+		 * Once the request is ready, it will be placed into the
+		 * priority lists and then onto the HW runlist. Before the
+		 * request is ready, it does not contribute to our preemption
+		 * decisions and we can safely ignore it, as it will, and
+		 * any preemption required, be dealt with upon submission.
+		 * See engine->submit_request()
+		 */
+
+		if (i915_request_in_priority_queue(rq))
+			list_move_tail(&rq->sched.link, plist);
+	}
+
+	return kick_submission(engine, deadline);
 }
 
-static inline bool need_preempt(int prio, int active)
+void i915_request_set_deadline(struct i915_request *rq, u64 deadline)
 {
+	struct intel_engine_cs *engine = READ_ONCE(rq->engine);
+	unsigned long flags;
+
+	if (!intel_engine_has_scheduler(engine))
+		return;
+
 	/*
-	 * Allow preemption of low -> normal -> high, but we do
-	 * not allow low priority tasks to preempt other low priority
-	 * tasks under the impression that latency for low priority
-	 * tasks does not matter (as much as background throughput),
-	 * so kiss.
+	 * Virtual engines complicate acquiring the engine timeline lock,
+	 * as their rq->engine pointer is not stable until under that
+	 * engine lock. The simple ploy we use is to take the lock then
+	 * check that the rq still belongs to the newly locked engine.
 	 */
-	return prio >= max(I915_PRIORITY_NORMAL, active);
+	spin_lock_irqsave(&engine->active.lock, flags);
+	while (engine != READ_ONCE(rq->engine)) {
+		spin_unlock(&engine->active.lock);
+		engine = READ_ONCE(rq->engine);
+		spin_lock(&engine->active.lock);
+	}
+
+	if (i915_request_completed(rq))
+		goto unlock;
+
+	if (deadline >= rq_deadline(rq))
+		goto unlock;
+
+	if (__i915_request_set_deadline(rq, deadline))
+		tasklet_hi_schedule(&engine->execlists.tasklet);
+
+unlock:
+	spin_unlock_irqrestore(&engine->active.lock, flags);
 }
 
-static void kick_submission(struct intel_engine_cs *engine,
-			    const struct i915_request *rq,
-			    int prio)
+static u64 prio_slice(int prio)
 {
-	const struct i915_request *inflight;
+	u64 slice;
+	int sf;
 
 	/*
-	 * We only need to kick the tasklet once for the high priority
-	 * new context we add into the queue.
+	 * With a 1ms scheduling quantum:
+	 *
+	 *   MAX USER:  ~32us deadline
+	 *   0:         ~16ms deadline
+	 *   MIN_USER: 1000ms deadline
 	 */
-	if (prio <= engine->execlists.queue_priority_hint)
-		return;
 
-	rcu_read_lock();
+	if (prio >= __I915_PRIORITY_KERNEL__)
+		return INT_MAX - prio;
 
-	/* Nothing currently active? We're overdue for a submission! */
-	inflight = execlists_active(&engine->execlists);
-	if (!inflight)
-		goto unlock;
+	slice = __I915_PRIORITY_KERNEL__ - prio;
+	if (prio >= 0)
+		sf = 20 - 6;
+	else
+		sf = 20 - 1;
+
+	return slice << sf;
+}
+
+u64 i915_scheduler_virtual_deadline(u64 kt, int priority)
+{
+	return i915_sched_to_ticks(kt + prio_slice(priority));
+}
+
+u64 i915_scheduler_next_virtual_deadline(int priority)
+{
+	return i915_scheduler_virtual_deadline(ktime_get(), priority);
+}
+
+static u64 signal_deadline(const struct i915_request *rq)
+{
+	u64 last = ktime_to_ns(ktime_get());
+	const struct i915_dependency *p;
 
 	/*
-	 * If we are already the currently executing context, don't
-	 * bother evaluating if we should preempt ourselves.
+	 * Find the earliest point at which we will become 'ready',
+	 * which we infer from the deadline of all active signalers.
+	 * We will position ourselves at the end of that chain of work.
 	 */
-	if (inflight->context == rq->context)
-		goto unlock;
 
-	ENGINE_TRACE(engine,
-		     "bumping queue-priority-hint:%d for rq:%llx:%lld, inflight:%llx:%lld prio %d\n",
-		     prio,
-		     rq->fence.context, rq->fence.seqno,
-		     inflight->fence.context, inflight->fence.seqno,
-		     inflight->sched.attr.priority);
+	rcu_read_lock();
+	for_each_signaler(p, rq) {
+		const struct i915_request *s =
+			container_of(p->signaler, typeof(*s), sched);
+		u64 deadline;
 
-	engine->execlists.queue_priority_hint = prio;
-	if (need_preempt(prio, rq_prio(inflight)))
-		tasklet_hi_schedule(&engine->execlists.tasklet);
+		if (i915_request_completed(s))
+			continue;
 
-unlock:
+		if (rq_prio(s) < rq_prio(rq))
+			continue;
+
+		deadline = i915_sched_to_ns(rq_deadline(s));
+		if (p->flags & I915_DEPENDENCY_WEAK)
+			deadline -= prio_slice(rq_prio(s));
+
+		last = max(last, deadline);
+	}
 	rcu_read_unlock();
+
+	return last;
 }
 
-static void __i915_request_set_priority(struct i915_request *rq, int prio)
+static u64 earliest_deadline(const struct i915_request *rq)
+{
+	return i915_scheduler_virtual_deadline(signal_deadline(rq),
+					       rq_prio(rq));
+}
+
+static bool set_earliest_deadline(struct i915_request *rq, u64 old)
+{
+	u64 dl;
+
+	/* Recompute our deadlines and promote after a priority change */
+	dl = min(earliest_deadline(rq), rq_deadline(rq));
+	if (dl >= old)
+		return false;
+
+	return __i915_request_set_deadline(rq, dl);
+}
+
+static bool __i915_request_set_priority(struct i915_request *rq, int prio)
 {
 	struct intel_engine_cs *engine = rq->engine;
 	struct i915_request *rn;
-	struct list_head *plist;
+	bool kick = false;
 	LIST_HEAD(dfs);
 
 	lockdep_assert_held(&engine->active.lock);
@@ -280,32 +421,20 @@ static void __i915_request_set_priority(struct i915_request *rq, int prio)
 		}
 	}
 
-	plist = i915_sched_lookup_priolist(engine, prio);
-
-	/* Fifo and depth-first replacement ensure our deps execute first */
 	list_for_each_entry_safe_reverse(rq, rn, &dfs, sched.dfs) {
 		GEM_BUG_ON(rq->engine != engine);
+		GEM_BUG_ON(prio < rq_prio(rq));
 
 		INIT_LIST_HEAD(&rq->sched.dfs);
 		WRITE_ONCE(rq->sched.attr.priority, prio);
+		RQ_TRACE(rq, "set-priority:%d\n", prio);
 
-		/*
-		 * Once the request is ready, it will be placed into the
-		 * priority lists and then onto the HW runlist. Before the
-		 * request is ready, it does not contribute to our preemption
-		 * decisions and we can safely ignore it, as it will, and
-		 * any preemption required, be dealt with upon submission.
-		 * See engine->submit_request()
-		 */
-		if (!i915_request_is_ready(rq))
-			continue;
-
-		if (i915_request_in_priority_queue(rq))
-			list_move_tail(&rq->sched.link, plist);
-
-		/* Defer (tasklet) submission until after all updates. */
-		kick_submission(engine, rq, prio);
+		if (i915_request_is_ready(rq) &&
+		    set_earliest_deadline(rq, rq_deadline(rq)))
+			kick = true;
 	}
+
+	return kick;
 }
 
 void i915_request_set_priority(struct i915_request *rq, int prio)
@@ -316,12 +445,6 @@ void i915_request_set_priority(struct i915_request *rq, int prio)
 	if (!intel_engine_has_scheduler(engine))
 		return;
 
-	/*
-	 * Virtual engines complicate acquiring the engine timeline lock,
-	 * as their rq->engine pointer is not stable until under that
-	 * engine lock. The simple ploy we use is to take the lock then
-	 * check that the rq still belongs to the newly locked engine.
-	 */
 	spin_lock_irqsave(&engine->active.lock, flags);
 	while (engine != READ_ONCE(rq->engine)) {
 		spin_unlock(&engine->active.lock);
@@ -335,12 +458,21 @@ void i915_request_set_priority(struct i915_request *rq, int prio)
 	if (prio <= rq_prio(rq))
 		goto unlock;
 
-	__i915_request_set_priority(rq, prio);
+	if (__i915_request_set_priority(rq, prio))
+		tasklet_hi_schedule(&engine->execlists.tasklet);
 
 unlock:
 	spin_unlock_irqrestore(&engine->active.lock, flags);
 }
 
+bool intel_engine_queue_request(struct intel_engine_cs *engine,
+				struct i915_request *rq)
+{
+	lockdep_assert_held(&engine->active.lock);
+	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+	return set_earliest_deadline(rq, I915_DEADLINE_NEVER);
+}
+
 void i915_sched_node_init(struct i915_sched_node *node)
 {
 	spin_lock_init(&node->lock);
@@ -356,6 +488,7 @@ void i915_sched_node_init(struct i915_sched_node *node)
 void i915_sched_node_reinit(struct i915_sched_node *node)
 {
 	node->attr.priority = I915_PRIORITY_INVALID;
+	node->deadline = I915_DEADLINE_NEVER;
 	node->semaphores = 0;
 	node->flags = 0;
 
@@ -388,6 +521,7 @@ bool __i915_sched_node_add_dependency(struct i915_sched_node *node,
 
 	if (!node_signaled(signal)) {
 		INIT_LIST_HEAD(&dep->ipi_link);
+		dep->ipi_deadline = I915_DEADLINE_NEVER;
 		dep->ipi_priority = I915_PRIORITY_INVALID;
 		dep->signaler = signal;
 		dep->waiter = node;
@@ -519,6 +653,10 @@ void i915_sched_node_retire(struct i915_sched_node *node)
 	spin_unlock_irq(&node->lock);
 }
 
+#if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
+#include "selftests/i915_scheduler.c"
+#endif
+
 static void i915_global_scheduler_shrink(void)
 {
 	kmem_cache_shrink(global.slab_dependencies);
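The numbers quoted in the prio_slice() comment above can be reproduced with a standalone model. The exact priority constants are not visible in this hunk, so the values below (a +/-1023 user range with __I915_PRIORITY_KERNEL__ just above it) are assumptions chosen to match the quoted ~32us/16ms/1000ms figures:

#include <limits.h>
#include <stdint.h>
#include <stdio.h>

/* Assumed values, for illustration only */
#define PRIORITY_MIN	-1023
#define PRIORITY_MAX	 1023
#define PRIORITY_KERNEL	(PRIORITY_MAX + 2)

static uint64_t prio_slice(int prio)
{
	uint64_t slice;
	int sf;

	if (prio >= PRIORITY_KERNEL)
		return INT_MAX - prio;

	slice = PRIORITY_KERNEL - prio;
	if (prio >= 0)
		sf = 20 - 6;	/* 1 << 14 ns = ~16us steps above normal */
	else
		sf = 20 - 1;	/* 1 << 19 ns = ~524us steps below normal */

	return slice << sf;
}

int main(void)
{
	/* 1 << 20 ns is the ~1ms scheduling quantum the comment refers to */
	printf("max user: ~%lluus\n",
	       (unsigned long long)(prio_slice(PRIORITY_MAX) >> 10)); /* 32 */
	printf("normal:   ~%llums\n",
	       (unsigned long long)(prio_slice(0) >> 20));            /* 16 */
	printf("min user: ~%llums\n",
	       (unsigned long long)(prio_slice(PRIORITY_MIN) >> 20)); /* 1024 */
	printf("barrier:   %llu\n",
	       (unsigned long long)prio_slice(INT_MAX));              /* 0 */
	return 0;
}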
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index b26a13ef6feb..62265108230f 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -37,15 +37,27 @@ int i915_sched_node_add_dependency(struct i915_sched_node *node,
 void i915_sched_node_retire(struct i915_sched_node *node);
 
 void i915_request_set_priority(struct i915_request *request, int prio);
+void i915_request_set_deadline(struct i915_request *request, u64 deadline);
+
+u64 i915_scheduler_virtual_deadline(u64 kt, int priority);
+u64 i915_scheduler_next_virtual_deadline(int priority);
+
+bool intel_engine_queue_request(struct intel_engine_cs *engine,
+				struct i915_request *rq);
 
 struct list_head *
-i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio);
+i915_sched_lookup_priolist(struct intel_engine_cs *engine, u64 deadline);
+
+void i915_priolist_free(struct i915_priolist *p);
+
+static inline u64 i915_sched_to_ticks(ktime_t kt)
+{
+	return ktime_to_ns(kt) >> I915_SCHED_DEADLINE_SHIFT;
+}
 
-void __i915_priolist_free(struct i915_priolist *p);
-static inline void i915_priolist_free(struct i915_priolist *p)
+static inline u64 i915_sched_to_ns(u64 deadline)
 {
-	if (p->priority != I915_PRIORITY_NORMAL)
-		__i915_priolist_free(p);
+	return deadline << I915_SCHED_DEADLINE_SHIFT;
 }
 
 #endif /* _I915_SCHEDULER_H_ */
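The two conversion helpers above fix the scheduler's deadline resolution: one tick is 1 << 19 ns = 524288ns, the "roughly 500us buckets" mentioned in the types header, so requests landing within the same bucket share an i915_priolist node. A quick sketch of the round-trip:

#include <stdint.h>
#include <stdio.h>

#define DEADLINE_SHIFT 19

static uint64_t to_ticks(uint64_t ns) { return ns >> DEADLINE_SHIFT; }
static uint64_t to_ns(uint64_t ticks) { return ticks << DEADLINE_SHIFT; }

int main(void)
{
	uint64_t ns = 1000000; /* 1ms */
	uint64_t t = to_ticks(ns);

	/* 1ms lands in tick 1: the sub-bucket remainder is deliberately
	 * discarded, trading precision for fewer priolist nodes. */
	printf("1ms -> tick %llu -> %lluns\n",
	       (unsigned long long)t,
	       (unsigned long long)to_ns(t));
	return 0;
}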
diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h
index ce60577df2bf..ae7ca78a88c8 100644
--- a/drivers/gpu/drm/i915/i915_scheduler_types.h
+++ b/drivers/gpu/drm/i915/i915_scheduler_types.h
@@ -69,6 +69,22 @@ struct i915_sched_node {
 	unsigned int flags;
 #define I915_SCHED_HAS_EXTERNAL_CHAIN	BIT(0)
 	intel_engine_mask_t semaphores;
+
+	/**
+	 * @deadline: [virtual] deadline
+	 *
+	 * When the request is ready for execution, it is given a quota
+	 * (the engine's timeslice) and a virtual deadline. The virtual
+	 * deadline is derived from the current time:
+	 *     ktime_get() + (prio_ratio * timeslice)
+	 *
+	 * Requests are then executed in order of deadline completion.
+	 * Requests with earlier deadlines than currently executing on
+	 * the engine will preempt the active requests.
+	 */
+	u64 deadline;
+#define I915_SCHED_DEADLINE_SHIFT 19 /* i.e. roughly 500us buckets */
+#define I915_DEADLINE_NEVER U64_MAX
 };
 
 struct i915_dependency {
@@ -81,6 +97,7 @@ struct i915_dependency {
 #define I915_DEPENDENCY_ALLOC		BIT(0)
 #define I915_DEPENDENCY_EXTERNAL	BIT(1)
 #define I915_DEPENDENCY_WEAK		BIT(2)
+	u64 ipi_deadline;
 	int ipi_priority;
 };
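Tying the pieces of the @deadline doc-comment together: a ready request's virtual deadline is "now plus a priority-scaled slice", quantised to ticks, while an unqueued node idles at I915_DEADLINE_NEVER. A hedged sketch of that composition (prio_slice() stubbed to the normal-priority value from the model above; this is not the kernel implementation itself):

#include <stdint.h>
#include <stdio.h>

#define DEADLINE_SHIFT	19
#define DEADLINE_NEVER	UINT64_MAX

/* prio_slice() as modelled earlier; stubbed as the normal-prio slice */
static uint64_t prio_slice(int prio) { (void)prio; return 1025ull << 14; }

static uint64_t virtual_deadline(uint64_t now_ns, int prio)
{
	/* deadline = quantised(now + slice): lower prio => later deadline */
	return (now_ns + prio_slice(prio)) >> DEADLINE_SHIFT;
}

int main(void)
{
	uint64_t now = 5000000000ull;	/* pretend ktime_get() = 5s */

	printf("deadline(normal) = %llu ticks\n",
	       (unsigned long long)virtual_deadline(now, 0));
	printf("unqueued         = %llu (DEADLINE_NEVER)\n",
	       (unsigned long long)DEADLINE_NEVER);
	return 0;
}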
 
diff --git a/drivers/gpu/drm/i915/selftests/i915_mock_selftests.h b/drivers/gpu/drm/i915/selftests/i915_mock_selftests.h
index 1929feba4e8e..29ff6b669cc2 100644
--- a/drivers/gpu/drm/i915/selftests/i915_mock_selftests.h
+++ b/drivers/gpu/drm/i915/selftests/i915_mock_selftests.h
@@ -24,6 +24,7 @@ selftest(uncore, intel_uncore_mock_selftests)
 selftest(engine, intel_engine_cs_mock_selftests)
 selftest(timelines, intel_timeline_mock_selftests)
 selftest(requests, i915_request_mock_selftests)
+selftest(scheduler, i915_scheduler_mock_selftests)
 selftest(objects, i915_gem_object_mock_selftests)
 selftest(phys, i915_gem_phys_mock_selftests)
 selftest(dmabuf, i915_gem_dmabuf_mock_selftests)
diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
index 92c628f18c60..db91e639918e 100644
--- a/drivers/gpu/drm/i915/selftests/i915_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_request.c
@@ -2124,6 +2124,7 @@ static int measure_preemption(struct intel_context *ce)
 
 		intel_ring_advance(rq, cs);
 		rq->sched.attr.priority = I915_PRIORITY_BARRIER;
+		rq->sched.deadline = 0;
 
 		elapsed[i - 1] = ENGINE_READ_FW(ce->engine, RING_TIMESTAMP);
 		i915_request_add(rq);
diff --git a/drivers/gpu/drm/i915/selftests/i915_scheduler.c b/drivers/gpu/drm/i915/selftests/i915_scheduler.c
new file mode 100644
index 000000000000..9ca50db81034
--- /dev/null
+++ b/drivers/gpu/drm/i915/selftests/i915_scheduler.c
@@ -0,0 +1,49 @@
+// SPDX-License-Identifier: MIT
+/*
+ * Copyright © 2020 Intel Corporation
+ */
+
+#include "i915_selftest.h"
+
+static int mock_scheduler_slices(void *dummy)
+{
+	u64 min, max, normal, kernel;
+
+	min = prio_slice(I915_PRIORITY_MIN);
+	pr_info("%8s slice: %lluus\n", "min", min >> 10);
+
+	normal = prio_slice(0);
+	pr_info("%8s slice: %lluus\n", "normal", normal >> 10);
+
+	max = prio_slice(I915_PRIORITY_MAX);
+	pr_info("%8s slice: %lluus\n", "max", max >> 10);
+
+	kernel = prio_slice(I915_PRIORITY_BARRIER);
+	pr_info("%8s slice: %lluus\n", "kernel", kernel >> 10);
+
+	if (kernel != 0) {
+		pr_err("kernel prio slice should be 0\n");
+		return -EINVAL;
+	}
+
+	if (max >= normal) {
+		pr_err("maximum prio slice should be shorter than normal\n");
+		return -EINVAL;
+	}
+
+	if (min <= normal) {
+		pr_err("minimum prio slice should be longer than normal\n");
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+int i915_scheduler_mock_selftests(void)
+{
+	static const struct i915_subtest tests[] = {
+		SUBTEST(mock_scheduler_slices),
+	};
+
+	return i915_subtests(tests, NULL);
+}
-- 
2.20.1


* [Intel-gfx] [PATCH 27/28] drm/i915/gt: Specify a deadline for the heartbeat
  2020-06-07 22:20 [Intel-gfx] [PATCH 01/28] drm/i915: Adjust the sentinel assert to match implementation Chris Wilson
                   ` (24 preceding siblings ...)
  2020-06-07 22:21 ` [Intel-gfx] [PATCH 26/28] drm/i915: Fair low-latency scheduling Chris Wilson
@ 2020-06-07 22:21 ` Chris Wilson
  2020-06-07 22:21 ` [Intel-gfx] [PATCH 28/28] drm/i915: Replace the priority boosting for the display with a deadline Chris Wilson
                   ` (6 subsequent siblings)
  32 siblings, 0 replies; 52+ messages in thread
From: Chris Wilson @ 2020-06-07 22:21 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

As we know when we expect the heartbeat to be checked for completion,
pass this information along as its deadline. We still do not complain if
the deadline is missed, at least not until we have tried a few times, but
it allows for quicker hang detection on systems where deadlines are
adhered to.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c | 13 +++++++++++++
 1 file changed, 13 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
index ba778c7b5d2b..4e5b8146bee0 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
@@ -42,6 +42,16 @@ static void idle_pulse(struct intel_engine_cs *engine, struct i915_request *rq)
 	i915_request_add_active_barriers(rq);
 }
 
+static void set_heartbeat_deadline(struct intel_engine_cs *engine,
+				   struct i915_request *rq)
+{
+	unsigned long interval;
+
+	interval = READ_ONCE(engine->props.heartbeat_interval_ms);
+	if (interval)
+		i915_request_set_deadline(rq, ktime_get() + (interval << 20));
+}
+
 static void show_heartbeat(const struct i915_request *rq,
 			   struct intel_engine_cs *engine)
 {
@@ -103,6 +113,8 @@ static void heartbeat(struct work_struct *wrk)
 
 			local_bh_disable();
 			i915_request_set_priority(rq, attr.priority);
+			if (attr.priority == I915_PRIORITY_BARRIER)
+				i915_request_set_deadline(rq, 0);
 			local_bh_enable();
 		} else {
 			if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
@@ -133,6 +145,7 @@ static void heartbeat(struct work_struct *wrk)
 
 	__i915_request_commit(rq);
 	__i915_request_queue(rq, &attr);
+	set_heartbeat_deadline(engine, rq);
 
 unlock:
 	mutex_unlock(&ce->timeline->mutex);
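Note the conversion in set_heartbeat_deadline(): interval << 20 treats one millisecond as 1 << 20 ns (1048576ns), the same cheap approximation the scheduler uses for its quantum. It overestimates a true millisecond by just under 5%, harmless slack for a heartbeat. A quick check, assuming the driver's customary 2500ms default interval (illustrative only):

#include <stdio.h>

int main(void)
{
	unsigned long interval_ms = 2500;	/* assumed default */
	unsigned long long shifted = (unsigned long long)interval_ms << 20;
	unsigned long long exact = interval_ms * 1000000ull;

	/* 2500 << 20 = 2621440000ns vs 2500000000ns: ~4.9% of slack */
	printf("shifted=%lluns exact=%lluns\n", shifted, exact);
	return 0;
}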
-- 
2.20.1


* [Intel-gfx] [PATCH 28/28] drm/i915: Replace the priority boosting for the display with a deadline
  2020-06-07 22:20 [Intel-gfx] [PATCH 01/28] drm/i915: Adjust the sentinel assert to match implementation Chris Wilson
                   ` (25 preceding siblings ...)
  2020-06-07 22:21 ` [Intel-gfx] [PATCH 27/28] drm/i915/gt: Specify a deadline for the heartbeat Chris Wilson
@ 2020-06-07 22:21 ` Chris Wilson
  2020-06-07 22:49 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/28] drm/i915: Adjust the sentinel assert to match implementation Patchwork
                   ` (5 subsequent siblings)
  32 siblings, 0 replies; 52+ messages in thread
From: Chris Wilson @ 2020-06-07 22:21 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

For a modeset/pageflip, there is a very precise deadline by which the
frame must be completed in order to hit the vblank and be shown. While
we don't pass along that exact information, we can at least inform the
scheduler that this request-chain needs to be completed ASAP.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/display/intel_display.c |  2 +-
 drivers/gpu/drm/i915/gem/i915_gem_object.h   |  4 ++--
 drivers/gpu/drm/i915/gem/i915_gem_wait.c     | 21 ++++++++++----------
 drivers/gpu/drm/i915/i915_priolist_types.h   |  3 ---
 4 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/display/intel_display.c b/drivers/gpu/drm/i915/display/intel_display.c
index 797e3573d392..5bb20f701a44 100644
--- a/drivers/gpu/drm/i915/display/intel_display.c
+++ b/drivers/gpu/drm/i915/display/intel_display.c
@@ -15964,7 +15964,7 @@ intel_prepare_plane_fb(struct drm_plane *_plane,
 	if (ret)
 		return ret;
 
-	i915_gem_object_wait_priority(obj, 0, I915_PRIORITY_DISPLAY);
+	i915_gem_object_wait_deadline(obj, 0, ktime_get() /* next vblank? */);
 	i915_gem_object_flush_frontbuffer(obj, ORIGIN_DIRTYFB);
 
 	if (!new_plane_state->uapi.fence) { /* implicit fencing */
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_object.h b/drivers/gpu/drm/i915/gem/i915_gem_object.h
index 876c34982555..7bcd2661de4c 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_object.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_object.h
@@ -474,9 +474,9 @@ static inline void __start_cpu_write(struct drm_i915_gem_object *obj)
 int i915_gem_object_wait(struct drm_i915_gem_object *obj,
 			 unsigned int flags,
 			 long timeout);
-int i915_gem_object_wait_priority(struct drm_i915_gem_object *obj,
+int i915_gem_object_wait_deadline(struct drm_i915_gem_object *obj,
 				  unsigned int flags,
-				  int prio);
+				  ktime_t deadline);
 
 void __i915_gem_object_flush_frontbuffer(struct drm_i915_gem_object *obj,
 					 enum fb_op_origin origin);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_wait.c b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
index cefbbb3d9b52..5224d4363ea3 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_wait.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_wait.c
@@ -93,17 +93,18 @@ i915_gem_object_wait_reservation(struct dma_resv *resv,
 	return timeout;
 }
 
-static void __fence_set_priority(struct dma_fence *fence, int prio)
+static void __fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
 {
 	if (dma_fence_is_signaled(fence) || !dma_fence_is_i915(fence))
 		return;
 
 	local_bh_disable();
-	i915_request_set_priority(to_request(fence), prio);
+	i915_request_set_deadline(to_request(fence),
+				  i915_sched_to_ticks(deadline));
 	local_bh_enable(); /* kick the tasklets if queues were reprioritised */
 }
 
-static void fence_set_priority(struct dma_fence *fence, int prio)
+static void fence_set_deadline(struct dma_fence *fence, ktime_t deadline)
 {
 	/* Recurse once into a fence-array */
 	if (dma_fence_is_array(fence)) {
@@ -111,16 +112,16 @@ static void fence_set_priority(struct dma_fence *fence, int prio)
 		int i;
 
 		for (i = 0; i < array->num_fences; i++)
-			__fence_set_priority(array->fences[i], prio);
+			__fence_set_deadline(array->fences[i], deadline);
 	} else {
-		__fence_set_priority(fence, prio);
+		__fence_set_deadline(fence, deadline);
 	}
 }
 
 int
-i915_gem_object_wait_priority(struct drm_i915_gem_object *obj,
+i915_gem_object_wait_deadline(struct drm_i915_gem_object *obj,
 			      unsigned int flags,
-			      int prio)
+			      ktime_t deadline)
 {
 	struct dma_fence *excl;
 
@@ -130,12 +131,12 @@ i915_gem_object_wait_priority(struct drm_i915_gem_object *obj,
 		int ret;
 
 		ret = dma_resv_get_fences_rcu(obj->base.resv,
-							&excl, &count, &shared);
+					      &excl, &count, &shared);
 		if (ret)
 			return ret;
 
 		for (i = 0; i < count; i++) {
-			fence_set_priority(shared[i], prio);
+			fence_set_deadline(shared[i], deadline);
 			dma_fence_put(shared[i]);
 		}
 
@@ -145,7 +146,7 @@ i915_gem_object_wait_priority(struct drm_i915_gem_object *obj,
 	}
 
 	if (excl) {
-		fence_set_priority(excl, prio);
+		fence_set_deadline(excl, deadline);
 		dma_fence_put(excl);
 	}
 	return 0;
diff --git a/drivers/gpu/drm/i915/i915_priolist_types.h b/drivers/gpu/drm/i915/i915_priolist_types.h
index 43a0ac45295f..ac6d9614ea23 100644
--- a/drivers/gpu/drm/i915/i915_priolist_types.h
+++ b/drivers/gpu/drm/i915/i915_priolist_types.h
@@ -20,9 +20,6 @@ enum {
 	/* A preemptive pulse used to monitor the health of each engine */
 	I915_PRIORITY_HEARTBEAT,
 
-	/* Interactive workload, scheduled for immediate pageflipping */
-	I915_PRIORITY_DISPLAY,
-
 	__I915_PRIORITY_KERNEL__
 };
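fence_set_deadline() above recurses exactly one level into a dma_fence_array before applying the deadline to each fence. A standalone model of that shape, with a simplified fence type invented for illustration (the real helper defers to i915_request_set_deadline(), which only ever lowers the stored deadline):

#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

struct fence {
	bool is_array;
	bool signaled;
	uint64_t deadline;
	struct fence **fences;	/* valid when is_array */
	int num_fences;
};

static void __set_deadline(struct fence *f, uint64_t deadline)
{
	/* signalled fences no longer influence scheduling */
	if (f->signaled)
		return;
	if (deadline < f->deadline)
		f->deadline = deadline;
}

/* Recurse once into a fence-array, as the i915 helper does */
static void set_deadline(struct fence *f, uint64_t deadline)
{
	if (f->is_array) {
		int i;

		for (i = 0; i < f->num_fences; i++)
			__set_deadline(f->fences[i], deadline);
	} else {
		__set_deadline(f, deadline);
	}
}

int main(void)
{
	struct fence a = { .deadline = UINT64_MAX };
	struct fence b = { .signaled = true, .deadline = UINT64_MAX };
	struct fence *children[] = { &a, &b };
	struct fence arr = { .is_array = true, .fences = children,
			     .num_fences = 2 };

	set_deadline(&arr, 100);
	/* a picks up the deadline; the signalled b is left alone */
	printf("a=%llu b=%llu\n",
	       (unsigned long long)a.deadline,
	       (unsigned long long)b.deadline);
	return 0;
}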
 
-- 
2.20.1


* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/28] drm/i915: Adjust the sentinel assert to match implementation
  2020-06-07 22:20 [Intel-gfx] [PATCH 01/28] drm/i915: Adjust the sentinel assert to match implementation Chris Wilson
                   ` (26 preceding siblings ...)
  2020-06-07 22:21 ` [Intel-gfx] [PATCH 28/28] drm/i915: Replace the priority boosting for the display with a deadline Chris Wilson
@ 2020-06-07 22:49 ` Patchwork
  2020-06-07 22:51 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
                   ` (4 subsequent siblings)
  32 siblings, 0 replies; 52+ messages in thread
From: Patchwork @ 2020-06-07 22:49 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/28] drm/i915: Adjust the sentinel assert to match implementation
URL   : https://patchwork.freedesktop.org/series/78103/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
f23f25433228 drm/i915: Adjust the sentinel assert to match implementation
b94a954251d4 drm/i915/selftests: Make the hanging request non-preemptible
854873908649 drm/i915/selftests: Teach hang-self to target only itself
f2f769eafd3d drm/i915/selftests: Remove live_suppress_wait_preempt
a4fea0425495 drm/i915/selftests: Trim execlists runtime
7253a1d1220b drm/i915/gt: Use virtual_engine during execlists_dequeue
61f3b28c89b9 drm/i915/gt: Decouple inflight virtual engines
7e90a16cc7ce drm/i915/gt: Resubmit the virtual engine on schedule-out
85be36e07c69 drm/i915: Add list_for_each_entry_safe_continue_reverse
-:21: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'pos' - possible side-effects?
#21: FILE: drivers/gpu/drm/i915/i915_utils.h:269:
+#define list_for_each_entry_safe_continue_reverse(pos, n, head, member)	\
+	for (pos = list_prev_entry(pos, member),			\
+	     n = list_prev_entry(pos, member);				\
+	     &pos->member != (head);					\
+	     pos = n, n = list_prev_entry(n, member))

-:21: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'n' - possible side-effects?
#21: FILE: drivers/gpu/drm/i915/i915_utils.h:269:
+#define list_for_each_entry_safe_continue_reverse(pos, n, head, member)	\
+	for (pos = list_prev_entry(pos, member),			\
+	     n = list_prev_entry(pos, member);				\
+	     &pos->member != (head);					\
+	     pos = n, n = list_prev_entry(n, member))

-:21: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'member' - possible side-effects?
#21: FILE: drivers/gpu/drm/i915/i915_utils.h:269:
+#define list_for_each_entry_safe_continue_reverse(pos, n, head, member)	\
+	for (pos = list_prev_entry(pos, member),			\
+	     n = list_prev_entry(pos, member);				\
+	     &pos->member != (head);					\
+	     pos = n, n = list_prev_entry(n, member))

total: 0 errors, 0 warnings, 3 checks, 12 lines checked
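The three CHECK:MACRO_ARG_REUSE notes above flag a genuine C hazard rather than mere style: every reuse of a macro parameter re-evaluates the caller's expression. The i915 iterator is only ever passed plain lvalues, so it is safe in practice, but a minimal illustration of what checkpatch is guarding against (DOUBLE_EVAL and next are names invented here):

#include <stdio.h>

/* A macro that uses its argument twice... */
#define DOUBLE_EVAL(x) ((x) + (x))

static int calls;
static int next(void)
{
	return ++calls;
}

int main(void)
{
	/* ...evaluates a side-effecting argument twice: */
	printf("%d\n", DOUBLE_EVAL(next()));	/* prints 3, calls == 2 */
	return 0;
}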
3fa210c03e00 drm/i915/gem: Separate reloc validation into an earlier step
6e281d4c956c drm/i915/gem: Lift GPU relocation allocation
06166bdaf5a2 drm/i915/gem: Build the reloc request first
b03d9c429a48 drm/i915/gem: Add all GPU reloc awaits/signals en masse
78cb8746679e dma-buf: Proxy fence, an unsignaled fence placeholder
-:45: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#45: 
new file mode 100644

-:438: CHECK:UNCOMMENTED_DEFINITION: spinlock_t definition without comment
#438: FILE: drivers/dma-buf/st-dma-fence-proxy.c:20:
+	spinlock_t lock;

total: 0 errors, 1 warnings, 1 checks, 1158 lines checked
0cf43f808ad7 drm/i915: Lift waiter/signaler iterators
54c54ac54e10 drm/i915: Unpeel awaits on a proxy fence
097cde1f158a drm/i915/gem: Make relocations atomic within execbuf
50d452bcc3eb drm/i915: Strip out internal priorities
40a48b3a4cd5 drm/i915: Remove I915_USER_PRIORITY_SHIFT
e82c48f763c2 drm/i915: Replace engine->schedule() with a known request operation
95ddc88b38ef drm/i915/gt: Do not suspend bonded requests if one hangs
f9a12aff6156 drm/i915: Teach the i915_dependency to use a double-lock
2084c4ff504b drm/i915: Restructure priority inheritance
36de84a8bd2b ipi-dag
-:8: WARNING:COMMIT_MESSAGE: Missing commit description - Add an appropriate one

-:55: ERROR:MISSING_SIGN_OFF: Missing Signed-off-by: line(s)

total: 1 errors, 1 warnings, 0 checks, 43 lines checked
956673f1ee95 drm/i915/gt: Check for a completed last request once
62d3790caaf1 drm/i915: Fair low-latency scheduling
-:1728: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#1728: 
new file mode 100644

total: 0 errors, 1 warnings, 0 checks, 1571 lines checked
29754bd08fd1 drm/i915/gt: Specify a deadline for the heartbeat
1dad2fcca4f7 drm/i915: Replace the priority boosting for the display with a deadline


* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for series starting with [01/28] drm/i915: Adjust the sentinel assert to match implementation
  2020-06-07 22:20 [Intel-gfx] [PATCH 01/28] drm/i915: Adjust the sentinel assert to match implementation Chris Wilson
                   ` (27 preceding siblings ...)
  2020-06-07 22:49 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/28] drm/i915: Adjust the sentinel assert to match implementation Patchwork
@ 2020-06-07 22:51 ` Patchwork
  2020-06-07 23:12 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
                   ` (3 subsequent siblings)
  32 siblings, 0 replies; 52+ messages in thread
From: Patchwork @ 2020-06-07 22:51 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/28] drm/i915: Adjust the sentinel assert to match implementation
URL   : https://patchwork.freedesktop.org/series/78103/
State : warning

== Summary ==

$ dim sparse --fast origin/drm-tip
Sparse version: v0.6.0
Fast mode used, each commit won't be checked separately.
+drivers/gpu/drm/i915/intel_wakeref.c:137:19: warning: context imbalance in 'wakeref_auto_timeout' - unexpected unlock
+drivers/gpu/drm/i915/selftests/i915_syncmap.c:80:54: warning: dubious: x | !y


* [Intel-gfx] ✓ Fi.CI.BAT: success for series starting with [01/28] drm/i915: Adjust the sentinel assert to match implementation
  2020-06-07 22:20 [Intel-gfx] [PATCH 01/28] drm/i915: Adjust the sentinel assert to match implementation Chris Wilson
                   ` (28 preceding siblings ...)
  2020-06-07 22:51 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
@ 2020-06-07 23:12 ` Patchwork
  2020-06-08  0:58 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
                   ` (2 subsequent siblings)
  32 siblings, 0 replies; 52+ messages in thread
From: Patchwork @ 2020-06-07 23:12 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/28] drm/i915: Adjust the sentinel assert to match implementation
URL   : https://patchwork.freedesktop.org/series/78103/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_8597 -> Patchwork_17902
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/index.html

New tests
---------

  New tests have been introduced between CI_DRM_8597 and Patchwork_17902:

### New IGT tests (1) ###

  * igt@dmabuf@all@dma_fence_proxy:
    - Statuses : 40 pass(s)
    - Exec time: [0.03, 0.11] s

  

Known issues
------------

  Here are the changes found in Patchwork_17902 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@kms_busy@basic@flip:
    - fi-kbl-x1275:       [PASS][1] -> [DMESG-WARN][2] ([i915#62] / [i915#92] / [i915#95])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/fi-kbl-x1275/igt@kms_busy@basic@flip.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/fi-kbl-x1275/igt@kms_busy@basic@flip.html

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic:
    - fi-bsw-n3050:       [PASS][3] -> [DMESG-WARN][4] ([i915#1982])
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/fi-bsw-n3050/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/fi-bsw-n3050/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic.html

  * igt@kms_cursor_legacy@basic-flip-after-cursor-legacy:
    - fi-icl-u2:          [PASS][5] -> [DMESG-WARN][6] ([i915#1982])
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/fi-icl-u2/igt@kms_cursor_legacy@basic-flip-after-cursor-legacy.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/fi-icl-u2/igt@kms_cursor_legacy@basic-flip-after-cursor-legacy.html

  
#### Possible fixes ####

  * igt@i915_module_load@reload:
    - {fi-tgl-dsi}:       [DMESG-WARN][7] ([i915#1982]) -> [PASS][8] +2 similar issues
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/fi-tgl-dsi/igt@i915_module_load@reload.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/fi-tgl-dsi/igt@i915_module_load@reload.html
    - fi-icl-y:           [DMESG-WARN][9] ([i915#1982]) -> [PASS][10]
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/fi-icl-y/igt@i915_module_load@reload.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/fi-icl-y/igt@i915_module_load@reload.html

  * igt@i915_pm_rpm@module-reload:
    - fi-icl-guc:         [DMESG-WARN][11] ([i915#1982]) -> [PASS][12]
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/fi-icl-guc/igt@i915_pm_rpm@module-reload.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/fi-icl-guc/igt@i915_pm_rpm@module-reload.html

  
#### Warnings ####

  * igt@gem_exec_suspend@basic-s0:
    - fi-kbl-x1275:       [DMESG-WARN][13] ([i915#62] / [i915#92]) -> [DMESG-WARN][14] ([i915#62] / [i915#92] / [i915#95]) +2 similar issues
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/fi-kbl-x1275/igt@gem_exec_suspend@basic-s0.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/fi-kbl-x1275/igt@gem_exec_suspend@basic-s0.html

  * igt@i915_pm_rpm@module-reload:
    - fi-kbl-x1275:       [SKIP][15] ([fdo#109271]) -> [DMESG-FAIL][16] ([i915#62])
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/fi-kbl-x1275/igt@i915_pm_rpm@module-reload.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/fi-kbl-x1275/igt@i915_pm_rpm@module-reload.html

  * igt@kms_force_connector_basic@prune-stale-modes:
    - fi-kbl-x1275:       [DMESG-WARN][17] ([i915#62] / [i915#92] / [i915#95]) -> [DMESG-WARN][18] ([i915#62] / [i915#92]) +6 similar issues
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/fi-kbl-x1275/igt@kms_force_connector_basic@prune-stale-modes.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/fi-kbl-x1275/igt@kms_force_connector_basic@prune-stale-modes.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [i915#1982]: https://gitlab.freedesktop.org/drm/intel/issues/1982
  [i915#62]: https://gitlab.freedesktop.org/drm/intel/issues/62
  [i915#92]: https://gitlab.freedesktop.org/drm/intel/issues/92
  [i915#95]: https://gitlab.freedesktop.org/drm/intel/issues/95


Participating hosts (48 -> 42)
------------------------------

  Missing    (6): fi-ilk-m540 fi-byt-squawks fi-bsw-cyan fi-ctg-p8600 fi-byt-clapper fi-bdw-samus 


Build changes
-------------

  * Linux: CI_DRM_8597 -> Patchwork_17902

  CI-20190529: 20190529
  CI_DRM_8597: aadd3cf12a7c515bca8752da797ded56a003617b @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5696: 8d1744239f4300eb12d5bab14a30b79d9c8dd364 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_17902: 1dad2fcca4f707aa870be1a45bb28bfb4c2b0f73 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

1dad2fcca4f7 drm/i915: Replace the priority boosting for the display with a deadline
29754bd08fd1 drm/i915/gt: Specify a deadline for the heartbeat
62d3790caaf1 drm/i915: Fair low-latency scheduling
956673f1ee95 drm/i915/gt: Check for a completed last request once
36de84a8bd2b ipi-dag
2084c4ff504b drm/i915: Restructure priority inheritance
f9a12aff6156 drm/i915: Teach the i915_dependency to use a double-lock
95ddc88b38ef drm/i915/gt: Do not suspend bonded requests if one hangs
e82c48f763c2 drm/i915: Replace engine->schedule() with a known request operation
40a48b3a4cd5 drm/i915: Remove I915_USER_PRIORITY_SHIFT
50d452bcc3eb drm/i915: Strip out internal priorities
097cde1f158a drm/i915/gem: Make relocations atomic within execbuf
54c54ac54e10 drm/i915: Unpeel awaits on a proxy fence
0cf43f808ad7 drm/i915: Lift waiter/signaler iterators
78cb8746679e dma-buf: Proxy fence, an unsignaled fence placeholder
b03d9c429a48 drm/i915/gem: Add all GPU reloc awaits/signals en masse
06166bdaf5a2 drm/i915/gem: Build the reloc request first
6e281d4c956c drm/i915/gem: Lift GPU relocation allocation
3fa210c03e00 drm/i915/gem: Separate reloc validation into an earlier step
85be36e07c69 drm/i915: Add list_for_each_entry_safe_continue_reverse
7e90a16cc7ce drm/i915/gt: Resubmit the virtual engine on schedule-out
61f3b28c89b9 drm/i915/gt: Decouple inflight virtual engines
7253a1d1220b drm/i915/gt: Use virtual_engine during execlists_dequeue
a4fea0425495 drm/i915/selftests: Trim execlists runtime
f2f769eafd3d drm/i915/selftests: Remove live_suppress_wait_preempt
854873908649 drm/i915/selftests: Teach hang-self to target only itself
b94a954251d4 drm/i915/selftests: Make the hanging request non-preemptible
f23f25433228 drm/i915: Adjust the sentinel assert to match implementation

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/index.html

* [Intel-gfx] ✗ Fi.CI.IGT: failure for series starting with [01/28] drm/i915: Adjust the sentinel assert to match implementation
  2020-06-07 22:20 [Intel-gfx] [PATCH 01/28] drm/i915: Adjust the sentinel assert to match implementation Chris Wilson
                   ` (29 preceding siblings ...)
  2020-06-07 23:12 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
@ 2020-06-08  0:58 ` Patchwork
  2020-06-08  7:44 ` [Intel-gfx] [PATCH 01/28] " Tvrtko Ursulin
  2020-06-08 20:43 ` Mika Kuoppala
  32 siblings, 0 replies; 52+ messages in thread
From: Patchwork @ 2020-06-08  0:58 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/28] drm/i915: Adjust the sentinel assert to match implementation
URL   : https://patchwork.freedesktop.org/series/78103/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_8597_full -> Patchwork_17902_full
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_17902_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_17902_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_17902_full:

### IGT changes ###

#### Possible regressions ####

  * igt@gem_ctx_exec@basic-nohangcheck:
    - shard-glk:          [PASS][1] -> [FAIL][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-glk1/igt@gem_ctx_exec@basic-nohangcheck.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-glk2/igt@gem_ctx_exec@basic-nohangcheck.html
    - shard-tglb:         [PASS][3] -> [FAIL][4]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-tglb8/igt@gem_ctx_exec@basic-nohangcheck.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-tglb8/igt@gem_ctx_exec@basic-nohangcheck.html
    - shard-iclb:         [PASS][5] -> [FAIL][6]
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-iclb5/igt@gem_ctx_exec@basic-nohangcheck.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-iclb8/igt@gem_ctx_exec@basic-nohangcheck.html
    - shard-skl:          [PASS][7] -> [FAIL][8]
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-skl7/igt@gem_ctx_exec@basic-nohangcheck.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-skl9/igt@gem_ctx_exec@basic-nohangcheck.html

  * igt@gem_exec_balancer@smoke:
    - shard-iclb:         [PASS][9] -> [TIMEOUT][10] +1 similar issue
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-iclb6/igt@gem_exec_balancer@smoke.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-iclb2/igt@gem_exec_balancer@smoke.html

  * igt@i915_module_load@reload-with-fault-injection:
    - shard-tglb:         [PASS][11] -> [INCOMPLETE][12]
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-tglb1/igt@i915_module_load@reload-with-fault-injection.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-tglb2/igt@i915_module_load@reload-with-fault-injection.html

  
#### Suppressed ####

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * {igt@gem_exec_params@invalid-batch-start-offset}:
    - shard-iclb:         [PASS][13] -> [TIMEOUT][14]
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-iclb6/igt@gem_exec_params@invalid-batch-start-offset.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-iclb2/igt@gem_exec_params@invalid-batch-start-offset.html

  * {igt@gem_exec_schedule@preempt-engines@bcs0}:
    - shard-kbl:          [PASS][15] -> [INCOMPLETE][16]
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-kbl7/igt@gem_exec_schedule@preempt-engines@bcs0.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-kbl2/igt@gem_exec_schedule@preempt-engines@bcs0.html

  * {igt@gem_exec_schedule@preempt-engines@rcs0}:
    - shard-skl:          [PASS][17] -> [INCOMPLETE][18]
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-skl7/igt@gem_exec_schedule@preempt-engines@rcs0.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-skl9/igt@gem_exec_schedule@preempt-engines@rcs0.html

  * {igt@gem_exec_schedule@reorder-wide@bcs0}:
    - shard-skl:          [PASS][19] -> [FAIL][20] +3 similar issues
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-skl3/igt@gem_exec_schedule@reorder-wide@bcs0.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-skl5/igt@gem_exec_schedule@reorder-wide@bcs0.html

  * {igt@gem_exec_schedule@reorder-wide@rcs0}:
    - shard-apl:          [PASS][21] -> [FAIL][22] +3 similar issues
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-apl2/igt@gem_exec_schedule@reorder-wide@rcs0.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-apl1/igt@gem_exec_schedule@reorder-wide@rcs0.html
    - shard-glk:          [PASS][23] -> [FAIL][24] +3 similar issues
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-glk5/igt@gem_exec_schedule@reorder-wide@rcs0.html
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-glk5/igt@gem_exec_schedule@reorder-wide@rcs0.html

  * {igt@gem_exec_schedule@reorder-wide@vcs0}:
    - shard-iclb:         [PASS][25] -> [FAIL][26] +4 similar issues
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-iclb1/igt@gem_exec_schedule@reorder-wide@vcs0.html
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-iclb1/igt@gem_exec_schedule@reorder-wide@vcs0.html

  * {igt@gem_exec_schedule@reorder-wide@vcs1}:
    - shard-kbl:          [PASS][27] -> [FAIL][28] +4 similar issues
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-kbl6/igt@gem_exec_schedule@reorder-wide@vcs1.html
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-kbl1/igt@gem_exec_schedule@reorder-wide@vcs1.html
    - shard-tglb:         [PASS][29] -> [FAIL][30] +4 similar issues
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-tglb5/igt@gem_exec_schedule@reorder-wide@vcs1.html
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-tglb6/igt@gem_exec_schedule@reorder-wide@vcs1.html

  * {igt@gem_userptr_blits@invalid-mmap-offset-unsync}:
    - shard-iclb:         NOTRUN -> [TIMEOUT][31]
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-iclb2/igt@gem_userptr_blits@invalid-mmap-offset-unsync.html

  
New tests
---------

  New tests have been introduced between CI_DRM_8597_full and Patchwork_17902_full:

### New IGT tests (2) ###

  * igt@dmabuf@all@dma_fence_proxy:
    - Statuses : 8 pass(s)
    - Exec time: [0.03, 0.10] s

  * igt@i915_selftest@mock@scheduler:
    - Statuses : 7 pass(s)
    - Exec time: [0.11, 1.08] s

  

Known issues
------------

  Here are the changes found in Patchwork_17902_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_ctx_persistence@engines-mixed-process@vcs0:
    - shard-apl:          [PASS][32] -> [FAIL][33] ([i915#1528])
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-apl3/igt@gem_ctx_persistence@engines-mixed-process@vcs0.html
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-apl1/igt@gem_ctx_persistence@engines-mixed-process@vcs0.html

  * igt@gem_exec_create@basic:
    - shard-kbl:          [PASS][34] -> [DMESG-WARN][35] ([i915#93] / [i915#95])
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-kbl2/igt@gem_exec_create@basic.html
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-kbl7/igt@gem_exec_create@basic.html

  * igt@gem_exec_whisper@basic-queues-forked:
    - shard-glk:          [PASS][36] -> [DMESG-WARN][37] ([i915#118] / [i915#95]) +1 similar issue
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-glk1/igt@gem_exec_whisper@basic-queues-forked.html
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-glk2/igt@gem_exec_whisper@basic-queues-forked.html

  * igt@gem_workarounds@suspend-resume:
    - shard-skl:          [PASS][38] -> [INCOMPLETE][39] ([i915#69])
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-skl3/igt@gem_workarounds@suspend-resume.html
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-skl4/igt@gem_workarounds@suspend-resume.html

  * igt@i915_pm_rps@waitboost:
    - shard-hsw:          [PASS][40] -> [FAIL][41] ([i915#39]) +1 similar issue
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-hsw1/igt@i915_pm_rps@waitboost.html
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-hsw8/igt@i915_pm_rps@waitboost.html

  * igt@kms_big_fb@x-tiled-32bpp-rotate-0:
    - shard-apl:          [PASS][42] -> [DMESG-WARN][43] ([i915#1982])
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-apl3/igt@kms_big_fb@x-tiled-32bpp-rotate-0.html
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-apl8/igt@kms_big_fb@x-tiled-32bpp-rotate-0.html

  * igt@kms_cursor_legacy@short-flip-after-cursor-atomic-transitions:
    - shard-kbl:          [PASS][44] -> [DMESG-WARN][45] ([i915#1982])
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-kbl1/igt@kms_cursor_legacy@short-flip-after-cursor-atomic-transitions.html
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-kbl4/igt@kms_cursor_legacy@short-flip-after-cursor-atomic-transitions.html

  * igt@kms_draw_crc@draw-method-xrgb8888-pwrite-xtiled:
    - shard-apl:          [PASS][46] -> [DMESG-WARN][47] ([i915#95]) +20 similar issues
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-apl8/igt@kms_draw_crc@draw-method-xrgb8888-pwrite-xtiled.html
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-apl1/igt@kms_draw_crc@draw-method-xrgb8888-pwrite-xtiled.html

  * igt@kms_frontbuffer_tracking@fbc-1p-offscren-pri-indfb-draw-mmap-wc:
    - shard-iclb:         [PASS][48] -> [DMESG-WARN][49] ([i915#1982])
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-iclb1/igt@kms_frontbuffer_tracking@fbc-1p-offscren-pri-indfb-draw-mmap-wc.html
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-iclb1/igt@kms_frontbuffer_tracking@fbc-1p-offscren-pri-indfb-draw-mmap-wc.html

  * igt@kms_plane@plane-panning-bottom-right-suspend-pipe-b-planes:
    - shard-apl:          [PASS][50] -> [DMESG-WARN][51] ([i915#180]) +1 similar issue
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-apl2/igt@kms_plane@plane-panning-bottom-right-suspend-pipe-b-planes.html
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-apl1/igt@kms_plane@plane-panning-bottom-right-suspend-pipe-b-planes.html

  * igt@kms_plane@plane-panning-bottom-right-suspend-pipe-c-planes:
    - shard-kbl:          [PASS][52] -> [DMESG-WARN][53] ([i915#180]) +5 similar issues
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-kbl2/igt@kms_plane@plane-panning-bottom-right-suspend-pipe-c-planes.html
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-kbl7/igt@kms_plane@plane-panning-bottom-right-suspend-pipe-c-planes.html

  * igt@kms_psr@psr2_cursor_plane_onoff:
    - shard-iclb:         [PASS][54] -> [SKIP][55] ([fdo#109441]) +2 similar issues
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-iclb2/igt@kms_psr@psr2_cursor_plane_onoff.html
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-iclb8/igt@kms_psr@psr2_cursor_plane_onoff.html

  * igt@kms_universal_plane@disable-primary-vs-flip-pipe-b:
    - shard-skl:          [PASS][56] -> [DMESG-WARN][57] ([i915#1982]) +9 similar issues
   [56]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-skl6/igt@kms_universal_plane@disable-primary-vs-flip-pipe-b.html
   [57]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-skl2/igt@kms_universal_plane@disable-primary-vs-flip-pipe-b.html

  * igt@kms_vblank@pipe-b-wait-forked:
    - shard-hsw:          [PASS][58] -> [INCOMPLETE][59] ([i915#61])
   [58]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-hsw6/igt@kms_vblank@pipe-b-wait-forked.html
   [59]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-hsw7/igt@kms_vblank@pipe-b-wait-forked.html

  * igt@syncobj_wait@invalid-multi-wait-all-unsubmitted-signaled:
    - shard-tglb:         [PASS][60] -> [DMESG-WARN][61] ([i915#402])
   [60]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-tglb6/igt@syncobj_wait@invalid-multi-wait-all-unsubmitted-signaled.html
   [61]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-tglb7/igt@syncobj_wait@invalid-multi-wait-all-unsubmitted-signaled.html

  
#### Possible fixes ####

  * {igt@gem_exec_reloc@basic-concurrent0}:
    - shard-tglb:         [FAIL][62] ([i915#1930]) -> [PASS][63]
   [62]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-tglb8/igt@gem_exec_reloc@basic-concurrent0.html
   [63]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-tglb8/igt@gem_exec_reloc@basic-concurrent0.html
    - shard-glk:          [FAIL][64] ([i915#1930]) -> [PASS][65]
   [64]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-glk6/igt@gem_exec_reloc@basic-concurrent0.html
   [65]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-glk8/igt@gem_exec_reloc@basic-concurrent0.html
    - shard-apl:          [FAIL][66] ([i915#1930]) -> [PASS][67]
   [66]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-apl1/igt@gem_exec_reloc@basic-concurrent0.html
   [67]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-apl4/igt@gem_exec_reloc@basic-concurrent0.html
    - shard-kbl:          [FAIL][68] ([i915#1930]) -> [PASS][69]
   [68]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-kbl3/igt@gem_exec_reloc@basic-concurrent0.html
   [69]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-kbl6/igt@gem_exec_reloc@basic-concurrent0.html
    - shard-hsw:          [FAIL][70] ([i915#1930]) -> [PASS][71]
   [70]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-hsw2/igt@gem_exec_reloc@basic-concurrent0.html
   [71]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-hsw4/igt@gem_exec_reloc@basic-concurrent0.html
    - shard-iclb:         [FAIL][72] ([i915#1930]) -> [PASS][73]
   [72]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-iclb3/igt@gem_exec_reloc@basic-concurrent0.html
   [73]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-iclb7/igt@gem_exec_reloc@basic-concurrent0.html
    - shard-snb:          [FAIL][74] ([i915#1930]) -> [PASS][75]
   [74]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-snb5/igt@gem_exec_reloc@basic-concurrent0.html
   [75]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-snb2/igt@gem_exec_reloc@basic-concurrent0.html
    - shard-skl:          [FAIL][76] ([i915#1930]) -> [PASS][77]
   [76]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-skl6/igt@gem_exec_reloc@basic-concurrent0.html
   [77]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-skl10/igt@gem_exec_reloc@basic-concurrent0.html

  * {igt@gem_exec_reloc@basic-concurrent16}:
    - shard-snb:          [TIMEOUT][78] ([i915#1958]) -> [PASS][79] +2 similar issues
   [78]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-snb2/igt@gem_exec_reloc@basic-concurrent16.html
   [79]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-snb2/igt@gem_exec_reloc@basic-concurrent16.html
    - shard-iclb:         [INCOMPLETE][80] ([i915#1958]) -> [PASS][81]
   [80]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-iclb7/igt@gem_exec_reloc@basic-concurrent16.html
   [81]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-iclb3/igt@gem_exec_reloc@basic-concurrent16.html
    - shard-hsw:          [TIMEOUT][82] ([i915#1958]) -> [PASS][83] +3 similar issues
   [82]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-hsw8/igt@gem_exec_reloc@basic-concurrent16.html
   [83]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-hsw8/igt@gem_exec_reloc@basic-concurrent16.html
    - shard-skl:          [INCOMPLETE][84] ([i915#1958]) -> [PASS][85]
   [84]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-skl2/igt@gem_exec_reloc@basic-concurrent16.html
   [85]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-skl1/igt@gem_exec_reloc@basic-concurrent16.html
    - shard-kbl:          [INCOMPLETE][86] ([i915#1958]) -> [PASS][87]
   [86]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-kbl1/igt@gem_exec_reloc@basic-concurrent16.html
   [87]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-kbl4/igt@gem_exec_reloc@basic-concurrent16.html
    - shard-apl:          [INCOMPLETE][88] ([i915#1635] / [i915#1958]) -> [PASS][89]
   [88]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-apl8/igt@gem_exec_reloc@basic-concurrent16.html
   [89]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-apl7/igt@gem_exec_reloc@basic-concurrent16.html
    - shard-tglb:         [INCOMPLETE][90] ([i915#1958]) -> [PASS][91]
   [90]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-tglb3/igt@gem_exec_reloc@basic-concurrent16.html
   [91]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-tglb5/igt@gem_exec_reloc@basic-concurrent16.html
    - shard-glk:          [INCOMPLETE][92] ([i915#1958] / [i915#58] / [k.org#198133]) -> [PASS][93]
   [92]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-glk2/igt@gem_exec_reloc@basic-concurrent16.html
   [93]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-glk4/igt@gem_exec_reloc@basic-concurrent16.html

  * igt@gem_exec_whisper@basic-contexts-all:
    - shard-glk:          [DMESG-WARN][94] ([i915#118] / [i915#95]) -> [PASS][95] +1 similar issue
   [94]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-glk8/igt@gem_exec_whisper@basic-contexts-all.html
   [95]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-glk4/igt@gem_exec_whisper@basic-contexts-all.html

  * igt@i915_suspend@sysfs-reader:
    - shard-apl:          [DMESG-WARN][96] ([i915#180]) -> [PASS][97] +3 similar issues
   [96]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-apl6/igt@i915_suspend@sysfs-reader.html
   [97]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-apl2/igt@i915_suspend@sysfs-reader.html

  * igt@kms_big_fb@x-tiled-64bpp-rotate-180:
    - shard-glk:          [DMESG-FAIL][98] ([i915#118] / [i915#95]) -> [PASS][99] +2 similar issues
   [98]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-glk8/igt@kms_big_fb@x-tiled-64bpp-rotate-180.html
   [99]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-glk6/igt@kms_big_fb@x-tiled-64bpp-rotate-180.html

  * igt@kms_cursor_legacy@2x-long-flip-vs-cursor-legacy:
    - shard-glk:          [FAIL][100] ([i915#72]) -> [PASS][101]
   [100]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-glk6/igt@kms_cursor_legacy@2x-long-flip-vs-cursor-legacy.html
   [101]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-glk8/igt@kms_cursor_legacy@2x-long-flip-vs-cursor-legacy.html

  * igt@kms_cursor_legacy@cursora-vs-flipa-atomic-transitions-varying-size:
    - shard-skl:          [DMESG-WARN][102] ([i915#1982]) -> [PASS][103]
   [102]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-skl9/igt@kms_cursor_legacy@cursora-vs-flipa-atomic-transitions-varying-size.html
   [103]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-skl1/igt@kms_cursor_legacy@cursora-vs-flipa-atomic-transitions-varying-size.html

  * igt@kms_draw_crc@draw-method-rgb565-blt-untiled:
    - shard-apl:          [DMESG-WARN][104] ([i915#95]) -> [PASS][105] +22 similar issues
   [104]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-apl6/igt@kms_draw_crc@draw-method-rgb565-blt-untiled.html
   [105]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-apl2/igt@kms_draw_crc@draw-method-rgb565-blt-untiled.html

  * {igt@kms_flip@flip-vs-expired-vblank-interruptible@a-hdmi-a1}:
    - shard-glk:          [FAIL][106] ([i915#79]) -> [PASS][107]
   [106]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-glk6/igt@kms_flip@flip-vs-expired-vblank-interruptible@a-hdmi-a1.html
   [107]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-glk8/igt@kms_flip@flip-vs-expired-vblank-interruptible@a-hdmi-a1.html

  * {igt@kms_flip@plain-flip-fb-recreate-interruptible@a-edp1}:
    - shard-skl:          [FAIL][108] ([i915#1928]) -> [PASS][109]
   [108]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-skl4/igt@kms_flip@plain-flip-fb-recreate-interruptible@a-edp1.html
   [109]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-skl4/igt@kms_flip@plain-flip-fb-recreate-interruptible@a-edp1.html

  * igt@kms_flip_tiling@flip-changes-tiling-yf:
    - shard-skl:          [FAIL][110] ([i915#699]) -> [PASS][111]
   [110]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-skl5/igt@kms_flip_tiling@flip-changes-tiling-yf.html
   [111]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-skl3/igt@kms_flip_tiling@flip-changes-tiling-yf.html

  * igt@kms_frontbuffer_tracking@fbc-indfb-scaledprimary:
    - shard-tglb:         [DMESG-WARN][112] ([i915#1982]) -> [PASS][113]
   [112]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-tglb6/igt@kms_frontbuffer_tracking@fbc-indfb-scaledprimary.html
   [113]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-tglb3/igt@kms_frontbuffer_tracking@fbc-indfb-scaledprimary.html

  * igt@kms_psr@psr2_sprite_plane_move:
    - shard-iclb:         [SKIP][114] ([fdo#109441]) -> [PASS][115] +1 similar issue
   [114]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-iclb6/igt@kms_psr@psr2_sprite_plane_move.html
   [115]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-iclb2/igt@kms_psr@psr2_sprite_plane_move.html

  * igt@kms_vblank@pipe-a-ts-continuation-suspend:
    - shard-kbl:          [DMESG-WARN][116] ([i915#180]) -> [PASS][117] +4 similar issues
   [116]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-kbl1/igt@kms_vblank@pipe-a-ts-continuation-suspend.html
   [117]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-kbl1/igt@kms_vblank@pipe-a-ts-continuation-suspend.html

  * {igt@perf@blocking-parameterized}:
    - shard-hsw:          [FAIL][118] ([i915#1542]) -> [PASS][119]
   [118]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-hsw6/igt@perf@blocking-parameterized.html
   [119]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-hsw1/igt@perf@blocking-parameterized.html

  
#### Warnings ####

  * igt@gem_ctx_exec@basic-nohangcheck:
    - shard-kbl:          [DMESG-WARN][120] ([i915#93] / [i915#95]) -> [DMESG-FAIL][121] ([i915#95])
   [120]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-kbl2/igt@gem_ctx_exec@basic-nohangcheck.html
   [121]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-kbl3/igt@gem_ctx_exec@basic-nohangcheck.html
    - shard-apl:          [DMESG-WARN][122] ([i915#95]) -> [DMESG-FAIL][123] ([i915#95])
   [122]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-apl1/igt@gem_ctx_exec@basic-nohangcheck.html
   [123]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-apl4/igt@gem_ctx_exec@basic-nohangcheck.html

  * igt@kms_content_protection@atomic:
    - shard-kbl:          [DMESG-FAIL][124] ([fdo#110321] / [i915#95]) -> [TIMEOUT][125] ([i915#1319])
   [124]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-kbl2/igt@kms_content_protection@atomic.html
   [125]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-kbl3/igt@kms_content_protection@atomic.html

  * igt@kms_content_protection@atomic-dpms:
    - shard-apl:          [FAIL][126] ([fdo#110321] / [fdo#110336]) -> [TIMEOUT][127] ([i915#1319] / [i915#1635]) +1 similar issue
   [126]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-apl3/igt@kms_content_protection@atomic-dpms.html
   [127]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-apl8/igt@kms_content_protection@atomic-dpms.html

  * igt@kms_content_protection@legacy:
    - shard-kbl:          [TIMEOUT][128] ([i915#1319]) -> [TIMEOUT][129] ([i915#1319] / [i915#1958])
   [128]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-kbl1/igt@kms_content_protection@legacy.html
   [129]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-kbl4/igt@kms_content_protection@legacy.html

  * igt@kms_content_protection@lic:
    - shard-apl:          [TIMEOUT][130] ([i915#1319] / [i915#1635]) -> [FAIL][131] ([fdo#110321]) +1 similar issue
   [130]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-apl6/igt@kms_content_protection@lic.html
   [131]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-apl2/igt@kms_content_protection@lic.html
    - shard-kbl:          [TIMEOUT][132] ([i915#1319] / [i915#1958]) -> [TIMEOUT][133] ([i915#1319])
   [132]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-kbl4/igt@kms_content_protection@lic.html
   [133]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-kbl3/igt@kms_content_protection@lic.html

  * igt@kms_cursor_crc@pipe-a-cursor-suspend:
    - shard-kbl:          [DMESG-WARN][134] ([i915#93] / [i915#95]) -> [DMESG-WARN][135] ([i915#180] / [i915#93] / [i915#95])
   [134]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-kbl1/igt@kms_cursor_crc@pipe-a-cursor-suspend.html
   [135]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-kbl4/igt@kms_cursor_crc@pipe-a-cursor-suspend.html

  * igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions-varying-size:
    - shard-skl:          [FAIL][136] ([IGT#5]) -> [DMESG-WARN][137] ([i915#1982])
   [136]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-skl7/igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions-varying-size.html
   [137]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-skl9/igt@kms_cursor_legacy@flip-vs-cursor-atomic-transitions-varying-size.html

  * igt@kms_frontbuffer_tracking@fbc-2p-primscrn-cur-indfb-draw-mmap-gtt:
    - shard-snb:          [TIMEOUT][138] ([i915#1958]) -> [SKIP][139] ([fdo#109271]) +2 similar issues
   [138]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-snb2/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-cur-indfb-draw-mmap-gtt.html
   [139]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-snb2/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-cur-indfb-draw-mmap-gtt.html
    - shard-hsw:          [TIMEOUT][140] ([i915#1958]) -> [SKIP][141] ([fdo#109271])
   [140]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8597/shard-hsw8/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-cur-indfb-draw-mmap-gtt.html
   [141]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/shard-hsw8/igt@kms_frontbuffer_tracking@fbc-2p-primscrn-cur-indfb-draw-mmap-gtt.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [IGT#5]: https://gitlab.freedesktop.org/drm/igt-gpu-tools/issues/5
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109441]: https://bugs.freedesktop.org/show_bug.cgi?id=109441
  [fdo#110321]: https://bugs.freedesktop.org/show_bug.cgi?id=110321
  [fdo#110336]: https://bugs.freedesktop.org/show_bug.cgi?id=110336
  [i915#118]: https://gitlab.freedesktop.org/drm/intel/issues/118
  [i915#1319]: https://gitlab.freedesktop.org/drm/intel/issues/1319
  [i915#1528]: https://gitlab.freedesktop.org/drm/intel/issues/1528
  [i915#1542]: https://gitlab.freedesktop.org/drm/intel/issues/1542
  [i915#1635]: https://gitlab.freedesktop.org/drm/intel/issues/1635
  [i915#180]: https://gitlab.freedesktop.org/drm/intel/issues/180
  [i915#1928]: https://gitlab.freedesktop.org/drm/intel/issues/1928
  [i915#1930]: https://gitlab.freedesktop.org/drm/intel/issues/1930
  [i915#1958]: https://gitlab.freedesktop.org/drm/intel/issues/1958
  [i915#198]: https://gitlab.freedesktop.org/drm/intel/issues/198
  [i915#1982]: https://gitlab.freedesktop.org/drm/intel/issues/1982
  [i915#39]: https://gitlab.freedesktop.org/drm/intel/issues/39
  [i915#402]: https://gitlab.freedesktop.org/drm/intel/issues/402
  [i915#58]: https://gitlab.freedesktop.org/drm/intel/issues/58
  [i915#61]: https://gitlab.freedesktop.org/drm/intel/issues/61
  [i915#69]: https://gitlab.freedesktop.org/drm/intel/issues/69
  [i915#699]: https://gitlab.freedesktop.org/drm/intel/issues/699
  [i915#72]: https://gitlab.freedesktop.org/drm/intel/issues/72
  [i915#79]: https://gitlab.freedesktop.org/drm/intel/issues/79
  [i915#93]: https://gitlab.freedesktop.org/drm/intel/issues/93
  [i915#95]: https://gitlab.freedesktop.org/drm/intel/issues/95
  [k.org#198133]: https://bugzilla.kernel.org/show_bug.cgi?id=198133


Participating hosts (11 -> 11)
------------------------------

  No changes in participating hosts


Build changes
-------------

  * Linux: CI_DRM_8597 -> Patchwork_17902

  CI-20190529: 20190529
  CI_DRM_8597: aadd3cf12a7c515bca8752da797ded56a003617b @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5696: 8d1744239f4300eb12d5bab14a30b79d9c8dd364 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_17902: 1dad2fcca4f707aa870be1a45bb28bfb4c2b0f73 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17902/index.html
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH 01/28] drm/i915: Adjust the sentinel assert to match implementation
  2020-06-07 22:20 [Intel-gfx] [PATCH 01/28] drm/i915: Adjust the sentinel assert to match implementation Chris Wilson
                   ` (30 preceding siblings ...)
  2020-06-08  0:58 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
@ 2020-06-08  7:44 ` Tvrtko Ursulin
  2020-06-08  9:33   ` Chris Wilson
  2020-06-08 20:43 ` Mika Kuoppala
  32 siblings, 1 reply; 52+ messages in thread
From: Tvrtko Ursulin @ 2020-06-08  7:44 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 07/06/2020 23:20, Chris Wilson wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Sentinels are supposed to be last reqeusts in the elsp queue, not the
> only one, so adjust the assert accordingly.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_lrc.c | 14 +++-----------
>   1 file changed, 3 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index d55a5e0466e5..db8a170b0e5c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -1635,9 +1635,9 @@ assert_pending_valid(const struct intel_engine_execlists *execlists,
>   		ccid = ce->lrc.ccid;
>   
>   		/*
> -		 * Sentinels are supposed to be lonely so they flush the
> -		 * current exection off the HW. Check that they are the
> -		 * only request in the pending submission.
> +		 * Sentinels are supposed to be the last request so they flush
> +		 * the current exection off the HW. Check that they are the only
> +		 * request in the pending submission.
>   		 */
>   		if (sentinel) {
>   			GEM_TRACE_ERR("%s: context:%llx after sentinel in pending[%zd]\n",
> @@ -1646,15 +1646,7 @@ assert_pending_valid(const struct intel_engine_execlists *execlists,
>   				      port - execlists->pending);
>   			return false;
>   		}
> -
>   		sentinel = i915_request_has_sentinel(rq);

FWIW I was changing it to "sentinel |= ..." so it keeps working if we 
decide to use more than 2 elsp ports on Icelake one day.
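
For illustration, a minimal sketch of what that variant would look like
inside the port-walking loop of assert_pending_valid() (untested; only the
lines relevant to the flag are shown, the rest of the loop body is elided):

        bool sentinel = false;
        struct i915_request * const *port;
        struct i915_request *rq;

        for (port = execlists->pending; (rq = *port); port++) {
                /* Reject any request that appears after a sentinel. */
                if (sentinel)
                        return false;

                /* Accumulate: once set, the flag stays set for later ports. */
                sentinel |= i915_request_has_sentinel(rq);

                /* ... remaining validation elided ... */
        }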

Regards,

Tvrtko

> -		if (sentinel && port != execlists->pending) {
> -			GEM_TRACE_ERR("%s: sentinel context:%llx not in prime position[%zd]\n",
> -				      engine->name,
> -				      ce->timeline->fence_context,
> -				      port - execlists->pending);
> -			return false;
> -		}
>   
>   		/* Hold tightly onto the lock to prevent concurrent retires! */
>   		if (!spin_trylock_irqsave(&rq->lock, flags))
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH 01/28] drm/i915: Adjust the sentinel assert to match implementation
  2020-06-08  7:44 ` [Intel-gfx] [PATCH 01/28] " Tvrtko Ursulin
@ 2020-06-08  9:33   ` Chris Wilson
  2020-06-09  6:59     ` Tvrtko Ursulin
  0 siblings, 1 reply; 52+ messages in thread
From: Chris Wilson @ 2020-06-08  9:33 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2020-06-08 08:44:01)
> 
> On 07/06/2020 23:20, Chris Wilson wrote:
> > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > 
> > Sentinels are supposed to be last reqeusts in the elsp queue, not the
> > only one, so adjust the assert accordingly.
> > 
> > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gt/intel_lrc.c | 14 +++-----------
> >   1 file changed, 3 insertions(+), 11 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > index d55a5e0466e5..db8a170b0e5c 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > @@ -1635,9 +1635,9 @@ assert_pending_valid(const struct intel_engine_execlists *execlists,
> >               ccid = ce->lrc.ccid;
> >   
> >               /*
> > -              * Sentinels are supposed to be lonely so they flush the
> > -              * current exection off the HW. Check that they are the
> > -              * only request in the pending submission.
> > +              * Sentinels are supposed to be the last request so they flush
> > +              * the current exection off the HW. Check that they are the only
> > +              * request in the pending submission.
> >                */
> >               if (sentinel) {
> >                       GEM_TRACE_ERR("%s: context:%llx after sentinel in pending[%zd]\n",
> > @@ -1646,15 +1646,7 @@ assert_pending_valid(const struct intel_engine_execlists *execlists,
> >                                     port - execlists->pending);
> >                       return false;
> >               }
> > -
> >               sentinel = i915_request_has_sentinel(rq);
> 
> FWIW I was changing it to "sentinel |= ..." so it keeps working if we 
> decide to use more than 2 elsp ports on Icelake one day.

But it will always fail on the next port...
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH 01/28] drm/i915: Adjust the sentinel assert to match implementation
  2020-06-07 22:20 [Intel-gfx] [PATCH 01/28] drm/i915: Adjust the sentinel assert to match implementation Chris Wilson
                   ` (31 preceding siblings ...)
  2020-06-08  7:44 ` [Intel-gfx] [PATCH 01/28] " Tvrtko Ursulin
@ 2020-06-08 20:43 ` Mika Kuoppala
  32 siblings, 0 replies; 52+ messages in thread
From: Mika Kuoppala @ 2020-06-08 20:43 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> Sentinels are supposed to be last reqeusts in the elsp queue, not the
> only one, so adjust the assert accordingly.

s/reqeusts/requests

>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/gt/intel_lrc.c | 14 +++-----------
>  1 file changed, 3 insertions(+), 11 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index d55a5e0466e5..db8a170b0e5c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -1635,9 +1635,9 @@ assert_pending_valid(const struct intel_engine_execlists *execlists,
>  		ccid = ce->lrc.ccid;
>  
>  		/*
> -		 * Sentinels are supposed to be lonely so they flush the
> -		 * current exection off the HW. Check that they are the
> -		 * only request in the pending submission.
> +		 * Sentinels are supposed to be the last request so they flush
> +		 * the current exection off the HW. Check that they are the only

s/exection/execution

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

> +		 * request in the pending submission.
>  		 */
>  		if (sentinel) {
>  			GEM_TRACE_ERR("%s: context:%llx after sentinel in pending[%zd]\n",
> @@ -1646,15 +1646,7 @@ assert_pending_valid(const struct intel_engine_execlists *execlists,
>  				      port - execlists->pending);
>  			return false;
>  		}
> -
>  		sentinel = i915_request_has_sentinel(rq);
> -		if (sentinel && port != execlists->pending) {
> -			GEM_TRACE_ERR("%s: sentinel context:%llx not in prime position[%zd]\n",
> -				      engine->name,
> -				      ce->timeline->fence_context,
> -				      port - execlists->pending);
> -			return false;
> -		}
>  
>  		/* Hold tightly onto the lock to prevent concurrent retires! */
>  		if (!spin_trylock_irqsave(&rq->lock, flags))
> -- 
> 2.20.1
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH 02/28] drm/i915/selftests: Make the hanging request non-preemptible
  2020-06-07 22:20 ` [Intel-gfx] [PATCH 02/28] drm/i915/selftests: Make the hanging request non-preemptible Chris Wilson
@ 2020-06-08 20:58   ` Mika Kuoppala
  0 siblings, 0 replies; 52+ messages in thread
From: Mika Kuoppala @ 2020-06-08 20:58 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Chris Wilson

Chris Wilson <chris@chris-wilson.co.uk> writes:

> In some of our hangtests, we try to reset an active engine while it is
> spinning inside the recursive spinner. However, we also try to flood the
> engine with requests that preempt the hang, and so should disable the
> preemption to be sure that we reset the right request.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
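
For context, MI_ARB_CHECK is an arbitration point at which the command
streamer may preempt the batch, so replacing it with MI_NOOP removes the
points at which the recursive spinner could be preempted. A hedged sketch
of the resulting loop shape (paraphrased from the hunks below, not taken
verbatim from the patch):

        /*
         * Spinner body with no arbitration points: the batch jumps back
         * to its own start and, without MI_ARB_CHECK, gives the engine
         * nowhere to preempt until the request is reset.
         */
        *batch++ = MI_NOOP;                             /* was MI_ARB_CHECK */
        *batch++ = MI_BATCH_BUFFER_START | 1 << 8 | 1;  /* loop back (gen8 path) */
        *batch++ = lower_32_bits(vma->node.start);
        *batch++ = upper_32_bits(vma->node.start);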

> ---
>  drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 36 ++++++++++++++------
>  1 file changed, 26 insertions(+), 10 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> index 4aa4cc917d8b..035f363fb0f8 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> @@ -203,12 +203,12 @@ hang_create_request(struct hang *h, struct intel_engine_cs *engine)
>  		*batch++ = lower_32_bits(hws_address(hws, rq));
>  		*batch++ = upper_32_bits(hws_address(hws, rq));
>  		*batch++ = rq->fence.seqno;
> -		*batch++ = MI_ARB_CHECK;
> +		*batch++ = MI_NOOP;
>  
>  		memset(batch, 0, 1024);
>  		batch += 1024 / sizeof(*batch);
>  
> -		*batch++ = MI_ARB_CHECK;
> +		*batch++ = MI_NOOP;
>  		*batch++ = MI_BATCH_BUFFER_START | 1 << 8 | 1;
>  		*batch++ = lower_32_bits(vma->node.start);
>  		*batch++ = upper_32_bits(vma->node.start);
> @@ -217,12 +217,12 @@ hang_create_request(struct hang *h, struct intel_engine_cs *engine)
>  		*batch++ = 0;
>  		*batch++ = lower_32_bits(hws_address(hws, rq));
>  		*batch++ = rq->fence.seqno;
> -		*batch++ = MI_ARB_CHECK;
> +		*batch++ = MI_NOOP;
>  
>  		memset(batch, 0, 1024);
>  		batch += 1024 / sizeof(*batch);
>  
> -		*batch++ = MI_ARB_CHECK;
> +		*batch++ = MI_NOOP;
>  		*batch++ = MI_BATCH_BUFFER_START | 1 << 8;
>  		*batch++ = lower_32_bits(vma->node.start);
>  	} else if (INTEL_GEN(gt->i915) >= 4) {
> @@ -230,24 +230,24 @@ hang_create_request(struct hang *h, struct intel_engine_cs *engine)
>  		*batch++ = 0;
>  		*batch++ = lower_32_bits(hws_address(hws, rq));
>  		*batch++ = rq->fence.seqno;
> -		*batch++ = MI_ARB_CHECK;
> +		*batch++ = MI_NOOP;
>  
>  		memset(batch, 0, 1024);
>  		batch += 1024 / sizeof(*batch);
>  
> -		*batch++ = MI_ARB_CHECK;
> +		*batch++ = MI_NOOP;
>  		*batch++ = MI_BATCH_BUFFER_START | 2 << 6;
>  		*batch++ = lower_32_bits(vma->node.start);
>  	} else {
>  		*batch++ = MI_STORE_DWORD_IMM | MI_MEM_VIRTUAL;
>  		*batch++ = lower_32_bits(hws_address(hws, rq));
>  		*batch++ = rq->fence.seqno;
> -		*batch++ = MI_ARB_CHECK;
> +		*batch++ = MI_NOOP;
>  
>  		memset(batch, 0, 1024);
>  		batch += 1024 / sizeof(*batch);
>  
> -		*batch++ = MI_ARB_CHECK;
> +		*batch++ = MI_NOOP;
>  		*batch++ = MI_BATCH_BUFFER_START | 2 << 6;
>  		*batch++ = lower_32_bits(vma->node.start);
>  	}
> @@ -866,13 +866,29 @@ static int __igt_reset_engines(struct intel_gt *gt,
>  			count++;
>  
>  			if (rq) {
> +				if (rq->fence.error != -EIO) {
> +					pr_err("i915_reset_engine(%s:%s):"
> +					       " failed to reset request %llx:%lld\n",
> +					       engine->name, test_name,
> +					       rq->fence.context,
> +					       rq->fence.seqno);
> +					i915_request_put(rq);
> +
> +					GEM_TRACE_DUMP();
> +					intel_gt_set_wedged(gt);
> +					err = -EIO;
> +					break;
> +				}
> +
>  				if (i915_request_wait(rq, 0, HZ / 5) < 0) {
>  					struct drm_printer p =
>  						drm_info_printer(gt->i915->drm.dev);
>  
>  					pr_err("i915_reset_engine(%s:%s):"
> -					       " failed to complete request after reset\n",
> -					       engine->name, test_name);
> +					       " failed to complete request %llx:%lld after reset\n",
> +					       engine->name, test_name,
> +					       rq->fence.context,
> +					       rq->fence.seqno);
>  					intel_engine_dump(engine, &p,
>  							  "%s\n", engine->name);
>  					i915_request_put(rq);
> -- 
> 2.20.1
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH 01/28] drm/i915: Adjust the sentinel assert to match implementation
  2020-06-08  9:33   ` Chris Wilson
@ 2020-06-09  6:59     ` Tvrtko Ursulin
  2020-06-09 10:29       ` Chris Wilson
  0 siblings, 1 reply; 52+ messages in thread
From: Tvrtko Ursulin @ 2020-06-09  6:59 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

On 08/06/2020 10:33, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-06-08 08:44:01)
>>
>> On 07/06/2020 23:20, Chris Wilson wrote:
>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>
>>> Sentinels are supposed to be last reqeusts in the elsp queue, not the
>>> only one, so adjust the assert accordingly.
>>>
>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/gt/intel_lrc.c | 14 +++-----------
>>>    1 file changed, 3 insertions(+), 11 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>> index d55a5e0466e5..db8a170b0e5c 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>> @@ -1635,9 +1635,9 @@ assert_pending_valid(const struct intel_engine_execlists *execlists,
>>>                ccid = ce->lrc.ccid;
>>>    
>>>                /*
>>> -              * Sentinels are supposed to be lonely so they flush the
>>> -              * current exection off the HW. Check that they are the
>>> -              * only request in the pending submission.
>>> +              * Sentinels are supposed to be the last request so they flush
>>> +              * the current exection off the HW. Check that they are the only
>>> +              * request in the pending submission.
>>>                 */
>>>                if (sentinel) {
>>>                        GEM_TRACE_ERR("%s: context:%llx after sentinel in pending[%zd]\n",
>>> @@ -1646,15 +1646,7 @@ assert_pending_valid(const struct intel_engine_execlists *execlists,
>>>                                      port - execlists->pending);
>>>                        return false;
>>>                }
>>> -
>>>                sentinel = i915_request_has_sentinel(rq);
>>
>> FWIW I was changing it to "sentinel |= ..." so it keeps working if we
>> decide to use more than 2 elsp ports on Icelake one day.
> 
> But it will always fail on the next port...

I don't follow. A sentinel has to be last, so if it fails on the next port
it is correct to do so, no?

Regards,

Tvrtko

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH 10/28] drm/i915/gem: Separate reloc validation into an earlier step
  2020-06-07 22:20 ` [Intel-gfx] [PATCH 10/28] drm/i915/gem: Separate reloc validation into an earlier step Chris Wilson
@ 2020-06-09  7:47   ` Tvrtko Ursulin
  2020-06-09 10:48     ` Chris Wilson
  0 siblings, 1 reply; 52+ messages in thread
From: Tvrtko Ursulin @ 2020-06-09  7:47 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 07/06/2020 23:20, Chris Wilson wrote:
> Over the next couple of patches, we will want to lock all the modified
> vma for relocation processing under a single ww_mutex. We neither want
> to have to include the vma that are skipped (due to no modifications
> required) nor do we want those to be marked as written too. So separate
> out the reloc validation into an early step, which we can use both to
> reject the execbuf before committing to making our changes, and to
> filter out the unmodified vma.
> 
> This does introduce a second pass through the reloc[], but only if we
> need to emit relocations.
> 
> v2: reuse the outer loop, not cut'n'paste.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 145 +++++++++++-------
>   1 file changed, 86 insertions(+), 59 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> index 23db79b806db..01ab1e15a142 100644
> --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> @@ -911,9 +911,9 @@ static void eb_destroy(const struct i915_execbuffer *eb)
>   
>   static inline u64
>   relocation_target(const struct drm_i915_gem_relocation_entry *reloc,
> -		  const struct i915_vma *target)
> +		  u64 target)
>   {
> -	return gen8_canonical_addr((int)reloc->delta + target->node.start);
> +	return gen8_canonical_addr((int)reloc->delta + target);
>   }
>   
>   static void reloc_cache_init(struct reloc_cache *cache,
> @@ -1292,26 +1292,11 @@ static int __reloc_entry_gpu(struct i915_execbuffer *eb,
>   	return 0;
>   }
>   
> -static u64
> -relocate_entry(struct i915_execbuffer *eb,
> -	       struct i915_vma *vma,
> -	       const struct drm_i915_gem_relocation_entry *reloc,
> -	       const struct i915_vma *target)
> -{
> -	u64 target_addr = relocation_target(reloc, target);
> -	int err;
> -
> -	err = __reloc_entry_gpu(eb, vma, reloc->offset, target_addr);
> -	if (err)
> -		return err;
> -
> -	return target->node.start | UPDATE;
> -}
> -
> -static u64
> -eb_relocate_entry(struct i915_execbuffer *eb,
> -		  struct eb_vma *ev,
> -		  const struct drm_i915_gem_relocation_entry *reloc)
> +static int
> +eb_reloc_prepare(struct i915_execbuffer *eb,
> +		 struct eb_vma *ev,
> +		 const struct drm_i915_gem_relocation_entry *reloc,
> +		 struct drm_i915_gem_relocation_entry __user *user)
>   {
>   	struct drm_i915_private *i915 = eb->i915;
>   	struct eb_vma *target;
> @@ -1389,6 +1374,32 @@ eb_relocate_entry(struct i915_execbuffer *eb,
>   		return -EINVAL;
>   	}
>   
> +	return 1;
> +}
> +
> +static int
> +eb_reloc_entry(struct i915_execbuffer *eb,
> +	       struct eb_vma *ev,
> +	       const struct drm_i915_gem_relocation_entry *reloc,
> +	       struct drm_i915_gem_relocation_entry __user *user)
> +{
> +	struct eb_vma *target;
> +	u64 offset;
> +	int err;
> +
> +	/* we've already hold a reference to all valid objects */
> +	target = eb_get_vma(eb, reloc->target_handle);
> +	if (unlikely(!target))
> +		return -ENOENT;
> +
> +	/*
> +	 * If the relocation already has the right value in it, no
> +	 * more work needs to be done.
> +	 */
> +	offset = gen8_canonical_addr(target->vma->node.start);
> +	if (offset == reloc->presumed_offset)
> +		return 0;
> +

Haven't these reloc entries been removed from the list in the prepare phase?
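
For reference, the two-pass flow the patch introduces looks roughly like
this (condensed from the quoted hunk of eb_relocate(); error handling
elided):

        /* Pass 1: validate, and drop vma with no relocations to apply. */
        list_for_each_entry_safe(ev, en, &eb->relocs, reloc_link) {
                err = eb_reloc_vma(eb, ev, eb_reloc_prepare);
                if (err == 0) /* nothing in this vma needs patching */
                        list_del_init(&ev->reloc_link);
        }

        /* Pass 2: emit GPU relocations for the vma still on the list. */
        list_for_each_entry(ev, &eb->relocs, reloc_link)
                err = eb_reloc_vma(eb, ev, eb_reloc_entry);

Note that the prepare pass filters at vma granularity (eb_reloc_vma ORs the
per-entry results together), so a vma that stays on the list can still hold
individual entries whose presumed_offset already matches - which may be
what the per-entry check above is guarding against.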

Regards,

Tvrtko

>   	/*
>   	 * If we write into the object, we need to force the synchronisation
>   	 * barrier, either with an asynchronous clflush or if we executed the
> @@ -1399,11 +1410,41 @@ eb_relocate_entry(struct i915_execbuffer *eb,
>   	 */
>   	ev->flags &= ~EXEC_OBJECT_ASYNC;
>   
> -	/* and update the user's relocation entry */
> -	return relocate_entry(eb, ev->vma, reloc, target->vma);
> +	err = __reloc_entry_gpu(eb, ev->vma, reloc->offset,
> +				relocation_target(reloc, offset));
> +	if (err)
> +		return err;
> +
> +	/*
> +	 * Note that reporting an error now
> +	 * leaves everything in an inconsistent
> +	 * state as we have *already* changed
> +	 * the relocation value inside the
> +	 * object. As we have not changed the
> +	 * reloc.presumed_offset or will not
> +	 * change the execobject.offset, on the
> +	 * call we may not rewrite the value
> +	 * inside the object, leaving it
> +	 * dangling and causing a GPU hang. Unless
> +	 * userspace dynamically rebuilds the
> +	 * relocations on each execbuf rather than
> +	 * presume a static tree.
> +	 *
> +	 * We did previously check if the relocations
> +	 * were writable (access_ok), an error now
> +	 * would be a strange race with mprotect,
> +	 * having already demonstrated that we
> +	 * can read from this userspace address.
> +	 */
> +	__put_user(offset, &user->presumed_offset);
> +	return 0;
>   }
>   
> -static int eb_relocate_vma(struct i915_execbuffer *eb, struct eb_vma *ev)
> +static long eb_reloc_vma(struct i915_execbuffer *eb, struct eb_vma *ev,
> +			 int (*fn)(struct i915_execbuffer *eb,
> +				   struct eb_vma *ev,
> +				   const struct drm_i915_gem_relocation_entry *reloc,
> +				   struct drm_i915_gem_relocation_entry __user *user))
>   {
>   #define N_RELOC(x) ((x) / sizeof(struct drm_i915_gem_relocation_entry))
>   	struct drm_i915_gem_relocation_entry stack[N_RELOC(512)];
> @@ -1411,6 +1452,7 @@ static int eb_relocate_vma(struct i915_execbuffer *eb, struct eb_vma *ev)
>   	struct drm_i915_gem_relocation_entry __user *urelocs =
>   		u64_to_user_ptr(entry->relocs_ptr);
>   	unsigned long remain = entry->relocation_count;
> +	int required = 0;
>   
>   	if (unlikely(remain > N_RELOC(ULONG_MAX)))
>   		return -EINVAL;
> @@ -1443,42 +1485,18 @@ static int eb_relocate_vma(struct i915_execbuffer *eb, struct eb_vma *ev)
>   
>   		remain -= count;
>   		do {
> -			u64 offset = eb_relocate_entry(eb, ev, r);
> +			int ret;
>   
> -			if (likely(offset == 0)) {
> -			} else if ((s64)offset < 0) {
> -				return (int)offset;
> -			} else {
> -				/*
> -				 * Note that reporting an error now
> -				 * leaves everything in an inconsistent
> -				 * state as we have *already* changed
> -				 * the relocation value inside the
> -				 * object. As we have not changed the
> -				 * reloc.presumed_offset or will not
> -				 * change the execobject.offset, on the
> -				 * call we may not rewrite the value
> -				 * inside the object, leaving it
> -				 * dangling and causing a GPU hang. Unless
> -				 * userspace dynamically rebuilds the
> -				 * relocations on each execbuf rather than
> -				 * presume a static tree.
> -				 *
> -				 * We did previously check if the relocations
> -				 * were writable (access_ok), an error now
> -				 * would be a strange race with mprotect,
> -				 * having already demonstrated that we
> -				 * can read from this userspace address.
> -				 */
> -				offset = gen8_canonical_addr(offset & ~UPDATE);
> -				__put_user(offset,
> -					   &urelocs[r - stack].presumed_offset);
> -			}
> +			ret = fn(eb, ev, r, &urelocs[r - stack]);
> +			if (ret < 0)
> +				return ret;
> +
> +			required |= ret;
>   		} while (r++, --count);
>   		urelocs += ARRAY_SIZE(stack);
>   	} while (remain);
>   
> -	return 0;
> +	return required;
>   }
>   
>   static int eb_relocate(struct i915_execbuffer *eb)
> @@ -1497,12 +1515,21 @@ static int eb_relocate(struct i915_execbuffer *eb)
>   
>   	/* The objects are in their final locations, apply the relocations. */
>   	if (eb->args->flags & __EXEC_HAS_RELOC) {
> -		struct eb_vma *ev;
> +		struct eb_vma *ev, *en;
>   		int flush;
>   
> +		list_for_each_entry_safe(ev, en, &eb->relocs, reloc_link) {
> +			err = eb_reloc_vma(eb, ev, eb_reloc_prepare);
> +			if (err < 0)
> +				return err;
> +
> +			if (err == 0)
> +				list_del_init(&ev->reloc_link);
> +		}
> +
>   		list_for_each_entry(ev, &eb->relocs, reloc_link) {
> -			err = eb_relocate_vma(eb, ev);
> -			if (err)
> +			err = eb_reloc_vma(eb, ev, eb_reloc_entry);
> +			if (err < 0)
>   				break;
>   		}
>   
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH 01/28] drm/i915: Adjust the sentinel assert to match implementation
  2020-06-09  6:59     ` Tvrtko Ursulin
@ 2020-06-09 10:29       ` Chris Wilson
  2020-06-09 10:39         ` Tvrtko Ursulin
  0 siblings, 1 reply; 52+ messages in thread
From: Chris Wilson @ 2020-06-09 10:29 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2020-06-09 07:59:27)
> On 08/06/2020 10:33, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2020-06-08 08:44:01)
> >>
> >> On 07/06/2020 23:20, Chris Wilson wrote:
> >>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>
> >>> Sentinels are supposed to be last reqeusts in the elsp queue, not the
> >>> only one, so adjust the assert accordingly.
> >>>
> >>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>> ---
> >>>    drivers/gpu/drm/i915/gt/intel_lrc.c | 14 +++-----------
> >>>    1 file changed, 3 insertions(+), 11 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>> index d55a5e0466e5..db8a170b0e5c 100644
> >>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>> @@ -1635,9 +1635,9 @@ assert_pending_valid(const struct intel_engine_execlists *execlists,
> >>>                ccid = ce->lrc.ccid;
> >>>    
> >>>                /*
> >>> -              * Sentinels are supposed to be lonely so they flush the
> >>> -              * current exection off the HW. Check that they are the
> >>> -              * only request in the pending submission.
> >>> +              * Sentinels are supposed to be the last request so they flush
> >>> +              * the current exection off the HW. Check that they are the only
> >>> +              * request in the pending submission.
> >>>                 */
> >>>                if (sentinel) {
> >>>                        GEM_TRACE_ERR("%s: context:%llx after sentinel in pending[%zd]\n",
> >>> @@ -1646,15 +1646,7 @@ assert_pending_valid(const struct intel_engine_execlists *execlists,
> >>>                                      port - execlists->pending);
> >>>                        return false;
> >>>                }
> >>> -
> >>>                sentinel = i915_request_has_sentinel(rq);
> >>
> >> FWIW I was changing it to "sentinel |= ..." so it keeps working if we
> >> decide to use more than 2 elsp ports on Icelake one day.
> > 
> > But it will always fail on the next port...
> 
> I don't follow. Sentinel has to be last so if it fails on the next port 
> it is correct to do so, no?

Exactly. We only check the first port after setting sentinel; if that
port is occupied, we fail. Hence we don't need |=, since there is no
continuation.
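
As a sketch of that reasoning (assuming the loop shape quoted earlier,
where the check runs before the assignment on each port):

        /*
         * Trace with plain assignment:
         *
         *   port[0] = sentinel:    check sees false -> pass; sentinel = true
         *   port[1] = any request: check sees true  -> fail
         *
         * Any request on the port immediately following a sentinel trips
         * the assert, so on this reading the flag never needs to
         * accumulate across ports.
         */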
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH 01/28] drm/i915: Adjust the sentinel assert to match implementation
  2020-06-09 10:29       ` Chris Wilson
@ 2020-06-09 10:39         ` Tvrtko Ursulin
  2020-06-09 10:47           ` Chris Wilson
  0 siblings, 1 reply; 52+ messages in thread
From: Tvrtko Ursulin @ 2020-06-09 10:39 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 09/06/2020 11:29, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-06-09 07:59:27)
>> On 08/06/2020 10:33, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2020-06-08 08:44:01)
>>>>
>>>> On 07/06/2020 23:20, Chris Wilson wrote:
>>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>
>>>>> Sentinels are supposed to be the last request in the elsp queue, not the
>>>>> only one, so adjust the assert accordingly.
>>>>>
>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>> ---
>>>>>     drivers/gpu/drm/i915/gt/intel_lrc.c | 14 +++-----------
>>>>>     1 file changed, 3 insertions(+), 11 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>>>> index d55a5e0466e5..db8a170b0e5c 100644
>>>>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>>>> @@ -1635,9 +1635,9 @@ assert_pending_valid(const struct intel_engine_execlists *execlists,
>>>>>                 ccid = ce->lrc.ccid;
>>>>>     
>>>>>                 /*
>>>>> -              * Sentinels are supposed to be lonely so they flush the
>>>>> -              * current exection off the HW. Check that they are the
>>>>> -              * only request in the pending submission.
>>>>> +              * Sentinels are supposed to be the last request so they flush
>>>>> +              * the current exection off the HW. Check that they are the only
>>>>> +              * request in the pending submission.
>>>>>                  */
>>>>>                 if (sentinel) {
>>>>>                         GEM_TRACE_ERR("%s: context:%llx after sentinel in pending[%zd]\n",
>>>>> @@ -1646,15 +1646,7 @@ assert_pending_valid(const struct intel_engine_execlists *execlists,
>>>>>                                       port - execlists->pending);
>>>>>                         return false;
>>>>>                 }
>>>>> -
>>>>>                 sentinel = i915_request_has_sentinel(rq);
>>>>
>>>> FWIW I was changing it to "sentinel |= ..." so it keeps working if we
>>>> decide to use more than 2 elsp ports on Icelake one day.
>>>
>>> But it will always fail on the next port...
>>
>> I don't follow. Sentinel has to be last so if it fails on the next port
>> it is correct to do so, no?
> 
> Exactly. We only check the first port after setting sentinel, if that
> port is occupied we fail. Hence why we don't need |=, since there is no
> continuation.

But if more than two ports we also overwrite the bools so: sentinel, 
non-sentinel, sentinel would not catch. I was just future proofing it. :)

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH 01/28] drm/i915: Adjust the sentinel assert to match implementation
  2020-06-09 10:39         ` Tvrtko Ursulin
@ 2020-06-09 10:47           ` Chris Wilson
  2020-06-09 11:45             ` Tvrtko Ursulin
  0 siblings, 1 reply; 52+ messages in thread
From: Chris Wilson @ 2020-06-09 10:47 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2020-06-09 11:39:11)
> 
> On 09/06/2020 11:29, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2020-06-09 07:59:27)
> >> On 08/06/2020 10:33, Chris Wilson wrote:
> >>> Quoting Tvrtko Ursulin (2020-06-08 08:44:01)
> >>>>
> >>>> On 07/06/2020 23:20, Chris Wilson wrote:
> >>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>>>
> >>>>> Sentinels are supposed to be the last request in the elsp queue, not the
> >>>>> only one, so adjust the assert accordingly.
> >>>>>
> >>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>>> ---
> >>>>>     drivers/gpu/drm/i915/gt/intel_lrc.c | 14 +++-----------
> >>>>>     1 file changed, 3 insertions(+), 11 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>>>> index d55a5e0466e5..db8a170b0e5c 100644
> >>>>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>>>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>>>> @@ -1635,9 +1635,9 @@ assert_pending_valid(const struct intel_engine_execlists *execlists,
> >>>>>                 ccid = ce->lrc.ccid;
> >>>>>     
> >>>>>                 /*
> >>>>> -              * Sentinels are supposed to be lonely so they flush the
> >>>>> -              * current exection off the HW. Check that they are the
> >>>>> -              * only request in the pending submission.
> >>>>> +              * Sentinels are supposed to be the last request so they flush
> >>>>> +              * the current exection off the HW. Check that they are the only
> >>>>> +              * request in the pending submission.
> >>>>>                  */
> >>>>>                 if (sentinel) {
> >>>>>                         GEM_TRACE_ERR("%s: context:%llx after sentinel in pending[%zd]\n",
> >>>>> @@ -1646,15 +1646,7 @@ assert_pending_valid(const struct intel_engine_execlists *execlists,
> >>>>>                                       port - execlists->pending);
> >>>>>                         return false;
> >>>>>                 }
> >>>>> -
> >>>>>                 sentinel = i915_request_has_sentinel(rq);
> >>>>
> >>>> FWIW I was changing it to "sentinel |= ..." so it keeps working if we
> >>>> decide to use more than 2 elsp ports on Icelake one day.
> >>>
> >>> But it will always fail on the next port...
> >>
> >> I don't follow. Sentinel has to be last so if it fails on the next port
> >> it is correct to do so, no?
> > 
> > Exactly. We only check the first port after setting sentinel, if that
> > port is occupied we fail. Hence why we don't need |=, since there is no
> > continuation.
> 
> But if more than two ports we also overwrite the bools so: sentinel, 
> non-sentinel, sentinel would not catch. I was just future proofing it. :)

[0] -> sentinel
[1] != NULL -> ERROR

[0] -> not sentinel
[1] -> sentinel
[2] != NULL -> ERROR

We fail if anything comes after a sentinel.
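
Or, as a sketch of the loop shape (paraphrasing assert_pending_valid(),
not the verbatim source):

	bool sentinel = false;

	for (port = execlists->pending; (rq = *port); port++) {
		/* Anything occupying a port after a sentinel is an error. */
		if (sentinel)
			return false;

		/* Plain assignment suffices; |= would add nothing here. */
		sentinel = i915_request_has_sentinel(rq);
	}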
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH 10/28] drm/i915/gem: Separate reloc validation into an earlier step
  2020-06-09  7:47   ` Tvrtko Ursulin
@ 2020-06-09 10:48     ` Chris Wilson
  0 siblings, 0 replies; 52+ messages in thread
From: Chris Wilson @ 2020-06-09 10:48 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2020-06-09 08:47:00)
> 
> On 07/06/2020 23:20, Chris Wilson wrote:
> > Over the next couple of patches, we will want to lock all the modified
> > vma for relocation processing under a single ww_mutex. We neither want
> > to have to include the vma that are skipped (due to no modifications
> > required) nor do we want those to be marked as written too. So separate
> > out the reloc validation into an early step, which we can use both to
> > reject the execbuf before committing to making our changes, and to
> > filter out the unmodified vma.
> > 
> > This does introduce a second pass through the reloc[], but only if we
> > need to emit relocations.
> > 
> > v2: reuse the outer loop, not cut'n'paste.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >   .../gpu/drm/i915/gem/i915_gem_execbuffer.c    | 145 +++++++++++-------
> >   1 file changed, 86 insertions(+), 59 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > index 23db79b806db..01ab1e15a142 100644
> > --- a/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > +++ b/drivers/gpu/drm/i915/gem/i915_gem_execbuffer.c
> > @@ -911,9 +911,9 @@ static void eb_destroy(const struct i915_execbuffer *eb)
> >   
> >   static inline u64
> >   relocation_target(const struct drm_i915_gem_relocation_entry *reloc,
> > -               const struct i915_vma *target)
> > +               u64 target)
> >   {
> > -     return gen8_canonical_addr((int)reloc->delta + target->node.start);
> > +     return gen8_canonical_addr((int)reloc->delta + target);
> >   }
> >   
> >   static void reloc_cache_init(struct reloc_cache *cache,
> > @@ -1292,26 +1292,11 @@ static int __reloc_entry_gpu(struct i915_execbuffer *eb,
> >       return 0;
> >   }
> >   
> > -static u64
> > -relocate_entry(struct i915_execbuffer *eb,
> > -            struct i915_vma *vma,
> > -            const struct drm_i915_gem_relocation_entry *reloc,
> > -            const struct i915_vma *target)
> > -{
> > -     u64 target_addr = relocation_target(reloc, target);
> > -     int err;
> > -
> > -     err = __reloc_entry_gpu(eb, vma, reloc->offset, target_addr);
> > -     if (err)
> > -             return err;
> > -
> > -     return target->node.start | UPDATE;
> > -}
> > -
> > -static u64
> > -eb_relocate_entry(struct i915_execbuffer *eb,
> > -               struct eb_vma *ev,
> > -               const struct drm_i915_gem_relocation_entry *reloc)
> > +static int
> > +eb_reloc_prepare(struct i915_execbuffer *eb,
> > +              struct eb_vma *ev,
> > +              const struct drm_i915_gem_relocation_entry *reloc,
> > +              struct drm_i915_gem_relocation_entry __user *user)
> >   {
> >       struct drm_i915_private *i915 = eb->i915;
> >       struct eb_vma *target;
> > @@ -1389,6 +1374,32 @@ eb_relocate_entry(struct i915_execbuffer *eb,
> >               return -EINVAL;
> >       }
> >   
> > +     return 1;
> > +}
> > +
> > +static int
> > +eb_reloc_entry(struct i915_execbuffer *eb,
> > +            struct eb_vma *ev,
> > +            const struct drm_i915_gem_relocation_entry *reloc,
> > +            struct drm_i915_gem_relocation_entry __user *user)
> > +{
> > +     struct eb_vma *target;
> > +     u64 offset;
> > +     int err;
> > +
> > +     /* we've already hold a reference to all valid objects */
> > +     target = eb_get_vma(eb, reloc->target_handle);
> > +     if (unlikely(!target))
> > +             return -ENOENT;
> > +
> > +     /*
> > +      * If the relocation already has the right value in it, no
> > +      * more work needs to be done.
> > +      */
> > +     offset = gen8_canonical_addr(target->vma->node.start);
> > +     if (offset == reloc->presumed_offset)
> > +             return 0;
> > +
> 
> Haven't these reloc entries been removed from the list in the prepare phase?

No, we don't adjust the user reloc arrays; we only skip entire objects
that do not require relocs.
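
To sketch the convention the two-pass loop relies on (condensed from the
diff, with the tri-state return spelt out):

	/*
	 * eb_reloc_vma(eb, ev, eb_reloc_prepare) returns:
	 *   < 0: hard error, abort the execbuf
	 *   = 0: object needs no relocations, drop it from eb->relocs
	 *   > 0: relocations required, keep it for the emit pass
	 */
	list_for_each_entry_safe(ev, en, &eb->relocs, reloc_link) {
		err = eb_reloc_vma(eb, ev, eb_reloc_prepare);
		if (err < 0)
			return err;

		if (err == 0)
			list_del_init(&ev->reloc_link);
	}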
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH 01/28] drm/i915: Adjust the sentinel assert to match implementation
  2020-06-09 10:47           ` Chris Wilson
@ 2020-06-09 11:45             ` Tvrtko Ursulin
  0 siblings, 0 replies; 52+ messages in thread
From: Tvrtko Ursulin @ 2020-06-09 11:45 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 09/06/2020 11:47, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-06-09 11:39:11)
>>
>> On 09/06/2020 11:29, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2020-06-09 07:59:27)
>>>> On 08/06/2020 10:33, Chris Wilson wrote:
>>>>> Quoting Tvrtko Ursulin (2020-06-08 08:44:01)
>>>>>>
>>>>>> On 07/06/2020 23:20, Chris Wilson wrote:
>>>>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>>>
>>>>>>> Sentinels are supposed to be the last request in the elsp queue, not the
>>>>>>> only one, so adjust the assert accordingly.
>>>>>>>
>>>>>>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>>> ---
>>>>>>>      drivers/gpu/drm/i915/gt/intel_lrc.c | 14 +++-----------
>>>>>>>      1 file changed, 3 insertions(+), 11 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>>>>>> index d55a5e0466e5..db8a170b0e5c 100644
>>>>>>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
>>>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>>>>>> @@ -1635,9 +1635,9 @@ assert_pending_valid(const struct intel_engine_execlists *execlists,
>>>>>>>                  ccid = ce->lrc.ccid;
>>>>>>>      
>>>>>>>                  /*
>>>>>>> -              * Sentinels are supposed to be lonely so they flush the
>>>>>>> -              * current exection off the HW. Check that they are the
>>>>>>> -              * only request in the pending submission.
>>>>>>> +              * Sentinels are supposed to be the last request so they flush
>>>>>>> +              * the current exection off the HW. Check that they are the only
>>>>>>> +              * request in the pending submission.
>>>>>>>                   */
>>>>>>>                  if (sentinel) {
>>>>>>>                          GEM_TRACE_ERR("%s: context:%llx after sentinel in pending[%zd]\n",
>>>>>>> @@ -1646,15 +1646,7 @@ assert_pending_valid(const struct intel_engine_execlists *execlists,
>>>>>>>                                        port - execlists->pending);
>>>>>>>                          return false;
>>>>>>>                  }
>>>>>>> -
>>>>>>>                  sentinel = i915_request_has_sentinel(rq);
>>>>>>
>>>>>> FWIW I was changing it to "sentinel |= ..." so it keeps working if we
>>>>>> decide to use more than 2 elsp ports on Icelake one day.
>>>>>
>>>>> But it will always fail on the next port...
>>>>
>>>> I don't follow. Sentinel has to be last so if it fails on the next port
>>>> it is correct to do so, no?
>>>
>>> Exactly. We only check the first port after setting sentinel, if that
>>> port is occupied we fail. Hence why we don't need |=, since there is no
>>> continuation.
>>
>> But if more than two ports we also overwrite the bools so: sentinel,
>> non-sentinel, sentinel would not catch. I was just future proofing it. :)
> 
> [0] -> sentinel
> [1] != NULL -> ERROR
> 
> [0] -> not sentinel
> [1] -> sentinel
> [2] != NULL -> ERROR
> 
> We fail if anything comes after a sentinel.

:) Joke is on me.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH 03/28] drm/i915/selftests: Teach hang-self to target only itself
  2020-06-07 22:20 ` [Intel-gfx] [PATCH 03/28] drm/i915/selftests: Teach hang-self to target only itself Chris Wilson
@ 2020-06-10 13:21   ` Mika Kuoppala
  0 siblings, 0 replies; 52+ messages in thread
From: Mika Kuoppala @ 2020-06-10 13:21 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Chris Wilson

Chris Wilson <chris@chris-wilson.co.uk> writes:

> We have a test case to exercise resetting an engine while the other
> engines are busy; all the TEST_SELF adds on top is that the target
> engine also has background activity. In this case it is useful to first
> test resetting the engine while there is background activity, as a
> separate flag from exercising all others.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>


Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>

> ---
>  drivers/gpu/drm/i915/gt/selftest_hangcheck.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> index 035f363fb0f8..2af66f8ffbd2 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
> @@ -805,10 +805,10 @@ static int __igt_reset_engines(struct intel_gt *gt,
>  			threads[tmp].resets =
>  				i915_reset_engine_count(global, other);
>  
> -			if (!(flags & TEST_OTHERS))
> +			if (other == engine && !(flags & TEST_SELF))
>  				continue;
>  
> -			if (other == engine && !(flags & TEST_SELF))
> +			if (other != engine && !(flags & TEST_OTHERS))
>  				continue;
>  
>  			threads[tmp].engine = other;
> @@ -999,7 +999,7 @@ static int igt_reset_engines(void *arg)
>  		},
>  		{
>  			"self-priority",
> -			TEST_OTHERS | TEST_ACTIVE | TEST_PRIORITY | TEST_SELF,
> +			TEST_ACTIVE | TEST_PRIORITY | TEST_SELF,
>  		},
>  		{ }
>  	};
> -- 
> 2.20.1
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH 04/28] drm/i915/selftests: Remove live_suppress_wait_preempt
  2020-06-07 22:20 ` [Intel-gfx] [PATCH 04/28] drm/i915/selftests: Remove live_suppress_wait_preempt Chris Wilson
@ 2020-06-11 11:38   ` Tvrtko Ursulin
  0 siblings, 0 replies; 52+ messages in thread
From: Tvrtko Ursulin @ 2020-06-11 11:38 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 07/06/2020 23:20, Chris Wilson wrote:
> With the removal of the internal wait-priority boosting, we can also
> remove the selftest to ensure that those waits were being suppressed
> from causing preemptions.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

> ---
>   drivers/gpu/drm/i915/gt/selftest_lrc.c | 178 -------------------------
>   1 file changed, 178 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c
> index 67d74e6432a8..e838e38a262c 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
> @@ -2379,183 +2379,6 @@ static int live_suppress_self_preempt(void *arg)
>   	goto err_client_b;
>   }
>   
> -static int __i915_sw_fence_call
> -dummy_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
> -{
> -	return NOTIFY_DONE;
> -}
> -
> -static struct i915_request *dummy_request(struct intel_engine_cs *engine)
> -{
> -	struct i915_request *rq;
> -
> -	rq = kzalloc(sizeof(*rq), GFP_KERNEL);
> -	if (!rq)
> -		return NULL;
> -
> -	rq->engine = engine;
> -
> -	spin_lock_init(&rq->lock);
> -	INIT_LIST_HEAD(&rq->fence.cb_list);
> -	rq->fence.lock = &rq->lock;
> -	rq->fence.ops = &i915_fence_ops;
> -
> -	i915_sched_node_init(&rq->sched);
> -
> -	/* mark this request as permanently incomplete */
> -	rq->fence.seqno = 1;
> -	BUILD_BUG_ON(sizeof(rq->fence.seqno) != 8); /* upper 32b == 0 */
> -	rq->hwsp_seqno = (u32 *)&rq->fence.seqno + 1;
> -	GEM_BUG_ON(i915_request_completed(rq));
> -
> -	i915_sw_fence_init(&rq->submit, dummy_notify);
> -	set_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags);
> -
> -	spin_lock_init(&rq->lock);
> -	rq->fence.lock = &rq->lock;
> -	INIT_LIST_HEAD(&rq->fence.cb_list);
> -
> -	return rq;
> -}
> -
> -static void dummy_request_free(struct i915_request *dummy)
> -{
> -	/* We have to fake the CS interrupt to kick the next request */
> -	i915_sw_fence_commit(&dummy->submit);
> -
> -	i915_request_mark_complete(dummy);
> -	dma_fence_signal(&dummy->fence);
> -
> -	i915_sched_node_fini(&dummy->sched);
> -	i915_sw_fence_fini(&dummy->submit);
> -
> -	dma_fence_free(&dummy->fence);
> -}
> -
> -static int live_suppress_wait_preempt(void *arg)
> -{
> -	struct intel_gt *gt = arg;
> -	struct preempt_client client[4];
> -	struct i915_request *rq[ARRAY_SIZE(client)] = {};
> -	struct intel_engine_cs *engine;
> -	enum intel_engine_id id;
> -	int err = -ENOMEM;
> -	int i;
> -
> -	/*
> -	 * Waiters are given a little priority nudge, but not enough
> -	 * to actually cause any preemption. Double check that we do
> -	 * not needlessly generate preempt-to-idle cycles.
> -	 */
> -
> -	if (!HAS_LOGICAL_RING_PREEMPTION(gt->i915))
> -		return 0;
> -
> -	if (preempt_client_init(gt, &client[0])) /* ELSP[0] */
> -		return -ENOMEM;
> -	if (preempt_client_init(gt, &client[1])) /* ELSP[1] */
> -		goto err_client_0;
> -	if (preempt_client_init(gt, &client[2])) /* head of queue */
> -		goto err_client_1;
> -	if (preempt_client_init(gt, &client[3])) /* bystander */
> -		goto err_client_2;
> -
> -	for_each_engine(engine, gt, id) {
> -		int depth;
> -
> -		if (!intel_engine_has_preemption(engine))
> -			continue;
> -
> -		if (!engine->emit_init_breadcrumb)
> -			continue;
> -
> -		for (depth = 0; depth < ARRAY_SIZE(client); depth++) {
> -			struct i915_request *dummy;
> -
> -			engine->execlists.preempt_hang.count = 0;
> -
> -			dummy = dummy_request(engine);
> -			if (!dummy)
> -				goto err_client_3;
> -
> -			for (i = 0; i < ARRAY_SIZE(client); i++) {
> -				struct i915_request *this;
> -
> -				this = spinner_create_request(&client[i].spin,
> -							      client[i].ctx, engine,
> -							      MI_NOOP);
> -				if (IS_ERR(this)) {
> -					err = PTR_ERR(this);
> -					goto err_wedged;
> -				}
> -
> -				/* Disable NEWCLIENT promotion */
> -				__i915_active_fence_set(&i915_request_timeline(this)->last_request,
> -							&dummy->fence);
> -
> -				rq[i] = i915_request_get(this);
> -				i915_request_add(this);
> -			}
> -
> -			dummy_request_free(dummy);
> -
> -			GEM_BUG_ON(i915_request_completed(rq[0]));
> -			if (!igt_wait_for_spinner(&client[0].spin, rq[0])) {
> -				pr_err("%s: First client failed to start\n",
> -				       engine->name);
> -				goto err_wedged;
> -			}
> -			GEM_BUG_ON(!i915_request_started(rq[0]));
> -
> -			if (i915_request_wait(rq[depth],
> -					      I915_WAIT_PRIORITY,
> -					      1) != -ETIME) {
> -				pr_err("%s: Waiter depth:%d completed!\n",
> -				       engine->name, depth);
> -				goto err_wedged;
> -			}
> -
> -			for (i = 0; i < ARRAY_SIZE(client); i++) {
> -				igt_spinner_end(&client[i].spin);
> -				i915_request_put(rq[i]);
> -				rq[i] = NULL;
> -			}
> -
> -			if (igt_flush_test(gt->i915))
> -				goto err_wedged;
> -
> -			if (engine->execlists.preempt_hang.count) {
> -				pr_err("%s: Preemption recorded x%d, depth %d; should have been suppressed!\n",
> -				       engine->name,
> -				       engine->execlists.preempt_hang.count,
> -				       depth);
> -				err = -EINVAL;
> -				goto err_client_3;
> -			}
> -		}
> -	}
> -
> -	err = 0;
> -err_client_3:
> -	preempt_client_fini(&client[3]);
> -err_client_2:
> -	preempt_client_fini(&client[2]);
> -err_client_1:
> -	preempt_client_fini(&client[1]);
> -err_client_0:
> -	preempt_client_fini(&client[0]);
> -	return err;
> -
> -err_wedged:
> -	for (i = 0; i < ARRAY_SIZE(client); i++) {
> -		igt_spinner_end(&client[i].spin);
> -		i915_request_put(rq[i]);
> -	}
> -	intel_gt_set_wedged(gt);
> -	err = -EIO;
> -	goto err_client_3;
> -}
> -
>   static int live_chain_preempt(void *arg)
>   {
>   	struct intel_gt *gt = arg;
> @@ -4592,7 +4415,6 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
>   		SUBTEST(live_nopreempt),
>   		SUBTEST(live_preempt_cancel),
>   		SUBTEST(live_suppress_self_preempt),
> -		SUBTEST(live_suppress_wait_preempt),
>   		SUBTEST(live_chain_preempt),
>   		SUBTEST(live_preempt_gang),
>   		SUBTEST(live_preempt_timeout),
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH 05/28] drm/i915/selftests: Trim execlists runtime
  2020-06-07 22:20 ` [Intel-gfx] [PATCH 05/28] drm/i915/selftests: Trim execlists runtime Chris Wilson
@ 2020-06-12 23:05   ` Andi Shyti
  0 siblings, 0 replies; 52+ messages in thread
From: Andi Shyti @ 2020-06-12 23:05 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

Hi Chris,

On Sun, Jun 07, 2020 at 11:20:45PM +0100, Chris Wilson wrote:
> Reduce the smoke depth by trimming the number of contexts, repetitions
> and wait times. This is in preparation for a less greedy scheduler that
> tries to be fair across contexts, resulting in a great many more context
> switches. A thousand context switches may be 50-100ms, causing us to
> time out as the HW is not fast enough to complete the deep smoketests.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>

looks all right to me:

Reviewed-by: Andi Shyti <andi.shyti@intel.com>

Andi
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH 26/28] drm/i915: Fair low-latency scheduling
  2020-06-07 22:21 ` [Intel-gfx] [PATCH 26/28] drm/i915: Fair low-latency scheduling Chris Wilson
@ 2020-06-16  9:07   ` Thomas Hellström (Intel)
  2020-06-16 10:12     ` Chris Wilson
  2020-06-16 10:54     ` Chris Wilson
  0 siblings, 2 replies; 52+ messages in thread
From: Thomas Hellström (Intel) @ 2020-06-16  9:07 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Hi, Chris,

Some comments and questions:

On 6/8/20 12:21 AM, Chris Wilson wrote:
> The first "scheduler" was a topographical sorting of requests into
> priority order. The execution order was deterministic, the earliest
> submitted, highest priority request would be executed first. Priority
> inheritance ensured that inversions were kept at bay, and allowed us to
> dynamically boost priorities (e.g. for interactive pageflips).
>
> The minimalistic timeslicing scheme was an attempt to introduce fairness
> between long running requests, by evicting the active request at the end
> of a timeslice and moving it to the back of its priority queue (while
> ensuring that dependencies were kept in order). For short running
> requests from many clients of equal priority, the scheme is still very
> much FIFO submission ordering, and as unfair as before.
>
> To impose fairness, we need an external metric that ensures that clients
> are interspersed; we don't execute one long chain from client A before
> executing any of client B. This could be imposed by the clients by using
> fences based on an external clock; that is, they only submit work for a
> "frame" at frame-interval, instead of submitting as much work as they
> are able to. The standard SwapBuffers approach is akin to double
> buffering, where, as one frame is being executed, the next is being
> submitted, such that there is always a maximum of two frames per client
> in the pipeline. Even this scheme exhibits unfairness under load as a
> single client will execute two frames back to back before the next, and
> with enough clients, deadlines will be missed.
>
> The idea introduced by BFS/MuQSS is that fairness is introduced by
> metering with an external clock. Every request, when it becomes ready to
> execute, is assigned a virtual deadline, and execution order is then
> determined by earliest deadline. Priority is used as a hint, rather than
> strict ordering, where high priority requests have earlier deadlines,
> but not necessarily earlier than outstanding work. Thus work is executed
> in order of 'readiness', with timeslicing to demote long running work.
>
> The Achilles' heel of this scheduler is its strong preference for
> low-latency and favouring of new queues. Whereas it was easy to dominate
> the old scheduler by flooding it with many requests over a short period
> of time, the new scheduler can be dominated by a 'synchronous' client
> that waits for each of its requests to complete before submitting the
> next. As such a client has no history, it is always considered
> ready-to-run and receives an earlier deadline than the long running
> requests.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  12 +-
>   .../gpu/drm/i915/gt/intel_engine_heartbeat.c  |   1 +
>   drivers/gpu/drm/i915/gt/intel_engine_pm.c     |   4 +-
>   drivers/gpu/drm/i915/gt/intel_engine_types.h  |  24 --
>   drivers/gpu/drm/i915/gt/intel_lrc.c           | 328 +++++++-----------
>   drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |   5 +-
>   drivers/gpu/drm/i915/gt/selftest_lrc.c        |  43 ++-
>   .../gpu/drm/i915/gt/uc/intel_guc_submission.c |   6 +-
>   drivers/gpu/drm/i915/i915_priolist_types.h    |   7 +-
>   drivers/gpu/drm/i915/i915_request.h           |   4 +-
>   drivers/gpu/drm/i915/i915_scheduler.c         | 322 ++++++++++++-----
>   drivers/gpu/drm/i915/i915_scheduler.h         |  22 +-
>   drivers/gpu/drm/i915/i915_scheduler_types.h   |  17 +
>   .../drm/i915/selftests/i915_mock_selftests.h  |   1 +
>   drivers/gpu/drm/i915/selftests/i915_request.c |   1 +
>   .../gpu/drm/i915/selftests/i915_scheduler.c   |  49 +++
>   16 files changed, 484 insertions(+), 362 deletions(-)
>   create mode 100644 drivers/gpu/drm/i915/selftests/i915_scheduler.c

Do we have timings to back this change up? Would it make sense to have a 
configurable scheduler choice?

> @@ -1096,22 +1099,30 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
>   {
>   	struct i915_request *rq, *rn, *active = NULL;
>   	struct list_head *uninitialized_var(pl);
> -	int prio = I915_PRIORITY_INVALID;
> +	u64 deadline = I915_DEADLINE_NEVER;
>   
>   	lockdep_assert_held(&engine->active.lock);
>   
>   	list_for_each_entry_safe_reverse(rq, rn,
>   					 &engine->active.requests,
>   					 sched.link) {
> -		if (i915_request_completed(rq))
> +		if (i915_request_completed(rq)) {
> +			list_del_init(&rq->sched.link);
>   			continue; /* XXX */
> +		}

Is this an unrelated change? If so separate patch?

...


> @@ -2162,14 +2140,13 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   			__unwind_incomplete_requests(engine);
>   
>   			last = NULL;
> -		} else if (need_timeslice(engine, last, ve) &&
> -			   timeslice_expired(execlists, last)) {
> +		} else if (timeslice_expired(engine, last)) {
>   			ENGINE_TRACE(engine,
> -				     "expired last=%llx:%lld, prio=%d, hint=%d, yield?=%s\n",
> -				     last->fence.context,
> -				     last->fence.seqno,
> -				     last->sched.attr.priority,
> -				     execlists->queue_priority_hint,
> +				     "expired:%s last=%llx:%llu, deadline=%llu, now=%llu, yield?=%s\n",
> +				     yesno(timer_expired(&execlists->timer)),
> +				     last->fence.context, last->fence.seqno,
> +				     rq_deadline(last),
> +				     i915_sched_to_ticks(ktime_get()),
>   				     yesno(timeslice_yield(execlists, last)));

There are multiple introductions of ktime_get() in the patch. Perhaps 
use monotonic clock source like ktime_get_raw()? Also immediately 
convert to ns.

...

> @@ -2837,10 +2788,7 @@ static void __execlists_unhold(struct i915_request *rq)
>   		GEM_BUG_ON(!i915_sw_fence_signaled(&rq->submit));
>   
>   		i915_request_clear_hold(rq);
> -		list_move_tail(&rq->sched.link,
> -			       i915_sched_lookup_priolist(rq->engine,
> -							  rq_prio(rq)));
> -		set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> +		submit |= intel_engine_queue_request(rq->engine, rq);

As new to this codebase, I immediately wonder whether that bitwise or is 
intentional and whether you got the short-circuiting right. It looks 
correct to me.

...

> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> index 118ab6650d1f..23594e712292 100644
> --- a/drivers/gpu/drm/i915/i915_request.h
> +++ b/drivers/gpu/drm/i915/i915_request.h
> @@ -561,7 +561,7 @@ static inline void i915_request_clear_hold(struct i915_request *rq)
>   }
>   
>   static inline struct intel_timeline *
> -i915_request_timeline(struct i915_request *rq)
> +i915_request_timeline(const struct i915_request *rq)
>   {
>   	/* Valid only while the request is being constructed (or retired). */
>   	return rcu_dereference_protected(rq->timeline,
> @@ -576,7 +576,7 @@ i915_request_gem_context(struct i915_request *rq)
>   }
>   
>   static inline struct intel_timeline *
> -i915_request_active_timeline(struct i915_request *rq)
> +i915_request_active_timeline(const struct i915_request *rq)

Are these unrelated? Separate patch?



>   {
>   	/*
>   	 * When in use during submission, we are protected by a guarantee that
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> index 4c189b81cc62..30bcb6f9d99f 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.c
> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> @@ -20,6 +20,11 @@ static struct i915_global_scheduler {
>   static DEFINE_SPINLOCK(ipi_lock);
>   static LIST_HEAD(ipi_list);
>   
> +static inline u64 rq_deadline(const struct i915_request *rq)
> +{
> +	return READ_ONCE(rq->sched.deadline);
> +}
> +

Does this need a release barrier paired with an acquire barrier in 
__i915_request_set_deadline below?

> +
> +static bool __i915_request_set_deadline(struct i915_request *rq, u64 deadline)
> +{
> +	struct intel_engine_cs *engine = rq->engine;
> +	struct i915_request *rn;
> +	struct list_head *plist;
> +	LIST_HEAD(dfs);
> +
> +	lockdep_assert_held(&engine->active.lock);
> +	list_add(&rq->sched.dfs, &dfs);
> +
> +	list_for_each_entry(rq, &dfs, sched.dfs) {
> +		struct i915_dependency *p;
> +
> +		GEM_BUG_ON(rq->engine != engine);
> +
> +		for_each_signaler(p, rq) {
> +			struct i915_request *s =
> +				container_of(p->signaler, typeof(*s), sched);
> +
> +			GEM_BUG_ON(s == rq);
> +
> +			if (rq_deadline(s) <= deadline)
> +				continue;
> +
> +			if (i915_request_completed(s))
> +				continue;
> +
> +			if (s->engine != rq->engine) {
> +				spin_lock(&ipi_lock);
> +				if (deadline < p->ipi_deadline) {
> +					p->ipi_deadline = deadline;
> +					list_move(&p->ipi_link, &ipi_list);
> +					irq_work_queue(&ipi_work);
> +				}
> +				spin_unlock(&ipi_lock);
> +				continue;
> +			}
> +
> +			list_move_tail(&s->sched.dfs, &dfs);
> +		}
> +	}
> +
> +	plist = i915_sched_lookup_priolist(engine, deadline);
> +
> +	/* Fifo and depth-first replacement ensure our deps execute first */
> +	list_for_each_entry_safe_reverse(rq, rn, &dfs, sched.dfs) {
> +		GEM_BUG_ON(rq->engine != engine);
> +		GEM_BUG_ON(deadline > rq_deadline(rq));
> +
> +		INIT_LIST_HEAD(&rq->sched.dfs);
> +		WRITE_ONCE(rq->sched.deadline, deadline);

An smp barrier needed?

...

> +static u64 prio_slice(int prio)
>   {
> -	const struct i915_request *inflight;
> +	u64 slice;
> +	int sf;
>   
>   	/*
> -	 * We only need to kick the tasklet once for the high priority
> -	 * new context we add into the queue.
> +	 * With a 1ms scheduling quantum:
> +	 *
> +	 *   MAX USER:  ~32us deadline
> +	 *   0:         ~16ms deadline
> +	 *   MIN_USER: 1000ms deadline
>   	 */
> -	if (prio <= engine->execlists.queue_priority_hint)
> -		return;
>   
> -	rcu_read_lock();
> +	if (prio >= __I915_PRIORITY_KERNEL__)
> +		return INT_MAX - prio;
>   
> -	/* Nothing currently active? We're overdue for a submission! */
> -	inflight = execlists_active(&engine->execlists);
> -	if (!inflight)
> -		goto unlock;
> +	slice = __I915_PRIORITY_KERNEL__ - prio;
> +	if (prio >= 0)
> +		sf = 20 - 6;
> +	else
> +		sf = 20 - 1;
> +
> +	return slice << sf;
> +}
> +

Is this the same deadline calculation as used in the BFS? Could you 
perhaps add a pointer to some documentation?


/Thomas


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH 26/28] drm/i915: Fair low-latency scheduling
  2020-06-16  9:07   ` Thomas Hellström (Intel)
@ 2020-06-16 10:12     ` Chris Wilson
  2020-06-16 12:11       ` Thomas Hellström (Intel)
  2020-06-16 10:54     ` Chris Wilson
  1 sibling, 1 reply; 52+ messages in thread
From: Chris Wilson @ 2020-06-16 10:12 UTC (permalink / raw)
  To: Thomas Hellström, intel-gfx

Quoting Thomas Hellström (Intel) (2020-06-16 10:07:28)
> Hi, Chris,
> 
> Some comments and questions:
> 
> On 6/8/20 12:21 AM, Chris Wilson wrote:
> > The first "scheduler" was a topographical sorting of requests into
> > priority order. The execution order was deterministic, the earliest
> > submitted, highest priority request would be executed first. Priority
> > inheritance ensured that inversions were kept at bay, and allowed us to
> > dynamically boost priorities (e.g. for interactive pageflips).
> >
> > The minimalistic timeslicing scheme was an attempt to introduce fairness
> > between long running requests, by evicting the active request at the end
> > of a timeslice and moving it to the back of its priority queue (while
> > ensuring that dependencies were kept in order). For short running
> > requests from many clients of equal priority, the scheme is still very
> > much FIFO submission ordering, and as unfair as before.
> >
> > To impose fairness, we need an external metric that ensures that clients
> > are interspersed; we don't execute one long chain from client A before
> > executing any of client B. This could be imposed by the clients by using
> > fences based on an external clock; that is, they only submit work for a
> > "frame" at frame-interval, instead of submitting as much work as they
> > are able to. The standard SwapBuffers approach is akin to double
> > buffering, where, as one frame is being executed, the next is being
> > submitted, such that there is always a maximum of two frames per client
> > in the pipeline. Even this scheme exhibits unfairness under load as a
> > single client will execute two frames back to back before the next, and
> > with enough clients, deadlines will be missed.
> >
> > The idea introduced by BFS/MuQSS is that fairness is introduced by
> > metering with an external clock. Every request, when it becomes ready to
> > execute, is assigned a virtual deadline, and execution order is then
> > determined by earliest deadline. Priority is used as a hint, rather than
> > strict ordering, where high priority requests have earlier deadlines,
> > but not necessarily earlier than outstanding work. Thus work is executed
> > in order of 'readiness', with timeslicing to demote long running work.
> >
> > The Achilles' heel of this scheduler is its strong preference for
> > low-latency and favouring of new queues. Whereas it was easy to dominate
> > the old scheduler by flooding it with many requests over a short period
> > of time, the new scheduler can be dominated by a 'synchronous' client
> > that waits for each of its requests to complete before submitting the
> > next. As such a client has no history, it is always considered
> > ready-to-run and receives an earlier deadline than the long running
> > requests.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >   drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  12 +-
> >   .../gpu/drm/i915/gt/intel_engine_heartbeat.c  |   1 +
> >   drivers/gpu/drm/i915/gt/intel_engine_pm.c     |   4 +-
> >   drivers/gpu/drm/i915/gt/intel_engine_types.h  |  24 --
> >   drivers/gpu/drm/i915/gt/intel_lrc.c           | 328 +++++++-----------
> >   drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |   5 +-
> >   drivers/gpu/drm/i915/gt/selftest_lrc.c        |  43 ++-
> >   .../gpu/drm/i915/gt/uc/intel_guc_submission.c |   6 +-
> >   drivers/gpu/drm/i915/i915_priolist_types.h    |   7 +-
> >   drivers/gpu/drm/i915/i915_request.h           |   4 +-
> >   drivers/gpu/drm/i915/i915_scheduler.c         | 322 ++++++++++++-----
> >   drivers/gpu/drm/i915/i915_scheduler.h         |  22 +-
> >   drivers/gpu/drm/i915/i915_scheduler_types.h   |  17 +
> >   .../drm/i915/selftests/i915_mock_selftests.h  |   1 +
> >   drivers/gpu/drm/i915/selftests/i915_request.c |   1 +
> >   .../gpu/drm/i915/selftests/i915_scheduler.c   |  49 +++
> >   16 files changed, 484 insertions(+), 362 deletions(-)
> >   create mode 100644 drivers/gpu/drm/i915/selftests/i915_scheduler.c
> 
> Do we have timings to back this change up? Would it make sense to have a 
> configurable scheduler choice?

Yes, there's igt/benchmarks/gem_wsim to show the impact on scheduling
decisions for various workloads. (You can guess what the impact of
choosing a different execution order and forcing more context switches
will be... About -1% to throughput with multiple clients) And
igt/tests/gem_exec_schedule to test basic properties, with a bunch of new
fairness tests to try and decide if this is the right thing. Under
saturated conditions, there is no contest, a fair scheduler produces
consistent results, and the vdeadlines allow for realtime-response under
load.
 
> > @@ -1096,22 +1099,30 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
> >   {
> >       struct i915_request *rq, *rn, *active = NULL;
> >       struct list_head *uninitialized_var(pl);
> > -     int prio = I915_PRIORITY_INVALID;
> > +     u64 deadline = I915_DEADLINE_NEVER;
> >   
> >       lockdep_assert_held(&engine->active.lock);
> >   
> >       list_for_each_entry_safe_reverse(rq, rn,
> >                                        &engine->active.requests,
> >                                        sched.link) {
> > -             if (i915_request_completed(rq))
> > +             if (i915_request_completed(rq)) {
> > +                     list_del_init(&rq->sched.link);
> >                       continue; /* XXX */
> > +             }
> 
> Is this an unrelated change? If so separate patch?

It's not totally unrelated :)

> > @@ -2162,14 +2140,13 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >                       __unwind_incomplete_requests(engine);
> >   
> >                       last = NULL;
> > -             } else if (need_timeslice(engine, last, ve) &&
> > -                        timeslice_expired(execlists, last)) {
> > +             } else if (timeslice_expired(engine, last)) {
> >                       ENGINE_TRACE(engine,
> > -                                  "expired last=%llx:%lld, prio=%d, hint=%d, yield?=%s\n",
> > -                                  last->fence.context,
> > -                                  last->fence.seqno,
> > -                                  last->sched.attr.priority,
> > -                                  execlists->queue_priority_hint,
> > +                                  "expired:%s last=%llx:%llu, deadline=%llu, now=%llu, yield?=%s\n",
> > +                                  yesno(timer_expired(&execlists->timer)),
> > +                                  last->fence.context, last->fence.seqno,
> > +                                  rq_deadline(last),
> > +                                  i915_sched_to_ticks(ktime_get()),
> >                                    yesno(timeslice_yield(execlists, last)));
> 
> There are multiple introductions of ktime_get() in the patch. Perhaps 
> use monotonic clock source like ktime_get_raw()? Also immediately 
> convert to ns.

ktime_get() is monotonic. The only difference is that tkr_mono has a
wall-offset that tkr_raw does not. [I'm sure there's a good reason.] The
choice is really whether ktime_get_(mono|raw)_fast_ns() is sufficient for
our needs.

I do like the idea of having the deadline being some recognisable
timestamp, as it makes it easier to play with mixing in real, albeit
soft, deadlines.
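
e.g. a hypothetical stamping (the exact combination of helpers is my
assumption, not the patch as posted):

	/*
	 * Assuming prio_slice() is in ns: the virtual deadline is the
	 * readiness timestamp plus the priority-derived slice.
	 */
	rq->sched.deadline =
		i915_sched_to_ticks(ktime_add_ns(ktime_get(),
						 prio_slice(rq_prio(rq))));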

> > @@ -2837,10 +2788,7 @@ static void __execlists_unhold(struct i915_request *rq)
> >               GEM_BUG_ON(!i915_sw_fence_signaled(&rq->submit));
> >   
> >               i915_request_clear_hold(rq);
> > -             list_move_tail(&rq->sched.link,
> > -                            i915_sched_lookup_priolist(rq->engine,
> > -                                                       rq_prio(rq)));
> > -             set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> > +             submit |= intel_engine_queue_request(rq->engine, rq);
> 
> As new to this codebase, I immediately wonder whether that bitwise or is 
> intentional and whether you got the short-circuiting right. It looks 
> correct to me.

bool submit, not many bits :)
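
To spell it out: |= always evaluates its right-hand side, so every
unheld request is queued, and the bool merely accumulates whether any
of them needs a submission kick. Roughly (an illustrative sketch; the
list name and the tasklet kick are assumptions):

	bool submit = false;

	list_for_each_entry(rq, &unhold_list, sched.link)
		submit |= intel_engine_queue_request(rq->engine, rq);

	if (submit)
		tasklet_hi_schedule(&engine->execlists.tasklet);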

> > diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> > index 118ab6650d1f..23594e712292 100644
> > --- a/drivers/gpu/drm/i915/i915_request.h
> > +++ b/drivers/gpu/drm/i915/i915_request.h
> > @@ -561,7 +561,7 @@ static inline void i915_request_clear_hold(struct i915_request *rq)
> >   }
> >   
> >   static inline struct intel_timeline *
> > -i915_request_timeline(struct i915_request *rq)
> > +i915_request_timeline(const struct i915_request *rq)
> >   {
> >       /* Valid only while the request is being constructed (or retired). */
> >       return rcu_dereference_protected(rq->timeline,
> > @@ -576,7 +576,7 @@ i915_request_gem_context(struct i915_request *rq)
> >   }
> >   
> >   static inline struct intel_timeline *
> > -i915_request_active_timeline(struct i915_request *rq)
> > +i915_request_active_timeline(const struct i915_request *rq)
> 
> Are these unrelated? Separate patch?

They were used at one point, when I had a const request.

> >   {
> >       /*
> >        * When in use during submission, we are protected by a guarantee that
> > diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> > index 4c189b81cc62..30bcb6f9d99f 100644
> > --- a/drivers/gpu/drm/i915/i915_scheduler.c
> > +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> > @@ -20,6 +20,11 @@ static struct i915_global_scheduler {
> >   static DEFINE_SPINLOCK(ipi_lock);
> >   static LIST_HEAD(ipi_list);
> >   
> > +static inline u64 rq_deadline(const struct i915_request *rq)
> > +{
> > +     return READ_ONCE(rq->sched.deadline);
> > +}
> > +
> 
> Does this need a release barrier paired with an acquire barrier in 
> __i915_request_set_deadline below?

No, the state can be inconsistent. If it changes as we are processing
the previous value, there will be another reschedule. Within
set_deadline, rq->sched.deadline is under the engine->active.lock; it is
just that rq_deadline() is used to peek before we take the lock, as well
as shorthand within the critical section.
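
Schematically (an illustrative sketch of the pattern, not verbatim):

	/* Unlocked peek: a heuristic early-out, allowed to be stale. */
	if (rq_deadline(rq) <= deadline)
		return;

	spin_lock_irqsave(&engine->active.lock, flags);
	/* Authoritative reads/writes of rq->sched.deadline happen here. */
	__i915_request_set_deadline(rq, deadline);
	spin_unlock_irqrestore(&engine->active.lock, flags);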
 
> > +static bool __i915_request_set_deadline(struct i915_request *rq, u64 deadline)
> > +{
> > +     struct intel_engine_cs *engine = rq->engine;
> > +     struct i915_request *rn;
> > +     struct list_head *plist;
> > +     LIST_HEAD(dfs);
> > +
> > +     lockdep_assert_held(&engine->active.lock);
> > +     list_add(&rq->sched.dfs, &dfs);
> > +
> > +     list_for_each_entry(rq, &dfs, sched.dfs) {
> > +             struct i915_dependency *p;
> > +
> > +             GEM_BUG_ON(rq->engine != engine);
> > +
> > +             for_each_signaler(p, rq) {
> > +                     struct i915_request *s =
> > +                             container_of(p->signaler, typeof(*s), sched);
> > +
> > +                     GEM_BUG_ON(s == rq);
> > +
> > +                     if (rq_deadline(s) <= deadline)
> > +                             continue;
> > +
> > +                     if (i915_request_completed(s))
> > +                             continue;
> > +
> > +                     if (s->engine != rq->engine) {
> > +                             spin_lock(&ipi_lock);
> > +                             if (deadline < p->ipi_deadline) {
> > +                                     p->ipi_deadline = deadline;
> > +                                     list_move(&p->ipi_link, &ipi_list);
> > +                                     irq_work_queue(&ipi_work);
> > +                             }
> > +                             spin_unlock(&ipi_lock);
> > +                             continue;
> > +                     }
> > +
> > +                     list_move_tail(&s->sched.dfs, &dfs);
> > +             }
> > +     }
> > +
> > +     plist = i915_sched_lookup_priolist(engine, deadline);
> > +
> > +     /* Fifo and depth-first replacement ensure our deps execute first */
> > +     list_for_each_entry_safe_reverse(rq, rn, &dfs, sched.dfs) {
> > +             GEM_BUG_ON(rq->engine != engine);
> > +             GEM_BUG_ON(deadline > rq_deadline(rq));
> > +
> > +             INIT_LIST_HEAD(&rq->sched.dfs);
> > +             WRITE_ONCE(rq->sched.deadline, deadline);
> 
> An smp barrier needed?

No. It is locked by engine->active.lock, with a couple of peeks before
the lock that do not require serialisation with other changes.

> > +static u64 prio_slice(int prio)
> >   {
> > -     const struct i915_request *inflight;
> > +     u64 slice;
> > +     int sf;
> >   
> >       /*
> > -      * We only need to kick the tasklet once for the high priority
> > -      * new context we add into the queue.
> > +      * With a 1ms scheduling quantum:
> > +      *
> > +      *   MAX USER:  ~32us deadline
> > +      *   0:         ~16ms deadline
> > +      *   MIN_USER: 1000ms deadline
> >        */
> > -     if (prio <= engine->execlists.queue_priority_hint)
> > -             return;
> >   
> > -     rcu_read_lock();
> > +     if (prio >= __I915_PRIORITY_KERNEL__)
> > +             return INT_MAX - prio;
> >   
> > -     /* Nothing currently active? We're overdue for a submission! */
> > -     inflight = execlists_active(&engine->execlists);
> > -     if (!inflight)
> > -             goto unlock;
> > +     slice = __I915_PRIORITY_KERNEL__ - prio;
> > +     if (prio >= 0)
> > +             sf = 20 - 6;
> > +     else
> > +             sf = 20 - 1;
> > +
> > +     return slice << sf;
> > +}
> > +
> 
> Is this the same deadline calculation as used in the BFS? Could you 
> perhaps add a pointer to some documentation?

It is a heuristic. The scale factor in BFS is designed for a smaller
range and is not effective for passing our existing priority ordering
tests.

The challenge is to pick something that is fair that roughly matches
usage. It basically says that if client A submits 3 requests, then
client B, C will be able to run before the later requests of client A so
long as they are submitted within 16ms. Currently we get AAABC,
the vdeadlines turn that into ABCAA. So we would ideally like the quota
for each client to reflect their needs, so if client A needed all 3
requests within 16ms, it would have a vdeadline closer to 5ms (and so it
would compete for the GPU against other clients). Now with this less
strict priority system we can let normal userspace bump their
priorities, or we can use the average context runtime to try and adjust
priorities on the fly (i.e. do not use an unbiased quota). But I suspect
removing any fairness will skew the scheduler once more.
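
For concreteness, plugging numbers into prio_slice() (assuming
__I915_PRIORITY_KERNEL__ is ~1024 and the result is in ns; both are
assumptions for illustration):

	prio     0: slice = 1024; 1024 << (20 - 6) ns ~= 16.8ms ("~16ms")
	prio  1023: slice =    1;    1 << (20 - 6) ns ~= 16us   (order of "~32us")
	prio -1023: slice = 2047; 2047 << (20 - 1) ns ~= 1.07s  ("1000ms")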
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH 26/28] drm/i915: Fair low-latency scheduling
  2020-06-16  9:07   ` Thomas Hellström (Intel)
  2020-06-16 10:12     ` Chris Wilson
@ 2020-06-16 10:54     ` Chris Wilson
  1 sibling, 0 replies; 52+ messages in thread
From: Chris Wilson @ 2020-06-16 10:54 UTC (permalink / raw)
  To: Thomas Hellström, intel-gfx

Quoting Thomas Hellström (Intel) (2020-06-16 10:07:28)
> Hi, Chris,
> 
> Some comments and questions:
> 
> On 6/8/20 12:21 AM, Chris Wilson wrote:
> > The first "scheduler" was a topographical sorting of requests into
> > priority order. The execution order was deterministic, the earliest
> > submitted, highest priority request would be executed first. Priority
> > inheritance ensured that inversions were kept at bay, and allowed us to
> > dynamically boost priorities (e.g. for interactive pageflips).
> >
> > The minimalistic timeslicing scheme was an attempt to introduce fairness
> > between long running requests, by evicting the active request at the end
> > of a timeslice and moving it to the back of its priority queue (while
> > ensuring that dependencies were kept in order). For short running
> > requests from many clients of equal priority, the scheme is still very
> > much FIFO submission ordering, and as unfair as before.
> >
> > To impose fairness, we need an external metric that ensures that clients
> > are interspersed; we don't execute one long chain from client A before
> > executing any of client B. This could be imposed by the clients by using
> > fences based on an external clock; that is, they only submit work for a
> > "frame" at frame-interval, instead of submitting as much work as they
> > are able to. The standard SwapBuffers approach is akin to double
> > buffering, where, as one frame is being executed, the next is being
> > submitted, such that there is always a maximum of two frames per client
> > in the pipeline. Even this scheme exhibits unfairness under load as a
> > single client will execute two frames back to back before the next, and
> > with enough clients, deadlines will be missed.
> >
> > The idea introduced by BFS/MuQSS is that fairness is introduced by
> > metering with an external clock. Every request, when it becomes ready to
> > execute, is assigned a virtual deadline, and execution order is then
> > determined by earliest deadline. Priority is used as a hint, rather than
> > strict ordering, where high priority requests have earlier deadlines,
> > but not necessarily earlier than outstanding work. Thus work is executed
> > in order of 'readiness', with timeslicing to demote long running work.
> >
> > The Achilles' heel of this scheduler is its strong preference for
> > low-latency and favouring of new queues. Whereas it was easy to dominate
> > the old scheduler by flooding it with many requests over a short period
> > of time, the new scheduler can be dominated by a 'synchronous' client
> > that waits for each of its requests to complete before submitting the
> > next. As such a client has no history, it is always considered
> > ready-to-run and receives an earlier deadline than the long running
> > requests.
> >
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >   drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  12 +-
> >   .../gpu/drm/i915/gt/intel_engine_heartbeat.c  |   1 +
> >   drivers/gpu/drm/i915/gt/intel_engine_pm.c     |   4 +-
> >   drivers/gpu/drm/i915/gt/intel_engine_types.h  |  24 --
> >   drivers/gpu/drm/i915/gt/intel_lrc.c           | 328 +++++++-----------
> >   drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |   5 +-
> >   drivers/gpu/drm/i915/gt/selftest_lrc.c        |  43 ++-
> >   .../gpu/drm/i915/gt/uc/intel_guc_submission.c |   6 +-
> >   drivers/gpu/drm/i915/i915_priolist_types.h    |   7 +-
> >   drivers/gpu/drm/i915/i915_request.h           |   4 +-
> >   drivers/gpu/drm/i915/i915_scheduler.c         | 322 ++++++++++++-----
> >   drivers/gpu/drm/i915/i915_scheduler.h         |  22 +-
> >   drivers/gpu/drm/i915/i915_scheduler_types.h   |  17 +
> >   .../drm/i915/selftests/i915_mock_selftests.h  |   1 +
> >   drivers/gpu/drm/i915/selftests/i915_request.c |   1 +
> >   .../gpu/drm/i915/selftests/i915_scheduler.c   |  49 +++
> >   16 files changed, 484 insertions(+), 362 deletions(-)
> >   create mode 100644 drivers/gpu/drm/i915/selftests/i915_scheduler.c
> 
> Do we have timings to back this change up? Would it make sense to have a 
> configurable scheduler choice?

gem_wsim workloads with different load balancers, varying the number of
clients, % variation from previous patch.
+mB--------------------------------------------------------------------+
|                               a                                      |
|                             cda                                      |
|                             c.a                                      |
|                             ..aa                                     |
|                           ..---.                                     |
|                           -.--+-.                                    |
|                        .c.-.-+++.  b                                 |
|               b    bb.d-c-+--+++.aab aa    b b                       |
|b  b   b   b  b.  b ..---+++-+++++....a. b. b b   b       b    b     b|
|                               A|                                     |
|                         |___AM____|                                  |
|                            |A__|                                     |
|                            |MA_|                                     |
+----------------------------------------------------------------------+

Clients     N           Min           Max        Median           Avg        Stddev
1          63          -8.2           5.4        -0.045      -0.02375   0.094722134
2          63        -15.96         19.28         -0.64         -1.05     2.2428076
4          63         -5.11          2.95         -1.15    -1.0683333    0.72382651
8          63         -5.63          1.85        -0.905   -0.87122449    0.73390971

The wildest swings there do appear to be a result of interrupt latency;
the ~-1% impact comes from the changed execution order and the extra
context switching.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH 26/28] drm/i915: Fair low-latency scheduling
  2020-06-16 10:12     ` Chris Wilson
@ 2020-06-16 12:11       ` Thomas Hellström (Intel)
  2020-06-16 12:44         ` Chris Wilson
  0 siblings, 1 reply; 52+ messages in thread
From: Thomas Hellström (Intel) @ 2020-06-16 12:11 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Hi,

On 6/16/20 12:12 PM, Chris Wilson wrote:
> Quoting Thomas Hellström (Intel) (2020-06-16 10:07:28)
>> Hi, Chris,
>>
>> Some comments and questions:
>>
>> On 6/8/20 12:21 AM, Chris Wilson wrote:
>>> The first "scheduler" was a topographical sorting of requests into
>>> priority order. The execution order was deterministic, the earliest
>>> submitted, highest priority request would be executed first. Priority
>>> inherited ensured that inversions were kept at bay, and allowed us to
>>> dynamically boost priorities (e.g. for interactive pageflips).
>>>
>>> The minimalistic timeslicing scheme was an attempt to introduce fairness
>>> between long running requests, by evicting the active request at the end
>>> of a timeslice and moving it to the back of its priority queue (while
>>> ensuring that dependencies were kept in order). For short running
>>> requests from many clients of equal priority, the scheme is still very
>>> much FIFO submission ordering, and as unfair as before.
>>>
>>> To impose fairness, we need an external metric that ensures that clients
>>> are interspersed, so we don't execute one long chain from client A before
>>> executing any of client B. This could be imposed by the clients by using
>>> fences based on an external clock; that is, they only submit work for a
>>> "frame" at frame-interval, instead of submitting as much work as they
>>> are able to. The standard SwapBuffers approach is akin to double
>>> buffering, where, as one frame is being executed, the next is being
>>> submitted, such that there is always a maximum of two frames per client
>>> in the pipeline. Even this scheme exhibits unfairness under load as a
>>> single client will execute two frames back to back before the next, and
>>> with enough clients, deadlines will be missed.
>>>
>>> The idea introduced by BFS/MuQSS is that fairness is introduced by
>>> metering with an external clock. Every request, when it becomes ready to
>>> execute, is assigned a virtual deadline, and execution order is then
>>> determined by earliest deadline. Priority is used as a hint, rather than
>>> strict ordering, where high priority requests have earlier deadlines,
>>> but not necessarily earlier than outstanding work. Thus work is executed
>>> in order of 'readiness', with timeslicing to demote long running work.
>>>
>>> The Achilles' heel of this scheduler is its strong preference for
>>> low-latency and favouring of new queues. Whereas it was easy to dominate
>>> the old scheduler by flooding it with many requests over a short period
>>> of time, the new scheduler can be dominated by a 'synchronous' client
>>> that waits for each of its requests to complete before submitting the
>>> next. As such a client has no history, it is always considered
>>> ready-to-run and receives an earlier deadline than the long running
>>> requests.
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> ---
>>>    drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  12 +-
>>>    .../gpu/drm/i915/gt/intel_engine_heartbeat.c  |   1 +
>>>    drivers/gpu/drm/i915/gt/intel_engine_pm.c     |   4 +-
>>>    drivers/gpu/drm/i915/gt/intel_engine_types.h  |  24 --
>>>    drivers/gpu/drm/i915/gt/intel_lrc.c           | 328 +++++++-----------
>>>    drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |   5 +-
>>>    drivers/gpu/drm/i915/gt/selftest_lrc.c        |  43 ++-
>>>    .../gpu/drm/i915/gt/uc/intel_guc_submission.c |   6 +-
>>>    drivers/gpu/drm/i915/i915_priolist_types.h    |   7 +-
>>>    drivers/gpu/drm/i915/i915_request.h           |   4 +-
>>>    drivers/gpu/drm/i915/i915_scheduler.c         | 322 ++++++++++++-----
>>>    drivers/gpu/drm/i915/i915_scheduler.h         |  22 +-
>>>    drivers/gpu/drm/i915/i915_scheduler_types.h   |  17 +
>>>    .../drm/i915/selftests/i915_mock_selftests.h  |   1 +
>>>    drivers/gpu/drm/i915/selftests/i915_request.c |   1 +
>>>    .../gpu/drm/i915/selftests/i915_scheduler.c   |  49 +++
>>>    16 files changed, 484 insertions(+), 362 deletions(-)
>>>    create mode 100644 drivers/gpu/drm/i915/selftests/i915_scheduler.c
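
A minimal sketch of the earliest-virtual-deadline selection the commit
message describes, before the comments below (hypothetical names and
types; the series itself keeps ready requests in priolists looked up by
deadline rather than scanning a flat list):

#include <linux/list.h>
#include <linux/types.h>

struct sketch_request {
	struct list_head link;
	u64 deadline;	/* virtual deadline, set when the request is ready */
};

/* Execution order is earliest deadline first, not submission order. */
static struct sketch_request *
pick_earliest_deadline(struct list_head *ready)
{
	struct sketch_request *rq, *best = NULL;

	list_for_each_entry(rq, ready, link)
		if (!best || rq->deadline < best->deadline)
			best = rq;

	return best;	/* NULL if nothing is ready to run */
}
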
>> Do we have timings to back this change up? Would it make sense to have a
>> configurable scheduler choice?
> Yes, there's igt/benchmarks/gem_wsim to show the impact on scheduling
> decisions for various workloads. (You can guess what the impact of
> choosing a different execution order and forcing more context switches
> will be... About -1% to throughput with multiple clients) And
> igt/tests/gem_exec_schedule to test basic properties, with a bunch of new
> fairness tests to try and decide if this is the right thing. Under
> saturated conditions, there is no contest: a fair scheduler produces
> consistent results, and the vdeadlines allow for realtime response under
> load.

Yeah, it's not really to convince me, but to provide a reference for the 
future.

Perhaps add the gem_wsim timings to the commit message?

>
>> There are multiple introductions of ktime_get() in the patch. Perhaps
>> use a monotonic clock source like ktime_get_raw()? Also immediately
>> convert to ns.
> ktime_get() is monotonic. The only difference is that tkr_mono has a
> wall-offset that tkr_raw does not. [I'm sure there's a good reason.] The
> choice is really whether ktime_get_(mono|raw)_fast_ns() is sufficient for
> our needs.

Hmm. Yes you're right. I was thinking about the NTP adjustments. But 
given the requirement below they might be unimportant.

>
> I do like the idea of having the deadline being some recognisable
> timestamp, as it makes it easier to play with mixing in real, albeit
> soft, deadlines.


>
>>> @@ -2837,10 +2788,7 @@ static void __execlists_unhold(struct i915_request *rq)
>>>                GEM_BUG_ON(!i915_sw_fence_signaled(&rq->submit));
>>>    
>>>                i915_request_clear_hold(rq);
>>> -             list_move_tail(&rq->sched.link,
>>> -                            i915_sched_lookup_priolist(rq->engine,
>>> -                                                       rq_prio(rq)));
>>> -             set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
>>> +             submit |= intel_engine_queue_request(rq->engine, rq);
>> As someone new to this codebase, I immediately wonder whether that bitwise OR is
>> intentional and whether you got the short-circuiting right. It looks
>> correct to me.
> bool submit, not many bits :)

Yes, the code is correct. My question was related to whether it was 
accepted practice, considering that a future reader may think it was a 
mistake and change it anyway.

>
>>>    {
>>>        /*
>>>         * When in use during submission, we are protected by a guarantee that
>>> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
>>> index 4c189b81cc62..30bcb6f9d99f 100644
>>> --- a/drivers/gpu/drm/i915/i915_scheduler.c
>>> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
>>> @@ -20,6 +20,11 @@ static struct i915_global_scheduler {
>>>    static DEFINE_SPINLOCK(ipi_lock);
>>>    static LIST_HEAD(ipi_list);
>>>    
>>> +static inline u64 rq_deadline(const struct i915_request *rq)
>>> +{
>>> +     return READ_ONCE(rq->sched.deadline);
>>> +}
>>> +
>> Does this need a release barrier paired with an acquire barrier in
>> __i915_request_set_deadline below?
> No, the state can be inconsistent. If it changes as we are processing
> the previous value, there will be another reschedule. Within
> set_deadline, rq->sched.deadline is under the engine->active.lock, it is
> just that rq_deadline() is used to peek before we take the lock, as well
> as shorthand within the critical section.
>   

OK, understood.
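
The idiom being described, as a sketch (set_deadline_sketch is an
invented name; the lock and fields follow the quoted code):

static void set_deadline_sketch(struct intel_engine_cs *engine,
				struct i915_request *rq, u64 deadline)
{
	unsigned long flags;

	/*
	 * Unlocked peek: the value may be stale, but a racing update
	 * simply triggers another reschedule pass.
	 */
	if (rq_deadline(rq) <= deadline)
		return;

	spin_lock_irqsave(&engine->active.lock, flags);
	if (rq_deadline(rq) > deadline)	/* now stable under the lock */
		WRITE_ONCE(rq->sched.deadline, deadline);
	spin_unlock_irqrestore(&engine->active.lock, flags);
}
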


>>> +static u64 prio_slice(int prio)
>>>    {
>>> -     const struct i915_request *inflight;
>>> +     u64 slice;
>>> +     int sf;
>>>    
>>>        /*
>>> -      * We only need to kick the tasklet once for the high priority
>>> -      * new context we add into the queue.
>>> +      * With a 1ms scheduling quantum:
>>> +      *
>>> +      *   MAX USER:  ~32us deadline
>>> +      *   0:         ~16ms deadline
>>> +      *   MIN_USER: 1000ms deadline
>>>         */
>>> -     if (prio <= engine->execlists.queue_priority_hint)
>>> -             return;
>>>    
>>> -     rcu_read_lock();
>>> +     if (prio >= __I915_PRIORITY_KERNEL__)
>>> +             return INT_MAX - prio;
>>>    
>>> -     /* Nothing currently active? We're overdue for a submission! */
>>> -     inflight = execlists_active(&engine->execlists);
>>> -     if (!inflight)
>>> -             goto unlock;
>>> +     slice = __I915_PRIORITY_KERNEL__ - prio;
>>> +     if (prio >= 0)
>>> +             sf = 20 - 6;
>>> +     else
>>> +             sf = 20 - 1;
>>> +
>>> +     return slice << sf;
>>> +}
>>> +
>> Is this the same deadline calculation as used in the BFS? Could you
>> perhaps add a pointer to some documentation?
> It is a heuristic. The scale factor in BFS is designed for a smaller
> range and is not effective for passing our existing priority ordering
> tests.
>
> The challenge is to pick something that is fair that roughly matches
> usage. It basically says that if client A submits 3 requests, then
> clients B and C will be able to run before the later requests of client A so
> long as they are submitted within 16ms. Currently we get AAABC;
> the vdeadlines turn that into ABCAA. So we would ideally like the quota
> for each client to reflect their needs, so if client A needed all 3
> requests within 16ms, it would have a vdeadline closer to 5ms (and so it
> would compete for the GPU against other clients). Now with this less
> strict priority system we can let normal userspace bump their
> priorities, or we can use the average context runtime to try and adjust
> priorities on the fly (i.e. do not use an unbiased quota). But I suspect
> removing any fairness will skew the scheduler once more.

OK, also for future reference, it would be good to have at least a 
subset of this documented somewhere!
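
Plugging a few priorities into the quoted prio_slice() is instructive;
assuming (both are assumptions, not confirmed by the excerpt) that
__I915_PRIORITY_KERNEL__ is 1024 and the result is in nanoseconds:

	prio  1022 (~max user): (1024 - 1022) << 14 =      32768 ns ~   32us
	prio     0 (default)  : (1024 -    0) << 14 =   16777216 ns ~   16ms
	prio -1023 (min user) : (1024 + 1023) << 19 = 1073217536 ns ~ 1000ms

That reproduces the table in the quoted comment: the slice is how long a
ready request may queue before its virtual deadline expires, with a
coarser scale factor below the default priority.
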

Thanks,

Thomas



> -Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH 26/28] drm/i915: Fair low-latency scheduling
  2020-06-16 12:11       ` Thomas Hellström (Intel)
@ 2020-06-16 12:44         ` Chris Wilson
  0 siblings, 0 replies; 52+ messages in thread
From: Chris Wilson @ 2020-06-16 12:44 UTC (permalink / raw)
  To: Thomas Hellström, intel-gfx

Quoting Thomas Hellström (Intel) (2020-06-16 13:11:25)
> Hi,
> 
> On 6/16/20 12:12 PM, Chris Wilson wrote:
> > Quoting Thomas Hellström (Intel) (2020-06-16 10:07:28)
> >> Hi, Chris,
> >>
> >> Some comments and questions:
> >>
> >> On 6/8/20 12:21 AM, Chris Wilson wrote:
> >>> The first "scheduler" was a topographical sorting of requests into
> >>> priority order. The execution order was deterministic, the earliest
> >>> submitted, highest priority request would be executed first. Priority
> >>> inherited ensured that inversions were kept at bay, and allowed us to
> >>> dynamically boost priorities (e.g. for interactive pageflips).
> >>>
> >>> The minimalistic timeslicing scheme was an attempt to introduce fairness
> >>> between long running requests, by evicting the active request at the end
> >>> of a timeslice and moving it to the back of its priority queue (while
> >>> ensuring that dependencies were kept in order). For short running
> >>> requests from many clients of equal priority, the scheme is still very
> >>> much FIFO submission ordering, and as unfair as before.
> >>>
> >>> To impose fairness, we need an external metric that ensures that clients
> >>> are interspersed, so we don't execute one long chain from client A before
> >>> executing any of client B. This could be imposed by the clients by using
> >>> fences based on an external clock; that is, they only submit work for a
> >>> "frame" at frame-interval, instead of submitting as much work as they
> >>> are able to. The standard SwapBuffers approach is akin to double
> >>> buffering, where, as one frame is being executed, the next is being
> >>> submitted, such that there is always a maximum of two frames per client
> >>> in the pipeline. Even this scheme exhibits unfairness under load as a
> >>> single client will execute two frames back to back before the next, and
> >>> with enough clients, deadlines will be missed.
> >>>
> >>> The idea introduced by BFS/MuQSS is that fairness is introduced by
> >>> metering with an external clock. Every request, when it becomes ready to
> >>> execute, is assigned a virtual deadline, and execution order is then
> >>> determined by earliest deadline. Priority is used as a hint, rather than
> >>> strict ordering, where high priority requests have earlier deadlines,
> >>> but not necessarily earlier than outstanding work. Thus work is executed
> >>> in order of 'readiness', with timeslicing to demote long running work.
> >>>
> >>> The Achilles' heel of this scheduler is its strong preference for
> >>> low-latency and favouring of new queues. Whereas it was easy to dominate
> >>> the old scheduler by flooding it with many requests over a short period
> >>> of time, the new scheduler can be dominated by a 'synchronous' client
> >>> that waits for each of its requests to complete before submitting the
> >>> next. As such a client has no history, it is always considered
> >>> ready-to-run and receives an earlier deadline than the long running
> >>> requests.
> >>>
> >>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>> ---
> >>>    drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  12 +-
> >>>    .../gpu/drm/i915/gt/intel_engine_heartbeat.c  |   1 +
> >>>    drivers/gpu/drm/i915/gt/intel_engine_pm.c     |   4 +-
> >>>    drivers/gpu/drm/i915/gt/intel_engine_types.h  |  24 --
> >>>    drivers/gpu/drm/i915/gt/intel_lrc.c           | 328 +++++++-----------
> >>>    drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |   5 +-
> >>>    drivers/gpu/drm/i915/gt/selftest_lrc.c        |  43 ++-
> >>>    .../gpu/drm/i915/gt/uc/intel_guc_submission.c |   6 +-
> >>>    drivers/gpu/drm/i915/i915_priolist_types.h    |   7 +-
> >>>    drivers/gpu/drm/i915/i915_request.h           |   4 +-
> >>>    drivers/gpu/drm/i915/i915_scheduler.c         | 322 ++++++++++++-----
> >>>    drivers/gpu/drm/i915/i915_scheduler.h         |  22 +-
> >>>    drivers/gpu/drm/i915/i915_scheduler_types.h   |  17 +
> >>>    .../drm/i915/selftests/i915_mock_selftests.h  |   1 +
> >>>    drivers/gpu/drm/i915/selftests/i915_request.c |   1 +
> >>>    .../gpu/drm/i915/selftests/i915_scheduler.c   |  49 +++
> >>>    16 files changed, 484 insertions(+), 362 deletions(-)
> >>>    create mode 100644 drivers/gpu/drm/i915/selftests/i915_scheduler.c
> >> Do we have timings to back this change up? Would it make sense to have a
> >> configurable scheduler choice?
> > Yes, there's igt/benchmarks/gem_wsim to show the impact on scheduling
> > decisions for various workloads. (You can guess what the impact of
> > choosing a different execution order and forcing more context switches
> > will be... About -1% to throughput with multiple clients) And
> > igt/tests/gem_exec_schedule to test basic properties, with a bunch of new
> > fairness tests to try and decide if this is the right thing. Under
> > saturated conditions, there is no contest: a fair scheduler produces
> > consistent results, and the vdeadlines allow for realtime response under
> > load.
> 
> Yeah, it's not really to convince me, but to provide a reference for the 
> future.
> 
> Perhaps add the gem_wsim timings to the commit message?

I don't like posting such benchmarks without saying how they can be
reproduced or providing absolute values to verify future runs. Our rules
are terrible.

This trimmed-down set takes about a day to run, and we've yet to
convince people that this is a fundamental requirement for CI. So it's
really frustrating; the best we can try and do is distill essential
requirements into a pass/fail test to be run on debug kernels. At least
we cover the pathological exploits, except for where they are so bad CI
complains about them taking too long.

> >> There are multiple introductions of ktime_get() in the patch. Perhaps
> >> use a monotonic clock source like ktime_get_raw()? Also immediately
> >> convert to ns.
> > ktime_get() is monotonic. The only difference is that tkr_mono has a
> > wall-offset that tkr_raw does not. [I'm sure there's a good reason.] The
> > choice is really whether ktime_get_(mono|raw)_fast_ns() is sufficient for
> > our needs.
> 
> Hmm. Yes you're right. I was thinking about the NTP adjustments. But 
> given the requirement below they might be unimportant.

I never know which is the right one to use :|

Just follow the guideline that the shortest function name is the
intended one to use, unless you understand why you should use one of the
others.
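
For reference, the accessors under discussion, per the mainline kernel's
Documentation/core-api/timekeeping.rst of the time:

	ktime_t mono = ktime_get();	/* CLOCK_MONOTONIC, NTP-adjusted */
	ktime_t raw = ktime_get_raw();	/* CLOCK_MONOTONIC_RAW, no NTP */
	u64 fast = ktime_get_mono_fast_ns(); /* NMI-safe, lockless variant */
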

> > I do like the idea of having the deadline being some recognisable
> > timestamp, as it makes it easier to play with mixing in real, albeit
> > soft, deadlines.
> 
> 
> >
> >>> @@ -2837,10 +2788,7 @@ static void __execlists_unhold(struct i915_request *rq)
> >>>                GEM_BUG_ON(!i915_sw_fence_signaled(&rq->submit));
> >>>    
> >>>                i915_request_clear_hold(rq);
> >>> -             list_move_tail(&rq->sched.link,
> >>> -                            i915_sched_lookup_priolist(rq->engine,
> >>> -                                                       rq_prio(rq)));
> >>> -             set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
> >>> +             submit |= intel_engine_queue_request(rq->engine, rq);
> >> As someone new to this codebase, I immediately wonder whether that bitwise OR is
> >> intentional and whether you got the short-circuiting right. It looks
> >> correct to me.
> > bool submit, not many bits :)
> 
> Yes, the code is correct. My question was related to whether it was 
> accepted practice, considering that a future reader may think it was a 
> mistake and change it anyway.

submit |= vs if () submit = true

I feel I have used both variants in this patch.
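
For contrast, the two spellings side by side (only one would be used;
intel_engine_queue_request is the function from the quoted hunk):

	bool submit = false;

	/* variant 1: accumulate with bitwise OR */
	submit |= intel_engine_queue_request(engine, rq);

	/* variant 2: equivalent, more explicit */
	if (intel_engine_queue_request(engine, rq))
		submit = true;

Unlike a logical ||, the |= always evaluates its right-hand side, so the
request is queued even once submit is already true, which is the
property the loop relies on.
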

> >>>    {
> >>>        /*
> >>>         * When in use during submission, we are protected by a guarantee that
> >>> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> >>> index 4c189b81cc62..30bcb6f9d99f 100644
> >>> --- a/drivers/gpu/drm/i915/i915_scheduler.c
> >>> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> >>> @@ -20,6 +20,11 @@ static struct i915_global_scheduler {
> >>>    static DEFINE_SPINLOCK(ipi_lock);
> >>>    static LIST_HEAD(ipi_list);
> >>>    
> >>> +static inline u64 rq_deadline(const struct i915_request *rq)
> >>> +{
> >>> +     return READ_ONCE(rq->sched.deadline);
> >>> +}
> >>> +
> >> Does this need a release barrier paired with an acquire barrier in
> >> __i915_request_set_deadline below?
> > No, the state can be inconsistent. If it changes as we are processing
> > the previous value, there will be another reschedule. Within
> > set_deadline, rq->sched.deadline is under the engine->active.lock, it is
> > just that rq_deadline() is used to peek before we take the lock, as well
> > as shorthand within the critical section.
> >   
> 
> OK, understood.
> 
> 
> >>> +static u64 prio_slice(int prio)
> >>>    {
> >>> -     const struct i915_request *inflight;
> >>> +     u64 slice;
> >>> +     int sf;
> >>>    
> >>>        /*
> >>> -      * We only need to kick the tasklet once for the high priority
> >>> -      * new context we add into the queue.
> >>> +      * With a 1ms scheduling quantum:
> >>> +      *
> >>> +      *   MAX USER:  ~32us deadline
> >>> +      *   0:         ~16ms deadline
> >>> +      *   MIN_USER: 1000ms deadline
> >>>         */
> >>> -     if (prio <= engine->execlists.queue_priority_hint)
> >>> -             return;
> >>>    
> >>> -     rcu_read_lock();
> >>> +     if (prio >= __I915_PRIORITY_KERNEL__)
> >>> +             return INT_MAX - prio;
> >>>    
> >>> -     /* Nothing currently active? We're overdue for a submission! */
> >>> -     inflight = execlists_active(&engine->execlists);
> >>> -     if (!inflight)
> >>> -             goto unlock;
> >>> +     slice = __I915_PRIORITY_KERNEL__ - prio;
> >>> +     if (prio >= 0)
> >>> +             sf = 20 - 6;
> >>> +     else
> >>> +             sf = 20 - 1;
> >>> +
> >>> +     return slice << sf;
> >>> +}
> >>> +
> >> Is this the same deadline calculation as used in the BFS? Could you
> >> perhaps add a pointer to some documentation?
> > It is a heuristic. The scale factor in BFS is designed for a smaller
> > range and is not effective for passing our existing priority ordering
> > tests.
> >
> > The challenge is to pick something that is fair that roughly matches
> > usage. It basically says that if client A submits 3 requests, then
> > clients B and C will be able to run before the later requests of client A so
> > long as they are submitted within 16ms. Currently we get AAABC;
> > the vdeadlines turn that into ABCAA. So we would ideally like the quota
> > for each client to reflect their needs, so if client A needed all 3
> > requests within 16ms, it would have a vdeadline closer to 5ms (and so it
> > would compete for the GPU against other clients). Now with this less
> > strict priority system we can let normal userspace bump their
> > priorities, or we can use the average context runtime to try and adjust
> > priorities on the fly (i.e. do not use an unbiased quota). But I suspect
> > removing any fairness will skew the scheduler once more.
> 
> OK, also for future reference, it would be good to have at least a 
> subset of this documented somewhere!

Yeah. I think prio_slice is the best spot to try and explain why
different priorities have different quotas, and the impact.
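
Something along these lines, perhaps (a sketch of that comment, distilled
from the explanation above rather than taken from the series):

/*
 * prio_slice() converts a request priority into a time budget: once
 * ready, a request should begin execution within this many nanoseconds,
 * which becomes its virtual deadline.  Higher priority means a smaller
 * budget and hence an earlier deadline, but a high priority request
 * only jumps ahead of ready work whose deadline has not yet expired;
 * priority is a hint, not a strict global ordering.
 */
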
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH 26/28] drm/i915: Fair low-latency scheduling
@ 2020-06-08 20:27 kernel test robot
  0 siblings, 0 replies; 52+ messages in thread
From: kernel test robot @ 2020-06-08 20:27 UTC (permalink / raw)
  To: kbuild

[-- Attachment #1: Type: text/plain, Size: 7770 bytes --]

CC: kbuild-all(a)lists.01.org
In-Reply-To: <20200607222108.14401-26-chris@chris-wilson.co.uk>
References: <20200607222108.14401-26-chris@chris-wilson.co.uk>
TO: Chris Wilson <chris@chris-wilson.co.uk>
TO: intel-gfx(a)lists.freedesktop.org
CC: Chris Wilson <chris@chris-wilson.co.uk>

Hi Chris,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on drm-tip/drm-tip]
[cannot apply to drm-intel/for-linux-next linus/master v5.7 next-20200608]
[if your patch is applied to the wrong git tree, please drop us a note to help
improve the system. BTW, we also suggest using the '--base' option to
specify the base tree in git format-patch; please see
https://stackoverflow.com/a/37406982]

url:    https://github.com/0day-ci/linux/commits/Chris-Wilson/drm-i915-Adjust-the-sentinel-assert-to-match-implementation/20200608-062434
base:   git://anongit.freedesktop.org/drm/drm-tip drm-tip
:::::: branch date: 22 hours ago
:::::: commit date: 22 hours ago
compiler: gcc-9 (Debian 9.3.0-13) 9.3.0

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>


cppcheck warnings: (new ones prefixed by >>)

   drivers/gpu/drm/i915/gt/intel_lrc.c:5664:3: warning: %d in format string (no. 2) requires 'int' but the argument type is 'unsigned int'. [invalidPrintfArgType_sint]
     snprintf(ve->base.name, sizeof(ve->base.name),
     ^
>> drivers/gpu/drm/i915/gt/intel_lrc.c:1117:8: warning: Local variable deadline shadows outer variable [shadowVar]
      u64 deadline =
          ^
   drivers/gpu/drm/i915/gt/intel_lrc.c:1102:6: note: Shadowed declaration
    u64 deadline = I915_DEADLINE_NEVER;
        ^
   drivers/gpu/drm/i915/gt/intel_lrc.c:1117:8: note: Shadow variable
      u64 deadline =
          ^
   drivers/gpu/drm/i915/gt/intel_lrc.c:5821:24: warning: Local variable rq shadows outer variable [shadowVar]
     struct i915_request *rq = READ_ONCE(ve->request);
                          ^
   drivers/gpu/drm/i915/gt/intel_lrc.c:5771:23: note: Shadowed declaration
    struct i915_request *rq, *last;
                         ^
   drivers/gpu/drm/i915/gt/intel_lrc.c:5821:24: note: Shadow variable
     struct i915_request *rq = READ_ONCE(ve->request);
                          ^
   drivers/gpu/drm/i915/gt/intel_lrc.c:4326:33: warning: Clarify calculation precedence for '&' and '?'. [clarifyCalculation]
     (flags & I915_DISPATCH_SECURE ? 0 : BIT(8));
                                   ^
   drivers/gpu/drm/i915/gt/intel_lrc.c:4348:33: warning: Clarify calculation precedence for '&' and '?'. [clarifyCalculation]
     (flags & I915_DISPATCH_SECURE ? 0 : BIT(8));
                                   ^

# https://github.com/0day-ci/linux/commit/4e2dc7aa430c1b1de4e6da6a012e1f032c86a032
git remote add linux-review https://github.com/0day-ci/linux
git remote update linux-review
git checkout 4e2dc7aa430c1b1de4e6da6a012e1f032c86a032
vim +1117 drivers/gpu/drm/i915/gt/intel_lrc.c

7dc56af5260e95 drivers/gpu/drm/i915/gt/intel_lrc.c Chris Wilson     2019-09-24  1096  
eb8d0f5af4ec2d drivers/gpu/drm/i915/intel_lrc.c    Chris Wilson     2019-01-25  1097  static struct i915_request *
4cc79cbb01ef35 drivers/gpu/drm/i915/gt/intel_lrc.c Chris Wilson     2019-05-15  1098  __unwind_incomplete_requests(struct intel_engine_cs *engine)
7e4992ac045ccd drivers/gpu/drm/i915/intel_lrc.c    Chris Wilson     2017-09-28  1099  {
b16c765122f987 drivers/gpu/drm/i915/intel_lrc.c    Chris Wilson     2018-10-01  1100  	struct i915_request *rq, *rn, *active = NULL;
85f5e1f385b764 drivers/gpu/drm/i915/intel_lrc.c    Chris Wilson     2018-10-01  1101  	struct list_head *uninitialized_var(pl);
4e2dc7aa430c1b drivers/gpu/drm/i915/gt/intel_lrc.c Chris Wilson     2020-06-07  1102  	u64 deadline = I915_DEADLINE_NEVER;
7e4992ac045ccd drivers/gpu/drm/i915/intel_lrc.c    Chris Wilson     2017-09-28  1103  
422d7df4f090bb drivers/gpu/drm/i915/gt/intel_lrc.c Chris Wilson     2019-06-14  1104  	lockdep_assert_held(&engine->active.lock);
7e4992ac045ccd drivers/gpu/drm/i915/intel_lrc.c    Chris Wilson     2017-09-28  1105  
7e4992ac045ccd drivers/gpu/drm/i915/intel_lrc.c    Chris Wilson     2017-09-28  1106  	list_for_each_entry_safe_reverse(rq, rn,
422d7df4f090bb drivers/gpu/drm/i915/gt/intel_lrc.c Chris Wilson     2019-06-14  1107  					 &engine->active.requests,
422d7df4f090bb drivers/gpu/drm/i915/gt/intel_lrc.c Chris Wilson     2019-06-14  1108  					 sched.link) {
4e2dc7aa430c1b drivers/gpu/drm/i915/gt/intel_lrc.c Chris Wilson     2020-06-07  1109  		if (i915_request_completed(rq)) {
4e2dc7aa430c1b drivers/gpu/drm/i915/gt/intel_lrc.c Chris Wilson     2020-06-07  1110  			list_del_init(&rq->sched.link);
22b7a426bbe1eb drivers/gpu/drm/i915/gt/intel_lrc.c Chris Wilson     2019-06-20  1111  			continue; /* XXX */
4e2dc7aa430c1b drivers/gpu/drm/i915/gt/intel_lrc.c Chris Wilson     2020-06-07  1112  		}
7e4992ac045ccd drivers/gpu/drm/i915/intel_lrc.c    Chris Wilson     2017-09-28  1113  
e61e0f51ba7974 drivers/gpu/drm/i915/intel_lrc.c    Chris Wilson     2018-02-21  1114  		__i915_request_unsubmit(rq);
7e4992ac045ccd drivers/gpu/drm/i915/intel_lrc.c    Chris Wilson     2017-09-28  1115  
4e2dc7aa430c1b drivers/gpu/drm/i915/gt/intel_lrc.c Chris Wilson     2020-06-07  1116  		if (i915_request_started(rq)) {
4e2dc7aa430c1b drivers/gpu/drm/i915/gt/intel_lrc.c Chris Wilson     2020-06-07 @1117  			u64 deadline =
4e2dc7aa430c1b drivers/gpu/drm/i915/gt/intel_lrc.c Chris Wilson     2020-06-07  1118  				i915_scheduler_next_virtual_deadline(rq_prio(rq));
4e2dc7aa430c1b drivers/gpu/drm/i915/gt/intel_lrc.c Chris Wilson     2020-06-07  1119  			rq->sched.deadline = min(rq_deadline(rq), deadline);
4e2dc7aa430c1b drivers/gpu/drm/i915/gt/intel_lrc.c Chris Wilson     2020-06-07  1120  		}
4e2dc7aa430c1b drivers/gpu/drm/i915/gt/intel_lrc.c Chris Wilson     2020-06-07  1121  
4e2dc7aa430c1b drivers/gpu/drm/i915/gt/intel_lrc.c Chris Wilson     2020-06-07  1122  		if (rq_deadline(rq) != deadline) {
4e2dc7aa430c1b drivers/gpu/drm/i915/gt/intel_lrc.c Chris Wilson     2020-06-07  1123  			deadline = rq_deadline(rq);
4e2dc7aa430c1b drivers/gpu/drm/i915/gt/intel_lrc.c Chris Wilson     2020-06-07  1124  			pl = i915_sched_lookup_priolist(engine, deadline);
4e2dc7aa430c1b drivers/gpu/drm/i915/gt/intel_lrc.c Chris Wilson     2020-06-07  1125  
097a94815fb61b drivers/gpu/drm/i915/intel_lrc.c    Michał Winiarski 2017-09-28  1126  		}
8db05f5947132b drivers/gpu/drm/i915/intel_lrc.c    Chris Wilson     2018-09-19  1127  		GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
097a94815fb61b drivers/gpu/drm/i915/intel_lrc.c    Michał Winiarski 2017-09-28  1128  
422d7df4f090bb drivers/gpu/drm/i915/gt/intel_lrc.c Chris Wilson     2019-06-14  1129  		list_move(&rq->sched.link, pl);
672c368f939804 drivers/gpu/drm/i915/gt/intel_lrc.c Chris Wilson     2020-01-16  1130  		set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
672c368f939804 drivers/gpu/drm/i915/gt/intel_lrc.c Chris Wilson     2020-01-16  1131  
b16c765122f987 drivers/gpu/drm/i915/intel_lrc.c    Chris Wilson     2018-10-01  1132  		active = rq;
b16c765122f987 drivers/gpu/drm/i915/intel_lrc.c    Chris Wilson     2018-10-01  1133  	}
b16c765122f987 drivers/gpu/drm/i915/intel_lrc.c    Chris Wilson     2018-10-01  1134  
eb8d0f5af4ec2d drivers/gpu/drm/i915/intel_lrc.c    Chris Wilson     2019-01-25  1135  	return active;
7e4992ac045ccd drivers/gpu/drm/i915/intel_lrc.c    Chris Wilson     2017-09-28  1136  }
7e4992ac045ccd drivers/gpu/drm/i915/intel_lrc.c    Chris Wilson     2017-09-28  1137  
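
For what it's worth, the shadow warning could be silenced by folding the
inner temporary into the min() expression (an illustrative fix, not taken
from the series):

		if (i915_request_started(rq))
			rq->sched.deadline =
				min(rq_deadline(rq),
				    i915_scheduler_next_virtual_deadline(rq_prio(rq)));

leaving the outer 'deadline' as the only variable of that name in scope.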

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all(a)lists.01.org

^ permalink raw reply	[flat|nested] 52+ messages in thread

end of thread, other threads:[~2020-06-16 12:44 UTC | newest]

Thread overview: 52+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-06-07 22:20 [Intel-gfx] [PATCH 01/28] drm/i915: Adjust the sentinel assert to match implementation Chris Wilson
2020-06-07 22:20 ` [Intel-gfx] [PATCH 02/28] drm/i915/selftests: Make the hanging request non-preemptible Chris Wilson
2020-06-08 20:58   ` Mika Kuoppala
2020-06-07 22:20 ` [Intel-gfx] [PATCH 03/28] drm/i915/selftests: Teach hang-self to target only itself Chris Wilson
2020-06-10 13:21   ` Mika Kuoppala
2020-06-07 22:20 ` [Intel-gfx] [PATCH 04/28] drm/i915/selftests: Remove live_suppress_wait_preempt Chris Wilson
2020-06-11 11:38   ` Tvrtko Ursulin
2020-06-07 22:20 ` [Intel-gfx] [PATCH 05/28] drm/i915/selftests: Trim execlists runtime Chris Wilson
2020-06-12 23:05   ` Andi Shyti
2020-06-07 22:20 ` [Intel-gfx] [PATCH 06/28] drm/i915/gt: Use virtual_engine during execlists_dequeue Chris Wilson
2020-06-07 22:20 ` [Intel-gfx] [PATCH 07/28] drm/i915/gt: Decouple inflight virtual engines Chris Wilson
2020-06-07 22:20 ` [Intel-gfx] [PATCH 08/28] drm/i915/gt: Resubmit the virtual engine on schedule-out Chris Wilson
2020-06-07 22:20 ` [Intel-gfx] [PATCH 09/28] drm/i915: Add list_for_each_entry_safe_continue_reverse Chris Wilson
2020-06-07 22:20 ` [Intel-gfx] [PATCH 10/28] drm/i915/gem: Separate reloc validation into an earlier step Chris Wilson
2020-06-09  7:47   ` Tvrtko Ursulin
2020-06-09 10:48     ` Chris Wilson
2020-06-07 22:20 ` [Intel-gfx] [PATCH 11/28] drm/i915/gem: Lift GPU relocation allocation Chris Wilson
2020-06-07 22:20 ` [Intel-gfx] [PATCH 12/28] drm/i915/gem: Build the reloc request first Chris Wilson
2020-06-07 22:20 ` [Intel-gfx] [PATCH 13/28] drm/i915/gem: Add all GPU reloc awaits/signals en masse Chris Wilson
2020-06-07 22:20 ` [Intel-gfx] [PATCH 14/28] dma-buf: Proxy fence, an unsignaled fence placeholder Chris Wilson
2020-06-07 22:20 ` [Intel-gfx] [PATCH 15/28] drm/i915: Lift waiter/signaler iterators Chris Wilson
2020-06-07 22:20 ` [Intel-gfx] [PATCH 16/28] drm/i915: Unpeel awaits on a proxy fence Chris Wilson
2020-06-07 22:20 ` [Intel-gfx] [PATCH 17/28] drm/i915/gem: Make relocations atomic within execbuf Chris Wilson
2020-06-07 22:20 ` [Intel-gfx] [PATCH 18/28] drm/i915: Strip out internal priorities Chris Wilson
2020-06-07 22:20 ` [Intel-gfx] [PATCH 19/28] drm/i915: Remove I915_USER_PRIORITY_SHIFT Chris Wilson
2020-06-07 22:21 ` [Intel-gfx] [PATCH 20/28] drm/i915: Replace engine->schedule() with a known request operation Chris Wilson
2020-06-07 22:21 ` [Intel-gfx] [PATCH 21/28] drm/i915/gt: Do not suspend bonded requests if one hangs Chris Wilson
2020-06-07 22:21 ` [Intel-gfx] [PATCH 22/28] drm/i915: Teach the i915_dependency to use a double-lock Chris Wilson
2020-06-07 22:21 ` [Intel-gfx] [PATCH 23/28] drm/i915: Restructure priority inheritance Chris Wilson
2020-06-07 22:21 ` [Intel-gfx] [PATCH 24/28] ipi-dag Chris Wilson
2020-06-07 22:21 ` [Intel-gfx] [PATCH 25/28] drm/i915/gt: Check for a completed last request once Chris Wilson
2020-06-07 22:21 ` [Intel-gfx] [PATCH 26/28] drm/i915: Fair low-latency scheduling Chris Wilson
2020-06-16  9:07   ` Thomas Hellström (Intel)
2020-06-16 10:12     ` Chris Wilson
2020-06-16 12:11       ` Thomas Hellström (Intel)
2020-06-16 12:44         ` Chris Wilson
2020-06-16 10:54     ` Chris Wilson
2020-06-07 22:21 ` [Intel-gfx] [PATCH 27/28] drm/i915/gt: Specify a deadline for the heartbeat Chris Wilson
2020-06-07 22:21 ` [Intel-gfx] [PATCH 28/28] drm/i915: Replace the priority boosting for the display with a deadline Chris Wilson
2020-06-07 22:49 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/28] drm/i915: Adjust the sentinel assert to match implementation Patchwork
2020-06-07 22:51 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2020-06-07 23:12 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2020-06-08  0:58 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
2020-06-08  7:44 ` [Intel-gfx] [PATCH 01/28] " Tvrtko Ursulin
2020-06-08  9:33   ` Chris Wilson
2020-06-09  6:59     ` Tvrtko Ursulin
2020-06-09 10:29       ` Chris Wilson
2020-06-09 10:39         ` Tvrtko Ursulin
2020-06-09 10:47           ` Chris Wilson
2020-06-09 11:45             ` Tvrtko Ursulin
2020-06-08 20:43 ` Mika Kuoppala
2020-06-08 20:27 [Intel-gfx] [PATCH 26/28] drm/i915: Fair low-latency scheduling kernel test robot

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.