* [Intel-gfx] [PATCH 1/8] drm/i915: Move saturated workload detection back to the context
@ 2020-05-18  8:14 Chris Wilson
  2020-05-18  8:14 ` [Intel-gfx] [PATCH 2/8] drm/i915/selftests: Add tests for timeslicing virtual engines Chris Wilson
                   ` (13 more replies)
  0 siblings, 14 replies; 32+ messages in thread
From: Chris Wilson @ 2020-05-18  8:14 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

When we introduced the saturated workload detection to tell us to back
off from semaphore usage [semaphores have a noticeable impact on
contended bus cycles with the CPU for some heavy workloads], we first
introduced it as a per-context tracker. This allows individual contexts
to try and optimise their own usage, but we found that with the local
tracking and the no-semaphore boosting, the first context to disable
semaphores got a massive priority boost and so would starve the rest and
all new contexts (as they started with semaphores enabled and lower
priority). Hence we moved the saturated workload detection to the
engine, and as a consequence had to disable semaphores on virtual engines.

Now that we do not have semaphore priority boosting, we can move the
tracking back to the context and virtual engines can now utilise the
faster inter-engine synchronisation.
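
As a rough illustration (the helper name here is hypothetical; the real
checks live in already_busywaiting() and __i915_request_submit()), the
gating now reduces to a per-context test:

	/*
	 * Sketch: once a semaphore completed before we even submitted
	 * to HW ("too late"), the engines involved are recorded in
	 * ce->saturated and this context stops busywaiting on them.
	 */
	static bool may_busywait_on(const struct i915_request *rq,
				    intel_engine_mask_t engines)
	{
		return !((rq->sched.semaphores |
			  READ_ONCE(rq->context->saturated)) & engines);
	}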

References: 44d89409a12e ("drm/i915: Make the semaphore saturation mask global")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_context.c       |  1 +
 drivers/gpu/drm/i915/gt/intel_context_types.h |  2 ++
 drivers/gpu/drm/i915/gt/intel_engine_pm.c     |  2 --
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |  2 --
 drivers/gpu/drm/i915/gt/intel_lrc.c           | 15 ---------------
 drivers/gpu/drm/i915/i915_request.c           |  4 ++--
 6 files changed, 5 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index e4aece20bc80..762a251d553b 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -268,6 +268,7 @@ static int __intel_context_active(struct i915_active *active)
 	if (err)
 		goto err_timeline;
 
+	ce->saturated = 0;
 	return 0;
 
 err_timeline:
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 4954b0df4864..aed26d93c2ca 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -78,6 +78,8 @@ struct intel_context {
 	} lrc;
 	u32 tag; /* cookie passed to HW to track this context on submission */
 
+	intel_engine_mask_t saturated; /* submitting semaphores too late? */
+
 	/* Time on GPU as tracked by the hw. */
 	struct {
 		struct ewma_runtime avg;
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
index d0a1078ef632..6d7fdba5adef 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
@@ -229,8 +229,6 @@ static int __engine_park(struct intel_wakeref *wf)
 	struct intel_engine_cs *engine =
 		container_of(wf, typeof(*engine), wakeref);
 
-	engine->saturated = 0;
-
 	/*
 	 * If one and only one request is completed between pm events,
 	 * we know that we are inside the kernel context and it is
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index 2b6cdf47d428..c443b6bb884b 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -332,8 +332,6 @@ struct intel_engine_cs {
 
 	struct intel_context *kernel_context; /* pinned */
 
-	intel_engine_mask_t saturated; /* submitting semaphores too late? */
-
 	struct {
 		struct delayed_work work;
 		struct i915_request *systole;
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 87e6c5bdd2dc..e597325d04f1 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -5630,21 +5630,6 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings,
 	ve->base.instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
 	ve->base.uabi_instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
 
-	/*
-	 * The decision on whether to submit a request using semaphores
-	 * depends on the saturated state of the engine. We only compute
-	 * this during HW submission of the request, and we need for this
-	 * state to be globally applied to all requests being submitted
-	 * to this engine. Virtual engines encompass more than one physical
-	 * engine and so we cannot accurately tell in advance if one of those
-	 * engines is already saturated and so cannot afford to use a semaphore
-	 * and be pessimized in priority for doing so -- if we are the only
-	 * context using semaphores after all other clients have stopped, we
-	 * will be starved on the saturated system. Such a global switch for
-	 * semaphores is less than ideal, but alas is the current compromise.
-	 */
-	ve->base.saturated = ALL_ENGINES;
-
 	snprintf(ve->base.name, sizeof(ve->base.name), "virtual");
 
 	intel_engine_init_active(&ve->base, ENGINE_VIRTUAL);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 526c1e9acbd5..31ef683d27b4 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -467,7 +467,7 @@ bool __i915_request_submit(struct i915_request *request)
 	 */
 	if (request->sched.semaphores &&
 	    i915_sw_fence_signaled(&request->semaphore))
-		engine->saturated |= request->sched.semaphores;
+		request->context->saturated |= request->sched.semaphores;
 
 	engine->emit_fini_breadcrumb(request,
 				     request->ring->vaddr + request->postfix);
@@ -919,7 +919,7 @@ already_busywaiting(struct i915_request *rq)
 	 *
 	 * See the are-we-too-late? check in __i915_request_submit().
 	 */
-	return rq->sched.semaphores | READ_ONCE(rq->engine->saturated);
+	return rq->sched.semaphores | READ_ONCE(rq->context->saturated);
 }
 
 static int
-- 
2.20.1

* [Intel-gfx] [PATCH 2/8] drm/i915/selftests: Add tests for timeslicing virtual engines
  2020-05-18  8:14 [Intel-gfx] [PATCH 1/8] drm/i915: Move saturated workload detection back to the context Chris Wilson
@ 2020-05-18  8:14 ` Chris Wilson
  2020-05-18 10:12   ` Tvrtko Ursulin
  2020-05-18  8:14 ` [Intel-gfx] [PATCH 3/8] drm/i915/gt: Reuse the tasklet priority for virtual as their siblings Chris Wilson
                   ` (12 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Chris Wilson @ 2020-05-18  8:14 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Make sure that we can execute a virtual request on an already busy
engine, and conversely that we can execute a normal request if the
engines are already fully occupied by virtual requests.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/selftest_lrc.c | 179 +++++++++++++++++++++++++
 1 file changed, 179 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c
index 824f99c4cc7c..1fc54359bd53 100644
--- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
+++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
@@ -3766,6 +3766,184 @@ static int live_virtual_mask(void *arg)
 	return 0;
 }
 
+static int slicein_virtual_engine(struct intel_gt *gt,
+				  struct intel_engine_cs **siblings,
+				  unsigned int nsibling)
+{
+	struct intel_context *ce;
+	struct i915_request *rq;
+	struct igt_spinner spin;
+	unsigned int n;
+	int err = 0;
+
+	/*
+	 * Virtual requests must take part in timeslicing on the target engines.
+	 */
+
+	if (igt_spinner_init(&spin, gt))
+		return -ENOMEM;
+
+	for (n = 0; n < nsibling; n++) {
+		ce = intel_context_create(siblings[n]);
+		if (IS_ERR(ce)) {
+			err = PTR_ERR(ce);
+			goto out;
+		}
+
+		rq = igt_spinner_create_request(&spin, ce, MI_ARB_CHECK);
+		intel_context_put(ce);
+
+		if (IS_ERR(rq)) {
+			err = PTR_ERR(rq);
+			goto out;
+		}
+
+		i915_request_add(rq);
+	}
+
+	ce = intel_execlists_create_virtual(siblings, nsibling);
+	if (IS_ERR(ce)) {
+		err = PTR_ERR(ce);
+		goto out;
+	}
+
+	rq = intel_context_create_request(ce);
+	intel_context_put(ce);
+	if (IS_ERR(rq)) {
+		err = PTR_ERR(rq);
+		goto out;
+	}
+
+	i915_request_get(rq);
+	i915_request_add(rq);
+	if (i915_request_wait(rq, 0, HZ / 10) < 0) {
+		GEM_TRACE_ERR("%s(%s) failed to slice in virtual request\n",
+			      __func__, rq->engine->name);
+		GEM_TRACE_DUMP();
+		intel_gt_set_wedged(gt);
+		err = -EIO;
+	}
+	i915_request_put(rq);
+
+out:
+	igt_spinner_end(&spin);
+	if (igt_flush_test(gt->i915))
+		err = -EIO;
+	igt_spinner_fini(&spin);
+	return err;
+}
+
+static int sliceout_virtual_engine(struct intel_gt *gt,
+				   struct intel_engine_cs **siblings,
+				   unsigned int nsibling)
+{
+	struct intel_context *ce;
+	struct i915_request *rq;
+	struct igt_spinner spin;
+	unsigned int n;
+	int err = 0;
+
+	/*
+	 * Virtual requests must allow others a fair timeslice.
+	 */
+
+	if (igt_spinner_init(&spin, gt))
+		return -ENOMEM;
+
+	for (n = 0; n <= nsibling; n++) { /* oversubscribed */
+		ce = intel_execlists_create_virtual(siblings, nsibling);
+		if (IS_ERR(ce)) {
+			err = PTR_ERR(ce);
+			goto out;
+		}
+
+		rq = igt_spinner_create_request(&spin, ce, MI_ARB_CHECK);
+		intel_context_put(ce);
+
+		if (IS_ERR(rq)) {
+			err = PTR_ERR(rq);
+			goto out;
+		}
+
+		i915_request_add(rq);
+	}
+
+	for (n = 0; !err && n < nsibling; n++) {
+		ce = intel_context_create(siblings[n]);
+		if (IS_ERR(ce)) {
+			err = PTR_ERR(ce);
+			goto out;
+		}
+
+		rq = intel_context_create_request(ce);
+		intel_context_put(ce);
+
+		if (IS_ERR(rq)) {
+			err = PTR_ERR(rq);
+			goto out;
+		}
+
+		i915_request_get(rq);
+		i915_request_add(rq);
+		if (i915_request_wait(rq, 0, HZ / 10) < 0) {
+			GEM_TRACE_ERR("%s(%s) failed to slice out virtual request\n",
+				      __func__, siblings[n]->name);
+			GEM_TRACE_DUMP();
+			intel_gt_set_wedged(gt);
+			err = -EIO;
+		}
+		i915_request_put(rq);
+	}
+
+out:
+	igt_spinner_end(&spin);
+	if (igt_flush_test(gt->i915))
+		err = -EIO;
+	igt_spinner_fini(&spin);
+	return err;
+}
+
+static int live_virtual_slice(void *arg)
+{
+	struct intel_gt *gt = arg;
+	struct intel_engine_cs *siblings[MAX_ENGINE_INSTANCE + 1];
+	unsigned int class, inst;
+	int err;
+
+	if (intel_uc_uses_guc_submission(&gt->uc))
+		return 0;
+
+	for (class = 0; class <= MAX_ENGINE_CLASS; class++) {
+		unsigned int nsibling;
+
+		nsibling = 0;
+		for (inst = 0; inst <= MAX_ENGINE_INSTANCE; inst++) {
+			struct intel_engine_cs *engine;
+
+			engine = gt->engine_class[class][inst];
+			if (!engine)
+				break;
+
+			if (!intel_engine_has_timeslices(engine))
+				continue;
+
+			siblings[nsibling++] = engine;
+		}
+		if (nsibling < 2)
+			continue;
+
+		err = slicein_virtual_engine(gt, siblings, nsibling);
+		if (err)
+			return err;
+
+		err = sliceout_virtual_engine(gt, siblings, nsibling);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
 static int preserved_virtual_engine(struct intel_gt *gt,
 				    struct intel_engine_cs **siblings,
 				    unsigned int nsibling)
@@ -4329,6 +4507,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_virtual_engine),
 		SUBTEST(live_virtual_mask),
 		SUBTEST(live_virtual_preserved),
+		SUBTEST(live_virtual_slice),
 		SUBTEST(live_virtual_bond),
 		SUBTEST(live_virtual_reset),
 	};
-- 
2.20.1

* [Intel-gfx] [PATCH 3/8] drm/i915/gt: Reuse the tasklet priority for virtual as their siblings
  2020-05-18  8:14 [Intel-gfx] [PATCH 1/8] drm/i915: Move saturated workload detection back to the context Chris Wilson
  2020-05-18  8:14 ` [Intel-gfx] [PATCH 2/8] drm/i915/selftests: Add tests for timeslicing virtual engines Chris Wilson
@ 2020-05-18  8:14 ` Chris Wilson
  2020-05-18 10:13   ` Tvrtko Ursulin
  2020-05-18  8:14 ` [Intel-gfx] [PATCH 4/8] drm/i915/gt: Kick virtual siblings on timeslice out Chris Wilson
                   ` (11 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Chris Wilson @ 2020-05-18  8:14 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

In order to keep all the tasklets on the same execution list, and so
FIFO ordered, be consistent and use the same priority for all.
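
For reference, the two flavours differ only in which softirq vector the
tasklet is queued on (simplified from <linux/interrupt.h>):

	tasklet_schedule(t);    /* queued on TASKLET_SOFTIRQ */
	tasklet_hi_schedule(t); /* queued on HI_SOFTIRQ, which runs first */

Each vector keeps its own FIFO list, so mixing the two lets the virtual
tasklets be reordered relative to their siblings'.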

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index e597325d04f1..80885ba87db5 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1403,7 +1403,7 @@ static void kick_siblings(struct i915_request *rq, struct intel_context *ce)
 	struct i915_request *next = READ_ONCE(ve->request);
 
 	if (next && next->execution_mask & ~rq->execution_mask)
-		tasklet_schedule(&ve->base.execlists.tasklet);
+		tasklet_hi_schedule(&ve->base.execlists.tasklet);
 }
 
 static inline void
@@ -5560,7 +5560,7 @@ static void virtual_submit_request(struct i915_request *rq)
 		GEM_BUG_ON(!list_empty(virtual_queue(ve)));
 		list_move_tail(&rq->sched.link, virtual_queue(ve));
 
-		tasklet_schedule(&ve->base.execlists.tasklet);
+		tasklet_hi_schedule(&ve->base.execlists.tasklet);
 	}
 
 	spin_unlock_irqrestore(&ve->base.active.lock, flags);
-- 
2.20.1

* [Intel-gfx] [PATCH 4/8] drm/i915/gt: Kick virtual siblings on timeslice out
  2020-05-18  8:14 [Intel-gfx] [PATCH 1/8] drm/i915: Move saturated workload detection back to the context Chris Wilson
  2020-05-18  8:14 ` [Intel-gfx] [PATCH 2/8] drm/i915/selftests: Add tests for timeslicing virtual engines Chris Wilson
  2020-05-18  8:14 ` [Intel-gfx] [PATCH 3/8] drm/i915/gt: Reuse the tasklet priority for virtual as their siblings Chris Wilson
@ 2020-05-18  8:14 ` Chris Wilson
  2020-05-18 10:29   ` Tvrtko Ursulin
  2020-05-18  8:14 ` [Intel-gfx] [PATCH 5/8] drm/i915/gt: Incorporate the virtual engine into timeslicing Chris Wilson
                   ` (10 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Chris Wilson @ 2020-05-18  8:14 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

If we decide to timeslice out the current virtual request, we will
unsubmit it while it is still busy (ve->context.inflight == sibling[0]).
If the virtual tasklet and then the other sibling tasklets run before we
completely schedule out the active virtual request for the preemption,
those other tasklets will see that the virtual request is still inflight
on sibling[0] and leave it be. Therefore, when we finally schedule out
the virtual request, if we see that we have passed it back to the
virtual engine, reschedule the virtual tasklet so that it may be
resubmitted on any of the siblings.
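
A sketch of the race, assuming two siblings s0/s1:

  1. rq executes on s0, so ve->context.inflight == s0
  2. the timeslice expires and s0 unsubmits rq back to ve
  3. s1's tasklet runs, sees rq still inflight on s0, leaves it be
  4. s0 completes the schedule-out; without the extra kick below,
     nothing resubmits rq until some other event arrives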

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 80885ba87db5..05486e801a63 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1402,7 +1402,7 @@ static void kick_siblings(struct i915_request *rq, struct intel_context *ce)
 	struct virtual_engine *ve = container_of(ce, typeof(*ve), context);
 	struct i915_request *next = READ_ONCE(ve->request);
 
-	if (next && next->execution_mask & ~rq->execution_mask)
+	if (next == rq || (next && next->execution_mask & ~rq->execution_mask))
 		tasklet_hi_schedule(&ve->base.execlists.tasklet);
 }
 
-- 
2.20.1

* [Intel-gfx] [PATCH 5/8] drm/i915/gt: Incorporate the virtual engine into timeslicing
  2020-05-18  8:14 [Intel-gfx] [PATCH 1/8] drm/i915: Move saturated workload detection back to the context Chris Wilson
                   ` (2 preceding siblings ...)
  2020-05-18  8:14 ` [Intel-gfx] [PATCH 4/8] drm/i915/gt: Kick virtual siblings on timeslice out Chris Wilson
@ 2020-05-18  8:14 ` Chris Wilson
  2020-05-18 10:36   ` Tvrtko Ursulin
  2020-05-18  8:14 ` [Intel-gfx] [PATCH 6/8] drm/i915/gt: Use virtual_engine during execlists_dequeue Chris Wilson
                   ` (9 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Chris Wilson @ 2020-05-18  8:14 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

It was quite the oversight to only factor in the normal queue to decide
the timeslicing switch priority. By leaving out the next virtual request
from the priority decision, we would not timeslice the current engine if
there was an available virtual request.
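
With the virtual request factored in, the switch priority is computed
roughly as follows (condensed sketch of need_timeslice() after this
patch; ve_rq stands for the next runnable virtual request, if any):

	hint = engine->execlists.queue_priority_hint;
	if (ve_rq)
		hint = max(hint, rq_prio(ve_rq));
	if (!list_is_last(&rq->sched.link, &engine->active.requests))
		hint = max(hint, rq_prio(list_next_entry(rq, sched.link)));
	return hint >= effective_prio(rq);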

Testcase: igt/gem_exec_balancer/sliced
Fixes: 3df2deed411e ("drm/i915/execlists: Enable timeslice on partial virtual engine dequeue")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 32 ++++++++++++++++++++++-------
 1 file changed, 25 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 05486e801a63..120efb3eaf96 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1895,7 +1895,8 @@ static void defer_active(struct intel_engine_cs *engine)
 
 static bool
 need_timeslice(const struct intel_engine_cs *engine,
-	       const struct i915_request *rq)
+	       const struct i915_request *rq,
+	       const struct rb_node *rb)
 {
 	int hint;
 
@@ -1903,6 +1904,24 @@ need_timeslice(const struct intel_engine_cs *engine,
 		return false;
 
 	hint = engine->execlists.queue_priority_hint;
+
+	if (rb) {
+		const struct virtual_engine *ve =
+			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
+		const struct intel_engine_cs *inflight =
+			intel_context_inflight(&ve->context);
+
+		if (!inflight || inflight == engine) {
+			struct i915_request *next;
+
+			rcu_read_lock();
+			next = READ_ONCE(ve->request);
+			if (next)
+				hint = max(hint, rq_prio(next));
+			rcu_read_unlock();
+		}
+	}
+
 	if (!list_is_last(&rq->sched.link, &engine->active.requests))
 		hint = max(hint, rq_prio(list_next_entry(rq, sched.link)));
 
@@ -1977,10 +1996,9 @@ static void set_timeslice(struct intel_engine_cs *engine)
 	set_timer_ms(&engine->execlists.timer, duration);
 }
 
-static void start_timeslice(struct intel_engine_cs *engine)
+static void start_timeslice(struct intel_engine_cs *engine, int prio)
 {
 	struct intel_engine_execlists *execlists = &engine->execlists;
-	const int prio = queue_prio(execlists);
 	unsigned long duration;
 
 	if (!intel_engine_has_timeslices(engine))
@@ -2140,7 +2158,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 			__unwind_incomplete_requests(engine);
 
 			last = NULL;
-		} else if (need_timeslice(engine, last) &&
+		} else if (need_timeslice(engine, last, rb) &&
 			   timeslice_expired(execlists, last)) {
 			if (i915_request_completed(last)) {
 				tasklet_hi_schedule(&execlists->tasklet);
@@ -2188,7 +2206,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 				 * Even if ELSP[1] is occupied and not worthy
 				 * of timeslices, our queue might be.
 				 */
-				start_timeslice(engine);
+				start_timeslice(engine, queue_prio(execlists));
 				return;
 			}
 		}
@@ -2223,7 +2241,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 
 			if (last && !can_merge_rq(last, rq)) {
 				spin_unlock(&ve->base.active.lock);
-				start_timeslice(engine);
+				start_timeslice(engine, rq_prio(rq));
 				return; /* leave this for another sibling */
 			}
 
@@ -5519,7 +5537,7 @@ static void virtual_submission_tasklet(unsigned long data)
 submit_engine:
 		GEM_BUG_ON(RB_EMPTY_NODE(&node->rb));
 		node->prio = prio;
-		if (first && prio > sibling->execlists.queue_priority_hint)
+		if (first && prio >= sibling->execlists.queue_priority_hint)
 			tasklet_hi_schedule(&sibling->execlists.tasklet);
 
 		spin_unlock(&sibling->active.lock);
-- 
2.20.1

* [Intel-gfx] [PATCH 6/8] drm/i915/gt: Use virtual_engine during execlists_dequeue
  2020-05-18  8:14 [Intel-gfx] [PATCH 1/8] drm/i915: Move saturated workload detection back to the context Chris Wilson
                   ` (3 preceding siblings ...)
  2020-05-18  8:14 ` [Intel-gfx] [PATCH 5/8] drm/i915/gt: Incorporate the virtual engine into timeslicing Chris Wilson
@ 2020-05-18  8:14 ` Chris Wilson
  2020-05-18 10:51   ` Tvrtko Ursulin
  2020-05-18 12:33   ` [Intel-gfx] [PATCH v2] " Chris Wilson
  2020-05-18  8:14 ` [Intel-gfx] [PATCH 7/8] drm/i915/gt: Decouple inflight virtual engines Chris Wilson
                   ` (8 subsequent siblings)
  13 siblings, 2 replies; 32+ messages in thread
From: Chris Wilson @ 2020-05-18  8:14 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Rather than going back and forth between the rb_node entry and the
virtual_engine type, store the ve in a local and reuse it. As the
container_of conversion from rb_node to virtual_engine requires a
variable offset, performing that conversion just once shaves off a bit
of code.
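
The conversion in question is a container_of() whose member offset is
not a compile-time constant:

	ve = rb_entry(rb, typeof(*ve), nodes[engine->id].rb);

Since the offset depends on engine->id, each open-coded conversion
costs a little extra arithmetic, hence hoisting it into the new
first_virtual_engine() helper.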

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 93 +++++++++++++++--------------
 1 file changed, 47 insertions(+), 46 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 120efb3eaf96..7a5ac3375225 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -451,7 +451,7 @@ static int queue_prio(const struct intel_engine_execlists *execlists)
 
 static inline bool need_preempt(const struct intel_engine_cs *engine,
 				const struct i915_request *rq,
-				struct rb_node *rb)
+				struct virtual_engine *ve)
 {
 	int last_prio;
 
@@ -488,9 +488,7 @@ static inline bool need_preempt(const struct intel_engine_cs *engine,
 	    rq_prio(list_next_entry(rq, sched.link)) > last_prio)
 		return true;
 
-	if (rb) {
-		struct virtual_engine *ve =
-			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
+	if (ve) {
 		bool preempt = false;
 
 		if (engine == ve->siblings[0]) { /* only preempt one sibling */
@@ -1812,6 +1810,35 @@ static bool virtual_matches(const struct virtual_engine *ve,
 	return true;
 }
 
+static struct virtual_engine *
+first_virtual_engine(struct intel_engine_cs *engine)
+{
+	struct intel_engine_execlists *el = &engine->execlists;
+	struct rb_node *rb = rb_first_cached(&el->virtual);
+
+	while (rb) {
+		struct virtual_engine *ve =
+			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
+		struct i915_request *rq = READ_ONCE(ve->request);
+
+		if (!rq) { /* lazily cleanup after another engine handled rq */
+			rb_erase_cached(rb, &el->virtual);
+			RB_CLEAR_NODE(rb);
+			rb = rb_first_cached(&el->virtual);
+			continue;
+		}
+
+		if (!virtual_matches(ve, rq, engine)) {
+			rb = rb_next(rb);
+			continue;
+		}
+
+		return ve;
+	}
+
+	return NULL;
+}
+
 static void virtual_xfer_breadcrumbs(struct virtual_engine *ve)
 {
 	/*
@@ -1896,7 +1923,7 @@ static void defer_active(struct intel_engine_cs *engine)
 static bool
 need_timeslice(const struct intel_engine_cs *engine,
 	       const struct i915_request *rq,
-	       const struct rb_node *rb)
+	       struct virtual_engine *ve)
 {
 	int hint;
 
@@ -1905,9 +1932,7 @@ need_timeslice(const struct intel_engine_cs *engine,
 
 	hint = engine->execlists.queue_priority_hint;
 
-	if (rb) {
-		const struct virtual_engine *ve =
-			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
+	if (ve) {
 		const struct intel_engine_cs *inflight =
 			intel_context_inflight(&ve->context);
 
@@ -2057,8 +2082,9 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	struct intel_engine_execlists * const execlists = &engine->execlists;
 	struct i915_request **port = execlists->pending;
 	struct i915_request ** const last_port = port + execlists->port_mask;
-	struct i915_request * const *active;
+	struct i915_request * const *active = READ_ONCE(execlists->active);
 	struct i915_request *last;
+	struct virtual_engine *ve;
 	struct rb_node *rb;
 	bool submit = false;
 
@@ -2084,26 +2110,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	 * and context switches) submission.
 	 */
 
-	for (rb = rb_first_cached(&execlists->virtual); rb; ) {
-		struct virtual_engine *ve =
-			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
-		struct i915_request *rq = READ_ONCE(ve->request);
-
-		if (!rq) { /* lazily cleanup after another engine handled rq */
-			rb_erase_cached(rb, &execlists->virtual);
-			RB_CLEAR_NODE(rb);
-			rb = rb_first_cached(&execlists->virtual);
-			continue;
-		}
-
-		if (!virtual_matches(ve, rq, engine)) {
-			rb = rb_next(rb);
-			continue;
-		}
-
-		break;
-	}
-
 	/*
 	 * If the queue is higher priority than the last
 	 * request in the currently active context, submit afresh.
@@ -2111,10 +2117,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	 * the active context to interject the preemption request,
 	 * i.e. we will retrigger preemption following the ack in case
 	 * of trouble.
-	 */
-	active = READ_ONCE(execlists->active);
-
-	/*
+	 *
 	 * In theory we can skip over completed contexts that have not
 	 * yet been processed by events (as those events are in flight):
 	 *
@@ -2125,9 +2128,10 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	 * find itself trying to jump back into a context it has just
 	 * completed and barf.
 	 */
-
 	if ((last = *active)) {
-		if (need_preempt(engine, last, rb)) {
+		ve = first_virtual_engine(engine);
+
+		if (need_preempt(engine, last, ve)) {
 			if (i915_request_completed(last)) {
 				tasklet_hi_schedule(&execlists->tasklet);
 				return;
@@ -2158,7 +2162,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 			__unwind_incomplete_requests(engine);
 
 			last = NULL;
-		} else if (need_timeslice(engine, last, rb) &&
+		} else if (need_timeslice(engine, last, ve) &&
 			   timeslice_expired(execlists, last)) {
 			if (i915_request_completed(last)) {
 				tasklet_hi_schedule(&execlists->tasklet);
@@ -2212,9 +2216,8 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		}
 	}
 
-	while (rb) { /* XXX virtual is always taking precedence */
-		struct virtual_engine *ve =
-			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
+	/* XXX virtual is always taking precedence */
+	while ((ve = first_virtual_engine(engine))) {
 		struct i915_request *rq;
 
 		spin_lock(&ve->base.active.lock);
@@ -2222,9 +2225,10 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		rq = ve->request;
 		if (unlikely(!rq)) { /* lost the race to a sibling */
 			spin_unlock(&ve->base.active.lock);
+
+			rb = &ve->nodes[engine->id].rb;
 			rb_erase_cached(rb, &execlists->virtual);
 			RB_CLEAR_NODE(rb);
-			rb = rb_first_cached(&execlists->virtual);
 			continue;
 		}
 
@@ -2233,11 +2237,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		GEM_BUG_ON(rq->context != &ve->context);
 
 		if (rq_prio(rq) >= queue_prio(execlists)) {
-			if (!virtual_matches(ve, rq, engine)) {
-				spin_unlock(&ve->base.active.lock);
-				rb = rb_next(rb);
-				continue;
-			}
+			GEM_BUG_ON(!virtual_matches(ve, rq, engine));
 
 			if (last && !can_merge_rq(last, rq)) {
 				spin_unlock(&ve->base.active.lock);
@@ -2257,6 +2257,8 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 			WRITE_ONCE(ve->request, NULL);
 			WRITE_ONCE(ve->base.execlists.queue_priority_hint,
 				   INT_MIN);
+
+			rb = &ve->nodes[engine->id].rb;
 			rb_erase_cached(rb, &execlists->virtual);
 			RB_CLEAR_NODE(rb);
 
@@ -2309,7 +2311,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 			 */
 			if (!submit) {
 				spin_unlock(&ve->base.active.lock);
-				rb = rb_first_cached(&execlists->virtual);
 				continue;
 			}
 		}
-- 
2.20.1

* [Intel-gfx] [PATCH 7/8] drm/i915/gt: Decouple inflight virtual engines
  2020-05-18  8:14 [Intel-gfx] [PATCH 1/8] drm/i915: Move saturated workload detection back to the context Chris Wilson
                   ` (4 preceding siblings ...)
  2020-05-18  8:14 ` [Intel-gfx] [PATCH 6/8] drm/i915/gt: Use virtual_engine during execlists_dequeue Chris Wilson
@ 2020-05-18  8:14 ` Chris Wilson
  2020-05-18 12:53   ` Tvrtko Ursulin
  2020-05-18  8:14 ` [Intel-gfx] [PATCH 8/8] drm/i915/gt: Resubmit the virtual engine on schedule-out Chris Wilson
                   ` (7 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Chris Wilson @ 2020-05-18  8:14 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Once a virtual engine has been bound to a sibling, it will remain bound
until we finally schedule out the last active request. We cannot rebind
the context to a new sibling while it is inflight as the context save
will conflict, hence we wait. As we cannot then use any other sibling
while the context is inflight, only kick the bound sibling while it is
inflight, and upon scheduling out kick the rest (so that we can swap
engines on timeslicing if the previously bound engine becomes
oversubscribed).
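
The resulting tasklet flow is roughly (pseudocode; the early break is
the new part):

	for each sibling {
		queue ve->request in the sibling's virtual rbtree;
		if (intel_context_inflight(&ve->context))
			break; /* bound; no other sibling may take it */
	}

with the remaining siblings kicked later, from kick_siblings(), once
the context is finally scheduled out.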

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 30 +++++++++++++----------------
 1 file changed, 13 insertions(+), 17 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 7a5ac3375225..fe8f3518d6b8 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1398,9 +1398,8 @@ execlists_schedule_in(struct i915_request *rq, int idx)
 static void kick_siblings(struct i915_request *rq, struct intel_context *ce)
 {
 	struct virtual_engine *ve = container_of(ce, typeof(*ve), context);
-	struct i915_request *next = READ_ONCE(ve->request);
 
-	if (next == rq || (next && next->execution_mask & ~rq->execution_mask))
+	if (READ_ONCE(ve->request))
 		tasklet_hi_schedule(&ve->base.execlists.tasklet);
 }
 
@@ -1821,18 +1820,14 @@ first_virtual_engine(struct intel_engine_cs *engine)
 			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
 		struct i915_request *rq = READ_ONCE(ve->request);
 
-		if (!rq) { /* lazily cleanup after another engine handled rq */
+		/* lazily cleanup after another engine handled rq */
+		if (!rq || !virtual_matches(ve, rq, engine)) {
 			rb_erase_cached(rb, &el->virtual);
 			RB_CLEAR_NODE(rb);
 			rb = rb_first_cached(&el->virtual);
 			continue;
 		}
 
-		if (!virtual_matches(ve, rq, engine)) {
-			rb = rb_next(rb);
-			continue;
-		}
-
 		return ve;
 	}
 
@@ -5478,7 +5473,6 @@ static void virtual_submission_tasklet(unsigned long data)
 	if (unlikely(!mask))
 		return;
 
-	local_irq_disable();
 	for (n = 0; n < ve->num_siblings; n++) {
 		struct intel_engine_cs *sibling = READ_ONCE(ve->siblings[n]);
 		struct ve_node * const node = &ve->nodes[sibling->id];
@@ -5488,20 +5482,19 @@ static void virtual_submission_tasklet(unsigned long data)
 		if (!READ_ONCE(ve->request))
 			break; /* already handled by a sibling's tasklet */
 
+		spin_lock_irq(&sibling->active.lock);
+
 		if (unlikely(!(mask & sibling->mask))) {
 			if (!RB_EMPTY_NODE(&node->rb)) {
-				spin_lock(&sibling->active.lock);
 				rb_erase_cached(&node->rb,
 						&sibling->execlists.virtual);
 				RB_CLEAR_NODE(&node->rb);
-				spin_unlock(&sibling->active.lock);
 			}
-			continue;
-		}
 
-		spin_lock(&sibling->active.lock);
+			goto unlock_engine;
+		}
 
-		if (!RB_EMPTY_NODE(&node->rb)) {
+		if (unlikely(!RB_EMPTY_NODE(&node->rb))) {
 			/*
 			 * Cheat and avoid rebalancing the tree if we can
 			 * reuse this node in situ.
@@ -5541,9 +5534,12 @@ static void virtual_submission_tasklet(unsigned long data)
 		if (first && prio >= sibling->execlists.queue_priority_hint)
 			tasklet_hi_schedule(&sibling->execlists.tasklet);
 
-		spin_unlock(&sibling->active.lock);
+unlock_engine:
+		spin_unlock_irq(&sibling->active.lock);
+
+		if (intel_context_inflight(&ve->context))
+			break;
 	}
-	local_irq_enable();
 }
 
 static void virtual_submit_request(struct i915_request *rq)
-- 
2.20.1

* [Intel-gfx] [PATCH 8/8] drm/i915/gt: Resubmit the virtual engine on schedule-out
  2020-05-18  8:14 [Intel-gfx] [PATCH 1/8] drm/i915: Move saturated workload detection back to the context Chris Wilson
                   ` (5 preceding siblings ...)
  2020-05-18  8:14 ` [Intel-gfx] [PATCH 7/8] drm/i915/gt: Decouple inflight virtual engines Chris Wilson
@ 2020-05-18  8:14 ` Chris Wilson
  2020-05-18  9:53 ` [Intel-gfx] [PATCH 1/8] drm/i915: Move saturated workload detection back to the context Tvrtko Ursulin
                   ` (6 subsequent siblings)
  13 siblings, 0 replies; 32+ messages in thread
From: Chris Wilson @ 2020-05-18  8:14 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Having recognised that we do not change the sibling until we schedule
out, we can then defer the decision to resubmit the virtual engine from
the unwind of the active queue to scheduling out of the virtual context.

By keeping the unwind order intact on the local engine, we can preserve
data dependency ordering while doing a preempt-to-busy pass until we
have determined the new ELSP. This means that if we try to timeslice
between a virtual engine and a data-dependent ordinary request, the pair
will maintain their relative ordering and we will avoid the
resubmission, cancelling the timeslicing until further change.

The dilemma though is that we then may end up in a situation where the
'demotion' of the virtual request to an ordinary request in the engine
queue results in filling the ELSP[] with virtual requests instead of
spreading the load across the engines. To compensate for this, we mark
each virtual request and refuse to resubmit a virtual request in the
secondary ELSP slots, thus forcing subsequent virtual requests to be
scheduled out after timeslicing. By delaying the decision until we
schedule out, we will avoid unnecessary resubmission.
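
The decision itself is compact (taken from kick_siblings() in the patch
below): only a request that was unwound back into the engine's priority
queue, and that could run on some other physical engine, is handed back
to the virtual engine:

	if (i915_request_in_priority_queue(rq) &&
	    rq->execution_mask != rq->engine->mask)
		resubmit_virtual_request(rq, ve);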

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c    | 99 ++++++++++++++++----------
 drivers/gpu/drm/i915/gt/selftest_lrc.c |  2 +-
 2 files changed, 62 insertions(+), 39 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index fe8f3518d6b8..a0e337b855e3 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1114,46 +1114,17 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
 
 		__i915_request_unsubmit(rq);
 
-		/*
-		 * Push the request back into the queue for later resubmission.
-		 * If this request is not native to this physical engine (i.e.
-		 * it came from a virtual source), push it back onto the virtual
-		 * engine so that it can be moved across onto another physical
-		 * engine as load dictates.
-		 */
-		if (likely(rq->execution_mask == engine->mask)) {
-			GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
-			if (rq_prio(rq) != prio) {
-				prio = rq_prio(rq);
-				pl = i915_sched_lookup_priolist(engine, prio);
-			}
-			GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
-
-			list_move(&rq->sched.link, pl);
-			set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+		GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
+		if (rq_prio(rq) != prio) {
+			prio = rq_prio(rq);
+			pl = i915_sched_lookup_priolist(engine, prio);
+		}
+		GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
 
-			active = rq;
-		} else {
-			struct intel_engine_cs *owner = rq->context->engine;
+		list_move(&rq->sched.link, pl);
+		set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
 
-			/*
-			 * Decouple the virtual breadcrumb before moving it
-			 * back to the virtual engine -- we don't want the
-			 * request to complete in the background and try
-			 * and cancel the breadcrumb on the virtual engine
-			 * (instead of the old engine where it is linked)!
-			 */
-			if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
-				     &rq->fence.flags)) {
-				spin_lock_nested(&rq->lock,
-						 SINGLE_DEPTH_NESTING);
-				i915_request_cancel_breadcrumb(rq);
-				spin_unlock(&rq->lock);
-			}
-			WRITE_ONCE(rq->engine, owner);
-			owner->submit_request(rq);
-			active = NULL;
-		}
+		active = rq;
 	}
 
 	return active;
@@ -1395,12 +1366,41 @@ execlists_schedule_in(struct i915_request *rq, int idx)
 	return i915_request_get(rq);
 }
 
+static void
+resubmit_virtual_request(struct i915_request *rq, struct virtual_engine *ve)
+{
+	/*
+	 * Decouple the virtual breadcrumb before moving it back to the virtual
+	 * engine -- we don't want the request to complete in the background
+	 * and then try and cancel the breadcrumb on the virtual engine
+	 * (instead of the old engine where it is linked)!
+	 */
+	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &rq->fence.flags)) {
+		spin_lock_nested(&rq->lock, SINGLE_DEPTH_NESTING);
+		i915_request_cancel_breadcrumb(rq);
+		spin_unlock(&rq->lock);
+	}
+
+	WRITE_ONCE(rq->engine, &ve->base);
+	ve->base.submit_request(rq);
+}
+
 static void kick_siblings(struct i915_request *rq, struct intel_context *ce)
 {
 	struct virtual_engine *ve = container_of(ce, typeof(*ve), context);
 
 	if (READ_ONCE(ve->request))
 		tasklet_hi_schedule(&ve->base.execlists.tasklet);
+
+	/*
+	 * This engine is now too busy to run this virtual request, so
+	 * see if we can find an alternative engine for it to execute on.
+	 * Once a request has become bonded to this engine, we treat it the
+	 * same as other native request.
+	 */
+	if (i915_request_in_priority_queue(rq) &&
+	    rq->execution_mask != rq->engine->mask)
+		resubmit_virtual_request(rq, ve);
 }
 
 static inline void
@@ -1646,6 +1646,20 @@ assert_pending_valid(const struct intel_engine_execlists *execlists,
 			return false;
 		}
 
+		/*
+		 * We want virtual requests to only be in the first slot so
+		 * that they are never stuck behind a hog and can be immediately
+		 * transferred onto the next idle engine.
+		 */
+		if (rq->execution_mask != engine->mask &&
+		    port != execlists->pending) {
+			GEM_TRACE_ERR("%s: virtual engine:%llx not in prime position[%zd]\n",
+				      engine->name,
+				      ce->timeline->fence_context,
+				      port - execlists->pending);
+			return false;
+		}
+
 		/* Hold tightly onto the lock to prevent concurrent retires! */
 		if (!spin_trylock_irqsave(&rq->lock, flags))
 			continue;
@@ -2353,6 +2367,15 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 				if (i915_request_has_sentinel(last))
 					goto done;
 
+				/*
+				 * We avoid submitting virtual requests into
+				 * the secondary ports so that we can migrate
+				 * the request immediately to another engine
+				 * rather than wait for the primary request.
+				 */
+				if (rq->execution_mask != engine->mask)
+					goto done;
+
 				/*
 				 * If GVT overrides us we only ever submit
 				 * port[0], leaving port[1] empty. Note that we
diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c
index 1fc54359bd53..e541ff47aa30 100644
--- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
+++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
@@ -4393,7 +4393,7 @@ static int reset_virtual_engine(struct intel_gt *gt,
 	spin_lock_irq(&engine->active.lock);
 	__unwind_incomplete_requests(engine);
 	spin_unlock_irq(&engine->active.lock);
-	GEM_BUG_ON(rq->engine != ve->engine);
+	GEM_BUG_ON(rq->engine != engine);
 
 	/* Reset the engine while keeping our active request on hold */
 	execlists_hold(engine, rq);
-- 
2.20.1

* Re: [Intel-gfx] [PATCH 1/8] drm/i915: Move saturated workload detection back to the context
  2020-05-18  8:14 [Intel-gfx] [PATCH 1/8] drm/i915: Move saturated workload detection back to the context Chris Wilson
                   ` (6 preceding siblings ...)
  2020-05-18  8:14 ` [Intel-gfx] [PATCH 8/8] drm/i915/gt: Resubmit the virtual engine on schedule-out Chris Wilson
@ 2020-05-18  9:53 ` Tvrtko Ursulin
  2020-05-18 10:11   ` Chris Wilson
  2020-05-18 11:55 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/8] " Patchwork
                   ` (5 subsequent siblings)
  13 siblings, 1 reply; 32+ messages in thread
From: Tvrtko Ursulin @ 2020-05-18  9:53 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 18/05/2020 09:14, Chris Wilson wrote:
> When we introduced the saturated workload detection to tell us to back
> off from semaphore usage [semaphores have a noticeable impact on
> contended bus cycles with the CPU for some heavy workloads], we first
> introduced it as a per-context tracker. This allows individual contexts
> to try and optimise their own usage, but we found that with the local
> tracking and the no-semaphore boosting, the first context to disable
> semaphores got a massive priority boost and so would starve the rest and
> all new contexts (as they started with semaphores enabled and lower
> priority). Hence we moved the saturated workload detection to the
> engine, and as a consequence had to disable semaphores on virtual engines.
> 
> Now that we do not have semaphore priority boosting, we can move the
> tracking back to the context and virtual engines can now utilise the
> faster inter-engine synchronisation.
> 
> References: 44d89409a12e ("drm/i915: Make the semaphore saturation mask global")

We'd need to dig out the bug report which the above commit fixed and see 
what tests need to be run to check for no regressions. Sounds tricky to 
find without a tag. I certainly don't remember it from a year ago. :(

Regards,

Tvrtko

> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/gt/intel_context.c       |  1 +
>   drivers/gpu/drm/i915/gt/intel_context_types.h |  2 ++
>   drivers/gpu/drm/i915/gt/intel_engine_pm.c     |  2 --
>   drivers/gpu/drm/i915/gt/intel_engine_types.h  |  2 --
>   drivers/gpu/drm/i915/gt/intel_lrc.c           | 15 ---------------
>   drivers/gpu/drm/i915/i915_request.c           |  4 ++--
>   6 files changed, 5 insertions(+), 21 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
> index e4aece20bc80..762a251d553b 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.c
> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> @@ -268,6 +268,7 @@ static int __intel_context_active(struct i915_active *active)
>   	if (err)
>   		goto err_timeline;
>   
> +	ce->saturated = 0;
>   	return 0;
>   
>   err_timeline:
> diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> index 4954b0df4864..aed26d93c2ca 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> @@ -78,6 +78,8 @@ struct intel_context {
>   	} lrc;
>   	u32 tag; /* cookie passed to HW to track this context on submission */
>   
> +	intel_engine_mask_t saturated; /* submitting semaphores too late? */
> +
>   	/* Time on GPU as tracked by the hw. */
>   	struct {
>   		struct ewma_runtime avg;
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> index d0a1078ef632..6d7fdba5adef 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
> @@ -229,8 +229,6 @@ static int __engine_park(struct intel_wakeref *wf)
>   	struct intel_engine_cs *engine =
>   		container_of(wf, typeof(*engine), wakeref);
>   
> -	engine->saturated = 0;
> -
>   	/*
>   	 * If one and only one request is completed between pm events,
>   	 * we know that we are inside the kernel context and it is
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> index 2b6cdf47d428..c443b6bb884b 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
> @@ -332,8 +332,6 @@ struct intel_engine_cs {
>   
>   	struct intel_context *kernel_context; /* pinned */
>   
> -	intel_engine_mask_t saturated; /* submitting semaphores too late? */
> -
>   	struct {
>   		struct delayed_work work;
>   		struct i915_request *systole;
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index 87e6c5bdd2dc..e597325d04f1 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -5630,21 +5630,6 @@ intel_execlists_create_virtual(struct intel_engine_cs **siblings,
>   	ve->base.instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
>   	ve->base.uabi_instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
>   
> -	/*
> -	 * The decision on whether to submit a request using semaphores
> -	 * depends on the saturated state of the engine. We only compute
> -	 * this during HW submission of the request, and we need for this
> -	 * state to be globally applied to all requests being submitted
> -	 * to this engine. Virtual engines encompass more than one physical
> -	 * engine and so we cannot accurately tell in advance if one of those
> -	 * engines is already saturated and so cannot afford to use a semaphore
> -	 * and be pessimized in priority for doing so -- if we are the only
> -	 * context using semaphores after all other clients have stopped, we
> -	 * will be starved on the saturated system. Such a global switch for
> -	 * semaphores is less than ideal, but alas is the current compromise.
> -	 */
> -	ve->base.saturated = ALL_ENGINES;
> -
>   	snprintf(ve->base.name, sizeof(ve->base.name), "virtual");
>   
>   	intel_engine_init_active(&ve->base, ENGINE_VIRTUAL);
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 526c1e9acbd5..31ef683d27b4 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -467,7 +467,7 @@ bool __i915_request_submit(struct i915_request *request)
>   	 */
>   	if (request->sched.semaphores &&
>   	    i915_sw_fence_signaled(&request->semaphore))
> -		engine->saturated |= request->sched.semaphores;
> +		request->context->saturated |= request->sched.semaphores;
>   
>   	engine->emit_fini_breadcrumb(request,
>   				     request->ring->vaddr + request->postfix);
> @@ -919,7 +919,7 @@ already_busywaiting(struct i915_request *rq)
>   	 *
>   	 * See the are-we-too-late? check in __i915_request_submit().
>   	 */
> -	return rq->sched.semaphores | READ_ONCE(rq->engine->saturated);
> +	return rq->sched.semaphores | READ_ONCE(rq->context->saturated);
>   }
>   
>   static int
> 

* Re: [Intel-gfx] [PATCH 1/8] drm/i915: Move saturated workload detection back to the context
  2020-05-18  9:53 ` [Intel-gfx] [PATCH 1/8] drm/i915: Move saturated workload detection back to the context Tvrtko Ursulin
@ 2020-05-18 10:11   ` Chris Wilson
  0 siblings, 0 replies; 32+ messages in thread
From: Chris Wilson @ 2020-05-18 10:11 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2020-05-18 10:53:22)
> 
> On 18/05/2020 09:14, Chris Wilson wrote:
> > When we introduced the saturated workload detection to tell us to back
> > off from semaphore usage [semaphores have a noticeable impact on
> > contended bus cycles with the CPU for some heavy workloads], we first
> > introduced it as a per-context tracker. This allows individual contexts
> > to try and optimise their own usage, but we found that with the local
> > tracking and the no-semaphore boosting, the first context to disable
> > semaphores got a massive priority boost and so would starve the rest and
> > all new contexts (as they started with semaphores enabled and lower
> > priority). Hence we moved the saturated workload detection to the
> > engine, and as a consequence had to disable semaphores on virtual engines.
> > 
> > Now that we do not have semaphore priority boosting, we can move the
> > tracking back to the context and virtual engines can now utilise the
> > faster inter-engine synchronisation.
> > 
> > References: 44d89409a12e ("drm/i915: Make the semaphore saturation mask global")
> 
> We'd need to dig out the bug report which the above commit fixed and see 
> what tests need to be run to check for no regressions. Sounds tricky to 
> find without a tag. I certainly don't remember it from a year ago. :(

This is all about the semaphore priority boosting and the inversions it
caused. The situation was that we would turn off semaphore usage for
existing contexts, but new contexts would arrive, try to use semaphores
and be demoted in priority. Thus the new contexts would be
starved.

No semaphore boosting and the playing field is level again, and -b i915 is
no longer slower than -b busy/context/etc for unsaturated workloads.

I wanted to try and remove the saturation entirely. The impact on the
perf_density tests seems to be much lower than before, but I think that
is due to other mitigating factors.
-Chris

* Re: [Intel-gfx] [PATCH 2/8] drm/i915/selftests: Add tests for timeslicing virtual engines
  2020-05-18  8:14 ` [Intel-gfx] [PATCH 2/8] drm/i915/selftests: Add tests for timeslicing virtual engines Chris Wilson
@ 2020-05-18 10:12   ` Tvrtko Ursulin
  2020-05-18 10:21     ` Chris Wilson
  0 siblings, 1 reply; 32+ messages in thread
From: Tvrtko Ursulin @ 2020-05-18 10:12 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 18/05/2020 09:14, Chris Wilson wrote:
> Make sure that we can execute a virtual request on an already busy
> engine, and conversely that we can execute a normal request if the
> engines are already fully occupied by virtual requests.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/gt/selftest_lrc.c | 179 +++++++++++++++++++++++++
>   1 file changed, 179 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c
> index 824f99c4cc7c..1fc54359bd53 100644
> --- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
> @@ -3766,6 +3766,184 @@ static int live_virtual_mask(void *arg)
>   	return 0;
>   }
>   
> +static int slicein_virtual_engine(struct intel_gt *gt,
> +				  struct intel_engine_cs **siblings,
> +				  unsigned int nsibling)
> +{
> +	struct intel_context *ce;
> +	struct i915_request *rq;
> +	struct igt_spinner spin;
> +	unsigned int n;
> +	int err = 0;
> +
> +	/*
> +	 * Virtual requests must take part in timeslicing on the target engines.
> +	 */
> +
> +	if (igt_spinner_init(&spin, gt))
> +		return -ENOMEM;
> +
> +	for (n = 0; n < nsibling; n++) {
> +		ce = intel_context_create(siblings[n]);
> +		if (IS_ERR(ce)) {
> +			err = PTR_ERR(ce);
> +			goto out;
> +		}
> +
> +		rq = igt_spinner_create_request(&spin, ce, MI_ARB_CHECK);
> +		intel_context_put(ce);
> +
> +		if (IS_ERR(rq)) {
> +			err = PTR_ERR(rq);
> +			goto out;
> +		}
> +
> +		i915_request_add(rq);
> +	}
> +
> +	ce = intel_execlists_create_virtual(siblings, nsibling);
> +	if (IS_ERR(ce)) {
> +		err = PTR_ERR(ce);
> +		goto out;
> +	}
> +
> +	rq = intel_context_create_request(ce);
> +	intel_context_put(ce);
> +	if (IS_ERR(rq)) {
> +		err = PTR_ERR(rq);
> +		goto out;
> +	}
> +
> +	i915_request_get(rq);
> +	i915_request_add(rq);
> +	if (i915_request_wait(rq, 0, HZ / 10) < 0) {
> +		GEM_TRACE_ERR("%s(%s) failed to slice in virtual request\n",
> +			      __func__, rq->engine->name);
> +		GEM_TRACE_DUMP();
> +		intel_gt_set_wedged(gt);
> +		err = -EIO;
> +	}
> +	i915_request_put(rq);
> +
> +out:
> +	igt_spinner_end(&spin);
> +	if (igt_flush_test(gt->i915))
> +		err = -EIO;
> +	igt_spinner_fini(&spin);
> +	return err;
> +}
> +
> +static int sliceout_virtual_engine(struct intel_gt *gt,
> +				   struct intel_engine_cs **siblings,
> +				   unsigned int nsibling)
> +{
> +	struct intel_context *ce;
> +	struct i915_request *rq;
> +	struct igt_spinner spin;
> +	unsigned int n;
> +	int err = 0;
> +
> +	/*
> +	 * Virtual requests must allow others a fair timeslice.
> +	 */
> +
> +	if (igt_spinner_init(&spin, gt))
> +		return -ENOMEM;
> +
> +	for (n = 0; n <= nsibling; n++) { /* oversubscribed */
> +		ce = intel_execlists_create_virtual(siblings, nsibling);
> +		if (IS_ERR(ce)) {
> +			err = PTR_ERR(ce);
> +			goto out;
> +		}
> +
> +		rq = igt_spinner_create_request(&spin, ce, MI_ARB_CHECK);
> +		intel_context_put(ce);
> +
> +		if (IS_ERR(rq)) {
> +			err = PTR_ERR(rq);
> +			goto out;
> +		}
> +
> +		i915_request_add(rq);
> +	}
> +
> +	for (n = 0; !err && n < nsibling; n++) {
> +		ce = intel_context_create(siblings[n]);
> +		if (IS_ERR(ce)) {
> +			err = PTR_ERR(ce);
> +			goto out;
> +		}
> +
> +		rq = intel_context_create_request(ce);
> +		intel_context_put(ce);
> +
> +		if (IS_ERR(rq)) {
> +			err = PTR_ERR(rq);
> +			goto out;
> +		}
> +
> +		i915_request_get(rq);
> +		i915_request_add(rq);
> +		if (i915_request_wait(rq, 0, HZ / 10) < 0) {
> +			GEM_TRACE_ERR("%s(%s) failed to slice out virtual request\n",
> +				      __func__, siblings[n]->name);
> +			GEM_TRACE_DUMP();
> +			intel_gt_set_wedged(gt);
> +			err = -EIO;
> +		}
> +		i915_request_put(rq);
> +	}
> +
> +out:
> +	igt_spinner_end(&spin);
> +	if (igt_flush_test(gt->i915))
> +		err = -EIO;
> +	igt_spinner_fini(&spin);
> +	return err;
> +}
> +
> +static int live_virtual_slice(void *arg)
> +{
> +	struct intel_gt *gt = arg;
> +	struct intel_engine_cs *siblings[MAX_ENGINE_INSTANCE + 1];
> +	unsigned int class, inst;
> +	int err;
> +
> +	if (intel_uc_uses_guc_submission(&gt->uc))
> +		return 0;

Shouldn't the intel_engine_has_timeslices check below be enough? I am 
worried about silently skipping this seemingly pretty generic test too much.

> +
> +	for (class = 0; class <= MAX_ENGINE_CLASS; class++) {
> +		unsigned int nsibling;
> +
> +		nsibling = 0;
> +		for (inst = 0; inst <= MAX_ENGINE_INSTANCE; inst++) {
> +			struct intel_engine_cs *engine;
> +
> +			engine = gt->engine_class[class][inst];
> +			if (!engine)
> +				break;

This should be continue I think, to account for vcs0 + vcs2 on Icelake.

> +
> +			if (!intel_engine_has_timeslices(engine))
> +				continue;
> +
> +			siblings[nsibling++] = engine;
> +		}
> +		if (nsibling < 2)
> +			continue;
> +
> +		err = slicein_virtual_engine(gt, siblings, nsibling);
> +		if (err)
> +			return err;
> +
> +		err = sliceout_virtual_engine(gt, siblings, nsibling);
> +		if (err)
> +			return err;
> +	}
> +
> +	return 0;
> +}
> +
>   static int preserved_virtual_engine(struct intel_gt *gt,
>   				    struct intel_engine_cs **siblings,
>   				    unsigned int nsibling)
> @@ -4329,6 +4507,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
>   		SUBTEST(live_virtual_engine),
>   		SUBTEST(live_virtual_mask),
>   		SUBTEST(live_virtual_preserved),
> +		SUBTEST(live_virtual_slice),
>   		SUBTEST(live_virtual_bond),
>   		SUBTEST(live_virtual_reset),
>   	};
> 

Regards,

Tvrtko

* Re: [Intel-gfx] [PATCH 3/8] drm/i915/gt: Reuse the tasklet priority for virtual as their siblings
  2020-05-18  8:14 ` [Intel-gfx] [PATCH 3/8] drm/i915/gt: Reuse the tasklet priority for virtual as their siblings Chris Wilson
@ 2020-05-18 10:13   ` Tvrtko Ursulin
  0 siblings, 0 replies; 32+ messages in thread
From: Tvrtko Ursulin @ 2020-05-18 10:13 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx



On 18/05/2020 09:14, Chris Wilson wrote:
> In order to keep all the tasklets in the same execution lists and so
> fifo ordered, be consistent and use the same priority for all.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/gt/intel_lrc.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index e597325d04f1..80885ba87db5 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -1403,7 +1403,7 @@ static void kick_siblings(struct i915_request *rq, struct intel_context *ce)
>   	struct i915_request *next = READ_ONCE(ve->request);
>   
>   	if (next && next->execution_mask & ~rq->execution_mask)
> -		tasklet_schedule(&ve->base.execlists.tasklet);
> +		tasklet_hi_schedule(&ve->base.execlists.tasklet);
>   }
>   
>   static inline void
> @@ -5560,7 +5560,7 @@ static void virtual_submit_request(struct i915_request *rq)
>   		GEM_BUG_ON(!list_empty(virtual_queue(ve)));
>   		list_move_tail(&rq->sched.link, virtual_queue(ve));
>   
> -		tasklet_schedule(&ve->base.execlists.tasklet);
> +		tasklet_hi_schedule(&ve->base.execlists.tasklet);
>   	}
>   
>   	spin_unlock_irqrestore(&ve->base.active.lock, flags);
> 
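
For reference (not something the patch itself spells out), the only
difference between the two calls is which softirq vector the tasklet is
queued on:

	/*
	 * tasklet_schedule() raises TASKLET_SOFTIRQ, whereas
	 * tasklet_hi_schedule() raises the higher-priority HI_SOFTIRQ.
	 * Mixing the two would split the execlists tasklets across two
	 * softirq lists, so they would no longer be fifo ordered with
	 * respect to each other; hence tasklet_hi_schedule() everywhere.
	 */
	tasklet_hi_schedule(&ve->base.execlists.tasklet);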

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

* Re: [Intel-gfx] [PATCH 2/8] drm/i915/selftests: Add tests for timeslicing virtual engines
  2020-05-18 10:12   ` Tvrtko Ursulin
@ 2020-05-18 10:21     ` Chris Wilson
  0 siblings, 0 replies; 32+ messages in thread
From: Chris Wilson @ 2020-05-18 10:21 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2020-05-18 11:12:29)
> 
> On 18/05/2020 09:14, Chris Wilson wrote:
> > Make sure that we can execute a virtual request on an already busy
> > engine, and conversely that we can execute a normal request if the
> > engines are already fully occupied by virtual requests.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >   drivers/gpu/drm/i915/gt/selftest_lrc.c | 179 +++++++++++++++++++++++++
> >   1 file changed, 179 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c
> > index 824f99c4cc7c..1fc54359bd53 100644
> > --- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
> > +++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
> > @@ -3766,6 +3766,184 @@ static int live_virtual_mask(void *arg)
> >       return 0;
> >   }
> >   
> > +static int slicein_virtual_engine(struct intel_gt *gt,
> > +                               struct intel_engine_cs **siblings,
> > +                               unsigned int nsibling)
> > +{
> > +     struct intel_context *ce;
> > +     struct i915_request *rq;
> > +     struct igt_spinner spin;
> > +     unsigned int n;
> > +     int err = 0;
> > +
> > +     /*
> > +      * Virtual requests must take part in timeslicing on the target engines.
> > +      */
> > +
> > +     if (igt_spinner_init(&spin, gt))
> > +             return -ENOMEM;
> > +
> > +     for (n = 0; n < nsibling; n++) {
> > +             ce = intel_context_create(siblings[n]);
> > +             if (IS_ERR(ce)) {
> > +                     err = PTR_ERR(ce);
> > +                     goto out;
> > +             }
> > +
> > +             rq = igt_spinner_create_request(&spin, ce, MI_ARB_CHECK);
> > +             intel_context_put(ce);
> > +
> > +             if (IS_ERR(rq)) {
> > +                     err = PTR_ERR(rq);
> > +                     goto out;
> > +             }
> > +
> > +             i915_request_add(rq);
> > +     }
> > +
> > +     ce = intel_execlists_create_virtual(siblings, nsibling);
> > +     if (IS_ERR(ce)) {
> > +             err = PTR_ERR(ce);
> > +             goto out;
> > +     }
> > +
> > +     rq = intel_context_create_request(ce);
> > +     intel_context_put(ce);
> > +     if (IS_ERR(rq)) {
> > +             err = PTR_ERR(rq);
> > +             goto out;
> > +     }
> > +
> > +     i915_request_get(rq);
> > +     i915_request_add(rq);
> > +     if (i915_request_wait(rq, 0, HZ / 10) < 0) {
> > +             GEM_TRACE_ERR("%s(%s) failed to slice in virtual request\n",
> > +                           __func__, rq->engine->name);
> > +             GEM_TRACE_DUMP();
> > +             intel_gt_set_wedged(gt);
> > +             err = -EIO;
> > +     }
> > +     i915_request_put(rq);
> > +
> > +out:
> > +     igt_spinner_end(&spin);
> > +     if (igt_flush_test(gt->i915))
> > +             err = -EIO;
> > +     igt_spinner_fini(&spin);
> > +     return err;
> > +}
> > +
> > +static int sliceout_virtual_engine(struct intel_gt *gt,
> > +                                struct intel_engine_cs **siblings,
> > +                                unsigned int nsibling)
> > +{
> > +     struct intel_context *ce;
> > +     struct i915_request *rq;
> > +     struct igt_spinner spin;
> > +     unsigned int n;
> > +     int err = 0;
> > +
> > +     /*
> > +      * Virtual requests must allow others a fair timeslice.
> > +      */
> > +
> > +     if (igt_spinner_init(&spin, gt))
> > +             return -ENOMEM;
> > +
> > +     for (n = 0; n <= nsibling; n++) { /* oversubscribed */
> > +             ce = intel_execlists_create_virtual(siblings, nsibling);
> > +             if (IS_ERR(ce)) {
> > +                     err = PTR_ERR(ce);
> > +                     goto out;
> > +             }
> > +
> > +             rq = igt_spinner_create_request(&spin, ce, MI_ARB_CHECK);
> > +             intel_context_put(ce);
> > +
> > +             if (IS_ERR(rq)) {
> > +                     err = PTR_ERR(rq);
> > +                     goto out;
> > +             }
> > +
> > +             i915_request_add(rq);
> > +     }
> > +
> > +     for (n = 0; !err && n < nsibling; n++) {
> > +             ce = intel_context_create(siblings[n]);
> > +             if (IS_ERR(ce)) {
> > +                     err = PTR_ERR(ce);
> > +                     goto out;
> > +             }
> > +
> > +             rq = intel_context_create_request(ce);
> > +             intel_context_put(ce);
> > +
> > +             if (IS_ERR(rq)) {
> > +                     err = PTR_ERR(rq);
> > +                     goto out;
> > +             }
> > +
> > +             i915_request_get(rq);
> > +             i915_request_add(rq);
> > +             if (i915_request_wait(rq, 0, HZ / 10) < 0) {
> > +                     GEM_TRACE_ERR("%s(%s) failed to slice out virtual request\n",
> > +                                   __func__, siblings[n]->name);
> > +                     GEM_TRACE_DUMP();
> > +                     intel_gt_set_wedged(gt);
> > +                     err = -EIO;
> > +             }
> > +             i915_request_put(rq);
> > +     }
> > +
> > +out:
> > +     igt_spinner_end(&spin);
> > +     if (igt_flush_test(gt->i915))
> > +             err = -EIO;
> > +     igt_spinner_fini(&spin);
> > +     return err;
> > +}
> > +
> > +static int live_virtual_slice(void *arg)
> > +{
> > +     struct intel_gt *gt = arg;
> > +     struct intel_engine_cs *siblings[MAX_ENGINE_INSTANCE + 1];
> > +     unsigned int class, inst;
> > +     int err;
> > +
> > +     if (intel_uc_uses_guc_submission(&gt->uc))
> > +             return 0;
> 
> Shouldn't the intel_engine_has_timeslices check below be enough? I am
> worried about silently skipping this seemingly pretty generic test too much.

I haven't looked too hard at which of these are execlists specific. I have
a plan somewhere to recreate a bunch of these as functional tests for
the i915_request layer.

That would give us a rough progression like

low level live_execlists_selftests
-> mid level live_request_selftests
 -> uapi gem_exec_scheduler et al

> > +
> > +     for (class = 0; class <= MAX_ENGINE_CLASS; class++) {
> > +             unsigned int nsibling;
> > +
> > +             nsibling = 0;
> > +             for (inst = 0; inst <= MAX_ENGINE_INSTANCE; inst++) {
> > +                     struct intel_engine_cs *engine;
> > +
> > +                     engine = gt->engine_class[class][inst];
> > +                     if (!engine)
> > +                             break;
> 
> This should be continue I think, to account for vcs0 + vcs2 on Icelake.

Oh, they all break atm iirc. Ah, no, just the one I copied. Do I hear
the plea for refactoring the duplicated code? :)
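
Something along these lines, perhaps (the helper name and the filter
argument are purely illustrative, not from any posted patch):

	static unsigned int
	select_siblings(struct intel_gt *gt,
			unsigned int class,
			struct intel_engine_cs **siblings,
			bool (*filter)(const struct intel_engine_cs *))
	{
		unsigned int n = 0, inst;

		for (inst = 0; inst <= MAX_ENGINE_INSTANCE; inst++) {
			struct intel_engine_cs *engine;

			engine = gt->engine_class[class][inst];
			if (!engine)
				continue; /* allow holes in the instance array */

			if (filter && !filter(engine))
				continue;

			siblings[n++] = engine;
		}

		return n;
	}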
-Chris

* Re: [Intel-gfx] [PATCH 4/8] drm/i915/gt: Kick virtual siblings on timeslice out
  2020-05-18  8:14 ` [Intel-gfx] [PATCH 4/8] drm/i915/gt: Kick virtual siblings on timeslice out Chris Wilson
@ 2020-05-18 10:29   ` Tvrtko Ursulin
  0 siblings, 0 replies; 32+ messages in thread
From: Tvrtko Ursulin @ 2020-05-18 10:29 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 18/05/2020 09:14, Chris Wilson wrote:
> If we decide to timeslice out the current virtual request, we will
> unsubmit it while it is still busy (ve->context.inflight == sibling[0]).
> If the virtual tasklet and then the other sibling tasklets run before we
> completely schedule out the active virtual request for the preemption,
> those other tasklets will see that the virtual request is still inflight
> on sibling[0] and leave it be. Therefore when we finally schedule-out
> the virtual request and if we see that we have passed it back to the
> virtual engine, reschedule the virtual tasklet so that it may be
> resubmitted on any of the siblings.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/gt/intel_lrc.c | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index 80885ba87db5..05486e801a63 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -1402,7 +1402,7 @@ static void kick_siblings(struct i915_request *rq, struct intel_context *ce)
>   	struct virtual_engine *ve = container_of(ce, typeof(*ve), context);
>   	struct i915_request *next = READ_ONCE(ve->request);
>   
> -	if (next && next->execution_mask & ~rq->execution_mask)
> +	if (next == rq || (next && next->execution_mask & ~rq->execution_mask))
>   		tasklet_hi_schedule(&ve->base.execlists.tasklet);
>   }
>   
> 
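
Schematically, the race the widened check closes, per the commit message
above (timeline illustrative):

	/*
	 * 1. Timeslice expires: the virtual request is unsubmitted,
	 *    but ve->context.inflight still points at sibling[0].
	 * 2. The virtual and sibling tasklets run before the
	 *    schedule-out completes, see the request apparently still
	 *    inflight on sibling[0], and leave it be.
	 * 3. Schedule-out finally completes: without the next == rq
	 *    check, nothing kicks the virtual tasklet again and the
	 *    request is stranded on the virtual engine.
	 */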

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

* Re: [Intel-gfx] [PATCH 5/8] drm/i915/gt: Incorporate the virtual engine into timeslicing
  2020-05-18  8:14 ` [Intel-gfx] [PATCH 5/8] drm/i915/gt: Incorporate the virtual engine into timeslicing Chris Wilson
@ 2020-05-18 10:36   ` Tvrtko Ursulin
  2020-05-18 10:38     ` Chris Wilson
  0 siblings, 1 reply; 32+ messages in thread
From: Tvrtko Ursulin @ 2020-05-18 10:36 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 18/05/2020 09:14, Chris Wilson wrote:
> It was quite the oversight to only factor in the normal queue to decide
> the timeslicing switch priority. By leaving out the next virtual request
> from the priority decision, we would not timeslice the current engine if
> there was an available virtual request.
> 
> Testcase: igt/gem_exec_balancer/sliced
> Fixes: 3df2deed411e ("drm/i915/execlists: Enable timeslice on partial virtual engine dequeue")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_lrc.c | 32 ++++++++++++++++++++++-------
>   1 file changed, 25 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index 05486e801a63..120efb3eaf96 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -1895,7 +1895,8 @@ static void defer_active(struct intel_engine_cs *engine)
>   
>   static bool
>   need_timeslice(const struct intel_engine_cs *engine,
> -	       const struct i915_request *rq)
> +	       const struct i915_request *rq,
> +	       const struct rb_node *rb)
>   {
>   	int hint;
>   
> @@ -1903,6 +1904,24 @@ need_timeslice(const struct intel_engine_cs *engine,
>   		return false;
>   
>   	hint = engine->execlists.queue_priority_hint;
> +
> +	if (rb) {
> +		const struct virtual_engine *ve =
> +			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
> +		const struct intel_engine_cs *inflight =
> +			intel_context_inflight(&ve->context);
> +
> +		if (!inflight || inflight == engine) {
> +			struct i915_request *next;
> +
> +			rcu_read_lock();
> +			next = READ_ONCE(ve->request);
> +			if (next)
> +				hint = max(hint, rq_prio(next));
> +			rcu_read_unlock();
> +		}
> +	}
> +
>   	if (!list_is_last(&rq->sched.link, &engine->active.requests))
>   		hint = max(hint, rq_prio(list_next_entry(rq, sched.link)));
>   
> @@ -1977,10 +1996,9 @@ static void set_timeslice(struct intel_engine_cs *engine)
>   	set_timer_ms(&engine->execlists.timer, duration);
>   }
>   
> -static void start_timeslice(struct intel_engine_cs *engine)
> +static void start_timeslice(struct intel_engine_cs *engine, int prio)
>   {
>   	struct intel_engine_execlists *execlists = &engine->execlists;
> -	const int prio = queue_prio(execlists);
>   	unsigned long duration;
>   
>   	if (!intel_engine_has_timeslices(engine))
> @@ -2140,7 +2158,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   			__unwind_incomplete_requests(engine);
>   
>   			last = NULL;
> -		} else if (need_timeslice(engine, last) &&
> +		} else if (need_timeslice(engine, last, rb) &&
>   			   timeslice_expired(execlists, last)) {
>   			if (i915_request_completed(last)) {
>   				tasklet_hi_schedule(&execlists->tasklet);
> @@ -2188,7 +2206,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   				 * Even if ELSP[1] is occupied and not worthy
>   				 * of timeslices, our queue might be.
>   				 */
> -				start_timeslice(engine);
> +				start_timeslice(engine, queue_prio(execlists));
>   				return;
>   			}
>   		}
> @@ -2223,7 +2241,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   
>   			if (last && !can_merge_rq(last, rq)) {
>   				spin_unlock(&ve->base.active.lock);
> -				start_timeslice(engine);
> +				start_timeslice(engine, rq_prio(rq));
>   				return; /* leave this for another sibling */
>   			}
>   
> @@ -5519,7 +5537,7 @@ static void virtual_submission_tasklet(unsigned long data)
>   submit_engine:
>   		GEM_BUG_ON(RB_EMPTY_NODE(&node->rb));
>   		node->prio = prio;
> -		if (first && prio > sibling->execlists.queue_priority_hint)
> +		if (first && prio >= sibling->execlists.queue_priority_hint)

I got the rest but not why this is needed?

Regards,

Tvrtko

>   			tasklet_hi_schedule(&sibling->execlists.tasklet);
>   
>   		spin_unlock(&sibling->active.lock);
> 

* Re: [Intel-gfx] [PATCH 5/8] drm/i915/gt: Incorporate the virtual engine into timeslicing
  2020-05-18 10:36   ` Tvrtko Ursulin
@ 2020-05-18 10:38     ` Chris Wilson
  0 siblings, 0 replies; 32+ messages in thread
From: Chris Wilson @ 2020-05-18 10:38 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2020-05-18 11:36:15)
> 
> On 18/05/2020 09:14, Chris Wilson wrote:
> > @@ -5519,7 +5537,7 @@ static void virtual_submission_tasklet(unsigned long data)
> >   submit_engine:
> >               GEM_BUG_ON(RB_EMPTY_NODE(&node->rb));
> >               node->prio = prio;
> > -             if (first && prio > sibling->execlists.queue_priority_hint)
> > +             if (first && prio >= sibling->execlists.queue_priority_hint)
> 
> I got the rest but not why this is needed?

It's not. I failed in my mission to undo this chunk correctly.
-Chris

* Re: [Intel-gfx] [PATCH 6/8] drm/i915/gt: Use virtual_engine during execlists_dequeue
  2020-05-18  8:14 ` [Intel-gfx] [PATCH 6/8] drm/i915/gt: Use virtual_engine during execlists_dequeue Chris Wilson
@ 2020-05-18 10:51   ` Tvrtko Ursulin
  2020-05-18 10:57     ` Chris Wilson
  2020-05-18 12:33   ` [Intel-gfx] [PATCH v2] " Chris Wilson
  1 sibling, 1 reply; 32+ messages in thread
From: Tvrtko Ursulin @ 2020-05-18 10:51 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 18/05/2020 09:14, Chris Wilson wrote:
> Rather than going back and forth between the rb_node entry and the
> virtual_engine type, store the ve in a local and reuse it. As the
> container_of conversion from rb_node to virtual_engine requires a
> variable offset, performing that conversion just once shaves off a bit
> of code.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/gt/intel_lrc.c | 93 +++++++++++++++--------------
>   1 file changed, 47 insertions(+), 46 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index 120efb3eaf96..7a5ac3375225 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -451,7 +451,7 @@ static int queue_prio(const struct intel_engine_execlists *execlists)
>   
>   static inline bool need_preempt(const struct intel_engine_cs *engine,
>   				const struct i915_request *rq,
> -				struct rb_node *rb)
> +				struct virtual_engine *ve)
>   {
>   	int last_prio;
>   
> @@ -488,9 +488,7 @@ static inline bool need_preempt(const struct intel_engine_cs *engine,
>   	    rq_prio(list_next_entry(rq, sched.link)) > last_prio)
>   		return true;
>   
> -	if (rb) {
> -		struct virtual_engine *ve =
> -			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
> +	if (ve) {
>   		bool preempt = false;
>   
>   		if (engine == ve->siblings[0]) { /* only preempt one sibling */
> @@ -1812,6 +1810,35 @@ static bool virtual_matches(const struct virtual_engine *ve,
>   	return true;
>   }
>   
> +static struct virtual_engine *
> +first_virtual_engine(struct intel_engine_cs *engine)
> +{
> +	struct intel_engine_execlists *el = &engine->execlists;
> +	struct rb_node *rb = rb_first_cached(&el->virtual);
> +
> +	while (rb) {
> +		struct virtual_engine *ve =
> +			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
> +		struct i915_request *rq = READ_ONCE(ve->request);
> +
> +		if (!rq) { /* lazily cleanup after another engine handled rq */
> +			rb_erase_cached(rb, &el->virtual);
> +			RB_CLEAR_NODE(rb);
> +			rb = rb_first_cached(&el->virtual);
> +			continue;
> +		}
> +
> +		if (!virtual_matches(ve, rq, engine)) {
> +			rb = rb_next(rb);
> +			continue;
> +		}
> +
> +		return ve;
> +	}
> +
> +	return NULL;
> +}
> +
>   static void virtual_xfer_breadcrumbs(struct virtual_engine *ve)
>   {
>   	/*
> @@ -1896,7 +1923,7 @@ static void defer_active(struct intel_engine_cs *engine)
>   static bool
>   need_timeslice(const struct intel_engine_cs *engine,
>   	       const struct i915_request *rq,
> -	       const struct rb_node *rb)
> +	       struct virtual_engine *ve)
>   {
>   	int hint;
>   
> @@ -1905,9 +1932,7 @@ need_timeslice(const struct intel_engine_cs *engine,
>   
>   	hint = engine->execlists.queue_priority_hint;
>   
> -	if (rb) {
> -		const struct virtual_engine *ve =
> -			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
> +	if (ve) {
>   		const struct intel_engine_cs *inflight =
>   			intel_context_inflight(&ve->context);
>   
> @@ -2057,8 +2082,9 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   	struct intel_engine_execlists * const execlists = &engine->execlists;
>   	struct i915_request **port = execlists->pending;
>   	struct i915_request ** const last_port = port + execlists->port_mask;
> -	struct i915_request * const *active;
> +	struct i915_request * const *active = READ_ONCE(execlists->active);
>   	struct i915_request *last;
> +	struct virtual_engine *ve;
>   	struct rb_node *rb;
>   	bool submit = false;
>   
> @@ -2084,26 +2110,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   	 * and context switches) submission.
>   	 */
>   
> -	for (rb = rb_first_cached(&execlists->virtual); rb; ) {
> -		struct virtual_engine *ve =
> -			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
> -		struct i915_request *rq = READ_ONCE(ve->request);
> -
> -		if (!rq) { /* lazily cleanup after another engine handled rq */
> -			rb_erase_cached(rb, &execlists->virtual);
> -			RB_CLEAR_NODE(rb);
> -			rb = rb_first_cached(&execlists->virtual);
> -			continue;
> -		}
> -
> -		if (!virtual_matches(ve, rq, engine)) {
> -			rb = rb_next(rb);
> -			continue;
> -		}
> -
> -		break;
> -	}
> -
>   	/*
>   	 * If the queue is higher priority than the last
>   	 * request in the currently active context, submit afresh.
> @@ -2111,10 +2117,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   	 * the active context to interject the preemption request,
>   	 * i.e. we will retrigger preemption following the ack in case
>   	 * of trouble.
> -	 */
> -	active = READ_ONCE(execlists->active);
> -
> -	/*
> +	 *
>   	 * In theory we can skip over completed contexts that have not
>   	 * yet been processed by events (as those events are in flight):
>   	 *
> @@ -2125,9 +2128,10 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   	 * find itself trying to jump back into a context it has just
>   	 * completed and barf.
>   	 */
> -
>   	if ((last = *active)) {
> -		if (need_preempt(engine, last, rb)) {
> +		ve = first_virtual_engine(engine);

If you left this outside the if..

> +
> +		if (need_preempt(engine, last, ve)) {
>   			if (i915_request_completed(last)) {
>   				tasklet_hi_schedule(&execlists->tasklet);
>   				return;
> @@ -2158,7 +2162,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   			__unwind_incomplete_requests(engine);
>   
>   			last = NULL;
> -		} else if (need_timeslice(engine, last, rb) &&
> +		} else if (need_timeslice(engine, last, ve) &&
>   			   timeslice_expired(execlists, last)) {
>   			if (i915_request_completed(last)) {
>   				tasklet_hi_schedule(&execlists->tasklet);
> @@ -2212,9 +2216,8 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   		}
>   	}
>   
> -	while (rb) { /* XXX virtual is always taking precedence */
> -		struct virtual_engine *ve =
> -			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
> +	/* XXX virtual is always taking precedence */
> +	while ((ve = first_virtual_engine(engine))) {

... then here you wouldn't have to do a re-lookup of the same node, right?

>   		struct i915_request *rq;
>   
>   		spin_lock(&ve->base.active.lock);
> @@ -2222,9 +2225,10 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   		rq = ve->request;
>   		if (unlikely(!rq)) { /* lost the race to a sibling */
>   			spin_unlock(&ve->base.active.lock);
> +
> +			rb = &ve->nodes[engine->id].rb;
>   			rb_erase_cached(rb, &execlists->virtual);
>   			RB_CLEAR_NODE(rb);
> -			rb = rb_first_cached(&execlists->virtual);
>   			continue;
>   		}
>   
> @@ -2233,11 +2237,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   		GEM_BUG_ON(rq->context != &ve->context);
>   
>   		if (rq_prio(rq) >= queue_prio(execlists)) {
> -			if (!virtual_matches(ve, rq, engine)) {
> -				spin_unlock(&ve->base.active.lock);
> -				rb = rb_next(rb);
> -				continue;
> -			}
> +			GEM_BUG_ON(!virtual_matches(ve, rq, engine));

But you'd have to keep the virtual_matches logic. Don't know.. no strong 
feelings either way.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

>   
>   			if (last && !can_merge_rq(last, rq)) {
>   				spin_unlock(&ve->base.active.lock);
> @@ -2257,6 +2257,8 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   			WRITE_ONCE(ve->request, NULL);
>   			WRITE_ONCE(ve->base.execlists.queue_priority_hint,
>   				   INT_MIN);
> +
> +			rb = &ve->nodes[engine->id].rb;
>   			rb_erase_cached(rb, &execlists->virtual);
>   			RB_CLEAR_NODE(rb);
>   
> @@ -2309,7 +2311,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   			 */
>   			if (!submit) {
>   				spin_unlock(&ve->base.active.lock);
> -				rb = rb_first_cached(&execlists->virtual);
>   				continue;
>   			}
>   		}
> 

* Re: [Intel-gfx] [PATCH 6/8] drm/i915/gt: Use virtual_engine during execlists_dequeue
  2020-05-18 10:51   ` Tvrtko Ursulin
@ 2020-05-18 10:57     ` Chris Wilson
  0 siblings, 0 replies; 32+ messages in thread
From: Chris Wilson @ 2020-05-18 10:57 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2020-05-18 11:51:40)
> 
> On 18/05/2020 09:14, Chris Wilson wrote:
> > @@ -2125,9 +2128,10 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >        * find itself trying to jump back into a context it has just
> >        * completed and barf.
> >        */
> > -
> >       if ((last = *active)) {
> > -             if (need_preempt(engine, last, rb)) {
> > +             ve = first_virtual_engine(engine);
> 
> If you left this outside the if..
> 
> > @@ -2212,9 +2216,8 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >               }
> >       }
> >   
> > -     while (rb) { /* XXX virtual is always taking precedence */
> > -             struct virtual_engine *ve =
> > -                     rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
> > +     /* XXX virtual is always taking precedence */
> > +     while ((ve = first_virtual_engine(engine))) {
> 
> ... then here you wouldn't have to do a re-lookup of the same node, right?

But since we loop inside here, we need to recheck either at the continue
or here.

So:
	ve = first_virtual_engine();
	...
	if (ve) do {
	} while (ve = first_virtual_engine());

is kind of what we want, so a

	while (ve) {
	next_virtual_engine:
		ve = first_virtual_engine()
	}

instead of continues?
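
I.e. roughly this shape (sketch; it is the shape v2 further down
settles on):

	ve = first_virtual_engine(engine);
	...
	while (ve) { /* XXX virtual is always taking precedence */
		...
		/* refresh at the bottom rather than at each continue */
		ve = submit ? NULL : first_virtual_engine(engine);
	}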
-Chris

* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/8] drm/i915: Move saturated workload detection back to the context
  2020-05-18  8:14 [Intel-gfx] [PATCH 1/8] drm/i915: Move saturated workload detection back to the context Chris Wilson
                   ` (7 preceding siblings ...)
  2020-05-18  9:53 ` [Intel-gfx] [PATCH 1/8] drm/i915: Move saturated workload detection back to the context Tvrtko Ursulin
@ 2020-05-18 11:55 ` Patchwork
  2020-05-18 11:56 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
                   ` (4 subsequent siblings)
  13 siblings, 0 replies; 32+ messages in thread
From: Patchwork @ 2020-05-18 11:55 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/8] drm/i915: Move saturated workload detection back to the context
URL   : https://patchwork.freedesktop.org/series/77344/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
a07fc7f12a37 drm/i915: Move saturated workload detection back to the context
-:22: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#22: 
References: 44d89409a12e ("drm/i915: Make the semaphore saturation mask global")

-:22: ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 44d89409a12e ("drm/i915: Make the semaphore saturation mask global")'
#22: 
References: 44d89409a12e ("drm/i915: Make the semaphore saturation mask global")

total: 1 errors, 1 warnings, 0 checks, 68 lines checked
d8b03783bec1 drm/i915/selftests: Add tests for timeslicing virtual engines
d4ceed5eec15 drm/i915/gt: Reuse the tasklet priority for virtual as their siblings
03c163916111 drm/i915/gt: Kick virtual siblings on timeslice out
eaf97f160cbc drm/i915/gt: Incorporate the virtual engine into timeslicing
494e3a2b5679 drm/i915/gt: Use virtual_engine during execlists_dequeue
4aadd62c73e8 drm/i915/gt: Decouple inflight virtual engines
ec4244a39688 drm/i915/gt: Resubmit the virtual engine on schedule-out


* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for series starting with [1/8] drm/i915: Move saturated workload detection back to the context
  2020-05-18  8:14 [Intel-gfx] [PATCH 1/8] drm/i915: Move saturated workload detection back to the context Chris Wilson
                   ` (8 preceding siblings ...)
  2020-05-18 11:55 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/8] " Patchwork
@ 2020-05-18 11:56 ` Patchwork
  2020-05-18 12:17 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
                   ` (3 subsequent siblings)
  13 siblings, 0 replies; 32+ messages in thread
From: Patchwork @ 2020-05-18 11:56 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/8] drm/i915: Move saturated workload detection back to the context
URL   : https://patchwork.freedesktop.org/series/77344/
State : warning

== Summary ==

$ dim sparse --fast origin/drm-tip
Sparse version: v0.6.0
Fast mode used, each commit won't be checked separately.
-
+drivers/gpu/drm/i915/display/intel_display.c:1222:22: error: Expected constant expression in case statement
+drivers/gpu/drm/i915/display/intel_display.c:1225:22: error: Expected constant expression in case statement
+drivers/gpu/drm/i915/display/intel_display.c:1228:22: error: Expected constant expression in case statement
+drivers/gpu/drm/i915/display/intel_display.c:1231:22: error: Expected constant expression in case statement
+drivers/gpu/drm/i915/gem/i915_gem_context.c:2274:17: error: bad integer constant expression
+drivers/gpu/drm/i915/gem/i915_gem_context.c:2275:17: error: bad integer constant expression
+drivers/gpu/drm/i915/gem/i915_gem_context.c:2276:17: error: bad integer constant expression
+drivers/gpu/drm/i915/gem/i915_gem_context.c:2277:17: error: bad integer constant expression
+drivers/gpu/drm/i915/gem/i915_gem_context.c:2278:17: error: bad integer constant expression
+drivers/gpu/drm/i915/gem/i915_gem_context.c:2279:17: error: bad integer constant expression
+drivers/gpu/drm/i915/gt/intel_reset.c:1310:5: warning: context imbalance in 'intel_gt_reset_trylock' - different lock contexts for basic block
+drivers/gpu/drm/i915/gt/sysfs_engines.c:61:10: error: bad integer constant expression
+drivers/gpu/drm/i915/gt/sysfs_engines.c:62:10: error: bad integer constant expression
+drivers/gpu/drm/i915/gt/sysfs_engines.c:66:10: error: bad integer constant expression
+drivers/gpu/drm/i915/gvt/mmio.c:287:23: warning: memcpy with byte count of 279040
+drivers/gpu/drm/i915/i915_perf.c:1425:15: warning: memset with byte count of 16777216
+drivers/gpu/drm/i915/i915_perf.c:1479:15: warning: memset with byte count of 16777216
+./include/linux/compiler.h:199:9: warning: context imbalance in 'engines_sample' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'fwtable_read16' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'fwtable_read32' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'fwtable_read64' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'fwtable_read8' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'fwtable_write16' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'fwtable_write32' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'fwtable_write8' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen11_fwtable_read16' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen11_fwtable_read32' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen11_fwtable_read64' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen11_fwtable_read8' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen11_fwtable_write16' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen11_fwtable_write32' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen11_fwtable_write8' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen12_fwtable_read16' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen12_fwtable_read32' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen12_fwtable_read64' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen12_fwtable_read8' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen12_fwtable_write16' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen12_fwtable_write32' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen12_fwtable_write8' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen6_read16' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen6_read32' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen6_read64' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen6_read8' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen6_write16' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen6_write32' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen6_write8' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen8_write16' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen8_write32' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen8_write8' - different lock contexts for basic block


* [Intel-gfx] ✓ Fi.CI.BAT: success for series starting with [1/8] drm/i915: Move saturated workload detection back to the context
  2020-05-18  8:14 [Intel-gfx] [PATCH 1/8] drm/i915: Move saturated workload detection back to the context Chris Wilson
                   ` (9 preceding siblings ...)
  2020-05-18 11:56 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
@ 2020-05-18 12:17 ` Patchwork
  2020-05-18 15:55 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/8] drm/i915: Move saturated workload detection back to the context (rev2) Patchwork
                   ` (2 subsequent siblings)
  13 siblings, 0 replies; 32+ messages in thread
From: Patchwork @ 2020-05-18 12:17 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/8] drm/i915: Move saturated workload detection back to the context
URL   : https://patchwork.freedesktop.org/series/77344/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_8494 -> Patchwork_17685
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17685/index.html

Known issues
------------

  Here are the changes found in Patchwork_17685 that come from known issues:

### IGT changes ###

#### Possible fixes ####

  * igt@i915_selftest@live@execlists:
    - fi-whl-u:           [INCOMPLETE][1] ([i915#656]) -> [PASS][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8494/fi-whl-u/igt@i915_selftest@live@execlists.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17685/fi-whl-u/igt@i915_selftest@live@execlists.html

  
  [i915#656]: https://gitlab.freedesktop.org/drm/intel/issues/656


Participating hosts (51 -> 45)
------------------------------

  Additional (1): fi-kbl-7560u 
  Missing    (7): fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-bsw-cyan fi-ctg-p8600 fi-byt-clapper fi-bdw-samus 


Build changes
-------------

  * Linux: CI_DRM_8494 -> Patchwork_17685

  CI-20190529: 20190529
  CI_DRM_8494: 3d15348fde9b998e754da0b0655baf02b98e7f17 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5657: 649eae5c905a7460b44305800f95db83a6dd47cb @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_17685: ec4244a39688506247a4c19ad737006cc18a4b43 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

ec4244a39688 drm/i915/gt: Resubmit the virtual engine on schedule-out
4aadd62c73e8 drm/i915/gt: Decouple inflight virtual engines
494e3a2b5679 drm/i915/gt: Use virtual_engine during execlists_dequeue
eaf97f160cbc drm/i915/gt: Incorporate the virtual engine into timeslicing
03c163916111 drm/i915/gt: Kick virtual siblings on timeslice out
d4ceed5eec15 drm/i915/gt: Reuse the tasklet priority for virtual as their siblings
d8b03783bec1 drm/i915/selftests: Add tests for timeslicing virtual engines
a07fc7f12a37 drm/i915: Move saturated workload detection back to the context

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17685/index.html

* [Intel-gfx] [PATCH v2] drm/i915/gt: Use virtual_engine during execlists_dequeue
  2020-05-18  8:14 ` [Intel-gfx] [PATCH 6/8] drm/i915/gt: Use virtual_engine during execlists_dequeue Chris Wilson
  2020-05-18 10:51   ` Tvrtko Ursulin
@ 2020-05-18 12:33   ` Chris Wilson
  2020-05-18 13:01     ` Tvrtko Ursulin
  1 sibling, 1 reply; 32+ messages in thread
From: Chris Wilson @ 2020-05-18 12:33 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Rather than going back and forth between the rb_node entry and the
virtual_engine type, store the ve in a local and reuse it. As the
container_of conversion from rb_node to virtual_engine requires a
variable offset, performing that conversion just once shaves off a bit
of code.

v2: Keep a single virtual engine lookup, for typical use.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 176 +++++++++++++---------------
 1 file changed, 84 insertions(+), 92 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 8524c5f3a329..7843bf3f3f1f 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -451,7 +451,7 @@ static int queue_prio(const struct intel_engine_execlists *execlists)
 
 static inline bool need_preempt(const struct intel_engine_cs *engine,
 				const struct i915_request *rq,
-				struct rb_node *rb)
+				struct virtual_engine *ve)
 {
 	int last_prio;
 
@@ -488,9 +488,7 @@ static inline bool need_preempt(const struct intel_engine_cs *engine,
 	    rq_prio(list_next_entry(rq, sched.link)) > last_prio)
 		return true;
 
-	if (rb) {
-		struct virtual_engine *ve =
-			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
+	if (ve) {
 		bool preempt = false;
 
 		if (engine == ve->siblings[0]) { /* only preempt one sibling */
@@ -1812,6 +1810,35 @@ static bool virtual_matches(const struct virtual_engine *ve,
 	return true;
 }
 
+static struct virtual_engine *
+first_virtual_engine(struct intel_engine_cs *engine)
+{
+	struct intel_engine_execlists *el = &engine->execlists;
+	struct rb_node *rb = rb_first_cached(&el->virtual);
+
+	while (rb) {
+		struct virtual_engine *ve =
+			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
+		struct i915_request *rq = READ_ONCE(ve->request);
+
+		if (!rq) { /* lazily cleanup after another engine handled rq */
+			rb_erase_cached(rb, &el->virtual);
+			RB_CLEAR_NODE(rb);
+			rb = rb_first_cached(&el->virtual);
+			continue;
+		}
+
+		if (!virtual_matches(ve, rq, engine)) {
+			rb = rb_next(rb);
+			continue;
+		}
+
+		return ve;
+	}
+
+	return NULL;
+}
+
 static void virtual_xfer_breadcrumbs(struct virtual_engine *ve)
 {
 	/*
@@ -1896,7 +1923,7 @@ static void defer_active(struct intel_engine_cs *engine)
 static bool
 need_timeslice(const struct intel_engine_cs *engine,
 	       const struct i915_request *rq,
-	       const struct rb_node *rb)
+	       struct virtual_engine *ve)
 {
 	int hint;
 
@@ -1905,9 +1932,7 @@ need_timeslice(const struct intel_engine_cs *engine,
 
 	hint = engine->execlists.queue_priority_hint;
 
-	if (rb) {
-		const struct virtual_engine *ve =
-			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
+	if (ve) {
 		const struct intel_engine_cs *inflight =
 			intel_context_inflight(&ve->context);
 
@@ -2057,7 +2082,8 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	struct intel_engine_execlists * const execlists = &engine->execlists;
 	struct i915_request **port = execlists->pending;
 	struct i915_request ** const last_port = port + execlists->port_mask;
-	struct i915_request * const *active;
+	struct i915_request * const *active = READ_ONCE(execlists->active);
+	struct virtual_engine *ve = first_virtual_engine(engine);
 	struct i915_request *last;
 	struct rb_node *rb;
 	bool submit = false;
@@ -2084,26 +2110,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	 * and context switches) submission.
 	 */
 
-	for (rb = rb_first_cached(&execlists->virtual); rb; ) {
-		struct virtual_engine *ve =
-			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
-		struct i915_request *rq = READ_ONCE(ve->request);
-
-		if (!rq) { /* lazily cleanup after another engine handled rq */
-			rb_erase_cached(rb, &execlists->virtual);
-			RB_CLEAR_NODE(rb);
-			rb = rb_first_cached(&execlists->virtual);
-			continue;
-		}
-
-		if (!virtual_matches(ve, rq, engine)) {
-			rb = rb_next(rb);
-			continue;
-		}
-
-		break;
-	}
-
 	/*
 	 * If the queue is higher priority than the last
 	 * request in the currently active context, submit afresh.
@@ -2111,10 +2117,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	 * the active context to interject the preemption request,
 	 * i.e. we will retrigger preemption following the ack in case
 	 * of trouble.
-	 */
-	active = READ_ONCE(execlists->active);
-
-	/*
+	 *
 	 * In theory we can skip over completed contexts that have not
 	 * yet been processed by events (as those events are in flight):
 	 *
@@ -2125,9 +2128,8 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	 * find itself trying to jump back into a context it has just
 	 * completed and barf.
 	 */
-
 	if ((last = *active)) {
-		if (need_preempt(engine, last, rb)) {
+		if (need_preempt(engine, last, ve)) {
 			if (i915_request_completed(last)) {
 				tasklet_hi_schedule(&execlists->tasklet);
 				return;
@@ -2158,7 +2160,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 			__unwind_incomplete_requests(engine);
 
 			last = NULL;
-		} else if (need_timeslice(engine, last, rb) &&
+		} else if (need_timeslice(engine, last, ve) &&
 			   timeslice_expired(execlists, last)) {
 			if (i915_request_completed(last)) {
 				tasklet_hi_schedule(&execlists->tasklet);
@@ -2212,57 +2214,53 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		}
 	}
 
-	while (rb) { /* XXX virtual is always taking precedence */
-		struct virtual_engine *ve =
-			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
+	while (ve) { /* XXX virtual is always taking precedence */
 		struct i915_request *rq;
 
 		spin_lock(&ve->base.active.lock);
 
 		rq = ve->request;
-		if (unlikely(!rq)) { /* lost the race to a sibling */
-			spin_unlock(&ve->base.active.lock);
-			rb_erase_cached(rb, &execlists->virtual);
-			RB_CLEAR_NODE(rb);
-			rb = rb_first_cached(&execlists->virtual);
-			continue;
-		}
+		if (unlikely(!rq)) /* lost the race to a sibling */
+			goto unlock;
 
 		GEM_BUG_ON(rq != ve->request);
 		GEM_BUG_ON(rq->engine != &ve->base);
 		GEM_BUG_ON(rq->context != &ve->context);
 
-		if (rq_prio(rq) >= queue_prio(execlists)) {
-			if (!virtual_matches(ve, rq, engine)) {
-				spin_unlock(&ve->base.active.lock);
-				rb = rb_next(rb);
-				continue;
-			}
+		if (rq_prio(rq) < queue_prio(execlists)) {
+			spin_unlock(&ve->base.active.lock);
+			break;
+		}
 
-			if (last && !can_merge_rq(last, rq)) {
-				spin_unlock(&ve->base.active.lock);
-				start_timeslice(engine, rq_prio(rq));
-				return; /* leave this for another sibling */
-			}
+		GEM_BUG_ON(!virtual_matches(ve, rq, engine));
 
-			ENGINE_TRACE(engine,
-				     "virtual rq=%llx:%lld%s, new engine? %s\n",
-				     rq->fence.context,
-				     rq->fence.seqno,
-				     i915_request_completed(rq) ? "!" :
-				     i915_request_started(rq) ? "*" :
-				     "",
-				     yesno(engine != ve->siblings[0]));
-
-			WRITE_ONCE(ve->request, NULL);
-			WRITE_ONCE(ve->base.execlists.queue_priority_hint,
-				   INT_MIN);
-			rb_erase_cached(rb, &execlists->virtual);
-			RB_CLEAR_NODE(rb);
+		if (last && !can_merge_rq(last, rq)) {
+			spin_unlock(&ve->base.active.lock);
+			start_timeslice(engine, rq_prio(rq));
+			return; /* leave this for another sibling */
+		}
+
+		ENGINE_TRACE(engine,
+			     "virtual rq=%llx:%lld%s, new engine? %s\n",
+			     rq->fence.context,
+			     rq->fence.seqno,
+			     i915_request_completed(rq) ? "!" :
+			     i915_request_started(rq) ? "*" :
+			     "",
+			     yesno(engine != ve->siblings[0]));
 
-			GEM_BUG_ON(!(rq->execution_mask & engine->mask));
-			WRITE_ONCE(rq->engine, engine);
+		WRITE_ONCE(ve->request, NULL);
+		WRITE_ONCE(ve->base.execlists.queue_priority_hint,
+			   INT_MIN);
 
+		rb = &ve->nodes[engine->id].rb;
+		rb_erase_cached(rb, &execlists->virtual);
+		RB_CLEAR_NODE(rb);
+
+		GEM_BUG_ON(!(rq->execution_mask & engine->mask));
+		WRITE_ONCE(rq->engine, engine);
+
+		if (__i915_request_submit(rq)) {
 			if (engine != ve->siblings[0]) {
 				u32 *regs = ve->context.lrc_reg_state;
 				unsigned int n;
@@ -2294,28 +2292,22 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 				GEM_BUG_ON(ve->siblings[0] != engine);
 			}
 
-			if (__i915_request_submit(rq)) {
-				submit = true;
-				last = rq;
-			}
-			i915_request_put(rq);
-
-			/*
-			 * Hmm, we have a bunch of virtual engine requests,
-			 * but the first one was already completed (thanks
-			 * preempt-to-busy!). Keep looking at the veng queue
-			 * until we have no more relevant requests (i.e.
-			 * the normal submit queue has higher priority).
-			 */
-			if (!submit) {
-				spin_unlock(&ve->base.active.lock);
-				rb = rb_first_cached(&execlists->virtual);
-				continue;
-			}
+			submit = true;
+			last = rq;
 		}
 
+		i915_request_put(rq);
+unlock:
 		spin_unlock(&ve->base.active.lock);
-		break;
+
+		/*
+		 * Hmm, we have a bunch of virtual engine requests,
+		 * but the first one was already completed (thanks
+		 * preempt-to-busy!). Keep looking at the veng queue
+		 * until we have no more relevant requests (i.e.
+		 * the normal submit queue has higher priority).
+		 */
+		ve = submit ? NULL : first_virtual_engine(engine);
 	}
 
 	while ((rb = rb_first_cached(&execlists->queue))) {
-- 
2.20.1


* Re: [Intel-gfx] [PATCH 7/8] drm/i915/gt: Decouple inflight virtual engines
  2020-05-18  8:14 ` [Intel-gfx] [PATCH 7/8] drm/i915/gt: Decouple inflight virtual engines Chris Wilson
@ 2020-05-18 12:53   ` Tvrtko Ursulin
  2020-05-18 13:00     ` Chris Wilson
  0 siblings, 1 reply; 32+ messages in thread
From: Tvrtko Ursulin @ 2020-05-18 12:53 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 18/05/2020 09:14, Chris Wilson wrote:
> Once a virtual engine has been bound to a sibling, it will remain bound
> until we finally schedule out the last active request. We can not rebind
> the context to a new sibling while it is inflight as the context save
> will conflict, hence we wait. As we cannot then use any other sibling
> while the context is inflight, only kick the bound sibling while it is
> inflight, and upon scheduling out kick the rest (so that we can swap
> engines on timeslicing if the previously bound engine becomes
> oversubscribed).
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/gt/intel_lrc.c | 30 +++++++++++++----------------
>   1 file changed, 13 insertions(+), 17 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index 7a5ac3375225..fe8f3518d6b8 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -1398,9 +1398,8 @@ execlists_schedule_in(struct i915_request *rq, int idx)
>   static void kick_siblings(struct i915_request *rq, struct intel_context *ce)
>   {
>   	struct virtual_engine *ve = container_of(ce, typeof(*ve), context);
> -	struct i915_request *next = READ_ONCE(ve->request);
>   
> -	if (next == rq || (next && next->execution_mask & ~rq->execution_mask))
> +	if (READ_ONCE(ve->request))
>   		tasklet_hi_schedule(&ve->base.execlists.tasklet);
>   }
>   
> @@ -1821,18 +1820,14 @@ first_virtual_engine(struct intel_engine_cs *engine)
>   			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
>   		struct i915_request *rq = READ_ONCE(ve->request);
>   
> -		if (!rq) { /* lazily cleanup after another engine handled rq */
> +		/* lazily cleanup after another engine handled rq */
> +		if (!rq || !virtual_matches(ve, rq, engine)) {
>   			rb_erase_cached(rb, &el->virtual);
>   			RB_CLEAR_NODE(rb);
>   			rb = rb_first_cached(&el->virtual);
>   			continue;
>   		}
>   
> -		if (!virtual_matches(ve, rq, engine)) {
> -			rb = rb_next(rb);
> -			continue;
> -		}
> -
>   		return ve;
>   	}
>   
> @@ -5478,7 +5473,6 @@ static void virtual_submission_tasklet(unsigned long data)
>   	if (unlikely(!mask))
>   		return;
>   
> -	local_irq_disable();
>   	for (n = 0; n < ve->num_siblings; n++) {
>   		struct intel_engine_cs *sibling = READ_ONCE(ve->siblings[n]);
>   		struct ve_node * const node = &ve->nodes[sibling->id];
> @@ -5488,20 +5482,19 @@ static void virtual_submission_tasklet(unsigned long data)
>   		if (!READ_ONCE(ve->request))
>   			break; /* already handled by a sibling's tasklet */
>   
> +		spin_lock_irq(&sibling->active.lock);
> +
>   		if (unlikely(!(mask & sibling->mask))) {
>   			if (!RB_EMPTY_NODE(&node->rb)) {
> -				spin_lock(&sibling->active.lock);
>   				rb_erase_cached(&node->rb,
>   						&sibling->execlists.virtual);
>   				RB_CLEAR_NODE(&node->rb);
> -				spin_unlock(&sibling->active.lock);
>   			}
> -			continue;
> -		}
>   
> -		spin_lock(&sibling->active.lock);
> +			goto unlock_engine;
> +		}
>   
> -		if (!RB_EMPTY_NODE(&node->rb)) {
> +		if (unlikely(!RB_EMPTY_NODE(&node->rb))) {
>   			/*
>   			 * Cheat and avoid rebalancing the tree if we can
>   			 * reuse this node in situ.
> @@ -5541,9 +5534,12 @@ static void virtual_submission_tasklet(unsigned long data)
>   		if (first && prio >= sibling->execlists.queue_priority_hint)
>   			tasklet_hi_schedule(&sibling->execlists.tasklet);
>   
> -		spin_unlock(&sibling->active.lock);
> +unlock_engine:
> +		spin_unlock_irq(&sibling->active.lock);
> +
> +		if (intel_context_inflight(&ve->context))
> +			break;

So the virtual request may not be added to all siblings any longer. Will
it still be possible to schedule it on any of them if time slicing kicks
in under these conditions?

This is equivalent to the hunk in first_virtual_engine which also 
removes it from all other siblings.

I guess it's in line with what the commit message says - that a new
sibling will be picked upon time slicing. I just don't quite see the
path which would do it. The only path which shuffles the siblings array
around is in dequeue, and dequeue on an engine other than the one which
first picked it will not happen any more. I must be missing something..

Regards,

Tvrtko

>   	}
> -	local_irq_enable();
>   }
>   
>   static void virtual_submit_request(struct i915_request *rq)
> 

* Re: [Intel-gfx] [PATCH 7/8] drm/i915/gt: Decouple inflight virtual engines
  2020-05-18 12:53   ` Tvrtko Ursulin
@ 2020-05-18 13:00     ` Chris Wilson
  2020-05-18 14:55       ` Tvrtko Ursulin
  0 siblings, 1 reply; 32+ messages in thread
From: Chris Wilson @ 2020-05-18 13:00 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2020-05-18 13:53:29)
> 
> On 18/05/2020 09:14, Chris Wilson wrote:
> > Once a virtual engine has been bound to a sibling, it will remain bound
> > until we finally schedule out the last active request. We can not rebind
> > the context to a new sibling while it is inflight as the context save
> > will conflict, hence we wait. As we cannot then use any other sibling
> > while the context is inflight, only kick the bound sibling while it is
> > inflight, and upon scheduling out kick the rest (so that we can swap
> > engines on timeslicing if the previously bound engine becomes
> > oversubscribed).
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >   drivers/gpu/drm/i915/gt/intel_lrc.c | 30 +++++++++++++----------------
> >   1 file changed, 13 insertions(+), 17 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > index 7a5ac3375225..fe8f3518d6b8 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > @@ -1398,9 +1398,8 @@ execlists_schedule_in(struct i915_request *rq, int idx)
> >   static void kick_siblings(struct i915_request *rq, struct intel_context *ce)
> >   {
> >       struct virtual_engine *ve = container_of(ce, typeof(*ve), context);
> > -     struct i915_request *next = READ_ONCE(ve->request);
> >   
> > -     if (next == rq || (next && next->execution_mask & ~rq->execution_mask))
> > +     if (READ_ONCE(ve->request))
> >               tasklet_hi_schedule(&ve->base.execlists.tasklet);
> >   }
> >   
> > @@ -1821,18 +1820,14 @@ first_virtual_engine(struct intel_engine_cs *engine)
> >                       rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
> >               struct i915_request *rq = READ_ONCE(ve->request);
> >   
> > -             if (!rq) { /* lazily cleanup after another engine handled rq */
> > +             /* lazily cleanup after another engine handled rq */
> > +             if (!rq || !virtual_matches(ve, rq, engine)) {
> >                       rb_erase_cached(rb, &el->virtual);
> >                       RB_CLEAR_NODE(rb);
> >                       rb = rb_first_cached(&el->virtual);
> >                       continue;
> >               }
> >   
> > -             if (!virtual_matches(ve, rq, engine)) {
> > -                     rb = rb_next(rb);
> > -                     continue;
> > -             }
> > -
> >               return ve;
> >       }
> >   
> > @@ -5478,7 +5473,6 @@ static void virtual_submission_tasklet(unsigned long data)
> >       if (unlikely(!mask))
> >               return;
> >   
> > -     local_irq_disable();
> >       for (n = 0; n < ve->num_siblings; n++) {
> >               struct intel_engine_cs *sibling = READ_ONCE(ve->siblings[n]);
> >               struct ve_node * const node = &ve->nodes[sibling->id];
> > @@ -5488,20 +5482,19 @@ static void virtual_submission_tasklet(unsigned long data)
> >               if (!READ_ONCE(ve->request))
> >                       break; /* already handled by a sibling's tasklet */
> >   
> > +             spin_lock_irq(&sibling->active.lock);
> > +
> >               if (unlikely(!(mask & sibling->mask))) {
> >                       if (!RB_EMPTY_NODE(&node->rb)) {
> > -                             spin_lock(&sibling->active.lock);
> >                               rb_erase_cached(&node->rb,
> >                                               &sibling->execlists.virtual);
> >                               RB_CLEAR_NODE(&node->rb);
> > -                             spin_unlock(&sibling->active.lock);
> >                       }
> > -                     continue;
> > -             }
> >   
> > -             spin_lock(&sibling->active.lock);
> > +                     goto unlock_engine;
> > +             }
> >   
> > -             if (!RB_EMPTY_NODE(&node->rb)) {
> > +             if (unlikely(!RB_EMPTY_NODE(&node->rb))) {
> >                       /*
> >                        * Cheat and avoid rebalancing the tree if we can
> >                        * reuse this node in situ.
> > @@ -5541,9 +5534,12 @@ static void virtual_submission_tasklet(unsigned long data)
> >               if (first && prio >= sibling->execlists.queue_priority_hint)
> >                       tasklet_hi_schedule(&sibling->execlists.tasklet);
> >   
> > -             spin_unlock(&sibling->active.lock);
> > +unlock_engine:
> > +             spin_unlock_irq(&sibling->active.lock);
> > +
> > +             if (intel_context_inflight(&ve->context))
> > +                     break;
> 
> So the virtual request may not be added to all siblings any longer. Will it 
> still be possible to schedule it on any of them if time slicing kicks in 
> under these conditions?

Yes.
 
> This is equivalent to the hunk in first_virtual_engine which also 
> removes it from all other siblings.
> 
> I guess it's in line with what the commit message says - that a new 
> sibling will be picked upon time slicing. I just don't quite see the 
> path which would do it. The only path which shuffles the siblings array 
> around is in dequeue, and dequeue on an engine other than the one which 
> first picked it will not happen any more. I must be missing something...

It's all in execlists_schedule_out. During timeslicing we call
unwind_incomplete_requests which moves the requests back to the priotree
(and in this patch back to the virtual engine).

But... We cannot use the virtual request on any other engine until it has
been scheduled out. That only happens after we complete execlists_dequeue()
and submit a new ELSP[]. Once the HW acks the change, we call
execlists_schedule_out on the virtual_request.

Now we know that intel_context_inflight() will return false so any
engine can pick up the request, and so it's time to kick the virtual
tasklet and in turn kick all the siblings.

So timeslicing works by not submitting the virtual request again and
when it expires on this sibling[0], we wake up all the other siblings
and the first that is idle wins the race.
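
A rough sketch of that flow (illustrative only; kick_siblings is the
helper from the patch above, the schedule-out side is heavily
simplified):

	/* runs once the HW ack tells us the context has been saved */
	static void execlists_schedule_out(struct i915_request *rq)
	{
		struct intel_context *ce = rq->context;

		/* ...bookkeeping elided... */

		/* rq is no longer inflight, any sibling may now take it */
		if (ce->engine != rq->engine) /* i.e. a virtual context */
			kick_siblings(rq, ce);
	}

	static void kick_siblings(struct i915_request *rq, struct intel_context *ce)
	{
		struct virtual_engine *ve = container_of(ce, typeof(*ve), context);

		/*
		 * Reschedule the virtual tasklet; it will re-queue the
		 * request on every sibling and kick their tasklets.
		 */
		if (READ_ONCE(ve->request))
			tasklet_hi_schedule(&ve->base.execlists.tasklet);
	}
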
-Chris

* Re: [Intel-gfx] [PATCH v2] drm/i915/gt: Use virtual_engine during execlists_dequeue
  2020-05-18 12:33   ` [Intel-gfx] [PATCH v2] " Chris Wilson
@ 2020-05-18 13:01     ` Tvrtko Ursulin
  2020-05-18 13:09       ` Chris Wilson
  0 siblings, 1 reply; 32+ messages in thread
From: Tvrtko Ursulin @ 2020-05-18 13:01 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 18/05/2020 13:33, Chris Wilson wrote:
> Rather than going back and forth between the rb_node entry and the
> virtual_engine type, store the ve local and reuse it. As the
> container_of conversion from rb_node to virtual_engine requires a
> variable offset, performing that conversion just once shaves off a bit
> of code.
> 
> v2: Keep a single virtual engine lookup, for typical use.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/gt/intel_lrc.c | 176 +++++++++++++---------------
>   1 file changed, 84 insertions(+), 92 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index 8524c5f3a329..7843bf3f3f1f 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -451,7 +451,7 @@ static int queue_prio(const struct intel_engine_execlists *execlists)
>   
>   static inline bool need_preempt(const struct intel_engine_cs *engine,
>   				const struct i915_request *rq,
> -				struct rb_node *rb)
> +				struct virtual_engine *ve)
>   {
>   	int last_prio;
>   
> @@ -488,9 +488,7 @@ static inline bool need_preempt(const struct intel_engine_cs *engine,
>   	    rq_prio(list_next_entry(rq, sched.link)) > last_prio)
>   		return true;
>   
> -	if (rb) {
> -		struct virtual_engine *ve =
> -			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
> +	if (ve) {
>   		bool preempt = false;
>   
>   		if (engine == ve->siblings[0]) { /* only preempt one sibling */
> @@ -1812,6 +1810,35 @@ static bool virtual_matches(const struct virtual_engine *ve,
>   	return true;
>   }
>   
> +static struct virtual_engine *
> +first_virtual_engine(struct intel_engine_cs *engine)
> +{
> +	struct intel_engine_execlists *el = &engine->execlists;
> +	struct rb_node *rb = rb_first_cached(&el->virtual);
> +
> +	while (rb) {
> +		struct virtual_engine *ve =
> +			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
> +		struct i915_request *rq = READ_ONCE(ve->request);
> +
> +		if (!rq) { /* lazily cleanup after another engine handled rq */
> +			rb_erase_cached(rb, &el->virtual);
> +			RB_CLEAR_NODE(rb);
> +			rb = rb_first_cached(&el->virtual);
> +			continue;
> +		}
> +
> +		if (!virtual_matches(ve, rq, engine)) {
> +			rb = rb_next(rb);
> +			continue;
> +		}
> +
> +		return ve;
> +	}
> +
> +	return NULL;
> +}
> +
>   static void virtual_xfer_breadcrumbs(struct virtual_engine *ve)
>   {
>   	/*
> @@ -1896,7 +1923,7 @@ static void defer_active(struct intel_engine_cs *engine)
>   static bool
>   need_timeslice(const struct intel_engine_cs *engine,
>   	       const struct i915_request *rq,
> -	       const struct rb_node *rb)
> +	       struct virtual_engine *ve)
>   {
>   	int hint;
>   
> @@ -1905,9 +1932,7 @@ need_timeslice(const struct intel_engine_cs *engine,
>   
>   	hint = engine->execlists.queue_priority_hint;
>   
> -	if (rb) {
> -		const struct virtual_engine *ve =
> -			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
> +	if (ve) {
>   		const struct intel_engine_cs *inflight =
>   			intel_context_inflight(&ve->context);
>   
> @@ -2057,7 +2082,8 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   	struct intel_engine_execlists * const execlists = &engine->execlists;
>   	struct i915_request **port = execlists->pending;
>   	struct i915_request ** const last_port = port + execlists->port_mask;
> -	struct i915_request * const *active;
> +	struct i915_request * const *active = READ_ONCE(execlists->active);
> +	struct virtual_engine *ve = first_virtual_engine(engine);
>   	struct i915_request *last;
>   	struct rb_node *rb;
>   	bool submit = false;
> @@ -2084,26 +2110,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   	 * and context switches) submission.
>   	 */
>   
> -	for (rb = rb_first_cached(&execlists->virtual); rb; ) {
> -		struct virtual_engine *ve =
> -			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
> -		struct i915_request *rq = READ_ONCE(ve->request);
> -
> -		if (!rq) { /* lazily cleanup after another engine handled rq */
> -			rb_erase_cached(rb, &execlists->virtual);
> -			RB_CLEAR_NODE(rb);
> -			rb = rb_first_cached(&execlists->virtual);
> -			continue;
> -		}
> -
> -		if (!virtual_matches(ve, rq, engine)) {
> -			rb = rb_next(rb);
> -			continue;
> -		}
> -
> -		break;
> -	}
> -
>   	/*
>   	 * If the queue is higher priority than the last
>   	 * request in the currently active context, submit afresh.
> @@ -2111,10 +2117,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   	 * the active context to interject the preemption request,
>   	 * i.e. we will retrigger preemption following the ack in case
>   	 * of trouble.
> -	 */
> -	active = READ_ONCE(execlists->active);
> -
> -	/*
> +	 *
>   	 * In theory we can skip over completed contexts that have not
>   	 * yet been processed by events (as those events are in flight):
>   	 *
> @@ -2125,9 +2128,8 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   	 * find itself trying to jump back into a context it has just
>   	 * completed and barf.
>   	 */
> -
>   	if ((last = *active)) {
> -		if (need_preempt(engine, last, rb)) {
> +		if (need_preempt(engine, last, ve)) {
>   			if (i915_request_completed(last)) {
>   				tasklet_hi_schedule(&execlists->tasklet);
>   				return;
> @@ -2158,7 +2160,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   			__unwind_incomplete_requests(engine);
>   
>   			last = NULL;
> -		} else if (need_timeslice(engine, last, rb) &&
> +		} else if (need_timeslice(engine, last, ve) &&
>   			   timeslice_expired(execlists, last)) {
>   			if (i915_request_completed(last)) {
>   				tasklet_hi_schedule(&execlists->tasklet);
> @@ -2212,57 +2214,53 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   		}
>   	}
>   
> -	while (rb) { /* XXX virtual is always taking precedence */
> -		struct virtual_engine *ve =
> -			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
> +	while (ve) { /* XXX virtual is always taking precedence */
>   		struct i915_request *rq;
>   
>   		spin_lock(&ve->base.active.lock);
>   
>   		rq = ve->request;
> -		if (unlikely(!rq)) { /* lost the race to a sibling */
> -			spin_unlock(&ve->base.active.lock);
> -			rb_erase_cached(rb, &execlists->virtual);
> -			RB_CLEAR_NODE(rb);
> -			rb = rb_first_cached(&execlists->virtual);
> -			continue;
> -		}
> +		if (unlikely(!rq)) /* lost the race to a sibling */
> +			goto unlock;

Doesn't this now rely on a later patch to clear the node?

>   
>   		GEM_BUG_ON(rq != ve->request);
>   		GEM_BUG_ON(rq->engine != &ve->base);
>   		GEM_BUG_ON(rq->context != &ve->context);
>   
> -		if (rq_prio(rq) >= queue_prio(execlists)) {
> -			if (!virtual_matches(ve, rq, engine)) {
> -				spin_unlock(&ve->base.active.lock);
> -				rb = rb_next(rb);
> -				continue;
> -			}
> +		if (rq_prio(rq) < queue_prio(execlists)) {
> +			spin_unlock(&ve->base.active.lock);
> +			break;
> +		}
>   
> -			if (last && !can_merge_rq(last, rq)) {
> -				spin_unlock(&ve->base.active.lock);
> -				start_timeslice(engine, rq_prio(rq));
> -				return; /* leave this for another sibling */
> -			}
> +		GEM_BUG_ON(!virtual_matches(ve, rq, engine));

This as well.

Regards,

Tvrtko

>   
> -			ENGINE_TRACE(engine,
> -				     "virtual rq=%llx:%lld%s, new engine? %s\n",
> -				     rq->fence.context,
> -				     rq->fence.seqno,
> -				     i915_request_completed(rq) ? "!" :
> -				     i915_request_started(rq) ? "*" :
> -				     "",
> -				     yesno(engine != ve->siblings[0]));
> -
> -			WRITE_ONCE(ve->request, NULL);
> -			WRITE_ONCE(ve->base.execlists.queue_priority_hint,
> -				   INT_MIN);
> -			rb_erase_cached(rb, &execlists->virtual);
> -			RB_CLEAR_NODE(rb);
> +		if (last && !can_merge_rq(last, rq)) {
> +			spin_unlock(&ve->base.active.lock);
> +			start_timeslice(engine, rq_prio(rq));
> +			return; /* leave this for another sibling */
> +		}
> +
> +		ENGINE_TRACE(engine,
> +			     "virtual rq=%llx:%lld%s, new engine? %s\n",
> +			     rq->fence.context,
> +			     rq->fence.seqno,
> +			     i915_request_completed(rq) ? "!" :
> +			     i915_request_started(rq) ? "*" :
> +			     "",
> +			     yesno(engine != ve->siblings[0]));
>   
> -			GEM_BUG_ON(!(rq->execution_mask & engine->mask));
> -			WRITE_ONCE(rq->engine, engine);
> +		WRITE_ONCE(ve->request, NULL);
> +		WRITE_ONCE(ve->base.execlists.queue_priority_hint,
> +			   INT_MIN);
>   
> +		rb = &ve->nodes[engine->id].rb;
> +		rb_erase_cached(rb, &execlists->virtual);
> +		RB_CLEAR_NODE(rb);
> +
> +		GEM_BUG_ON(!(rq->execution_mask & engine->mask));
> +		WRITE_ONCE(rq->engine, engine);
> +
> +		if (__i915_request_submit(rq)) {
>   			if (engine != ve->siblings[0]) {
>   				u32 *regs = ve->context.lrc_reg_state;
>   				unsigned int n;
> @@ -2294,28 +2292,22 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   				GEM_BUG_ON(ve->siblings[0] != engine);
>   			}
>   
> -			if (__i915_request_submit(rq)) {
> -				submit = true;
> -				last = rq;
> -			}
> -			i915_request_put(rq);
> -
> -			/*
> -			 * Hmm, we have a bunch of virtual engine requests,
> -			 * but the first one was already completed (thanks
> -			 * preempt-to-busy!). Keep looking at the veng queue
> -			 * until we have no more relevant requests (i.e.
> -			 * the normal submit queue has higher priority).
> -			 */
> -			if (!submit) {
> -				spin_unlock(&ve->base.active.lock);
> -				rb = rb_first_cached(&execlists->virtual);
> -				continue;
> -			}
> +			submit = true;
> +			last = rq;
>   		}
>   
> +		i915_request_put(rq);
> +unlock:
>   		spin_unlock(&ve->base.active.lock);
> -		break;
> +
> +		/*
> +		 * Hmm, we have a bunch of virtual engine requests,
> +		 * but the first one was already completed (thanks
> +		 * preempt-to-busy!). Keep looking at the veng queue
> +		 * until we have no more relevant requests (i.e.
> +		 * the normal submit queue has higher priority).
> +		 */
> +		ve = submit ? NULL : first_virtual_engine(engine);
>   	}
>   
>   	while ((rb = rb_first_cached(&execlists->queue))) {
> 

* Re: [Intel-gfx] [PATCH v2] drm/i915/gt: Use virtual_engine during execlists_dequeue
  2020-05-18 13:01     ` Tvrtko Ursulin
@ 2020-05-18 13:09       ` Chris Wilson
  0 siblings, 0 replies; 32+ messages in thread
From: Chris Wilson @ 2020-05-18 13:09 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2020-05-18 14:01:27)
> 
> On 18/05/2020 13:33, Chris Wilson wrote:
> > +static struct virtual_engine *
> > +first_virtual_engine(struct intel_engine_cs *engine)
> > +{
> > +     struct intel_engine_execlists *el = &engine->execlists;
> > +     struct rb_node *rb = rb_first_cached(&el->virtual);
> > +
> > +     while (rb) {
> > +             struct virtual_engine *ve =
> > +                     rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
> > +             struct i915_request *rq = READ_ONCE(ve->request);
> > +
> > +             if (!rq) { /* lazily cleanup after another engine handled rq */
> > +                     rb_erase_cached(rb, &el->virtual);
> > +                     RB_CLEAR_NODE(rb);
> > +                     rb = rb_first_cached(&el->virtual);
> > +                     continue;
> > +             }
> > +
> > +             if (!virtual_matches(ve, rq, engine)) {
> > +                     rb = rb_next(rb);
> > +                     continue;
> > +             }
> > +
> > +             return ve;
> > +     }
> > +
> > +     return NULL;
> > +}

> > -     while (rb) { /* XXX virtual is always taking precedence */
> > -             struct virtual_engine *ve =
> > -                     rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
> > +     while (ve) { /* XXX virtual is always taking precedence */
> >               struct i915_request *rq;
> >   
> >               spin_lock(&ve->base.active.lock);
> >   
> >               rq = ve->request;
> > -             if (unlikely(!rq)) { /* lost the race to a sibling */
> > -                     spin_unlock(&ve->base.active.lock);
> > -                     rb_erase_cached(rb, &execlists->virtual);
> > -                     RB_CLEAR_NODE(rb);
> > -                     rb = rb_first_cached(&execlists->virtual);
> > -                     continue;
> > -             }
> > +             if (unlikely(!rq)) /* lost the race to a sibling */
> > +                     goto unlock;
> 
> Doesn't this now rely on a later patch to clear the node?

This is cleared by first_virtual_engine.

> >   
> >               GEM_BUG_ON(rq != ve->request);
> >               GEM_BUG_ON(rq->engine != &ve->base);
> >               GEM_BUG_ON(rq->context != &ve->context);
> >   
> > -             if (rq_prio(rq) >= queue_prio(execlists)) {
> > -                     if (!virtual_matches(ve, rq, engine)) {
> > -                             spin_unlock(&ve->base.active.lock);
> > -                             rb = rb_next(rb);
> > -                             continue;
> > -                     }
> > +             if (rq_prio(rq) < queue_prio(execlists)) {
> > +                     spin_unlock(&ve->base.active.lock);
> > +                     break;
> > +             }
> >   
> > -                     if (last && !can_merge_rq(last, rq)) {
> > -                             spin_unlock(&ve->base.active.lock);
> > -                             start_timeslice(engine, rq_prio(rq));
> > -                             return; /* leave this for another sibling */
> > -                     }
> > +             GEM_BUG_ON(!virtual_matches(ve, rq, engine));
> 
> This as well.

As first_virtual_engine skips non-matching nodes, it should only become
unmatched during the unlocked phase after the lookup, i.e. if it is
claimed by another engine. Then ve->request would be set to NULL and we
hit the !rq check above.
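
To sketch the only interleaving of concern (illustrative, two engines
racing for the same ve):

	engine A                            engine B
	--------                            --------
	first_virtual_engine():
	  rq = READ_ONCE(ve->request)
	  virtual_matches(ve, rq, A)
	                                    dequeue claims the request:
	                                      WRITE_ONCE(ve->request, NULL)
	spin_lock(&ve->base.active.lock)
	rq = ve->request;  /* NULL */
	goto unlock;       /* the !rq check catches it */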

So I think this patch stands by itself.
-Chris

* Re: [Intel-gfx] [PATCH 7/8] drm/i915/gt: Decouple inflight virtual engines
  2020-05-18 13:00     ` Chris Wilson
@ 2020-05-18 14:55       ` Tvrtko Ursulin
  2020-05-18 15:40         ` Chris Wilson
  0 siblings, 1 reply; 32+ messages in thread
From: Tvrtko Ursulin @ 2020-05-18 14:55 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 18/05/2020 14:00, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-05-18 13:53:29)
>>
>> On 18/05/2020 09:14, Chris Wilson wrote:
>>> Once a virtual engine has been bound to a sibling, it will remain bound
>>> until we finally schedule out the last active request. We cannot rebind
>>> the context to a new sibling while it is inflight as the context save
>>> will conflict, hence we wait. As we cannot then use any other sibling
>>> while the context is inflight, only kick the bound sibling while it is
>>> inflight and upon scheduling out kick the rest (so that we can swap
>>> engines on timeslicing if the previously bound engine becomes
>>> oversubscribed).
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> ---
>>>    drivers/gpu/drm/i915/gt/intel_lrc.c | 30 +++++++++++++----------------
>>>    1 file changed, 13 insertions(+), 17 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>> index 7a5ac3375225..fe8f3518d6b8 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>> @@ -1398,9 +1398,8 @@ execlists_schedule_in(struct i915_request *rq, int idx)
>>>    static void kick_siblings(struct i915_request *rq, struct intel_context *ce)
>>>    {
>>>        struct virtual_engine *ve = container_of(ce, typeof(*ve), context);
>>> -     struct i915_request *next = READ_ONCE(ve->request);
>>>    
>>> -     if (next == rq || (next && next->execution_mask & ~rq->execution_mask))
>>> +     if (READ_ONCE(ve->request))
>>>                tasklet_hi_schedule(&ve->base.execlists.tasklet);
>>>    }
>>>    
>>> @@ -1821,18 +1820,14 @@ first_virtual_engine(struct intel_engine_cs *engine)
>>>                        rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
>>>                struct i915_request *rq = READ_ONCE(ve->request);
>>>    
>>> -             if (!rq) { /* lazily cleanup after another engine handled rq */
>>> +             /* lazily cleanup after another engine handled rq */
>>> +             if (!rq || !virtual_matches(ve, rq, engine)) {
>>>                        rb_erase_cached(rb, &el->virtual);
>>>                        RB_CLEAR_NODE(rb);
>>>                        rb = rb_first_cached(&el->virtual);
>>>                        continue;
>>>                }
>>>    
>>> -             if (!virtual_matches(ve, rq, engine)) {
>>> -                     rb = rb_next(rb);
>>> -                     continue;
>>> -             }
>>> -
>>>                return ve;
>>>        }
>>>    
>>> @@ -5478,7 +5473,6 @@ static void virtual_submission_tasklet(unsigned long data)
>>>        if (unlikely(!mask))
>>>                return;
>>>    
>>> -     local_irq_disable();
>>>        for (n = 0; n < ve->num_siblings; n++) {
>>>                struct intel_engine_cs *sibling = READ_ONCE(ve->siblings[n]);
>>>                struct ve_node * const node = &ve->nodes[sibling->id];
>>> @@ -5488,20 +5482,19 @@ static void virtual_submission_tasklet(unsigned long data)
>>>                if (!READ_ONCE(ve->request))
>>>                        break; /* already handled by a sibling's tasklet */
>>>    
>>> +             spin_lock_irq(&sibling->active.lock);
>>> +
>>>                if (unlikely(!(mask & sibling->mask))) {
>>>                        if (!RB_EMPTY_NODE(&node->rb)) {
>>> -                             spin_lock(&sibling->active.lock);
>>>                                rb_erase_cached(&node->rb,
>>>                                                &sibling->execlists.virtual);
>>>                                RB_CLEAR_NODE(&node->rb);
>>> -                             spin_unlock(&sibling->active.lock);
>>>                        }
>>> -                     continue;
>>> -             }
>>>    
>>> -             spin_lock(&sibling->active.lock);
>>> +                     goto unlock_engine;
>>> +             }
>>>    
>>> -             if (!RB_EMPTY_NODE(&node->rb)) {
>>> +             if (unlikely(!RB_EMPTY_NODE(&node->rb))) {
>>>                        /*
>>>                         * Cheat and avoid rebalancing the tree if we can
>>>                         * reuse this node in situ.
>>> @@ -5541,9 +5534,12 @@ static void virtual_submission_tasklet(unsigned long data)
>>>                if (first && prio >= sibling->execlists.queue_priority_hint)
>>>                        tasklet_hi_schedule(&sibling->execlists.tasklet);
>>>    
>>> -             spin_unlock(&sibling->active.lock);
>>> +unlock_engine:
>>> +             spin_unlock_irq(&sibling->active.lock);
>>> +
>>> +             if (intel_context_inflight(&ve->context))
>>> +                     break;
>>
>> So the virtual request may not be added to all siblings any longer. Will it
>> still be possible to schedule it on any of them if time slicing kicks in
>> under these conditions?
> 
> Yes.
>   
>> This is equivalent to the hunk in first_virtual_engine which also
>> removes it from all other siblings.
>>
>> I guess it's in line with what the commit message says - that a new
>> sibling will be picked upon time slicing. I just don't quite see the
>> path which would do it. The only path which shuffles the siblings array
>> around is in dequeue, and dequeue on an engine other than the one which
>> first picked it will not happen any more. I must be missing something...
> 
> It's all in execlists_schedule_out. During timeslicing we call
> unwind_incomplete_requests which moves the requests back to the priotree
> (and in this patch back to the virtual engine).
> 
> But... We cannot use the virtual request on any other engine until it has
> been scheduled out. That only happens after we complete execlists_dequeue()
> and submit a new ELSP[]. Once the HW acks the change, we call
> execlists_schedule_out on the virtual_request.
> 
> Now we know that intel_context_inflight() will return false so any
> engine can pick up the request, and so it's time to kick the virtual
> tasklet and in turn kick all the siblings.
> 
> So timeslicing works by not submitting the virtual request again and
> when it expires on this sibling[0], we wake up all the other siblings
> and the first that is idle wins the race.

If a virtual request is on hw and timeslice expires:

1. Unwinds the request.
       -> kicks the virtual tasklet
2. Virtual tasklet runs and puts the request back on siblings.
       -> kicks the physical tasklets
3. Siblings tasklet runs and submits the request.

So two tasklets of latency even if there are no other runnable requests?

Regards,

Tvrtko

* Re: [Intel-gfx] [PATCH 7/8] drm/i915/gt: Decouple inflight virtual engines
  2020-05-18 14:55       ` Tvrtko Ursulin
@ 2020-05-18 15:40         ` Chris Wilson
  2020-05-18 15:48           ` Chris Wilson
  0 siblings, 1 reply; 32+ messages in thread
From: Chris Wilson @ 2020-05-18 15:40 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2020-05-18 15:55:46)
> 
> On 18/05/2020 14:00, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2020-05-18 13:53:29)
> >>
> >> On 18/05/2020 09:14, Chris Wilson wrote:
> >>> Once a virtual engine has been bound to a sibling, it will remain bound
> >>> until we finally schedule out the last active request. We cannot rebind
> >>> the context to a new sibling while it is inflight as the context save
> >>> will conflict, hence we wait. As we cannot then use any other sibling
> >>> while the context is inflight, only kick the bound sibling while it is
> >>> inflight and upon scheduling out kick the rest (so that we can swap
> >>> engines on timeslicing if the previously bound engine becomes
> >>> oversubscribed).
> >>>
> >>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>> ---
> >>>    drivers/gpu/drm/i915/gt/intel_lrc.c | 30 +++++++++++++----------------
> >>>    1 file changed, 13 insertions(+), 17 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>> index 7a5ac3375225..fe8f3518d6b8 100644
> >>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>> @@ -1398,9 +1398,8 @@ execlists_schedule_in(struct i915_request *rq, int idx)
> >>>    static void kick_siblings(struct i915_request *rq, struct intel_context *ce)
> >>>    {
> >>>        struct virtual_engine *ve = container_of(ce, typeof(*ve), context);
> >>> -     struct i915_request *next = READ_ONCE(ve->request);
> >>>    
> >>> -     if (next == rq || (next && next->execution_mask & ~rq->execution_mask))
> >>> +     if (READ_ONCE(ve->request))
> >>>                tasklet_hi_schedule(&ve->base.execlists.tasklet);
> >>>    }
> >>>    
> >>> @@ -1821,18 +1820,14 @@ first_virtual_engine(struct intel_engine_cs *engine)
> >>>                        rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
> >>>                struct i915_request *rq = READ_ONCE(ve->request);
> >>>    
> >>> -             if (!rq) { /* lazily cleanup after another engine handled rq */
> >>> +             /* lazily cleanup after another engine handled rq */
> >>> +             if (!rq || !virtual_matches(ve, rq, engine)) {
> >>>                        rb_erase_cached(rb, &el->virtual);
> >>>                        RB_CLEAR_NODE(rb);
> >>>                        rb = rb_first_cached(&el->virtual);
> >>>                        continue;
> >>>                }
> >>>    
> >>> -             if (!virtual_matches(ve, rq, engine)) {
> >>> -                     rb = rb_next(rb);
> >>> -                     continue;
> >>> -             }
> >>> -
> >>>                return ve;
> >>>        }
> >>>    
> >>> @@ -5478,7 +5473,6 @@ static void virtual_submission_tasklet(unsigned long data)
> >>>        if (unlikely(!mask))
> >>>                return;
> >>>    
> >>> -     local_irq_disable();
> >>>        for (n = 0; n < ve->num_siblings; n++) {
> >>>                struct intel_engine_cs *sibling = READ_ONCE(ve->siblings[n]);
> >>>                struct ve_node * const node = &ve->nodes[sibling->id];
> >>> @@ -5488,20 +5482,19 @@ static void virtual_submission_tasklet(unsigned long data)
> >>>                if (!READ_ONCE(ve->request))
> >>>                        break; /* already handled by a sibling's tasklet */
> >>>    
> >>> +             spin_lock_irq(&sibling->active.lock);
> >>> +
> >>>                if (unlikely(!(mask & sibling->mask))) {
> >>>                        if (!RB_EMPTY_NODE(&node->rb)) {
> >>> -                             spin_lock(&sibling->active.lock);
> >>>                                rb_erase_cached(&node->rb,
> >>>                                                &sibling->execlists.virtual);
> >>>                                RB_CLEAR_NODE(&node->rb);
> >>> -                             spin_unlock(&sibling->active.lock);
> >>>                        }
> >>> -                     continue;
> >>> -             }
> >>>    
> >>> -             spin_lock(&sibling->active.lock);
> >>> +                     goto unlock_engine;
> >>> +             }
> >>>    
> >>> -             if (!RB_EMPTY_NODE(&node->rb)) {
> >>> +             if (unlikely(!RB_EMPTY_NODE(&node->rb))) {
> >>>                        /*
> >>>                         * Cheat and avoid rebalancing the tree if we can
> >>>                         * reuse this node in situ.
> >>> @@ -5541,9 +5534,12 @@ static void virtual_submission_tasklet(unsigned long data)
> >>>                if (first && prio >= sibling->execlists.queue_priority_hint)
> >>>                        tasklet_hi_schedule(&sibling->execlists.tasklet);
> >>>    
> >>> -             spin_unlock(&sibling->active.lock);
> >>> +unlock_engine:
> >>> +             spin_unlock_irq(&sibling->active.lock);
> >>> +
> >>> +             if (intel_context_inflight(&ve->context))
> >>> +                     break;
> >>
> >> So the virtual request may not be added to all siblings any longer. Will it
> >> still be possible to schedule it on any of them if time slicing kicks in
> >> under these conditions?
> > 
> > Yes.
> >   
> >> This is equivalent to the hunk in first_virtual_engine which also
> >> removes it from all other siblings.
> >>
> >> I guess it's in line with what the commit message says - that a new
> >> sibling will be picked upon time slicing. I just don't quite see the
> >> path which would do it. The only path which shuffles the siblings array
> >> around is in dequeue, and dequeue on an engine other than the one which
> >> first picked it will not happen any more. I must be missing something...
> > 
> > It's all in execlists_schedule_out. During timeslicing we call
> > unwind_incomplete_requests which moves the requests back to the priotree
> > (and in this patch back to the virtual engine).
> > 
> > But... We cannot use the virtual request on any other engine until it has
> > been scheduled out. That only happens after we complete execlists_dequeue()
> > and submit a new ELSP[]. Once the HW acks the change, we call
> > execlists_schedule_out on the virtual_request.
> > 
> > Now we know that intel_context_inflight() will return false so any
> > engine can pick up the request, and so it's time to kick the virtual
> > tasklet and in turn kick all the siblings.
> > 
> > So timeslicing works by not submitting the virtual request again and
> > when it expires on this sibling[0], we wake up all the other siblings
> > and the first that is idle wins the race.
> 
> If a virtual request is on hw and timeslice expires:
> 
> 1. Unwinds the request.
>        -> kicks the virtual tasklet
> 2. Virtual tasklet runs and puts the request back on siblings.
>        -> kicks the physical tasklets
> 3. Siblings tasklet runs and submits the request.
> 
> So two tasklets of latency even if there are no other runnable requests?

Yes. It's worse than that when you look at it. Since the other
execlists_submission_tasklets will run *before* we schedule out, they do
nothing as they cannot use the virtual request. So they will only
function when we kick them again after the schedule-out.
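
Roughly, as a timeline (illustrative, assuming sibling[0] holds the
virtual request when its timeslice expires):

	sibling[0]                          sibling[1..]
	----------                          ------------
	timeslice expires; unwind rq
	and kick the virtual tasklet
	                                    tasklets run, but the context
	                                    is still inflight on
	                                    sibling[0]: no-op
	submit new ELSP[]
	HW ack -> execlists_schedule_out()
	  -> kick_siblings()
	                                    tasklets run again; the first
	                                    idle sibling wins rq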

This was "bug" number 3 that I mentioned, which you found with your tiny snippet.
In the next patch we optimise away the ineffective timeslice preemptions
of the virtual request. The inter-engine inefficiencies are inherent to
the system though -- we can only schedule the other engines once we have
scheduled out after the context save of the virtual engine.
-Chris

* Re: [Intel-gfx] [PATCH 7/8] drm/i915/gt: Decouple inflight virtual engines
  2020-05-18 15:40         ` Chris Wilson
@ 2020-05-18 15:48           ` Chris Wilson
  0 siblings, 0 replies; 32+ messages in thread
From: Chris Wilson @ 2020-05-18 15:48 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Chris Wilson (2020-05-18 16:40:15)
> Quoting Tvrtko Ursulin (2020-05-18 15:55:46)
> > 
> > On 18/05/2020 14:00, Chris Wilson wrote:
> > > Quoting Tvrtko Ursulin (2020-05-18 13:53:29)
> > >>
> > >> On 18/05/2020 09:14, Chris Wilson wrote:
> > >>> Once a virtual engine has been bound to a sibling, it will remain bound
> > >>> until we finally schedule out the last active request. We cannot rebind
> > >>> the context to a new sibling while it is inflight as the context save
> > >>> will conflict, hence we wait. As we cannot then use any other sibling
> > >>> while the context is inflight, only kick the bound sibling while it is
> > >>> inflight and upon scheduling out kick the rest (so that we can swap
> > >>> engines on timeslicing if the previously bound engine becomes
> > >>> oversubscribed).
> > >>>
> > >>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > >>> ---
> > >>>    drivers/gpu/drm/i915/gt/intel_lrc.c | 30 +++++++++++++----------------
> > >>>    1 file changed, 13 insertions(+), 17 deletions(-)
> > >>>
> > >>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > >>> index 7a5ac3375225..fe8f3518d6b8 100644
> > >>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> > >>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > >>> @@ -1398,9 +1398,8 @@ execlists_schedule_in(struct i915_request *rq, int idx)
> > >>>    static void kick_siblings(struct i915_request *rq, struct intel_context *ce)
> > >>>    {
> > >>>        struct virtual_engine *ve = container_of(ce, typeof(*ve), context);
> > >>> -     struct i915_request *next = READ_ONCE(ve->request);
> > >>>    
> > >>> -     if (next == rq || (next && next->execution_mask & ~rq->execution_mask))
> > >>> +     if (READ_ONCE(ve->request))
> > >>>                tasklet_hi_schedule(&ve->base.execlists.tasklet);
> > >>>    }
> > >>>    
> > >>> @@ -1821,18 +1820,14 @@ first_virtual_engine(struct intel_engine_cs *engine)
> > >>>                        rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
> > >>>                struct i915_request *rq = READ_ONCE(ve->request);
> > >>>    
> > >>> -             if (!rq) { /* lazily cleanup after another engine handled rq */
> > >>> +             /* lazily cleanup after another engine handled rq */
> > >>> +             if (!rq || !virtual_matches(ve, rq, engine)) {
> > >>>                        rb_erase_cached(rb, &el->virtual);
> > >>>                        RB_CLEAR_NODE(rb);
> > >>>                        rb = rb_first_cached(&el->virtual);
> > >>>                        continue;
> > >>>                }
> > >>>    
> > >>> -             if (!virtual_matches(ve, rq, engine)) {
> > >>> -                     rb = rb_next(rb);
> > >>> -                     continue;
> > >>> -             }
> > >>> -
> > >>>                return ve;
> > >>>        }
> > >>>    
> > >>> @@ -5478,7 +5473,6 @@ static void virtual_submission_tasklet(unsigned long data)
> > >>>        if (unlikely(!mask))
> > >>>                return;
> > >>>    
> > >>> -     local_irq_disable();
> > >>>        for (n = 0; n < ve->num_siblings; n++) {
> > >>>                struct intel_engine_cs *sibling = READ_ONCE(ve->siblings[n]);
> > >>>                struct ve_node * const node = &ve->nodes[sibling->id];
> > >>> @@ -5488,20 +5482,19 @@ static void virtual_submission_tasklet(unsigned long data)
> > >>>                if (!READ_ONCE(ve->request))
> > >>>                        break; /* already handled by a sibling's tasklet */
> > >>>    
> > >>> +             spin_lock_irq(&sibling->active.lock);
> > >>> +
> > >>>                if (unlikely(!(mask & sibling->mask))) {
> > >>>                        if (!RB_EMPTY_NODE(&node->rb)) {
> > >>> -                             spin_lock(&sibling->active.lock);
> > >>>                                rb_erase_cached(&node->rb,
> > >>>                                                &sibling->execlists.virtual);
> > >>>                                RB_CLEAR_NODE(&node->rb);
> > >>> -                             spin_unlock(&sibling->active.lock);
> > >>>                        }
> > >>> -                     continue;
> > >>> -             }
> > >>>    
> > >>> -             spin_lock(&sibling->active.lock);
> > >>> +                     goto unlock_engine;
> > >>> +             }
> > >>>    
> > >>> -             if (!RB_EMPTY_NODE(&node->rb)) {
> > >>> +             if (unlikely(!RB_EMPTY_NODE(&node->rb))) {
> > >>>                        /*
> > >>>                         * Cheat and avoid rebalancing the tree if we can
> > >>>                         * reuse this node in situ.
> > >>> @@ -5541,9 +5534,12 @@ static void virtual_submission_tasklet(unsigned long data)
> > >>>                if (first && prio >= sibling->execlists.queue_priority_hint)
> > >>>                        tasklet_hi_schedule(&sibling->execlists.tasklet);
> > >>>    
> > >>> -             spin_unlock(&sibling->active.lock);
> > >>> +unlock_engine:
> > >>> +             spin_unlock_irq(&sibling->active.lock);
> > >>> +
> > >>> +             if (intel_context_inflight(&ve->context))
> > >>> +                     break;
> > >>
> > >> So the virtual request may not be added to all siblings any longer. Will it
> > >> still be possible to schedule it on any of them if time slicing kicks in
> > >> under these conditions?
> > > 
> > > Yes.
> > >   
> > >> This is equivalent to the hunk in first_virtual_engine which also
> > >> removes it from all other siblings.
> > >>
> > >> I guess it's in line with what the commit message says - that a new
> > >> sibling will be picked upon time slicing. I just don't quite see the
> > >> path which would do it. The only path which shuffles the siblings array
> > >> around is in dequeue, and dequeue on an engine other than the one which
> > >> first picked it will not happen any more. I must be missing something...
> > > 
> > > It's all in execlists_schedule_out. During timeslicing we call
> > > unwind_incomplete_requests which moves the requests back to the priotree
> > > (and in this patch back to the virtual engine).
> > > 
> > > But... We cannot use the virtual request on any other engine until it has
> > > been scheduled out. That only happens after we complete execlists_dequeue()
> > > and submit a new ELSP[]. Once the HW acks the change, we call
> > > execlists_schedule_out on the virtual_request.
> > > 
> > > Now we know that intel_context_inflight() will return false so any
> > > engine can pick up the request, and so it's time to kick the virtual
> > > tasklet and in turn kick all the siblings.
> > > 
> > > So timeslicing works by not submitting the virtual request again and
> > > when it expires on this sibling[0], we wake up all the other siblings
> > > and the first that is idle wins the race.
> > 
> > If a virtual request is on hw and timeslice expires:
> > 
> > 1. Unwinds the request.
> >        -> kicks the virtual tasklet
> > 2. Virtual tasklet runs and puts the request back on siblings.
> >        -> kicks the physical tasklets
> > 3. Siblings tasklet runs and submits the request.
> > 
> > So two tasklets of latency even if there are no other runnable requests?
> 
> Yes. It's worse than that when you look at it. Since the other
> execlists_submission_tasklets will run *before* we schedule out, they do
> nothing as they cannot use the virtual request. So they will only
> function when we kick them again after the schedule-out.
> 
> This was "bug" number 3 that I mentioned, which you found with your tiny snippet.
> In the next patch we optimise away the ineffective timeslice preemptions
> of the virtual request. The inter-engine inefficiencies are inherent to
> the system though -- we can only schedule the other engines once we have
> scheduled out after the context save of the virtual engine.

It's worth mentioning that while glaringly inefficient, the rescheduling
is vital for fast virtual engine migration. Some of my attempts to avoid
the extra kicks ended up skipping the rescheduling (e.g. keeping a pair
of virtual engines in a single ELSP and timeslicing between that pair)
and that leads to poor wsim results.
-Chris

* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/8] drm/i915: Move saturated workload detection back to the context (rev2)
  2020-05-18  8:14 [Intel-gfx] [PATCH 1/8] drm/i915: Move saturated workload detection back to the context Chris Wilson
                   ` (10 preceding siblings ...)
  2020-05-18 12:17 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
@ 2020-05-18 15:55 ` Patchwork
  2020-05-18 15:57 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
  2020-05-18 16:28 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
  13 siblings, 0 replies; 32+ messages in thread
From: Patchwork @ 2020-05-18 15:55 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/8] drm/i915: Move saturated workload detection back to the context (rev2)
URL   : https://patchwork.freedesktop.org/series/77344/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
ebeb414a1e8b drm/i915: Move saturated workload detection back to the context
-:22: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#22: 
References: 44d89409a12e ("drm/i915: Make the semaphore saturation mask global")

-:22: ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 44d89409a12e ("drm/i915: Make the semaphore saturation mask global")'
#22: 
References: 44d89409a12e ("drm/i915: Make the semaphore saturation mask global")

total: 1 errors, 1 warnings, 0 checks, 68 lines checked
c342f572622c drm/i915/selftests: Add tests for timeslicing virtual engines
e52d02f21091 drm/i915/gt: Reuse the tasklet priority for virtual as their siblings
5738cd8b2a42 drm/i915/gt: Kick virtual siblings on timeslice out
003a59b1d9d0 drm/i915/gt: Incorporate the virtual engine into timeslicing
d9417a9630ea drm/i915/gt: Use virtual_engine during execlists_dequeue
040e56db9cd6 drm/i915/gt: Decouple inflight virtual engines
cb577ac5632e drm/i915/gt: Resubmit the virtual engine on schedule-out


* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for series starting with [1/8] drm/i915: Move saturated workload detection back to the context (rev2)
  2020-05-18  8:14 [Intel-gfx] [PATCH 1/8] drm/i915: Move saturated workload detection back to the context Chris Wilson
                   ` (11 preceding siblings ...)
  2020-05-18 15:55 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/8] drm/i915: Move saturated workload detection back to the context (rev2) Patchwork
@ 2020-05-18 15:57 ` Patchwork
  2020-05-18 16:28 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
  13 siblings, 0 replies; 32+ messages in thread
From: Patchwork @ 2020-05-18 15:57 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/8] drm/i915: Move saturated workload detection back to the context (rev2)
URL   : https://patchwork.freedesktop.org/series/77344/
State : warning

== Summary ==

$ dim sparse --fast origin/drm-tip
Sparse version: v0.6.0
Fast mode used, each commit won't be checked separately.
-
+drivers/gpu/drm/i915/display/intel_display.c:1222:22: error: Expected constant expression in case statement
+drivers/gpu/drm/i915/display/intel_display.c:1225:22: error: Expected constant expression in case statement
+drivers/gpu/drm/i915/display/intel_display.c:1228:22: error: Expected constant expression in case statement
+drivers/gpu/drm/i915/display/intel_display.c:1231:22: error: Expected constant expression in case statement
+drivers/gpu/drm/i915/gem/i915_gem_context.c:2274:17: error: bad integer constant expression
+drivers/gpu/drm/i915/gem/i915_gem_context.c:2275:17: error: bad integer constant expression
+drivers/gpu/drm/i915/gem/i915_gem_context.c:2276:17: error: bad integer constant expression
+drivers/gpu/drm/i915/gem/i915_gem_context.c:2277:17: error: bad integer constant expression
+drivers/gpu/drm/i915/gem/i915_gem_context.c:2278:17: error: bad integer constant expression
+drivers/gpu/drm/i915/gem/i915_gem_context.c:2279:17: error: bad integer constant expression
+drivers/gpu/drm/i915/gt/intel_reset.c:1310:5: warning: context imbalance in 'intel_gt_reset_trylock' - different lock contexts for basic block
+drivers/gpu/drm/i915/gt/sysfs_engines.c:61:10: error: bad integer constant expression
+drivers/gpu/drm/i915/gt/sysfs_engines.c:62:10: error: bad integer constant expression
+drivers/gpu/drm/i915/gt/sysfs_engines.c:66:10: error: bad integer constant expression
+drivers/gpu/drm/i915/gvt/mmio.c:287:23: warning: memcpy with byte count of 279040
+drivers/gpu/drm/i915/i915_perf.c:1425:15: warning: memset with byte count of 16777216
+drivers/gpu/drm/i915/i915_perf.c:1479:15: warning: memset with byte count of 16777216
+./include/linux/compiler.h:199:9: warning: context imbalance in 'engines_sample' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'fwtable_read16' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'fwtable_read32' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'fwtable_read64' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'fwtable_read8' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'fwtable_write16' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'fwtable_write32' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'fwtable_write8' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen11_fwtable_read16' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen11_fwtable_read32' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen11_fwtable_read64' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen11_fwtable_read8' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen11_fwtable_write16' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen11_fwtable_write32' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen11_fwtable_write8' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen12_fwtable_read16' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen12_fwtable_read32' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen12_fwtable_read64' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen12_fwtable_read8' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen12_fwtable_write16' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen12_fwtable_write32' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen12_fwtable_write8' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen6_read16' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen6_read32' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen6_read64' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen6_read8' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen6_write16' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen6_write32' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen6_write8' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen8_write16' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen8_write32' - different lock contexts for basic block
+./include/linux/spinlock.h:408:9: warning: context imbalance in 'gen8_write8' - different lock contexts for basic block


* [Intel-gfx] ✗ Fi.CI.BAT: failure for series starting with [1/8] drm/i915: Move saturated workload detection back to the context (rev2)
  2020-05-18  8:14 [Intel-gfx] [PATCH 1/8] drm/i915: Move saturated workload detection back to the context Chris Wilson
                   ` (12 preceding siblings ...)
  2020-05-18 15:57 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
@ 2020-05-18 16:28 ` Patchwork
  13 siblings, 0 replies; 32+ messages in thread
From: Patchwork @ 2020-05-18 16:28 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [1/8] drm/i915: Move saturated workload detection back to the context (rev2)
URL   : https://patchwork.freedesktop.org/series/77344/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_8494 -> Patchwork_17693
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_17693 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_17693, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17693/index.html

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_17693:

### IGT changes ###

#### Possible regressions ####

  * igt@gem_exec_gttfill@basic:
    - fi-apl-guc:         [PASS][1] -> [INCOMPLETE][2]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8494/fi-apl-guc/igt@gem_exec_gttfill@basic.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17693/fi-apl-guc/igt@gem_exec_gttfill@basic.html

  * igt@i915_selftest@live@execlists:
    - fi-cfl-8109u:       [PASS][3] -> [INCOMPLETE][4]
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8494/fi-cfl-8109u/igt@i915_selftest@live@execlists.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17693/fi-cfl-8109u/igt@i915_selftest@live@execlists.html
    - fi-icl-u2:          [PASS][5] -> [INCOMPLETE][6]
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8494/fi-icl-u2/igt@i915_selftest@live@execlists.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17693/fi-icl-u2/igt@i915_selftest@live@execlists.html
    - fi-tgl-y:           [PASS][7] -> [INCOMPLETE][8]
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8494/fi-tgl-y/igt@i915_selftest@live@execlists.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17693/fi-tgl-y/igt@i915_selftest@live@execlists.html
    - fi-icl-y:           [PASS][9] -> [INCOMPLETE][10]
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8494/fi-icl-y/igt@i915_selftest@live@execlists.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17693/fi-icl-y/igt@i915_selftest@live@execlists.html
    - fi-icl-guc:         [PASS][11] -> [INCOMPLETE][12]
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8494/fi-icl-guc/igt@i915_selftest@live@execlists.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17693/fi-icl-guc/igt@i915_selftest@live@execlists.html

  
#### Suppressed ####

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * igt@i915_selftest@live@execlists:
    - {fi-tgl-u}:         [PASS][13] -> [INCOMPLETE][14]
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8494/fi-tgl-u/igt@i915_selftest@live@execlists.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17693/fi-tgl-u/igt@i915_selftest@live@execlists.html
    - {fi-tgl-dsi}:       [PASS][15] -> [INCOMPLETE][16]
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8494/fi-tgl-dsi/igt@i915_selftest@live@execlists.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17693/fi-tgl-dsi/igt@i915_selftest@live@execlists.html

  
Known issues
------------

  Here are the changes found in Patchwork_17693 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@i915_selftest@live@client:
    - fi-bwr-2160:        [PASS][17] -> [INCOMPLETE][18] ([i915#489])
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8494/fi-bwr-2160/igt@i915_selftest@live@client.html
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17693/fi-bwr-2160/igt@i915_selftest@live@client.html

  * igt@i915_selftest@live@execlists:
    - fi-skl-lmem:        [PASS][19] -> [INCOMPLETE][20] ([i915#1795])
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8494/fi-skl-lmem/igt@i915_selftest@live@execlists.html
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17693/fi-skl-lmem/igt@i915_selftest@live@execlists.html

  
#### Possible fixes ####

  * igt@i915_selftest@live@execlists:
    - fi-whl-u:           [INCOMPLETE][21] ([i915#656]) -> [PASS][22]
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8494/fi-whl-u/igt@i915_selftest@live@execlists.html
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17693/fi-whl-u/igt@i915_selftest@live@execlists.html

  
#### Warnings ####

  * igt@i915_pm_rpm@module-reload:
    - fi-kbl-x1275:       [SKIP][23] ([fdo#109271]) -> [FAIL][24] ([i915#62] / [i915#95])
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_8494/fi-kbl-x1275/igt@i915_pm_rpm@module-reload.html
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17693/fi-kbl-x1275/igt@i915_pm_rpm@module-reload.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [i915#1795]: https://gitlab.freedesktop.org/drm/intel/issues/1795
  [i915#489]: https://gitlab.freedesktop.org/drm/intel/issues/489
  [i915#62]: https://gitlab.freedesktop.org/drm/intel/issues/62
  [i915#656]: https://gitlab.freedesktop.org/drm/intel/issues/656
  [i915#95]: https://gitlab.freedesktop.org/drm/intel/issues/95


Participating hosts (51 -> 44)
------------------------------

  Missing    (7): fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-bsw-cyan fi-ctg-p8600 fi-byt-clapper fi-bdw-samus 


Build changes
-------------

  * Linux: CI_DRM_8494 -> Patchwork_17693

  CI-20190529: 20190529
  CI_DRM_8494: 3d15348fde9b998e754da0b0655baf02b98e7f17 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5657: 649eae5c905a7460b44305800f95db83a6dd47cb @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_17693: cb577ac5632e0f911fc50ec127facad46debf396 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

cb577ac5632e drm/i915/gt: Resubmit the virtual engine on schedule-out
040e56db9cd6 drm/i915/gt: Decouple inflight virtual engines
d9417a9630ea drm/i915/gt: Use virtual_engine during execlists_dequeue
003a59b1d9d0 drm/i915/gt: Incorporate the virtual engine into timeslicing
5738cd8b2a42 drm/i915/gt: Kick virtual siblings on timeslice out
e52d02f21091 drm/i915/gt: Reuse the tasklet priority for virtual as their siblings
c342f572622c drm/i915/selftests: Add tests for timeslicing virtual engines
ebeb414a1e8b drm/i915: Move saturated workload detection back to the context
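
To inspect the exact trees CI tested, the two revisions listed under
"Build changes" can be checked out from the gfx-ci repository named
above; a sketch, assuming both commits are still reachable on that
remote:

  $ git clone git://anongit.freedesktop.org/gfx-ci/linux gfx-ci-linux
  $ cd gfx-ci-linux
  # Patchwork_17693: the baseline with this series applied
  $ git checkout cb577ac5632e
  # CI_DRM_8494: the baseline alone, for comparison
  $ git checkout 3d15348fde9b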

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_17693/index.html

Thread overview: 32+ messages
2020-05-18  8:14 [Intel-gfx] [PATCH 1/8] drm/i915: Move saturated workload detection back to the context Chris Wilson
2020-05-18  8:14 ` [Intel-gfx] [PATCH 2/8] drm/i915/selftests: Add tests for timeslicing virtual engines Chris Wilson
2020-05-18 10:12   ` Tvrtko Ursulin
2020-05-18 10:21     ` Chris Wilson
2020-05-18  8:14 ` [Intel-gfx] [PATCH 3/8] drm/i915/gt: Reuse the tasklet priority for virtual as their siblings Chris Wilson
2020-05-18 10:13   ` Tvrtko Ursulin
2020-05-18  8:14 ` [Intel-gfx] [PATCH 4/8] drm/i915/gt: Kick virtual siblings on timeslice out Chris Wilson
2020-05-18 10:29   ` Tvrtko Ursulin
2020-05-18  8:14 ` [Intel-gfx] [PATCH 5/8] drm/i915/gt: Incorporate the virtual engine into timeslicing Chris Wilson
2020-05-18 10:36   ` Tvrtko Ursulin
2020-05-18 10:38     ` Chris Wilson
2020-05-18  8:14 ` [Intel-gfx] [PATCH 6/8] drm/i915/gt: Use virtual_engine during execlists_dequeue Chris Wilson
2020-05-18 10:51   ` Tvrtko Ursulin
2020-05-18 10:57     ` Chris Wilson
2020-05-18 12:33   ` [Intel-gfx] [PATCH v2] " Chris Wilson
2020-05-18 13:01     ` Tvrtko Ursulin
2020-05-18 13:09       ` Chris Wilson
2020-05-18  8:14 ` [Intel-gfx] [PATCH 7/8] drm/i915/gt: Decouple inflight virtual engines Chris Wilson
2020-05-18 12:53   ` Tvrtko Ursulin
2020-05-18 13:00     ` Chris Wilson
2020-05-18 14:55       ` Tvrtko Ursulin
2020-05-18 15:40         ` Chris Wilson
2020-05-18 15:48           ` Chris Wilson
2020-05-18  8:14 ` [Intel-gfx] [PATCH 8/8] drm/i915/gt: Resubmit the virtual engine on schedule-out Chris Wilson
2020-05-18  9:53 ` [Intel-gfx] [PATCH 1/8] drm/i915: Move saturated workload detection back to the context Tvrtko Ursulin
2020-05-18 10:11   ` Chris Wilson
2020-05-18 11:55 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/8] " Patchwork
2020-05-18 11:56 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2020-05-18 12:17 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2020-05-18 15:55 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [1/8] drm/i915: Move saturated workload detection back to the context (rev2) Patchwork
2020-05-18 15:57 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2020-05-18 16:28 ` [Intel-gfx] ✗ Fi.CI.BAT: failure " Patchwork
