* [Intel-gfx] [PATCH 01/28] drm/i915/selftests: Improve granularity for mocs reset checks
From: Chris Wilson @ 2020-11-17 11:30 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Allow us to validate the mocs configuration after reset using whichever
reset method is available, whether engine reset, global reset, or both.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/selftest_mocs.c | 40 +++++++++++++------------
 1 file changed, 21 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/selftest_mocs.c b/drivers/gpu/drm/i915/gt/selftest_mocs.c
index b25eba50c88e..21dcd91cbd62 100644
--- a/drivers/gpu/drm/i915/gt/selftest_mocs.c
+++ b/drivers/gpu/drm/i915/gt/selftest_mocs.c
@@ -361,29 +361,34 @@ static int active_engine_reset(struct intel_context *ce,
 static int __live_mocs_reset(struct live_mocs *mocs,
 			     struct intel_context *ce)
 {
+	struct intel_gt *gt = ce->engine->gt;
 	int err;
 
-	err = intel_engine_reset(ce->engine, "mocs");
-	if (err)
-		return err;
+	if (intel_has_reset_engine(gt)) {
+		err = intel_engine_reset(ce->engine, "mocs");
+		if (err)
+			return err;
 
-	err = check_mocs_engine(mocs, ce);
-	if (err)
-		return err;
+		err = check_mocs_engine(mocs, ce);
+		if (err)
+			return err;
 
-	err = active_engine_reset(ce, "mocs");
-	if (err)
-		return err;
+		err = active_engine_reset(ce, "mocs");
+		if (err)
+			return err;
 
-	err = check_mocs_engine(mocs, ce);
-	if (err)
-		return err;
+		err = check_mocs_engine(mocs, ce);
+		if (err)
+			return err;
+	}
 
-	intel_gt_reset(ce->engine->gt, ce->engine->mask, "mocs");
+	if (intel_has_gpu_reset(gt)) {
+		intel_gt_reset(gt, ce->engine->mask, "mocs");
 
-	err = check_mocs_engine(mocs, ce);
-	if (err)
-		return err;
+		err = check_mocs_engine(mocs, ce);
+		if (err)
+			return err;
+	}
 
 	return 0;
 }
@@ -398,9 +403,6 @@ static int live_mocs_reset(void *arg)
 
 	/* Check the mocs setup is retained over per-engine and global resets */
 
-	if (!intel_has_reset_engine(gt))
-		return 0;
-
 	err = live_mocs_init(&mocs, gt);
 	if (err)
 		return err;
-- 
2.20.1


* [Intel-gfx] [PATCH 02/28] drm/i915/selftests: Small tweak to put the termination conditions together
From: Chris Wilson @ 2020-11-17 11:30 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

If we run out of ring space, or exceed the desired runtime, we wish to
stop the subtest. Put these checks together, so that we always keep the
requests flushed on completion.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/selftest_timeline.c | 12 +++++-------
 1 file changed, 5 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/selftest_timeline.c b/drivers/gpu/drm/i915/gt/selftest_timeline.c
index 2edf2b15885f..ef7e3ce8c60c 100644
--- a/drivers/gpu/drm/i915/gt/selftest_timeline.c
+++ b/drivers/gpu/drm/i915/gt/selftest_timeline.c
@@ -1090,12 +1090,6 @@ static int live_hwsp_read(void *arg)
 			}
 			count++;
 
-			if (8 * watcher[1].rq->ring->emit >
-			    3 * watcher[1].rq->ring->size) {
-				i915_request_put(rq);
-				break;
-			}
-
 			/* Flush the timeline before manually wrapping again */
 			if (i915_request_wait(rq,
 					      I915_WAIT_INTERRUPTIBLE,
@@ -1104,9 +1098,13 @@ static int live_hwsp_read(void *arg)
 				i915_request_put(rq);
 				goto out;
 			}
-
 			retire_requests(tl);
 			i915_request_put(rq);
+
+			if (8 * watcher[1].rq->ring->emit >
+			    3 * watcher[1].rq->ring->size)
+				break;
+
 		} while (!__igt_timeout(end_time, NULL));
 		WRITE_ONCE(*(u32 *)tl->hwsp_seqno, 0xdeadbeef);
 
-- 
2.20.1


* [Intel-gfx] [PATCH 03/28] drm/i915/gem: Drop free_work for GEM contexts
From: Chris Wilson @ 2020-11-17 11:30 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

The free_list and worker were introduced in commit 5f09a9c8ab6b ("drm/i915:
Allow contexts to be unreferenced locklessly"), but subsequently made
redundant by the removal of the last sleeping lock in commit 2935ed5339c4
("drm/i915: Remove logical HW ID"). As we can now free the GEM context
immediately from any context, remove the deferral of the free_list.

v2: Lift removing the context from the global list into close().
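
For reference, a minimal sketch of the deferred-free pattern being removed
(simplified, with hypothetical struct names; the real code is in the diff
below). Contexts were pushed onto a lockless list and freed later from
process context, back when the final free required sleeping locks:

	static void free_worker(struct work_struct *work)
	{
		struct contexts *gc = container_of(work, struct contexts, free_work);
		struct ctx *c, *cn;

		/* Claim the whole batch atomically, then free at leisure */
		llist_for_each_entry_safe(c, cn,
					  llist_del_all(&gc->free_list),
					  free_link)
			kfree(c);
	}

	static void ctx_release(struct contexts *gc, struct ctx *c)
	{
		/* llist_add() returns true only if the list was empty,
		 * so the worker is scheduled once per batch.
		 */
		if (llist_add(&c->free_link, &gc->free_list))
			schedule_work(&gc->free_work);
	}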

Suggested-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gem/i915_gem_context.c   | 59 +++----------------
 drivers/gpu/drm/i915/gem/i915_gem_context.h   |  1 -
 .../gpu/drm/i915/gem/i915_gem_context_types.h |  1 -
 drivers/gpu/drm/i915/i915_drv.h               |  3 -
 drivers/gpu/drm/i915/i915_gem.c               |  2 -
 .../gpu/drm/i915/selftests/mock_gem_device.c  |  2 -
 6 files changed, 8 insertions(+), 60 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.c b/drivers/gpu/drm/i915/gem/i915_gem_context.c
index 4fd38101bb56..a0819c80e78f 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.c
@@ -333,13 +333,12 @@ static struct i915_gem_engines *default_engines(struct i915_gem_context *ctx)
 	return e;
 }
 
-static void i915_gem_context_free(struct i915_gem_context *ctx)
+void i915_gem_context_release(struct kref *ref)
 {
-	GEM_BUG_ON(!i915_gem_context_is_closed(ctx));
+	struct i915_gem_context *ctx = container_of(ref, typeof(*ctx), ref);
 
-	spin_lock(&ctx->i915->gem.contexts.lock);
-	list_del(&ctx->link);
-	spin_unlock(&ctx->i915->gem.contexts.lock);
+	trace_i915_context_free(ctx);
+	GEM_BUG_ON(!i915_gem_context_is_closed(ctx));
 
 	mutex_destroy(&ctx->engines_mutex);
 	mutex_destroy(&ctx->lut_mutex);
@@ -353,37 +352,6 @@ static void i915_gem_context_free(struct i915_gem_context *ctx)
 	kfree_rcu(ctx, rcu);
 }
 
-static void contexts_free_all(struct llist_node *list)
-{
-	struct i915_gem_context *ctx, *cn;
-
-	llist_for_each_entry_safe(ctx, cn, list, free_link)
-		i915_gem_context_free(ctx);
-}
-
-static void contexts_flush_free(struct i915_gem_contexts *gc)
-{
-	contexts_free_all(llist_del_all(&gc->free_list));
-}
-
-static void contexts_free_worker(struct work_struct *work)
-{
-	struct i915_gem_contexts *gc =
-		container_of(work, typeof(*gc), free_work);
-
-	contexts_flush_free(gc);
-}
-
-void i915_gem_context_release(struct kref *ref)
-{
-	struct i915_gem_context *ctx = container_of(ref, typeof(*ctx), ref);
-	struct i915_gem_contexts *gc = &ctx->i915->gem.contexts;
-
-	trace_i915_context_free(ctx);
-	if (llist_add(&ctx->free_link, &gc->free_list))
-		schedule_work(&gc->free_work);
-}
-
 static inline struct i915_gem_engines *
 __context_engines_static(const struct i915_gem_context *ctx)
 {
@@ -632,6 +600,10 @@ static void context_close(struct i915_gem_context *ctx)
 	 */
 	lut_close(ctx);
 
+	spin_lock(&ctx->i915->gem.contexts.lock);
+	list_del(&ctx->link);
+	spin_unlock(&ctx->i915->gem.contexts.lock);
+
 	mutex_unlock(&ctx->mutex);
 
 	/*
@@ -849,9 +821,6 @@ i915_gem_create_context(struct drm_i915_private *i915, unsigned int flags)
 	    !HAS_EXECLISTS(i915))
 		return ERR_PTR(-EINVAL);
 
-	/* Reap the stale contexts */
-	contexts_flush_free(&i915->gem.contexts);
-
 	ctx = __create_context(i915);
 	if (IS_ERR(ctx))
 		return ctx;
@@ -896,9 +865,6 @@ static void init_contexts(struct i915_gem_contexts *gc)
 {
 	spin_lock_init(&gc->lock);
 	INIT_LIST_HEAD(&gc->list);
-
-	INIT_WORK(&gc->free_work, contexts_free_worker);
-	init_llist_head(&gc->free_list);
 }
 
 void i915_gem_init__contexts(struct drm_i915_private *i915)
@@ -909,12 +875,6 @@ void i915_gem_init__contexts(struct drm_i915_private *i915)
 		"logical" : "fake");
 }
 
-void i915_gem_driver_release__contexts(struct drm_i915_private *i915)
-{
-	flush_work(&i915->gem.contexts.free_work);
-	rcu_barrier(); /* and flush the left over RCU frees */
-}
-
 static int gem_context_register(struct i915_gem_context *ctx,
 				struct drm_i915_file_private *fpriv,
 				u32 *id)
@@ -988,7 +948,6 @@ int i915_gem_context_open(struct drm_i915_private *i915,
 void i915_gem_context_close(struct drm_file *file)
 {
 	struct drm_i915_file_private *file_priv = file->driver_priv;
-	struct drm_i915_private *i915 = file_priv->dev_priv;
 	struct i915_address_space *vm;
 	struct i915_gem_context *ctx;
 	unsigned long idx;
@@ -1000,8 +959,6 @@ void i915_gem_context_close(struct drm_file *file)
 	xa_for_each(&file_priv->vm_xa, idx, vm)
 		i915_vm_put(vm);
 	xa_destroy(&file_priv->vm_xa);
-
-	contexts_flush_free(&i915->gem.contexts);
 }
 
 int i915_gem_vm_create_ioctl(struct drm_device *dev, void *data,
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context.h b/drivers/gpu/drm/i915/gem/i915_gem_context.h
index a133f92bbedb..b5c908f3f4f2 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context.h
@@ -110,7 +110,6 @@ i915_gem_context_clear_user_engines(struct i915_gem_context *ctx)
 
 /* i915_gem_context.c */
 void i915_gem_init__contexts(struct drm_i915_private *i915);
-void i915_gem_driver_release__contexts(struct drm_i915_private *i915);
 
 int i915_gem_context_open(struct drm_i915_private *i915,
 			  struct drm_file *file);
diff --git a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
index ae14ca24a11f..1449f54924e0 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
+++ b/drivers/gpu/drm/i915/gem/i915_gem_context_types.h
@@ -108,7 +108,6 @@ struct i915_gem_context {
 
 	/** link: place with &drm_i915_private.context_list */
 	struct list_head link;
-	struct llist_node free_link;
 
 	/**
 	 * @ref: reference count
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 15be8debae54..1a2a1276dd31 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1172,9 +1172,6 @@ struct drm_i915_private {
 		struct i915_gem_contexts {
 			spinlock_t lock; /* locks list */
 			struct list_head list;
-
-			struct llist_head free_list;
-			struct work_struct free_work;
 		} contexts;
 
 		/*
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 58276694c848..17a4636ee542 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1207,8 +1207,6 @@ void i915_gem_driver_remove(struct drm_i915_private *dev_priv)
 
 void i915_gem_driver_release(struct drm_i915_private *dev_priv)
 {
-	i915_gem_driver_release__contexts(dev_priv);
-
 	intel_gt_driver_release(&dev_priv->gt);
 
 	intel_wa_list_free(&dev_priv->gt_wa_list);
diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
index e946bd2087d8..0188f877cab2 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
@@ -64,8 +64,6 @@ static void mock_device_release(struct drm_device *dev)
 	mock_device_flush(i915);
 	intel_gt_driver_remove(&i915->gt);
 
-	i915_gem_driver_release__contexts(i915);
-
 	i915_gem_drain_workqueue(i915);
 	i915_gem_drain_freed_objects(i915);
 
-- 
2.20.1


* [Intel-gfx] [PATCH 04/28] drm/i915/gt: Ignore dt==0 for reporting underflows
From: Chris Wilson @ 2020-11-17 11:30 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

The presumption was that some time would always elapse between recording
the start and the finish of a context switch. This turns out not to hold:
a zero-length delta (dt == 0) is a regular occurrence, and emitting a
debug statement for it is superfluous.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 8a51c1c3a091..52b84474f93a 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1307,7 +1307,7 @@ static void reset_active(struct i915_request *rq,
 static void st_update_runtime_underflow(struct intel_context *ce, s32 dt)
 {
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
-	ce->runtime.num_underflow += dt < 0;
+	ce->runtime.num_underflow++;
 	ce->runtime.max_underflow = max_t(u32, ce->runtime.max_underflow, -dt);
 #endif
 }
@@ -1324,7 +1324,7 @@ static void intel_context_update_runtime(struct intel_context *ce)
 	ce->runtime.last = intel_context_get_runtime(ce);
 	dt = ce->runtime.last - old;
 
-	if (unlikely(dt <= 0)) {
+	if (unlikely(dt < 0)) {
 		CE_TRACE(ce, "runtime underflow: last=%u, new=%u, delta=%d\n",
 			 old, ce->runtime.last, dt);
 		st_update_runtime_underflow(ce, dt);
-- 
2.20.1


* [Intel-gfx] [PATCH 05/28] drm/i915/gt: Track the overall busy time
From: Chris Wilson @ 2020-11-17 11:30 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Since we wake the GT up before executing a request, and go to sleep as
soon as it is retired, the GT wake time not only represents how long the
device is powered up, but also provides a summary, albeit an overestimate,
of the device runtime.
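
As a usage illustration, a hypothetical userspace sketch that samples the
new counter through the i915 PMU. The sysfs path, minimal error handling
and header availability are assumptions; I915_PMU_BUSY_TIME is the define
added to the uapi header below:

	#include <linux/perf_event.h>
	#include <stdint.h>
	#include <stdio.h>
	#include <sys/syscall.h>
	#include <unistd.h>
	#include <drm/i915_drm.h>

	int main(void)
	{
		struct perf_event_attr attr = {};
		unsigned long type = 0;
		uint64_t busy_ns;
		FILE *f;
		int fd;

		/* The dynamic PMU type is exported by the kernel in sysfs */
		f = fopen("/sys/bus/event_source/devices/i915/type", "r");
		if (!f || fscanf(f, "%lu", &type) != 1)
			return 1;
		fclose(f);

		attr.size = sizeof(attr);
		attr.type = type;
		attr.config = I915_PMU_BUSY_TIME;

		/* i915 is an uncore PMU: pid == -1, cpu == 0 */
		fd = syscall(__NR_perf_event_open, &attr, -1, 0, -1, 0);
		if (fd < 0 || read(fd, &busy_ns, sizeof(busy_ns)) != sizeof(busy_ns))
			return 1;

		printf("GT busy: %llu ns\n", (unsigned long long)busy_ns);
		return 0;
	}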

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_gt_pm.c    | 45 ++++++++++++++++++++++++
 drivers/gpu/drm/i915/gt/intel_gt_pm.h    |  2 ++
 drivers/gpu/drm/i915/gt/intel_gt_types.h | 24 +++++++++++++
 drivers/gpu/drm/i915/i915_debugfs.c      |  2 ++
 drivers/gpu/drm/i915/i915_pmu.c          |  6 ++++
 include/uapi/drm/i915_drm.h              |  3 +-
 6 files changed, 81 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
index 274aa0dd7050..dd2f88bed65a 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
@@ -39,6 +39,24 @@ static void user_forcewake(struct intel_gt *gt, bool suspend)
 	intel_gt_pm_put(gt);
 }
 
+static void runtime_begin(struct intel_gt *gt)
+{
+	write_seqcount_begin(&gt->stats.lock);
+	gt->stats.start = ktime_get();
+	gt->stats.active = true;
+	write_seqcount_end(&gt->stats.lock);
+}
+
+static void runtime_end(struct intel_gt *gt)
+{
+	write_seqcount_begin(&gt->stats.lock);
+	gt->stats.active = false;
+	gt->stats.total =
+		ktime_add(gt->stats.total,
+			  ktime_sub(ktime_get(), gt->stats.start));
+	write_seqcount_end(&gt->stats.lock);
+}
+
 static int __gt_unpark(struct intel_wakeref *wf)
 {
 	struct intel_gt *gt = container_of(wf, typeof(*gt), wakeref);
@@ -67,6 +85,7 @@ static int __gt_unpark(struct intel_wakeref *wf)
 	i915_pmu_gt_unparked(i915);
 
 	intel_gt_unpark_requests(gt);
+	runtime_begin(gt);
 
 	return 0;
 }
@@ -79,6 +98,7 @@ static int __gt_park(struct intel_wakeref *wf)
 
 	GT_TRACE(gt, "\n");
 
+	runtime_end(gt);
 	intel_gt_park_requests(gt);
 
 	i915_vma_parked(gt);
@@ -106,6 +126,7 @@ static const struct intel_wakeref_ops wf_ops = {
 void intel_gt_pm_init_early(struct intel_gt *gt)
 {
 	intel_wakeref_init(&gt->wakeref, gt->uncore->rpm, &wf_ops);
+	seqcount_mutex_init(&gt->stats.lock, &gt->wakeref.mutex);
 }
 
 void intel_gt_pm_init(struct intel_gt *gt)
@@ -339,6 +360,30 @@ int intel_gt_runtime_resume(struct intel_gt *gt)
 	return intel_uc_runtime_resume(&gt->uc);
 }
 
+static ktime_t __intel_gt_get_busy_time(const struct intel_gt *gt)
+{
+	ktime_t total = gt->stats.total;
+
+	if (gt->stats.active)
+		total = ktime_add(total,
+				  ktime_sub(ktime_get(), gt->stats.start));
+
+	return total;
+}
+
+ktime_t intel_gt_get_busy_time(const struct intel_gt *gt)
+{
+	unsigned int seq;
+	ktime_t total;
+
+	do {
+		seq = read_seqcount_begin(&gt->stats.lock);
+		total = __intel_gt_get_busy_time(gt);
+	} while (read_seqcount_retry(&gt->stats.lock, seq));
+
+	return total;
+}
+
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftest_gt_pm.c"
 #endif
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.h b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
index 60f0e2fbe55c..aa8f2cda946b 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_pm.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
@@ -58,6 +58,8 @@ int intel_gt_resume(struct intel_gt *gt);
 void intel_gt_runtime_suspend(struct intel_gt *gt);
 int intel_gt_runtime_resume(struct intel_gt *gt);
 
+ktime_t intel_gt_get_busy_time(const struct intel_gt *gt);
+
 static inline bool is_mock_gt(const struct intel_gt *gt)
 {
 	return I915_SELFTEST_ONLY(gt->awake == -ENODEV);
diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
index 6d39a4a11bf3..c7bde529feab 100644
--- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
@@ -87,6 +87,30 @@ struct intel_gt {
 
 	u32 pm_guc_events;
 
+	struct {
+		bool active;
+
+		/**
+		 * @lock: Lock protecting the below fields.
+		 */
+		seqcount_mutex_t lock;
+
+		/**
+		 * @total: Total time this engine was busy.
+		 *
+		 * Accumulated time not counting the most recent block in cases
+		 * where engine is currently busy (active > 0).
+		 */
+		ktime_t total;
+
+		/**
+		 * @start: Timestamp of the last idle to active transition.
+		 *
+		 * Idle is defined as active == 0, active is active > 0.
+		 */
+		ktime_t start;
+	} stats;
+
 	struct intel_engine_cs *engine[I915_NUM_ENGINES];
 	struct intel_engine_cs *engine_class[MAX_ENGINE_CLASS + 1]
 					    [MAX_ENGINE_INSTANCE + 1];
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 77e76b665098..337293c7bb7d 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1316,6 +1316,8 @@ static int i915_engine_info(struct seq_file *m, void *unused)
 	seq_printf(m, "GT awake? %s [%d]\n",
 		   yesno(dev_priv->gt.awake),
 		   atomic_read(&dev_priv->gt.wakeref.count));
+	seq_printf(m, "GT busy: %llu ms\n",
+		   ktime_to_ms(intel_gt_get_busy_time(&dev_priv->gt)));
 	seq_printf(m, "CS timestamp frequency: %u Hz\n",
 		   RUNTIME_INFO(dev_priv)->cs_timestamp_frequency_hz);
 
diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index cd786ad12be7..36fc60cf5725 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -488,6 +488,8 @@ config_status(struct drm_i915_private *i915, u64 config)
 		if (!HAS_RC6(i915))
 			return -ENODEV;
 		break;
+	case I915_PMU_BUSY_TIME:
+		break;
 	default:
 		return -ENOENT;
 	}
@@ -595,6 +597,9 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
 		case I915_PMU_RC6_RESIDENCY:
 			val = get_rc6(&i915->gt);
 			break;
+		case I915_PMU_BUSY_TIME:
+			val = ktime_to_ns(intel_gt_get_busy_time(&i915->gt));
+			break;
 		}
 	}
 
@@ -898,6 +903,7 @@ create_event_attributes(struct i915_pmu *pmu)
 		__event(I915_PMU_REQUESTED_FREQUENCY, "requested-frequency", "M"),
 		__event(I915_PMU_INTERRUPTS, "interrupts", NULL),
 		__event(I915_PMU_RC6_RESIDENCY, "rc6-residency", "ns"),
+		__event(I915_PMU_BUSY_TIME, "busy-time", "ns"),
 	};
 	static const struct {
 		enum drm_i915_pmu_engine_sample sample;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index fa1f3d62f9a6..b66b7c1fd564 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -177,8 +177,9 @@ enum drm_i915_pmu_engine_sample {
 #define I915_PMU_REQUESTED_FREQUENCY	__I915_PMU_OTHER(1)
 #define I915_PMU_INTERRUPTS		__I915_PMU_OTHER(2)
 #define I915_PMU_RC6_RESIDENCY		__I915_PMU_OTHER(3)
+#define I915_PMU_BUSY_TIME		__I915_PMU_OTHER(4)
 
-#define I915_PMU_LAST I915_PMU_RC6_RESIDENCY
+#define I915_PMU_LAST I915_PMU_BUSY_TIME
 
 /* Each region is a minimum of 16k, and there are at most 255 of them.
  */
-- 
2.20.1


* [Intel-gfx] [PATCH 06/28] drm/i915/gt: Include semaphore status in print_request()
From: Chris Wilson @ 2020-11-17 11:30 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

When pretty-printing the requests for debug, also show the status of any
semaphore waits as part of each request's runnable status.
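
For example (hypothetical values), a request still blocked on its
semaphore is now annotated with '&' in the first flag position:

	rq: 14:2fa&-  prio=0 @ 5ms: timeline-name

where '!' means completed, '*' started, '&' waiting on a semaphore; the
second flag is '+' (fence signaled) or '-' (signaling enabled).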

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_engine_cs.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 0b31670343f5..1ed84ee8ce41 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -1321,6 +1321,7 @@ static void print_request(struct drm_printer *m,
 		   rq->fence.context, rq->fence.seqno,
 		   i915_request_completed(rq) ? "!" :
 		   i915_request_started(rq) ? "*" :
+		   !i915_sw_fence_signaled(&rq->semaphore) ? "&" :
 		   "",
 		   test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
 			    &rq->fence.flags) ? "+" :
-- 
2.20.1


* [Intel-gfx] [PATCH 07/28] drm/i915: Lift i915_request_show()
From: Chris Wilson @ 2020-11-17 11:30 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Extract i915_request_show() for reuse by other request-chain pretty
printers.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_engine_cs.c | 47 ++---------------------
 drivers/gpu/drm/i915/gt/intel_lrc.c       |  2 +-
 drivers/gpu/drm/i915/gt/intel_lrc.h       |  2 +-
 drivers/gpu/drm/i915/i915_request.c       | 39 +++++++++++++++++++
 drivers/gpu/drm/i915/i915_request.h       |  5 +++
 5 files changed, 50 insertions(+), 45 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index 1ed84ee8ce41..c3bb2e9546e6 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -1294,45 +1294,6 @@ bool intel_engine_can_store_dword(struct intel_engine_cs *engine)
 	}
 }
 
-static int print_sched_attr(const struct i915_sched_attr *attr,
-			    char *buf, int x, int len)
-{
-	if (attr->priority == I915_PRIORITY_INVALID)
-		return x;
-
-	x += snprintf(buf + x, len - x,
-		      " prio=%d", attr->priority);
-
-	return x;
-}
-
-static void print_request(struct drm_printer *m,
-			  struct i915_request *rq,
-			  const char *prefix)
-{
-	const char *name = rq->fence.ops->get_timeline_name(&rq->fence);
-	char buf[80] = "";
-	int x = 0;
-
-	x = print_sched_attr(&rq->sched.attr, buf, x, sizeof(buf));
-
-	drm_printf(m, "%s %llx:%llx%s%s %s @ %dms: %s\n",
-		   prefix,
-		   rq->fence.context, rq->fence.seqno,
-		   i915_request_completed(rq) ? "!" :
-		   i915_request_started(rq) ? "*" :
-		   !i915_sw_fence_signaled(&rq->semaphore) ? "&" :
-		   "",
-		   test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
-			    &rq->fence.flags) ? "+" :
-		   test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
-			    &rq->fence.flags) ? "-" :
-		   "",
-		   buf,
-		   jiffies_to_msecs(jiffies - rq->emitted_jiffies),
-		   name);
-}
-
 static struct intel_timeline *get_timeline(struct i915_request *rq)
 {
 	struct intel_timeline *tl;
@@ -1530,7 +1491,7 @@ static void intel_engine_print_registers(struct intel_engine_cs *engine,
 					intel_context_is_banned(rq->context) ? "*" : "");
 			len += print_ring(hdr + len, sizeof(hdr) - len, rq);
 			scnprintf(hdr + len, sizeof(hdr) - len, "rq: ");
-			print_request(m, rq, hdr);
+			i915_request_show(m, rq, hdr);
 		}
 		for (port = execlists->pending; (rq = *port); port++) {
 			char hdr[160];
@@ -1544,7 +1505,7 @@ static void intel_engine_print_registers(struct intel_engine_cs *engine,
 					intel_context_is_banned(rq->context) ? "*" : "");
 			len += print_ring(hdr + len, sizeof(hdr) - len, rq);
 			scnprintf(hdr + len, sizeof(hdr) - len, "rq: ");
-			print_request(m, rq, hdr);
+			i915_request_show(m, rq, hdr);
 		}
 		rcu_read_unlock();
 		execlists_active_unlock_bh(execlists);
@@ -1688,7 +1649,7 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 	if (rq) {
 		struct intel_timeline *tl = get_timeline(rq);
 
-		print_request(m, rq, "\t\tactive ");
+		i915_request_show(m, rq, "\t\tactive ");
 
 		drm_printf(m, "\t\tring->start:  0x%08x\n",
 			   i915_ggtt_offset(rq->ring->vma));
@@ -1726,7 +1687,7 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 		drm_printf(m, "\tDevice is asleep; skipping register dump\n");
 	}
 
-	intel_execlists_show_requests(engine, m, print_request, 8);
+	intel_execlists_show_requests(engine, m, i915_request_show, 8);
 
 	drm_printf(m, "HWSP:\n");
 	hexdump(m, engine->status_page.addr, PAGE_SIZE);
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 52b84474f93a..a4b8c20d12a9 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -5980,7 +5980,7 @@ int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine,
 void intel_execlists_show_requests(struct intel_engine_cs *engine,
 				   struct drm_printer *m,
 				   void (*show_request)(struct drm_printer *m,
-							struct i915_request *rq,
+							const struct i915_request *rq,
 							const char *prefix),
 				   unsigned int max)
 {
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.h b/drivers/gpu/drm/i915/gt/intel_lrc.h
index c2d287f25497..32e6e204f544 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.h
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.h
@@ -106,7 +106,7 @@ void intel_lr_context_reset(struct intel_engine_cs *engine,
 void intel_execlists_show_requests(struct intel_engine_cs *engine,
 				   struct drm_printer *m,
 				   void (*show_request)(struct drm_printer *m,
-							struct i915_request *rq,
+							const struct i915_request *rq,
 							const char *prefix),
 				   unsigned int max);
 
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 0e813819b041..cebe07a85625 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1855,6 +1855,45 @@ long i915_request_wait(struct i915_request *rq,
 	return timeout;
 }
 
+static int print_sched_attr(const struct i915_sched_attr *attr,
+			    char *buf, int x, int len)
+{
+	if (attr->priority == I915_PRIORITY_INVALID)
+		return x;
+
+	x += snprintf(buf + x, len - x,
+		      " prio=%d", attr->priority);
+
+	return x;
+}
+
+void i915_request_show(struct drm_printer *m,
+		       const struct i915_request *rq,
+		       const char *prefix)
+{
+	const char *name = rq->fence.ops->get_timeline_name((struct dma_fence *)&rq->fence);
+	char buf[80] = "";
+	int x = 0;
+
+	x = print_sched_attr(&rq->sched.attr, buf, x, sizeof(buf));
+
+	drm_printf(m, "%s %llx:%llx%s%s %s @ %dms: %s\n",
+		   prefix,
+		   rq->fence.context, rq->fence.seqno,
+		   i915_request_completed(rq) ? "!" :
+		   i915_request_started(rq) ? "*" :
+		   !i915_sw_fence_signaled(&rq->semaphore) ? "&" :
+		   "",
+		   test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
+			    &rq->fence.flags) ? "+" :
+		   test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
+			    &rq->fence.flags) ? "-" :
+		   "",
+		   buf,
+		   jiffies_to_msecs(jiffies - rq->emitted_jiffies),
+		   name);
+}
+
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "selftests/mock_request.c"
 #include "selftests/i915_request.c"
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 16b721080195..09609071b725 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -43,6 +43,7 @@
 
 struct drm_file;
 struct drm_i915_gem_object;
+struct drm_printer;
 struct i915_request;
 
 struct i915_capture_list {
@@ -369,6 +370,10 @@ long i915_request_wait(struct i915_request *rq,
 #define I915_WAIT_PRIORITY	BIT(1) /* small priority bump for the request */
 #define I915_WAIT_ALL		BIT(2) /* used by i915_gem_object_wait() */
 
+void i915_request_show(struct drm_printer *m,
+		       const struct i915_request *rq,
+		       const char *prefix);
+
 static inline bool i915_request_signaled(const struct i915_request *rq)
 {
 	/* The request may live longer than its HWSP, so check flags first! */
-- 
2.20.1


* [Intel-gfx] [PATCH 08/28] drm/i915/gt: Show all active timelines for debugging
From: Chris Wilson @ 2020-11-17 11:30 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Include the active timelines in debugfs/i915_engine_info, so that we can
see which timelines have unready requests in flight that are not
otherwise shown.
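
Walking the active_list requires taking each timeline's mutex, which we
cannot do under the timelines spinlock; the diff below uses the following
pin-and-drop pattern (a minimal sketch, with a hypothetical
show_timeline() standing in for the printing):

	spin_lock(&timelines->lock);
	list_for_each_entry_safe(tl, tn, &timelines->active_list, link) {
		atomic_inc(&tl->active_count);	/* pin the list element */
		spin_unlock(&timelines->lock);

		show_timeline(m, tl);		/* may sleep */

		spin_lock(&timelines->lock);
		/* tn may have been removed while we were unlocked */
		list_safe_reset_next(tl, tn, link);
		if (atomic_dec_and_test(&tl->active_count))
			list_del(&tl->link);
	}
	spin_unlock(&timelines->lock);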

Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_timeline.c | 79 ++++++++++++++++++++++++
 drivers/gpu/drm/i915/gt/intel_timeline.h |  8 +++
 drivers/gpu/drm/i915/i915_debugfs.c      | 18 +++---
 3 files changed, 97 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c
index 7ea94d201fe6..2b4ed4b2b67c 100644
--- a/drivers/gpu/drm/i915/gt/intel_timeline.c
+++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
@@ -617,6 +617,85 @@ void intel_gt_fini_timelines(struct intel_gt *gt)
 	GEM_BUG_ON(!list_empty(&timelines->hwsp_free_list));
 }
 
+void intel_gt_show_timelines(struct intel_gt *gt,
+			     struct drm_printer *m,
+			     void (*show_request)(struct drm_printer *m,
+						  const struct i915_request *rq,
+						  const char *prefix))
+{
+	struct intel_gt_timelines *timelines = &gt->timelines;
+	struct intel_timeline *tl, *tn;
+	LIST_HEAD(free);
+
+	spin_lock(&timelines->lock);
+	list_for_each_entry_safe(tl, tn, &timelines->active_list, link) {
+		unsigned long count, ready, inflight;
+		struct i915_request *rq, *rn;
+		struct dma_fence *fence;
+
+		if (!mutex_trylock(&tl->mutex))
+			continue;
+
+		intel_timeline_get(tl);
+		GEM_BUG_ON(!atomic_read(&tl->active_count));
+		atomic_inc(&tl->active_count); /* pin the list element */
+		spin_unlock(&timelines->lock);
+
+		count = 0;
+		ready = 0;
+		inflight = 0;
+		list_for_each_entry_safe(rq, rn, &tl->requests, link) {
+			if (i915_request_completed(rq))
+				continue;
+
+			count++;
+			if (i915_request_is_ready(rq))
+				ready++;
+			if (i915_request_is_active(rq))
+				inflight++;
+		}
+
+		drm_printf(m, "Timeline %llx: { ", tl->fence_context);
+		drm_printf(m, "count %lu, ready: %lu, inflight: %lu",
+			   count, ready, inflight);
+		drm_printf(m, ", seqno: { current: %d, last: %d }",
+			   *tl->hwsp_seqno, tl->seqno);
+		fence = i915_active_fence_get(&tl->last_request);
+		if (fence) {
+			drm_printf(m, ", engine: %s",
+				   to_request(fence)->engine->name);
+			dma_fence_put(fence);
+		}
+		drm_printf(m, " }\n");
+
+		if (show_request) {
+			list_for_each_entry_safe(rq, rn, &tl->requests, link)
+				show_request(m, rq,
+					     i915_request_is_active(rq) ? "  E" :
+					     i915_request_is_ready(rq) ? "  Q" :
+					     "  U");
+		}
+
+		mutex_unlock(&tl->mutex);
+		spin_lock(&timelines->lock);
+
+		/* Resume list iteration after reacquiring spinlock */
+		list_safe_reset_next(tl, tn, link);
+		if (atomic_dec_and_test(&tl->active_count))
+			list_del(&tl->link);
+
+		/* Defer the final release to after the spinlock */
+		if (refcount_dec_and_test(&tl->kref.refcount)) {
+			GEM_BUG_ON(atomic_read(&tl->active_count));
+			list_add(&tl->link, &free);
+		}
+	}
+	spin_unlock(&timelines->lock);
+
+	list_for_each_entry_safe(tl, tn, &free, link)
+		__intel_timeline_free(&tl->kref);
+}
+
 #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
 #include "gt/selftests/mock_timeline.c"
 #include "gt/selftest_timeline.c"
diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.h b/drivers/gpu/drm/i915/gt/intel_timeline.h
index 9882cd911d8e..9b88f220be2b 100644
--- a/drivers/gpu/drm/i915/gt/intel_timeline.h
+++ b/drivers/gpu/drm/i915/gt/intel_timeline.h
@@ -31,6 +31,8 @@
 #include "i915_syncmap.h"
 #include "intel_timeline_types.h"
 
+struct drm_printer;
+
 struct intel_timeline *
 __intel_timeline_create(struct intel_gt *gt,
 			struct i915_vma *global_hwsp,
@@ -106,4 +108,10 @@ int intel_timeline_read_hwsp(struct i915_request *from,
 void intel_gt_init_timelines(struct intel_gt *gt);
 void intel_gt_fini_timelines(struct intel_gt *gt);
 
+void intel_gt_show_timelines(struct intel_gt *gt,
+			     struct drm_printer *m,
+			     void (*show_request)(struct drm_printer *m,
+						  const struct i915_request *rq,
+						  const char *prefix));
+
 #endif
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 337293c7bb7d..498c82dcc7e9 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1306,26 +1306,28 @@ static int i915_runtime_pm_status(struct seq_file *m, void *unused)
 
 static int i915_engine_info(struct seq_file *m, void *unused)
 {
-	struct drm_i915_private *dev_priv = node_to_i915(m->private);
+	struct drm_i915_private *i915 = node_to_i915(m->private);
 	struct intel_engine_cs *engine;
 	intel_wakeref_t wakeref;
 	struct drm_printer p;
 
-	wakeref = intel_runtime_pm_get(&dev_priv->runtime_pm);
+	wakeref = intel_runtime_pm_get(&i915->runtime_pm);
 
 	seq_printf(m, "GT awake? %s [%d]\n",
-		   yesno(dev_priv->gt.awake),
-		   atomic_read(&dev_priv->gt.wakeref.count));
+		   yesno(i915->gt.awake),
+		   atomic_read(&i915->gt.wakeref.count));
 	seq_printf(m, "GT busy: %llu ms\n",
-		   ktime_to_ms(intel_gt_get_busy_time(&dev_priv->gt)));
+		   ktime_to_ms(intel_gt_get_busy_time(&i915->gt)));
 	seq_printf(m, "CS timestamp frequency: %u Hz\n",
-		   RUNTIME_INFO(dev_priv)->cs_timestamp_frequency_hz);
+		   RUNTIME_INFO(i915)->cs_timestamp_frequency_hz);
 
 	p = drm_seq_file_printer(m);
-	for_each_uabi_engine(engine, dev_priv)
+	for_each_uabi_engine(engine, i915)
 		intel_engine_dump(engine, &p, "%s\n", engine->name);
 
-	intel_runtime_pm_put(&dev_priv->runtime_pm, wakeref);
+	intel_gt_show_timelines(&i915->gt, &p, NULL);
+
+	intel_runtime_pm_put(&i915->runtime_pm, wakeref);
 
 	return 0;
 }
-- 
2.20.1


* [Intel-gfx] [PATCH 09/28] drm/i915: Lift waiter/signaler iterators
From: Chris Wilson @ 2020-11-17 11:30 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Lift the list iteration defines for traversing the signaler/waiter lists
into i915_scheduler_types.h for reuse.
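
A minimal sketch of the intended reuse (the next patch in this series is
the first new user):

	struct i915_dependency *dep;

	rcu_read_lock();
	for_each_signaler(dep, rq) {
		const struct i915_request *signaler =
			node_to_request(dep->signaler);

		/* inspect each request that rq is waiting on */
	}
	rcu_read_unlock();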

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c         | 10 ----------
 drivers/gpu/drm/i915/i915_scheduler_types.h | 10 ++++++++++
 2 files changed, 10 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index a4b8c20d12a9..17cb7060eb29 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1836,16 +1836,6 @@ static void virtual_xfer_context(struct virtual_engine *ve,
 	}
 }
 
-#define for_each_waiter(p__, rq__) \
-	list_for_each_entry_lockless(p__, \
-				     &(rq__)->sched.waiters_list, \
-				     wait_link)
-
-#define for_each_signaler(p__, rq__) \
-	list_for_each_entry_rcu(p__, \
-				&(rq__)->sched.signalers_list, \
-				signal_link)
-
 static void defer_request(struct i915_request *rq, struct list_head * const pl)
 {
 	LIST_HEAD(list);
diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h
index f72e6c397b08..343ed44d5ed4 100644
--- a/drivers/gpu/drm/i915/i915_scheduler_types.h
+++ b/drivers/gpu/drm/i915/i915_scheduler_types.h
@@ -81,4 +81,14 @@ struct i915_dependency {
 #define I915_DEPENDENCY_WEAK		BIT(2)
 };
 
+#define for_each_waiter(p__, rq__) \
+	list_for_each_entry_lockless(p__, \
+				     &(rq__)->sched.waiters_list, \
+				     wait_link)
+
+#define for_each_signaler(p__, rq__) \
+	list_for_each_entry_rcu(p__, \
+				&(rq__)->sched.signalers_list, \
+				signal_link)
+
 #endif /* _I915_SCHEDULER_TYPES_H_ */
-- 
2.20.1


* [Intel-gfx] [PATCH 10/28] drm/i915: Show timeline dependencies for debug
From: Chris Wilson @ 2020-11-17 11:30 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Include the signalers that each request in the timeline is waiting on, as
a means to try to identify the cause of a stall. This can be quite
verbose, even though for now we only show each request in the timeline
and its immediate antecedents.

This generates output like:

Timeline 886: { count 1, ready: 0, inflight: 0, seqno: { current: 664, last: 666 }, engine: rcs0 }
  U 886:29a-  prio=0 @ 134ms: gem_exec_parall<4621>
  - U bc1:27a-  prio=0 @ 134ms: gem_exec_parall[4917]
Timeline 825: { count 1, ready: 0, inflight: 0, seqno: { current: 802, last: 804 }, engine: vcs0 }
  U 825:324  prio=0 @ 107ms: gem_exec_parall<4518>
  - U b75:140-  prio=0 @ 110ms: gem_exec_parall<5486>
Timeline b46: { count 1, ready: 0, inflight: 0, seqno: { current: 782, last: 784 }, engine: vcs0 }
  U b46:310-  prio=0 @ 70ms: gem_exec_parall<5428>
  - U c11:170-  prio=0 @ 70ms: gem_exec_parall[5501]
Timeline 96b: { count 1, ready: 0, inflight: 0, seqno: { current: 632, last: 634 }, engine: vcs0 }
  U 96b:27a-  prio=0 @ 67ms: gem_exec_parall<4878>
  - U b75:19e-  prio=0 @ 67ms: gem_exec_parall<5486>
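
(Key to the annotations, matching the show_request callbacks in this
series: 'E' executing, 'Q' ready/queued, 'U' unready; the lines prefixed
with '-' are the external signalers the request above is waiting on.)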

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c   |  3 ++-
 drivers/gpu/drm/i915/i915_scheduler.c | 31 +++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_scheduler.h |  6 ++++++
 3 files changed, 39 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 498c82dcc7e9..f6e71119891f 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -45,6 +45,7 @@
 #include "i915_debugfs.h"
 #include "i915_debugfs_params.h"
 #include "i915_irq.h"
+#include "i915_scheduler.h"
 #include "i915_trace.h"
 #include "intel_pm.h"
 #include "intel_sideband.h"
@@ -1325,7 +1326,7 @@ static int i915_engine_info(struct seq_file *m, void *unused)
 	for_each_uabi_engine(engine, i915)
 		intel_engine_dump(engine, &p, "%s\n", engine->name);
 
-	intel_gt_show_timelines(&i915->gt, &p, NULL);
+	intel_gt_show_timelines(&i915->gt, &p, i915_request_show_with_schedule);
 
 	intel_runtime_pm_put(&i915->runtime_pm, wakeref);
 
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index cbb880b10c65..8837ba672933 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -504,6 +504,37 @@ void i915_sched_node_fini(struct i915_sched_node *node)
 	spin_unlock_irq(&schedule_lock);
 }
 
+void i915_request_show_with_schedule(struct drm_printer *m,
+				     const struct i915_request *rq,
+				     const char *prefix)
+{
+	struct i915_dependency *dep;
+
+	i915_request_show(m, rq, prefix);
+	if (i915_request_completed(rq))
+		return;
+
+	rcu_read_lock();
+	for_each_signaler(dep, rq) {
+		const struct i915_request *signaler =
+			node_to_request(dep->signaler);
+
+		/* Dependencies along the same timeline are expected. */
+		if (signaler->timeline == rq->timeline)
+			continue;
+
+		if (i915_request_completed(signaler))
+			continue;
+
+		/* XXX ideally build indent into prefix */
+		i915_request_show(m, signaler,
+				  i915_request_is_active(signaler) ? "  - E" :
+				  i915_request_is_ready(signaler) ? "  - Q" :
+				  "  - U");
+	}
+	rcu_read_unlock();
+}
+
 static void i915_global_scheduler_shrink(void)
 {
 	kmem_cache_shrink(global.slab_dependencies);
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 6f0bf00fc569..34cee9a17801 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -13,6 +13,8 @@
 
 #include "i915_scheduler_types.h"
 
+struct drm_printer;
+
 #define priolist_for_each_request(it, plist, idx) \
 	for (idx = 0; idx < ARRAY_SIZE((plist)->requests); idx++) \
 		list_for_each_entry(it, &(plist)->requests[idx], sched.link)
@@ -54,4 +56,8 @@ static inline void i915_priolist_free(struct i915_priolist *p)
 		__i915_priolist_free(p);
 }
 
+void i915_request_show_with_schedule(struct drm_printer *m,
+				     const struct i915_request *rq,
+				     const char *prefix);
+
 #endif /* _I915_SCHEDULER_H_ */
-- 
2.20.1


* [Intel-gfx] [PATCH 11/28] drm/i915/gt: Defer enabling the breadcrumb interrupt to after submission
From: Chris Wilson @ 2020-11-17 11:30 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Move the slow register write and readback out of the critical path for
execlists submission and delay it until the following worker, shaving off
around 200us. Note that the same signal_irq_work() is allowed to run
concurrently on each CPU (but it will only be queued once; once running,
though, it can be requeued and re-executed), so we have to remember to
lock the global interactions, as we cannot rely on the signal_irq_work()
itself providing the serialisation (in contrast to a tasklet).

By pushing the arm/disarm into the central signaling worker we can close
the race for disarming the interrupt (and dropping its associated
GT wakeref) on parking the engine. If we lose the race, that GT wakeref
may be held indefinitely, preventing the machine from sleeping while
the GPU is ostensibly idle.

v2: Move the self-arming parking of the signal_irq_work to a flush of
the irq-work from intel_breadcrumbs_park().
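
For context, a minimal sketch of the irq_work API relied upon here (with
a hypothetical callback; see the real signal_irq_work() in the diff
below):

	static void my_irq_work(struct irq_work *work)
	{
		/* Runs in hard-irq context; once started it may run
		 * again concurrently on another CPU, so shared state
		 * needs its own locking.
		 */
	}

	init_irq_work(&b->irq_work, my_irq_work);

	irq_work_queue(&b->irq_work);	/* no-op while already pending */
	irq_work_sync(&b->irq_work);	/* wait for a running callback */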

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2271
Fixes: dfeba1ae34c8 ("drm/i915/gt: Hold context/request reference while breadcrumbs are active")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 109 +++++++++++++-------
 1 file changed, 70 insertions(+), 39 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index d8b206e53660..8d85683314e1 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -30,18 +30,21 @@
 #include "i915_trace.h"
 #include "intel_breadcrumbs.h"
 #include "intel_context.h"
+#include "intel_engine_pm.h"
 #include "intel_gt_pm.h"
 #include "intel_gt_requests.h"
 
-static void irq_enable(struct intel_engine_cs *engine)
+static bool irq_enable(struct intel_engine_cs *engine)
 {
 	if (!engine->irq_enable)
-		return;
+		return false;
 
 	/* Caller disables interrupts */
 	spin_lock(&engine->gt->irq_lock);
 	engine->irq_enable(engine);
 	spin_unlock(&engine->gt->irq_lock);
+
+	return true;
 }
 
 static void irq_disable(struct intel_engine_cs *engine)
@@ -57,12 +60,11 @@ static void irq_disable(struct intel_engine_cs *engine)
 
 static void __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
 {
-	lockdep_assert_held(&b->irq_lock);
-
-	if (!b->irq_engine || b->irq_armed)
-		return;
-
-	if (!intel_gt_pm_get_if_awake(b->irq_engine->gt))
+	/*
+	 * Since we are waiting on a request, the GPU should be busy
+	 * and should have its own rpm reference.
+	 */
+	if (GEM_WARN_ON(!intel_gt_pm_get_if_awake(b->irq_engine->gt)))
 		return;
 
 	/*
@@ -73,25 +75,24 @@ static void __intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
 	 */
 	WRITE_ONCE(b->irq_armed, true);
 
-	/*
-	 * Since we are waiting on a request, the GPU should be busy
-	 * and should have its own rpm reference. This is tracked
-	 * by i915->gt.awake, we can forgo holding our own wakref
-	 * for the interrupt as before i915->gt.awake is released (when
-	 * the driver is idle) we disarm the breadcrumbs.
-	 */
-
-	if (!b->irq_enabled++)
-		irq_enable(b->irq_engine);
+	/* Requests may have completed before we could enable the interrupt. */
+	if (!b->irq_enabled++ && irq_enable(b->irq_engine))
+		irq_work_queue(&b->irq_work);
 }
 
-static void __intel_breadcrumbs_disarm_irq(struct intel_breadcrumbs *b)
+static void intel_breadcrumbs_arm_irq(struct intel_breadcrumbs *b)
 {
-	lockdep_assert_held(&b->irq_lock);
-
-	if (!b->irq_engine || !b->irq_armed)
+	if (!b->irq_engine)
 		return;
 
+	spin_lock(&b->irq_lock);
+	if (!b->irq_armed)
+		__intel_breadcrumbs_arm_irq(b);
+	spin_unlock(&b->irq_lock);
+}
+
+static void __intel_breadcrumbs_disarm_irq(struct intel_breadcrumbs *b)
+{
 	GEM_BUG_ON(!b->irq_enabled);
 	if (!--b->irq_enabled)
 		irq_disable(b->irq_engine);
@@ -105,8 +106,6 @@ static void add_signaling_context(struct intel_breadcrumbs *b,
 {
 	intel_context_get(ce);
 	list_add_tail(&ce->signal_link, &b->signalers);
-	if (list_is_first(&ce->signal_link, &b->signalers))
-		__intel_breadcrumbs_arm_irq(b);
 }
 
 static void remove_signaling_context(struct intel_breadcrumbs *b,
@@ -197,7 +196,32 @@ static void signal_irq_work(struct irq_work *work)
 
 	spin_lock(&b->irq_lock);
 
-	if (list_empty(&b->signalers))
+	/*
+	 * Keep the irq armed until the interrupt after all listeners are gone.
+	 *
+	 * Enabling/disabling the interrupt is rather costly, roughly a couple
+	 * of hundred microseconds. If we are proactive and enable/disable
+	 * the interrupt around every request that wants a breadcrumb, we
+	 * quickly drown in the extra orders of magnitude of latency imposed
+	 * on request submission.
+	 *
+	 * So we try to be lazy, and keep the interrupts enabled until no
+	 * more listeners appear within a breadcrumb interrupt interval (that
+	 * is until a request completes that no one cares about). The
+	 * observation is that listeners come in batches, and will often
+	 * listen to a bunch of requests in succession. Though note on icl+,
+	 * interrupts are always enabled due to concerns with rc6 being
+	 * dysfunctional with per-engine interrupt masking.
+	 *
+	 * We also try to avoid raising too many interrupts, as they may
+	 * be generated by userspace batches and it is unfortunately rather
+	 * too easy to drown the CPU under a flood of GPU interrupts. Thus
+	 * whenever no one appears to be listening, we turn off the interrupts.
+	 * Fewer interrupts should conserve power -- at the very least, fewer
+	 * interrupt draw less ire from other users of the system and tools
+	 * interrupts draw less ire from other users of the system and tools
+	 */
+	if (b->irq_armed && list_empty(&b->signalers))
 		__intel_breadcrumbs_disarm_irq(b);
 
 	list_splice_init(&b->signaled_requests, &signal);
@@ -251,6 +275,9 @@ static void signal_irq_work(struct irq_work *work)
 
 		i915_request_put(rq);
 	}
+
+	if (!READ_ONCE(b->irq_armed) && !list_empty(&b->signalers))
+		intel_breadcrumbs_arm_irq(b);
 }
 
 struct intel_breadcrumbs *
@@ -292,21 +319,22 @@ void intel_breadcrumbs_reset(struct intel_breadcrumbs *b)
 
 void intel_breadcrumbs_park(struct intel_breadcrumbs *b)
 {
-	unsigned long flags;
-
-	if (!READ_ONCE(b->irq_armed))
-		return;
-
-	spin_lock_irqsave(&b->irq_lock, flags);
-	__intel_breadcrumbs_disarm_irq(b);
-	spin_unlock_irqrestore(&b->irq_lock, flags);
-
-	if (!list_empty(&b->signalers))
-		irq_work_queue(&b->irq_work);
+	/* Kick the work once more to drain the signalers */
+	irq_work_sync(&b->irq_work);
+	while (unlikely(READ_ONCE(b->irq_armed))) {
+		local_irq_disable();
+		signal_irq_work(&b->irq_work);
+		local_irq_enable();
+		cond_resched();
+	}
+	GEM_BUG_ON(!list_empty(&b->signalers));
 }
 
 void intel_breadcrumbs_free(struct intel_breadcrumbs *b)
 {
+	irq_work_sync(&b->irq_work);
+	GEM_BUG_ON(!list_empty(&b->signalers));
+	GEM_BUG_ON(b->irq_armed);
 	kfree(b);
 }
 
@@ -362,9 +390,12 @@ static void insert_breadcrumb(struct i915_request *rq,
 	GEM_BUG_ON(!check_signal_order(ce, rq));
 	set_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
 
-	/* Check after attaching to irq, interrupt may have already fired. */
-	if (__request_completed(rq))
-		irq_work_queue(&b->irq_work);
+	/*
+	 * Defer enabling the interrupt to after HW submission and recheck
+	 * the request as it may have completed and raised the interrupt as
+	 * we were attaching it into the lists.
+	 */
+	irq_work_queue(&b->irq_work);
 }
 
 bool i915_request_enable_breadcrumb(struct i915_request *rq)
-- 
2.20.1


* [Intel-gfx] [PATCH 12/28] drm/i915/gt: Track signaled breadcrumbs outside of the breadcrumb spinlock
From: Chris Wilson @ 2020-11-17 11:30 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Make b->signaled_requests a lockless list so that we can manipulate it
outside of the b->irq_lock.
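
The handoff pattern this enables (a sketch assembled from the diff
below): producers push completed requests from any context without taking
b->irq_lock, and the worker claims the whole batch atomically:

	/* producer, e.g. insert_breadcrumb() */
	if (llist_add(&rq->signal_node, &b->signaled_requests))
		irq_work_queue(&b->irq_work);	/* first entry kicks the worker */

	/* consumer, signal_irq_work() */
	llist_for_each_safe(node, next, llist_del_all(&b->signaled_requests)) {
		struct i915_request *rq =
			llist_entry(node, struct i915_request, signal_node);

		dma_fence_signal(&rq->fence);	/* simplified; see below */
	}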

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c   | 34 ++++++++++++-------
 .../gpu/drm/i915/gt/intel_breadcrumbs_types.h |  2 +-
 drivers/gpu/drm/i915/i915_request.h           |  6 +++-
 3 files changed, 28 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index 8d85683314e1..43cfabb102ea 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -173,26 +173,34 @@ static void add_retire(struct intel_breadcrumbs *b, struct intel_timeline *tl)
 		intel_engine_add_retire(b->irq_engine, tl);
 }
 
-static bool __signal_request(struct i915_request *rq, struct list_head *signals)
+static bool __signal_request(struct i915_request *rq)
 {
-	clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
-
 	if (!__dma_fence_signal(&rq->fence)) {
 		i915_request_put(rq);
 		return false;
 	}
 
-	list_add_tail(&rq->signal_link, signals);
 	return true;
 }
 
+static struct llist_node *
+slist_add(struct llist_node *node, struct llist_node *head)
+{
+	node->next = head;
+	return node;
+}
+
 static void signal_irq_work(struct irq_work *work)
 {
 	struct intel_breadcrumbs *b = container_of(work, typeof(*b), irq_work);
 	const ktime_t timestamp = ktime_get();
+	struct llist_node *signal, *sn;
 	struct intel_context *ce, *cn;
 	struct list_head *pos, *next;
-	LIST_HEAD(signal);
+
+	signal = NULL;
+	if (unlikely(!llist_empty(&b->signaled_requests)))
+		signal = llist_del_all(&b->signaled_requests);
 
 	spin_lock(&b->irq_lock);
 
@@ -224,8 +232,6 @@ static void signal_irq_work(struct irq_work *work)
 	if (b->irq_armed && list_empty(&b->signalers))
 		__intel_breadcrumbs_disarm_irq(b);
 
-	list_splice_init(&b->signaled_requests, &signal);
-
 	list_for_each_entry_safe(ce, cn, &b->signalers, signal_link) {
 		GEM_BUG_ON(list_empty(&ce->signals));
 
@@ -242,7 +248,10 @@ static void signal_irq_work(struct irq_work *work)
 			 * spinlock as the callback chain may end up adding
 			 * more signalers to the same context or engine.
 			 */
-			__signal_request(rq, &signal);
+			clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
+			if (__signal_request(rq))
+				/* We own signal_node now, xfer to local list */
+				signal = slist_add(&rq->signal_node, signal);
 		}
 
 		/*
@@ -262,9 +271,9 @@ static void signal_irq_work(struct irq_work *work)
 
 	spin_unlock(&b->irq_lock);
 
-	list_for_each_safe(pos, next, &signal) {
+	llist_for_each_safe(signal, sn, signal) {
 		struct i915_request *rq =
-			list_entry(pos, typeof(*rq), signal_link);
+			llist_entry(signal, typeof(*rq), signal_node);
 		struct list_head cb_list;
 
 		spin_lock(&rq->lock);
@@ -291,7 +300,7 @@ intel_breadcrumbs_create(struct intel_engine_cs *irq_engine)
 
 	spin_lock_init(&b->irq_lock);
 	INIT_LIST_HEAD(&b->signalers);
-	INIT_LIST_HEAD(&b->signaled_requests);
+	init_llist_head(&b->signaled_requests);
 
 	init_irq_work(&b->irq_work, signal_irq_work);
 
@@ -355,7 +364,8 @@ static void insert_breadcrumb(struct i915_request *rq,
 	 * its signal completion.
 	 */
 	if (__request_completed(rq)) {
-		if (__signal_request(rq, &b->signaled_requests))
+		if (__signal_request(rq) &&
+		    llist_add(&rq->signal_node, &b->signaled_requests))
 			irq_work_queue(&b->irq_work);
 		return;
 	}
diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
index 8e53b9942695..3fa19820b37a 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
@@ -35,7 +35,7 @@ struct intel_breadcrumbs {
 	struct intel_engine_cs *irq_engine;
 
 	struct list_head signalers;
-	struct list_head signaled_requests;
+	struct llist_head signaled_requests;
 
 	struct irq_work irq_work; /* for use from inside irq_lock */
 
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 09609071b725..571a55d38278 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -177,7 +177,11 @@ struct i915_request {
 	struct intel_context *context;
 	struct intel_ring *ring;
 	struct intel_timeline __rcu *timeline;
-	struct list_head signal_link;
+
+	union {
+		struct list_head signal_link;
+		struct llist_node signal_node;
+	};
 
 	/*
 	 * The rcu epoch of when this request was allocated. Used to judiciously
-- 
2.20.1


* [Intel-gfx] [PATCH 13/28] drm/i915/gt: Don't cancel the interrupt shadow too early
  2020-11-17 11:30 [Intel-gfx] [PATCH 01/28] drm/i915/selftests: Improve granularity for mocs reset checks Chris Wilson
                   ` (10 preceding siblings ...)
  2020-11-17 11:30 ` [Intel-gfx] [PATCH 12/28] drm/i915/gt: Track signaled breadcrumbs outside of the breadcrumb spinlock Chris Wilson
@ 2020-11-17 11:30 ` Chris Wilson
  2020-11-17 11:30 ` [Intel-gfx] [PATCH 14/28] drm/i915/gt: Free stale request on destroying the virtual engine Chris Wilson
                   ` (18 subsequent siblings)
  30 siblings, 0 replies; 52+ messages in thread
From: Chris Wilson @ 2020-11-17 11:30 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

We currently want to keep the interrupt enabled until the interrupt
after which we have no more work to do. This heuristic was broken by us
kicking the irq-work on adding a completed request without attaching a
signaler -- making it appear to the irq worker that an interrupt had
fired while we were idle.

Fixes: bda4d4db6dd6 ("drm/i915/gt: Replace intel_engine_transfer_stale_breadcrumbs")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index 43cfabb102ea..cf6e05ea4d8f 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -229,7 +229,7 @@ static void signal_irq_work(struct irq_work *work)
 	 * interrupt draw less ire from other users of the system and tools
 	 * like powertop.
 	 */
-	if (b->irq_armed && list_empty(&b->signalers))
+	if (!signal && b->irq_armed && list_empty(&b->signalers))
 		__intel_breadcrumbs_disarm_irq(b);
 
 	list_for_each_entry_safe(ce, cn, &b->signalers, signal_link) {
-- 
2.20.1


* [Intel-gfx] [PATCH 14/28] drm/i915/gt: Free stale request on destroying the virtual engine
  2020-11-17 11:30 [Intel-gfx] [PATCH 01/28] drm/i915/selftests: Improve granularity for mocs reset checks Chris Wilson
                   ` (11 preceding siblings ...)
  2020-11-17 11:30 ` [Intel-gfx] [PATCH 13/28] drm/i915/gt: Don't cancel the interrupt shadow too early Chris Wilson
@ 2020-11-17 11:30 ` Chris Wilson
  2020-11-18 11:05   ` Tvrtko Ursulin
  2020-11-17 11:30 ` [Intel-gfx] [PATCH 15/28] drm/i915/gt: Protect context lifetime with RCU Chris Wilson
                   ` (17 subsequent siblings)
  30 siblings, 1 reply; 52+ messages in thread
From: Chris Wilson @ 2020-11-17 11:30 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Since preempt-to-busy, we may unsubmit a request while it is still on
the HW and it completes asynchronously. That means it may be retired and
in the process destroy the virtual engine (as the user has closed their
context), but that engine may still be holding onto the unsubmitted
completed request. Therefore we need to potentially clean up the old
request on destroying the virtual engine. We also have to keep the
virtual_engine alive until after the siblings' execlists_dequeue() has
finished peeking into the virtual engines, for which we serialise with
RCU.

v2: Be paranoid and flush the tasklet as well.
v3: And flush the tasklet before the engines, as the tasklet may
re-attach an rb_node after our removal from the siblings.
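
For reference, the general shape of punting a release to process
context after an RCU grace period -- a simplified sketch with made-up
'obj' names, mirroring the INIT_RCU_WORK()/queue_rcu_work() calls in
the diff below -- looks like:

	#include <linux/workqueue.h>
	#include <linux/slab.h>

	struct obj {
		struct rcu_work rcu;
		/* ... payload ... */
	};

	static void obj_destroy(struct work_struct *wrk)
	{
		struct obj *o = container_of(wrk, struct obj, rcu.work);

		/* Runs in process context after a full grace period:
		 * safe to tasklet_kill(), take locks, and finally free. */
		kfree(o);
	}

	static void obj_release(struct obj *o)
	{
		INIT_RCU_WORK(&o->rcu, obj_destroy);
		queue_rcu_work(system_wq, &o->rcu);
	}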

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 61 +++++++++++++++++++++++++----
 1 file changed, 54 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 17cb7060eb29..c11433884cf6 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -182,6 +182,7 @@
 struct virtual_engine {
 	struct intel_engine_cs base;
 	struct intel_context context;
+	struct rcu_work rcu;
 
 	/*
 	 * We allow only a single request through the virtual engine at a time
@@ -5470,44 +5471,90 @@ static struct list_head *virtual_queue(struct virtual_engine *ve)
 	return &ve->base.execlists.default_priolist.requests[0];
 }
 
-static void virtual_context_destroy(struct kref *kref)
+static void rcu_virtual_context_destroy(struct work_struct *wrk)
 {
 	struct virtual_engine *ve =
-		container_of(kref, typeof(*ve), context.ref);
+		container_of(wrk, typeof(*ve), rcu.work);
 	unsigned int n;
 
-	GEM_BUG_ON(!list_empty(virtual_queue(ve)));
-	GEM_BUG_ON(ve->request);
 	GEM_BUG_ON(ve->context.inflight);
 
+	/* Preempt-to-busy may leave a stale request behind. */
+	if (unlikely(ve->request)) {
+		struct i915_request *old;
+
+		spin_lock_irq(&ve->base.active.lock);
+
+		old = fetch_and_zero(&ve->request);
+		if (old) {
+			GEM_BUG_ON(!i915_request_completed(old));
+			__i915_request_submit(old);
+			i915_request_put(old);
+		}
+
+		spin_unlock_irq(&ve->base.active.lock);
+	}
+
+	/*
+	 * Flush the tasklet in case it is still running on another core.
+	 *
+	 * This needs to be done before we remove ourselves from the siblings'
+	 * rbtrees as in the case it is running in parallel, it may reinsert
+	 * the rb_node into a sibling.
+	 */
+	tasklet_kill(&ve->base.execlists.tasklet);
+
+	/* Decouple ourselves from the siblings, no more access allowed. */
 	for (n = 0; n < ve->num_siblings; n++) {
 		struct intel_engine_cs *sibling = ve->siblings[n];
 		struct rb_node *node = &ve->nodes[sibling->id].rb;
-		unsigned long flags;
 
 		if (RB_EMPTY_NODE(node))
 			continue;
 
-		spin_lock_irqsave(&sibling->active.lock, flags);
+		spin_lock_irq(&sibling->active.lock);
 
 		/* Detachment is lazily performed in the execlists tasklet */
 		if (!RB_EMPTY_NODE(node))
 			rb_erase_cached(node, &sibling->execlists.virtual);
 
-		spin_unlock_irqrestore(&sibling->active.lock, flags);
+		spin_unlock_irq(&sibling->active.lock);
 	}
 	GEM_BUG_ON(__tasklet_is_scheduled(&ve->base.execlists.tasklet));
+	GEM_BUG_ON(!list_empty(virtual_queue(ve)));
 
 	if (ve->context.state)
 		__execlists_context_fini(&ve->context);
 	intel_context_fini(&ve->context);
 
 	intel_engine_free_request_pool(&ve->base);
+	intel_breadcrumbs_free(ve->base.breadcrumbs);
 
 	kfree(ve->bonds);
 	kfree(ve);
 }
 
+static void virtual_context_destroy(struct kref *kref)
+{
+	struct virtual_engine *ve =
+		container_of(kref, typeof(*ve), context.ref);
+
+	GEM_BUG_ON(!list_empty(&ve->context.signals));
+
+	/*
+	 * When destroying the virtual engine, we have to be aware that
+	 * it may still be in use from a hardirq/softirq context causing
+	 * the resubmission of a completed request (background completion
+	 * due to preempt-to-busy). Before we can free the engine, we need
+	 * to flush the submission code and tasklets that are still potentially
+	 * accessing the engine. Flushing the tasklets requires process context,
+	 * and since we can guard the resubmit onto the engine with an RCU read
+	 * lock, we can delegate the free of the engine to an RCU worker.
+	 */
+	INIT_RCU_WORK(&ve->rcu, rcu_virtual_context_destroy);
+	queue_rcu_work(system_wq, &ve->rcu);
+}
+
 static void virtual_engine_initial_hint(struct virtual_engine *ve)
 {
 	int swp;
-- 
2.20.1


* [Intel-gfx] [PATCH 15/28] drm/i915/gt: Protect context lifetime with RCU
  2020-11-17 11:30 [Intel-gfx] [PATCH 01/28] drm/i915/selftests: Improve granularity for mocs reset checks Chris Wilson
                   ` (12 preceding siblings ...)
  2020-11-17 11:30 ` [Intel-gfx] [PATCH 14/28] drm/i915/gt: Free stale request on destroying the virtual engine Chris Wilson
@ 2020-11-17 11:30 ` Chris Wilson
  2020-11-18 11:36   ` Tvrtko Ursulin
  2020-11-17 11:30 ` [Intel-gfx] [PATCH 16/28] drm/i915/gt: Split the breadcrumb spinlock between global and contexts Chris Wilson
                   ` (16 subsequent siblings)
  30 siblings, 1 reply; 52+ messages in thread
From: Chris Wilson @ 2020-11-17 11:30 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Allow a brief period for continued access to a dead intel_context by
deferring the release of the struct until after an RCU grace period.
As we are using a dedicated slab cache for the contexts, we can defer
the release of the slab pages via RCU, with the caveat that individual
structs may be reused from the freelist within an RCU grace period. To
handle that, we have to avoid clearing members of the zombie struct.

This is required for a later patch to handle locking around virtual
requests in the signaler, as those requests may want to move between
engines and be destroyed while we are holding b->irq_lock on a physical
engine.
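
The resulting lifetime scheme is the familiar kref-then-RCU handoff; a
generic sketch (illustrative 'obj' names, not the i915 code itself):

	#include <linux/kref.h>
	#include <linux/rcupdate.h>
	#include <linux/slab.h>

	struct obj {
		union {
			struct kref ref;
			struct rcu_head rcu; /* reused once ref hits zero */
		};
	};

	static void obj_free_rcu(struct rcu_head *rcu)
	{
		kfree(container_of(rcu, struct obj, rcu));
	}

	static void obj_release(struct kref *ref)
	{
		struct obj *o = container_of(ref, struct obj, ref);

		/* The refcount is dead; its storage carries the rcu_head
		 * until the grace period elapses. */
		call_rcu(&o->rcu, obj_free_rcu);
	}

	/* Usage: kref_put(&o->ref, obj_release); */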

v2: Drop mutex_reinit(); if we never mark the mutex as destroyed we
don't need to reset the debug code, at the cost of the mutex debug code
no longer spotting us attempting to destroy a locked mutex.
v3: As the intended use will remain strongly referenced counted, with
very little inflight access across reuse, drop the ctor.
v4: Drop the unrequired change to remove the temporary reference around
dropping the active context, and add back some more missing ctor
operations.
v5: The ctor is back. Tvrtko spotted that ce->signal_lock [introduced
later] may be accessed under RCU and so needs special care not to be
reinitialised.
v6: Don't mix SLAB_TYPESAFE_BY_RCU and RCU list iteration.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_context.c       | 12 +++++++++---
 drivers/gpu/drm/i915/gt/intel_context_types.h | 11 ++++++++++-
 2 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index 92a3f25c4006..d3a835212167 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -25,11 +25,18 @@ static struct intel_context *intel_context_alloc(void)
 	return kmem_cache_zalloc(global.slab_ce, GFP_KERNEL);
 }
 
-void intel_context_free(struct intel_context *ce)
+static void rcu_context_free(struct rcu_head *rcu)
 {
+	struct intel_context *ce = container_of(rcu, typeof(*ce), rcu);
+
 	kmem_cache_free(global.slab_ce, ce);
 }
 
+void intel_context_free(struct intel_context *ce)
+{
+	call_rcu(&ce->rcu, rcu_context_free);
+}
+
 struct intel_context *
 intel_context_create(struct intel_engine_cs *engine)
 {
@@ -356,8 +363,7 @@ static int __intel_context_active(struct i915_active *active)
 }
 
 void
-intel_context_init(struct intel_context *ce,
-		   struct intel_engine_cs *engine)
+intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
 {
 	GEM_BUG_ON(!engine->cops);
 	GEM_BUG_ON(!engine->gt->vm);
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 552cb57a2e8c..20cb5835d1c3 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -44,7 +44,16 @@ struct intel_context_ops {
 };
 
 struct intel_context {
-	struct kref ref;
+	/*
+	 * Note: Some fields may be accessed under RCU.
+	 *
+	 * Unless otherwise noted a field can safely be assumed to be protected
+	 * by strong reference counting.
+	 */
+	union {
+		struct kref ref; /* no kref_get_unless_zero()! */
+		struct rcu_head rcu;
+	};
 
 	struct intel_engine_cs *engine;
 	struct intel_engine_cs *inflight;
-- 
2.20.1


* [Intel-gfx] [PATCH 16/28] drm/i915/gt: Split the breadcrumb spinlock between global and contexts
  2020-11-17 11:30 [Intel-gfx] [PATCH 01/28] drm/i915/selftests: Improve granularity for mocs reset checks Chris Wilson
                   ` (13 preceding siblings ...)
  2020-11-17 11:30 ` [Intel-gfx] [PATCH 15/28] drm/i915/gt: Protect context lifetime with RCU Chris Wilson
@ 2020-11-17 11:30 ` Chris Wilson
  2020-11-18 11:35   ` Tvrtko Ursulin
  2020-11-17 11:30 ` [Intel-gfx] [PATCH 17/28] drm/i915/gt: Move the breadcrumb to the signaler if completed upon cancel Chris Wilson
                   ` (15 subsequent siblings)
  30 siblings, 1 reply; 52+ messages in thread
From: Chris Wilson @ 2020-11-17 11:30 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

As we funnel more and more contexts into the breadcrumbs on an engine,
the hold time of b->irq_lock grows. As we may then contend with the
b->irq_lock during request submission, this increases the burden upon
the engine->active.lock and so directly impacts both our execution
latency and client latency. If we split the b->irq_lock by introducing a
per-context spinlock to manage the signalers within a context, we then
only need the b->irq_lock for enabling/disabling the interrupt and can
avoid taking the lock for walking the list of contexts within the signal
worker. Even with the current setup, this greatly reduces the number of
times we have to take and fight for b->irq_lock.

Furthermore, this closes the race of enabling the signaling context
while it is in the process of being signaled and removed:

<4>[  416.208555] list_add corruption. prev->next should be next (ffff8881951d5910), but was dead000000000100. (prev=ffff8882781bb870).
<4>[  416.208573] WARNING: CPU: 7 PID: 0 at lib/list_debug.c:28 __list_add_valid+0x4d/0x70
<4>[  416.208575] Modules linked in: i915(+) vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio mei_hdcp x86_pkg_temp_thermal coretemp ax88179_178a usbnet mii crct10dif_pclmul snd_intel_dspcfg crc32_pclmul snd_hda_codec snd_hwdep ghash_clmulni_intel snd_hda_core e1000e snd_pcm ptp pps_core mei_me mei prime_numbers intel_lpss_pci [last unloaded: i915]
<4>[  416.208611] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G     U            5.8.0-CI-CI_DRM_8852+ #1
<4>[  416.208614] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake Y LPDDR4x T4 RVP TLC, BIOS ICLSFWR1.R00.3212.A00.1905212112 05/21/2019
<4>[  416.208627] RIP: 0010:__list_add_valid+0x4d/0x70
<4>[  416.208631] Code: c3 48 89 d1 48 c7 c7 60 18 33 82 48 89 c2 e8 ea e0 b6 ff 0f 0b 31 c0 c3 48 89 c1 4c 89 c6 48 c7 c7 b0 18 33 82 e8 d3 e0 b6 ff <0f> 0b 31 c0 c3 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 00 19 33 82 e8
<4>[  416.208633] RSP: 0018:ffffc90000280e18 EFLAGS: 00010086
<4>[  416.208636] RAX: 0000000000000000 RBX: ffff888250a44880 RCX: 0000000000000105
<4>[  416.208639] RDX: 0000000000000105 RSI: ffffffff82320c5b RDI: 00000000ffffffff
<4>[  416.208641] RBP: ffff8882781bb870 R08: 0000000000000000 R09: 0000000000000001
<4>[  416.208643] R10: 00000000054d2957 R11: 000000006abbd991 R12: ffff8881951d58c8
<4>[  416.208646] R13: ffff888286073880 R14: ffff888286073848 R15: ffff8881951d5910
<4>[  416.208669] FS:  0000000000000000(0000) GS:ffff88829c180000(0000) knlGS:0000000000000000
<4>[  416.208671] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  416.208673] CR2: 0000556231326c48 CR3: 0000000005610001 CR4: 0000000000760ee0
<4>[  416.208675] PKRU: 55555554
<4>[  416.208677] Call Trace:
<4>[  416.208679]  <IRQ>
<4>[  416.208751]  i915_request_enable_breadcrumb+0x278/0x400 [i915]
<4>[  416.208839]  __i915_request_submit+0xca/0x2a0 [i915]
<4>[  416.208892]  __execlists_submission_tasklet+0x480/0x1830 [i915]
<4>[  416.208942]  execlists_submission_tasklet+0xc4/0x130 [i915]
<4>[  416.208947]  tasklet_action_common.isra.17+0x6c/0x1c0
<4>[  416.208954]  __do_softirq+0xdf/0x498
<4>[  416.208960]  ? handle_fasteoi_irq+0x150/0x150
<4>[  416.208964]  asm_call_on_stack+0xf/0x20
<4>[  416.208966]  </IRQ>
<4>[  416.208969]  do_softirq_own_stack+0xa1/0xc0
<4>[  416.208972]  irq_exit_rcu+0xb5/0xc0
<4>[  416.208976]  common_interrupt+0xf7/0x260
<4>[  416.208980]  asm_common_interrupt+0x1e/0x40
<4>[  416.208985] RIP: 0010:cpuidle_enter_state+0xb6/0x410
<4>[  416.208987] Code: 00 31 ff e8 9c 3e 89 ff 80 7c 24 0b 00 74 12 9c 58 f6 c4 02 0f 85 31 03 00 00 31 ff e8 e3 6c 90 ff e8 fe a4 94 ff fb 45 85 ed <0f> 88 c7 02 00 00 49 63 c5 4c 2b 24 24 48 8d 14 40 48 8d 14 90 48
<4>[  416.208989] RSP: 0018:ffffc90000143e70 EFLAGS: 00000206
<4>[  416.208991] RAX: 0000000000000007 RBX: ffffe8ffffda8070 RCX: 0000000000000000
<4>[  416.208993] RDX: 0000000000000000 RSI: ffffffff8238b4ee RDI: ffffffff8233184f
<4>[  416.208995] RBP: ffffffff826b4e00 R08: 0000000000000000 R09: 0000000000000000
<4>[  416.208997] R10: 0000000000000001 R11: 0000000000000000 R12: 00000060e7f24a8f
<4>[  416.208998] R13: 0000000000000003 R14: 0000000000000003 R15: 0000000000000003
<4>[  416.209012]  cpuidle_enter+0x24/0x40
<4>[  416.209016]  do_idle+0x22f/0x2d0
<4>[  416.209022]  cpu_startup_entry+0x14/0x20
<4>[  416.209025]  start_secondary+0x158/0x1a0
<4>[  416.209030]  secondary_startup_64+0xa4/0xb0
<4>[  416.209039] irq event stamp: 10186977
<4>[  416.209042] hardirqs last  enabled at (10186976): [<ffffffff810b9363>] tasklet_action_common.isra.17+0xe3/0x1c0
<4>[  416.209044] hardirqs last disabled at (10186977): [<ffffffff81a5e5ed>] _raw_spin_lock_irqsave+0xd/0x50
<4>[  416.209047] softirqs last  enabled at (10186968): [<ffffffff810b9a1a>] irq_enter_rcu+0x6a/0x70
<4>[  416.209049] softirqs last disabled at (10186969): [<ffffffff81c00f4f>] asm_call_on_stack+0xf/0x20

<4>[  416.209317] list_del corruption, ffff8882781bb870->next is LIST_POISON1 (dead000000000100)
<4>[  416.209317] WARNING: CPU: 7 PID: 46 at lib/list_debug.c:47 __list_del_entry_valid+0x4e/0x90
<4>[  416.209317] Modules linked in: i915(+) vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio mei_hdcp x86_pkg_temp_thermal coretemp ax88179_178a usbnet mii crct10dif_pclmul snd_intel_dspcfg crc32_pclmul snd_hda_codec snd_hwdep ghash_clmulni_intel snd_hda_core e1000e snd_pcm ptp pps_core mei_me mei prime_numbers intel_lpss_pci [last unloaded: i915]
<4>[  416.209317] CPU: 7 PID: 46 Comm: ksoftirqd/7 Tainted: G     U  W         5.8.0-CI-CI_DRM_8852+ #1
<4>[  416.209317] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake Y LPDDR4x T4 RVP TLC, BIOS ICLSFWR1.R00.3212.A00.1905212112 05/21/2019
<4>[  416.209317] RIP: 0010:__list_del_entry_valid+0x4e/0x90
<4>[  416.209317] Code: 2e 48 8b 32 48 39 fe 75 3a 48 8b 50 08 48 39 f2 75 48 b8 01 00 00 00 c3 48 89 fe 48 89 c2 48 c7 c7 38 19 33 82 e8 62 e0 b6 ff <0f> 0b 31 c0 c3 48 89 fe 48 c7 c7 70 19 33 82 e8 4e e0 b6 ff 0f 0b
<4>[  416.209317] RSP: 0018:ffffc90000280de8 EFLAGS: 00010086
<4>[  416.209317] RAX: 0000000000000000 RBX: ffff8882781bb848 RCX: 0000000000010104
<4>[  416.209317] RDX: 0000000000010104 RSI: ffffffff8238b4ee RDI: 00000000ffffffff
<4>[  416.209317] RBP: ffff8882781bb880 R08: 0000000000000000 R09: 0000000000000001
<4>[  416.209317] R10: 000000009fb6666e R11: 00000000feca9427 R12: ffffc90000280e18
<4>[  416.209317] R13: ffff8881951d5930 R14: dead0000000000d8 R15: ffff8882781bb880
<4>[  416.209317] FS:  0000000000000000(0000) GS:ffff88829c180000(0000) knlGS:0000000000000000
<4>[  416.209317] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
<4>[  416.209317] CR2: 0000556231326c48 CR3: 0000000005610001 CR4: 0000000000760ee0
<4>[  416.209317] PKRU: 55555554
<4>[  416.209317] Call Trace:
<4>[  416.209317]  <IRQ>
<4>[  416.209317]  remove_signaling_context.isra.13+0xd/0x70 [i915]
<4>[  416.209513]  signal_irq_work+0x1f7/0x4b0 [i915]

This is caused by virtual engines: although we take the breadcrumb
lock on each of the active engines, they may be different engines on
different requests. It turns out that the b->irq_lock was not a
sufficient proxy for the engine->active.lock in the case of more than
one request, so introduce an explicit lock around ce->signals.

v2: ce->signal_lock is acquired with only RCU protection and so must be
treated carefully and not cleared during reallocation. We also then need
to confirm that the ce we lock is the same one we found in the
breadcrumb list.
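
Schematically, the split-lock scheme condensed from the diff below (not
a compilable snippet; it reuses the b/ce/rq names from the patch):

	/* Writer: mutate a context's private list under its own lock;
	 * touch the global list only under the rarely taken global lock. */
	spin_lock(&ce->signal_lock);
	list_del_rcu(&rq->signal_link);
	if (list_empty(&ce->signals)) {
		spin_lock(&b->signalers_lock);
		list_del_rcu(&ce->signal_link);
		spin_unlock(&b->signalers_lock);
	}
	spin_unlock(&ce->signal_lock);

	/* Reader (the signal worker): walk both lists locklessly. */
	rcu_read_lock();
	list_for_each_entry_rcu(ce, &b->signalers, signal_link)
		list_for_each_entry_rcu(rq, &ce->signals, signal_link)
			; /* inspect rq */
	rcu_read_unlock();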

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2276
Fixes: f94343d0a622 ("drm/i915: Remove requirement for holding i915_request.lock for breadcrumbs")
Fixes: bda4d4db6dd6 ("drm/i915/gt: Replace intel_engine_transfer_stale_breadcrumbs")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c   | 168 ++++++++----------
 .../gpu/drm/i915/gt/intel_breadcrumbs_types.h |   6 +-
 drivers/gpu/drm/i915/gt/intel_context.c       |   3 +-
 drivers/gpu/drm/i915/gt/intel_context_types.h |  12 +-
 drivers/gpu/drm/i915/i915_request.h           |   6 +-
 5 files changed, 90 insertions(+), 105 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index cf6e05ea4d8f..a24cc1ff08a0 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -101,18 +101,37 @@ static void __intel_breadcrumbs_disarm_irq(struct intel_breadcrumbs *b)
 	intel_gt_pm_put_async(b->irq_engine->gt);
 }
 
+static void intel_breadcrumbs_disarm_irq(struct intel_breadcrumbs *b)
+{
+	spin_lock(&b->irq_lock);
+	if (b->irq_armed)
+		__intel_breadcrumbs_disarm_irq(b);
+	spin_unlock(&b->irq_lock);
+}
+
 static void add_signaling_context(struct intel_breadcrumbs *b,
 				  struct intel_context *ce)
 {
-	intel_context_get(ce);
-	list_add_tail(&ce->signal_link, &b->signalers);
+	lockdep_assert_held(&ce->signal_lock);
+
+	spin_lock(&b->signalers_lock);
+	list_add_rcu(&ce->signal_link, &b->signalers);
+	spin_unlock(&b->signalers_lock);
 }
 
-static void remove_signaling_context(struct intel_breadcrumbs *b,
+static bool remove_signaling_context(struct intel_breadcrumbs *b,
 				     struct intel_context *ce)
 {
-	list_del(&ce->signal_link);
-	intel_context_put(ce);
+	lockdep_assert_held(&ce->signal_lock);
+
+	if (!list_empty(&ce->signals))
+		return false;
+
+	spin_lock(&b->signalers_lock);
+	list_del_rcu(&ce->signal_link);
+	spin_unlock(&b->signalers_lock);
+
+	return true;
 }
 
 static inline bool __request_completed(const struct i915_request *rq)
@@ -175,6 +194,8 @@ static void add_retire(struct intel_breadcrumbs *b, struct intel_timeline *tl)
 
 static bool __signal_request(struct i915_request *rq)
 {
+	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags));
+
 	if (!__dma_fence_signal(&rq->fence)) {
 		i915_request_put(rq);
 		return false;
@@ -195,15 +216,12 @@ static void signal_irq_work(struct irq_work *work)
 	struct intel_breadcrumbs *b = container_of(work, typeof(*b), irq_work);
 	const ktime_t timestamp = ktime_get();
 	struct llist_node *signal, *sn;
-	struct intel_context *ce, *cn;
-	struct list_head *pos, *next;
+	struct intel_context *ce;
 
 	signal = NULL;
 	if (unlikely(!llist_empty(&b->signaled_requests)))
 		signal = llist_del_all(&b->signaled_requests);
 
-	spin_lock(&b->irq_lock);
-
 	/*
 	 * Keep the irq armed until the interrupt after all listeners are gone.
 	 *
@@ -229,47 +247,44 @@ static void signal_irq_work(struct irq_work *work)
 	 * interrupt draw less ire from other users of the system and tools
 	 * like powertop.
 	 */
-	if (!signal && b->irq_armed && list_empty(&b->signalers))
-		__intel_breadcrumbs_disarm_irq(b);
+	if (!signal && READ_ONCE(b->irq_armed) && list_empty(&b->signalers))
+		intel_breadcrumbs_disarm_irq(b);
 
-	list_for_each_entry_safe(ce, cn, &b->signalers, signal_link) {
-		GEM_BUG_ON(list_empty(&ce->signals));
+	rcu_read_lock();
+	list_for_each_entry_rcu(ce, &b->signalers, signal_link) {
+		struct i915_request *rq;
 
-		list_for_each_safe(pos, next, &ce->signals) {
-			struct i915_request *rq =
-				list_entry(pos, typeof(*rq), signal_link);
+		list_for_each_entry_rcu(rq, &ce->signals, signal_link) {
+			bool release;
 
-			GEM_BUG_ON(!check_signal_order(ce, rq));
 			if (!__request_completed(rq))
 				break;
 
+			if (!test_and_clear_bit(I915_FENCE_FLAG_SIGNAL,
+						&rq->fence.flags))
+				break;
+
 			/*
 			 * Queue for execution after dropping the signaling
 			 * spinlock as the callback chain may end up adding
 			 * more signalers to the same context or engine.
 			 */
-			clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
+			spin_lock(&ce->signal_lock);
+			list_del_rcu(&rq->signal_link);
+			release = remove_signaling_context(b, ce);
+			spin_unlock(&ce->signal_lock);
+
 			if (__signal_request(rq))
 				/* We own signal_node now, xfer to local list */
 				signal = slist_add(&rq->signal_node, signal);
-		}
 
-		/*
-		 * We process the list deletion in bulk, only using a list_add
-		 * (not list_move) above but keeping the status of
-		 * rq->signal_link known with the I915_FENCE_FLAG_SIGNAL bit.
-		 */
-		if (!list_is_first(pos, &ce->signals)) {
-			/* Advance the list to the first incomplete request */
-			__list_del_many(&ce->signals, pos);
-			if (&ce->signals == pos) { /* now empty */
+			if (release) {
 				add_retire(b, ce->timeline);
-				remove_signaling_context(b, ce);
+				intel_context_put(ce);
 			}
 		}
 	}
-
-	spin_unlock(&b->irq_lock);
+	rcu_read_unlock();
 
 	llist_for_each_safe(signal, sn, signal) {
 		struct i915_request *rq =
@@ -298,14 +313,15 @@ intel_breadcrumbs_create(struct intel_engine_cs *irq_engine)
 	if (!b)
 		return NULL;
 
-	spin_lock_init(&b->irq_lock);
+	b->irq_engine = irq_engine;
+
+	spin_lock_init(&b->signalers_lock);
 	INIT_LIST_HEAD(&b->signalers);
 	init_llist_head(&b->signaled_requests);
 
+	spin_lock_init(&b->irq_lock);
 	init_irq_work(&b->irq_work, signal_irq_work);
 
-	b->irq_engine = irq_engine;
-
 	return b;
 }
 
@@ -347,9 +363,9 @@ void intel_breadcrumbs_free(struct intel_breadcrumbs *b)
 	kfree(b);
 }
 
-static void insert_breadcrumb(struct i915_request *rq,
-			      struct intel_breadcrumbs *b)
+static void insert_breadcrumb(struct i915_request *rq)
 {
+	struct intel_breadcrumbs *b = READ_ONCE(rq->engine)->breadcrumbs;
 	struct intel_context *ce = rq->context;
 	struct list_head *pos;
 
@@ -371,6 +387,7 @@ static void insert_breadcrumb(struct i915_request *rq,
 	}
 
 	if (list_empty(&ce->signals)) {
+		intel_context_get(ce);
 		add_signaling_context(b, ce);
 		pos = &ce->signals;
 	} else {
@@ -396,8 +413,9 @@ static void insert_breadcrumb(struct i915_request *rq,
 				break;
 		}
 	}
-	list_add(&rq->signal_link, pos);
+	list_add_rcu(&rq->signal_link, pos);
 	GEM_BUG_ON(!check_signal_order(ce, rq));
+	GEM_BUG_ON(test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &rq->fence.flags));
 	set_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
 
 	/*
@@ -410,7 +428,7 @@ static void insert_breadcrumb(struct i915_request *rq,
 
 bool i915_request_enable_breadcrumb(struct i915_request *rq)
 {
-	struct intel_breadcrumbs *b;
+	struct intel_context *ce = rq->context;
 
 	/* Serialises with i915_request_retire() using rq->lock */
 	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &rq->fence.flags))
@@ -425,67 +443,30 @@ bool i915_request_enable_breadcrumb(struct i915_request *rq)
 	if (!test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags))
 		return true;
 
-	/*
-	 * rq->engine is locked by rq->engine->active.lock. That however
-	 * is not known until after rq->engine has been dereferenced and
-	 * the lock acquired. Hence we acquire the lock and then validate
-	 * that rq->engine still matches the lock we hold for it.
-	 *
-	 * Here, we are using the breadcrumb lock as a proxy for the
-	 * rq->engine->active.lock, and we know that since the breadcrumb
-	 * will be serialised within i915_request_submit/i915_request_unsubmit,
-	 * the engine cannot change while active as long as we hold the
-	 * breadcrumb lock on that engine.
-	 *
-	 * From the dma_fence_enable_signaling() path, we are outside of the
-	 * request submit/unsubmit path, and so we must be more careful to
-	 * acquire the right lock.
-	 */
-	b = READ_ONCE(rq->engine)->breadcrumbs;
-	spin_lock(&b->irq_lock);
-	while (unlikely(b != READ_ONCE(rq->engine)->breadcrumbs)) {
-		spin_unlock(&b->irq_lock);
-		b = READ_ONCE(rq->engine)->breadcrumbs;
-		spin_lock(&b->irq_lock);
-	}
-
-	/*
-	 * Now that we are finally serialised with request submit/unsubmit,
-	 * [with b->irq_lock] and with i915_request_retire() [via checking
-	 * SIGNALED with rq->lock] confirm the request is indeed active. If
-	 * it is no longer active, the breadcrumb will be attached upon
-	 * i915_request_submit().
-	 */
+	spin_lock(&ce->signal_lock);
 	if (test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags))
-		insert_breadcrumb(rq, b);
-
-	spin_unlock(&b->irq_lock);
+		insert_breadcrumb(rq);
+	spin_unlock(&ce->signal_lock);
 
 	return true;
 }
 
 void i915_request_cancel_breadcrumb(struct i915_request *rq)
 {
-	struct intel_breadcrumbs *b = rq->engine->breadcrumbs;
+	struct intel_context *ce = rq->context;
+	bool release;
 
-	/*
-	 * We must wait for b->irq_lock so that we know the interrupt handler
-	 * has released its reference to the intel_context and has completed
-	 * the DMA_FENCE_FLAG_SIGNALED_BIT/I915_FENCE_FLAG_SIGNAL dance (if
-	 * required).
-	 */
-	spin_lock(&b->irq_lock);
-	if (test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags)) {
-		struct intel_context *ce = rq->context;
+	if (!test_and_clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags))
+		return;
 
-		list_del(&rq->signal_link);
-		if (list_empty(&ce->signals))
-			remove_signaling_context(b, ce);
+	spin_lock(&ce->signal_lock);
+	list_del_rcu(&rq->signal_link);
+	release = remove_signaling_context(rq->engine->breadcrumbs, ce);
+	spin_unlock(&ce->signal_lock);
+	if (release)
+		intel_context_put(ce);
 
-		clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
-		i915_request_put(rq);
-	}
-	spin_unlock(&b->irq_lock);
+	i915_request_put(rq);
 }
 
 static void print_signals(struct intel_breadcrumbs *b, struct drm_printer *p)
@@ -495,18 +476,17 @@ static void print_signals(struct intel_breadcrumbs *b, struct drm_printer *p)
 
 	drm_printf(p, "Signals:\n");
 
-	spin_lock_irq(&b->irq_lock);
-	list_for_each_entry(ce, &b->signalers, signal_link) {
-		list_for_each_entry(rq, &ce->signals, signal_link) {
+	rcu_read_lock();
+	list_for_each_entry_rcu(ce, &b->signalers, signal_link) {
+		list_for_each_entry_rcu(rq, &ce->signals, signal_link)
 			drm_printf(p, "\t[%llx:%llx%s] @ %dms\n",
 				   rq->fence.context, rq->fence.seqno,
 				   i915_request_completed(rq) ? "!" :
 				   i915_request_started(rq) ? "*" :
 				   "",
 				   jiffies_to_msecs(jiffies - rq->emitted_jiffies));
-		}
 	}
-	spin_unlock_irq(&b->irq_lock);
+	rcu_read_unlock();
 }
 
 void intel_engine_print_breadcrumbs(struct intel_engine_cs *engine,
diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
index 3fa19820b37a..a74bb3062bd8 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
@@ -29,18 +29,16 @@
  * the overhead of waking that client is much preferred.
  */
 struct intel_breadcrumbs {
-	spinlock_t irq_lock; /* protects the lists used in hardirq context */
-
 	/* Not all breadcrumbs are attached to physical HW */
 	struct intel_engine_cs *irq_engine;
 
+	spinlock_t signalers_lock; /* protects the list of signalers */
 	struct list_head signalers;
 	struct llist_head signaled_requests;
 
+	spinlock_t irq_lock; /* protects the interrupt from hardirq context */
 	struct irq_work irq_work; /* for use from inside irq_lock */
-
 	unsigned int irq_enabled;
-
 	bool irq_armed;
 };
 
diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
index d3a835212167..349e7fa1488d 100644
--- a/drivers/gpu/drm/i915/gt/intel_context.c
+++ b/drivers/gpu/drm/i915/gt/intel_context.c
@@ -379,7 +379,8 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
 
 	ce->vm = i915_vm_get(engine->gt->vm);
 
-	INIT_LIST_HEAD(&ce->signal_link);
+	/* NB ce->signal_link/lock is used under RCU */
+	spin_lock_init(&ce->signal_lock);
 	INIT_LIST_HEAD(&ce->signals);
 
 	mutex_init(&ce->pin_mutex);
diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 20cb5835d1c3..52fa9c132746 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -25,6 +25,7 @@ DECLARE_EWMA(runtime, 3, 8);
 struct i915_gem_context;
 struct i915_gem_ww_ctx;
 struct i915_vma;
+struct intel_breadcrumbs;
 struct intel_context;
 struct intel_ring;
 
@@ -63,8 +64,15 @@ struct intel_context {
 	struct i915_address_space *vm;
 	struct i915_gem_context __rcu *gem_context;
 
-	struct list_head signal_link;
-	struct list_head signals;
+	/*
+	 * @signal_lock protects the list of requests that need signaling,
+	 * @signals. While there are any requests that need signaling,
+	 * we add the context to the breadcrumbs worker, and remove it
+	 * upon completion/cancellation of the last request.
+	 */
+	struct list_head signal_link; /* Accessed under RCU */
+	struct list_head signals; /* Guarded by signal_lock */
+	spinlock_t signal_lock; /* protects signals, the list of requests */
 
 	struct i915_vma *state;
 	struct intel_ring *ring;
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 571a55d38278..26a19b84a586 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -178,10 +178,8 @@ struct i915_request {
 	struct intel_ring *ring;
 	struct intel_timeline __rcu *timeline;
 
-	union {
-		struct list_head signal_link;
-		struct llist_node signal_node;
-	};
+	struct list_head signal_link;
+	struct llist_node signal_node;
 
 	/*
 	 * The rcu epoch of when this request was allocated. Used to judiciously
-- 
2.20.1


* [Intel-gfx] [PATCH 17/28] drm/i915/gt: Move the breadcrumb to the signaler if completed upon cancel
  2020-11-17 11:30 [Intel-gfx] [PATCH 01/28] drm/i915/selftests: Improve granularity for mocs reset checks Chris Wilson
                   ` (14 preceding siblings ...)
  2020-11-17 11:30 ` [Intel-gfx] [PATCH 16/28] drm/i915/gt: Split the breadcrumb spinlock between global and contexts Chris Wilson
@ 2020-11-17 11:30 ` Chris Wilson
  2020-11-17 11:30 ` [Intel-gfx] [PATCH 18/28] drm/i915/gt: Decouple completed requests on unwind Chris Wilson
                   ` (14 subsequent siblings)
  30 siblings, 0 replies; 52+ messages in thread
From: Chris Wilson @ 2020-11-17 11:30 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

If, while we are cancelling the breadcrumb signaling, we find that the
request has already completed, move it to the irq signaler and let it be
signaled.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c | 20 ++++++++++++++++----
 1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index a24cc1ff08a0..f5f6feed0fa6 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -363,6 +363,14 @@ void intel_breadcrumbs_free(struct intel_breadcrumbs *b)
 	kfree(b);
 }
 
+static void irq_signal_request(struct i915_request *rq,
+			       struct intel_breadcrumbs *b)
+{
+	if (__signal_request(rq) &&
+	    llist_add(&rq->signal_node, &b->signaled_requests))
+		irq_work_queue(&b->irq_work);
+}
+
 static void insert_breadcrumb(struct i915_request *rq)
 {
 	struct intel_breadcrumbs *b = READ_ONCE(rq->engine)->breadcrumbs;
@@ -380,9 +388,7 @@ static void insert_breadcrumb(struct i915_request *rq)
 	 * its signal completion.
 	 */
 	if (__request_completed(rq)) {
-		if (__signal_request(rq) &&
-		    llist_add(&rq->signal_node, &b->signaled_requests))
-			irq_work_queue(&b->irq_work);
+		irq_signal_request(rq, b);
 		return;
 	}
 
@@ -453,6 +459,7 @@ bool i915_request_enable_breadcrumb(struct i915_request *rq)
 
 void i915_request_cancel_breadcrumb(struct i915_request *rq)
 {
+	struct intel_breadcrumbs *b = READ_ONCE(rq->engine)->breadcrumbs;
 	struct intel_context *ce = rq->context;
 	bool release;
 
@@ -461,11 +468,16 @@ void i915_request_cancel_breadcrumb(struct i915_request *rq)
 
 	spin_lock(&ce->signal_lock);
 	list_del_rcu(&rq->signal_link);
-	release = remove_signaling_context(rq->engine->breadcrumbs, ce);
+	release = remove_signaling_context(b, ce);
 	spin_unlock(&ce->signal_lock);
 	if (release)
 		intel_context_put(ce);
 
+	if (__request_completed(rq)) {
+		irq_signal_request(rq, b);
+		return;
+	}
+
 	i915_request_put(rq);
 }
 
-- 
2.20.1


* [Intel-gfx] [PATCH 18/28] drm/i915/gt: Decouple completed requests on unwind
  2020-11-17 11:30 [Intel-gfx] [PATCH 01/28] drm/i915/selftests: Improve granularity for mocs reset checks Chris Wilson
                   ` (15 preceding siblings ...)
  2020-11-17 11:30 ` [Intel-gfx] [PATCH 17/28] drm/i915/gt: Move the breadcrumb to the signaler if completed upon cancel Chris Wilson
@ 2020-11-17 11:30 ` Chris Wilson
  2020-11-17 11:30 ` [Intel-gfx] [PATCH 19/28] drm/i915/gt: Check for a completed last request once Chris Wilson
                   ` (13 subsequent siblings)
  30 siblings, 0 replies; 52+ messages in thread
From: Chris Wilson @ 2020-11-17 11:30 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Since the introduction of preempt-to-busy, requests can complete in the
background, even while they are not on the engine->active.requests list.
As such, the engine->active.requests list itself is not in strict
retirement order, and we have to scan the entire list while unwinding to
not miss any. However, if the request is completed we currently leave it
on the list [until retirement], but we could just as easily remove it
and stop treating it as active. We then only have to traverse the list
once while unwinding in quick succession.
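
The detail being relied upon (see the i915_request_retire() hunk below)
is that list_del_init() re-initialises the node, so a later
list_empty() on the node itself answers "is this request still on the
engine's list?" without needing a separate flag -- a minimal
illustration:

	#include <linux/list.h>

	static void demo(void)
	{
		LIST_HEAD(queue);
		struct list_head node;

		INIT_LIST_HEAD(&node);   /* list_empty(&node) -> true */
		list_add(&node, &queue); /* queued: list_empty(&node) -> false */
		list_del_init(&node);    /* unqueued and re-initialised:
					  * list_empty(&node) -> true again */
	}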

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 6 ++++--
 drivers/gpu/drm/i915/i915_request.c | 3 ++-
 2 files changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index c11433884cf6..0bcbd734a11d 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1116,8 +1116,10 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
 	list_for_each_entry_safe_reverse(rq, rn,
 					 &engine->active.requests,
 					 sched.link) {
-		if (i915_request_completed(rq))
-			continue; /* XXX */
+		if (i915_request_completed(rq)) {
+			list_del_init(&rq->sched.link);
+			continue;
+		}
 
 		__i915_request_unsubmit(rq);
 
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index cebe07a85625..c3b7e8a0dae7 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -321,7 +321,8 @@ bool i915_request_retire(struct i915_request *rq)
 	 * after removing the breadcrumb and signaling it, so that we do not
 	 * inadvertently attach the breadcrumb to a completed request.
 	 */
-	remove_from_engine(rq);
+	if (!list_empty(&rq->sched.link))
+		remove_from_engine(rq);
 	GEM_BUG_ON(!llist_empty(&rq->execute_cb));
 
 	__list_del_entry(&rq->link); /* poison neither prev/next (RCU walks) */
-- 
2.20.1


* [Intel-gfx] [PATCH 19/28] drm/i915/gt: Check for a completed last request once
  2020-11-17 11:30 [Intel-gfx] [PATCH 01/28] drm/i915/selftests: Improve granularity for mocs reset checks Chris Wilson
                   ` (16 preceding siblings ...)
  2020-11-17 11:30 ` [Intel-gfx] [PATCH 18/28] drm/i915/gt: Decouple completed requests on unwind Chris Wilson
@ 2020-11-17 11:30 ` Chris Wilson
  2020-11-17 11:30 ` [Intel-gfx] [PATCH 20/28] drm/i915/gt: Replace direct submit with direct call to tasklet Chris Wilson
                   ` (12 subsequent siblings)
  30 siblings, 0 replies; 52+ messages in thread
From: Chris Wilson @ 2020-11-17 11:30 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Pull the repeated check for the last active request being completed to a
single spot, when deciding whether or not execlist preemption is
required.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 15 ++++-----------
 1 file changed, 4 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 0bcbd734a11d..7a78ef34ad65 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -2141,12 +2141,9 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	 */
 
 	if ((last = *active)) {
-		if (need_preempt(engine, last, rb)) {
-			if (i915_request_completed(last)) {
-				tasklet_hi_schedule(&execlists->tasklet);
-				return;
-			}
-
+		if (i915_request_completed(last)) {
+			goto check_secondary;
+		} else if (need_preempt(engine, last, rb)) {
 			ENGINE_TRACE(engine,
 				     "preempting last=%llx:%lld, prio=%d, hint=%d\n",
 				     last->fence.context,
@@ -2174,11 +2171,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 			last = NULL;
 		} else if (need_timeslice(engine, last, rb) &&
 			   timeslice_expired(execlists, last)) {
-			if (i915_request_completed(last)) {
-				tasklet_hi_schedule(&execlists->tasklet);
-				return;
-			}
-
 			ENGINE_TRACE(engine,
 				     "expired last=%llx:%lld, prio=%d, hint=%d, yield?=%s\n",
 				     last->fence.context,
@@ -2214,6 +2206,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 			 * we hopefully coalesce several updates into a single
 			 * submission.
 			 */
+check_secondary:
 			if (!list_is_last(&last->sched.link,
 					  &engine->active.requests)) {
 				/*
-- 
2.20.1


* [Intel-gfx] [PATCH 20/28] drm/i915/gt: Replace direct submit with direct call to tasklet
  2020-11-17 11:30 [Intel-gfx] [PATCH 01/28] drm/i915/selftests: Improve granularity for mocs reset checks Chris Wilson
                   ` (17 preceding siblings ...)
  2020-11-17 11:30 ` [Intel-gfx] [PATCH 19/28] drm/i915/gt: Check for a completed last request once Chris Wilson
@ 2020-11-17 11:30 ` Chris Wilson
  2020-11-17 11:30 ` [Intel-gfx] [PATCH 21/28] drm/i915/gt: ce->inflight updates are now serialised Chris Wilson
                   ` (11 subsequent siblings)
  30 siblings, 0 replies; 52+ messages in thread
From: Chris Wilson @ 2020-11-17 11:30 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Rather than having special-case code for opportunistically calling
process_csb() and performing a direct submit while holding the engine
spinlock for submitting the request, simply call the tasklet directly.
This allows us to retain the direct submission path, including the CS
draining to allow fast/immediate submissions, without requiring any
duplicated code paths, and most importantly it greatly simplifies the
control flow by removing reentrancy. This will enable us to close a few
races in the virtual engines in the next few patches.

The trickiest part here is to ensure that paired operations (such as
schedule_in/schedule_out) remain under consistent locking domains,
e.g. when pulled outside of the engine->active.lock.

v2: Use bh kicking, see commit 3c53776e29f8 ("Mark HI and TASKLET
softirq synchronous").
v3: Update engine-reset to be tasklet aware
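
The direct-submission shape that results is just the tasklet body
invoked synchronously with interrupts disabled; an illustrative sketch
(my_tasklet_func() is a stand-in -- the real code is
execlists_dequeue_irq() in this patch):

	static void my_tasklet_func(unsigned long data)
	{
		/* the body the tasklet would normally run in softirq */
	}

	static void submit_directly(unsigned long data)
	{
		local_irq_disable();	/* exclude the irq/tasklet path */
		my_tasklet_func(data);
		local_irq_enable();	/* lets any pending irq_work fire */
	}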

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/gt/intel_engine_cs.c     |  35 +++--
 drivers/gpu/drm/i915/gt/intel_engine_pm.c     |   2 +-
 drivers/gpu/drm/i915/gt/intel_engine_types.h  |   3 +-
 drivers/gpu/drm/i915/gt/intel_lrc.c           | 134 +++++++-----------
 drivers/gpu/drm/i915/gt/intel_reset.c         |  60 +++++---
 drivers/gpu/drm/i915/gt/intel_reset.h         |   2 +
 drivers/gpu/drm/i915/gt/selftest_context.c    |   2 +-
 drivers/gpu/drm/i915/gt/selftest_hangcheck.c  |   7 +-
 drivers/gpu/drm/i915/gt/selftest_lrc.c        |  27 ++--
 drivers/gpu/drm/i915/gt/selftest_reset.c      |   8 +-
 drivers/gpu/drm/i915/i915_request.c           |  12 +-
 drivers/gpu/drm/i915/i915_request.h           |   1 +
 drivers/gpu/drm/i915/i915_scheduler.c         |   4 -
 drivers/gpu/drm/i915/selftests/i915_request.c |   6 +-
 drivers/gpu/drm/i915/selftests/igt_spinner.c  |   3 +
 15 files changed, 158 insertions(+), 148 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
index c3bb2e9546e6..8ba731694a4f 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
@@ -1002,32 +1002,39 @@ static unsigned long stop_timeout(const struct intel_engine_cs *engine)
 	return READ_ONCE(engine->props.stop_timeout_ms);
 }
 
-int intel_engine_stop_cs(struct intel_engine_cs *engine)
+static int __intel_engine_stop_cs(struct intel_engine_cs *engine,
+				  int fast_timeout_us,
+				  int slow_timeout_ms)
 {
 	struct intel_uncore *uncore = engine->uncore;
-	const u32 base = engine->mmio_base;
-	const i915_reg_t mode = RING_MI_MODE(base);
+	const i915_reg_t mode = RING_MI_MODE(engine->mmio_base);
 	int err;
 
+	intel_uncore_write_fw(uncore, mode, _MASKED_BIT_ENABLE(STOP_RING));
+	err = __intel_wait_for_register_fw(engine->uncore, mode,
+					   MODE_IDLE, MODE_IDLE,
+					   fast_timeout_us,
+					   slow_timeout_ms,
+					   NULL);
+
+	/* A final mmio read to let GPU writes be hopefully flushed to memory */
+	intel_uncore_posting_read_fw(uncore, mode);
+	return err;
+}
+
+int intel_engine_stop_cs(struct intel_engine_cs *engine)
+{
+	int err = 0;
+
 	if (INTEL_GEN(engine->i915) < 3)
 		return -ENODEV;
 
 	ENGINE_TRACE(engine, "\n");
-
-	intel_uncore_write_fw(uncore, mode, _MASKED_BIT_ENABLE(STOP_RING));
-
-	err = 0;
-	if (__intel_wait_for_register_fw(uncore,
-					 mode, MODE_IDLE, MODE_IDLE,
-					 1000, stop_timeout(engine),
-					 NULL)) {
+	if (__intel_engine_stop_cs(engine, 1000, stop_timeout(engine))) {
 		ENGINE_TRACE(engine, "timed out on STOP_RING -> IDLE\n");
 		err = -ETIMEDOUT;
 	}
 
-	/* A final mmio read to let GPU writes be hopefully flushed to memory */
-	intel_uncore_posting_read_fw(uncore, mode);
-
 	return err;
 }
 
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_pm.c b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
index 499b09cb4acf..99574378047f 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_pm.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_pm.c
@@ -136,7 +136,7 @@ __queue_and_release_pm(struct i915_request *rq,
 		list_add_tail(&tl->link, &timelines->active_list);
 
 	/* Hand the request over to HW and so engine_retire() */
-	__i915_request_queue(rq, NULL);
+	__i915_request_queue_bh(rq);
 
 	/* Let new submissions commence (and maybe retire this timeline) */
 	__intel_wakeref_defer_park(&engine->wakeref);
diff --git a/drivers/gpu/drm/i915/gt/intel_engine_types.h b/drivers/gpu/drm/i915/gt/intel_engine_types.h
index ee6312601c56..e71eef157231 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_engine_types.h
@@ -183,7 +183,8 @@ struct intel_engine_execlists {
 	 * Reserve the upper 16b for tracking internal errors.
 	 */
 	u32 error_interrupt;
-#define ERROR_CSB BIT(31)
+#define ERROR_CSB	BIT(31)
+#define ERROR_PREEMPT	BIT(30)
 
 	/**
 	 * @reset_ccid: Active CCID [EXECLISTS_STATUS_HI] at the time of reset
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 7a78ef34ad65..469c347dc71f 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1378,8 +1378,7 @@ __execlists_schedule_in(struct i915_request *rq)
 	return engine;
 }
 
-static inline struct i915_request *
-execlists_schedule_in(struct i915_request *rq, int idx)
+static inline void execlists_schedule_in(struct i915_request *rq, int idx)
 {
 	struct intel_context * const ce = rq->context;
 	struct intel_engine_cs *old;
@@ -1396,7 +1395,6 @@ execlists_schedule_in(struct i915_request *rq, int idx)
 	} while (!try_cmpxchg(&ce->inflight, &old, ptr_inc(old)));
 
 	GEM_BUG_ON(intel_context_inflight(ce) != rq->engine);
-	return i915_request_get(rq);
 }
 
 static void kick_siblings(struct i915_request *rq, struct intel_context *ce)
@@ -2071,8 +2069,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	struct intel_engine_execlists * const execlists = &engine->execlists;
 	struct i915_request **port = execlists->pending;
 	struct i915_request ** const last_port = port + execlists->port_mask;
-	struct i915_request * const *active;
-	struct i915_request *last;
+	struct i915_request *last = *execlists->active;
 	struct rb_node *rb;
 	bool submit = false;
 
@@ -2098,6 +2095,8 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	 * and context switches) submission.
 	 */
 
+	spin_lock(&engine->active.lock);
+
 	for (rb = rb_first_cached(&execlists->virtual); rb; ) {
 		struct virtual_engine *ve =
 			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
@@ -2125,10 +2124,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	 * the active context to interject the preemption request,
 	 * i.e. we will retrigger preemption following the ack in case
 	 * of trouble.
-	 */
-	active = READ_ONCE(execlists->active);
-
-	/*
+	 *
 	 * In theory we can skip over completed contexts that have not
 	 * yet been processed by events (as those events are in flight):
 	 *
@@ -2140,7 +2136,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	 * completed and barf.
 	 */
 
-	if ((last = *active)) {
+	if (last) {
 		if (i915_request_completed(last)) {
 			goto check_secondary;
 		} else if (need_preempt(engine, last, rb)) {
@@ -2213,6 +2209,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 				 * Even if ELSP[1] is occupied and not worthy
 				 * of timeslices, our queue might be.
 				 */
+				spin_unlock(&engine->active.lock);
 				start_timeslice(engine, queue_prio(execlists));
 				return;
 			}
@@ -2248,6 +2245,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 
 			if (last && !can_merge_rq(last, rq)) {
 				spin_unlock(&ve->base.active.lock);
+				spin_unlock(&engine->active.lock);
 				start_timeslice(engine, rq_prio(rq));
 				return; /* leave this for another sibling */
 			}
@@ -2365,8 +2363,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 
 			if (__i915_request_submit(rq)) {
 				if (!merge) {
-					*port = execlists_schedule_in(last, port - execlists->pending);
-					port++;
+					*port++ = i915_request_get(last);
 					last = NULL;
 				}
 
@@ -2385,8 +2382,9 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		rb_erase_cached(&p->node, &execlists->queue);
 		i915_priolist_free(p);
 	}
-
 done:
+	*port++ = i915_request_get(last);
+
 	/*
 	 * Here be a bit of magic! Or sleight-of-hand, whichever you prefer.
 	 *
@@ -2404,36 +2402,43 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	 * interrupt for secondary ports).
 	 */
 	execlists->queue_priority_hint = queue_prio(execlists);
+	spin_unlock(&engine->active.lock);
 
 	if (submit) {
-		*port = execlists_schedule_in(last, port - execlists->pending);
-		execlists->switch_priority_hint =
-			switch_prio(engine, *execlists->pending);
-
 		/*
 		 * Skip if we ended up with exactly the same set of requests,
 		 * e.g. trying to timeslice a pair of ordered contexts
 		 */
-		if (!memcmp(active, execlists->pending,
-			    (port - execlists->pending + 1) * sizeof(*port))) {
-			do
-				execlists_schedule_out(fetch_and_zero(port));
-			while (port-- != execlists->pending);
-
+		if (!memcmp(execlists->active,
+			    execlists->pending,
+			    (port - execlists->pending) * sizeof(*port)))
 			goto skip_submit;
-		}
-		clear_ports(port + 1, last_port - port);
+
+		*port = NULL;
+		while (port-- != execlists->pending)
+			execlists_schedule_in(*port, port - execlists->pending);
+
+		execlists->switch_priority_hint =
+			switch_prio(engine, *execlists->pending);
 
 		WRITE_ONCE(execlists->yield, -1);
-		set_preempt_timeout(engine, *active);
+		set_preempt_timeout(engine, *execlists->active);
 		execlists_submit_ports(engine);
 	} else {
 		start_timeslice(engine, execlists->queue_priority_hint);
 skip_submit:
 		ring_set_paused(engine, 0);
+		*execlists->pending = NULL;
 	}
 }
 
+static void execlists_dequeue_irq(struct intel_engine_cs *engine)
+{
+	local_irq_disable(); /* Suspend interrupts across request submission */
+	execlists_dequeue(engine);
+	local_irq_enable(); /* flush irq_work (e.g. breadcrumb enabling) */
+}
+
 static void
 cancel_port_requests(struct intel_engine_execlists * const execlists)
 {
@@ -2771,16 +2776,6 @@ static void process_csb(struct intel_engine_cs *engine)
 	invalidate_csb_entries(&buf[0], &buf[num_entries - 1]);
 }
 
-static void __execlists_submission_tasklet(struct intel_engine_cs *const engine)
-{
-	lockdep_assert_held(&engine->active.lock);
-	if (!READ_ONCE(engine->execlists.pending[0])) {
-		rcu_read_lock(); /* protect peeking at execlists->active */
-		execlists_dequeue(engine);
-		rcu_read_unlock();
-	}
-}
-
 static void __execlists_hold(struct i915_request *rq)
 {
 	LIST_HEAD(list);
@@ -3169,7 +3164,7 @@ static bool preempt_timeout(const struct intel_engine_cs *const engine)
 	if (!timer_expired(t))
 		return false;
 
-	return READ_ONCE(engine->execlists.pending[0]);
+	return engine->execlists.pending[0];
 }
 
 /*
@@ -3179,10 +3174,12 @@ static bool preempt_timeout(const struct intel_engine_cs *const engine)
 static void execlists_submission_tasklet(unsigned long data)
 {
 	struct intel_engine_cs * const engine = (struct intel_engine_cs *)data;
-	bool timeout = preempt_timeout(engine);
 
 	process_csb(engine);
 
+	if (unlikely(preempt_timeout(engine)))
+		engine->execlists.error_interrupt |= ERROR_PREEMPT;
+
 	if (unlikely(READ_ONCE(engine->execlists.error_interrupt))) {
 		const char *msg;
 
@@ -3191,6 +3188,8 @@ static void execlists_submission_tasklet(unsigned long data)
 			msg = "CS error"; /* thrown by a user payload */
 		else if (engine->execlists.error_interrupt & ERROR_CSB)
 			msg = "invalid CSB event";
+		else if (engine->execlists.error_interrupt & ERROR_PREEMPT)
+			msg = "preemption time out";
 		else
 			msg = "internal error";
 
@@ -3198,17 +3197,8 @@ static void execlists_submission_tasklet(unsigned long data)
 		execlists_reset(engine, msg);
 	}
 
-	if (!READ_ONCE(engine->execlists.pending[0]) || timeout) {
-		unsigned long flags;
-
-		spin_lock_irqsave(&engine->active.lock, flags);
-		__execlists_submission_tasklet(engine);
-		spin_unlock_irqrestore(&engine->active.lock, flags);
-
-		/* Recheck after serialising with direct-submission */
-		if (unlikely(timeout && preempt_timeout(engine)))
-			execlists_reset(engine, "preemption time out");
-	}
+	if (!engine->execlists.pending[0])
+		execlists_dequeue_irq(engine);
 }
 
 static void __execlists_kick(struct intel_engine_execlists *execlists)
@@ -3239,26 +3229,16 @@ static void queue_request(struct intel_engine_cs *engine,
 	set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
 }
 
-static void __submit_queue_imm(struct intel_engine_cs *engine)
-{
-	struct intel_engine_execlists * const execlists = &engine->execlists;
-
-	if (reset_in_progress(execlists))
-		return; /* defer until we restart the engine following reset */
-
-	__execlists_submission_tasklet(engine);
-}
-
-static void submit_queue(struct intel_engine_cs *engine,
+static bool submit_queue(struct intel_engine_cs *engine,
 			 const struct i915_request *rq)
 {
 	struct intel_engine_execlists *execlists = &engine->execlists;
 
 	if (rq_prio(rq) <= execlists->queue_priority_hint)
-		return;
+		return false;
 
 	execlists->queue_priority_hint = rq_prio(rq);
-	__submit_queue_imm(engine);
+	return true;
 }
 
 static bool ancestor_on_hold(const struct intel_engine_cs *engine,
@@ -3268,25 +3248,11 @@ static bool ancestor_on_hold(const struct intel_engine_cs *engine,
 	return !list_empty(&engine->active.hold) && hold_request(rq);
 }
 
-static void flush_csb(struct intel_engine_cs *engine)
-{
-	struct intel_engine_execlists *el = &engine->execlists;
-
-	if (READ_ONCE(el->pending[0]) && tasklet_trylock(&el->tasklet)) {
-		if (!reset_in_progress(el))
-			process_csb(engine);
-		tasklet_unlock(&el->tasklet);
-	}
-}
-
 static void execlists_submit_request(struct i915_request *request)
 {
 	struct intel_engine_cs *engine = request->engine;
 	unsigned long flags;
 
-	/* Hopefully we clear execlists->pending[] to let us through */
-	flush_csb(engine);
-
 	/* Will be called from irq-context when using foreign fences. */
 	spin_lock_irqsave(&engine->active.lock, flags);
 
@@ -3300,7 +3266,8 @@ static void execlists_submit_request(struct i915_request *request)
 		GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
 		GEM_BUG_ON(list_empty(&request->sched.link));
 
-		submit_queue(engine, request);
+		if (submit_queue(engine, request))
+			__execlists_kick(&engine->execlists);
 	}
 
 	spin_unlock_irqrestore(&engine->active.lock, flags);
@@ -4214,7 +4181,6 @@ static int execlists_resume(struct intel_engine_cs *engine)
 static void execlists_reset_prepare(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
-	unsigned long flags;
 
 	ENGINE_TRACE(engine, "depth<-%d\n",
 		     atomic_read(&execlists->tasklet.count));
@@ -4231,10 +4197,6 @@ static void execlists_reset_prepare(struct intel_engine_cs *engine)
 	__tasklet_disable_sync_once(&execlists->tasklet);
 	GEM_BUG_ON(!reset_in_progress(execlists));
 
-	/* And flush any current direct submission. */
-	spin_lock_irqsave(&engine->active.lock, flags);
-	spin_unlock_irqrestore(&engine->active.lock, flags);
-
 	/*
 	 * We stop engines, otherwise we might get failed reset and a
 	 * dead gpu (on elk). Also as modern gpu as kbl can suffer
@@ -4479,12 +4441,12 @@ static void execlists_reset_finish(struct intel_engine_cs *engine)
 	 * to sleep before we restart and reload a context.
 	 */
 	GEM_BUG_ON(!reset_in_progress(execlists));
-	if (!RB_EMPTY_ROOT(&execlists->queue.rb_root))
-		execlists->tasklet.func(execlists->tasklet.data);
+	GEM_BUG_ON(engine->execlists.pending[0]);
 
+	/* And kick in case we missed a new request submission. */
 	if (__tasklet_enable(&execlists->tasklet))
-		/* And kick in case we missed a new request submission. */
-		tasklet_hi_schedule(&execlists->tasklet);
+		__execlists_kick(execlists);
+
 	ENGINE_TRACE(engine, "depth->%d\n",
 		     atomic_read(&execlists->tasklet.count));
 }
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.c b/drivers/gpu/drm/i915/gt/intel_reset.c
index 3654c955e6be..77997114b551 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.c
+++ b/drivers/gpu/drm/i915/gt/intel_reset.c
@@ -40,20 +40,19 @@ static void rmw_clear_fw(struct intel_uncore *uncore, i915_reg_t reg, u32 clr)
 	intel_uncore_rmw_fw(uncore, reg, clr, 0);
 }
 
-static void engine_skip_context(struct i915_request *rq)
+static void skip_context(struct i915_request *rq)
 {
-	struct intel_engine_cs *engine = rq->engine;
 	struct intel_context *hung_ctx = rq->context;
 
-	if (!i915_request_is_active(rq))
-		return;
+	list_for_each_entry_from_rcu(rq, &hung_ctx->timeline->requests, link) {
+		if (!i915_request_is_active(rq))
+			return;
 
-	lockdep_assert_held(&engine->active.lock);
-	list_for_each_entry_continue(rq, &engine->active.requests, sched.link)
 		if (rq->context == hung_ctx) {
 			i915_request_set_error_once(rq, -EIO);
 			__i915_request_skip(rq);
 		}
+	}
 }
 
 static void client_mark_guilty(struct i915_gem_context *ctx, bool banned)
@@ -160,7 +159,7 @@ void __i915_request_reset(struct i915_request *rq, bool guilty)
 		i915_request_set_error_once(rq, -EIO);
 		__i915_request_skip(rq);
 		if (mark_guilty(rq))
-			engine_skip_context(rq);
+			skip_context(rq);
 	} else {
 		i915_request_set_error_once(rq, -EAGAIN);
 		mark_innocent(rq);
@@ -754,8 +753,10 @@ static int gt_reset(struct intel_gt *gt, intel_engine_mask_t stalled_mask)
 	if (err)
 		return err;
 
+	local_bh_disable();
 	for_each_engine(engine, gt, id)
 		__intel_engine_reset(engine, stalled_mask & engine->mask);
+	local_bh_enable();
 
 	intel_ggtt_restore_fences(gt->ggtt);
 
@@ -833,9 +834,11 @@ static void __intel_gt_set_wedged(struct intel_gt *gt)
 	set_bit(I915_WEDGED, &gt->reset.flags);
 
 	/* Mark all executing requests as skipped */
+	local_bh_disable();
 	for_each_engine(engine, gt, id)
 		if (engine->reset.cancel)
 			engine->reset.cancel(engine);
+	local_bh_enable();
 
 	reset_finish(gt, awake);
 
@@ -1110,20 +1113,7 @@ static inline int intel_gt_reset_engine(struct intel_engine_cs *engine)
 	return __intel_gt_reset(engine->gt, engine->mask);
 }
 
-/**
- * intel_engine_reset - reset GPU engine to recover from a hang
- * @engine: engine to reset
- * @msg: reason for GPU reset; or NULL for no drm_notice()
- *
- * Reset a specific GPU engine. Useful if a hang is detected.
- * Returns zero on successful reset or otherwise an error code.
- *
- * Procedure is:
- *  - identifies the request that caused the hang and it is dropped
- *  - reset engine (which will force the engine to idle)
- *  - re-init/configure engine
- */
-int intel_engine_reset(struct intel_engine_cs *engine, const char *msg)
+int __intel_engine_reset_bh(struct intel_engine_cs *engine, const char *msg)
 {
 	struct intel_gt *gt = engine->gt;
 	bool uses_guc = intel_engine_in_guc_submission_mode(engine);
@@ -1174,6 +1164,30 @@ int intel_engine_reset(struct intel_engine_cs *engine, const char *msg)
 	return ret;
 }
 
+/**
+ * intel_engine_reset - reset GPU engine to recover from a hang
+ * @engine: engine to reset
+ * @msg: reason for GPU reset; or NULL for no drm_notice()
+ *
+ * Reset a specific GPU engine. Useful if a hang is detected.
+ * Returns zero on successful reset or otherwise an error code.
+ *
+ * Procedure is:
+ *  - identifies the request that caused the hang and it is dropped
+ *  - reset engine (which will force the engine to idle)
+ *  - re-init/configure engine
+ */
+int intel_engine_reset(struct intel_engine_cs *engine, const char *msg)
+{
+	int err;
+
+	local_bh_disable();
+	err = __intel_engine_reset_bh(engine, msg);
+	local_bh_enable();
+
+	return err;
+}
+
 static void intel_gt_reset_global(struct intel_gt *gt,
 				  u32 engine_mask,
 				  const char *reason)
@@ -1260,18 +1274,20 @@ void intel_gt_handle_error(struct intel_gt *gt,
 	 * single reset fails.
 	 */
 	if (intel_has_reset_engine(gt) && !intel_gt_is_wedged(gt)) {
+		local_bh_disable();
 		for_each_engine_masked(engine, gt, engine_mask, tmp) {
 			BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE);
 			if (test_and_set_bit(I915_RESET_ENGINE + engine->id,
 					     &gt->reset.flags))
 				continue;
 
-			if (intel_engine_reset(engine, msg) == 0)
+			if (__intel_engine_reset_bh(engine, msg) == 0)
 				engine_mask &= ~engine->mask;
 
 			clear_and_wake_up_bit(I915_RESET_ENGINE + engine->id,
 					      &gt->reset.flags);
 		}
+		local_bh_enable();
 	}
 
 	if (!engine_mask)
diff --git a/drivers/gpu/drm/i915/gt/intel_reset.h b/drivers/gpu/drm/i915/gt/intel_reset.h
index a0eec7c11c0c..7dbf5cc8a333 100644
--- a/drivers/gpu/drm/i915/gt/intel_reset.h
+++ b/drivers/gpu/drm/i915/gt/intel_reset.h
@@ -34,6 +34,8 @@ void intel_gt_reset(struct intel_gt *gt,
 		    const char *reason);
 int intel_engine_reset(struct intel_engine_cs *engine,
 		       const char *reason);
+int __intel_engine_reset_bh(struct intel_engine_cs *engine,
+			    const char *reason);
 
 void __i915_request_reset(struct i915_request *rq, bool guilty);
 
diff --git a/drivers/gpu/drm/i915/gt/selftest_context.c b/drivers/gpu/drm/i915/gt/selftest_context.c
index 1f4020e906a8..db738d400168 100644
--- a/drivers/gpu/drm/i915/gt/selftest_context.c
+++ b/drivers/gpu/drm/i915/gt/selftest_context.c
@@ -25,7 +25,7 @@ static int request_sync(struct i915_request *rq)
 	/* Opencode i915_request_add() so we can keep the timeline locked. */
 	__i915_request_commit(rq);
 	rq->sched.attr.priority = I915_PRIORITY_BARRIER;
-	__i915_request_queue(rq, NULL);
+	__i915_request_queue_bh(rq);
 
 	timeout = i915_request_wait(rq, 0, HZ / 10);
 	if (timeout < 0)
diff --git a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
index fb5ebf930ab2..c28d1fcad673 100644
--- a/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
+++ b/drivers/gpu/drm/i915/gt/selftest_hangcheck.c
@@ -1576,12 +1576,17 @@ static int __igt_atomic_reset_engine(struct intel_engine_cs *engine,
 		  engine->name, mode, p->name);
 
 	tasklet_disable(t);
+	if (strcmp(p->name, "softirq"))
+		local_bh_disable();
 	p->critical_section_begin();
 
-	err = intel_engine_reset(engine, NULL);
+	err = __intel_engine_reset_bh(engine, NULL);
 
 	p->critical_section_end();
+	if (strcmp(p->name, "softirq"))
+		local_bh_enable();
 	tasklet_enable(t);
+	tasklet_hi_schedule(t);
 
 	if (err)
 		pr_err("i915_reset_engine(%s:%s) failed under %s\n",
diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c
index 95d41c01d0e0..37cb51c3f4f6 100644
--- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
+++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
@@ -623,8 +623,10 @@ static int live_hold_reset(void *arg)
 
 		/* We have our request executing, now remove it and reset */
 
+		local_bh_disable();
 		if (test_and_set_bit(I915_RESET_ENGINE + id,
 				     &gt->reset.flags)) {
+			local_bh_enable();
 			intel_gt_set_wedged(gt);
 			err = -EBUSY;
 			goto out;
@@ -638,12 +640,13 @@ static int live_hold_reset(void *arg)
 		execlists_hold(engine, rq);
 		GEM_BUG_ON(!i915_request_on_hold(rq));
 
-		intel_engine_reset(engine, NULL);
+		__intel_engine_reset_bh(engine, NULL);
 		GEM_BUG_ON(rq->fence.error != -EIO);
 
 		tasklet_enable(&engine->execlists.tasklet);
 		clear_and_wake_up_bit(I915_RESET_ENGINE + id,
 				      &gt->reset.flags);
+		local_bh_enable();
 
 		/* Check that we do not resubmit the held request */
 		if (!i915_request_wait(rq, 0, HZ / 5)) {
@@ -4569,8 +4572,10 @@ static int reset_virtual_engine(struct intel_gt *gt,
 	GEM_BUG_ON(engine == ve->engine);
 
 	/* Take ownership of the reset and tasklet */
+	local_bh_disable();
 	if (test_and_set_bit(I915_RESET_ENGINE + engine->id,
 			     &gt->reset.flags)) {
+		local_bh_enable();
 		intel_gt_set_wedged(gt);
 		err = -EBUSY;
 		goto out_heartbeat;
@@ -4590,12 +4595,13 @@ static int reset_virtual_engine(struct intel_gt *gt,
 	execlists_hold(engine, rq);
 	GEM_BUG_ON(!i915_request_on_hold(rq));
 
-	intel_engine_reset(engine, NULL);
+	__intel_engine_reset_bh(engine, NULL);
 	GEM_BUG_ON(rq->fence.error != -EIO);
 
 	/* Release our grasp on the engine, letting CS flow again */
 	tasklet_enable(&engine->execlists.tasklet);
 	clear_and_wake_up_bit(I915_RESET_ENGINE + engine->id, &gt->reset.flags);
+	local_bh_enable();
 
 	/* Check that we do not resubmit the held request */
 	i915_request_get(rq);
@@ -6242,16 +6248,17 @@ static void garbage_reset(struct intel_engine_cs *engine,
 	const unsigned int bit = I915_RESET_ENGINE + engine->id;
 	unsigned long *lock = &engine->gt->reset.flags;
 
-	if (test_and_set_bit(bit, lock))
-		return;
-
-	tasklet_disable(&engine->execlists.tasklet);
+	local_bh_disable();
+	if (!test_and_set_bit(bit, lock)) {
+		tasklet_disable(&engine->execlists.tasklet);
 
-	if (!rq->fence.error)
-		intel_engine_reset(engine, NULL);
+		if (!rq->fence.error)
+			__intel_engine_reset_bh(engine, NULL);
 
-	tasklet_enable(&engine->execlists.tasklet);
-	clear_and_wake_up_bit(bit, lock);
+		tasklet_enable(&engine->execlists.tasklet);
+		clear_and_wake_up_bit(bit, lock);
+	}
+	local_bh_enable();
 }
 
 static struct i915_request *garbage(struct intel_context *ce,
diff --git a/drivers/gpu/drm/i915/gt/selftest_reset.c b/drivers/gpu/drm/i915/gt/selftest_reset.c
index ef5aeebbeeb0..4dbd5bc840c3 100644
--- a/drivers/gpu/drm/i915/gt/selftest_reset.c
+++ b/drivers/gpu/drm/i915/gt/selftest_reset.c
@@ -326,11 +326,16 @@ static int igt_atomic_engine_reset(void *arg)
 		for (p = igt_atomic_phases; p->name; p++) {
 			GEM_TRACE("intel_engine_reset(%s) under %s\n",
 				  engine->name, p->name);
+			if (strcmp(p->name, "softirq"))
+				local_bh_disable();
 
 			p->critical_section_begin();
-			err = intel_engine_reset(engine, NULL);
+			err = __intel_engine_reset_bh(engine, NULL);
 			p->critical_section_end();
 
+			if (strcmp(p->name, "softirq"))
+				local_bh_enable();
+
 			if (err) {
 				pr_err("intel_engine_reset(%s) failed under %s\n",
 				       engine->name, p->name);
@@ -340,6 +345,7 @@ static int igt_atomic_engine_reset(void *arg)
 
 		intel_engine_pm_put(engine);
 		tasklet_enable(&engine->execlists.tasklet);
+		tasklet_hi_schedule(&engine->execlists.tasklet);
 		if (err)
 			break;
 	}
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index c3b7e8a0dae7..05e0242de6b4 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1583,6 +1583,12 @@ struct i915_request *__i915_request_commit(struct i915_request *rq)
 	return __i915_request_add_to_timeline(rq);
 }
 
+void __i915_request_queue_bh(struct i915_request *rq)
+{
+	i915_sw_fence_commit(&rq->semaphore);
+	i915_sw_fence_commit(&rq->submit);
+}
+
 void __i915_request_queue(struct i915_request *rq,
 			  const struct i915_sched_attr *attr)
 {
@@ -1599,8 +1605,10 @@ void __i915_request_queue(struct i915_request *rq,
 	 */
 	if (attr && rq->engine->schedule)
 		rq->engine->schedule(rq, attr);
-	i915_sw_fence_commit(&rq->semaphore);
-	i915_sw_fence_commit(&rq->submit);
+
+	local_bh_disable();
+	__i915_request_queue_bh(rq);
+	local_bh_enable(); /* kick tasklets */
 }
 
 void i915_request_add(struct i915_request *rq)
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 26a19b84a586..fd36ba3a4862 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -315,6 +315,7 @@ void __i915_request_skip(struct i915_request *rq);
 struct i915_request *__i915_request_commit(struct i915_request *request);
 void __i915_request_queue(struct i915_request *rq,
 			  const struct i915_sched_attr *attr);
+void __i915_request_queue_bh(struct i915_request *rq);
 
 bool i915_request_retire(struct i915_request *rq);
 void i915_request_retire_upto(struct i915_request *rq);
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 8837ba672933..b217d40eedd1 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -458,14 +458,10 @@ int i915_sched_node_add_dependency(struct i915_sched_node *node,
 	if (!dep)
 		return -ENOMEM;
 
-	local_bh_disable();
-
 	if (!__i915_sched_node_add_dependency(node, signal, dep,
 					      flags | I915_DEPENDENCY_ALLOC))
 		i915_dependency_free(dep);
 
-	local_bh_enable(); /* kick submission tasklet */
-
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
index e424a6d1a68c..b8c5920d1ff3 100644
--- a/drivers/gpu/drm/i915/selftests/i915_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_request.c
@@ -1932,9 +1932,7 @@ static int measure_inter_request(struct intel_context *ce)
 		intel_ring_advance(rq, cs);
 		i915_request_add(rq);
 	}
-	local_bh_disable();
 	i915_sw_fence_commit(submit);
-	local_bh_enable();
 	intel_engine_flush_submission(ce->engine);
 	heap_fence_put(submit);
 
@@ -2220,11 +2218,9 @@ static int measure_completion(struct intel_context *ce)
 		intel_ring_advance(rq, cs);
 
 		dma_fence_add_callback(&rq->fence, &cb.base, signal_cb);
-
-		local_bh_disable();
 		i915_request_add(rq);
-		local_bh_enable();
 
+		intel_engine_flush_submission(ce->engine);
 		if (wait_for(READ_ONCE(sema[i]) == -1, 50)) {
 			err = -EIO;
 			goto err;
diff --git a/drivers/gpu/drm/i915/selftests/igt_spinner.c b/drivers/gpu/drm/i915/selftests/igt_spinner.c
index ec0ecb4e4ca6..e09ce8067b9c 100644
--- a/drivers/gpu/drm/i915/selftests/igt_spinner.c
+++ b/drivers/gpu/drm/i915/selftests/igt_spinner.c
@@ -219,6 +219,9 @@ void igt_spinner_fini(struct igt_spinner *spin)
 
 bool igt_wait_for_spinner(struct igt_spinner *spin, struct i915_request *rq)
 {
+	if (i915_request_is_ready(rq))
+		intel_engine_flush_submission(rq->engine);
+
 	return !(wait_for_us(i915_seqno_passed(hws_seqno(spin, rq),
 					       rq->fence.seqno),
 			     100) &&
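
The local_bh_disable()/local_bh_enable() pairs threaded through this
patch all rely on the same property: a tasklet kicked while softirqs
are disabled is deferred and coalesced, then run once on enable. A
minimal userspace model of that batching behaviour (hypothetical
names, not the kernel API):

#include <stdio.h>

static int bh_disabled;		/* models the softirq-disable count */
static int tasklet_pending;	/* models tasklet_hi_schedule() */

static void run_tasklet(void)
{
	printf("tasklet ran\n");
}

static void kick_tasklet(void)
{
	if (bh_disabled)
		tasklet_pending = 1;	/* deferred until enable */
	else
		run_tasklet();		/* runs immediately */
}

static void bh_disable(void)
{
	bh_disabled++;
}

static void bh_enable(void)
{
	if (!--bh_disabled && tasklet_pending) {
		tasklet_pending = 0;
		run_tasklet();
	}
}

int main(void)
{
	bh_disable();
	kick_tasklet();	/* queued, not run */
	kick_tasklet();	/* coalesced with the first kick */
	bh_enable();	/* one tasklet run for the whole batch */
	return 0;
}
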
-- 
2.20.1


* [Intel-gfx] [PATCH 21/28] drm/i915/gt: ce->inflight updates are now serialised
  2020-11-17 11:30 [Intel-gfx] [PATCH 01/28] drm/i915/selftests: Improve granularity for mocs reset checks Chris Wilson
                   ` (18 preceding siblings ...)
  2020-11-17 11:30 ` [Intel-gfx] [PATCH 20/28] drm/i915/gt: Replace direct submit with direct call to tasklet Chris Wilson
@ 2020-11-17 11:30 ` Chris Wilson
  2020-11-17 11:30 ` [Intel-gfx] [PATCH 22/28] drm/i915/gt: Use virtual_engine during execlists_dequeue Chris Wilson
                   ` (10 subsequent siblings)
  30 siblings, 0 replies; 52+ messages in thread
From: Chris Wilson @ 2020-11-17 11:30 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Since schedule-in and schedule-out are now both always under the tasklet
bitlock, we can reduce the individual atomic operations to simple
instructions and worry less.

This notably eliminates the race observed with intel_context_inflight in
__engine_unpark().
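
The transformation is mechanical: once every writer runs under the
tasklet bitlock, a try_cmpxchg() retry loop collapses into a load and
a store, keeping just enough atomicity for lockless readers. A hedged
userspace sketch of the two forms in C11 atomics (illustrative only,
not the driver code):

#include <stdatomic.h>
#include <stdio.h>

static _Atomic unsigned long inflight;

/* Unserialised form: retry loop tolerating concurrent writers. */
static void inc_lockfree(void)
{
	unsigned long old = atomic_load(&inflight);

	while (!atomic_compare_exchange_weak(&inflight, &old, old + 1))
		;
}

/* Serialised form: an outer lock guarantees a single writer, so a
 * plain read-modify-write suffices; the single store still lets
 * lockless readers observe a coherent value. */
static void inc_serialised(void)
{
	unsigned long v =
		atomic_load_explicit(&inflight, memory_order_relaxed);

	atomic_store_explicit(&inflight, v + 1, memory_order_relaxed);
}

int main(void)
{
	inc_lockfree();
	inc_serialised();
	printf("inflight = %lu\n", atomic_load(&inflight));
	return 0;
}
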

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2583
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 52 ++++++++++++++---------------
 1 file changed, 25 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 469c347dc71f..9fa2dc2699c8 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1358,11 +1358,11 @@ __execlists_schedule_in(struct i915_request *rq)
 		ce->lrc.ccid = ce->tag;
 	} else {
 		/* We don't need a strict matching tag, just different values */
-		unsigned int tag = ffs(READ_ONCE(engine->context_tag));
+		unsigned int tag = __ffs(engine->context_tag);
 
-		GEM_BUG_ON(tag == 0 || tag >= BITS_PER_LONG);
-		clear_bit(tag - 1, &engine->context_tag);
-		ce->lrc.ccid = tag << (GEN11_SW_CTX_ID_SHIFT - 32);
+		GEM_BUG_ON(tag >= BITS_PER_LONG);
+		__clear_bit(tag, &engine->context_tag);
+		ce->lrc.ccid = (1 + tag) << (GEN11_SW_CTX_ID_SHIFT - 32);
 
 		BUILD_BUG_ON(BITS_PER_LONG > GEN12_MAX_CONTEXT_HW_ID);
 	}
@@ -1375,6 +1375,8 @@ __execlists_schedule_in(struct i915_request *rq)
 	execlists_context_status_change(rq, INTEL_CONTEXT_SCHEDULE_IN);
 	intel_engine_context_in(engine);
 
+	CE_TRACE(ce, "schedule-in, ccid:%x\n", ce->lrc.ccid);
+
 	return engine;
 }
 
@@ -1386,13 +1388,10 @@ static inline void execlists_schedule_in(struct i915_request *rq, int idx)
 	GEM_BUG_ON(!intel_engine_pm_is_awake(rq->engine));
 	trace_i915_request_in(rq, idx);
 
-	old = READ_ONCE(ce->inflight);
-	do {
-		if (!old) {
-			WRITE_ONCE(ce->inflight, __execlists_schedule_in(rq));
-			break;
-		}
-	} while (!try_cmpxchg(&ce->inflight, &old, ptr_inc(old)));
+	old = ce->inflight;
+	if (!old)
+		old = __execlists_schedule_in(rq);
+	WRITE_ONCE(ce->inflight, ptr_inc(old));
 
 	GEM_BUG_ON(intel_context_inflight(ce) != rq->engine);
 }
@@ -1406,12 +1405,11 @@ static void kick_siblings(struct i915_request *rq, struct intel_context *ce)
 		tasklet_hi_schedule(&ve->base.execlists.tasklet);
 }
 
-static inline void
-__execlists_schedule_out(struct i915_request *rq,
-			 struct intel_engine_cs * const engine,
-			 unsigned int ccid)
+static inline void __execlists_schedule_out(struct i915_request *rq)
 {
 	struct intel_context * const ce = rq->context;
+	struct intel_engine_cs * const engine = rq->engine;
+	unsigned int ccid;
 
 	/*
 	 * NB process_csb() is not under the engine->active.lock and hence
@@ -1419,6 +1417,8 @@ __execlists_schedule_out(struct i915_request *rq,
 	 * refrain from doing non-trivial work here.
 	 */
 
+	CE_TRACE(ce, "schedule-out, ccid:%x\n", ce->lrc.ccid);
+
 	if (IS_ENABLED(CONFIG_DRM_I915_DEBUG_GEM))
 		execlists_check_context(ce, engine, "after");
 
@@ -1430,12 +1430,13 @@ __execlists_schedule_out(struct i915_request *rq,
 	    i915_request_completed(rq))
 		intel_engine_add_retire(engine, ce->timeline);
 
+	ccid = ce->lrc.ccid;
 	ccid >>= GEN11_SW_CTX_ID_SHIFT - 32;
 	ccid &= GEN12_MAX_CONTEXT_HW_ID;
 	if (ccid < BITS_PER_LONG) {
 		GEM_BUG_ON(ccid == 0);
 		GEM_BUG_ON(test_bit(ccid - 1, &engine->context_tag));
-		set_bit(ccid - 1, &engine->context_tag);
+		__set_bit(ccid - 1, &engine->context_tag);
 	}
 
 	intel_context_update_runtime(ce);
@@ -1456,26 +1457,23 @@ __execlists_schedule_out(struct i915_request *rq,
 	 */
 	if (ce->engine != engine)
 		kick_siblings(rq, ce);
-
-	intel_context_put(ce);
 }
 
 static inline void
 execlists_schedule_out(struct i915_request *rq)
 {
 	struct intel_context * const ce = rq->context;
-	struct intel_engine_cs *cur, *old;
-	u32 ccid;
 
 	trace_i915_request_out(rq);
 
-	ccid = rq->context->lrc.ccid;
-	old = READ_ONCE(ce->inflight);
-	do
-		cur = ptr_unmask_bits(old, 2) ? ptr_dec(old) : NULL;
-	while (!try_cmpxchg(&ce->inflight, &old, cur));
-	if (!cur)
-		__execlists_schedule_out(rq, old, ccid);
+	GEM_BUG_ON(!ce->inflight);
+	ce->inflight = ptr_dec(ce->inflight);
+	if (!intel_context_inflight_count(ce)) {
+		GEM_BUG_ON(ce->inflight != rq->engine);
+		__execlists_schedule_out(rq);
+		WRITE_ONCE(ce->inflight, NULL);
+		intel_context_put(ce);
+	}
 
 	i915_request_put(rq);
 }
-- 
2.20.1


* [Intel-gfx] [PATCH 22/28] drm/i915/gt: Use virtual_engine during execlists_dequeue
  2020-11-17 11:30 [Intel-gfx] [PATCH 01/28] drm/i915/selftests: Improve granularity for mocs reset checks Chris Wilson
                   ` (19 preceding siblings ...)
  2020-11-17 11:30 ` [Intel-gfx] [PATCH 21/28] drm/i915/gt: ce->inflight updates are now serialised Chris Wilson
@ 2020-11-17 11:30 ` Chris Wilson
  2020-11-17 11:30 ` [Intel-gfx] [PATCH 23/28] drm/i915/gt: Decouple inflight virtual engines Chris Wilson
                   ` (9 subsequent siblings)
  30 siblings, 0 replies; 52+ messages in thread
From: Chris Wilson @ 2020-11-17 11:30 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Rather than going back and forth between the rb_node entry and the
virtual_engine type, store ve in a local variable and reuse it. As the
container_of conversion from rb_node to virtual_engine requires a
variable offset, performing that conversion just once shaves off a bit
of code.
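
The "variable offset" point is worth spelling out: rb_entry() here is
container_of() against nodes[engine->id].rb, so the subtraction
depends on a runtime value and is re-emitted at every conversion
site. A standalone sketch of that conversion (simplified stand-in
types, not the driver's):

#include <stddef.h>
#include <stdio.h>

struct node { int colour; };	/* stand-in for struct rb_node */

struct virtual_engine {
	const char *name;
	struct node nodes[4];	/* one tree node per sibling */
};

/* Convert an embedded node back to its container. The member offset
 * depends on the runtime engine id, costing a multiply and subtract
 * per use, which is why the patch hoists it into one helper. */
static struct virtual_engine *node_to_ve(struct node *rb, int id)
{
	return (struct virtual_engine *)
		((char *)rb - offsetof(struct virtual_engine, nodes) -
		 (size_t)id * sizeof(struct node));
}

int main(void)
{
	struct virtual_engine ve = { .name = "vecs0" };
	int id = 2;				/* engine->id */
	struct node *rb = &ve.nodes[id];	/* from rb_first_cached() */

	printf("%s\n", node_to_ve(rb, id)->name);
	return 0;
}
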

v2: Keep a single virtual engine lookup, for typical use.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 239 ++++++++++++----------------
 1 file changed, 105 insertions(+), 134 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 9fa2dc2699c8..760449d03462 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -454,9 +454,15 @@ static int queue_prio(const struct intel_engine_execlists *execlists)
 	return ((p->priority + 1) << I915_USER_PRIORITY_SHIFT) - ffs(p->used);
 }
 
+static int virtual_prio(const struct intel_engine_execlists *el)
+{
+	struct rb_node *rb = rb_first_cached(&el->virtual);
+
+	return rb ? rb_entry(rb, struct ve_node, rb)->prio : INT_MIN;
+}
+
 static inline bool need_preempt(const struct intel_engine_cs *engine,
-				const struct i915_request *rq,
-				struct rb_node *rb)
+				const struct i915_request *rq)
 {
 	int last_prio;
 
@@ -493,25 +499,6 @@ static inline bool need_preempt(const struct intel_engine_cs *engine,
 	    rq_prio(list_next_entry(rq, sched.link)) > last_prio)
 		return true;
 
-	if (rb) {
-		struct virtual_engine *ve =
-			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
-		bool preempt = false;
-
-		if (engine == ve->siblings[0]) { /* only preempt one sibling */
-			struct i915_request *next;
-
-			rcu_read_lock();
-			next = READ_ONCE(ve->request);
-			if (next)
-				preempt = rq_prio(next) > last_prio;
-			rcu_read_unlock();
-		}
-
-		if (preempt)
-			return preempt;
-	}
-
 	/*
 	 * If the inflight context did not trigger the preemption, then maybe
 	 * it was the set of queued requests? Pick the highest priority in
@@ -522,7 +509,8 @@ static inline bool need_preempt(const struct intel_engine_cs *engine,
 	 * ELSP[0] or ELSP[1] as, thanks again to PI, if it was the same
 	 * context, it's priority would not exceed ELSP[0] aka last_prio.
 	 */
-	return queue_prio(&engine->execlists) > last_prio;
+	return max(virtual_prio(&engine->execlists),
+		   queue_prio(&engine->execlists)) > last_prio;
 }
 
 __maybe_unused static inline bool
@@ -1808,6 +1796,35 @@ static bool virtual_matches(const struct virtual_engine *ve,
 	return true;
 }
 
+static struct virtual_engine *
+first_virtual_engine(struct intel_engine_cs *engine)
+{
+	struct intel_engine_execlists *el = &engine->execlists;
+	struct rb_node *rb = rb_first_cached(&el->virtual);
+
+	while (rb) {
+		struct virtual_engine *ve =
+			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
+		struct i915_request *rq = READ_ONCE(ve->request);
+
+		/* lazily cleanup after another engine handled rq */
+		if (!rq) {
+			rb_erase_cached(rb, &el->virtual);
+			RB_CLEAR_NODE(rb);
+			rb = rb_first_cached(&el->virtual);
+			continue;
+		}
+
+		if (!virtual_matches(ve, rq, engine)) {
+			rb = rb_next(rb);
+			continue;
+		}
+		return ve;
+	}
+
+	return NULL;
+}
+
 static void virtual_xfer_context(struct virtual_engine *ve,
 				 struct intel_engine_cs *engine)
 {
@@ -1896,32 +1913,15 @@ static void defer_active(struct intel_engine_cs *engine)
 
 static bool
 need_timeslice(const struct intel_engine_cs *engine,
-	       const struct i915_request *rq,
-	       const struct rb_node *rb)
+	       const struct i915_request *rq)
 {
 	int hint;
 
 	if (!intel_engine_has_timeslices(engine))
 		return false;
 
-	hint = engine->execlists.queue_priority_hint;
-
-	if (rb) {
-		const struct virtual_engine *ve =
-			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
-		const struct intel_engine_cs *inflight =
-			intel_context_inflight(&ve->context);
-
-		if (!inflight || inflight == engine) {
-			struct i915_request *next;
-
-			rcu_read_lock();
-			next = READ_ONCE(ve->request);
-			if (next)
-				hint = max(hint, rq_prio(next));
-			rcu_read_unlock();
-		}
-	}
+	hint = max(engine->execlists.queue_priority_hint,
+		   virtual_prio(&engine->execlists));
 
 	if (!list_is_last(&rq->sched.link, &engine->active.requests))
 		hint = max(hint, rq_prio(list_next_entry(rq, sched.link)));
@@ -2068,6 +2068,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	struct i915_request **port = execlists->pending;
 	struct i915_request ** const last_port = port + execlists->port_mask;
 	struct i915_request *last = *execlists->active;
+	struct virtual_engine *ve;
 	struct rb_node *rb;
 	bool submit = false;
 
@@ -2095,26 +2096,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 
 	spin_lock(&engine->active.lock);
 
-	for (rb = rb_first_cached(&execlists->virtual); rb; ) {
-		struct virtual_engine *ve =
-			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
-		struct i915_request *rq = READ_ONCE(ve->request);
-
-		if (!rq) { /* lazily cleanup after another engine handled rq */
-			rb_erase_cached(rb, &execlists->virtual);
-			RB_CLEAR_NODE(rb);
-			rb = rb_first_cached(&execlists->virtual);
-			continue;
-		}
-
-		if (!virtual_matches(ve, rq, engine)) {
-			rb = rb_next(rb);
-			continue;
-		}
-
-		break;
-	}
-
 	/*
 	 * If the queue is higher priority than the last
 	 * request in the currently active context, submit afresh.
@@ -2137,7 +2118,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	if (last) {
 		if (i915_request_completed(last)) {
 			goto check_secondary;
-		} else if (need_preempt(engine, last, rb)) {
+		} else if (need_preempt(engine, last)) {
 			ENGINE_TRACE(engine,
 				     "preempting last=%llx:%lld, prio=%d, hint=%d\n",
 				     last->fence.context,
@@ -2163,7 +2144,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 			__unwind_incomplete_requests(engine);
 
 			last = NULL;
-		} else if (need_timeslice(engine, last, rb) &&
+		} else if (need_timeslice(engine, last) &&
 			   timeslice_expired(execlists, last)) {
 			ENGINE_TRACE(engine,
 				     "expired last=%llx:%lld, prio=%d, hint=%d, yield?=%s\n",
@@ -2214,96 +2195,86 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		}
 	}
 
-	while (rb) { /* XXX virtual is always taking precedence */
-		struct virtual_engine *ve =
-			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
+	/* XXX virtual is always taking precedence */
+	while ((ve = first_virtual_engine(engine))) {
 		struct i915_request *rq;
 
 		spin_lock(&ve->base.active.lock);
 
 		rq = ve->request;
-		if (unlikely(!rq)) { /* lost the race to a sibling */
-			spin_unlock(&ve->base.active.lock);
-			rb_erase_cached(rb, &execlists->virtual);
-			RB_CLEAR_NODE(rb);
-			rb = rb_first_cached(&execlists->virtual);
-			continue;
-		}
+		if (unlikely(!rq)) /* lost the race to a sibling */
+			goto unlock;
 
-		GEM_BUG_ON(rq != ve->request);
 		GEM_BUG_ON(rq->engine != &ve->base);
 		GEM_BUG_ON(rq->context != &ve->context);
 
-		if (rq_prio(rq) >= queue_prio(execlists)) {
-			if (!virtual_matches(ve, rq, engine)) {
-				spin_unlock(&ve->base.active.lock);
-				rb = rb_next(rb);
-				continue;
-			}
+		if (unlikely(rq_prio(rq) < queue_prio(execlists))) {
+			spin_unlock(&ve->base.active.lock);
+			break;
+		}
 
-			if (last && !can_merge_rq(last, rq)) {
-				spin_unlock(&ve->base.active.lock);
-				spin_unlock(&engine->active.lock);
-				start_timeslice(engine, rq_prio(rq));
-				return; /* leave this for another sibling */
-			}
+		GEM_BUG_ON(!virtual_matches(ve, rq, engine));
 
-			ENGINE_TRACE(engine,
-				     "virtual rq=%llx:%lld%s, new engine? %s\n",
-				     rq->fence.context,
-				     rq->fence.seqno,
-				     i915_request_completed(rq) ? "!" :
-				     i915_request_started(rq) ? "*" :
-				     "",
-				     yesno(engine != ve->siblings[0]));
-
-			WRITE_ONCE(ve->request, NULL);
-			WRITE_ONCE(ve->base.execlists.queue_priority_hint,
-				   INT_MIN);
-			rb_erase_cached(rb, &execlists->virtual);
-			RB_CLEAR_NODE(rb);
+		if (last && !can_merge_rq(last, rq)) {
+			spin_unlock(&ve->base.active.lock);
+			spin_unlock(&engine->active.lock);
+			start_timeslice(engine, rq_prio(rq));
+			return; /* leave this for another sibling */
+		}
 
-			GEM_BUG_ON(!(rq->execution_mask & engine->mask));
-			WRITE_ONCE(rq->engine, engine);
+		ENGINE_TRACE(engine,
+			     "virtual rq=%llx:%lld%s, new engine? %s\n",
+			     rq->fence.context,
+			     rq->fence.seqno,
+			     i915_request_completed(rq) ? "!" :
+			     i915_request_started(rq) ? "*" :
+			     "",
+			     yesno(engine != ve->siblings[0]));
 
-			if (__i915_request_submit(rq)) {
-				/*
-				 * Only after we confirm that we will submit
-				 * this request (i.e. it has not already
-				 * completed), do we want to update the context.
-				 *
-				 * This serves two purposes. It avoids
-				 * unnecessary work if we are resubmitting an
-				 * already completed request after timeslicing.
-				 * But more importantly, it prevents us altering
-				 * ve->siblings[] on an idle context, where
-				 * we may be using ve->siblings[] in
-				 * virtual_context_enter / virtual_context_exit.
-				 */
-				virtual_xfer_context(ve, engine);
-				GEM_BUG_ON(ve->siblings[0] != engine);
+		WRITE_ONCE(ve->request, NULL);
+		WRITE_ONCE(ve->base.execlists.queue_priority_hint, INT_MIN);
 
-				submit = true;
-				last = rq;
-			}
-			i915_request_put(rq);
+		rb = &ve->nodes[engine->id].rb;
+		rb_erase_cached(rb, &execlists->virtual);
+		RB_CLEAR_NODE(rb);
+
+		GEM_BUG_ON(!(rq->execution_mask & engine->mask));
+		WRITE_ONCE(rq->engine, engine);
 
+		if (__i915_request_submit(rq)) {
 			/*
-			 * Hmm, we have a bunch of virtual engine requests,
-			 * but the first one was already completed (thanks
-			 * preempt-to-busy!). Keep looking at the veng queue
-			 * until we have no more relevant requests (i.e.
-			 * the normal submit queue has higher priority).
+			 * Only after we confirm that we will submit
+			 * this request (i.e. it has not already
+			 * completed), do we want to update the context.
+			 *
+			 * This serves two purposes. It avoids
+			 * unnecessary work if we are resubmitting an
+			 * already completed request after timeslicing.
+			 * But more importantly, it prevents us altering
+			 * ve->siblings[] on an idle context, where
+			 * we may be using ve->siblings[] in
+			 * virtual_context_enter / virtual_context_exit.
 			 */
-			if (!submit) {
-				spin_unlock(&ve->base.active.lock);
-				rb = rb_first_cached(&execlists->virtual);
-				continue;
-			}
+			virtual_xfer_context(ve, engine);
+			GEM_BUG_ON(ve->siblings[0] != engine);
+
+			submit = true;
+			last = rq;
 		}
 
+		i915_request_put(rq);
+unlock:
 		spin_unlock(&ve->base.active.lock);
-		break;
+
+		/*
+		 * Hmm, we have a bunch of virtual engine requests,
+		 * but the first one was already completed (thanks
+		 * preempt-to-busy!). Keep looking at the veng queue
+		 * until we have no more relevant requests (i.e.
+		 * the normal submit queue has higher priority).
+		 */
+		if (submit)
+			break;
 	}
 
 	while ((rb = rb_first_cached(&execlists->queue))) {
-- 
2.20.1


* [Intel-gfx] [PATCH 23/28] drm/i915/gt: Decouple inflight virtual engines
  2020-11-17 11:30 [Intel-gfx] [PATCH 01/28] drm/i915/selftests: Improve granularity for mocs reset checks Chris Wilson
                   ` (20 preceding siblings ...)
  2020-11-17 11:30 ` [Intel-gfx] [PATCH 22/28] drm/i915/gt: Use virtual_engine during execlists_dequeue Chris Wilson
@ 2020-11-17 11:30 ` Chris Wilson
  2020-11-17 11:30 ` [Intel-gfx] [PATCH 24/28] drm/i915/gt: Defer schedule_out until after the next dequeue Chris Wilson
                   ` (8 subsequent siblings)
  30 siblings, 0 replies; 52+ messages in thread
From: Chris Wilson @ 2020-11-17 11:30 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Once a virtual engine has been bound to a sibling, it will remain bound
until we finally schedule out the last active request. We can not rebind
the context to a new sibling while it is inflight as the context save
will conflict, hence we wait. As we cannot then use any other sibling
while the context is inflight, only kick the bound sibling while it is
inflight, and upon scheduling out kick the rest (so that we can swap
engines on timeslicing if the previously bound engine becomes
oversubscribed).
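
A hedged sketch of the resulting submission-tasklet shape (toy model
with invented names): walk the siblings as before, but bail out as
soon as the context is seen to be bound, since no other sibling can
take the request until the final schedule-out:

#include <stdio.h>

struct engine { int id; };

static int bound_to = -1;	/* models ce->inflight */

static int context_inflight(void)
{
	return bound_to >= 0;
}

static void queue_on(struct engine *e)
{
	printf("queued on sibling %d\n", e->id);
	if (e->id == 1)		/* pretend sibling 1 picks it up */
		bound_to = e->id;
}

static void virtual_tasklet(struct engine *siblings, int n)
{
	for (int i = 0; i < n; i++) {
		queue_on(&siblings[i]);

		/* Once bound, the other siblings cannot take the
		 * request until schedule-out, so stop kicking them. */
		if (context_inflight())
			break;
	}
}

int main(void)
{
	struct engine siblings[] = { {0}, {1}, {2}, {3} };

	virtual_tasklet(siblings, 4);	/* visits 0 and 1, skips 2, 3 */
	return 0;
}
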

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 28 ++++++++++++----------------
 1 file changed, 12 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 760449d03462..95b11ce6a67b 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1387,9 +1387,8 @@ static inline void execlists_schedule_in(struct i915_request *rq, int idx)
 static void kick_siblings(struct i915_request *rq, struct intel_context *ce)
 {
 	struct virtual_engine *ve = container_of(ce, typeof(*ve), context);
-	struct i915_request *next = READ_ONCE(ve->request);
 
-	if (next == rq || (next && next->execution_mask & ~rq->execution_mask))
+	if (READ_ONCE(ve->request))
 		tasklet_hi_schedule(&ve->base.execlists.tasklet);
 }
 
@@ -1808,17 +1807,13 @@ first_virtual_engine(struct intel_engine_cs *engine)
 		struct i915_request *rq = READ_ONCE(ve->request);
 
 		/* lazily cleanup after another engine handled rq */
-		if (!rq) {
+		if (!rq || !virtual_matches(ve, rq, engine)) {
 			rb_erase_cached(rb, &el->virtual);
 			RB_CLEAR_NODE(rb);
 			rb = rb_first_cached(&el->virtual);
 			continue;
 		}
 
-		if (!virtual_matches(ve, rq, engine)) {
-			rb = rb_next(rb);
-			continue;
-		}
 		return ve;
 	}
 
@@ -5591,7 +5586,6 @@ static void virtual_submission_tasklet(unsigned long data)
 	if (unlikely(!mask))
 		return;
 
-	local_irq_disable();
 	for (n = 0; n < ve->num_siblings; n++) {
 		struct intel_engine_cs *sibling = READ_ONCE(ve->siblings[n]);
 		struct ve_node * const node = &ve->nodes[sibling->id];
@@ -5601,20 +5595,19 @@ static void virtual_submission_tasklet(unsigned long data)
 		if (!READ_ONCE(ve->request))
 			break; /* already handled by a sibling's tasklet */
 
+		spin_lock_irq(&sibling->active.lock);
+
 		if (unlikely(!(mask & sibling->mask))) {
 			if (!RB_EMPTY_NODE(&node->rb)) {
-				spin_lock(&sibling->active.lock);
 				rb_erase_cached(&node->rb,
 						&sibling->execlists.virtual);
 				RB_CLEAR_NODE(&node->rb);
-				spin_unlock(&sibling->active.lock);
 			}
-			continue;
-		}
 
-		spin_lock(&sibling->active.lock);
+			goto unlock_engine;
+		}
 
-		if (!RB_EMPTY_NODE(&node->rb)) {
+		if (unlikely(!RB_EMPTY_NODE(&node->rb))) {
 			/*
 			 * Cheat and avoid rebalancing the tree if we can
 			 * reuse this node in situ.
@@ -5654,9 +5647,12 @@ static void virtual_submission_tasklet(unsigned long data)
 		if (first && prio > sibling->execlists.queue_priority_hint)
 			tasklet_hi_schedule(&sibling->execlists.tasklet);
 
-		spin_unlock(&sibling->active.lock);
+unlock_engine:
+		spin_unlock_irq(&sibling->active.lock);
+
+		if (intel_context_inflight(&ve->context))
+			break;
 	}
-	local_irq_enable();
 }
 
 static void virtual_submit_request(struct i915_request *rq)
-- 
2.20.1


* [Intel-gfx] [PATCH 24/28] drm/i915/gt: Defer schedule_out until after the next dequeue
  2020-11-17 11:30 [Intel-gfx] [PATCH 01/28] drm/i915/selftests: Improve granularity for mocs reset checks Chris Wilson
                   ` (21 preceding siblings ...)
  2020-11-17 11:30 ` [Intel-gfx] [PATCH 23/28] drm/i915/gt: Decouple inflight virtual engines Chris Wilson
@ 2020-11-17 11:30 ` Chris Wilson
  2020-11-17 11:31 ` [Intel-gfx] [PATCH 25/28] drm/i915/gt: Remove virtual breadcrumb before transfer Chris Wilson
                   ` (7 subsequent siblings)
  30 siblings, 0 replies; 52+ messages in thread
From: Chris Wilson @ 2020-11-17 11:30 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Inside schedule_out, we do extra work upon idling the context, such as
updating the runtime, kicking off retires, kicking virtual engines.
However, if we are processing a series of single requests per
context, we may find ourselves scheduling out the context, only to
immediately schedule it back in during dequeue. This is just extra work
that we can avoid if we keep the context marked as inflight across the
dequeue. This becomes more significant later on for minimising virtual
engine misses.
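
The mechanics are a two-phase split: event processing now only
records retired requests into an on-stack array, and the heavyweight
schedule-out runs after the next dequeue has had a chance to re-pin
the same context. A minimal sketch of the pattern, with integers
standing in for requests (illustrative only):

#include <stdio.h>

#define MAX_PORTS 2

/* Phase 1: in the hot path, only collect what went idle. */
static int process_events(int *inactive, const int *pending, int n)
{
	int out = 0;

	for (int i = 0; i < n; i++)
		inactive[out++] = pending[i];	/* no heavy work here */

	return out;
}

/* Phase 2: after the next dequeue, do the expensive work. */
static void post_process(const int *inactive, int n)
{
	for (int i = 0; i < n; i++)
		printf("schedule-out of request %d\n", inactive[i]);
}

int main(void)
{
	int pending[MAX_PORTS] = { 7, 9 };
	int inactive[2 * MAX_PORTS];
	int n = process_events(inactive, pending, MAX_PORTS);

	/* ... a fresh dequeue may re-use the contexts here ... */

	post_process(inactive, n);
	return 0;
}
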

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_context_types.h |   4 +-
 drivers/gpu/drm/i915/gt/intel_lrc.c           | 123 ++++++++++++------
 2 files changed, 82 insertions(+), 45 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
index 52fa9c132746..10830eeab0f7 100644
--- a/drivers/gpu/drm/i915/gt/intel_context_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
@@ -58,8 +58,8 @@ struct intel_context {
 
 	struct intel_engine_cs *engine;
 	struct intel_engine_cs *inflight;
-#define intel_context_inflight(ce) ptr_mask_bits(READ_ONCE((ce)->inflight), 2)
-#define intel_context_inflight_count(ce) ptr_unmask_bits(READ_ONCE((ce)->inflight), 2)
+#define intel_context_inflight(ce) ptr_mask_bits(READ_ONCE((ce)->inflight), 3)
+#define intel_context_inflight_count(ce) ptr_unmask_bits(READ_ONCE((ce)->inflight), 3)
 
 	struct i915_address_space *vm;
 	struct i915_gem_context __rcu *gem_context;
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 95b11ce6a67b..899452726228 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -2044,19 +2044,6 @@ static void set_preempt_timeout(struct intel_engine_cs *engine,
 		     active_preempt_timeout(engine, rq));
 }
 
-static inline void clear_ports(struct i915_request **ports, int count)
-{
-	memset_p((void **)ports, NULL, count);
-}
-
-static inline void
-copy_ports(struct i915_request **dst, struct i915_request **src, int count)
-{
-	/* A memcpy_p() would be very useful here! */
-	while (count--)
-		WRITE_ONCE(*dst++, *src++); /* avoid write tearing */
-}
-
 static void execlists_dequeue(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
@@ -2392,6 +2379,8 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		start_timeslice(engine, execlists->queue_priority_hint);
 skip_submit:
 		ring_set_paused(engine, 0);
+		while (port-- != execlists->pending)
+			i915_request_put(*port);
 		*execlists->pending = NULL;
 	}
 }
@@ -2403,22 +2392,38 @@ static void execlists_dequeue_irq(struct intel_engine_cs *engine)
 	local_irq_enable(); /* flush irq_work (e.g. breadcrumb enabling) */
 }
 
-static void
-cancel_port_requests(struct intel_engine_execlists * const execlists)
+static inline void clear_ports(struct i915_request **ports, int count)
+{
+	memset_p((void **)ports, NULL, count);
+}
+
+static inline void
+copy_ports(struct i915_request **dst, struct i915_request **src, int count)
+{
+	/* A memcpy_p() would be very useful here! */
+	while (count--)
+		WRITE_ONCE(*dst++, *src++); /* avoid write tearing */
+}
+
+static struct i915_request **
+cancel_port_requests(struct intel_engine_execlists * const execlists,
+		     struct i915_request **inactive)
 {
 	struct i915_request * const *port;
 
 	for (port = execlists->pending; *port; port++)
-		execlists_schedule_out(*port);
+		*inactive++ = *port;
 	clear_ports(execlists->pending, ARRAY_SIZE(execlists->pending));
 
 	/* Mark the end of active before we overwrite *active */
 	for (port = xchg(&execlists->active, execlists->pending); *port; port++)
-		execlists_schedule_out(*port);
+		*inactive++ = *port;
 	clear_ports(execlists->inflight, ARRAY_SIZE(execlists->inflight));
 
 	smp_wmb(); /* complete the seqlock for execlists_active() */
 	WRITE_ONCE(execlists->active, execlists->inflight);
+
+	return inactive;
 }
 
 static inline void
@@ -2546,7 +2551,8 @@ csb_read(const struct intel_engine_cs *engine, u64 * const csb)
 	return entry;
 }
 
-static void process_csb(struct intel_engine_cs *engine)
+static struct i915_request **
+process_csb(struct intel_engine_cs *engine, struct i915_request **inactive)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
 	u64 * const buf = execlists->csb_status;
@@ -2575,7 +2581,7 @@ static void process_csb(struct intel_engine_cs *engine)
 	head = execlists->csb_head;
 	tail = READ_ONCE(*execlists->csb_write);
 	if (unlikely(head == tail))
-		return;
+		return inactive;
 
 	/*
 	 * We will consume all events from HW, or at least pretend to.
@@ -2655,7 +2661,7 @@ static void process_csb(struct intel_engine_cs *engine)
 			/* cancel old inflight, prepare for switch */
 			trace_ports(execlists, "preempted", old);
 			while (*old)
-				execlists_schedule_out(*old++);
+				*inactive++ = *old++;
 
 			/* switch pending to inflight */
 			GEM_BUG_ON(!assert_pending_valid(execlists, "promote"));
@@ -2717,7 +2723,7 @@ static void process_csb(struct intel_engine_cs *engine)
 					     regs[CTX_RING_TAIL]);
 			}
 
-			execlists_schedule_out(*execlists->active++);
+			*inactive++ = *execlists->active++;
 
 			GEM_BUG_ON(execlists->active - execlists->inflight >
 				   execlists_num_ports(execlists));
@@ -2738,6 +2744,15 @@ static void process_csb(struct intel_engine_cs *engine)
 	 * invalidation before.
 	 */
 	invalidate_csb_entries(&buf[0], &buf[num_entries - 1]);
+
+	return inactive;
+}
+
+static void post_process_csb(struct i915_request **port,
+			     struct i915_request **last)
+{
+	while (port != last)
+		execlists_schedule_out(*port++);
 }
 
 static void __execlists_hold(struct i915_request *rq)
@@ -3010,8 +3025,8 @@ active_context(struct intel_engine_cs *engine, u32 ccid)
 	for (port = el->active; (rq = *port); port++) {
 		if (rq->context->lrc.ccid == ccid) {
 			ENGINE_TRACE(engine,
-				     "ccid found at active:%zd\n",
-				     port - el->active);
+				     "ccid:%x found at active:%zd\n",
+				     ccid, port - el->active);
 			return rq;
 		}
 	}
@@ -3019,8 +3034,8 @@ active_context(struct intel_engine_cs *engine, u32 ccid)
 	for (port = el->pending; (rq = *port); port++) {
 		if (rq->context->lrc.ccid == ccid) {
 			ENGINE_TRACE(engine,
-				     "ccid found at pending:%zd\n",
-				     port - el->pending);
+				     "ccid:%x found at pending:%zd\n",
+				     ccid, port - el->pending);
 			return rq;
 		}
 	}
@@ -3138,8 +3153,11 @@ static bool preempt_timeout(const struct intel_engine_cs *const engine)
 static void execlists_submission_tasklet(unsigned long data)
 {
 	struct intel_engine_cs * const engine = (struct intel_engine_cs *)data;
+	struct i915_request *post[2 * EXECLIST_MAX_PORTS];
+	struct i915_request **inactive;
 
-	process_csb(engine);
+	inactive = process_csb(engine, post);
+	GEM_BUG_ON(inactive - post > ARRAY_SIZE(post));
 
 	if (unlikely(preempt_timeout(engine)))
 		engine->execlists.error_interrupt |= ERROR_PREEMPT;
@@ -3163,6 +3181,8 @@ static void execlists_submission_tasklet(unsigned long data)
 
 	if (!engine->execlists.pending[0])
 		execlists_dequeue_irq(engine);
+
+	post_process_csb(post, inactive);
 }
 
 static void __execlists_kick(struct intel_engine_execlists *execlists)
@@ -4108,8 +4128,6 @@ static void enable_execlists(struct intel_engine_cs *engine)
 	ENGINE_POSTING_READ(engine, RING_HWS_PGA);
 
 	enable_error_interrupt(engine);
-
-	engine->context_tag = GENMASK(BITS_PER_LONG - 2, 0);
 }
 
 static bool unexpected_starting_state(struct intel_engine_cs *engine)
@@ -4198,22 +4216,29 @@ static void __execlists_reset_reg_state(const struct intel_context *ce,
 	__reset_stop_ring(regs, engine);
 }
 
-static void __execlists_reset(struct intel_engine_cs *engine, bool stalled)
+static struct i915_request **reset_csb(struct intel_engine_cs *engine,
+				       struct i915_request **inactive)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
-	struct intel_context *ce;
-	struct i915_request *rq;
-	u32 head;
 
 	mb(); /* paranoia: read the CSB pointers from after the reset */
 	clflush(execlists->csb_write);
 	mb();
 
-	process_csb(engine); /* drain preemption events */
+	inactive = process_csb(engine, inactive); /* drain preemption events */
 
 	/* Following the reset, we need to reload the CSB read/write pointers */
 	reset_csb_pointers(engine);
 
+	return inactive;
+}
+
+static void execlists_reset_active(struct intel_engine_cs *engine, bool stalled)
+{
+	struct intel_context *ce;
+	struct i915_request *rq;
+	u32 head;
+
 	/*
 	 * Save the currently executing context, even if we completed
 	 * its request, it was still running at the time of the
@@ -4221,7 +4246,7 @@ static void __execlists_reset(struct intel_engine_cs *engine, bool stalled)
 	 */
 	rq = active_context(engine, engine->execlists.reset_ccid);
 	if (!rq)
-		goto unwind;
+		return;
 
 	ce = rq->context;
 	GEM_BUG_ON(!i915_vma_is_pinned(ce->state));
@@ -4284,11 +4309,20 @@ static void __execlists_reset(struct intel_engine_cs *engine, bool stalled)
 	__execlists_reset_reg_state(ce, engine);
 	__execlists_update_reg_state(ce, engine, head);
 	ce->lrc.desc |= CTX_DESC_FORCE_RESTORE; /* paranoid: GPU was reset! */
+}
 
-unwind:
-	/* Push back any incomplete requests for replay after the reset. */
-	cancel_port_requests(execlists);
-	__unwind_incomplete_requests(engine);
+static void execlists_reset_csb(struct intel_engine_cs *engine, bool stalled)
+{
+	struct intel_engine_execlists * const execlists = &engine->execlists;
+	struct i915_request *post[2 * EXECLIST_MAX_PORTS];
+	struct i915_request **inactive;
+
+	inactive = reset_csb(engine, post);
+
+	execlists_reset_active(engine, true);
+
+	inactive = cancel_port_requests(execlists, inactive);
+	post_process_csb(post, inactive);
 }
 
 static void execlists_reset_rewind(struct intel_engine_cs *engine, bool stalled)
@@ -4297,10 +4331,12 @@ static void execlists_reset_rewind(struct intel_engine_cs *engine, bool stalled)
 
 	ENGINE_TRACE(engine, "\n");
 
-	spin_lock_irqsave(&engine->active.lock, flags);
-
-	__execlists_reset(engine, stalled);
+	/* Process the csb, find the guilty context and throw away */
+	execlists_reset_csb(engine, stalled);
 
+	/* Push back any incomplete requests for replay after the reset. */
+	spin_lock_irqsave(&engine->active.lock, flags);
+	__unwind_incomplete_requests(engine);
 	spin_unlock_irqrestore(&engine->active.lock, flags);
 }
 
@@ -4335,9 +4371,9 @@ static void execlists_reset_cancel(struct intel_engine_cs *engine)
 	 * submission's irq state, we also wish to remind ourselves that
 	 * it is irq state.)
 	 */
-	spin_lock_irqsave(&engine->active.lock, flags);
+	execlists_reset_csb(engine, true);
 
-	__execlists_reset(engine, true);
+	spin_lock_irqsave(&engine->active.lock, flags);
 
 	/* Mark all executing requests as skipped. */
 	list_for_each_entry(rq, &engine->active.requests, sched.link)
@@ -5163,6 +5199,7 @@ int intel_execlists_submission_setup(struct intel_engine_cs *engine)
 	else
 		execlists->csb_size = GEN11_CSB_ENTRIES;
 
+	engine->context_tag = GENMASK(BITS_PER_LONG - 2, 0);
 	if (INTEL_GEN(engine->i915) >= 11) {
 		execlists->ccid |= engine->instance << (GEN11_ENGINE_INSTANCE_SHIFT - 32);
 		execlists->ccid |= engine->class << (GEN11_ENGINE_CLASS_SHIFT - 32);
-- 
2.20.1


* [Intel-gfx] [PATCH 25/28] drm/i915/gt: Remove virtual breadcrumb before transfer
  2020-11-17 11:30 [Intel-gfx] [PATCH 01/28] drm/i915/selftests: Improve granularity for mocs reset checks Chris Wilson
                   ` (22 preceding siblings ...)
  2020-11-17 11:30 ` [Intel-gfx] [PATCH 24/28] drm/i915/gt: Defer schedule_out until after the next dequeue Chris Wilson
@ 2020-11-17 11:31 ` Chris Wilson
  2020-11-17 11:31 ` [Intel-gfx] [PATCH 26/28] drm/i915/gt: Shrink the critical section for irq signaling Chris Wilson
                   ` (6 subsequent siblings)
  30 siblings, 0 replies; 52+ messages in thread
From: Chris Wilson @ 2020-11-17 11:31 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

The issue with stale virtual breadcrumbs remains. Now we have the problem
that if the irq-signaler is still referencing the stale breadcrumb as we
transfer it to a new sibling, the list becomes spaghetti. This is a very
small window, but that doesn't stop it being hit infrequently. To
prevent the lists being tangled (the iterator starting on one engine's
b->signalers but walking onto another list), always decouple the virtual
breadcrumb on schedule-out and make sure that the walker has stepped out
of the lists.
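
The fix is the classic unlink-then-synchronise ordering: take the
request off the signal list first, then wait until any concurrent
iterator has provably left the list before the breadcrumb can be
handed to a new sibling. A userspace model of the synchronisation
step (hedged; irq_work_sync() is the real primitive, the rest is
invented):

#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

static atomic_int walker_active;

static void *list_walker(void *arg)
{
	(void)arg;
	usleep(1000);	/* pretend to iterate the signal list */
	atomic_store(&walker_active, 0);
	return NULL;
}

/* Models irq_work_sync(): block until the iterator has left the
 * list, so moving the node cannot tangle two engines' lists. */
static void walker_sync(void)
{
	while (atomic_load(&walker_active))
		;	/* the kernel primitive waits properly */
}

int main(void)
{
	pthread_t t;

	atomic_store(&walker_active, 1);	/* walker is in the list */
	pthread_create(&t, NULL, list_walker, NULL);

	/* step 1: list_del_rcu() the breadcrumb (elided) */
	walker_sync();				/* step 2: flush walker */
	printf("safe to transfer to a new sibling\n");

	pthread_join(&t, NULL);
	return 0;
}
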

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c |  5 +++--
 drivers/gpu/drm/i915/gt/intel_lrc.c         | 15 +++++++++++++++
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index f5f6feed0fa6..5b84f51931d9 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -461,15 +461,16 @@ void i915_request_cancel_breadcrumb(struct i915_request *rq)
 {
 	struct intel_breadcrumbs *b = READ_ONCE(rq->engine)->breadcrumbs;
 	struct intel_context *ce = rq->context;
+	unsigned long flags;
 	bool release;
 
 	if (!test_and_clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags))
 		return;
 
-	spin_lock(&ce->signal_lock);
+	spin_lock_irqsave(&ce->signal_lock, flags);
 	list_del_rcu(&rq->signal_link);
 	release = remove_signaling_context(b, ce);
-	spin_unlock(&ce->signal_lock);
+	spin_unlock_irqrestore(&ce->signal_lock, flags);
 	if (release)
 		intel_context_put(ce);
 
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 899452726228..a94daf4c70a3 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1387,6 +1387,21 @@ static inline void execlists_schedule_in(struct i915_request *rq, int idx)
 static void kick_siblings(struct i915_request *rq, struct intel_context *ce)
 {
 	struct virtual_engine *ve = container_of(ce, typeof(*ve), context);
+	struct intel_engine_cs *engine = rq->engine;
+
+	/* Flush concurrent rcu iterators in signal_irq_work */
+	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &rq->fence.flags)) {
+		/*
+		 * After this point, the rq may be transferred to a new
+		 * sibling, so before we clear ce->inflight make sure that
+		 * the context has been removed from the b->signalers and
+		 * furthermore we need to make sure that the concurrent
+		 * iterator in signal_irq_work is no longer following
+		 * ce->signal_link.
+		 */
+		i915_request_cancel_breadcrumb(rq);
+		irq_work_sync(&engine->breadcrumbs->irq_work);
+	}
 
 	if (READ_ONCE(ve->request))
 		tasklet_hi_schedule(&ve->base.execlists.tasklet);
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Intel-gfx] [PATCH 26/28] drm/i915/gt: Shrink the critical section for irq signaling
  2020-11-17 11:30 [Intel-gfx] [PATCH 01/28] drm/i915/selftests: Improve granularity for mocs reset checks Chris Wilson
                   ` (23 preceding siblings ...)
  2020-11-17 11:31 ` [Intel-gfx] [PATCH 25/28] drm/i915/gt: Remove virtual breadcrumb before transfer Chris Wilson
@ 2020-11-17 11:31 ` Chris Wilson
  2020-11-17 11:31 ` [Intel-gfx] [PATCH 27/28] drm/i915/gt: Resubmit the virtual engine on schedule-out Chris Wilson
                   ` (5 subsequent siblings)
  30 siblings, 0 replies; 52+ messages in thread
From: Chris Wilson @ 2020-11-17 11:31 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Let's only wait for the list iterator when decoupling the virtual
breadcrumb, as the signaling of all the requests may take a long time,
during which we do not want to keep the tasklet spinning.
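
In effect (a sketch; the actual change is in the diff below):

	/* before: wait for the whole irq_work, including all signaling */
	irq_work_sync(&engine->breadcrumbs->irq_work);

	/* after: wait only while the iterator is inside the rcu walk */
	while (atomic_read(&engine->breadcrumbs->signaler_active))
		cpu_relax();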

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_breadcrumbs.c       | 2 ++
 drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h | 1 +
 drivers/gpu/drm/i915/gt/intel_lrc.c               | 3 ++-
 3 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
index 5b84f51931d9..3e69d6c3b197 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
@@ -251,6 +251,7 @@ static void signal_irq_work(struct irq_work *work)
 		intel_breadcrumbs_disarm_irq(b);
 
 	rcu_read_lock();
+	atomic_inc(&b->signaler_active);
 	list_for_each_entry_rcu(ce, &b->signalers, signal_link) {
 		struct i915_request *rq;
 
@@ -284,6 +285,7 @@ static void signal_irq_work(struct irq_work *work)
 			}
 		}
 	}
+	atomic_dec(&b->signaler_active);
 	rcu_read_unlock();
 
 	llist_for_each_safe(signal, sn, signal) {
diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
index a74bb3062bd8..f672053d694d 100644
--- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
+++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
@@ -35,6 +35,7 @@ struct intel_breadcrumbs {
 	spinlock_t signalers_lock; /* protects the list of signalers */
 	struct list_head signalers;
 	struct llist_head signaled_requests;
+	atomic_t signaler_active;
 
 	spinlock_t irq_lock; /* protects the interrupt from hardirq context */
 	struct irq_work irq_work; /* for use from inside irq_lock */
diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index a94daf4c70a3..87ede3513e4c 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1400,7 +1400,8 @@ static void kick_siblings(struct i915_request *rq, struct intel_context *ce)
 		 * ce->signal_link.
 		 */
 		i915_request_cancel_breadcrumb(rq);
-		irq_work_sync(&engine->breadcrumbs->irq_work);
+		while (atomic_read(&engine->breadcrumbs->signaler_active))
+			cpu_relax();
 	}
 
 	if (READ_ONCE(ve->request))
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Intel-gfx] [PATCH 27/28] drm/i915/gt: Resubmit the virtual engine on schedule-out
  2020-11-17 11:30 [Intel-gfx] [PATCH 01/28] drm/i915/selftests: Improve granularity for mocs reset checks Chris Wilson
                   ` (24 preceding siblings ...)
  2020-11-17 11:31 ` [Intel-gfx] [PATCH 26/28] drm/i915/gt: Shrink the critical section for irq signaling Chris Wilson
@ 2020-11-17 11:31 ` Chris Wilson
  2020-11-17 11:31 ` [Intel-gfx] [PATCH 28/28] drm/i915/gt: Simplify virtual engine handling for execlists_hold() Chris Wilson
                   ` (4 subsequent siblings)
  30 siblings, 0 replies; 52+ messages in thread
From: Chris Wilson @ 2020-11-17 11:31 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Having recognised that we do not change the sibling until we schedule
out, we can then defer the decision to resubmit the virtual engine from
the unwind of the active queue to scheduling out of the virtual context.
This improves our resilience in virtual engine scheduling, and should
eliminate the rare cases of gem_exec_balance failing.

By keeping the unwind order intact on the local engine, we can preserve
data dependency ordering while doing a preempt-to-busy pass until we
have determined the new ELSP. This means that if we try to timeslice
between a virtual engine and a data-dependent ordinary request, the pair
will maintain their relative ordering and we will avoid the
resubmission, cancelling the timeslicing until further change.

The dilemma though is that we then may end up in a situation where the
'demotion' of the virtual request to an ordinary request in the engine
queue results in filling the ELSP[] with virtual requests instead of
spreading the load across the engines. To compensate for this, we mark
each virtual request and refuse to resubmit a virtual request in the
secondary ELSP slots, thus forcing subsequent virtual requests to be
scheduled out after timeslicing. By delaying the decision until we
schedule out, we will avoid unnecessary resubmission.
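
As a sketch, the schedule-out decision reduces to (illustrative only;
see kick_siblings() in the diff below):

	/* on schedule-out of a request from a virtual context */
	if (i915_request_in_priority_queue(rq) &&
	    rq->execution_mask != engine->mask)
		/* still queued here: return it to the virtual engine
		 * so it may float onto the next idle sibling
		 */
		resubmit_virtual_request(rq, ve);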

Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2079
Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2098
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c    | 122 +++++++++++++++----------
 drivers/gpu/drm/i915/gt/selftest_lrc.c |   2 +-
 2 files changed, 77 insertions(+), 47 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 87ede3513e4c..5d64295e2a9e 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -1111,38 +1111,23 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
 
 		__i915_request_unsubmit(rq);
 
-		/*
-		 * Push the request back into the queue for later resubmission.
-		 * If this request is not native to this physical engine (i.e.
-		 * it came from a virtual source), push it back onto the virtual
-		 * engine so that it can be moved across onto another physical
-		 * engine as load dictates.
-		 */
-		if (likely(rq->execution_mask == engine->mask)) {
-			GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
-			if (rq_prio(rq) != prio) {
-				prio = rq_prio(rq);
-				pl = i915_sched_lookup_priolist(engine, prio);
-			}
-			GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
-
-			list_move(&rq->sched.link, pl);
-			set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+		GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
+		if (rq_prio(rq) != prio) {
+			prio = rq_prio(rq);
+			pl = i915_sched_lookup_priolist(engine, prio);
+		}
+		GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
 
-			/* Check in case we rollback so far we wrap [size/2] */
-			if (intel_ring_direction(rq->ring,
-						 rq->tail,
-						 rq->ring->tail + 8) > 0)
-				rq->context->lrc.desc |= CTX_DESC_FORCE_RESTORE;
+		list_move(&rq->sched.link, pl);
+		set_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
 
-			active = rq;
-		} else {
-			struct intel_engine_cs *owner = rq->context->engine;
+		/* Check in case we rollback so far we wrap [size/2] */
+		if (intel_ring_direction(rq->ring,
+					 rq->tail,
+					 rq->ring->tail + 8) > 0)
+			rq->context->lrc.desc |= CTX_DESC_FORCE_RESTORE;
 
-			WRITE_ONCE(rq->engine, owner);
-			owner->submit_request(rq);
-			active = NULL;
-		}
+		active = rq;
 	}
 
 	return active;
@@ -1384,6 +1369,20 @@ static inline void execlists_schedule_in(struct i915_request *rq, int idx)
 	GEM_BUG_ON(intel_context_inflight(ce) != rq->engine);
 }
 
+static void
+resubmit_virtual_request(struct i915_request *rq, struct virtual_engine *ve)
+{
+	struct intel_engine_cs *engine = rq->engine;
+
+	spin_lock_irq(&engine->active.lock);
+
+	clear_bit(I915_FENCE_FLAG_PQUEUE, &rq->fence.flags);
+	WRITE_ONCE(rq->engine, &ve->base);
+	ve->base.submit_request(rq);
+
+	spin_unlock_irq(&engine->active.lock);
+}
+
 static void kick_siblings(struct i915_request *rq, struct intel_context *ce)
 {
 	struct virtual_engine *ve = container_of(ce, typeof(*ve), context);
@@ -1404,6 +1403,16 @@ static void kick_siblings(struct i915_request *rq, struct intel_context *ce)
 			cpu_relax();
 	}
 
+	/*
+	 * This engine is now too busy to run this virtual request, so
+	 * see if we can find an alternative engine for it to execute on.
+	 * Once a request has become bonded to this engine, we treat it the
+	 * same as any other native request.
+	 */
+	if (i915_request_in_priority_queue(rq) &&
+	    rq->execution_mask != engine->mask)
+		resubmit_virtual_request(rq, ve);
+
 	if (READ_ONCE(ve->request))
 		tasklet_hi_schedule(&ve->base.execlists.tasklet);
 }
@@ -1648,6 +1657,20 @@ assert_pending_valid(const struct intel_engine_execlists *execlists,
 		}
 		sentinel = i915_request_has_sentinel(rq);
 
+		/*
+		 * We want virtual requests to only be in the first slot so
+		 * that they are never stuck behind a hog and can be immediately
+		 * transferred onto the next idle engine.
+		 */
+		if (rq->execution_mask != engine->mask &&
+		    port != execlists->pending) {
+			GEM_TRACE_ERR("%s: virtual engine:%llx not in prime position[%zd]\n",
+				      engine->name,
+				      ce->timeline->fence_context,
+				      port - execlists->pending);
+			return false;
+		}
+
 		/* Hold tightly onto the lock to prevent concurrent retires! */
 		if (!spin_trylock_irqsave(&rq->lock, flags))
 			continue;
@@ -2314,6 +2337,15 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 				if (i915_request_has_sentinel(last))
 					goto done;
 
+				/*
+				 * We avoid submitting virtual requests into
+				 * the secondary ports so that we can migrate
+				 * the request immediately to another engine
+				 * rather than wait for the primary request.
+				 */
+				if (rq->execution_mask != engine->mask)
+					goto done;
+
 				/*
 				 * If GVT overrides us we only ever submit
 				 * port[0], leaving port[1] empty. Note that we
@@ -5711,7 +5743,6 @@ static void virtual_submission_tasklet(unsigned long data)
 static void virtual_submit_request(struct i915_request *rq)
 {
 	struct virtual_engine *ve = to_virtual_engine(rq->engine);
-	struct i915_request *old;
 	unsigned long flags;
 
 	ENGINE_TRACE(&ve->base, "rq=%llx:%lld\n",
@@ -5722,28 +5753,27 @@ static void virtual_submit_request(struct i915_request *rq)
 
 	spin_lock_irqsave(&ve->base.active.lock, flags);
 
-	old = ve->request;
-	if (old) { /* background completion event from preempt-to-busy */
-		GEM_BUG_ON(!i915_request_completed(old));
-		__i915_request_submit(old);
-		i915_request_put(old);
-	}
-
+	/* By the time we resubmit a request, it may be completed */
 	if (i915_request_completed(rq)) {
 		__i915_request_submit(rq);
+		goto unlock;
+	}
 
-		ve->base.execlists.queue_priority_hint = INT_MIN;
-		ve->request = NULL;
-	} else {
-		ve->base.execlists.queue_priority_hint = rq_prio(rq);
-		ve->request = i915_request_get(rq);
+	if (ve->request) { /* background completion from preempt-to-busy */
+		GEM_BUG_ON(!i915_request_completed(ve->request));
+		__i915_request_submit(ve->request);
+		i915_request_put(ve->request);
+	}
 
-		GEM_BUG_ON(!list_empty(virtual_queue(ve)));
-		list_move_tail(&rq->sched.link, virtual_queue(ve));
+	ve->base.execlists.queue_priority_hint = rq_prio(rq);
+	ve->request = i915_request_get(rq);
 
-		tasklet_hi_schedule(&ve->base.execlists.tasklet);
-	}
+	GEM_BUG_ON(!list_empty(virtual_queue(ve)));
+	list_move_tail(&rq->sched.link, virtual_queue(ve));
 
+	tasklet_hi_schedule(&ve->base.execlists.tasklet);
+
+unlock:
 	spin_unlock_irqrestore(&ve->base.active.lock, flags);
 }
 
diff --git a/drivers/gpu/drm/i915/gt/selftest_lrc.c b/drivers/gpu/drm/i915/gt/selftest_lrc.c
index 37cb51c3f4f6..fbbd8343d7f6 100644
--- a/drivers/gpu/drm/i915/gt/selftest_lrc.c
+++ b/drivers/gpu/drm/i915/gt/selftest_lrc.c
@@ -4589,7 +4589,7 @@ static int reset_virtual_engine(struct intel_gt *gt,
 	spin_lock_irq(&engine->active.lock);
 	__unwind_incomplete_requests(engine);
 	spin_unlock_irq(&engine->active.lock);
-	GEM_BUG_ON(rq->engine != ve->engine);
+	GEM_BUG_ON(rq->engine != engine);
 
 	/* Reset the engine while keeping our active request on hold */
 	execlists_hold(engine, rq);
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* [Intel-gfx] [PATCH 28/28] drm/i915/gt: Simplify virtual engine handling for execlists_hold()
  2020-11-17 11:30 [Intel-gfx] [PATCH 01/28] drm/i915/selftests: Improve granularity for mocs reset checks Chris Wilson
                   ` (25 preceding siblings ...)
  2020-11-17 11:31 ` [Intel-gfx] [PATCH 27/28] drm/i915/gt: Resubmit the virtual engine on schedule-out Chris Wilson
@ 2020-11-17 11:31 ` Chris Wilson
  2020-11-17 18:54 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/28] drm/i915/selftests: Improve granularity for mocs reset checks Patchwork
                   ` (3 subsequent siblings)
  30 siblings, 0 replies; 52+ messages in thread
From: Chris Wilson @ 2020-11-17 11:31 UTC (permalink / raw)
  To: intel-gfx; +Cc: Chris Wilson

Now that the tasklet completely controls scheduling of the requests, and
we postpone scheduling out the old requests, we can keep a hanging
virtual request bound to the engine on which it hung, and remove it from
the queue. On release, it will be returned to the same engine and remain
in its queue until it is scheduled, at which point it will become
eligible for transfer to a sibling. Instead, we could opt to resubmit the
request along the virtual engine on unhold, making it eligible for load
balancing immediately -- but that seems like a pointless optimisation
for a hanging context.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/gt/intel_lrc.c | 29 -----------------------------
 1 file changed, 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
index 5d64295e2a9e..0565ae738a20 100644
--- a/drivers/gpu/drm/i915/gt/intel_lrc.c
+++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
@@ -2852,35 +2852,6 @@ static bool execlists_hold(struct intel_engine_cs *engine,
 		goto unlock;
 	}
 
-	if (rq->engine != engine) { /* preempted virtual engine */
-		struct virtual_engine *ve = to_virtual_engine(rq->engine);
-
-		/*
-		 * intel_context_inflight() is only protected by virtue
-		 * of process_csb() being called only by the tasklet (or
-		 * directly from inside reset while the tasklet is suspended).
-		 * Assert that neither of those are allowed to run while we
-		 * poke at the request queues.
-		 */
-		GEM_BUG_ON(!reset_in_progress(&engine->execlists));
-
-		/*
-		 * An unsubmitted request along a virtual engine will
-		 * remain on the active (this) engine until we are able
-		 * to process the context switch away (and so mark the
-		 * context as no longer in flight). That cannot have happened
-		 * yet, otherwise we would not be hanging!
-		 */
-		spin_lock(&ve->base.active.lock);
-		GEM_BUG_ON(intel_context_inflight(rq->context) != engine);
-		GEM_BUG_ON(ve->request != rq);
-		ve->request = NULL;
-		spin_unlock(&ve->base.active.lock);
-		i915_request_put(rq);
-
-		rq->engine = engine;
-	}
-
 	/*
 	 * Transfer this request onto the hold queue to prevent it
 	 * being resubmitted to HW (and potentially completed) before we have
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH 04/28] drm/i915/gt: Ignore dt==0 for reporting underflows
  2020-11-17 11:30 ` [Intel-gfx] [PATCH 04/28] drm/i915/gt: Ignore dt==0 for reporting underflows Chris Wilson
@ 2020-11-17 11:42   ` Tvrtko Ursulin
  0 siblings, 0 replies; 52+ messages in thread
From: Tvrtko Ursulin @ 2020-11-17 11:42 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 17/11/2020 11:30, Chris Wilson wrote:
> The presumption was that some time would always elapse between recording
> the start and the finish of a context switch. This turns out to be a
> regular occurrence and emitting a debug statement is superfluous.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_lrc.c | 4 ++--
>   1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index 8a51c1c3a091..52b84474f93a 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -1307,7 +1307,7 @@ static void reset_active(struct i915_request *rq,
>   static void st_update_runtime_underflow(struct intel_context *ce, s32 dt)
>   {
>   #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
> -	ce->runtime.num_underflow += dt < 0;
> +	ce->runtime.num_underflow++;
>   	ce->runtime.max_underflow = max_t(u32, ce->runtime.max_underflow, -dt);
>   #endif
>   }
> @@ -1324,7 +1324,7 @@ static void intel_context_update_runtime(struct intel_context *ce)
>   	ce->runtime.last = intel_context_get_runtime(ce);
>   	dt = ce->runtime.last - old;
>   
> -	if (unlikely(dt <= 0)) {
> +	if (unlikely(dt < 0)) {
>   		CE_TRACE(ce, "runtime underflow: last=%u, new=%u, delta=%d\n",
>   			 old, ce->runtime.last, dt);
>   		st_update_runtime_underflow(ce, dt);
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH 05/28] drm/i915/gt: Track the overall busy time
  2020-11-17 11:30 ` [Intel-gfx] [PATCH 05/28] drm/i915/gt: Track the overall busy time Chris Wilson
@ 2020-11-17 12:44   ` Tvrtko Ursulin
  2020-11-17 13:05     ` Chris Wilson
  0 siblings, 1 reply; 52+ messages in thread
From: Tvrtko Ursulin @ 2020-11-17 12:44 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 17/11/2020 11:30, Chris Wilson wrote:
> Since we wake the GT up before executing a request, and go to sleep as
> soon as it is retired, the GT wake time not only represents how long the
> device is powered up, but also provides a summary, albeit an overestimate,
> of the device runtime.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/gt/intel_gt_pm.c    | 45 ++++++++++++++++++++++++
>   drivers/gpu/drm/i915/gt/intel_gt_pm.h    |  2 ++
>   drivers/gpu/drm/i915/gt/intel_gt_types.h | 24 +++++++++++++
>   drivers/gpu/drm/i915/i915_debugfs.c      |  2 ++
>   drivers/gpu/drm/i915/i915_pmu.c          |  6 ++++
>   include/uapi/drm/i915_drm.h              |  3 +-
>   6 files changed, 81 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> index 274aa0dd7050..dd2f88bed65a 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> @@ -39,6 +39,24 @@ static void user_forcewake(struct intel_gt *gt, bool suspend)
>   	intel_gt_pm_put(gt);
>   }
>   
> +static void runtime_begin(struct intel_gt *gt)
> +{
> +	write_seqcount_begin(&gt->stats.lock);
> +	gt->stats.start = ktime_get();
> +	gt->stats.active = true;
> +	write_seqcount_end(&gt->stats.lock);
> +}
> +
> +static void runtime_end(struct intel_gt *gt)
> +{
> +	write_seqcount_begin(&gt->stats.lock);
> +	gt->stats.active = false;
> +	gt->stats.total =
> +		ktime_add(gt->stats.total,
> +			  ktime_sub(ktime_get(), gt->stats.start));
> +	write_seqcount_end(&gt->stats.lock);
> +}
> +
>   static int __gt_unpark(struct intel_wakeref *wf)
>   {
>   	struct intel_gt *gt = container_of(wf, typeof(*gt), wakeref);
> @@ -67,6 +85,7 @@ static int __gt_unpark(struct intel_wakeref *wf)
>   	i915_pmu_gt_unparked(i915);
>   
>   	intel_gt_unpark_requests(gt);
> +	runtime_begin(gt);
>   
>   	return 0;
>   }
> @@ -79,6 +98,7 @@ static int __gt_park(struct intel_wakeref *wf)
>   
>   	GT_TRACE(gt, "\n");
>   
> +	runtime_end(gt);
>   	intel_gt_park_requests(gt);
>   
>   	i915_vma_parked(gt);
> @@ -106,6 +126,7 @@ static const struct intel_wakeref_ops wf_ops = {
>   void intel_gt_pm_init_early(struct intel_gt *gt)
>   {
>   	intel_wakeref_init(&gt->wakeref, gt->uncore->rpm, &wf_ops);
> +	seqcount_mutex_init(&gt->stats.lock, &gt->wakeref.mutex);
>   }
>   
>   void intel_gt_pm_init(struct intel_gt *gt)
> @@ -339,6 +360,30 @@ int intel_gt_runtime_resume(struct intel_gt *gt)
>   	return intel_uc_runtime_resume(&gt->uc);
>   }
>   
> +static ktime_t __intel_gt_get_busy_time(const struct intel_gt *gt)
> +{
> +	ktime_t total = gt->stats.total;
> +
> +	if (gt->stats.active)
> +		total = ktime_add(total,
> +				  ktime_sub(ktime_get(), gt->stats.start));
> +
> +	return total;
> +}
> +
> +ktime_t intel_gt_get_busy_time(const struct intel_gt *gt)
> +{
> +	unsigned int seq;
> +	ktime_t total;
> +
> +	do {
> +		seq = read_seqcount_begin(&gt->stats.lock);

Any specific reasons for read_seqcount_begin vs read_seqbegin etc? The
latter, being used in engine stats, seems to have some kcsan integration
and the former is a bit more low level.

> +		total = __intel_gt_get_busy_time(gt);
> +	} while (read_seqcount_retry(&gt->stats.lock, seq));
> +
> +	return total;
> +}

I wish there was an easy way to extract some sort of struct stats 
between engine stats and this and have common helpers, but sadly structs 
are not exactly identical.

> +
>   #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
>   #include "selftest_gt_pm.c"
>   #endif
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.h b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
> index 60f0e2fbe55c..aa8f2cda946b 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
> @@ -58,6 +58,8 @@ int intel_gt_resume(struct intel_gt *gt);
>   void intel_gt_runtime_suspend(struct intel_gt *gt);
>   int intel_gt_runtime_resume(struct intel_gt *gt);
>   
> +ktime_t intel_gt_get_busy_time(const struct intel_gt *gt);
> +
>   static inline bool is_mock_gt(const struct intel_gt *gt)
>   {
>   	return I915_SELFTEST_ONLY(gt->awake == -ENODEV);
> diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
> index 6d39a4a11bf3..c7bde529feab 100644
> --- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
> @@ -87,6 +87,30 @@ struct intel_gt {
>   
>   	u32 pm_guc_events;
>   
> +	struct {
> +		bool active;
> +
> +		/**
> +		 * @lock: Lock protecting the below fields.
> +		 */
> +		seqcount_mutex_t lock;
> +
> +		/**
> +		 * @total: Total time this engine was busy.
> +		 *
> +		 * Accumulated time not counting the most recent block in cases
> +		 * where engine is currently busy (active > 0).
> +		 */
> +		ktime_t total;
> +
> +		/**
> +		 * @start: Timestamp of the last idle to active transition.
> +		 *
> +		 * Idle is defined as active == 0, active is active > 0.
> +		 */
> +		ktime_t start;
> +	} stats;
> +
>   	struct intel_engine_cs *engine[I915_NUM_ENGINES];
>   	struct intel_engine_cs *engine_class[MAX_ENGINE_CLASS + 1]
>   					    [MAX_ENGINE_INSTANCE + 1];
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 77e76b665098..337293c7bb7d 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -1316,6 +1316,8 @@ static int i915_engine_info(struct seq_file *m, void *unused)
>   	seq_printf(m, "GT awake? %s [%d]\n",
>   		   yesno(dev_priv->gt.awake),
>   		   atomic_read(&dev_priv->gt.wakeref.count));
> +	seq_printf(m, "GT busy: %llu ms\n",
> +		   ktime_to_ms(intel_gt_get_busy_time(&dev_priv->gt)));

Would it be worth putting something in debugfs_gt_pm.c as well?

>   	seq_printf(m, "CS timestamp frequency: %u Hz\n",
>   		   RUNTIME_INFO(dev_priv)->cs_timestamp_frequency_hz);
>   
> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> index cd786ad12be7..36fc60cf5725 100644
> --- a/drivers/gpu/drm/i915/i915_pmu.c
> +++ b/drivers/gpu/drm/i915/i915_pmu.c
> @@ -488,6 +488,8 @@ config_status(struct drm_i915_private *i915, u64 config)
>   		if (!HAS_RC6(i915))
>   			return -ENODEV;
>   		break;
> +	case I915_PMU_BUSY_TIME:
> +		break;
>   	default:
>   		return -ENOENT;
>   	}
> @@ -595,6 +597,9 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
>   		case I915_PMU_RC6_RESIDENCY:
>   			val = get_rc6(&i915->gt);
>   			break;
> +		case I915_PMU_BUSY_TIME:
> +			val = ktime_to_ns(intel_gt_get_busy_time(&i915->gt));
> +			break;
>   		}
>   	}
>   
> @@ -898,6 +903,7 @@ create_event_attributes(struct i915_pmu *pmu)
>   		__event(I915_PMU_REQUESTED_FREQUENCY, "requested-frequency", "M"),
>   		__event(I915_PMU_INTERRUPTS, "interrupts", NULL),
>   		__event(I915_PMU_RC6_RESIDENCY, "rc6-residency", "ns"),
> +		__event(I915_PMU_BUSY_TIME, "busy-time", "ns"),
>   	};
>   	static const struct {
>   		enum drm_i915_pmu_engine_sample sample;
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index fa1f3d62f9a6..b66b7c1fd564 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -177,8 +177,9 @@ enum drm_i915_pmu_engine_sample {
>   #define I915_PMU_REQUESTED_FREQUENCY	__I915_PMU_OTHER(1)
>   #define I915_PMU_INTERRUPTS		__I915_PMU_OTHER(2)
>   #define I915_PMU_RC6_RESIDENCY		__I915_PMU_OTHER(3)
> +#define I915_PMU_BUSY_TIME		__I915_PMU_OTHER(4)

I'd be tempted to call this I915_PMU_GT_BUSY_TIME - or even better awake 
time?

>   
> -#define I915_PMU_LAST I915_PMU_RC6_RESIDENCY
> +#define I915_PMU_LAST I915_PMU_BUSY_TIME
>   
>   /* Each region is a minimum of 16k, and there are at most 255 of them.
>    */
> 

Code looks fine, but should we be adding a pmu counter?

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH 07/28] drm/i915: Lift i915_request_show()
  2020-11-17 11:30 ` [Intel-gfx] [PATCH 07/28] drm/i915: Lift i915_request_show() Chris Wilson
@ 2020-11-17 12:51   ` Tvrtko Ursulin
  0 siblings, 0 replies; 52+ messages in thread
From: Tvrtko Ursulin @ 2020-11-17 12:51 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 17/11/2020 11:30, Chris Wilson wrote:
> Extract i915_request_show for reuse in other request chain pretty
> printers.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/gt/intel_engine_cs.c | 47 ++---------------------
>   drivers/gpu/drm/i915/gt/intel_lrc.c       |  2 +-
>   drivers/gpu/drm/i915/gt/intel_lrc.h       |  2 +-
>   drivers/gpu/drm/i915/i915_request.c       | 39 +++++++++++++++++++
>   drivers/gpu/drm/i915/i915_request.h       |  5 +++
>   5 files changed, 50 insertions(+), 45 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_engine_cs.c b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> index 1ed84ee8ce41..c3bb2e9546e6 100644
> --- a/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_engine_cs.c
> @@ -1294,45 +1294,6 @@ bool intel_engine_can_store_dword(struct intel_engine_cs *engine)
>   	}
>   }
>   
> -static int print_sched_attr(const struct i915_sched_attr *attr,
> -			    char *buf, int x, int len)
> -{
> -	if (attr->priority == I915_PRIORITY_INVALID)
> -		return x;
> -
> -	x += snprintf(buf + x, len - x,
> -		      " prio=%d", attr->priority);
> -
> -	return x;
> -}
> -
> -static void print_request(struct drm_printer *m,
> -			  struct i915_request *rq,
> -			  const char *prefix)
> -{
> -	const char *name = rq->fence.ops->get_timeline_name(&rq->fence);
> -	char buf[80] = "";
> -	int x = 0;
> -
> -	x = print_sched_attr(&rq->sched.attr, buf, x, sizeof(buf));
> -
> -	drm_printf(m, "%s %llx:%llx%s%s %s @ %dms: %s\n",
> -		   prefix,
> -		   rq->fence.context, rq->fence.seqno,
> -		   i915_request_completed(rq) ? "!" :
> -		   i915_request_started(rq) ? "*" :
> -		   !i915_sw_fence_signaled(&rq->semaphore) ? "&" :
> -		   "",
> -		   test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
> -			    &rq->fence.flags) ? "+" :
> -		   test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
> -			    &rq->fence.flags) ? "-" :
> -		   "",
> -		   buf,
> -		   jiffies_to_msecs(jiffies - rq->emitted_jiffies),
> -		   name);
> -}
> -
>   static struct intel_timeline *get_timeline(struct i915_request *rq)
>   {
>   	struct intel_timeline *tl;
> @@ -1530,7 +1491,7 @@ static void intel_engine_print_registers(struct intel_engine_cs *engine,
>   					intel_context_is_banned(rq->context) ? "*" : "");
>   			len += print_ring(hdr + len, sizeof(hdr) - len, rq);
>   			scnprintf(hdr + len, sizeof(hdr) - len, "rq: ");
> -			print_request(m, rq, hdr);
> +			i915_request_show(m, rq, hdr);
>   		}
>   		for (port = execlists->pending; (rq = *port); port++) {
>   			char hdr[160];
> @@ -1544,7 +1505,7 @@ static void intel_engine_print_registers(struct intel_engine_cs *engine,
>   					intel_context_is_banned(rq->context) ? "*" : "");
>   			len += print_ring(hdr + len, sizeof(hdr) - len, rq);
>   			scnprintf(hdr + len, sizeof(hdr) - len, "rq: ");
> -			print_request(m, rq, hdr);
> +			i915_request_show(m, rq, hdr);
>   		}
>   		rcu_read_unlock();
>   		execlists_active_unlock_bh(execlists);
> @@ -1688,7 +1649,7 @@ void intel_engine_dump(struct intel_engine_cs *engine,
>   	if (rq) {
>   		struct intel_timeline *tl = get_timeline(rq);
>   
> -		print_request(m, rq, "\t\tactive ");
> +		i915_request_show(m, rq, "\t\tactive ");
>   
>   		drm_printf(m, "\t\tring->start:  0x%08x\n",
>   			   i915_ggtt_offset(rq->ring->vma));
> @@ -1726,7 +1687,7 @@ void intel_engine_dump(struct intel_engine_cs *engine,
>   		drm_printf(m, "\tDevice is asleep; skipping register dump\n");
>   	}
>   
> -	intel_execlists_show_requests(engine, m, print_request, 8);
> +	intel_execlists_show_requests(engine, m, i915_request_show, 8);
>   
>   	drm_printf(m, "HWSP:\n");
>   	hexdump(m, engine->status_page.addr, PAGE_SIZE);
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index 52b84474f93a..a4b8c20d12a9 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -5980,7 +5980,7 @@ int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine,
>   void intel_execlists_show_requests(struct intel_engine_cs *engine,
>   				   struct drm_printer *m,
>   				   void (*show_request)(struct drm_printer *m,
> -							struct i915_request *rq,
> +							const struct i915_request *rq,
>   							const char *prefix),
>   				   unsigned int max)
>   {
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.h b/drivers/gpu/drm/i915/gt/intel_lrc.h
> index c2d287f25497..32e6e204f544 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.h
> @@ -106,7 +106,7 @@ void intel_lr_context_reset(struct intel_engine_cs *engine,
>   void intel_execlists_show_requests(struct intel_engine_cs *engine,
>   				   struct drm_printer *m,
>   				   void (*show_request)(struct drm_printer *m,
> -							struct i915_request *rq,
> +							const struct i915_request *rq,
>   							const char *prefix),
>   				   unsigned int max);
>   
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 0e813819b041..cebe07a85625 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -1855,6 +1855,45 @@ long i915_request_wait(struct i915_request *rq,
>   	return timeout;
>   }
>   
> +static int print_sched_attr(const struct i915_sched_attr *attr,
> +			    char *buf, int x, int len)
> +{
> +	if (attr->priority == I915_PRIORITY_INVALID)
> +		return x;
> +
> +	x += snprintf(buf + x, len - x,
> +		      " prio=%d", attr->priority);
> +
> +	return x;
> +}
> +
> +void i915_request_show(struct drm_printer *m,
> +		       const struct i915_request *rq,
> +		       const char *prefix)
> +{
> +	const char *name = rq->fence.ops->get_timeline_name((struct dma_fence *)&rq->fence);
> +	char buf[80] = "";
> +	int x = 0;
> +
> +	x = print_sched_attr(&rq->sched.attr, buf, x, sizeof(buf));
> +
> +	drm_printf(m, "%s %llx:%llx%s%s %s @ %dms: %s\n",

$ grep "llx:%lld" . -r | wc -l
20
$ grep "llx:%llx" . -r | wc -l
3

Bonus points if you make the seqno %lld, making it 21 to 2, one step
closer to bliss. ;)

> +		   prefix,
> +		   rq->fence.context, rq->fence.seqno,
> +		   i915_request_completed(rq) ? "!" :
> +		   i915_request_started(rq) ? "*" :
> +		   !i915_sw_fence_signaled(&rq->semaphore) ? "&" :
> +		   "",
> +		   test_bit(DMA_FENCE_FLAG_SIGNALED_BIT,
> +			    &rq->fence.flags) ? "+" :
> +		   test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
> +			    &rq->fence.flags) ? "-" :
> +		   "",
> +		   buf,
> +		   jiffies_to_msecs(jiffies - rq->emitted_jiffies),
> +		   name);
> +}
> +
>   #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
>   #include "selftests/mock_request.c"
>   #include "selftests/i915_request.c"
> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> index 16b721080195..09609071b725 100644
> --- a/drivers/gpu/drm/i915/i915_request.h
> +++ b/drivers/gpu/drm/i915/i915_request.h
> @@ -43,6 +43,7 @@
>   
>   struct drm_file;
>   struct drm_i915_gem_object;
> +struct drm_printer;
>   struct i915_request;
>   
>   struct i915_capture_list {
> @@ -369,6 +370,10 @@ long i915_request_wait(struct i915_request *rq,
>   #define I915_WAIT_PRIORITY	BIT(1) /* small priority bump for the request */
>   #define I915_WAIT_ALL		BIT(2) /* used by i915_gem_object_wait() */
>   
> +void i915_request_show(struct drm_printer *m,
> +		       const struct i915_request *rq,
> +		       const char *prefix);
> +
>   static inline bool i915_request_signaled(const struct i915_request *rq)
>   {
>   	/* The request may live longer than its HWSP, so check flags first! */
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH 08/28] drm/i915/gt: Show all active timelines for debugging
  2020-11-17 11:30 ` [Intel-gfx] [PATCH 08/28] drm/i915/gt: Show all active timelines for debugging Chris Wilson
@ 2020-11-17 12:59   ` Tvrtko Ursulin
  2020-11-17 13:25     ` Chris Wilson
  0 siblings, 1 reply; 52+ messages in thread
From: Tvrtko Ursulin @ 2020-11-17 12:59 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 17/11/2020 11:30, Chris Wilson wrote:
> Include the active timelines for debugfs/i915_engine_info, so that we
> can see which have unready requests in flight that are not shown
> otherwise.
> 
> Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/gt/intel_timeline.c | 79 ++++++++++++++++++++++++
>   drivers/gpu/drm/i915/gt/intel_timeline.h |  8 +++
>   drivers/gpu/drm/i915/i915_debugfs.c      | 18 +++---
>   3 files changed, 97 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.c b/drivers/gpu/drm/i915/gt/intel_timeline.c
> index 7ea94d201fe6..2b4ed4b2b67c 100644
> --- a/drivers/gpu/drm/i915/gt/intel_timeline.c
> +++ b/drivers/gpu/drm/i915/gt/intel_timeline.c
> @@ -617,6 +617,85 @@ void intel_gt_fini_timelines(struct intel_gt *gt)
>   	GEM_BUG_ON(!list_empty(&timelines->hwsp_free_list));
>   }
>   
> +void intel_gt_show_timelines(struct intel_gt *gt,
> +			     struct drm_printer *m,
> +			     void (*show_request)(struct drm_printer *m,
> +						  const struct i915_request *rq,
> +						  const char *prefix))
> +{
> +	struct intel_gt_timelines *timelines = &gt->timelines;
> +	struct intel_timeline *tl, *tn;
> +	LIST_HEAD(free);
> +
> +	spin_lock(&timelines->lock);
> +	list_for_each_entry_safe(tl, tn, &timelines->active_list, link) {
> +		unsigned long count, ready, inflight;
> +		struct i915_request *rq, *rn;
> +		struct dma_fence *fence;
> +
> +		if (!mutex_trylock(&tl->mutex))
> +			continue;

I suggest printing a marker like "Timeline %llx: busy" or similar to
avoid confusion.

> +
> +		intel_timeline_get(tl);
> +		GEM_BUG_ON(!atomic_read(&tl->active_count));
> +		atomic_inc(&tl->active_count); /* pin the list element */
> +		spin_unlock(&timelines->lock);
> +
> +		count = 0;
> +		ready = 0;
> +		inflight = 0;
> +		list_for_each_entry_safe(rq, rn, &tl->requests, link) {
> +			if (i915_request_completed(rq))
> +				continue;
> +
> +			count++;
> +			if (i915_request_is_ready(rq))
> +				ready++;
> +			if (i915_request_is_active(rq))
> +				inflight++;
> +		}
> +
> +		drm_printf(m, "Timeline %llx: { ", tl->fence_context);
> +		drm_printf(m, "count %lu, ready: %lu, inflight: %lu",
> +			   count, ready, inflight);
> +		drm_printf(m, ", seqno: { current: %d, last: %d }",
> +			   *tl->hwsp_seqno, tl->seqno);
> +		fence = i915_active_fence_get(&tl->last_request);
> +		if (fence) {
> +			drm_printf(m, ", engine: %s",
> +				   to_request(fence)->engine->name);
> +			dma_fence_put(fence);
> +		}
> +		drm_printf(m, " }\n");
> +
> +		if (show_request) {
> +			list_for_each_entry_safe(rq, rn, &tl->requests, link)
> +				show_request(m, rq,
> +					     i915_request_is_active(rq) ? "  E" :
> +					     i915_request_is_ready(rq) ? "  Q" :
> +					     "  U");

Can we get some consistency between the category counts and flags?

s/count/queued/ -> Q
ready -> R (also matches the term runnable)
active -> E? Hmm, E is consistent with the engine info dump.

Not ideal, but perhaps every extra bit of consistency is good.
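
e.g. (made-up requests, just to illustrate the suggested flags):

	Timeline 886: { queued: 3, ready: 2, active: 1, ... }
	  E 886:29a-  prio=0 @ 134ms: gem_exec_parall<4621>
	  R 886:29c-  prio=0 @ 134ms: gem_exec_parall<4621>
	  Q 886:29e-  prio=0 @ 134ms: gem_exec_parall<4621>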

> +		}
> +
> +		mutex_unlock(&tl->mutex);
> +		spin_lock(&timelines->lock);
> +
> +		/* Resume list iteration after reacquiring spinlock */
> +		list_safe_reset_next(tl, tn, link);
> +		if (atomic_dec_and_test(&tl->active_count))
> +			list_del(&tl->link);
> +
> +		/* Defer the final release to after the spinlock */
> +		if (refcount_dec_and_test(&tl->kref.refcount)) {
> +			GEM_BUG_ON(atomic_read(&tl->active_count));
> +			list_add(&tl->link, &free);
> +		}
> +	}
> +	spin_unlock(&timelines->lock);
> +
> +	list_for_each_entry_safe(tl, tn, &free, link)
> +		__intel_timeline_free(&tl->kref);
> +}
> +
>   #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
>   #include "gt/selftests/mock_timeline.c"
>   #include "gt/selftest_timeline.c"
> diff --git a/drivers/gpu/drm/i915/gt/intel_timeline.h b/drivers/gpu/drm/i915/gt/intel_timeline.h
> index 9882cd911d8e..9b88f220be2b 100644
> --- a/drivers/gpu/drm/i915/gt/intel_timeline.h
> +++ b/drivers/gpu/drm/i915/gt/intel_timeline.h
> @@ -31,6 +31,8 @@
>   #include "i915_syncmap.h"
>   #include "intel_timeline_types.h"
>   
> +struct drm_printer;
> +
>   struct intel_timeline *
>   __intel_timeline_create(struct intel_gt *gt,
>   			struct i915_vma *global_hwsp,
> @@ -106,4 +108,10 @@ int intel_timeline_read_hwsp(struct i915_request *from,
>   void intel_gt_init_timelines(struct intel_gt *gt);
>   void intel_gt_fini_timelines(struct intel_gt *gt);
>   
> +void intel_gt_show_timelines(struct intel_gt *gt,
> +			     struct drm_printer *m,
> +			     void (*show_request)(struct drm_printer *m,
> +						  const struct i915_request *rq,
> +						  const char *prefix));
> +
>   #endif
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 337293c7bb7d..498c82dcc7e9 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -1306,26 +1306,28 @@ static int i915_runtime_pm_status(struct seq_file *m, void *unused)
>   
>   static int i915_engine_info(struct seq_file *m, void *unused)
>   {
> -	struct drm_i915_private *dev_priv = node_to_i915(m->private);
> +	struct drm_i915_private *i915 = node_to_i915(m->private);
>   	struct intel_engine_cs *engine;
>   	intel_wakeref_t wakeref;
>   	struct drm_printer p;
>   
> -	wakeref = intel_runtime_pm_get(&dev_priv->runtime_pm);
> +	wakeref = intel_runtime_pm_get(&i915->runtime_pm);
>   
>   	seq_printf(m, "GT awake? %s [%d]\n",
> -		   yesno(dev_priv->gt.awake),
> -		   atomic_read(&dev_priv->gt.wakeref.count));
> +		   yesno(i915->gt.awake),
> +		   atomic_read(&i915->gt.wakeref.count));
>   	seq_printf(m, "GT busy: %llu ms\n",
> -		   ktime_to_ms(intel_gt_get_busy_time(&dev_priv->gt)));
> +		   ktime_to_ms(intel_gt_get_busy_time(&i915->gt)));
>   	seq_printf(m, "CS timestamp frequency: %u Hz\n",
> -		   RUNTIME_INFO(dev_priv)->cs_timestamp_frequency_hz);
> +		   RUNTIME_INFO(i915)->cs_timestamp_frequency_hz);
>   
>   	p = drm_seq_file_printer(m);
> -	for_each_uabi_engine(engine, dev_priv)
> +	for_each_uabi_engine(engine, i915)
>   		intel_engine_dump(engine, &p, "%s\n", engine->name);
>   
> -	intel_runtime_pm_put(&dev_priv->runtime_pm, wakeref);
> +	intel_gt_show_timelines(&i915->gt, &p, NULL);
> +
> +	intel_runtime_pm_put(&i915->runtime_pm, wakeref);
>   
>   	return 0;
>   }
> 

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH 09/28] drm/i915: Lift waiter/signaler iterators
  2020-11-17 11:30 ` [Intel-gfx] [PATCH 09/28] drm/i915: Lift waiter/signaler iterators Chris Wilson
@ 2020-11-17 13:00   ` Tvrtko Ursulin
  0 siblings, 0 replies; 52+ messages in thread
From: Tvrtko Ursulin @ 2020-11-17 13:00 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 17/11/2020 11:30, Chris Wilson wrote:
> Lift the list iteration defines for traversing the signaler/waiter lists
> into i915_scheduler_types.h for reuse.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/gt/intel_lrc.c         | 10 ----------
>   drivers/gpu/drm/i915/i915_scheduler_types.h | 10 ++++++++++
>   2 files changed, 10 insertions(+), 10 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index a4b8c20d12a9..17cb7060eb29 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -1836,16 +1836,6 @@ static void virtual_xfer_context(struct virtual_engine *ve,
>   	}
>   }
>   
> -#define for_each_waiter(p__, rq__) \
> -	list_for_each_entry_lockless(p__, \
> -				     &(rq__)->sched.waiters_list, \
> -				     wait_link)
> -
> -#define for_each_signaler(p__, rq__) \
> -	list_for_each_entry_rcu(p__, \
> -				&(rq__)->sched.signalers_list, \
> -				signal_link)
> -
>   static void defer_request(struct i915_request *rq, struct list_head * const pl)
>   {
>   	LIST_HEAD(list);
> diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h
> index f72e6c397b08..343ed44d5ed4 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler_types.h
> +++ b/drivers/gpu/drm/i915/i915_scheduler_types.h
> @@ -81,4 +81,14 @@ struct i915_dependency {
>   #define I915_DEPENDENCY_WEAK		BIT(2)
>   };
>   
> +#define for_each_waiter(p__, rq__) \
> +	list_for_each_entry_lockless(p__, \
> +				     &(rq__)->sched.waiters_list, \
> +				     wait_link)
> +
> +#define for_each_signaler(p__, rq__) \
> +	list_for_each_entry_rcu(p__, \
> +				&(rq__)->sched.signalers_list, \
> +				signal_link)
> +
>   #endif /* _I915_SCHEDULER_TYPES_H_ */
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH 05/28] drm/i915/gt: Track the overall busy time
  2020-11-17 12:44   ` Tvrtko Ursulin
@ 2020-11-17 13:05     ` Chris Wilson
  0 siblings, 0 replies; 52+ messages in thread
From: Chris Wilson @ 2020-11-17 13:05 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2020-11-17 12:44:00)
> 
> On 17/11/2020 11:30, Chris Wilson wrote:
> > Since we wake the GT up before executing a request, and go to sleep as
> > soon as it is retired, the GT wake time not only represents how long the
> > device is powered up, but also provides a summary, albeit an overestimate,
> > of the device runtime.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >   drivers/gpu/drm/i915/gt/intel_gt_pm.c    | 45 ++++++++++++++++++++++++
> >   drivers/gpu/drm/i915/gt/intel_gt_pm.h    |  2 ++
> >   drivers/gpu/drm/i915/gt/intel_gt_types.h | 24 +++++++++++++
> >   drivers/gpu/drm/i915/i915_debugfs.c      |  2 ++
> >   drivers/gpu/drm/i915/i915_pmu.c          |  6 ++++
> >   include/uapi/drm/i915_drm.h              |  3 +-
> >   6 files changed, 81 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.c b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> > index 274aa0dd7050..dd2f88bed65a 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.c
> > @@ -39,6 +39,24 @@ static void user_forcewake(struct intel_gt *gt, bool suspend)
> >       intel_gt_pm_put(gt);
> >   }
> >   
> > +static void runtime_begin(struct intel_gt *gt)
> > +{
> > +     write_seqcount_begin(&gt->stats.lock);
> > +     gt->stats.start = ktime_get();
> > +     gt->stats.active = true;
> > +     write_seqcount_end(&gt->stats.lock);
> > +}
> > +
> > +static void runtime_end(struct intel_gt *gt)
> > +{
> > +     write_seqcount_begin(&gt->stats.lock);
> > +     gt->stats.active = false;
> > +     gt->stats.total =
> > +             ktime_add(gt->stats.total,
> > +                       ktime_sub(ktime_get(), gt->stats.start));
> > +     write_seqcount_end(&gt->stats.lock);
> > +}
> > +
> >   static int __gt_unpark(struct intel_wakeref *wf)
> >   {
> >       struct intel_gt *gt = container_of(wf, typeof(*gt), wakeref);
> > @@ -67,6 +85,7 @@ static int __gt_unpark(struct intel_wakeref *wf)
> >       i915_pmu_gt_unparked(i915);
> >   
> >       intel_gt_unpark_requests(gt);
> > +     runtime_begin(gt);
> >   
> >       return 0;
> >   }
> > @@ -79,6 +98,7 @@ static int __gt_park(struct intel_wakeref *wf)
> >   
> >       GT_TRACE(gt, "\n");
> >   
> > +     runtime_end(gt);
> >       intel_gt_park_requests(gt);
> >   
> >       i915_vma_parked(gt);
> > @@ -106,6 +126,7 @@ static const struct intel_wakeref_ops wf_ops = {
> >   void intel_gt_pm_init_early(struct intel_gt *gt)
> >   {
> >       intel_wakeref_init(&gt->wakeref, gt->uncore->rpm, &wf_ops);
> > +     seqcount_mutex_init(&gt->stats.lock, &gt->wakeref.mutex);
> >   }
> >   
> >   void intel_gt_pm_init(struct intel_gt *gt)
> > @@ -339,6 +360,30 @@ int intel_gt_runtime_resume(struct intel_gt *gt)
> >       return intel_uc_runtime_resume(&gt->uc);
> >   }
> >   
> > +static ktime_t __intel_gt_get_busy_time(const struct intel_gt *gt)
> > +{
> > +     ktime_t total = gt->stats.total;
> > +
> > +     if (gt->stats.active)
> > +             total = ktime_add(total,
> > +                               ktime_sub(ktime_get(), gt->stats.start));
> > +
> > +     return total;
> > +}
> > +
> > +ktime_t intel_gt_get_busy_time(const struct intel_gt *gt)
> > +{
> > +     unsigned int seq;
> > +     ktime_t total;
> > +
> > +     do {
> > +             seq = read_seqcount_begin(&gt->stats.lock);
> 
> Any specific reasons for read_seqcount_begin vs read_seqbegin etc? The
> latter, being used in engine stats, seems to have some kcsan integration
> and the former is a bit more low level.

The brand spanking new seqlock.h offers a lot more variety in lockdep
analysis; in particular we declare these as being guarded by the
wakeref.mutex (see seqcount_mutex_init()). read_seqcount_begin() is not
as low-level as it once was, and now includes the lockdep and kcsan
checks (and is quite picky about being init'ed correctly).
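
A minimal sketch of the pattern, using the API exactly as in the patch:

	/* init: associate the seqcount with the mutex for lockdep */
	seqcount_mutex_init(&gt->stats.lock, &gt->wakeref.mutex);

	/* writer: caller must hold gt->wakeref.mutex */
	write_seqcount_begin(&gt->stats.lock);
	gt->stats.start = ktime_get();
	write_seqcount_end(&gt->stats.lock);

	/* reader: lockless, retried if it races with a writer */
	do {
		seq = read_seqcount_begin(&gt->stats.lock);
		total = __intel_gt_get_busy_time(gt);
	} while (read_seqcount_retry(&gt->stats.lock, seq));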

> > +             total = __intel_gt_get_busy_time(gt);
> > +     } while (read_seqcount_retry(&gt->stats.lock, seq));
> > +
> > +     return total;
> > +}
> 
> I wish there was an easy way to extract some sort of struct stats 
> between engine stats and this and have common helpers, but sadly structs 
> are not exactly identical.

Nor do they get used from exactly the same locking contexts.

And context-stats are very similar but different again. :|

> >   #if IS_ENABLED(CONFIG_DRM_I915_SELFTEST)
> >   #include "selftest_gt_pm.c"
> >   #endif
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_pm.h b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
> > index 60f0e2fbe55c..aa8f2cda946b 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gt_pm.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_gt_pm.h
> > @@ -58,6 +58,8 @@ int intel_gt_resume(struct intel_gt *gt);
> >   void intel_gt_runtime_suspend(struct intel_gt *gt);
> >   int intel_gt_runtime_resume(struct intel_gt *gt);
> >   
> > +ktime_t intel_gt_get_busy_time(const struct intel_gt *gt);
> > +
> >   static inline bool is_mock_gt(const struct intel_gt *gt)
> >   {
> >       return I915_SELFTEST_ONLY(gt->awake == -ENODEV);
> > diff --git a/drivers/gpu/drm/i915/gt/intel_gt_types.h b/drivers/gpu/drm/i915/gt/intel_gt_types.h
> > index 6d39a4a11bf3..c7bde529feab 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_gt_types.h
> > +++ b/drivers/gpu/drm/i915/gt/intel_gt_types.h
> > @@ -87,6 +87,30 @@ struct intel_gt {
> >   
> >       u32 pm_guc_events;
> >   
> > +     struct {
> > +             bool active;
> > +
> > +             /**
> > +              * @lock: Lock protecting the below fields.
> > +              */
> > +             seqcount_mutex_t lock;
> > +
> > +             /**
> > +              * @total: Total time this engine was busy.
> > +              *
> > +              * Accumulated time not counting the most recent block in cases
> > +              * where engine is currently busy (active > 0).
> > +              */
> > +             ktime_t total;
> > +
> > +             /**
> > +              * @start: Timestamp of the last idle to active transition.
> > +              *
> > +              * Idle is defined as active == 0, active is active > 0.
> > +              */
> > +             ktime_t start;
> > +     } stats;
> > +
> >       struct intel_engine_cs *engine[I915_NUM_ENGINES];
> >       struct intel_engine_cs *engine_class[MAX_ENGINE_CLASS + 1]
> >                                           [MAX_ENGINE_INSTANCE + 1];
> > diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> > index 77e76b665098..337293c7bb7d 100644
> > --- a/drivers/gpu/drm/i915/i915_debugfs.c
> > +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> > @@ -1316,6 +1316,8 @@ static int i915_engine_info(struct seq_file *m, void *unused)
> >       seq_printf(m, "GT awake? %s [%d]\n",
> >                  yesno(dev_priv->gt.awake),
> >                  atomic_read(&dev_priv->gt.wakeref.count));
> > +     seq_printf(m, "GT busy: %llu ms\n",
> > +                ktime_to_ms(intel_gt_get_busy_time(&dev_priv->gt)));
> 
> Would it be worth putting something in debugfs_gt_pm.c as well?

Closest we have at the moment is rps_boost_show, and that's a bit stale.

I was expecting we would have another 'engines' entry under gt/; maybe one day.

> 
> >       seq_printf(m, "CS timestamp frequency: %u Hz\n",
> >                  RUNTIME_INFO(dev_priv)->cs_timestamp_frequency_hz);
> >   
> > diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> > index cd786ad12be7..36fc60cf5725 100644
> > --- a/drivers/gpu/drm/i915/i915_pmu.c
> > +++ b/drivers/gpu/drm/i915/i915_pmu.c
> > @@ -488,6 +488,8 @@ config_status(struct drm_i915_private *i915, u64 config)
> >               if (!HAS_RC6(i915))
> >                       return -ENODEV;
> >               break;
> > +     case I915_PMU_BUSY_TIME:
> > +             break;
> >       default:
> >               return -ENOENT;
> >       }
> > @@ -595,6 +597,9 @@ static u64 __i915_pmu_event_read(struct perf_event *event)
> >               case I915_PMU_RC6_RESIDENCY:
> >                       val = get_rc6(&i915->gt);
> >                       break;
> > +             case I915_PMU_BUSY_TIME:
> > +                     val = ktime_to_ns(intel_gt_get_busy_time(&i915->gt));
> > +                     break;
> >               }
> >       }
> >   
> > @@ -898,6 +903,7 @@ create_event_attributes(struct i915_pmu *pmu)
> >               __event(I915_PMU_REQUESTED_FREQUENCY, "requested-frequency", "M"),
> >               __event(I915_PMU_INTERRUPTS, "interrupts", NULL),
> >               __event(I915_PMU_RC6_RESIDENCY, "rc6-residency", "ns"),
> > +             __event(I915_PMU_BUSY_TIME, "busy-time", "ns"),
> >       };
> >       static const struct {
> >               enum drm_i915_pmu_engine_sample sample;
> > diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> > index fa1f3d62f9a6..b66b7c1fd564 100644
> > --- a/include/uapi/drm/i915_drm.h
> > +++ b/include/uapi/drm/i915_drm.h
> > @@ -177,8 +177,9 @@ enum drm_i915_pmu_engine_sample {
> >   #define I915_PMU_REQUESTED_FREQUENCY        __I915_PMU_OTHER(1)
> >   #define I915_PMU_INTERRUPTS         __I915_PMU_OTHER(2)
> >   #define I915_PMU_RC6_RESIDENCY              __I915_PMU_OTHER(3)
> > +#define I915_PMU_BUSY_TIME           __I915_PMU_OTHER(4)
> 
> I'd be tempted to call this I915_PMU_GT_BUSY_TIME - or even better awake 
> time?

awake is better than busy, but this is the same level as RC6_RESIDENCY,
so I was following its trend.

> > -#define I915_PMU_LAST I915_PMU_RC6_RESIDENCY
> > +#define I915_PMU_LAST I915_PMU_BUSY_TIME
> >   
> >   /* Each region is a minimum of 16k, and there are at most 255 of them.
> >    */
> > 
> 
> Code looks fine, but should we be adding a pmu counter?

It's the inverse of rc6 residency; so I can already put it to use.

Yes, it's useful.
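
e.g. (hypothetical invocation, using the event name the patch
registers):

	$ perf stat -a -e i915/busy-time/ -e i915/rc6-residency/ sleep 10
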
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH 10/28] drm/i915: Show timeline dependencies for debug
  2020-11-17 11:30 ` [Intel-gfx] [PATCH 10/28] drm/i915: Show timeline dependencies for debug Chris Wilson
@ 2020-11-17 13:06   ` Tvrtko Ursulin
  2020-11-17 13:30     ` Chris Wilson
  0 siblings, 1 reply; 52+ messages in thread
From: Tvrtko Ursulin @ 2020-11-17 13:06 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 17/11/2020 11:30, Chris Wilson wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Include the signalers that each request in the timeline is waiting on,
> as a means to try and identify the cause of a stall. This can be quite
> verbose, even though for now we only show each request in the timeline
> and its immediate antecedents.
> 
> This generates output like:
> 
> Timeline 886: { count 1, ready: 0, inflight: 0, seqno: { current: 664, last: 666 }, engine: rcs0 }

Applies to earlier patch:

I am still tempted to suggest replacing "current: %d, last: %d" with 
"seqno: %d/%d" for compactness, which is still completely intuitive 
to me.

And maybe instead of "engine: %s" just append the engine name directly as a tag.

But up to you.
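
i.e. the example line above would then read something like (just sketching
the formatting, nothing final):

Timeline 886: { count 1, ready: 0, inflight: 0, seqno: 664/666 } rcs0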

>    U 886:29a-  prio=0 @ 134ms: gem_exec_parall<4621>
>    - U bc1:27a-  prio=0 @ 134ms: gem_exec_parall[4917]
> Timeline 825: { count 1, ready: 0, inflight: 0, seqno: { current: 802, last: 804 }, engine: vcs0 }
>    U 825:324  prio=0 @ 107ms: gem_exec_parall<4518>
>    - U b75:140-  prio=0 @ 110ms: gem_exec_parall<5486>
> Timeline b46: { count 1, ready: 0, inflight: 0, seqno: { current: 782, last: 784 }, engine: vcs0 }
>    U b46:310-  prio=0 @ 70ms: gem_exec_parall<5428>
>    - U c11:170-  prio=0 @ 70ms: gem_exec_parall[5501]
> Timeline 96b: { count 1, ready: 0, inflight: 0, seqno: { current: 632, last: 634 }, engine: vcs0 }
>    U 96b:27a-  prio=0 @ 67ms: gem_exec_parall<4878>
>    - U b75:19e-  prio=0 @ 67ms: gem_exec_parall<5486>
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_debugfs.c   |  3 ++-
>   drivers/gpu/drm/i915/i915_scheduler.c | 31 +++++++++++++++++++++++++++
>   drivers/gpu/drm/i915/i915_scheduler.h |  6 ++++++
>   3 files changed, 39 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 498c82dcc7e9..f6e71119891f 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -45,6 +45,7 @@
>   #include "i915_debugfs.h"
>   #include "i915_debugfs_params.h"
>   #include "i915_irq.h"
> +#include "i915_scheduler.h"
>   #include "i915_trace.h"
>   #include "intel_pm.h"
>   #include "intel_sideband.h"
> @@ -1325,7 +1326,7 @@ static int i915_engine_info(struct seq_file *m, void *unused)
>   	for_each_uabi_engine(engine, i915)
>   		intel_engine_dump(engine, &p, "%s\n", engine->name);
>   
> -	intel_gt_show_timelines(&i915->gt, &p, NULL);
> +	intel_gt_show_timelines(&i915->gt, &p, i915_request_show_with_schedule);
>   
>   	intel_runtime_pm_put(&i915->runtime_pm, wakeref);
>   
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> index cbb880b10c65..8837ba672933 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.c
> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> @@ -504,6 +504,37 @@ void i915_sched_node_fini(struct i915_sched_node *node)
>   	spin_unlock_irq(&schedule_lock);
>   }
>   
> +void i915_request_show_with_schedule(struct drm_printer *m,
> +				     const struct i915_request *rq,
> +				     const char *prefix)
> +{
> +	struct i915_dependency *dep;
> +
> +	i915_request_show(m, rq, prefix);
> +	if (i915_request_completed(rq))
> +		return;
> +
> +	rcu_read_lock();
> +	for_each_signaler(dep, rq) {
> +		const struct i915_request *signaler =
> +			node_to_request(dep->signaler);
> +
> +		/* Dependencies along the same timeline are expected. */
> +		if (signaler->timeline == rq->timeline)
> +			continue;
> +
> +		if (i915_request_completed(signaler))
> +			continue;
> +
> +		/* XXX ideally build indent into prefix */
> +		i915_request_show(m, signaler,
> +				  i915_request_is_active(signaler) ? "  - E" :
> +				  i915_request_is_ready(signaler) ? "  - Q" :
> +				  "  - U");

This we will see what we agree on in the previous patch.

> +	}
> +	rcu_read_unlock();
> +}
> +
>   static void i915_global_scheduler_shrink(void)
>   {
>   	kmem_cache_shrink(global.slab_dependencies);
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
> index 6f0bf00fc569..34cee9a17801 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.h
> +++ b/drivers/gpu/drm/i915/i915_scheduler.h
> @@ -13,6 +13,8 @@
>   
>   #include "i915_scheduler_types.h"
>   
> +struct drm_printer;
> +
>   #define priolist_for_each_request(it, plist, idx) \
>   	for (idx = 0; idx < ARRAY_SIZE((plist)->requests); idx++) \
>   		list_for_each_entry(it, &(plist)->requests[idx], sched.link)
> @@ -54,4 +56,8 @@ static inline void i915_priolist_free(struct i915_priolist *p)
>   		__i915_priolist_free(p);
>   }
>   
> +void i915_request_show_with_schedule(struct drm_printer *m,
> +				     const struct i915_request *rq,
> +				     const char *prefix);
> +
>   #endif /* _I915_SCHEDULER_H_ */
> 

Overall looks good to me.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH 08/28] drm/i915/gt: Show all active timelines for debugging
  2020-11-17 12:59   ` Tvrtko Ursulin
@ 2020-11-17 13:25     ` Chris Wilson
  2020-11-18 15:51       ` Tvrtko Ursulin
  0 siblings, 1 reply; 52+ messages in thread
From: Chris Wilson @ 2020-11-17 13:25 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2020-11-17 12:59:44)
> 
> On 17/11/2020 11:30, Chris Wilson wrote:
> > +             if (show_request) {
> > +                     list_for_each_entry_safe(rq, rn, &tl->requests, link)
> > +                             show_request(m, rq,
> > +                                          i915_request_is_active(rq) ? "  E" :
> > +                                          i915_request_is_ready(rq) ? "  Q" :
> > +                                          "  U");
> 
> Can we get some consistency between the category counts and flags?
> 
> s/count/queued/ -> Q

Hmm, if you are sure. Q would then not match the engine info.

Still favouring count over queued; I think count indicates more clearly
that it is the superset, but queued may imply it excludes ready and
definitely sounds like it should not include inflight.

> ready -> R (also matches the term runnable)
> active -> E? Hmm, E is consistent with the engine info dump.
> 
> Not ideal, but perhaps every bit of extra consistency is good.

Not sold yet, but not happy with the current flags either.

If we go with 'R' for ready, we should also update engine info.
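
For the timeline dump that would amount to (a sketch only, the letters are
still under discussion):

	show_request(m, rq,
		     i915_request_is_active(rq) ? "  E" : /* executing */
		     i915_request_is_ready(rq) ? "  R" : /* ready/runnable */
		     "  U"); /* unready, waiting on signalers */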
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH 10/28] drm/i915: Show timeline dependencies for debug
  2020-11-17 13:06   ` Tvrtko Ursulin
@ 2020-11-17 13:30     ` Chris Wilson
  0 siblings, 0 replies; 52+ messages in thread
From: Chris Wilson @ 2020-11-17 13:30 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2020-11-17 13:06:22)
> 
> On 17/11/2020 11:30, Chris Wilson wrote:
> > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > 
> > Include the signalers that each request in the timeline is waiting on,
> > as a means to try and identify the cause of a stall. This can be quite
> > verbose, even though for now we only show each request in the timeline
> > and its immediate antecedents.
> > 
> > This generates output like:
> > 
> > Timeline 886: { count 1, ready: 0, inflight: 0, seqno: { current: 664, last: 666 }, engine: rcs0 }
> 
> Applies to earlier patch:
> 
> I am still tempted to suggest replacing "current: %d, last: %d" with 
> "seqno: %d/%d" for compactness, which is still completely intuitive 
> to me.
> 
> And maybe instead of "engine: %s" just append the engine name directly as a tag.
> 
> But up to you.

I'm trying to indoctrinate myself into using yaml for info dumps, with the
end goal being all the major files emitting yaml.

> >    U 886:29a-  prio=0 @ 134ms: gem_exec_parall<4621>
> >    - U bc1:27a-  prio=0 @ 134ms: gem_exec_parall[4917]

And this admittedly is not yaml. A long way to go before it is second
nature.
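
Something in this direction is what I have in mind, purely as an
illustrative sketch (field names are not settled):

timelines:
- id: 886
  engine: rcs0
  count: 1
  ready: 0
  inflight: 0
  seqno: { current: 664, last: 666 }
  requests:
  - { state: U, fence: 886:29a, prio: 0, age: 134ms, name: "gem_exec_parall<4621>" }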
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/28] drm/i915/selftests: Improve granularity for mocs reset checks
  2020-11-17 11:30 [Intel-gfx] [PATCH 01/28] drm/i915/selftests: Improve granularity for mocs reset checks Chris Wilson
                   ` (26 preceding siblings ...)
  2020-11-17 11:31 ` [Intel-gfx] [PATCH 28/28] drm/i915/gt: Simplify virtual engine handling for execlists_hold() Chris Wilson
@ 2020-11-17 18:54 ` Patchwork
  2020-11-17 18:56 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
                   ` (2 subsequent siblings)
  30 siblings, 0 replies; 52+ messages in thread
From: Patchwork @ 2020-11-17 18:54 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/28] drm/i915/selftests: Improve granularity for mocs reset checks
URL   : https://patchwork.freedesktop.org/series/83951/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
ae32abe5ed3c drm/i915/selftests: Improve granularity for mocs reset checks
d305f0a2eb8d drm/i915/selftests: Small tweak to put the termination conditions together
b441a9d6dec3 drm/i915/gem: Drop free_work for GEM contexts
3d97fa02b2bd drm/i915/gt: Ignore dt==0 for reporting underflows
f6559ba4c7c5 drm/i915/gt: Track the overall busy time
40da0fecec81 drm/i915/gt: Include semaphore status in print_request()
8e85dabc534e drm/i915: Lift i915_request_show()
38470cc9596c drm/i915/gt: Show all active timelines for debugging
471c4d05edc2 drm/i915: Lift waiter/signaler iterators
45a01d0bdc89 drm/i915: Show timeline dependencies for debug
-:13: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#13: 
Timeline 886: { count 1, ready: 0, inflight: 0, seqno: { current: 664, last: 666 }, engine: rcs0 }

total: 0 errors, 1 warnings, 0 checks, 68 lines checked
5ed9a448fb02 drm/i915/gt: Defer enabling the breadcrumb interrupt to after submission
49b7cb3fb2de drm/i915/gt: Track signaled breadcrumbs outside of the breadcrumb spinlock
b9ab5ba2ea2d drm/i915/gt: Don't cancel the interrupt shadow too early
edd68892ac79 drm/i915/gt: Free stale request on destroying the virtual engine
25ee818fe9c7 drm/i915/gt: Protect context lifetime with RCU
7ef7a16bb08c drm/i915/gt: Split the breadcrumb spinlock between global and contexts
-:21: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#21: 
<4>[  416.208555] list_add corruption. prev->next should be next (ffff8881951d5910), but was dead000000000100. (prev=ffff8882781bb870).

total: 0 errors, 1 warnings, 0 checks, 354 lines checked
a9d88246410b drm/i915/gt: Move the breadcrumb to the signaler if completed upon cancel
299dfa2c30b7 drm/i915/gt: Decouple completed requests on unwind
80ee04c69c11 drm/i915/gt: Check for a completed last request once
c43eabedf48d drm/i915/gt: Replace direct submit with direct call to tasklet
414c9d2c58db drm/i915/gt: ce->inflight updates are now serialised
14776ee687ac drm/i915/gt: Use virtual_engine during execlists_dequeue
3af258c2b1e2 drm/i915/gt: Decouple inflight virtual engines
56b37beddaa4 drm/i915/gt: Defer schedule_out until after the next dequeue
3ed0a652501e drm/i915/gt: Remove virtual breadcrumb before transfer
063c2759bd5c drm/i915/gt: Shrink the critical section for irq signaling
5440ca67d640 drm/i915/gt: Resubmit the virtual engine on schedule-out
239e47e2f688 drm/i915/gt: Simplify virtual engine handling for execlists_hold()


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [Intel-gfx] ✗ Fi.CI.SPARSE: warning for series starting with [01/28] drm/i915/selftests: Improve granularity for mocs reset checks
  2020-11-17 11:30 [Intel-gfx] [PATCH 01/28] drm/i915/selftests: Improve granularity for mocs reset checks Chris Wilson
                   ` (27 preceding siblings ...)
  2020-11-17 18:54 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/28] drm/i915/selftests: Improve granularity for mocs reset checks Patchwork
@ 2020-11-17 18:56 ` Patchwork
  2020-11-17 19:24 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
  2020-11-17 22:56 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
  30 siblings, 0 replies; 52+ messages in thread
From: Patchwork @ 2020-11-17 18:56 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/28] drm/i915/selftests: Improve granularity for mocs reset checks
URL   : https://patchwork.freedesktop.org/series/83951/
State : warning

== Summary ==

$ dim sparse --fast origin/drm-tip
Sparse version: v0.6.2
Fast mode used, each commit won't be checked separately.
-
+drivers/gpu/drm/i915/gt/intel_reset.c:1328:5: warning: context imbalance in 'intel_gt_reset_trylock' - different lock contexts for basic block
+drivers/gpu/drm/i915/gt/selftest_reset.c:100:20:    expected void *in
+drivers/gpu/drm/i915/gt/selftest_reset.c:100:20:    got void [noderef] __iomem *[assigned] s
+drivers/gpu/drm/i915/gt/selftest_reset.c:100:20: warning: incorrect type in assignment (different address spaces)
+drivers/gpu/drm/i915/gt/selftest_reset.c:101:46:    expected void const *src
+drivers/gpu/drm/i915/gt/selftest_reset.c:101:46:    got void [noderef] __iomem *[assigned] s
+drivers/gpu/drm/i915/gt/selftest_reset.c:101:46: warning: incorrect type in argument 2 (different address spaces)
+drivers/gpu/drm/i915/gt/selftest_reset.c:136:20:    expected void *in
+drivers/gpu/drm/i915/gt/selftest_reset.c:136:20:    got void [noderef] __iomem *[assigned] s
+drivers/gpu/drm/i915/gt/selftest_reset.c:136:20: warning: incorrect type in assignment (different address spaces)
+drivers/gpu/drm/i915/gt/selftest_reset.c:137:46:    expected void const *src
+drivers/gpu/drm/i915/gt/selftest_reset.c:137:46:    got void [noderef] __iomem *[assigned] s
+drivers/gpu/drm/i915/gt/selftest_reset.c:137:46: warning: incorrect type in argument 2 (different address spaces)
+drivers/gpu/drm/i915/gt/selftest_reset.c:98:34:    expected unsigned int [usertype] *s
+drivers/gpu/drm/i915/gt/selftest_reset.c:98:34:    got void [noderef] __iomem *[assigned] s
+drivers/gpu/drm/i915/gt/selftest_reset.c:98:34: warning: incorrect type in argument 1 (different address spaces)
+drivers/gpu/drm/i915/gvt/mmio.c:290:23: warning: memcpy with byte count of 279040
+drivers/gpu/drm/i915/i915_perf.c:1442:15: warning: memset with byte count of 16777216
+drivers/gpu/drm/i915/i915_perf.c:1496:15: warning: memset with byte count of 16777216
+./include/linux/seqlock.h:838:24: warning: trying to copy expression type 31
+./include/linux/seqlock.h:838:24: warning: trying to copy expression type 31
+./include/linux/seqlock.h:864:16: warning: trying to copy expression type 31
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'fwtable_read16' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'fwtable_read32' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'fwtable_read64' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'fwtable_read8' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'fwtable_write16' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'fwtable_write32' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'fwtable_write8' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen11_fwtable_read16' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen11_fwtable_read32' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen11_fwtable_read64' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen11_fwtable_read8' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen11_fwtable_write16' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen11_fwtable_write32' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen11_fwtable_write8' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen12_fwtable_read16' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen12_fwtable_read32' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen12_fwtable_read64' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen12_fwtable_read8' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen12_fwtable_write16' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen12_fwtable_write32' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen12_fwtable_write8' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen6_read16' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen6_read32' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen6_read64' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen6_read8' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen6_write16' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen6_write32' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen6_write8' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen8_write16' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen8_write32' - different lock contexts for basic block
+./include/linux/spinlock.h:409:9: warning: context imbalance in 'gen8_write8' - different lock contexts for basic block


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [Intel-gfx] ✓ Fi.CI.BAT: success for series starting with [01/28] drm/i915/selftests: Improve granularity for mocs reset checks
  2020-11-17 11:30 [Intel-gfx] [PATCH 01/28] drm/i915/selftests: Improve granularity for mocs reset checks Chris Wilson
                   ` (28 preceding siblings ...)
  2020-11-17 18:56 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
@ 2020-11-17 19:24 ` Patchwork
  2020-11-17 22:56 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
  30 siblings, 0 replies; 52+ messages in thread
From: Patchwork @ 2020-11-17 19:24 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 6946 bytes --]

== Series Details ==

Series: series starting with [01/28] drm/i915/selftests: Improve granularity for mocs reset checks
URL   : https://patchwork.freedesktop.org/series/83951/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_9348 -> Patchwork_18920
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/index.html

New tests
---------

  New tests have been introduced between CI_DRM_9348 and Patchwork_18920:

### New CI tests (1) ###

  * boot:
    - Statuses : 41 pass(s)
    - Exec time: [0.0] s

  

Known issues
------------

  Here are the changes found in Patchwork_18920 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@i915_module_load@reload:
    - fi-tgl-u2:          [PASS][1] -> [DMESG-WARN][2] ([i915#1982] / [k.org#205379])
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/fi-tgl-u2/igt@i915_module_load@reload.html
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/fi-tgl-u2/igt@i915_module_load@reload.html

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic:
    - fi-byt-j1900:       [PASS][3] -> [DMESG-WARN][4] ([i915#1982]) +1 similar issue
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/fi-byt-j1900/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic.html
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/fi-byt-j1900/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic.html

  * igt@kms_cursor_legacy@basic-flip-before-cursor-atomic:
    - fi-icl-u2:          [PASS][5] -> [DMESG-WARN][6] ([i915#1982]) +1 similar issue
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/fi-icl-u2/igt@kms_cursor_legacy@basic-flip-before-cursor-atomic.html
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/fi-icl-u2/igt@kms_cursor_legacy@basic-flip-before-cursor-atomic.html

  
#### Possible fixes ####

  * igt@core_hotunplug@unbind-rebind:
    - fi-icl-u2:          [DMESG-WARN][7] ([i915#1982]) -> [PASS][8] +1 similar issue
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/fi-icl-u2/igt@core_hotunplug@unbind-rebind.html
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/fi-icl-u2/igt@core_hotunplug@unbind-rebind.html

  * igt@i915_module_load@reload:
    - fi-byt-j1900:       [DMESG-WARN][9] ([i915#1982]) -> [PASS][10]
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/fi-byt-j1900/igt@i915_module_load@reload.html
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/fi-byt-j1900/igt@i915_module_load@reload.html

  * igt@kms_busy@basic@flip:
    - {fi-kbl-7560u}:     [DMESG-WARN][11] ([i915#1982]) -> [PASS][12]
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/fi-kbl-7560u/igt@kms_busy@basic@flip.html
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/fi-kbl-7560u/igt@kms_busy@basic@flip.html

  * igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic:
    - fi-bsw-kefka:       [DMESG-WARN][13] ([i915#1982]) -> [PASS][14]
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/fi-bsw-kefka/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic.html
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/fi-bsw-kefka/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic.html
    - fi-apl-guc:         [DMESG-WARN][15] ([i915#1635] / [i915#1982]) -> [PASS][16]
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/fi-apl-guc/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic.html
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/fi-apl-guc/igt@kms_cursor_legacy@basic-busy-flip-before-cursor-atomic.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [i915#1635]: https://gitlab.freedesktop.org/drm/intel/issues/1635
  [i915#1759]: https://gitlab.freedesktop.org/drm/intel/issues/1759
  [i915#1982]: https://gitlab.freedesktop.org/drm/intel/issues/1982
  [i915#2524]: https://gitlab.freedesktop.org/drm/intel/issues/2524
  [k.org#205379]: https://bugzilla.kernel.org/show_bug.cgi?id=205379


Participating hosts (42 -> 41)
------------------------------

  Additional (3): fi-kbl-soraka fi-blb-e6850 fi-tgl-y 
  Missing    (4): fi-ilk-m540 fi-bsw-cyan fi-bdw-samus fi-hsw-4200u 


Build changes
-------------

  * Linux: CI_DRM_9348 -> Patchwork_18920

  CI-20190529: 20190529
  CI_DRM_9348: 8dc6152ada31b8d128be55f1dacc13c2b5b61666 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_5855: d9b3c7058efe41e5224dd1e43fac05dc6d049380 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_18920: 239e47e2f68899ae2922c4321ddf8e9983452d96 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

239e47e2f688 drm/i915/gt: Simplify virtual engine handling for execlists_hold()
5440ca67d640 drm/i915/gt: Resubmit the virtual engine on schedule-out
063c2759bd5c drm/i915/gt: Shrink the critical section for irq signaling
3ed0a652501e drm/i915/gt: Remove virtual breadcrumb before transfer
56b37beddaa4 drm/i915/gt: Defer schedule_out until after the next dequeue
3af258c2b1e2 drm/i915/gt: Decouple inflight virtual engines
14776ee687ac drm/i915/gt: Use virtual_engine during execlists_dequeue
414c9d2c58db drm/i915/gt: ce->inflight updates are now serialised
c43eabedf48d drm/i915/gt: Replace direct submit with direct call to tasklet
80ee04c69c11 drm/i915/gt: Check for a completed last request once
299dfa2c30b7 drm/i915/gt: Decouple completed requests on unwind
a9d88246410b drm/i915/gt: Move the breadcrumb to the signaler if completed upon cancel
7ef7a16bb08c drm/i915/gt: Split the breadcrumb spinlock between global and contexts
25ee818fe9c7 drm/i915/gt: Protect context lifetime with RCU
edd68892ac79 drm/i915/gt: Free stale request on destroying the virtual engine
b9ab5ba2ea2d drm/i915/gt: Don't cancel the interrupt shadow too early
49b7cb3fb2de drm/i915/gt: Track signaled breadcrumbs outside of the breadcrumb spinlock
5ed9a448fb02 drm/i915/gt: Defer enabling the breadcrumb interrupt to after submission
45a01d0bdc89 drm/i915: Show timeline dependencies for debug
471c4d05edc2 drm/i915: Lift waiter/signaler iterators
38470cc9596c drm/i915/gt: Show all active timelines for debugging
8e85dabc534e drm/i915: Lift i915_request_show()
40da0fecec81 drm/i915/gt: Include semaphore status in print_request()
f6559ba4c7c5 drm/i915/gt: Track the overall busy time
3d97fa02b2bd drm/i915/gt: Ignore dt==0 for reporting underflows
b441a9d6dec3 drm/i915/gem: Drop free_work for GEM contexts
d305f0a2eb8d drm/i915/selftests: Small tweak to put the termination conditions together
ae32abe5ed3c drm/i915/selftests: Improve granularity for mocs reset checks

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/index.html

[-- Attachment #1.2: Type: text/html, Size: 8278 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 52+ messages in thread

* [Intel-gfx] ✗ Fi.CI.IGT: failure for series starting with [01/28] drm/i915/selftests: Improve granularity for mocs reset checks
  2020-11-17 11:30 [Intel-gfx] [PATCH 01/28] drm/i915/selftests: Improve granularity for mocs reset checks Chris Wilson
                   ` (29 preceding siblings ...)
  2020-11-17 19:24 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
@ 2020-11-17 22:56 ` Patchwork
  30 siblings, 0 replies; 52+ messages in thread
From: Patchwork @ 2020-11-17 22:56 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 30313 bytes --]

== Series Details ==

Series: series starting with [01/28] drm/i915/selftests: Improve granularity for mocs reset checks
URL   : https://patchwork.freedesktop.org/series/83951/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_9348_full -> Patchwork_18920_full
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_18920_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_18920_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_18920_full:

### IGT changes ###

#### Possible regressions ####

  * igt@gem_caching@read-writes:
    - shard-snb:          NOTRUN -> [INCOMPLETE][1]
   [1]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-snb5/igt@gem_caching@read-writes.html

  * igt@gem_exec_balancer@bonded-cork:
    - shard-tglb:         [PASS][2] -> [DMESG-WARN][3]
   [2]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-tglb7/igt@gem_exec_balancer@bonded-cork.html
   [3]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-tglb6/igt@gem_exec_balancer@bonded-cork.html
    - shard-kbl:          [PASS][4] -> [DMESG-WARN][5]
   [4]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-kbl4/igt@gem_exec_balancer@bonded-cork.html
   [5]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-kbl4/igt@gem_exec_balancer@bonded-cork.html
    - shard-iclb:         [PASS][6] -> [DMESG-WARN][7]
   [6]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-iclb3/igt@gem_exec_balancer@bonded-cork.html
   [7]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-iclb3/igt@gem_exec_balancer@bonded-cork.html

  * igt@perf_pmu@busy-idle-no-semaphores@rcs0:
    - shard-hsw:          [PASS][8] -> [DMESG-WARN][9]
   [8]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-hsw4/igt@perf_pmu@busy-idle-no-semaphores@rcs0.html
   [9]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-hsw7/igt@perf_pmu@busy-idle-no-semaphores@rcs0.html

  * igt@perf_pmu@other-init-4:
    - shard-tglb:         [PASS][10] -> [FAIL][11] +1 similar issue
   [10]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-tglb7/igt@perf_pmu@other-init-4.html
   [11]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-tglb6/igt@perf_pmu@other-init-4.html
    - shard-glk:          [PASS][12] -> [FAIL][13]
   [12]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-glk2/igt@perf_pmu@other-init-4.html
   [13]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-glk4/igt@perf_pmu@other-init-4.html
    - shard-skl:          [PASS][14] -> [DMESG-FAIL][15]
   [14]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-skl9/igt@perf_pmu@other-init-4.html
   [15]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-skl9/igt@perf_pmu@other-init-4.html
    - shard-hsw:          [PASS][16] -> [FAIL][17] +1 similar issue
   [16]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-hsw4/igt@perf_pmu@other-init-4.html
   [17]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-hsw7/igt@perf_pmu@other-init-4.html
    - shard-kbl:          [PASS][18] -> [FAIL][19] +1 similar issue
   [18]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-kbl4/igt@perf_pmu@other-init-4.html
   [19]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-kbl4/igt@perf_pmu@other-init-4.html

  * igt@perf_pmu@other-read-4:
    - shard-snb:          [PASS][20] -> [FAIL][21] +1 similar issue
   [20]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-snb4/igt@perf_pmu@other-read-4.html
   [21]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-snb2/igt@perf_pmu@other-read-4.html
    - shard-iclb:         [PASS][22] -> [FAIL][23] +1 similar issue
   [22]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-iclb3/igt@perf_pmu@other-read-4.html
   [23]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-iclb3/igt@perf_pmu@other-read-4.html
    - shard-glk:          [PASS][24] -> [DMESG-FAIL][25]
   [24]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-glk2/igt@perf_pmu@other-read-4.html
   [25]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-glk4/igt@perf_pmu@other-read-4.html

  
#### Suppressed ####

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * {igt@kms_plane_scaling@scaler-with-pixel-format@pipe-d-scaler-with-pixel-format}:
    - shard-tglb:         [PASS][26] -> [INCOMPLETE][27]
   [26]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-tglb6/igt@kms_plane_scaling@scaler-with-pixel-format@pipe-d-scaler-with-pixel-format.html
   [27]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-tglb7/igt@kms_plane_scaling@scaler-with-pixel-format@pipe-d-scaler-with-pixel-format.html

  
New tests
---------

  New tests have been introduced between CI_DRM_9348_full and Patchwork_18920_full:

### New CI tests (1) ###

  * boot:
    - Statuses : 199 pass(s)
    - Exec time: [0.0] s

  

Known issues
------------

  Here are the changes found in Patchwork_18920_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_exec_whisper@basic-normal-all:
    - shard-glk:          [PASS][28] -> [DMESG-WARN][29] ([i915#118] / [i915#95]) +1 similar issue
   [28]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-glk4/igt@gem_exec_whisper@basic-normal-all.html
   [29]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-glk3/igt@gem_exec_whisper@basic-normal-all.html

  * igt@gem_workarounds@suspend-resume:
    - shard-apl:          [PASS][30] -> [INCOMPLETE][31] ([i915#1635])
   [30]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-apl1/igt@gem_workarounds@suspend-resume.html
   [31]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-apl3/igt@gem_workarounds@suspend-resume.html

  * igt@i915_module_load@reload-with-fault-injection:
    - shard-skl:          [PASS][32] -> [DMESG-WARN][33] ([i915#1982]) +2 similar issues
   [32]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-skl7/igt@i915_module_load@reload-with-fault-injection.html
   [33]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-skl8/igt@i915_module_load@reload-with-fault-injection.html

  * igt@i915_selftest@live@gt_heartbeat:
    - shard-skl:          [PASS][34] -> [DMESG-FAIL][35] ([i915#541])
   [34]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-skl2/igt@i915_selftest@live@gt_heartbeat.html
   [35]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-skl1/igt@i915_selftest@live@gt_heartbeat.html

  * igt@kms_cursor_crc@pipe-c-cursor-128x42-random:
    - shard-skl:          [PASS][36] -> [FAIL][37] ([i915#54])
   [36]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-skl5/igt@kms_cursor_crc@pipe-c-cursor-128x42-random.html
   [37]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-skl3/igt@kms_cursor_crc@pipe-c-cursor-128x42-random.html

  * igt@kms_cursor_legacy@cursor-vs-flip-atomic-transitions:
    - shard-hsw:          [PASS][38] -> [FAIL][39] ([i915#2295] / [i915#2370])
   [38]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-hsw4/igt@kms_cursor_legacy@cursor-vs-flip-atomic-transitions.html
   [39]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-hsw7/igt@kms_cursor_legacy@cursor-vs-flip-atomic-transitions.html

  * igt@kms_cursor_legacy@flip-vs-cursor-toggle:
    - shard-tglb:         [PASS][40] -> [FAIL][41] ([i915#2346])
   [40]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-tglb8/igt@kms_cursor_legacy@flip-vs-cursor-toggle.html
   [41]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-tglb6/igt@kms_cursor_legacy@flip-vs-cursor-toggle.html

  * igt@kms_fbcon_fbt@fbc-suspend:
    - shard-snb:          [PASS][42] -> [DMESG-WARN][43] ([i915#42])
   [42]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-snb5/igt@kms_fbcon_fbt@fbc-suspend.html
   [43]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-snb6/igt@kms_fbcon_fbt@fbc-suspend.html

  * igt@kms_flip@plain-flip-ts-check@c-hdmi-a2:
    - shard-glk:          [PASS][44] -> [FAIL][45] ([i915#2122])
   [44]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-glk7/igt@kms_flip@plain-flip-ts-check@c-hdmi-a2.html
   [45]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-glk2/igt@kms_flip@plain-flip-ts-check@c-hdmi-a2.html

  * igt@kms_frontbuffer_tracking@fbc-1p-pri-indfb-multidraw:
    - shard-apl:          [PASS][46] -> [DMESG-WARN][47] ([i915#1635] / [i915#1982])
   [46]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-apl4/igt@kms_frontbuffer_tracking@fbc-1p-pri-indfb-multidraw.html
   [47]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-apl3/igt@kms_frontbuffer_tracking@fbc-1p-pri-indfb-multidraw.html
    - shard-glk:          [PASS][48] -> [DMESG-WARN][49] ([i915#1982])
   [48]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-glk2/igt@kms_frontbuffer_tracking@fbc-1p-pri-indfb-multidraw.html
   [49]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-glk2/igt@kms_frontbuffer_tracking@fbc-1p-pri-indfb-multidraw.html
    - shard-tglb:         [PASS][50] -> [DMESG-WARN][51] ([i915#1982])
   [50]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-tglb8/igt@kms_frontbuffer_tracking@fbc-1p-pri-indfb-multidraw.html
   [51]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-tglb1/igt@kms_frontbuffer_tracking@fbc-1p-pri-indfb-multidraw.html

  * igt@kms_frontbuffer_tracking@psr-1p-primscrn-pri-shrfb-draw-pwrite:
    - shard-skl:          [PASS][52] -> [FAIL][53] ([i915#49])
   [52]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-skl6/igt@kms_frontbuffer_tracking@psr-1p-primscrn-pri-shrfb-draw-pwrite.html
   [53]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-skl5/igt@kms_frontbuffer_tracking@psr-1p-primscrn-pri-shrfb-draw-pwrite.html

  * igt@kms_frontbuffer_tracking@psr-rgb101010-draw-blt:
    - shard-iclb:         [PASS][54] -> [DMESG-WARN][55] ([i915#1982])
   [54]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-iclb4/igt@kms_frontbuffer_tracking@psr-rgb101010-draw-blt.html
   [55]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-iclb5/igt@kms_frontbuffer_tracking@psr-rgb101010-draw-blt.html

  * igt@kms_hdr@bpc-switch-dpms:
    - shard-skl:          [PASS][56] -> [FAIL][57] ([i915#1188])
   [56]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-skl8/igt@kms_hdr@bpc-switch-dpms.html
   [57]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-skl9/igt@kms_hdr@bpc-switch-dpms.html

  * igt@kms_psr2_su@page_flip:
    - shard-iclb:         [PASS][58] -> [SKIP][59] ([fdo#109642] / [fdo#111068])
   [58]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-iclb2/igt@kms_psr2_su@page_flip.html
   [59]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-iclb5/igt@kms_psr2_su@page_flip.html

  * igt@kms_psr@psr2_primary_page_flip:
    - shard-iclb:         [PASS][60] -> [SKIP][61] ([fdo#109441]) +2 similar issues
   [60]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-iclb2/igt@kms_psr@psr2_primary_page_flip.html
   [61]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-iclb5/igt@kms_psr@psr2_primary_page_flip.html

  * igt@kms_psr@suspend:
    - shard-skl:          [PASS][62] -> [INCOMPLETE][63] ([i915#198])
   [62]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-skl10/igt@kms_psr@suspend.html
   [63]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-skl6/igt@kms_psr@suspend.html

  * igt@perf_pmu@module-unload:
    - shard-iclb:         [PASS][64] -> [DMESG-WARN][65] ([i915#1982] / [i915#262])
   [64]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-iclb3/igt@perf_pmu@module-unload.html
   [65]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-iclb6/igt@perf_pmu@module-unload.html

  * igt@perf_pmu@other-init-4:
    - shard-apl:          [PASS][66] -> [DMESG-FAIL][67] ([i915#1635])
   [66]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-apl6/igt@perf_pmu@other-init-4.html
   [67]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-apl3/igt@perf_pmu@other-init-4.html

  * igt@sysfs_heartbeat_interval@mixed@rcs0:
    - shard-skl:          [PASS][68] -> [FAIL][69] ([i915#1731])
   [68]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-skl1/igt@sysfs_heartbeat_interval@mixed@rcs0.html
   [69]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-skl3/igt@sysfs_heartbeat_interval@mixed@rcs0.html

  
#### Possible fixes ####

  * igt@gem_caching@read-writes:
    - shard-hsw:          [FAIL][70] ([i915#1888]) -> [PASS][71]
   [70]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-hsw8/igt@gem_caching@read-writes.html
   [71]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-hsw4/igt@gem_caching@read-writes.html

  * igt@gem_exec_reloc@basic-many-active@rcs0:
    - shard-glk:          [FAIL][72] ([i915#2389]) -> [PASS][73]
   [72]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-glk8/igt@gem_exec_reloc@basic-many-active@rcs0.html
   [73]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-glk4/igt@gem_exec_reloc@basic-many-active@rcs0.html

  * igt@gem_exec_suspend@basic-s3:
    - shard-glk:          [FAIL][74] ([i915#1888]) -> [PASS][75] +1 similar issue
   [74]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-glk5/igt@gem_exec_suspend@basic-s3.html
   [75]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-glk8/igt@gem_exec_suspend@basic-s3.html

  * igt@gem_huc_copy@huc-copy:
    - shard-tglb:         [SKIP][76] ([i915#2190]) -> [PASS][77]
   [76]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-tglb6/igt@gem_huc_copy@huc-copy.html
   [77]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-tglb7/igt@gem_huc_copy@huc-copy.html

  * igt@i915_pm_rc6_residency@rc6-fence:
    - shard-hsw:          [WARN][78] ([i915#1519]) -> [PASS][79]
   [78]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-hsw8/igt@i915_pm_rc6_residency@rc6-fence.html
   [79]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-hsw4/igt@i915_pm_rc6_residency@rc6-fence.html

  * igt@i915_suspend@fence-restore-tiled2untiled:
    - shard-kbl:          [DMESG-WARN][80] ([i915#180]) -> [PASS][81]
   [80]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-kbl7/igt@i915_suspend@fence-restore-tiled2untiled.html
   [81]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-kbl1/igt@i915_suspend@fence-restore-tiled2untiled.html

  * igt@i915_suspend@fence-restore-untiled:
    - shard-skl:          [INCOMPLETE][82] ([i915#198]) -> [PASS][83]
   [82]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-skl6/igt@i915_suspend@fence-restore-untiled.html
   [83]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-skl6/igt@i915_suspend@fence-restore-untiled.html

  * {igt@kms_async_flips@async-flip-with-page-flip-events}:
    - shard-kbl:          [FAIL][84] ([i915#2521]) -> [PASS][85]
   [84]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-kbl6/igt@kms_async_flips@async-flip-with-page-flip-events.html
   [85]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-kbl7/igt@kms_async_flips@async-flip-with-page-flip-events.html

  * {igt@kms_async_flips@test-time-stamp}:
    - shard-tglb:         [FAIL][86] ([i915#2597]) -> [PASS][87]
   [86]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-tglb8/igt@kms_async_flips@test-time-stamp.html
   [87]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-tglb6/igt@kms_async_flips@test-time-stamp.html

  * igt@kms_cursor_crc@pipe-b-cursor-64x21-onscreen:
    - shard-skl:          [FAIL][88] ([i915#54]) -> [PASS][89] +3 similar issues
   [88]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-skl4/igt@kms_cursor_crc@pipe-b-cursor-64x21-onscreen.html
   [89]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-skl4/igt@kms_cursor_crc@pipe-b-cursor-64x21-onscreen.html

  * igt@kms_cursor_edge_walk@pipe-c-256x256-bottom-edge:
    - shard-skl:          [DMESG-WARN][90] ([i915#1982]) -> [PASS][91] +44 similar issues
   [90]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-skl3/igt@kms_cursor_edge_walk@pipe-c-256x256-bottom-edge.html
   [91]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-skl10/igt@kms_cursor_edge_walk@pipe-c-256x256-bottom-edge.html

  * igt@kms_cursor_legacy@cursor-vs-flip-atomic-transitions-varying-size:
    - shard-hsw:          [FAIL][92] ([i915#2295] / [i915#2370]) -> [PASS][93]
   [92]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-hsw1/igt@kms_cursor_legacy@cursor-vs-flip-atomic-transitions-varying-size.html
   [93]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-hsw4/igt@kms_cursor_legacy@cursor-vs-flip-atomic-transitions-varying-size.html

  * igt@kms_cursor_legacy@flip-vs-cursor-crc-atomic:
    - shard-tglb:         [FAIL][94] ([i915#2346]) -> [PASS][95]
   [94]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-tglb6/igt@kms_cursor_legacy@flip-vs-cursor-crc-atomic.html
   [95]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-tglb7/igt@kms_cursor_legacy@flip-vs-cursor-crc-atomic.html

  * igt@kms_flip@2x-flip-vs-expired-vblank@ac-hdmi-a1-hdmi-a2:
    - shard-glk:          [FAIL][96] ([i915#79]) -> [PASS][97]
   [96]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-glk9/igt@kms_flip@2x-flip-vs-expired-vblank@ac-hdmi-a1-hdmi-a2.html
   [97]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-glk8/igt@kms_flip@2x-flip-vs-expired-vblank@ac-hdmi-a1-hdmi-a2.html

  * igt@kms_flip@2x-plain-flip-ts-check-interruptible@ab-hdmi-a1-hdmi-a2:
    - shard-glk:          [FAIL][98] ([i915#2122]) -> [PASS][99]
   [98]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-glk3/igt@kms_flip@2x-plain-flip-ts-check-interruptible@ab-hdmi-a1-hdmi-a2.html
   [99]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-glk4/igt@kms_flip@2x-plain-flip-ts-check-interruptible@ab-hdmi-a1-hdmi-a2.html

  * igt@kms_flip@plain-flip-ts-check@c-edp1:
    - shard-skl:          [FAIL][100] ([i915#2122]) -> [PASS][101]
   [100]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-skl3/igt@kms_flip@plain-flip-ts-check@c-edp1.html
   [101]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-skl7/igt@kms_flip@plain-flip-ts-check@c-edp1.html

  * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-indfb-pgflip-blt:
    - shard-kbl:          [DMESG-WARN][102] ([i915#1982]) -> [PASS][103]
   [102]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-kbl7/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-indfb-pgflip-blt.html
   [103]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-kbl3/igt@kms_frontbuffer_tracking@fbc-1p-primscrn-indfb-pgflip-blt.html

  * igt@kms_frontbuffer_tracking@fbc-rgb101010-draw-mmap-gtt:
    - shard-snb:          [INCOMPLETE][104] -> [PASS][105]
   [104]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-snb2/igt@kms_frontbuffer_tracking@fbc-rgb101010-draw-mmap-gtt.html
   [105]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-snb4/igt@kms_frontbuffer_tracking@fbc-rgb101010-draw-mmap-gtt.html

  * igt@kms_frontbuffer_tracking@psr-rgb101010-draw-blt:
    - shard-tglb:         [DMESG-WARN][106] ([i915#1982]) -> [PASS][107]
   [106]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-tglb8/igt@kms_frontbuffer_tracking@psr-rgb101010-draw-blt.html
   [107]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-tglb8/igt@kms_frontbuffer_tracking@psr-rgb101010-draw-blt.html

  * igt@kms_plane_alpha_blend@pipe-b-constant-alpha-min:
    - shard-skl:          [FAIL][108] ([fdo#108145] / [i915#265]) -> [PASS][109]
   [108]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-skl1/igt@kms_plane_alpha_blend@pipe-b-constant-alpha-min.html
   [109]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-skl3/igt@kms_plane_alpha_blend@pipe-b-constant-alpha-min.html

  * igt@kms_plane_cursor@pipe-a-overlay-size-64:
    - shard-hsw:          [DMESG-WARN][110] ([i915#1982]) -> [PASS][111]
   [110]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-hsw1/igt@kms_plane_cursor@pipe-a-overlay-size-64.html
   [111]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-hsw4/igt@kms_plane_cursor@pipe-a-overlay-size-64.html

  * igt@kms_psr@psr2_sprite_plane_move:
    - shard-iclb:         [SKIP][112] ([fdo#109441]) -> [PASS][113] +2 similar issues
   [112]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-iclb6/igt@kms_psr@psr2_sprite_plane_move.html
   [113]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-iclb2/igt@kms_psr@psr2_sprite_plane_move.html

  * igt@perf@polling-parameterized:
    - shard-tglb:         [FAIL][114] ([i915#1542]) -> [PASS][115]
   [114]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-tglb8/igt@perf@polling-parameterized.html
   [115]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-tglb6/igt@perf@polling-parameterized.html

  * igt@prime_vgem@basic-fence-flip:
    - shard-glk:          [DMESG-WARN][116] ([i915#1982]) -> [PASS][117]
   [116]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-glk3/igt@prime_vgem@basic-fence-flip.html
   [117]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-glk5/igt@prime_vgem@basic-fence-flip.html

  * igt@sw_sync@timeline_closed:
    - shard-apl:          [DMESG-WARN][118] ([i915#1635] / [i915#1982]) -> [PASS][119]
   [118]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-apl6/igt@sw_sync@timeline_closed.html
   [119]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-apl6/igt@sw_sync@timeline_closed.html

  
#### Warnings ####

  * igt@gem_softpin@noreloc-s3:
    - shard-tglb:         [INCOMPLETE][120] ([i915#1373] / [i915#1436] / [i915#456]) -> [DMESG-WARN][121] ([i915#1436])
   [120]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-tglb1/igt@gem_softpin@noreloc-s3.html
   [121]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-tglb2/igt@gem_softpin@noreloc-s3.html

  * igt@kms_frontbuffer_tracking@fbc-1p-offscren-pri-indfb-draw-mmap-cpu:
    - shard-snb:          [INCOMPLETE][122] -> [FAIL][123] ([i915#2546])
   [122]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-snb7/igt@kms_frontbuffer_tracking@fbc-1p-offscren-pri-indfb-draw-mmap-cpu.html
   [123]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-snb5/igt@kms_frontbuffer_tracking@fbc-1p-offscren-pri-indfb-draw-mmap-cpu.html

  * igt@kms_plane_alpha_blend@pipe-a-alpha-7efc:
    - shard-skl:          [DMESG-FAIL][124] ([fdo#108145] / [i915#1982]) -> [FAIL][125] ([fdo#108145] / [i915#265]) +1 similar issue
   [124]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-skl4/igt@kms_plane_alpha_blend@pipe-a-alpha-7efc.html
   [125]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-skl1/igt@kms_plane_alpha_blend@pipe-a-alpha-7efc.html

  * igt@runner@aborted:
    - shard-hsw:          [FAIL][126] ([i915#2295] / [i915#2439]) -> ([FAIL][127], [FAIL][128]) ([i915#2292] / [i915#2295] / [i915#2439] / [i915#2505])
   [126]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-hsw2/igt@runner@aborted.html
   [127]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-hsw7/igt@runner@aborted.html
   [128]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-hsw2/igt@runner@aborted.html
    - shard-kbl:          [FAIL][129] ([i915#1611] / [i915#2295] / [i915#2439] / [i915#483]) -> ([FAIL][130], [FAIL][131]) ([i915#1611] / [i915#1784] / [i915#2295] / [i915#2426] / [i915#2439] / [i915#483])
   [129]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-kbl1/igt@runner@aborted.html
   [130]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-kbl4/igt@runner@aborted.html
   [131]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-kbl4/igt@runner@aborted.html
    - shard-iclb:         ([FAIL][132], [FAIL][133]) ([i915#2295] / [i915#2439] / [i915#483]) -> ([FAIL][134], [FAIL][135], [FAIL][136]) ([i915#2295] / [i915#2426] / [i915#2439])
   [132]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-iclb2/igt@runner@aborted.html
   [133]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-iclb1/igt@runner@aborted.html
   [134]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-iclb3/igt@runner@aborted.html
   [135]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-iclb3/igt@runner@aborted.html
   [136]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-iclb4/igt@runner@aborted.html
    - shard-apl:          ([FAIL][137], [FAIL][138]) ([i915#1611] / [i915#1635] / [i915#2295] / [i915#2439]) -> ([FAIL][139], [FAIL][140], [FAIL][141]) ([i915#1610] / [i915#1611] / [i915#1635] / [i915#2292] / [i915#2295] / [i915#2426] / [i915#2439])
   [137]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-apl3/igt@runner@aborted.html
   [138]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-apl8/igt@runner@aborted.html
   [139]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-apl1/igt@runner@aborted.html
   [140]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-apl8/igt@runner@aborted.html
   [141]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-apl3/igt@runner@aborted.html
    - shard-glk:          ([FAIL][142], [FAIL][143], [FAIL][144]) ([i915#1611] / [i915#1814] / [i915#2295] / [i915#2439] / [i915#483] / [i915#86] / [k.org#202321]) -> ([FAIL][145], [FAIL][146], [FAIL][147]) ([i915#1611] / [i915#2292] / [i915#2295] / [i915#2426] / [i915#2439] / [i915#483] / [i915#86] / [k.org#202321])
   [142]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-glk7/igt@runner@aborted.html
   [143]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-glk4/igt@runner@aborted.html
   [144]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-glk8/igt@runner@aborted.html
   [145]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-glk4/igt@runner@aborted.html
   [146]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-glk3/igt@runner@aborted.html
   [147]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-glk7/igt@runner@aborted.html
    - shard-tglb:         ([FAIL][148], [FAIL][149]) ([i915#2295] / [i915#2439]) -> ([FAIL][150], [FAIL][151], [FAIL][152]) ([i915#2295] / [i915#2426] / [i915#2439])
   [148]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-tglb5/igt@runner@aborted.html
   [149]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-tglb1/igt@runner@aborted.html
   [150]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-tglb6/igt@runner@aborted.html
   [151]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-tglb2/igt@runner@aborted.html
   [152]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-tglb6/igt@runner@aborted.html
    - shard-skl:          [FAIL][153] ([i915#1611] / [i915#2295] / [i915#2439]) -> ([FAIL][154], [FAIL][155]) ([i915#1611] / [i915#2292] / [i915#2295] / [i915#2426] / [i915#2439])
   [153]: https://intel-gfx-ci.01.org/tree/drm-tip/CI_DRM_9348/shard-skl10/igt@runner@aborted.html
   [154]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-skl8/igt@runner@aborted.html
   [155]: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/shard-skl9/igt@runner@aborted.html

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#108145]: https://bugs.freedesktop.org/show_bug.cgi?id=108145
  [fdo#109441]: https://bugs.freedesktop.org/show_bug.cgi?id=109441
  [fdo#109642]: https://bugs.freedesktop.org/show_bug.cgi?id=109642
  [fdo#111068]: https://bugs.freedesktop.org/show_bug.cgi?id=111068
  [i915#118]: https://gitlab.freedesktop.org/drm/intel/issues/118
  [i915#1188]: https://gitlab.freedesktop.org/drm/intel/issues/1188
  [i915#1373]: https://gitlab.freedesktop.org/drm/intel/issues/1373
  [i915#1436]: https://gitlab.freedesktop.org/drm/intel/issues/1436
  [i915#1519]: https://gitlab.freedesktop.org/drm/intel/issues/1519
  [i915#1542]: https://gitlab.freedesktop.org/drm/intel/issues/1542
  [i915#1610]: https://gitlab.freedesktop.org/drm/intel/issues/1610
  [i915#1611]: https://gitlab.freedesktop.org/drm/intel/issues/1611
  [i915#1635]: https://gitlab.freedesktop.org/drm/intel/issues/1635
  [i915#1731]: https://gitlab.freedesktop.org/drm/intel/issues/1731
  [i915#1784]: https://gitlab.freedesktop.org/drm/intel/issues/1784
  [i915#180]: https://gitlab.freedesktop.org/drm/intel/issues/180
  [i915#1814]: https://gitlab.freedesktop.org/drm/intel/issues/1814
  [i915#1888]: https://gitlab.freedesktop.org/drm/intel/issues/1888
  [i915#198]: https://gitlab.freedesktop.org/drm/intel/issues/198
  [i915#1982]: https://gitlab.freedesktop.org/drm/intel/issues/1982
  [i915#2122]: https://gitlab.freedesktop.org/drm/intel/issues/2122
  [i915#2190]: https://gitlab.freedesktop.org/drm/intel/issues/2190
  [i915#2292]: https://gitlab.freedesktop.org/drm/intel/issues/2292
  [i915#2295]: https://gitlab.freedesktop.org/drm/intel/issues/2295
  [i915#2346]: https://gitlab.freedesktop.org/drm/intel/issues/2346
  [i915#2370]: https://gitlab.freedesktop.org/drm/intel/issues/2370
  [i915#2389]: https://gitlab.freedesktop.org/drm/intel/issues/2389
  [i915#2426]: https://gitlab.freedesktop.org/tree/drm/intel/issues/2426
  [i915#2439]: https://gitlab.freedesktop.org/drm/intel/issues/2439
  [i915#483]: https://gitlab.freedesktop.org/drm/intel/issues/483
  [i915#86]: https://gitlab.freedesktop.org/drm/intel/issues/86
  [k.org#202321]: https://bugzilla.kernel.org/show_bug.cgi?id=202321

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_18920/index.html

[-- Attachment #1.2: Type: text/html, Size: 36499 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH 14/28] drm/i915/gt: Free stale request on destroying the virtual engine
  2020-11-17 11:30 ` [Intel-gfx] [PATCH 14/28] drm/i915/gt: Free stale request on destroying the virtual engine Chris Wilson
@ 2020-11-18 11:05   ` Tvrtko Ursulin
  2020-11-18 11:24     ` Chris Wilson
  0 siblings, 1 reply; 52+ messages in thread
From: Tvrtko Ursulin @ 2020-11-18 11:05 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 17/11/2020 11:30, Chris Wilson wrote:
> Since preempt-to-busy, we may unsubmit a request while it is still on
> the HW and completes asynchronously. That means it may be retired and in
> the process destroy the virtual engine (as the user has closed their
> context), but that engine may still be holding onto the unsubmitted
> completed request. Therefore we need to potentially clean up the old
> request on destroying the virtual engine. We also have to keep the
> virtual_engine alive until after the siblings' execlists_dequeue() have
> finished peeking into the virtual engines, for which we serialise with
> RCU.
> 
> v2: Be paranoid and flush the tasklet as well.
> v3: And flush the tasklet before the engines, as the tasklet may
> re-attach an rb_node after our removal from the siblings.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_lrc.c | 61 +++++++++++++++++++++++++----
>   1 file changed, 54 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> index 17cb7060eb29..c11433884cf6 100644
> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> @@ -182,6 +182,7 @@
>   struct virtual_engine {
>   	struct intel_engine_cs base;
>   	struct intel_context context;
> +	struct rcu_work rcu;
>   
>   	/*
>   	 * We allow only a single request through the virtual engine at a time
> @@ -5470,44 +5471,90 @@ static struct list_head *virtual_queue(struct virtual_engine *ve)
>   	return &ve->base.execlists.default_priolist.requests[0];
>   }
>   
> -static void virtual_context_destroy(struct kref *kref)
> +static void rcu_virtual_context_destroy(struct work_struct *wrk)
>   {
>   	struct virtual_engine *ve =
> -		container_of(kref, typeof(*ve), context.ref);
> +		container_of(wrk, typeof(*ve), rcu.work);
>   	unsigned int n;
>   
> -	GEM_BUG_ON(!list_empty(virtual_queue(ve)));
> -	GEM_BUG_ON(ve->request);
>   	GEM_BUG_ON(ve->context.inflight);
>   
> +	/* Preempt-to-busy may leave a stale request behind. */
> +	if (unlikely(ve->request)) {
> +		struct i915_request *old;
> +
> +		spin_lock_irq(&ve->base.active.lock);
> +
> +		old = fetch_and_zero(&ve->request);
> +		if (old) {
> +			GEM_BUG_ON(!i915_request_completed(old));
> +			__i915_request_submit(old);
> +			i915_request_put(old);
> +		}
> +
> +		spin_unlock_irq(&ve->base.active.lock);
> +	}
> +
> +	/*
> +	 * Flush the tasklet in case it is still running on another core.
> +	 *
> +	 * This needs to be done before we remove ourselves from the siblings'
> +	 * rbtrees as in the case it is running in parallel, it may reinsert
> +	 * the rb_node into a sibling.
> +	 */
> +	tasklet_kill(&ve->base.execlists.tasklet);

Can it still be running after an RCU period?

> +
> +	/* Decouple ourselves from the siblings, no more access allowed. */
>   	for (n = 0; n < ve->num_siblings; n++) {
>   		struct intel_engine_cs *sibling = ve->siblings[n];
>   		struct rb_node *node = &ve->nodes[sibling->id].rb;
> -		unsigned long flags;
>   
>   		if (RB_EMPTY_NODE(node))
>   			continue;
>   
> -		spin_lock_irqsave(&sibling->active.lock, flags);
> +		spin_lock_irq(&sibling->active.lock);
>   
>   		/* Detachment is lazily performed in the execlists tasklet */
>   		if (!RB_EMPTY_NODE(node))
>   			rb_erase_cached(node, &sibling->execlists.virtual);
>   
> -		spin_unlock_irqrestore(&sibling->active.lock, flags);
> +		spin_unlock_irq(&sibling->active.lock);
>   	}
>   	GEM_BUG_ON(__tasklet_is_scheduled(&ve->base.execlists.tasklet));
> +	GEM_BUG_ON(!list_empty(virtual_queue(ve)));
>   
>   	if (ve->context.state)
>   		__execlists_context_fini(&ve->context);
>   	intel_context_fini(&ve->context);
>   
>   	intel_engine_free_request_pool(&ve->base);
> +	intel_breadcrumbs_free(ve->base.breadcrumbs);

This looks to belong to some other patch.

Regards,

Tvrtko

>   
>   	kfree(ve->bonds);
>   	kfree(ve);
>   }
>   
> +static void virtual_context_destroy(struct kref *kref)
> +{
> +	struct virtual_engine *ve =
> +		container_of(kref, typeof(*ve), context.ref);
> +
> +	GEM_BUG_ON(!list_empty(&ve->context.signals));
> +
> +	/*
> +	 * When destroying the virtual engine, we have to be aware that
> +	 * it may still be in use from a hardirq/softirq context causing
> +	 * the resubmission of a completed request (background completion
> +	 * due to preempt-to-busy). Before we can free the engine, we need
> +	 * to flush the submission code and tasklets that are still potentially
> +	 * accessing the engine. Flushing the tasklets requires process context,
> +	 * and since we can guard the resubmit onto the engine with an RCU read
> +	 * lock, we can delegate the free of the engine to an RCU worker.
> +	 */
> +	INIT_RCU_WORK(&ve->rcu, rcu_virtual_context_destroy);
> +	queue_rcu_work(system_wq, &ve->rcu);
> +}
> +
>   static void virtual_engine_initial_hint(struct virtual_engine *ve)
>   {
>   	int swp;
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH 14/28] drm/i915/gt: Free stale request on destroying the virtual engine
  2020-11-18 11:05   ` Tvrtko Ursulin
@ 2020-11-18 11:24     ` Chris Wilson
  2020-11-18 11:38       ` Tvrtko Ursulin
  0 siblings, 1 reply; 52+ messages in thread
From: Chris Wilson @ 2020-11-18 11:24 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2020-11-18 11:05:24)
> 
> On 17/11/2020 11:30, Chris Wilson wrote:
> > Since preempt-to-busy, we may unsubmit a request while it is still on
> > the HW and completes asynchronously. That means it may be retired and in
> > the process destroy the virtual engine (as the user has closed their
> > context), but that engine may still be holding onto the unsubmitted
> > completed request. Therefore we need to potentially clean up the old
> > request on destroying the virtual engine. We also have to keep the
> > virtual_engine alive until after the siblings' execlists_dequeue() have
> > finished peeking into the virtual engines, for which we serialise with
> > RCU.
> > 
> > v2: Be paranoid and flush the tasklet as well.
> > v3: And flush the tasklet before the engines, as the tasklet may
> > re-attach an rb_node after our removal from the siblings.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > ---
> >   drivers/gpu/drm/i915/gt/intel_lrc.c | 61 +++++++++++++++++++++++++----
> >   1 file changed, 54 insertions(+), 7 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > index 17cb7060eb29..c11433884cf6 100644
> > --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> > @@ -182,6 +182,7 @@
> >   struct virtual_engine {
> >       struct intel_engine_cs base;
> >       struct intel_context context;
> > +     struct rcu_work rcu;
> >   
> >       /*
> >        * We allow only a single request through the virtual engine at a time
> > @@ -5470,44 +5471,90 @@ static struct list_head *virtual_queue(struct virtual_engine *ve)
> >       return &ve->base.execlists.default_priolist.requests[0];
> >   }
> >   
> > -static void virtual_context_destroy(struct kref *kref)
> > +static void rcu_virtual_context_destroy(struct work_struct *wrk)
> >   {
> >       struct virtual_engine *ve =
> > -             container_of(kref, typeof(*ve), context.ref);
> > +             container_of(wrk, typeof(*ve), rcu.work);
> >       unsigned int n;
> >   
> > -     GEM_BUG_ON(!list_empty(virtual_queue(ve)));
> > -     GEM_BUG_ON(ve->request);
> >       GEM_BUG_ON(ve->context.inflight);
> >   
> > +     /* Preempt-to-busy may leave a stale request behind. */
> > +     if (unlikely(ve->request)) {
> > +             struct i915_request *old;
> > +
> > +             spin_lock_irq(&ve->base.active.lock);
> > +
> > +             old = fetch_and_zero(&ve->request);
> > +             if (old) {
> > +                     GEM_BUG_ON(!i915_request_completed(old));
> > +                     __i915_request_submit(old);
> > +                     i915_request_put(old);
> > +             }
> > +
> > +             spin_unlock_irq(&ve->base.active.lock);
> > +     }
> > +
> > +     /*
> > +      * Flush the tasklet in case it is still running on another core.
> > +      *
> > +      * This needs to be done before we remove ourselves from the siblings'
> > +      * rbtrees as in the case it is running in parallel, it may reinsert
> > +      * the rb_node into a sibling.
> > +      */
> > +     tasklet_kill(&ve->base.execlists.tasklet);
> 
> Can it still be running after an RCU period?

I think there is a window between checking to see if the request is
completed and kicking the tasklet that is not under the rcu lock,
leaving an opportunity for the request to be retired and the barrier
flushed to drop the context references.

I observed the leaked ve->request, but the tasklet_kill, iirc, is
speculation about possible windows. Admittedly all long test runs have
been with this patch in place for most of the last year.

> > +     /* Decouple ourselves from the siblings, no more access allowed. */
> >       for (n = 0; n < ve->num_siblings; n++) {
> >               struct intel_engine_cs *sibling = ve->siblings[n];
> >               struct rb_node *node = &ve->nodes[sibling->id].rb;
> > -             unsigned long flags;
> >   
> >               if (RB_EMPTY_NODE(node))
> >                       continue;
> >   
> > -             spin_lock_irqsave(&sibling->active.lock, flags);
> > +             spin_lock_irq(&sibling->active.lock);
> >   
> >               /* Detachment is lazily performed in the execlists tasklet */
> >               if (!RB_EMPTY_NODE(node))
> >                       rb_erase_cached(node, &sibling->execlists.virtual);
> >   
> > -             spin_unlock_irqrestore(&sibling->active.lock, flags);
> > +             spin_unlock_irq(&sibling->active.lock);
> >       }
> >       GEM_BUG_ON(__tasklet_is_scheduled(&ve->base.execlists.tasklet));
> > +     GEM_BUG_ON(!list_empty(virtual_queue(ve)));
> >   
> >       if (ve->context.state)
> >               __execlists_context_fini(&ve->context);
> >       intel_context_fini(&ve->context);
> >   
> >       intel_engine_free_request_pool(&ve->base);
> > +     intel_breadcrumbs_free(ve->base.breadcrumbs);
> 
> This looks to belong to some other patch.

Some might say I was fixing up an earlier oversight.
-Chris
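
As a sketch for readers following the thread, the teardown pattern
under discussion reduces to roughly the self-contained stub below. The
names are illustrative, not the driver's; the point is that
queue_rcu_work() defers the destructor by an RCU grace period and runs
it in process context, where sleeping cleanup such as tasklet_kill()
is legal.

#include <linux/interrupt.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>
#include <linux/workqueue.h>

struct engine_stub {
        struct tasklet_struct tasklet;
        struct rcu_work rcu;
};

static void engine_stub_destroy(struct work_struct *wrk)
{
        struct engine_stub *e = container_of(wrk, typeof(*e), rcu.work);

        /* Process context: safe to wait for a still-running tasklet. */
        tasklet_kill(&e->tasklet);
        kfree(e);
}

static void engine_stub_release(struct engine_stub *e)
{
        /*
         * Readers may peek at the engine under rcu_read_lock() without
         * holding a reference, so the free must wait out a grace
         * period before the worker runs.
         */
        INIT_RCU_WORK(&e->rcu, engine_stub_destroy);
        queue_rcu_work(system_wq, &e->rcu);
}
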
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH 16/28] drm/i915/gt: Split the breadcrumb spinlock between global and contexts
  2020-11-17 11:30 ` [Intel-gfx] [PATCH 16/28] drm/i915/gt: Split the breadcrumb spinlock between global and contexts Chris Wilson
@ 2020-11-18 11:35   ` Tvrtko Ursulin
  0 siblings, 0 replies; 52+ messages in thread
From: Tvrtko Ursulin @ 2020-11-18 11:35 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 17/11/2020 11:30, Chris Wilson wrote:
> As we funnel more and more contexts into the breadcrumbs on an engine,
> the hold time of b->irq_lock grows. As we may then contend with the
> b->irq_lock during request submission, this increases the burden upon
> the engine->active.lock and so directly impacts both our execution
> latency and client latency. If we split the b->irq_lock by introducing a
> per-context spinlock to manage the signalers within a context, we then
> only need the b->irq_lock for enabling/disabling the interrupt and can
> avoid taking the lock for walking the list of contexts within the signal
> worker. Even with the current setup, this greatly reduces the number of
> times we have to take and fight for b->irq_lock.
> 
> Furthermore, this closes the race of enabling signaling on a context
> while it is in the process of being signaled and removed:
> 
> <4>[  416.208555] list_add corruption. prev->next should be next (ffff8881951d5910), but was dead000000000100. (prev=ffff8882781bb870).
> <4>[  416.208573] WARNING: CPU: 7 PID: 0 at lib/list_debug.c:28 __list_add_valid+0x4d/0x70
> <4>[  416.208575] Modules linked in: i915(+) vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio mei_hdcp x86_pkg_temp_thermal coretemp ax88179_178a usbnet mii crct10dif_pclmul snd_intel_dspcfg crc32_pclmul snd_hda_codec snd_hwdep ghash_clmulni_intel snd_hda_core e1000e snd_pcm ptp pps_core mei_me mei prime_numbers intel_lpss_pci [last unloaded: i915]
> <4>[  416.208611] CPU: 7 PID: 0 Comm: swapper/7 Tainted: G     U            5.8.0-CI-CI_DRM_8852+ #1
> <4>[  416.208614] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake Y LPDDR4x T4 RVP TLC, BIOS ICLSFWR1.R00.3212.A00.1905212112 05/21/2019
> <4>[  416.208627] RIP: 0010:__list_add_valid+0x4d/0x70
> <4>[  416.208631] Code: c3 48 89 d1 48 c7 c7 60 18 33 82 48 89 c2 e8 ea e0 b6 ff 0f 0b 31 c0 c3 48 89 c1 4c 89 c6 48 c7 c7 b0 18 33 82 e8 d3 e0 b6 ff <0f> 0b 31 c0 c3 48 89 f2 4c 89 c1 48 89 fe 48 c7 c7 00 19 33 82 e8
> <4>[  416.208633] RSP: 0018:ffffc90000280e18 EFLAGS: 00010086
> <4>[  416.208636] RAX: 0000000000000000 RBX: ffff888250a44880 RCX: 0000000000000105
> <4>[  416.208639] RDX: 0000000000000105 RSI: ffffffff82320c5b RDI: 00000000ffffffff
> <4>[  416.208641] RBP: ffff8882781bb870 R08: 0000000000000000 R09: 0000000000000001
> <4>[  416.208643] R10: 00000000054d2957 R11: 000000006abbd991 R12: ffff8881951d58c8
> <4>[  416.208646] R13: ffff888286073880 R14: ffff888286073848 R15: ffff8881951d5910
> <4>[  416.208669] FS:  0000000000000000(0000) GS:ffff88829c180000(0000) knlGS:0000000000000000
> <4>[  416.208671] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4>[  416.208673] CR2: 0000556231326c48 CR3: 0000000005610001 CR4: 0000000000760ee0
> <4>[  416.208675] PKRU: 55555554
> <4>[  416.208677] Call Trace:
> <4>[  416.208679]  <IRQ>
> <4>[  416.208751]  i915_request_enable_breadcrumb+0x278/0x400 [i915]
> <4>[  416.208839]  __i915_request_submit+0xca/0x2a0 [i915]
> <4>[  416.208892]  __execlists_submission_tasklet+0x480/0x1830 [i915]
> <4>[  416.208942]  execlists_submission_tasklet+0xc4/0x130 [i915]
> <4>[  416.208947]  tasklet_action_common.isra.17+0x6c/0x1c0
> <4>[  416.208954]  __do_softirq+0xdf/0x498
> <4>[  416.208960]  ? handle_fasteoi_irq+0x150/0x150
> <4>[  416.208964]  asm_call_on_stack+0xf/0x20
> <4>[  416.208966]  </IRQ>
> <4>[  416.208969]  do_softirq_own_stack+0xa1/0xc0
> <4>[  416.208972]  irq_exit_rcu+0xb5/0xc0
> <4>[  416.208976]  common_interrupt+0xf7/0x260
> <4>[  416.208980]  asm_common_interrupt+0x1e/0x40
> <4>[  416.208985] RIP: 0010:cpuidle_enter_state+0xb6/0x410
> <4>[  416.208987] Code: 00 31 ff e8 9c 3e 89 ff 80 7c 24 0b 00 74 12 9c 58 f6 c4 02 0f 85 31 03 00 00 31 ff e8 e3 6c 90 ff e8 fe a4 94 ff fb 45 85 ed <0f> 88 c7 02 00 00 49 63 c5 4c 2b 24 24 48 8d 14 40 48 8d 14 90 48
> <4>[  416.208989] RSP: 0018:ffffc90000143e70 EFLAGS: 00000206
> <4>[  416.208991] RAX: 0000000000000007 RBX: ffffe8ffffda8070 RCX: 0000000000000000
> <4>[  416.208993] RDX: 0000000000000000 RSI: ffffffff8238b4ee RDI: ffffffff8233184f
> <4>[  416.208995] RBP: ffffffff826b4e00 R08: 0000000000000000 R09: 0000000000000000
> <4>[  416.208997] R10: 0000000000000001 R11: 0000000000000000 R12: 00000060e7f24a8f
> <4>[  416.208998] R13: 0000000000000003 R14: 0000000000000003 R15: 0000000000000003
> <4>[  416.209012]  cpuidle_enter+0x24/0x40
> <4>[  416.209016]  do_idle+0x22f/0x2d0
> <4>[  416.209022]  cpu_startup_entry+0x14/0x20
> <4>[  416.209025]  start_secondary+0x158/0x1a0
> <4>[  416.209030]  secondary_startup_64+0xa4/0xb0
> <4>[  416.209039] irq event stamp: 10186977
> <4>[  416.209042] hardirqs last  enabled at (10186976): [<ffffffff810b9363>] tasklet_action_common.isra.17+0xe3/0x1c0
> <4>[  416.209044] hardirqs last disabled at (10186977): [<ffffffff81a5e5ed>] _raw_spin_lock_irqsave+0xd/0x50
> <4>[  416.209047] softirqs last  enabled at (10186968): [<ffffffff810b9a1a>] irq_enter_rcu+0x6a/0x70
> <4>[  416.209049] softirqs last disabled at (10186969): [<ffffffff81c00f4f>] asm_call_on_stack+0xf/0x20
> 
> <4>[  416.209317] list_del corruption, ffff8882781bb870->next is LIST_POISON1 (dead000000000100)
> <4>[  416.209317] WARNING: CPU: 7 PID: 46 at lib/list_debug.c:47 __list_del_entry_valid+0x4e/0x90
> <4>[  416.209317] Modules linked in: i915(+) vgem snd_hda_codec_hdmi snd_hda_codec_realtek snd_hda_codec_generic ledtrig_audio mei_hdcp x86_pkg_temp_thermal coretemp ax88179_178a usbnet mii crct10dif_pclmul snd_intel_dspcfg crc32_pclmul snd_hda_codec snd_hwdep ghash_clmulni_intel snd_hda_core e1000e snd_pcm ptp pps_core mei_me mei prime_numbers intel_lpss_pci [last unloaded: i915]
> <4>[  416.209317] CPU: 7 PID: 46 Comm: ksoftirqd/7 Tainted: G     U  W         5.8.0-CI-CI_DRM_8852+ #1
> <4>[  416.209317] Hardware name: Intel Corporation Ice Lake Client Platform/IceLake Y LPDDR4x T4 RVP TLC, BIOS ICLSFWR1.R00.3212.A00.1905212112 05/21/2019
> <4>[  416.209317] RIP: 0010:__list_del_entry_valid+0x4e/0x90
> <4>[  416.209317] Code: 2e 48 8b 32 48 39 fe 75 3a 48 8b 50 08 48 39 f2 75 48 b8 01 00 00 00 c3 48 89 fe 48 89 c2 48 c7 c7 38 19 33 82 e8 62 e0 b6 ff <0f> 0b 31 c0 c3 48 89 fe 48 c7 c7 70 19 33 82 e8 4e e0 b6 ff 0f 0b
> <4>[  416.209317] RSP: 0018:ffffc90000280de8 EFLAGS: 00010086
> <4>[  416.209317] RAX: 0000000000000000 RBX: ffff8882781bb848 RCX: 0000000000010104
> <4>[  416.209317] RDX: 0000000000010104 RSI: ffffffff8238b4ee RDI: 00000000ffffffff
> <4>[  416.209317] RBP: ffff8882781bb880 R08: 0000000000000000 R09: 0000000000000001
> <4>[  416.209317] R10: 000000009fb6666e R11: 00000000feca9427 R12: ffffc90000280e18
> <4>[  416.209317] R13: ffff8881951d5930 R14: dead0000000000d8 R15: ffff8882781bb880
> <4>[  416.209317] FS:  0000000000000000(0000) GS:ffff88829c180000(0000) knlGS:0000000000000000
> <4>[  416.209317] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> <4>[  416.209317] CR2: 0000556231326c48 CR3: 0000000005610001 CR4: 0000000000760ee0
> <4>[  416.209317] PKRU: 55555554
> <4>[  416.209317] Call Trace:
> <4>[  416.209317]  <IRQ>
> <4>[  416.209317]  remove_signaling_context.isra.13+0xd/0x70 [i915]
> <4>[  416.209513]  signal_irq_work+0x1f7/0x4b0 [i915]
> 
> This is caused by virtual engines: although we take the breadcrumb
> lock on each of the active engines, they may be different engines on
> different requests. It turns out that the b->irq_lock was not a
> sufficient proxy for the engine->active.lock in the case of more than
> one request, so introduce an explicit lock around ce->signals.
> 
> v2: ce->signal_lock is acquired with only RCU protection and so must be
> treated carefully and not cleared during reallocation. We also then need
> to confirm that the ce we lock is the same as we found in the breadcrumb
> list.
> 
> Closes: https://gitlab.freedesktop.org/drm/intel/-/issues/2276
> Fixes: f94343d0a622 ("drm/i915: Remove requirement for holding i915_request.lock for breadcrumbs")
> Fixes: bda4d4db6dd6 ("drm/i915/gt: Replace intel_engine_transfer_stale_breadcrumbs")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/gt/intel_breadcrumbs.c   | 168 ++++++++----------
>   .../gpu/drm/i915/gt/intel_breadcrumbs_types.h |   6 +-
>   drivers/gpu/drm/i915/gt/intel_context.c       |   3 +-
>   drivers/gpu/drm/i915/gt/intel_context_types.h |  12 +-
>   drivers/gpu/drm/i915/i915_request.h           |   6 +-
>   5 files changed, 90 insertions(+), 105 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> index cf6e05ea4d8f..a24cc1ff08a0 100644
> --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs.c
> @@ -101,18 +101,37 @@ static void __intel_breadcrumbs_disarm_irq(struct intel_breadcrumbs *b)
>   	intel_gt_pm_put_async(b->irq_engine->gt);
>   }
>   
> +static void intel_breadcrumbs_disarm_irq(struct intel_breadcrumbs *b)
> +{
> +	spin_lock(&b->irq_lock);
> +	if (b->irq_armed)
> +		__intel_breadcrumbs_disarm_irq(b);
> +	spin_unlock(&b->irq_lock);
> +}
> +
>   static void add_signaling_context(struct intel_breadcrumbs *b,
>   				  struct intel_context *ce)
>   {
> -	intel_context_get(ce);
> -	list_add_tail(&ce->signal_link, &b->signalers);
> +	lockdep_assert_held(&ce->signal_lock);
> +
> +	spin_lock(&b->signalers_lock);
> +	list_add_rcu(&ce->signal_link, &b->signalers);
> +	spin_unlock(&b->signalers_lock);
>   }
>   
> -static void remove_signaling_context(struct intel_breadcrumbs *b,
> +static bool remove_signaling_context(struct intel_breadcrumbs *b,
>   				     struct intel_context *ce)
>   {
> -	list_del(&ce->signal_link);
> -	intel_context_put(ce);
> +	lockdep_assert_held(&ce->signal_lock);
> +
> +	if (!list_empty(&ce->signals))
> +		return false;
> +
> +	spin_lock(&b->signalers_lock);
> +	list_del_rcu(&ce->signal_link);
> +	spin_unlock(&b->signalers_lock);
> +
> +	return true;
>   }
>   
>   static inline bool __request_completed(const struct i915_request *rq)
> @@ -175,6 +194,8 @@ static void add_retire(struct intel_breadcrumbs *b, struct intel_timeline *tl)
>   
>   static bool __signal_request(struct i915_request *rq)
>   {
> +	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags));
> +
>   	if (!__dma_fence_signal(&rq->fence)) {
>   		i915_request_put(rq);
>   		return false;
> @@ -195,15 +216,12 @@ static void signal_irq_work(struct irq_work *work)
>   	struct intel_breadcrumbs *b = container_of(work, typeof(*b), irq_work);
>   	const ktime_t timestamp = ktime_get();
>   	struct llist_node *signal, *sn;
> -	struct intel_context *ce, *cn;
> -	struct list_head *pos, *next;
> +	struct intel_context *ce;
>   
>   	signal = NULL;
>   	if (unlikely(!llist_empty(&b->signaled_requests)))
>   		signal = llist_del_all(&b->signaled_requests);
>   
> -	spin_lock(&b->irq_lock);
> -
>   	/*
>   	 * Keep the irq armed until the interrupt after all listeners are gone.
>   	 *
> @@ -229,47 +247,44 @@ static void signal_irq_work(struct irq_work *work)
>   	 * interrupt draw less ire from other users of the system and tools
>   	 * like powertop.
>   	 */
> -	if (!signal && b->irq_armed && list_empty(&b->signalers))
> -		__intel_breadcrumbs_disarm_irq(b);
> +	if (!signal && READ_ONCE(b->irq_armed) && list_empty(&b->signalers))
> +		intel_breadcrumbs_disarm_irq(b);
>   
> -	list_for_each_entry_safe(ce, cn, &b->signalers, signal_link) {
> -		GEM_BUG_ON(list_empty(&ce->signals));
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(ce, &b->signalers, signal_link) {
> +		struct i915_request *rq;
>   
> -		list_for_each_safe(pos, next, &ce->signals) {
> -			struct i915_request *rq =
> -				list_entry(pos, typeof(*rq), signal_link);
> +		list_for_each_entry_rcu(rq, &ce->signals, signal_link) {
> +			bool release;
>   
> -			GEM_BUG_ON(!check_signal_order(ce, rq));
>   			if (!__request_completed(rq))
>   				break;
>   
> +			if (!test_and_clear_bit(I915_FENCE_FLAG_SIGNAL,
> +						&rq->fence.flags))
> +				break;
> +
>   			/*
>   			 * Queue for execution after dropping the signaling
>   			 * spinlock as the callback chain may end up adding
>   			 * more signalers to the same context or engine.
>   			 */
> -			clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
> +			spin_lock(&ce->signal_lock);
> +			list_del_rcu(&rq->signal_link);
> +			release = remove_signaling_context(b, ce);
> +			spin_unlock(&ce->signal_lock);
> +
>   			if (__signal_request(rq))
>   				/* We own signal_node now, xfer to local list */
>   				signal = slist_add(&rq->signal_node, signal);
> -		}
>   
> -		/*
> -		 * We process the list deletion in bulk, only using a list_add
> -		 * (not list_move) above but keeping the status of
> -		 * rq->signal_link known with the I915_FENCE_FLAG_SIGNAL bit.
> -		 */
> -		if (!list_is_first(pos, &ce->signals)) {
> -			/* Advance the list to the first incomplete request */
> -			__list_del_many(&ce->signals, pos);
> -			if (&ce->signals == pos) { /* now empty */
> +			if (release) {
>   				add_retire(b, ce->timeline);
> -				remove_signaling_context(b, ce);
> +				intel_context_put(ce);
>   			}
>   		}
>   	}
> -
> -	spin_unlock(&b->irq_lock);
> +	rcu_read_unlock();
>   
>   	llist_for_each_safe(signal, sn, signal) {
>   		struct i915_request *rq =
> @@ -298,14 +313,15 @@ intel_breadcrumbs_create(struct intel_engine_cs *irq_engine)
>   	if (!b)
>   		return NULL;
>   
> -	spin_lock_init(&b->irq_lock);
> +	b->irq_engine = irq_engine;
> +
> +	spin_lock_init(&b->signalers_lock);
>   	INIT_LIST_HEAD(&b->signalers);
>   	init_llist_head(&b->signaled_requests);
>   
> +	spin_lock_init(&b->irq_lock);
>   	init_irq_work(&b->irq_work, signal_irq_work);
>   
> -	b->irq_engine = irq_engine;
> -
>   	return b;
>   }
>   
> @@ -347,9 +363,9 @@ void intel_breadcrumbs_free(struct intel_breadcrumbs *b)
>   	kfree(b);
>   }
>   
> -static void insert_breadcrumb(struct i915_request *rq,
> -			      struct intel_breadcrumbs *b)
> +static void insert_breadcrumb(struct i915_request *rq)
>   {
> +	struct intel_breadcrumbs *b = READ_ONCE(rq->engine)->breadcrumbs;
>   	struct intel_context *ce = rq->context;
>   	struct list_head *pos;
>   
> @@ -371,6 +387,7 @@ static void insert_breadcrumb(struct i915_request *rq,
>   	}
>   
>   	if (list_empty(&ce->signals)) {
> +		intel_context_get(ce);
>   		add_signaling_context(b, ce);
>   		pos = &ce->signals;
>   	} else {
> @@ -396,8 +413,9 @@ static void insert_breadcrumb(struct i915_request *rq,
>   				break;
>   		}
>   	}
> -	list_add(&rq->signal_link, pos);
> +	list_add_rcu(&rq->signal_link, pos);
>   	GEM_BUG_ON(!check_signal_order(ce, rq));
> +	GEM_BUG_ON(test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &rq->fence.flags));
>   	set_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
>   
>   	/*
> @@ -410,7 +428,7 @@ static void insert_breadcrumb(struct i915_request *rq,
>   
>   bool i915_request_enable_breadcrumb(struct i915_request *rq)
>   {
> -	struct intel_breadcrumbs *b;
> +	struct intel_context *ce = rq->context;
>   
>   	/* Serialises with i915_request_retire() using rq->lock */
>   	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &rq->fence.flags))
> @@ -425,67 +443,30 @@ bool i915_request_enable_breadcrumb(struct i915_request *rq)
>   	if (!test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags))
>   		return true;
>   
> -	/*
> -	 * rq->engine is locked by rq->engine->active.lock. That however
> -	 * is not known until after rq->engine has been dereferenced and
> -	 * the lock acquired. Hence we acquire the lock and then validate
> -	 * that rq->engine still matches the lock we hold for it.
> -	 *
> -	 * Here, we are using the breadcrumb lock as a proxy for the
> -	 * rq->engine->active.lock, and we know that since the breadcrumb
> -	 * will be serialised within i915_request_submit/i915_request_unsubmit,
> -	 * the engine cannot change while active as long as we hold the
> -	 * breadcrumb lock on that engine.
> -	 *
> -	 * From the dma_fence_enable_signaling() path, we are outside of the
> -	 * request submit/unsubmit path, and so we must be more careful to
> -	 * acquire the right lock.
> -	 */
> -	b = READ_ONCE(rq->engine)->breadcrumbs;
> -	spin_lock(&b->irq_lock);
> -	while (unlikely(b != READ_ONCE(rq->engine)->breadcrumbs)) {
> -		spin_unlock(&b->irq_lock);
> -		b = READ_ONCE(rq->engine)->breadcrumbs;
> -		spin_lock(&b->irq_lock);
> -	}
> -
> -	/*
> -	 * Now that we are finally serialised with request submit/unsubmit,
> -	 * [with b->irq_lock] and with i915_request_retire() [via checking
> -	 * SIGNALED with rq->lock] confirm the request is indeed active. If
> -	 * it is no longer active, the breadcrumb will be attached upon
> -	 * i915_request_submit().
> -	 */
> +	spin_lock(&ce->signal_lock);
>   	if (test_bit(I915_FENCE_FLAG_ACTIVE, &rq->fence.flags))
> -		insert_breadcrumb(rq, b);
> -
> -	spin_unlock(&b->irq_lock);
> +		insert_breadcrumb(rq);
> +	spin_unlock(&ce->signal_lock);
>   
>   	return true;
>   }
>   
>   void i915_request_cancel_breadcrumb(struct i915_request *rq)
>   {
> -	struct intel_breadcrumbs *b = rq->engine->breadcrumbs;
> +	struct intel_context *ce = rq->context;
> +	bool release;
>   
> -	/*
> -	 * We must wait for b->irq_lock so that we know the interrupt handler
> -	 * has released its reference to the intel_context and has completed
> -	 * the DMA_FENCE_FLAG_SIGNALED_BIT/I915_FENCE_FLAG_SIGNAL dance (if
> -	 * required).
> -	 */
> -	spin_lock(&b->irq_lock);
> -	if (test_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags)) {
> -		struct intel_context *ce = rq->context;
> +	if (!test_and_clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags))
> +		return;
>   
> -		list_del(&rq->signal_link);
> -		if (list_empty(&ce->signals))
> -			remove_signaling_context(b, ce);
> +	spin_lock(&ce->signal_lock);
> +	list_del_rcu(&rq->signal_link);
> +	release = remove_signaling_context(rq->engine->breadcrumbs, ce);
> +	spin_unlock(&ce->signal_lock);
> +	if (release)
> +		intel_context_put(ce);
>   
> -		clear_bit(I915_FENCE_FLAG_SIGNAL, &rq->fence.flags);
> -		i915_request_put(rq);
> -	}
> -	spin_unlock(&b->irq_lock);
> +	i915_request_put(rq);
>   }
>   
>   static void print_signals(struct intel_breadcrumbs *b, struct drm_printer *p)
> @@ -495,18 +476,17 @@ static void print_signals(struct intel_breadcrumbs *b, struct drm_printer *p)
>   
>   	drm_printf(p, "Signals:\n");
>   
> -	spin_lock_irq(&b->irq_lock);
> -	list_for_each_entry(ce, &b->signalers, signal_link) {
> -		list_for_each_entry(rq, &ce->signals, signal_link) {
> +	rcu_read_lock();
> +	list_for_each_entry_rcu(ce, &b->signalers, signal_link) {
> +		list_for_each_entry_rcu(rq, &ce->signals, signal_link)
>   			drm_printf(p, "\t[%llx:%llx%s] @ %dms\n",
>   				   rq->fence.context, rq->fence.seqno,
>   				   i915_request_completed(rq) ? "!" :
>   				   i915_request_started(rq) ? "*" :
>   				   "",
>   				   jiffies_to_msecs(jiffies - rq->emitted_jiffies));
> -		}
>   	}
> -	spin_unlock_irq(&b->irq_lock);
> +	rcu_read_unlock();
>   }
>   
>   void intel_engine_print_breadcrumbs(struct intel_engine_cs *engine,
> diff --git a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
> index 3fa19820b37a..a74bb3062bd8 100644
> --- a/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_breadcrumbs_types.h
> @@ -29,18 +29,16 @@
>    * the overhead of waking that client is much preferred.
>    */
>   struct intel_breadcrumbs {
> -	spinlock_t irq_lock; /* protects the lists used in hardirq context */
> -
>   	/* Not all breadcrumbs are attached to physical HW */
>   	struct intel_engine_cs *irq_engine;
>   
> +	spinlock_t signalers_lock; /* protects the list of signalers */
>   	struct list_head signalers;
>   	struct llist_head signaled_requests;
>   
> +	spinlock_t irq_lock; /* protects the interrupt from hardirq context */
>   	struct irq_work irq_work; /* for use from inside irq_lock */
> -
>   	unsigned int irq_enabled;
> -
>   	bool irq_armed;
>   };
>   
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
> index d3a835212167..349e7fa1488d 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.c
> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> @@ -379,7 +379,8 @@ intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
>   
>   	ce->vm = i915_vm_get(engine->gt->vm);
>   
> -	INIT_LIST_HEAD(&ce->signal_link);
> +	/* NB ce->signal_link/lock is used under RCU */
> +	spin_lock_init(&ce->signal_lock);
>   	INIT_LIST_HEAD(&ce->signals);
>   
>   	mutex_init(&ce->pin_mutex);
> diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> index 20cb5835d1c3..52fa9c132746 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> @@ -25,6 +25,7 @@ DECLARE_EWMA(runtime, 3, 8);
>   struct i915_gem_context;
>   struct i915_gem_ww_ctx;
>   struct i915_vma;
> +struct intel_breadcrumbs;
>   struct intel_context;
>   struct intel_ring;
>   
> @@ -63,8 +64,15 @@ struct intel_context {
>   	struct i915_address_space *vm;
>   	struct i915_gem_context __rcu *gem_context;
>   
> -	struct list_head signal_link;
> -	struct list_head signals;
> +	/*
> +	 * @signal_lock protects the list of requests that need signaling,
> +	 * @signals. While there are any requests that need signaling,
> +	 * we add the context to the breadcrumbs worker, and remove it
> +	 * upon completion/cancellation of the last request.
> +	 */
> +	struct list_head signal_link; /* Accessed under RCU */
> +	struct list_head signals; /* Guarded by signal_lock */
> +	spinlock_t signal_lock; /* protects signals, the list of requests */
>   
>   	struct i915_vma *state;
>   	struct intel_ring *ring;
> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> index 571a55d38278..26a19b84a586 100644
> --- a/drivers/gpu/drm/i915/i915_request.h
> +++ b/drivers/gpu/drm/i915/i915_request.h
> @@ -178,10 +178,8 @@ struct i915_request {
>   	struct intel_ring *ring;
>   	struct intel_timeline __rcu *timeline;
>   
> -	union {
> -		struct list_head signal_link;
> -		struct llist_node signal_node;
> -	};
> +	struct list_head signal_link;
> +	struct llist_node signal_node;
>   
>   	/*
>   	 * The rcu epoch of when this request was allocated. Used to judiciously
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
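
As a summary sketch of the locking scheme the patch lands on (the lock
names are taken from the diff above; the comment block itself is
illustrative, not part of the patch):

/*
 * b->signalers_lock - guards b->signalers, the list of contexts with
 *                     requests awaiting signaling (contexts are only
 *                     added/removed under it).
 * ce->signal_lock   - guards ce->signals, the per-context list of
 *                     requests awaiting signaling.
 * b->irq_lock       - now only serialises arming/disarming the user
 *                     interrupt (b->irq_armed).
 *
 * signal_irq_work() walks b->signalers and ce->signals under
 * rcu_read_lock(), taking ce->signal_lock just long enough to unlink
 * a completed request, so request submission no longer contends on a
 * single global b->irq_lock.
 */
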
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH 15/28] drm/i915/gt: Protect context lifetime with RCU
  2020-11-17 11:30 ` [Intel-gfx] [PATCH 15/28] drm/i915/gt: Protect context lifetime with RCU Chris Wilson
@ 2020-11-18 11:36   ` Tvrtko Ursulin
  0 siblings, 0 replies; 52+ messages in thread
From: Tvrtko Ursulin @ 2020-11-18 11:36 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 17/11/2020 11:30, Chris Wilson wrote:
> Allow a brief period for continued access to a dead intel_context by
> deferring the release of the struct until after an RCU grace period.
> As we are using a dedicated slab cache for the contexts, we can defer
> the release of the slab pages via RCU, with the caveat that individual
> structs may be reused from the freelist within an RCU grace period. To
> handle that, we have to avoid clearing members of the zombie struct.
> 
> This is required for a later patch to handle locking around virtual
> requests in the signaler, as those requests may want to move between
> engines and be destroyed while we are holding b->irq_lock on a physical
> engine.
> 
> v2: Drop mutex_reinit(); if we never mark the mutex as destroyed we
> don't need to reset the debug code, at the loss of having the mutex
> debug code spot us attempting to destroy a locked mutex.
> v3: As the intended use will remain strongly referenced counted, with
> very little inflight access across reuse, drop the ctor.
> v4: Drop the unrequired change to remove the temporary reference around
> dropping the active context, and add back some more missing ctor
> operations.
> v5: The ctor is back. Tvrtko spotted that ce->signal_lock [introduced
> later] may be accessed under RCU and so needs special care not to be
> reinitialised.
> v6: Don't mix SLAB_TYPESAFE_BY_RCU and RCU list iteration.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/gt/intel_context.c       | 12 +++++++++---
>   drivers/gpu/drm/i915/gt/intel_context_types.h | 11 ++++++++++-
>   2 files changed, 19 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/gt/intel_context.c b/drivers/gpu/drm/i915/gt/intel_context.c
> index 92a3f25c4006..d3a835212167 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context.c
> +++ b/drivers/gpu/drm/i915/gt/intel_context.c
> @@ -25,11 +25,18 @@ static struct intel_context *intel_context_alloc(void)
>   	return kmem_cache_zalloc(global.slab_ce, GFP_KERNEL);
>   }
>   
> -void intel_context_free(struct intel_context *ce)
> +static void rcu_context_free(struct rcu_head *rcu)
>   {
> +	struct intel_context *ce = container_of(rcu, typeof(*ce), rcu);
> +
>   	kmem_cache_free(global.slab_ce, ce);
>   }
>   
> +void intel_context_free(struct intel_context *ce)
> +{
> +	call_rcu(&ce->rcu, rcu_context_free);
> +}
> +
>   struct intel_context *
>   intel_context_create(struct intel_engine_cs *engine)
>   {
> @@ -356,8 +363,7 @@ static int __intel_context_active(struct i915_active *active)
>   }
>   
>   void
> -intel_context_init(struct intel_context *ce,
> -		   struct intel_engine_cs *engine)
> +intel_context_init(struct intel_context *ce, struct intel_engine_cs *engine)
>   {
>   	GEM_BUG_ON(!engine->cops);
>   	GEM_BUG_ON(!engine->gt->vm);
> diff --git a/drivers/gpu/drm/i915/gt/intel_context_types.h b/drivers/gpu/drm/i915/gt/intel_context_types.h
> index 552cb57a2e8c..20cb5835d1c3 100644
> --- a/drivers/gpu/drm/i915/gt/intel_context_types.h
> +++ b/drivers/gpu/drm/i915/gt/intel_context_types.h
> @@ -44,7 +44,16 @@ struct intel_context_ops {
>   };
>   
>   struct intel_context {
> -	struct kref ref;
> +	/*
> +	 * Note: Some fields may be accessed under RCU.
> +	 *
> +	 * Unless otherwise noted a field can safely be assumed to be protected
> +	 * by strong reference counting.
> +	 */
> +	union {
> +		struct kref ref; /* no kref_get_unless_zero()! */
> +		struct rcu_head rcu;
> +	};
>   
>   	struct intel_engine_cs *engine;
>   	struct intel_engine_cs *inflight;
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
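
For context, the release pattern the patch adopts reduces to roughly
the self-contained stub below; the struct and function names are
illustrative, not the driver's:

#include <linux/kref.h>
#include <linux/rcupdate.h>
#include <linux/slab.h>

struct obj {
        union {
                struct kref ref;     /* live while references exist */
                struct rcu_head rcu; /* reused once the kref hits zero */
        };
};

static void obj_free_rcu(struct rcu_head *rcu)
{
        kfree(container_of(rcu, struct obj, rcu));
}

static void obj_release(struct kref *ref)
{
        struct obj *o = container_of(ref, struct obj, ref);

        /*
         * Defer the free by a grace period so rcu_read_lock() readers
         * holding a stale pointer never touch freed memory. Reusing
         * the kref storage is safe only because nothing revives a zero
         * refcount (note the "no kref_get_unless_zero()!" caveat in
         * the patch).
         */
        call_rcu(&o->rcu, obj_free_rcu);
}

static void obj_put(struct obj *o)
{
        kref_put(&o->ref, obj_release);
}
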
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH 14/28] drm/i915/gt: Free stale request on destroying the virtual engine
  2020-11-18 11:24     ` Chris Wilson
@ 2020-11-18 11:38       ` Tvrtko Ursulin
  2020-11-18 12:10         ` Chris Wilson
  0 siblings, 1 reply; 52+ messages in thread
From: Tvrtko Ursulin @ 2020-11-18 11:38 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 18/11/2020 11:24, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-11-18 11:05:24)
>>
>> On 17/11/2020 11:30, Chris Wilson wrote:
>>> Since preempt-to-busy, we may unsubmit a request while it is still on
>>> the HW and completes asynchronously. That means it may be retired and in
>>> the process destroy the virtual engine (as the user has closed their
>>> context), but that engine may still be holding onto the unsubmitted
>>> completed request. Therefore we need to potentially clean up the old
>>> request on destroying the virtual engine. We also have to keep the
>>> virtual_engine alive until after the siblings' execlists_dequeue() have
>>> finished peeking into the virtual engines, for which we serialise with
>>> RCU.
>>>
>>> v2: Be paranoid and flush the tasklet as well.
>>> v3: And flush the tasklet before the engines, as the tasklet may
>>> re-attach an rb_node after our removal from the siblings.
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/gt/intel_lrc.c | 61 +++++++++++++++++++++++++----
>>>    1 file changed, 54 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>> index 17cb7060eb29..c11433884cf6 100644
>>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
>>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>> @@ -182,6 +182,7 @@
>>>    struct virtual_engine {
>>>        struct intel_engine_cs base;
>>>        struct intel_context context;
>>> +     struct rcu_work rcu;
>>>    
>>>        /*
>>>         * We allow only a single request through the virtual engine at a time
>>> @@ -5470,44 +5471,90 @@ static struct list_head *virtual_queue(struct virtual_engine *ve)
>>>        return &ve->base.execlists.default_priolist.requests[0];
>>>    }
>>>    
>>> -static void virtual_context_destroy(struct kref *kref)
>>> +static void rcu_virtual_context_destroy(struct work_struct *wrk)
>>>    {
>>>        struct virtual_engine *ve =
>>> -             container_of(kref, typeof(*ve), context.ref);
>>> +             container_of(wrk, typeof(*ve), rcu.work);
>>>        unsigned int n;
>>>    
>>> -     GEM_BUG_ON(!list_empty(virtual_queue(ve)));
>>> -     GEM_BUG_ON(ve->request);
>>>        GEM_BUG_ON(ve->context.inflight);
>>>    
>>> +     /* Preempt-to-busy may leave a stale request behind. */
>>> +     if (unlikely(ve->request)) {
>>> +             struct i915_request *old;
>>> +
>>> +             spin_lock_irq(&ve->base.active.lock);
>>> +
>>> +             old = fetch_and_zero(&ve->request);
>>> +             if (old) {
>>> +                     GEM_BUG_ON(!i915_request_completed(old));
>>> +                     __i915_request_submit(old);
>>> +                     i915_request_put(old);
>>> +             }
>>> +
>>> +             spin_unlock_irq(&ve->base.active.lock);
>>> +     }
>>> +
>>> +     /*
>>> +      * Flush the tasklet in case it is still running on another core.
>>> +      *
>>> +      * This needs to be done before we remove ourselves from the siblings'
>>> +      * rbtrees as in the case it is running in parallel, it may reinsert
>>> +      * the rb_node into a sibling.
>>> +      */
>>> +     tasklet_kill(&ve->base.execlists.tasklet);
>>
>> Can it still be running after an RCU period?
> 
> I think there is a window between checking to see if the request is
> completed and kicking the tasklet that is not under the rcu lock,
> leaving an opportunity for the request to be retired and the barrier
> flushed to drop the context references.

From where would that check come?

> I observed the leaked ve->request, but the tasklet_kill, iirc, is
> speculation about possible windows. Admittedly all long test runs have
> been with this patch in place for most of the last year.
> 
>>> +     /* Decouple ourselves from the siblings, no more access allowed. */
>>>        for (n = 0; n < ve->num_siblings; n++) {
>>>                struct intel_engine_cs *sibling = ve->siblings[n];
>>>                struct rb_node *node = &ve->nodes[sibling->id].rb;
>>> -             unsigned long flags;
>>>    
>>>                if (RB_EMPTY_NODE(node))
>>>                        continue;
>>>    
>>> -             spin_lock_irqsave(&sibling->active.lock, flags);
>>> +             spin_lock_irq(&sibling->active.lock);
>>>    
>>>                /* Detachment is lazily performed in the execlists tasklet */
>>>                if (!RB_EMPTY_NODE(node))
>>>                        rb_erase_cached(node, &sibling->execlists.virtual);
>>>    
>>> -             spin_unlock_irqrestore(&sibling->active.lock, flags);
>>> +             spin_unlock_irq(&sibling->active.lock);
>>>        }
>>>        GEM_BUG_ON(__tasklet_is_scheduled(&ve->base.execlists.tasklet));
>>> +     GEM_BUG_ON(!list_empty(virtual_queue(ve)));
>>>    
>>>        if (ve->context.state)
>>>                __execlists_context_fini(&ve->context);
>>>        intel_context_fini(&ve->context);
>>>    
>>>        intel_engine_free_request_pool(&ve->base);
>>> +     intel_breadcrumbs_free(ve->base.breadcrumbs);
>>
>> This looks to belong to some other patch.
> 
> Some might say I was fixing up an earlier oversight.

Separate patch would be good, with a Fixes: tag probably, since it is a
memory leak and a one-liner.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH 14/28] drm/i915/gt: Free stale request on destroying the virtual engine
  2020-11-18 11:38       ` Tvrtko Ursulin
@ 2020-11-18 12:10         ` Chris Wilson
  2020-11-19 14:06           ` Tvrtko Ursulin
  0 siblings, 1 reply; 52+ messages in thread
From: Chris Wilson @ 2020-11-18 12:10 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2020-11-18 11:38:43)
> 
> On 18/11/2020 11:24, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2020-11-18 11:05:24)
> >>
> >> On 17/11/2020 11:30, Chris Wilson wrote:
> >>> Since preempt-to-busy, we may unsubmit a request while it is still on
> >>> the HW and completes asynchronously. That means it may be retired and in
> >>> the process destroy the virtual engine (as the user has closed their
> >>> context), but that engine may still be holding onto the unsubmitted
> >>> completed request. Therefore we need to potentially clean up the old
> >>> request on destroying the virtual engine. We also have to keep the
> >>> virtual_engine alive until after the siblings' execlists_dequeue() have
> >>> finished peeking into the virtual engines, for which we serialise with
> >>> RCU.
> >>>
> >>> v2: Be paranoid and flush the tasklet as well.
> >>> v3: And flush the tasklet before the engines, as the tasklet may
> >>> re-attach an rb_node after our removal from the siblings.
> >>>
> >>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>> ---
> >>>    drivers/gpu/drm/i915/gt/intel_lrc.c | 61 +++++++++++++++++++++++++----
> >>>    1 file changed, 54 insertions(+), 7 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>> index 17cb7060eb29..c11433884cf6 100644
> >>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>> @@ -182,6 +182,7 @@
> >>>    struct virtual_engine {
> >>>        struct intel_engine_cs base;
> >>>        struct intel_context context;
> >>> +     struct rcu_work rcu;
> >>>    
> >>>        /*
> >>>         * We allow only a single request through the virtual engine at a time
> >>> @@ -5470,44 +5471,90 @@ static struct list_head *virtual_queue(struct virtual_engine *ve)
> >>>        return &ve->base.execlists.default_priolist.requests[0];
> >>>    }
> >>>    
> >>> -static void virtual_context_destroy(struct kref *kref)
> >>> +static void rcu_virtual_context_destroy(struct work_struct *wrk)
> >>>    {
> >>>        struct virtual_engine *ve =
> >>> -             container_of(kref, typeof(*ve), context.ref);
> >>> +             container_of(wrk, typeof(*ve), rcu.work);
> >>>        unsigned int n;
> >>>    
> >>> -     GEM_BUG_ON(!list_empty(virtual_queue(ve)));
> >>> -     GEM_BUG_ON(ve->request);
> >>>        GEM_BUG_ON(ve->context.inflight);
> >>>    
> >>> +     /* Preempt-to-busy may leave a stale request behind. */
> >>> +     if (unlikely(ve->request)) {
> >>> +             struct i915_request *old;
> >>> +
> >>> +             spin_lock_irq(&ve->base.active.lock);
> >>> +
> >>> +             old = fetch_and_zero(&ve->request);
> >>> +             if (old) {
> >>> +                     GEM_BUG_ON(!i915_request_completed(old));
> >>> +                     __i915_request_submit(old);
> >>> +                     i915_request_put(old);
> >>> +             }
> >>> +
> >>> +             spin_unlock_irq(&ve->base.active.lock);
> >>> +     }
> >>> +
> >>> +     /*
> >>> +      * Flush the tasklet in case it is still running on another core.
> >>> +      *
> >>> +      * This needs to be done before we remove ourselves from the siblings'
> >>> +      * rbtrees as in the case it is running in parallel, it may reinsert
> >>> +      * the rb_node into a sibling.
> >>> +      */
> >>> +     tasklet_kill(&ve->base.execlists.tasklet);
> >>
> >> Can it still be running after an RCU period?
> > 
> > I think there is a window between checking to see if the request is
> > completed and kicking the tasklet that is not under the rcu lock,
> > leaving an opportunity for the request to be retired and the barrier
> > flushed to drop the context references.
> 
> From where would that check come?

The window of opportunity extends all the way from the
i915_request_completed check during unsubmit right until the virtual
engine tasklet is executed -- we do not hold a reference to the virtual
engine for the tasklet, and that request may be retired in the
background, and along with it the virtual engine destroyed.
-Chris
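
In outline, the window described spans both sides of the virtual
engine (a pseudo-C sketch of the two participants; illustrative only,
not lifted from the driver):

/* Reader: a sibling's execlists_dequeue(), in softirq context. It
 * peeks at the virtual engine through the rb-tree without taking a
 * reference, relying only on rcu_read_lock().
 */
rcu_read_lock();
ve = rb_entry(rb, typeof(*ve), nodes[sibling->id].rb);
rq = READ_ONCE(ve->request);    /* may already be completed */
/* ... decide whether to move rq onto the sibling ... */
rcu_read_unlock();

/* Writer: request retirement, in process context. The last context
 * reference can be dropped anywhere inside the reader's window above,
 * which is why the destroy is deferred past a grace period
 * (queue_rcu_work) and tasklet_kill() runs before the ve->nodes are
 * erased from the siblings.
 */
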
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH 08/28] drm/i915/gt: Show all active timelines for debugging
  2020-11-17 13:25     ` Chris Wilson
@ 2020-11-18 15:51       ` Tvrtko Ursulin
  2020-11-19 10:47         ` Chris Wilson
  0 siblings, 1 reply; 52+ messages in thread
From: Tvrtko Ursulin @ 2020-11-18 15:51 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 17/11/2020 13:25, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-11-17 12:59:44)
>>
>> On 17/11/2020 11:30, Chris Wilson wrote:
>>> +             if (show_request) {
>>> +                     list_for_each_entry_safe(rq, rn, &tl->requests, link)
>>> +                             show_request(m, rq,
>>> +                                          i915_request_is_active(rq) ? "  E" :
>>> +                                          i915_request_is_ready(rq) ? "  Q" :
>>> +                                          "  U");
>>
>> Can we get some consistency between the category counts and flags.
>>
>> s/count/queued/ -> Q
> 
> Hmm, if you are sure. Q would then not match with the engine info.

Sure? Not really. What do we have there? You mean "!/*/+/-" flags? Or 
"E/Q/V" from intel_execlists_show_requests? Right, 'Q' there means 
runnable and it doesn't show queued at all. Yes, why not change everything.

> Still favouring count over queued; I think count indicates more clearly
> that it is the superset, but queued may imply it excludes ready and
> definitely sounds like it should not include inflight.

I am okay with that.

>> ready -> R (also matches with term runnable)
>> active -> E ? hmmm E is consistent with the engine info dump.
>>
>> Not ideal but perhaps every bit of more consistency is good.
> 
> Not sold yet, but not happy with the current flags either.
> 
> If we go with 'R' for ready, we should also update engine info.

Okay we seem to have plenty of options.

U or Q - queued/unready
R or Q - ready/queued (to backend) (Rv/Qv for virtual?)
E or R, or I - executing/running/in-flight

Q -> R -> E
U -> R -> E
U -> Q -> E/R/I
U -> R -> E/I

I don't know.. either one as long as all places use the same.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH 08/28] drm/i915/gt: Show all active timelines for debugging
  2020-11-18 15:51       ` Tvrtko Ursulin
@ 2020-11-19 10:47         ` Chris Wilson
  0 siblings, 0 replies; 52+ messages in thread
From: Chris Wilson @ 2020-11-19 10:47 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2020-11-18 15:51:41)
> 
> On 17/11/2020 13:25, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2020-11-17 12:59:44)
> >>
> >> On 17/11/2020 11:30, Chris Wilson wrote:
> >>> +             if (show_request) {
> >>> +                     list_for_each_entry_safe(rq, rn, &tl->requests, link)
> >>> +                             show_request(m, rq,
> >>> +                                          i915_request_is_active(rq) ? "  E" :
> >>> +                                          i915_request_is_ready(rq) ? "  Q" :
> >>> +                                          "  U");
> >>
> >> Can we get some consistency between the category counts and flags.
> >>
> >> s/count/queued/ -> Q
> > 
> > Hmm, if you are sure. Q would then not match with the engine info.
> 
> Sure? Not really. What do we have there? You mean "!/*/+/-" flags? Or 
> "E/Q/V" from intel_execlists_show_requests? Right, 'Q' there means 
> runnable and it doesn't show queued at all. Yes, why not change everything.
> 
> > Still favouring count over queued; I think count indicates more clearly
> > that it is the superset, but queued may imply it excludes ready and
> > definitely sounds like it should not include inflight.
> 
> I am okay with that.
> 
> >> ready -> R (also matches with term runnable)
> >> active -> E ? hmmm E is consistent with the engine info dump.
> >>
> >> Not ideal but perhaps every bit of more consistency is good.
> > 
> > Not sold yet, but not happy with the current flags either.
> > 
> > If we go with 'R' for ready, we should also update engine info.
> 
> Okay we seem to have plenty of options.
> 
> U or Q - queued/unready
> R or Q - ready/queued (to backend) (Rv/Qv for virtual?)
> E or R, or I - executing/running/in-flight
> 
> Q -> R -> E
> U -> R -> E
> U -> Q -> E/R/I
> U -> R -> E/I
> 
> I don't know.. either one as long as all places use the same.

URE.

Unready -> ready -> executing.

Unready -> ready is a good pairing, and executing over inflight to avoid
being confused for an infection.

And unready opens up a plethora of jokes by just dropping the 'y'.
-Chris
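
The scheme the thread settles on, as a sketch (the helper name here is
hypothetical; the two predicates are the ones used in the patch):

static const char *rq_state(const struct i915_request *rq)
{
        if (i915_request_is_active(rq))
                return "  E";   /* executing on the HW */
        if (i915_request_is_ready(rq))
                return "  R";   /* ready for the backend */
        return "  U";           /* unready: waiting on dependencies */
}
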
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 52+ messages in thread

* Re: [Intel-gfx] [PATCH 14/28] drm/i915/gt: Free stale request on destroying the virtual engine
  2020-11-18 12:10         ` Chris Wilson
@ 2020-11-19 14:06           ` Tvrtko Ursulin
  2020-11-19 14:22             ` Chris Wilson
  0 siblings, 1 reply; 52+ messages in thread
From: Tvrtko Ursulin @ 2020-11-19 14:06 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 18/11/2020 12:10, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-11-18 11:38:43)
>>
>> On 18/11/2020 11:24, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2020-11-18 11:05:24)
>>>>
>>>> On 17/11/2020 11:30, Chris Wilson wrote:
>>>>> Since preempt-to-busy, we may unsubmit a request while it is still on
>>>>> the HW and completes asynchronously. That means it may be retired and in
>>>>> the process destroy the virtual engine (as the user has closed their
>>>>> context), but that engine may still be holding onto the unsubmitted
>>>>> completed request. Therefore we need to potentially clean up the old
>>>>> request on destroying the virtual engine. We also have to keep the
>>>>> virtual_engine alive until after the siblings' execlists_dequeue() have
>>>>> finished peeking into the virtual engines, for which we serialise with
>>>>> RCU.
>>>>>
>>>>> v2: Be paranoid and flush the tasklet as well.
>>>>> v3: And flush the tasklet before the engines, as the tasklet may
>>>>> re-attach an rb_node after our removal from the siblings.
>>>>>
>>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>> ---
>>>>>     drivers/gpu/drm/i915/gt/intel_lrc.c | 61 +++++++++++++++++++++++++----
>>>>>     1 file changed, 54 insertions(+), 7 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>>>> index 17cb7060eb29..c11433884cf6 100644
>>>>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>>>> @@ -182,6 +182,7 @@
>>>>>     struct virtual_engine {
>>>>>         struct intel_engine_cs base;
>>>>>         struct intel_context context;
>>>>> +     struct rcu_work rcu;
>>>>>     
>>>>>         /*
>>>>>          * We allow only a single request through the virtual engine at a time
>>>>> @@ -5470,44 +5471,90 @@ static struct list_head *virtual_queue(struct virtual_engine *ve)
>>>>>         return &ve->base.execlists.default_priolist.requests[0];
>>>>>     }
>>>>>     
>>>>> -static void virtual_context_destroy(struct kref *kref)
>>>>> +static void rcu_virtual_context_destroy(struct work_struct *wrk)
>>>>>     {
>>>>>         struct virtual_engine *ve =
>>>>> -             container_of(kref, typeof(*ve), context.ref);
>>>>> +             container_of(wrk, typeof(*ve), rcu.work);
>>>>>         unsigned int n;
>>>>>     
>>>>> -     GEM_BUG_ON(!list_empty(virtual_queue(ve)));
>>>>> -     GEM_BUG_ON(ve->request);
>>>>>         GEM_BUG_ON(ve->context.inflight);
>>>>>     
>>>>> +     /* Preempt-to-busy may leave a stale request behind. */
>>>>> +     if (unlikely(ve->request)) {
>>>>> +             struct i915_request *old;
>>>>> +
>>>>> +             spin_lock_irq(&ve->base.active.lock);
>>>>> +
>>>>> +             old = fetch_and_zero(&ve->request);
>>>>> +             if (old) {
>>>>> +                     GEM_BUG_ON(!i915_request_completed(old));
>>>>> +                     __i915_request_submit(old);
>>>>> +                     i915_request_put(old);
>>>>> +             }
>>>>> +
>>>>> +             spin_unlock_irq(&ve->base.active.lock);
>>>>> +     }
>>>>> +
>>>>> +     /*
>>>>> +      * Flush the tasklet in case it is still running on another core.
>>>>> +      *
>>>>> +      * This needs to be done before we remove ourselves from the siblings'
>>>>> +      * rbtrees as, if it is still running in parallel, it may reinsert
>>>>> +      * the rb_node into a sibling.
>>>>> +      */
>>>>> +     tasklet_kill(&ve->base.execlists.tasklet);
>>>>
>>>> Can it still be running after an RCU period?
>>>
>>> I think there is a window between checking to see if the request is
>>> completed and kicking the tasklet that is not under the rcu lock, and
>>> so an opportunity for the request to be retired and the barrier
>>> flushed to drop the context references.
>>
>>   From where would that check come?
> 
> The window of opportunity extends all the way from the
> i915_request_completed check during unsubmit right until the virtual
> engine tasklet is executed -- we do not hold a reference to the virtual
> engine for the tasklet, and that request may be retired in the
> background, and along with it the virtual engine destroyed.

In this case aren't sibling tasklets also a problem?

Regards,

Tvrtko

* Re: [Intel-gfx] [PATCH 14/28] drm/i915/gt: Free stale request on destroying the virtual engine
  2020-11-19 14:06           ` Tvrtko Ursulin
@ 2020-11-19 14:22             ` Chris Wilson
  2020-11-19 16:17               ` Tvrtko Ursulin
  0 siblings, 1 reply; 52+ messages in thread
From: Chris Wilson @ 2020-11-19 14:22 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2020-11-19 14:06:00)
> 
> On 18/11/2020 12:10, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2020-11-18 11:38:43)
> >>
> >> On 18/11/2020 11:24, Chris Wilson wrote:
> >>> Quoting Tvrtko Ursulin (2020-11-18 11:05:24)
> >>>>
> >>>> On 17/11/2020 11:30, Chris Wilson wrote:
> >>>>> Since preempt-to-busy, we may unsubmit a request while it is still on
> >>>>> the HW and completes asynchronously. That means it may be retired and in
> >>>>> the process destroy the virtual engine (as the user has closed their
> >>>>> context), but that engine may still be holding onto the unsubmitted
> >>>>> completed request. Therefore we potentially need to clean up the old
> >>>>> request on destroying the virtual engine. We also have to keep the
> >>>>> virtual_engine alive until after the sibling's execlists_dequeue() have
> >>>>> finished peeking into the virtual engines, for which we serialise with
> >>>>> RCU.
> >>>>>
> >>>>> v2: Be paranoid and flush the tasklet as well.
> >>>>> v3: And flush the tasklet before the engines, as the tasklet may
> >>>>> re-attach an rb_node after our removal from the siblings.
> >>>>>
> >>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>>> ---
> >>>>>     drivers/gpu/drm/i915/gt/intel_lrc.c | 61 +++++++++++++++++++++++++----
> >>>>>     1 file changed, 54 insertions(+), 7 deletions(-)
> >>>>>
> >>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>>>> index 17cb7060eb29..c11433884cf6 100644
> >>>>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>>>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
> >>>>> @@ -182,6 +182,7 @@
> >>>>>     struct virtual_engine {
> >>>>>         struct intel_engine_cs base;
> >>>>>         struct intel_context context;
> >>>>> +     struct rcu_work rcu;
> >>>>>     
> >>>>>         /*
> >>>>>          * We allow only a single request through the virtual engine at a time
> >>>>> @@ -5470,44 +5471,90 @@ static struct list_head *virtual_queue(struct virtual_engine *ve)
> >>>>>         return &ve->base.execlists.default_priolist.requests[0];
> >>>>>     }
> >>>>>     
> >>>>> -static void virtual_context_destroy(struct kref *kref)
> >>>>> +static void rcu_virtual_context_destroy(struct work_struct *wrk)
> >>>>>     {
> >>>>>         struct virtual_engine *ve =
> >>>>> -             container_of(kref, typeof(*ve), context.ref);
> >>>>> +             container_of(wrk, typeof(*ve), rcu.work);
> >>>>>         unsigned int n;
> >>>>>     
> >>>>> -     GEM_BUG_ON(!list_empty(virtual_queue(ve)));
> >>>>> -     GEM_BUG_ON(ve->request);
> >>>>>         GEM_BUG_ON(ve->context.inflight);
> >>>>>     
> >>>>> +     /* Preempt-to-busy may leave a stale request behind. */
> >>>>> +     if (unlikely(ve->request)) {
> >>>>> +             struct i915_request *old;
> >>>>> +
> >>>>> +             spin_lock_irq(&ve->base.active.lock);
> >>>>> +
> >>>>> +             old = fetch_and_zero(&ve->request);
> >>>>> +             if (old) {
> >>>>> +                     GEM_BUG_ON(!i915_request_completed(old));
> >>>>> +                     __i915_request_submit(old);
> >>>>> +                     i915_request_put(old);
> >>>>> +             }
> >>>>> +
> >>>>> +             spin_unlock_irq(&ve->base.active.lock);
> >>>>> +     }
> >>>>> +
> >>>>> +     /*
> >>>>> +      * Flush the tasklet in case it is still running on another core.
> >>>>> +      *
> >>>>> +      * This needs to be done before we remove ourselves from the siblings'
> >>>>> +      * rbtrees as, if it is still running in parallel, it may reinsert
> >>>>> +      * the rb_node into a sibling.
> >>>>> +      */
> >>>>> +     tasklet_kill(&ve->base.execlists.tasklet);
> >>>>
> >>>> Can it still be running after an RCU period?
> >>>
> >>> I think there is a window between checking to see if the request is
> >>> completed and kicking the tasklet that is not under the rcu lock, and
> >>> so an opportunity for the request to be retired and the barrier
> >>> flushed to drop the context references.
> >>
> >>   From where would that check come?
> > 
> > The window of opportunity extends all the way from the
> > i915_request_completed check during unsubmit right until the virtual
> > engine tasklet is executed -- we do not hold a reference to the virtual
> > engine for the tasklet, and that request may be retired in the
> > background, and along with it the virtual engine destroyed.
> 
> In this case aren't sibling tasklets also a problem?

The next stanza decouples the siblings. At this point, we know that the
request must have been completed (in order to be retired and to drop
the context reference), so nothing should be allowed to kick the
virtual engine tasklet; it is only the outstanding execution we need to
serialise. Hence the following assertion, that nothing kicked the
tasklet while we decoupled the siblings, holds. After that assertion,
there should be nothing else that knows about the virtual tasklet.
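
For reference, the decoupling stanza I am referring to looks roughly
like this -- a sketch reconstructed from the patch rather than quoted
above, so treat the exact field and helper names as approximations:

	for (n = 0; n < ve->num_siblings; n++) {
		struct intel_engine_cs *sibling = ve->siblings[n];
		struct rb_node *node = &ve->nodes[sibling->id].rb;

		if (RB_EMPTY_NODE(node))
			continue;

		spin_lock_irq(&sibling->active.lock);

		/* The tasklet is dead; nothing can reinsert us now */
		if (!RB_EMPTY_NODE(node))
			rb_erase_cached(node, &sibling->execlists.virtual);

		spin_unlock_irq(&sibling->active.lock);
	}
	GEM_BUG_ON(__tasklet_is_scheduled(&ve->base.execlists.tasklet));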
-Chris

* Re: [Intel-gfx] [PATCH 14/28] drm/i915/gt: Free stale request on destroying the virtual engine
  2020-11-19 14:22             ` Chris Wilson
@ 2020-11-19 16:17               ` Tvrtko Ursulin
  0 siblings, 0 replies; 52+ messages in thread
From: Tvrtko Ursulin @ 2020-11-19 16:17 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 19/11/2020 14:22, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2020-11-19 14:06:00)
>>
>> On 18/11/2020 12:10, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2020-11-18 11:38:43)
>>>>
>>>> On 18/11/2020 11:24, Chris Wilson wrote:
>>>>> Quoting Tvrtko Ursulin (2020-11-18 11:05:24)
>>>>>>
>>>>>> On 17/11/2020 11:30, Chris Wilson wrote:
>>>>>>> Since preempt-to-busy, we may unsubmit a request while it is still on
>>>>>>> the HW and completes asynchronously. That means it may be retired and in
>>>>>>> the process destroy the virtual engine (as the user has closed their
>>>>>>> context), but that engine may still be holding onto the unsubmitted
> >>>>>>> completed request. Therefore we potentially need to clean up the old
>>>>>>> request on destroying the virtual engine. We also have to keep the
>>>>>>> virtual_engine alive until after the sibling's execlists_dequeue() have
>>>>>>> finished peeking into the virtual engines, for which we serialise with
>>>>>>> RCU.
>>>>>>>
>>>>>>> v2: Be paranoid and flush the tasklet as well.
>>>>>>> v3: And flush the tasklet before the engines, as the tasklet may
>>>>>>> re-attach an rb_node after our removal from the siblings.
>>>>>>>
>>>>>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>>>>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>>> ---
>>>>>>>      drivers/gpu/drm/i915/gt/intel_lrc.c | 61 +++++++++++++++++++++++++----
>>>>>>>      1 file changed, 54 insertions(+), 7 deletions(-)
>>>>>>>
>>>>>>> diff --git a/drivers/gpu/drm/i915/gt/intel_lrc.c b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>>>>>> index 17cb7060eb29..c11433884cf6 100644
>>>>>>> --- a/drivers/gpu/drm/i915/gt/intel_lrc.c
>>>>>>> +++ b/drivers/gpu/drm/i915/gt/intel_lrc.c
>>>>>>> @@ -182,6 +182,7 @@
>>>>>>>      struct virtual_engine {
>>>>>>>          struct intel_engine_cs base;
>>>>>>>          struct intel_context context;
>>>>>>> +     struct rcu_work rcu;
>>>>>>>      
>>>>>>>          /*
>>>>>>>           * We allow only a single request through the virtual engine at a time
>>>>>>> @@ -5470,44 +5471,90 @@ static struct list_head *virtual_queue(struct virtual_engine *ve)
>>>>>>>          return &ve->base.execlists.default_priolist.requests[0];
>>>>>>>      }
>>>>>>>      
>>>>>>> -static void virtual_context_destroy(struct kref *kref)
>>>>>>> +static void rcu_virtual_context_destroy(struct work_struct *wrk)
>>>>>>>      {
>>>>>>>          struct virtual_engine *ve =
>>>>>>> -             container_of(kref, typeof(*ve), context.ref);
>>>>>>> +             container_of(wrk, typeof(*ve), rcu.work);
>>>>>>>          unsigned int n;
>>>>>>>      
>>>>>>> -     GEM_BUG_ON(!list_empty(virtual_queue(ve)));
>>>>>>> -     GEM_BUG_ON(ve->request);
>>>>>>>          GEM_BUG_ON(ve->context.inflight);
>>>>>>>      
>>>>>>> +     /* Preempt-to-busy may leave a stale request behind. */
>>>>>>> +     if (unlikely(ve->request)) {
>>>>>>> +             struct i915_request *old;
>>>>>>> +
>>>>>>> +             spin_lock_irq(&ve->base.active.lock);
>>>>>>> +
>>>>>>> +             old = fetch_and_zero(&ve->request);
>>>>>>> +             if (old) {
>>>>>>> +                     GEM_BUG_ON(!i915_request_completed(old));
>>>>>>> +                     __i915_request_submit(old);
>>>>>>> +                     i915_request_put(old);
>>>>>>> +             }
>>>>>>> +
>>>>>>> +             spin_unlock_irq(&ve->base.active.lock);
>>>>>>> +     }
>>>>>>> +
>>>>>>> +     /*
>>>>>>> +      * Flush the tasklet in case it is still running on another core.
>>>>>>> +      *
>>>>>>> +      * This needs to be done before we remove ourselves from the siblings'
> >>>>>>> +      * rbtrees as, if it is still running in parallel, it may reinsert
>>>>>>> +      * the rb_node into a sibling.
>>>>>>> +      */
>>>>>>> +     tasklet_kill(&ve->base.execlists.tasklet);
>>>>>>
>>>>>> Can it still be running after an RCU period?
>>>>>
> >>>>> I think there is a window between checking to see if the request is
> >>>>> completed and kicking the tasklet that is not under the rcu lock, and
> >>>>> so an opportunity for the request to be retired and the barrier
> >>>>> flushed to drop the context references.
>>>>
>>>>    From where would that check come?
>>>
>>> The window of opportunity extends all the way from the
>>> i915_request_completed check during unsubmit right until the virtual
>>> engine tasklet is executed -- we do not hold a reference to the virtual
>>> engine for the tasklet, and that request may be retired in the
>>> background, and along with it the virtual engine destroyed.
>>
>> In this case aren't sibling tasklets also a problem?
> 
> The next stanza decouples the siblings. At this point, we know that the
> request must have been completed (in order to be retired and to drop
> the context reference), so nothing should be allowed to kick the
> virtual engine tasklet; it is only the outstanding execution we need to
> serialise. Hence the following assertion, that nothing kicked the
> tasklet while we decoupled the siblings, holds. After that assertion,
> there should be nothing else that knows about the virtual tasklet.

Let me see it step by step because I am slow today.

1. Tasklet runs, decides to preempt the VE away.
2. VE completes despite that.
3. Userspace closes the context.
4. RCU period.
5. All tasklets which were active during 1 have exited.
    + VE decoupled, and
6. VE unlinked from siblings and destroyed.

A re-submitted VE tasklet may start as soon as 1, or at any time after.
If it starts after 4, then the RCU period from the context close does
not see it. Okay, makes sense.
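
(And the deferral itself is presumably just the kref release queueing
the rcu_work -- a sketch, since that hunk is not quoted above, assuming
the standard INIT_RCU_WORK/queue_rcu_work helpers:

	static void virtual_context_destroy(struct kref *kref)
	{
		struct virtual_engine *ve =
			container_of(kref, typeof(*ve), context.ref);

		INIT_RCU_WORK(&ve->rcu, rcu_virtual_context_destroy);
		queue_rcu_work(system_wq, &ve->rcu);
	}

so a full RCU grace period elapses between the final reference being
dropped and rcu_virtual_context_destroy() running.)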

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

end of thread

Thread overview: 52+ messages
2020-11-17 11:30 [Intel-gfx] [PATCH 01/28] drm/i915/selftests: Improve granularity for mocs reset checks Chris Wilson
2020-11-17 11:30 ` [Intel-gfx] [PATCH 02/28] drm/i915/selftests: Small tweak to put the termination conditions together Chris Wilson
2020-11-17 11:30 ` [Intel-gfx] [PATCH 03/28] drm/i915/gem: Drop free_work for GEM contexts Chris Wilson
2020-11-17 11:30 ` [Intel-gfx] [PATCH 04/28] drm/i915/gt: Ignore dt==0 for reporting underflows Chris Wilson
2020-11-17 11:42   ` Tvrtko Ursulin
2020-11-17 11:30 ` [Intel-gfx] [PATCH 05/28] drm/i915/gt: Track the overall busy time Chris Wilson
2020-11-17 12:44   ` Tvrtko Ursulin
2020-11-17 13:05     ` Chris Wilson
2020-11-17 11:30 ` [Intel-gfx] [PATCH 06/28] drm/i915/gt: Include semaphore status in print_request() Chris Wilson
2020-11-17 11:30 ` [Intel-gfx] [PATCH 07/28] drm/i915: Lift i915_request_show() Chris Wilson
2020-11-17 12:51   ` Tvrtko Ursulin
2020-11-17 11:30 ` [Intel-gfx] [PATCH 08/28] drm/i915/gt: Show all active timelines for debugging Chris Wilson
2020-11-17 12:59   ` Tvrtko Ursulin
2020-11-17 13:25     ` Chris Wilson
2020-11-18 15:51       ` Tvrtko Ursulin
2020-11-19 10:47         ` Chris Wilson
2020-11-17 11:30 ` [Intel-gfx] [PATCH 09/28] drm/i915: Lift waiter/signaler iterators Chris Wilson
2020-11-17 13:00   ` Tvrtko Ursulin
2020-11-17 11:30 ` [Intel-gfx] [PATCH 10/28] drm/i915: Show timeline dependencies for debug Chris Wilson
2020-11-17 13:06   ` Tvrtko Ursulin
2020-11-17 13:30     ` Chris Wilson
2020-11-17 11:30 ` [Intel-gfx] [PATCH 11/28] drm/i915/gt: Defer enabling the breadcrumb interrupt to after submission Chris Wilson
2020-11-17 11:30 ` [Intel-gfx] [PATCH 12/28] drm/i915/gt: Track signaled breadcrumbs outside of the breadcrumb spinlock Chris Wilson
2020-11-17 11:30 ` [Intel-gfx] [PATCH 13/28] drm/i915/gt: Don't cancel the interrupt shadow too early Chris Wilson
2020-11-17 11:30 ` [Intel-gfx] [PATCH 14/28] drm/i915/gt: Free stale request on destroying the virtual engine Chris Wilson
2020-11-18 11:05   ` Tvrtko Ursulin
2020-11-18 11:24     ` Chris Wilson
2020-11-18 11:38       ` Tvrtko Ursulin
2020-11-18 12:10         ` Chris Wilson
2020-11-19 14:06           ` Tvrtko Ursulin
2020-11-19 14:22             ` Chris Wilson
2020-11-19 16:17               ` Tvrtko Ursulin
2020-11-17 11:30 ` [Intel-gfx] [PATCH 15/28] drm/i915/gt: Protect context lifetime with RCU Chris Wilson
2020-11-18 11:36   ` Tvrtko Ursulin
2020-11-17 11:30 ` [Intel-gfx] [PATCH 16/28] drm/i915/gt: Split the breadcrumb spinlock between global and contexts Chris Wilson
2020-11-18 11:35   ` Tvrtko Ursulin
2020-11-17 11:30 ` [Intel-gfx] [PATCH 17/28] drm/i915/gt: Move the breadcrumb to the signaler if completed upon cancel Chris Wilson
2020-11-17 11:30 ` [Intel-gfx] [PATCH 18/28] drm/i915/gt: Decouple completed requests on unwind Chris Wilson
2020-11-17 11:30 ` [Intel-gfx] [PATCH 19/28] drm/i915/gt: Check for a completed last request once Chris Wilson
2020-11-17 11:30 ` [Intel-gfx] [PATCH 20/28] drm/i915/gt: Replace direct submit with direct call to tasklet Chris Wilson
2020-11-17 11:30 ` [Intel-gfx] [PATCH 21/28] drm/i915/gt: ce->inflight updates are now serialised Chris Wilson
2020-11-17 11:30 ` [Intel-gfx] [PATCH 22/28] drm/i915/gt: Use virtual_engine during execlists_dequeue Chris Wilson
2020-11-17 11:30 ` [Intel-gfx] [PATCH 23/28] drm/i915/gt: Decouple inflight virtual engines Chris Wilson
2020-11-17 11:30 ` [Intel-gfx] [PATCH 24/28] drm/i915/gt: Defer schedule_out until after the next dequeue Chris Wilson
2020-11-17 11:31 ` [Intel-gfx] [PATCH 25/28] drm/i915/gt: Remove virtual breadcrumb before transfer Chris Wilson
2020-11-17 11:31 ` [Intel-gfx] [PATCH 26/28] drm/i915/gt: Shrink the critical section for irq signaling Chris Wilson
2020-11-17 11:31 ` [Intel-gfx] [PATCH 27/28] drm/i915/gt: Resubmit the virtual engine on schedule-out Chris Wilson
2020-11-17 11:31 ` [Intel-gfx] [PATCH 28/28] drm/i915/gt: Simplify virtual engine handling for execlists_hold() Chris Wilson
2020-11-17 18:54 ` [Intel-gfx] ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/28] drm/i915/selftests: Improve granularity for mocs reset checks Patchwork
2020-11-17 18:56 ` [Intel-gfx] ✗ Fi.CI.SPARSE: " Patchwork
2020-11-17 19:24 ` [Intel-gfx] ✓ Fi.CI.BAT: success " Patchwork
2020-11-17 22:56 ` [Intel-gfx] ✗ Fi.CI.IGT: failure " Patchwork
