* [PATCH 01/38] drm/i915/execlists: Suppress redundant preemption
From: Chris Wilson @ 2019-03-01 14:03 UTC (permalink / raw)
  To: intel-gfx

On unwinding the active request we give it a small (limited to internal
priority levels) boost to prevent it from being gazumped a second time.
However, this means that it can be promoted to above the request that
triggered the preemption, causing a preempt-to-idle cycle for no net
change. We can avoid this if we take the boost into account when
checking whether the preemption request is valid.
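
As a sketch (illustrative only; need_preempt() here is a stand-in for
the driver's actual check, using the two helpers touched below), the
comparison this patch arranges is:

	/*
	 * Preempt only if the queued request would still outrank the
	 * active request after the latter is unwound and re-boosted;
	 * otherwise we would preempt-to-idle merely to reinstate the
	 * same request.
	 */
	static bool need_preempt(const struct intel_engine_execlists *el,
				 const struct i915_request *active)
	{
		return queue_prio(el) > effective_prio(active);
	}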

v2: After preemption the active request will be after the preemptee if
they end up with equal priority.

v3: Tvrtko pointed out that this, the existing logic, makes
I915_PRIORITY_WAIT non-preemptible. Document this interesting quirk!

v4: Prove Tvrtko was right about WAIT being non-preemptible and test it.
v5: Except not all priorities were made equal, and the WAIT not preempting
is only if we start off as !NEWCLIENT.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 38 ++++++++++++++++++++++++++++----
 1 file changed, 34 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 4f2187aa44e4..f57cfe2fc078 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -164,6 +164,8 @@
 #define WA_TAIL_DWORDS 2
 #define WA_TAIL_BYTES (sizeof(u32) * WA_TAIL_DWORDS)
 
+#define ACTIVE_PRIORITY (I915_PRIORITY_NEWCLIENT)
+
 static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
 					    struct intel_engine_cs *engine,
 					    struct intel_context *ce);
@@ -190,8 +192,30 @@ static inline int rq_prio(const struct i915_request *rq)
 
 static int effective_prio(const struct i915_request *rq)
 {
+	int prio = rq_prio(rq);
+
+	/*
+	 * On unwinding the active request, we give it a priority bump
+	 * equivalent to a freshly submitted request. This protects it from
+	 * being gazumped again, but it would be preferable if we didn't
+	 * let it be gazumped in the first place!
+	 *
+	 * See __unwind_incomplete_requests()
+	 */
+	if (~prio & ACTIVE_PRIORITY && __i915_request_has_started(rq)) {
+		/*
+		 * After preemption, we insert the active request at the
+		 * end of the new priority level. This means that we will be
+		 * _lower_ priority than the preemptee all things equal (and
+		 * so the preemption is valid), so adjust our comparison
+		 * accordingly.
+		 */
+		prio |= ACTIVE_PRIORITY;
+		prio--;
+	}
+
 	/* Restrict mere WAIT boosts from triggering preemption */
-	return rq_prio(rq) | __NO_PREEMPTION;
+	return prio | __NO_PREEMPTION;
 }
 
 static int queue_prio(const struct intel_engine_execlists *execlists)
@@ -359,7 +383,7 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
 {
 	struct i915_request *rq, *rn, *active = NULL;
 	struct list_head *uninitialized_var(pl);
-	int prio = I915_PRIORITY_INVALID | I915_PRIORITY_NEWCLIENT;
+	int prio = I915_PRIORITY_INVALID | ACTIVE_PRIORITY;
 
 	lockdep_assert_held(&engine->timeline.lock);
 
@@ -390,9 +414,15 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
 	 * The active request is now effectively the start of a new client
 	 * stream, so give it the equivalent small priority bump to prevent
 	 * it being gazumped a second time by another peer.
+	 *
+	 * One consequence of this preemption boost is that we may jump
+	 * over lesser priorities (such as I915_PRIORITY_WAIT), effectively
+	 * making those priorities non-preemptible. They will be moved forward
+	 * in the priority queue, but they will not gain immediate access to
+	 * the GPU.
 	 */
-	if (!(prio & I915_PRIORITY_NEWCLIENT)) {
-		prio |= I915_PRIORITY_NEWCLIENT;
+	if (~prio & ACTIVE_PRIORITY && __i915_request_has_started(active)) {
+		prio |= ACTIVE_PRIORITY;
 		active->sched.attr.priority = prio;
 		list_move_tail(&active->sched.link,
 			       i915_sched_lookup_priolist(engine, prio));
-- 
2.20.1

* [PATCH 02/38] drm/i915: Introduce i915_timeline.mutex
From: Chris Wilson @ 2019-03-01 14:03 UTC (permalink / raw)
  To: intel-gfx

A simple mutex used to guard the flow of requests in and out of the
timeline. In the short term, it will be used only to guard the addition
of requests into the timeline, taken on alloc and released on commit so
that only one caller can construct a request into the timeline
(important as the seqno and ring pointers must be serialised). This will
be used by observers to ensure that the seqno/hwsp is stable. Later,
when we have reduced retiring to operate on only a single timeline at a
time, we can then use the mutex as the sole guard required for retiring.
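
As a rough sketch of the discipline (using only the functions touched
here), the mutex brackets request construction:

	rq = i915_request_alloc(engine, ctx);	/* takes timeline->mutex */
	if (IS_ERR(rq))
		return PTR_ERR(rq);

	/* ... emit commands; seqno and ring tail are stable here ... */

	i915_request_add(rq);			/* drops timeline->mutex */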

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_request.c            | 6 +++++-
 drivers/gpu/drm/i915/i915_timeline.c           | 1 +
 drivers/gpu/drm/i915/i915_timeline.h           | 2 ++
 drivers/gpu/drm/i915/selftests/i915_request.c  | 4 +---
 drivers/gpu/drm/i915/selftests/mock_timeline.c | 1 +
 5 files changed, 10 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index c65f6c990fdd..719d1a5ab082 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -563,6 +563,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 		return ERR_CAST(ce);
 
 	reserve_gt(i915);
+	mutex_lock(&ce->ring->timeline->mutex);
 
 	/* Move our oldest request to the slab-cache (if not in use!) */
 	rq = list_first_entry(&ce->ring->request_list, typeof(*rq), ring_link);
@@ -688,6 +689,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 
 	kmem_cache_free(global.slab_requests, rq);
 err_unreserve:
+	mutex_unlock(&ce->ring->timeline->mutex);
 	unreserve_gt(i915);
 	intel_context_unpin(ce);
 	return ERR_PTR(ret);
@@ -880,7 +882,7 @@ void i915_request_add(struct i915_request *request)
 	GEM_TRACE("%s fence %llx:%lld\n",
 		  engine->name, request->fence.context, request->fence.seqno);
 
-	lockdep_assert_held(&request->i915->drm.struct_mutex);
+	lockdep_assert_held(&request->timeline->mutex);
 	trace_i915_request_add(request);
 
 	/*
@@ -991,6 +993,8 @@ void i915_request_add(struct i915_request *request)
 	 */
 	if (prev && i915_request_completed(prev))
 		i915_request_retire_upto(prev);
+
+	mutex_unlock(&request->timeline->mutex);
 }
 
 static unsigned long local_clock_us(unsigned int *cpu)
diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
index b2202d2e58a2..87a80558da28 100644
--- a/drivers/gpu/drm/i915/i915_timeline.c
+++ b/drivers/gpu/drm/i915/i915_timeline.c
@@ -162,6 +162,7 @@ int i915_timeline_init(struct drm_i915_private *i915,
 	timeline->fence_context = dma_fence_context_alloc(1);
 
 	spin_lock_init(&timeline->lock);
+	mutex_init(&timeline->mutex);
 
 	INIT_ACTIVE_REQUEST(&timeline->barrier);
 	INIT_ACTIVE_REQUEST(&timeline->last_request);
diff --git a/drivers/gpu/drm/i915/i915_timeline.h b/drivers/gpu/drm/i915/i915_timeline.h
index 7bec7d2e45bf..36c3849f7108 100644
--- a/drivers/gpu/drm/i915/i915_timeline.h
+++ b/drivers/gpu/drm/i915/i915_timeline.h
@@ -44,6 +44,8 @@ struct i915_timeline {
 #define TIMELINE_CLIENT 0 /* default subclass */
 #define TIMELINE_ENGINE 1
 
+	struct mutex mutex; /* protects the flow of requests */
+
 	unsigned int pin_count;
 	const u32 *hwsp_seqno;
 	struct i915_vma *hwsp_ggtt;
diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
index 7da52e3d67af..7e1b65b8eb19 100644
--- a/drivers/gpu/drm/i915/selftests/i915_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_request.c
@@ -141,14 +141,12 @@ static int igt_fence_wait(void *arg)
 		err = -ENOMEM;
 		goto out_locked;
 	}
-	mutex_unlock(&i915->drm.struct_mutex); /* safe as we are single user */
 
 	if (dma_fence_wait_timeout(&request->fence, false, T) != -ETIME) {
 		pr_err("fence wait success before submit (expected timeout)!\n");
-		goto out_device;
+		goto out_locked;
 	}
 
-	mutex_lock(&i915->drm.struct_mutex);
 	i915_request_add(request);
 	mutex_unlock(&i915->drm.struct_mutex);
 
diff --git a/drivers/gpu/drm/i915/selftests/mock_timeline.c b/drivers/gpu/drm/i915/selftests/mock_timeline.c
index d2de9ece2118..416d85233263 100644
--- a/drivers/gpu/drm/i915/selftests/mock_timeline.c
+++ b/drivers/gpu/drm/i915/selftests/mock_timeline.c
@@ -14,6 +14,7 @@ void mock_timeline_init(struct i915_timeline *timeline, u64 context)
 	timeline->fence_context = context;
 
 	spin_lock_init(&timeline->lock);
+	mutex_init(&timeline->mutex);
 
 	INIT_ACTIVE_REQUEST(&timeline->barrier);
 	INIT_ACTIVE_REQUEST(&timeline->last_request);
-- 
2.20.1

* [PATCH 03/38] drm/i915: Keep timeline HWSP allocated until idle across the system
From: Chris Wilson @ 2019-03-01 14:03 UTC (permalink / raw)
  To: intel-gfx

In preparation for enabling HW semaphores, we need to keep the in-flight
timeline HWSP alive until its use across the entire system has
completed, as any other timeline active on the GPU may still refer back
to the already retired timeline. We have to delay both recycling
available cachelines and unpinning the old HWSP until the next idle
point.

An easy option would be to simply keep all used HWSP until the system as
a whole was idle, i.e. we could release them all at once on parking.
However, on a busy system, we may never see a global idle point,
essentially meaning the resource will be leaked until we are forced to
do a GC pass. We already employ a fine-grained idle detection mechanism
for vma, which we can reuse here so that each cacheline can be freed
immediately after the last request using it is retired.
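
As a rough lifetime sketch (helper names as in the patch below):

	cacheline_alloc()	pin the HWSP page, pack the cacheline
				index into the low bits of the vaddr
	cacheline_acquire()	the timeline pins the vma while in use
	i915_active_ref()	a foreign request keeps it busy
	cacheline_release()	the timeline itself lets go
	__cacheline_retire()	last request retired; if the timeline
				already marked it CACHELINE_FREE, the
				slot returns to the hwsp free_bitmap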

v3: Keep track of the activity of each cacheline.
v4: cacheline_free() on canceling the seqno tracking
v5: Finally with a testcase to exercise wraparound
v6: Pack cacheline into empty bits of page-aligned vaddr
v7: Use i915_utils to hide the pointer casting around bit manipulation

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_request.c           |  31 +-
 drivers/gpu/drm/i915/i915_request.h           |  11 +
 drivers/gpu/drm/i915/i915_timeline.c          | 293 ++++++++++++++++--
 drivers/gpu/drm/i915/i915_timeline.h          |  11 +-
 .../gpu/drm/i915/selftests/i915_timeline.c    | 113 +++++++
 5 files changed, 420 insertions(+), 39 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 719d1a5ab082..d354967d6ae8 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -325,11 +325,6 @@ void i915_request_retire_upto(struct i915_request *rq)
 	} while (tmp != rq);
 }
 
-static u32 timeline_get_seqno(struct i915_timeline *tl)
-{
-	return tl->seqno += 1 + tl->has_initial_breadcrumb;
-}
-
 static void move_to_timeline(struct i915_request *request,
 			     struct i915_timeline *timeline)
 {
@@ -532,8 +527,10 @@ struct i915_request *
 i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 {
 	struct drm_i915_private *i915 = engine->i915;
-	struct i915_request *rq;
 	struct intel_context *ce;
+	struct i915_timeline *tl;
+	struct i915_request *rq;
+	u32 seqno;
 	int ret;
 
 	lockdep_assert_held(&i915->drm.struct_mutex);
@@ -610,24 +607,27 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 		}
 	}
 
-	rq->rcustate = get_state_synchronize_rcu();
-
 	INIT_LIST_HEAD(&rq->active_list);
+
+	tl = ce->ring->timeline;
+	ret = i915_timeline_get_seqno(tl, rq, &seqno);
+	if (ret)
+		goto err_free;
+
 	rq->i915 = i915;
 	rq->engine = engine;
 	rq->gem_context = ctx;
 	rq->hw_context = ce;
 	rq->ring = ce->ring;
-	rq->timeline = ce->ring->timeline;
+	rq->timeline = tl;
 	GEM_BUG_ON(rq->timeline == &engine->timeline);
-	rq->hwsp_seqno = rq->timeline->hwsp_seqno;
+	rq->hwsp_seqno = tl->hwsp_seqno;
+	rq->hwsp_cacheline = tl->hwsp_cacheline;
+	rq->rcustate = get_state_synchronize_rcu(); /* acts as smp_mb() */
 
 	spin_lock_init(&rq->lock);
-	dma_fence_init(&rq->fence,
-		       &i915_fence_ops,
-		       &rq->lock,
-		       rq->timeline->fence_context,
-		       timeline_get_seqno(rq->timeline));
+	dma_fence_init(&rq->fence, &i915_fence_ops, &rq->lock,
+		       tl->fence_context, seqno);
 
 	/* We bump the ref for the fence chain */
 	i915_sw_fence_init(&i915_request_get(rq)->submit, submit_notify);
@@ -687,6 +687,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	GEM_BUG_ON(!list_empty(&rq->sched.signalers_list));
 	GEM_BUG_ON(!list_empty(&rq->sched.waiters_list));
 
+err_free:
 	kmem_cache_free(global.slab_requests, rq);
 err_unreserve:
 	mutex_unlock(&ce->ring->timeline->mutex);
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index be3ded6bcf56..ea1e6f0ade53 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -38,6 +38,7 @@ struct drm_file;
 struct drm_i915_gem_object;
 struct i915_request;
 struct i915_timeline;
+struct i915_timeline_cacheline;
 
 struct i915_capture_list {
 	struct i915_capture_list *next;
@@ -148,6 +149,16 @@ struct i915_request {
 	 */
 	const u32 *hwsp_seqno;
 
+	/*
+	 * If we need to access the timeline's seqno for this request in
+	 * another request, we need to keep a read reference to this associated
+	 * cacheline, so that we do not free and recycle it before the foreign
+	 * observers have completed. Hence, we keep a pointer to the cacheline
+	 * inside the timeline's HWSP vma, but it is only valid while this
+	 * request has not completed and guarded by the timeline mutex.
+	 */
+	struct i915_timeline_cacheline *hwsp_cacheline;
+
 	/** Position in the ring of the start of the request */
 	u32 head;
 
diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
index 87a80558da28..8484ba6e51d1 100644
--- a/drivers/gpu/drm/i915/i915_timeline.c
+++ b/drivers/gpu/drm/i915/i915_timeline.c
@@ -6,19 +6,32 @@
 
 #include "i915_drv.h"
 
-#include "i915_timeline.h"
+#include "i915_active.h"
 #include "i915_syncmap.h"
+#include "i915_timeline.h"
+
+#define ptr_set_bit(ptr, bit) ((typeof(ptr))((unsigned long)(ptr) | BIT(bit)))
+#define ptr_test_bit(ptr, bit) ((unsigned long)(ptr) & BIT(bit))
 
 struct i915_timeline_hwsp {
-	struct i915_vma *vma;
+	struct i915_gt_timelines *gt;
 	struct list_head free_link;
+	struct i915_vma *vma;
 	u64 free_bitmap;
 };
 
-static inline struct i915_timeline_hwsp *
-i915_timeline_hwsp(const struct i915_timeline *tl)
+struct i915_timeline_cacheline {
+	struct i915_active active;
+	struct i915_timeline_hwsp *hwsp;
+	void *vaddr;
+#define CACHELINE_BITS 6
+#define CACHELINE_FREE CACHELINE_BITS
+};
+
+static inline struct drm_i915_private *
+hwsp_to_i915(struct i915_timeline_hwsp *hwsp)
 {
-	return tl->hwsp_ggtt->private;
+	return container_of(hwsp->gt, struct drm_i915_private, gt.timelines);
 }
 
 static struct i915_vma *__hwsp_alloc(struct drm_i915_private *i915)
@@ -71,6 +84,7 @@ hwsp_alloc(struct i915_timeline *timeline, unsigned int *cacheline)
 		vma->private = hwsp;
 		hwsp->vma = vma;
 		hwsp->free_bitmap = ~0ull;
+		hwsp->gt = gt;
 
 		spin_lock(&gt->hwsp_lock);
 		list_add(&hwsp->free_link, &gt->hwsp_free_list);
@@ -88,14 +102,9 @@ hwsp_alloc(struct i915_timeline *timeline, unsigned int *cacheline)
 	return hwsp->vma;
 }
 
-static void hwsp_free(struct i915_timeline *timeline)
+static void __idle_hwsp_free(struct i915_timeline_hwsp *hwsp, int cacheline)
 {
-	struct i915_gt_timelines *gt = &timeline->i915->gt.timelines;
-	struct i915_timeline_hwsp *hwsp;
-
-	hwsp = i915_timeline_hwsp(timeline);
-	if (!hwsp) /* leave global HWSP alone! */
-		return;
+	struct i915_gt_timelines *gt = hwsp->gt;
 
 	spin_lock(&gt->hwsp_lock);
 
@@ -103,7 +112,8 @@ static void hwsp_free(struct i915_timeline *timeline)
 	if (!hwsp->free_bitmap)
 		list_add_tail(&hwsp->free_link, &gt->hwsp_free_list);
 
-	hwsp->free_bitmap |= BIT_ULL(timeline->hwsp_offset / CACHELINE_BYTES);
+	GEM_BUG_ON(cacheline >= BITS_PER_TYPE(hwsp->free_bitmap));
+	hwsp->free_bitmap |= BIT_ULL(cacheline);
 
 	/* And if no one is left using it, give the page back to the system */
 	if (hwsp->free_bitmap == ~0ull) {
@@ -115,6 +125,76 @@ static void hwsp_free(struct i915_timeline *timeline)
 	spin_unlock(&gt->hwsp_lock);
 }
 
+static void __idle_cacheline_free(struct i915_timeline_cacheline *cl)
+{
+	GEM_BUG_ON(!i915_active_is_idle(&cl->active));
+
+	i915_gem_object_unpin_map(cl->hwsp->vma->obj);
+	i915_vma_put(cl->hwsp->vma);
+	__idle_hwsp_free(cl->hwsp, ptr_unmask_bits(cl->vaddr, CACHELINE_BITS));
+
+	i915_active_fini(&cl->active);
+	kfree(cl);
+}
+
+static void __cacheline_retire(struct i915_active *active)
+{
+	struct i915_timeline_cacheline *cl =
+		container_of(active, typeof(*cl), active);
+
+	i915_vma_unpin(cl->hwsp->vma);
+	if (ptr_test_bit(cl->vaddr, CACHELINE_FREE))
+		__idle_cacheline_free(cl);
+}
+
+static struct i915_timeline_cacheline *
+cacheline_alloc(struct i915_timeline_hwsp *hwsp, unsigned int cacheline)
+{
+	struct i915_timeline_cacheline *cl;
+	void *vaddr;
+
+	GEM_BUG_ON(cacheline >= BIT(CACHELINE_BITS));
+
+	cl = kmalloc(sizeof(*cl), GFP_KERNEL);
+	if (!cl)
+		return ERR_PTR(-ENOMEM);
+
+	vaddr = i915_gem_object_pin_map(hwsp->vma->obj, I915_MAP_WB);
+	if (IS_ERR(vaddr)) {
+		kfree(cl);
+		return ERR_CAST(vaddr);
+	}
+
+	i915_vma_get(hwsp->vma);
+	cl->hwsp = hwsp;
+	cl->vaddr = page_pack_bits(vaddr, cacheline);
+
+	i915_active_init(hwsp_to_i915(hwsp), &cl->active, __cacheline_retire);
+
+	return cl;
+}
+
+static void cacheline_acquire(struct i915_timeline_cacheline *cl)
+{
+	if (cl && i915_active_acquire(&cl->active))
+		__i915_vma_pin(cl->hwsp->vma);
+}
+
+static void cacheline_release(struct i915_timeline_cacheline *cl)
+{
+	if (cl)
+		i915_active_release(&cl->active);
+}
+
+static void cacheline_free(struct i915_timeline_cacheline *cl)
+{
+	GEM_BUG_ON(ptr_test_bit(cl->vaddr, CACHELINE_FREE));
+	cl->vaddr = ptr_set_bit(cl->vaddr, CACHELINE_FREE);
+
+	if (i915_active_is_idle(&cl->active))
+		__idle_cacheline_free(cl);
+}
+
 int i915_timeline_init(struct drm_i915_private *i915,
 		       struct i915_timeline *timeline,
 		       const char *name,
@@ -136,29 +216,40 @@ int i915_timeline_init(struct drm_i915_private *i915,
 	timeline->name = name;
 	timeline->pin_count = 0;
 	timeline->has_initial_breadcrumb = !hwsp;
+	timeline->hwsp_cacheline = NULL;
 
-	timeline->hwsp_offset = I915_GEM_HWS_SEQNO_ADDR;
 	if (!hwsp) {
+		struct i915_timeline_cacheline *cl;
 		unsigned int cacheline;
 
 		hwsp = hwsp_alloc(timeline, &cacheline);
 		if (IS_ERR(hwsp))
 			return PTR_ERR(hwsp);
 
+		cl = cacheline_alloc(hwsp->private, cacheline);
+		if (IS_ERR(cl)) {
+			__idle_hwsp_free(hwsp->private, cacheline);
+			return PTR_ERR(cl);
+		}
+
+		timeline->hwsp_cacheline = cl;
 		timeline->hwsp_offset = cacheline * CACHELINE_BYTES;
-	}
-	timeline->hwsp_ggtt = i915_vma_get(hwsp);
 
-	vaddr = i915_gem_object_pin_map(hwsp->obj, I915_MAP_WB);
-	if (IS_ERR(vaddr)) {
-		hwsp_free(timeline);
-		i915_vma_put(hwsp);
-		return PTR_ERR(vaddr);
+		vaddr = page_mask_bits(cl->vaddr);
+	} else {
+		timeline->hwsp_offset = I915_GEM_HWS_SEQNO_ADDR;
+
+		vaddr = i915_gem_object_pin_map(hwsp->obj, I915_MAP_WB);
+		if (IS_ERR(vaddr))
+			return PTR_ERR(vaddr);
 	}
 
 	timeline->hwsp_seqno =
 		memset(vaddr + timeline->hwsp_offset, 0, CACHELINE_BYTES);
 
+	timeline->hwsp_ggtt = i915_vma_get(hwsp);
+	GEM_BUG_ON(timeline->hwsp_offset >= hwsp->size);
+
 	timeline->fence_context = dma_fence_context_alloc(1);
 
 	spin_lock_init(&timeline->lock);
@@ -240,9 +331,12 @@ void i915_timeline_fini(struct i915_timeline *timeline)
 	GEM_BUG_ON(i915_active_request_isset(&timeline->barrier));
 
 	i915_syncmap_free(&timeline->sync);
-	hwsp_free(timeline);
 
-	i915_gem_object_unpin_map(timeline->hwsp_ggtt->obj);
+	if (timeline->hwsp_cacheline)
+		cacheline_free(timeline->hwsp_cacheline);
+	else
+		i915_gem_object_unpin_map(timeline->hwsp_ggtt->obj);
+
 	i915_vma_put(timeline->hwsp_ggtt);
 }
 
@@ -285,6 +379,7 @@ int i915_timeline_pin(struct i915_timeline *tl)
 		i915_ggtt_offset(tl->hwsp_ggtt) +
 		offset_in_page(tl->hwsp_offset);
 
+	cacheline_acquire(tl->hwsp_cacheline);
 	timeline_add_to_active(tl);
 
 	return 0;
@@ -294,6 +389,157 @@ int i915_timeline_pin(struct i915_timeline *tl)
 	return err;
 }
 
+static u32 timeline_advance(struct i915_timeline *tl)
+{
+	GEM_BUG_ON(!tl->pin_count);
+	GEM_BUG_ON(tl->seqno & tl->has_initial_breadcrumb);
+
+	return tl->seqno += 1 + tl->has_initial_breadcrumb;
+}
+
+static void timeline_rollback(struct i915_timeline *tl)
+{
+	tl->seqno -= 1 + tl->has_initial_breadcrumb;
+}
+
+static noinline int
+__i915_timeline_get_seqno(struct i915_timeline *tl,
+			  struct i915_request *rq,
+			  u32 *seqno)
+{
+	struct i915_timeline_cacheline *cl;
+	unsigned int cacheline;
+	struct i915_vma *vma;
+	void *vaddr;
+	int err;
+
+	/*
+	 * If there is an outstanding GPU reference to this cacheline,
+	 * such as it being sampled by a HW semaphore on another timeline,
+	 * we cannot wraparound our seqno value (the HW semaphore does
+	 * a strict greater-than-or-equals compare, not i915_seqno_passed).
+	 * So if the cacheline is still busy, we must detach ourselves
+	 * from it and leave it inflight alongside its users.
+	 *
+	 * However, if nobody is watching and we can guarantee that nobody
+	 * will, we could simply reuse the same cacheline.
+	 *
+	 * if (i915_active_request_is_signaled(&tl->last_request) &&
+	 *     i915_active_is_signaled(&tl->hwsp_cacheline->active))
+	 *	return 0;
+	 *
+	 * That seems unlikely for a busy timeline that needed to wrap in
+	 * the first place, so just replace the cacheline.
+	 */
+
+	vma = hwsp_alloc(tl, &cacheline);
+	if (IS_ERR(vma)) {
+		err = PTR_ERR(vma);
+		goto err_rollback;
+	}
+
+	err = i915_vma_pin(vma, 0, 0, PIN_GLOBAL | PIN_HIGH);
+	if (err) {
+		__idle_hwsp_free(vma->private, cacheline);
+		goto err_rollback;
+	}
+
+	cl = cacheline_alloc(vma->private, cacheline);
+	if (IS_ERR(cl)) {
+		err = PTR_ERR(cl);
+		__idle_hwsp_free(vma->private, cacheline);
+		goto err_unpin;
+	}
+	GEM_BUG_ON(cl->hwsp->vma != vma);
+
+	/*
+	 * Attach the old cacheline to the current request, so that we only
+	 * free it after the current request is retired, which ensures that
+	 * all writes into the cacheline from previous requests are complete.
+	 */
+	err = i915_active_ref(&tl->hwsp_cacheline->active,
+			      tl->fence_context, rq);
+	if (err)
+		goto err_cacheline;
+
+	cacheline_release(tl->hwsp_cacheline); /* ownership now xfered to rq */
+	cacheline_free(tl->hwsp_cacheline);
+
+	i915_vma_unpin(tl->hwsp_ggtt); /* binding kept alive by old cacheline */
+	i915_vma_put(tl->hwsp_ggtt);
+
+	tl->hwsp_ggtt = i915_vma_get(vma);
+
+	vaddr = page_mask_bits(cl->vaddr);
+	tl->hwsp_offset = cacheline * CACHELINE_BYTES;
+	tl->hwsp_seqno =
+		memset(vaddr + tl->hwsp_offset, 0, CACHELINE_BYTES);
+
+	tl->hwsp_offset += i915_ggtt_offset(vma);
+
+	cacheline_acquire(cl);
+	tl->hwsp_cacheline = cl;
+
+	*seqno = timeline_advance(tl);
+	GEM_BUG_ON(i915_seqno_passed(*tl->hwsp_seqno, *seqno));
+	return 0;
+
+err_cacheline:
+	cacheline_free(cl);
+err_unpin:
+	i915_vma_unpin(vma);
+err_rollback:
+	timeline_rollback(tl);
+	return err;
+}
+
+int i915_timeline_get_seqno(struct i915_timeline *tl,
+			    struct i915_request *rq,
+			    u32 *seqno)
+{
+	*seqno = timeline_advance(tl);
+
+	/* Replace the HWSP on wraparound for HW semaphores */
+	if (unlikely(!*seqno && tl->hwsp_cacheline))
+		return __i915_timeline_get_seqno(tl, rq, seqno);
+
+	return 0;
+}
+
+static int cacheline_ref(struct i915_timeline_cacheline *cl,
+			 struct i915_request *rq)
+{
+	return i915_active_ref(&cl->active, rq->fence.context, rq);
+}
+
+int i915_timeline_read_hwsp(struct i915_request *from,
+			    struct i915_request *to,
+			    u32 *hwsp)
+{
+	struct i915_timeline_cacheline *cl = from->hwsp_cacheline;
+	struct i915_timeline *tl = from->timeline;
+	int err;
+
+	GEM_BUG_ON(to->timeline == tl);
+
+	mutex_lock_nested(&tl->mutex, SINGLE_DEPTH_NESTING);
+	err = i915_request_completed(from);
+	if (!err)
+		err = cacheline_ref(cl, to);
+	if (!err) {
+		if (likely(cl == tl->hwsp_cacheline)) {
+			*hwsp = tl->hwsp_offset;
+		} else { /* across a seqno wrap, recover the original offset */
+			*hwsp = i915_ggtt_offset(cl->hwsp->vma) +
+				ptr_unmask_bits(cl->vaddr, CACHELINE_BITS) *
+				CACHELINE_BYTES;
+		}
+	}
+	mutex_unlock(&tl->mutex);
+
+	return err;
+}
+
 void i915_timeline_unpin(struct i915_timeline *tl)
 {
 	GEM_BUG_ON(!tl->pin_count);
@@ -301,6 +547,7 @@ void i915_timeline_unpin(struct i915_timeline *tl)
 		return;
 
 	timeline_remove_from_active(tl);
+	cacheline_release(tl->hwsp_cacheline);
 
 	/*
 	 * Since this timeline is idle, all barriers upon which we were waiting
diff --git a/drivers/gpu/drm/i915/i915_timeline.h b/drivers/gpu/drm/i915/i915_timeline.h
index 36c3849f7108..60b1dfad93ed 100644
--- a/drivers/gpu/drm/i915/i915_timeline.h
+++ b/drivers/gpu/drm/i915/i915_timeline.h
@@ -34,7 +34,7 @@
 #include "i915_utils.h"
 
 struct i915_vma;
-struct i915_timeline_hwsp;
+struct i915_timeline_cacheline;
 
 struct i915_timeline {
 	u64 fence_context;
@@ -51,6 +51,8 @@ struct i915_timeline {
 	struct i915_vma *hwsp_ggtt;
 	u32 hwsp_offset;
 
+	struct i915_timeline_cacheline *hwsp_cacheline;
+
 	bool has_initial_breadcrumb;
 
 	/**
@@ -162,8 +164,15 @@ static inline bool i915_timeline_sync_is_later(struct i915_timeline *tl,
 }
 
 int i915_timeline_pin(struct i915_timeline *tl);
+int i915_timeline_get_seqno(struct i915_timeline *tl,
+			    struct i915_request *rq,
+			    u32 *seqno);
 void i915_timeline_unpin(struct i915_timeline *tl);
 
+int i915_timeline_read_hwsp(struct i915_request *from,
+			    struct i915_request *until,
+			    u32 *hwsp_offset);
+
 void i915_timelines_init(struct drm_i915_private *i915);
 void i915_timelines_park(struct drm_i915_private *i915);
 void i915_timelines_fini(struct drm_i915_private *i915);
diff --git a/drivers/gpu/drm/i915/selftests/i915_timeline.c b/drivers/gpu/drm/i915/selftests/i915_timeline.c
index 12ea69b1a1e5..844701759ffc 100644
--- a/drivers/gpu/drm/i915/selftests/i915_timeline.c
+++ b/drivers/gpu/drm/i915/selftests/i915_timeline.c
@@ -641,6 +641,118 @@ static int live_hwsp_alternate(void *arg)
 #undef NUM_TIMELINES
 }
 
+static int live_hwsp_wrap(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct intel_engine_cs *engine;
+	struct i915_timeline *tl;
+	enum intel_engine_id id;
+	intel_wakeref_t wakeref;
+	int err = 0;
+
+	/*
+	 * Across a seqno wrap, we need to keep the old cacheline alive for
+	 * foreign GPU references.
+	 */
+
+	mutex_lock(&i915->drm.struct_mutex);
+	wakeref = intel_runtime_pm_get(i915);
+
+	tl = i915_timeline_create(i915, __func__, NULL);
+	if (IS_ERR(tl)) {
+		err = PTR_ERR(tl);
+		goto out_rpm;
+	}
+	if (!tl->has_initial_breadcrumb || !tl->hwsp_cacheline)
+		goto out_free;
+
+	err = i915_timeline_pin(tl);
+	if (err)
+		goto out_free;
+
+	for_each_engine(engine, i915, id) {
+		const u32 *hwsp_seqno[2];
+		struct i915_request *rq;
+		u32 seqno[2];
+
+		if (!intel_engine_can_store_dword(engine))
+			continue;
+
+		rq = i915_request_alloc(engine, i915->kernel_context);
+		if (IS_ERR(rq)) {
+			err = PTR_ERR(rq);
+			goto out;
+		}
+
+		tl->seqno = -4u;
+
+		err = i915_timeline_get_seqno(tl, rq, &seqno[0]);
+		if (err) {
+			i915_request_add(rq);
+			goto out;
+		}
+		pr_debug("seqno[0]:%08x, hwsp_offset:%08x\n",
+			 seqno[0], tl->hwsp_offset);
+
+		err = emit_ggtt_store_dw(rq, tl->hwsp_offset, seqno[0]);
+		if (err) {
+			i915_request_add(rq);
+			goto out;
+		}
+		hwsp_seqno[0] = tl->hwsp_seqno;
+
+		err = i915_timeline_get_seqno(tl, rq, &seqno[1]);
+		if (err) {
+			i915_request_add(rq);
+			goto out;
+		}
+		pr_debug("seqno[1]:%08x, hwsp_offset:%08x\n",
+			 seqno[1], tl->hwsp_offset);
+
+		err = emit_ggtt_store_dw(rq, tl->hwsp_offset, seqno[1]);
+		if (err) {
+			i915_request_add(rq);
+			goto out;
+		}
+		hwsp_seqno[1] = tl->hwsp_seqno;
+
+		/* With wrap should come a new hwsp */
+		GEM_BUG_ON(seqno[1] >= seqno[0]);
+		GEM_BUG_ON(hwsp_seqno[0] == hwsp_seqno[1]);
+
+		i915_request_add(rq);
+
+		if (i915_request_wait(rq, I915_WAIT_LOCKED, HZ / 5) < 0) {
+			pr_err("Wait for timeline writes timed out!\n");
+			err = -EIO;
+			goto out;
+		}
+
+		if (*hwsp_seqno[0] != seqno[0] || *hwsp_seqno[1] != seqno[1]) {
+			pr_err("Bad timeline values: found (%x, %x), expected (%x, %x)\n",
+			       *hwsp_seqno[0], *hwsp_seqno[1],
+			       seqno[0], seqno[1]);
+			err = -EINVAL;
+			goto out;
+		}
+
+		i915_retire_requests(i915); /* recycle HWSP */
+	}
+
+out:
+	if (igt_flush_test(i915, I915_WAIT_LOCKED))
+		err = -EIO;
+
+	i915_timeline_unpin(tl);
+out_free:
+	i915_timeline_put(tl);
+out_rpm:
+	intel_runtime_pm_put(i915, wakeref);
+	mutex_unlock(&i915->drm.struct_mutex);
+
+	return err;
+}
+
 static int live_hwsp_recycle(void *arg)
 {
 	struct drm_i915_private *i915 = arg;
@@ -723,6 +835,7 @@ int i915_timeline_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_hwsp_recycle),
 		SUBTEST(live_hwsp_engine),
 		SUBTEST(live_hwsp_alternate),
+		SUBTEST(live_hwsp_wrap),
 	};
 
 	return i915_subtests(tests, i915);
-- 
2.20.1

* [PATCH 04/38] drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+
From: Chris Wilson @ 2019-03-01 14:03 UTC (permalink / raw)
  To: intel-gfx

Having introduced per-context seqno, we now have a means to identify
progress across the system without fear of the rollback that befell the
global_seqno. That is, we can program a MI_SEMAPHORE_WAIT operation in
advance of submission, safe in the knowledge that our target seqno and
address are stable.
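
As a worked example of the hazard this avoids (illustrative numbers): a
waiter armed with MI_SEMAPHORE_SAD_GTE_SDD spins until *hwsp >= sdd. If
it was armed with sdd = 0xfffffffe and the signaling timeline wrapped
and reused the same cacheline, *hwsp would restart from 0x00000001 and
the unsigned greater-or-equal compare could never succeed. The previous
patch swaps in a fresh HWSP cacheline on wrap precisely so that old
waiters keep observing the pre-wrap values they were armed against.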

However, since we are telling the GPU to busy-spin on the target address
until it matches the signaling seqno, we only want to do so when we are
sure that busy-spin will be completed quickly. To achieve this we only
submit the request to HW once the signaler is itself executing (modulo
preemption causing us to wait longer), and we only do so for default and
above priority requests (so that idle priority tasks never themselves
hog the GPU waiting for others).

As might be reasonably expected, HW semaphores excel in inter-engine
synchronisation microbenchmarks (where the 3x reduced latency /
increased throughput more than offsets the power cost of spinning on a
second ring) and show a significant improvement (up to ~10%, though
most see no change) for single clients that utilise multiple engines
(typically media players and transcoders), without regressing multiple
clients that can saturate the system, or dramatically changing the
power envelope.
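
Userspace can discover this through the existing getparam interface; a
minimal sketch (assuming libdrm headers and a ready drm fd) might look
like:

	#include <stdbool.h>
	#include <xf86drm.h>
	#include <i915_drm.h>

	/* Query I915_PARAM_HAS_SEMAPHORES, which this patch now derives
	 * from the scheduler caps instead of hardcoding 0. */
	static bool has_hw_semaphores(int fd)
	{
		int value = 0;
		struct drm_i915_getparam gp = {
			.param = I915_PARAM_HAS_SEMAPHORES,
			.value = &value,
		};

		if (drmIoctl(fd, DRM_IOCTL_I915_GETPARAM, &gp))
			return false; /* ioctl failed or param unknown */

		return value != 0;
	}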

v3: Drop the older NEQ branch, now we pin the signaler's HWSP anyway.
v4: Tell the world and include it as part of scheduler caps.

Testcase: igt/gem_exec_whisper
Testcase: igt/benchmarks/gem_wsim
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c           |   2 +-
 drivers/gpu/drm/i915/i915_request.c       | 138 +++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_request.h       |   1 +
 drivers/gpu/drm/i915/i915_sw_fence.c      |   4 +-
 drivers/gpu/drm/i915/i915_sw_fence.h      |   3 +
 drivers/gpu/drm/i915/intel_engine_cs.c    |   1 +
 drivers/gpu/drm/i915/intel_gpu_commands.h |   9 +-
 drivers/gpu/drm/i915/intel_lrc.c          |   1 +
 drivers/gpu/drm/i915/intel_ringbuffer.h   |   7 ++
 include/uapi/drm/i915_drm.h               |   1 +
 10 files changed, 160 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index c6354f6cdbdb..c08abdef5eb6 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -351,7 +351,7 @@ static int i915_getparam_ioctl(struct drm_device *dev, void *data,
 		value = min_t(int, INTEL_PPGTT(dev_priv), I915_GEM_PPGTT_FULL);
 		break;
 	case I915_PARAM_HAS_SEMAPHORES:
-		value = 0;
+		value = !!(dev_priv->caps.scheduler & I915_SCHEDULER_CAP_SEMAPHORES);
 		break;
 	case I915_PARAM_HAS_SECURE_BATCHES:
 		value = capable(CAP_SYS_ADMIN);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index d354967d6ae8..59e30b8c4ee9 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -22,8 +22,9 @@
  *
  */
 
-#include <linux/prefetch.h>
 #include <linux/dma-fence-array.h>
+#include <linux/irq_work.h>
+#include <linux/prefetch.h>
 #include <linux/sched.h>
 #include <linux/sched/clock.h>
 #include <linux/sched/signal.h>
@@ -32,9 +33,16 @@
 #include "i915_active.h"
 #include "i915_reset.h"
 
+struct execute_cb {
+	struct list_head link;
+	struct irq_work work;
+	struct i915_sw_fence *fence;
+};
+
 static struct i915_global_request {
 	struct kmem_cache *slab_requests;
 	struct kmem_cache *slab_dependencies;
+	struct kmem_cache *slab_execute_cbs;
 } global;
 
 static const char *i915_fence_get_driver_name(struct dma_fence *fence)
@@ -325,6 +333,69 @@ void i915_request_retire_upto(struct i915_request *rq)
 	} while (tmp != rq);
 }
 
+static void irq_execute_cb(struct irq_work *wrk)
+{
+	struct execute_cb *cb = container_of(wrk, typeof(*cb), work);
+
+	i915_sw_fence_complete(cb->fence);
+	kmem_cache_free(global.slab_execute_cbs, cb);
+}
+
+static void __notify_execute_cb(struct i915_request *rq)
+{
+	struct execute_cb *cb;
+
+	lockdep_assert_held(&rq->lock);
+
+	if (list_empty(&rq->execute_cb))
+		return;
+
+	list_for_each_entry(cb, &rq->execute_cb, link)
+		irq_work_queue(&cb->work);
+
+	/*
+	 * XXX Rollback on __i915_request_unsubmit()
+	 *
+	 * In the future, perhaps when we have an active time-slicing scheduler,
+	 * it will be interesting to unsubmit parallel execution and remove
+	 * busywaits from the GPU until their master is restarted. This is
+	 * quite hairy, we have to carefully rollback the fence and do a
+	 * preempt-to-idle cycle on the target engine, all the while the
+	 * master execute_cb may refire.
+	 */
+	INIT_LIST_HEAD(&rq->execute_cb);
+}
+
+static int
+i915_request_await_execution(struct i915_request *rq,
+			     struct i915_request *signal,
+			     gfp_t gfp)
+{
+	struct execute_cb *cb;
+
+	if (i915_request_is_active(signal))
+		return 0;
+
+	cb = kmem_cache_alloc(global.slab_execute_cbs, gfp);
+	if (!cb)
+		return -ENOMEM;
+
+	cb->fence = &rq->submit;
+	i915_sw_fence_await(cb->fence);
+	init_irq_work(&cb->work, irq_execute_cb);
+
+	spin_lock_irq(&signal->lock);
+	if (i915_request_is_active(signal)) {
+		i915_sw_fence_complete(cb->fence);
+		kmem_cache_free(global.slab_execute_cbs, cb);
+	} else {
+		list_add_tail(&cb->link, &signal->execute_cb);
+	}
+	spin_unlock_irq(&signal->lock);
+
+	return 0;
+}
+
 static void move_to_timeline(struct i915_request *request,
 			     struct i915_timeline *timeline)
 {
@@ -361,6 +432,8 @@ void __i915_request_submit(struct i915_request *request)
 	    !i915_request_enable_breadcrumb(request))
 		intel_engine_queue_breadcrumbs(engine);
 
+	__notify_execute_cb(request);
+
 	spin_unlock(&request->lock);
 
 	engine->emit_fini_breadcrumb(request,
@@ -608,6 +681,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	}
 
 	INIT_LIST_HEAD(&rq->active_list);
+	INIT_LIST_HEAD(&rq->execute_cb);
 
 	tl = ce->ring->timeline;
 	ret = i915_timeline_get_seqno(tl, rq, &seqno);
@@ -696,6 +770,52 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	return ERR_PTR(ret);
 }
 
+static int
+emit_semaphore_wait(struct i915_request *to,
+		    struct i915_request *from,
+		    gfp_t gfp)
+{
+	u32 hwsp_offset;
+	u32 *cs;
+	int err;
+
+	GEM_BUG_ON(!from->timeline->has_initial_breadcrumb);
+	GEM_BUG_ON(INTEL_GEN(to->i915) < 8);
+
+	/* We need to pin the signaler's HWSP until we are finished reading. */
+	err = i915_timeline_read_hwsp(from, to, &hwsp_offset);
+	if (err)
+		return err;
+
+	/* Only submit our spinner after the signaler is running! */
+	err = i915_request_await_execution(to, from, gfp);
+	if (err)
+		return err;
+
+	cs = intel_ring_begin(to, 4);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	/*
+	 * Using greater-than-or-equal here means we have to worry
+	 * about seqno wraparound. To side step that issue, we swap
+	 * the timeline HWSP upon wrapping, so that everyone listening
+	 * for the old (pre-wrap) values do not see the much smaller
+	 * (post-wrap) values than they were expecting (and so wait
+	 * forever).
+	 */
+	*cs++ = MI_SEMAPHORE_WAIT |
+		MI_SEMAPHORE_GLOBAL_GTT |
+		MI_SEMAPHORE_POLL |
+		MI_SEMAPHORE_SAD_GTE_SDD;
+	*cs++ = from->fence.seqno;
+	*cs++ = hwsp_offset;
+	*cs++ = 0;
+
+	intel_ring_advance(to, cs);
+	return 0;
+}
+
 static int
 i915_request_await_request(struct i915_request *to, struct i915_request *from)
 {
@@ -717,6 +837,9 @@ i915_request_await_request(struct i915_request *to, struct i915_request *from)
 		ret = i915_sw_fence_await_sw_fence_gfp(&to->submit,
 						       &from->submit,
 						       I915_FENCE_GFP);
+	} else if (intel_engine_has_semaphores(to->engine) &&
+		   to->gem_context->sched.priority >= I915_PRIORITY_NORMAL) {
+		ret = emit_semaphore_wait(to, from, I915_FENCE_GFP);
 	} else {
 		ret = i915_sw_fence_await_dma_fence(&to->submit,
 						    &from->fence, 0,
@@ -1208,14 +1331,23 @@ int __init i915_global_request_init(void)
 	if (!global.slab_requests)
 		return -ENOMEM;
 
+	global.slab_execute_cbs = KMEM_CACHE(execute_cb,
+					     SLAB_HWCACHE_ALIGN |
+					     SLAB_RECLAIM_ACCOUNT |
+					     SLAB_TYPESAFE_BY_RCU);
+	if (!global.slab_execute_cbs)
+		goto err_requests;
+
 	global.slab_dependencies = KMEM_CACHE(i915_dependency,
 					      SLAB_HWCACHE_ALIGN |
 					      SLAB_RECLAIM_ACCOUNT);
 	if (!global.slab_dependencies)
-		goto err_requests;
+		goto err_execute_cbs;
 
 	return 0;
 
+err_execute_cbs:
+	kmem_cache_destroy(global.slab_execute_cbs);
 err_requests:
 	kmem_cache_destroy(global.slab_requests);
 	return -ENOMEM;
@@ -1224,11 +1356,13 @@ int __init i915_global_request_init(void)
 void i915_global_request_shrink(void)
 {
 	kmem_cache_shrink(global.slab_dependencies);
+	kmem_cache_shrink(global.slab_execute_cbs);
 	kmem_cache_shrink(global.slab_requests);
 }
 
 void i915_global_request_exit(void)
 {
 	kmem_cache_destroy(global.slab_dependencies);
+	kmem_cache_destroy(global.slab_execute_cbs);
 	kmem_cache_destroy(global.slab_requests);
 }
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index ea1e6f0ade53..3e0d62d35226 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -129,6 +129,7 @@ struct i915_request {
 	 */
 	struct i915_sw_fence submit;
 	wait_queue_entry_t submitq;
+	struct list_head execute_cb;
 
 	/*
 	 * A list of everyone we wait upon, and everyone who waits upon us.
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c
index 7c58b049ecb5..8d1400d378d7 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence.c
@@ -192,7 +192,7 @@ static void __i915_sw_fence_complete(struct i915_sw_fence *fence,
 	__i915_sw_fence_notify(fence, FENCE_FREE);
 }
 
-static void i915_sw_fence_complete(struct i915_sw_fence *fence)
+void i915_sw_fence_complete(struct i915_sw_fence *fence)
 {
 	debug_fence_assert(fence);
 
@@ -202,7 +202,7 @@ static void i915_sw_fence_complete(struct i915_sw_fence *fence)
 	__i915_sw_fence_complete(fence, NULL);
 }
 
-static void i915_sw_fence_await(struct i915_sw_fence *fence)
+void i915_sw_fence_await(struct i915_sw_fence *fence)
 {
 	debug_fence_assert(fence);
 	WARN_ON(atomic_inc_return(&fence->pending) <= 1);
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.h b/drivers/gpu/drm/i915/i915_sw_fence.h
index 0e055ea0179f..6dec9e1d1102 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.h
+++ b/drivers/gpu/drm/i915/i915_sw_fence.h
@@ -79,6 +79,9 @@ int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
 				    unsigned long timeout,
 				    gfp_t gfp);
 
+void i915_sw_fence_await(struct i915_sw_fence *fence);
+void i915_sw_fence_complete(struct i915_sw_fence *fence);
+
 static inline bool i915_sw_fence_signaled(const struct i915_sw_fence *fence)
 {
 	return atomic_read(&fence->pending) <= 0;
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index df8f88142f1d..4fc7e2ac6278 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -616,6 +616,7 @@ void intel_engines_set_scheduler_caps(struct drm_i915_private *i915)
 	} map[] = {
 #define MAP(x, y) { ilog2(I915_ENGINE_HAS_##x), ilog2(I915_SCHEDULER_CAP_##y) }
 		MAP(PREEMPTION, PREEMPTION),
+		MAP(SEMAPHORES, SEMAPHORES),
 #undef MAP
 	};
 	struct intel_engine_cs *engine;
diff --git a/drivers/gpu/drm/i915/intel_gpu_commands.h b/drivers/gpu/drm/i915/intel_gpu_commands.h
index b96a31bc1080..a34ece53a771 100644
--- a/drivers/gpu/drm/i915/intel_gpu_commands.h
+++ b/drivers/gpu/drm/i915/intel_gpu_commands.h
@@ -105,8 +105,13 @@
 #define MI_SEMAPHORE_SIGNAL	MI_INSTR(0x1b, 0) /* GEN8+ */
 #define   MI_SEMAPHORE_TARGET(engine)	((engine)<<15)
 #define MI_SEMAPHORE_WAIT	MI_INSTR(0x1c, 2) /* GEN8+ */
-#define   MI_SEMAPHORE_POLL		(1<<15)
-#define   MI_SEMAPHORE_SAD_GTE_SDD	(1<<12)
+#define   MI_SEMAPHORE_POLL		(1 << 15)
+#define   MI_SEMAPHORE_SAD_GT_SDD	(0 << 12)
+#define   MI_SEMAPHORE_SAD_GTE_SDD	(1 << 12)
+#define   MI_SEMAPHORE_SAD_LT_SDD	(2 << 12)
+#define   MI_SEMAPHORE_SAD_LTE_SDD	(3 << 12)
+#define   MI_SEMAPHORE_SAD_EQ_SDD	(4 << 12)
+#define   MI_SEMAPHORE_SAD_NEQ_SDD	(5 << 12)
 #define MI_STORE_DWORD_IMM	MI_INSTR(0x20, 1)
 #define MI_STORE_DWORD_IMM_GEN4	MI_INSTR(0x20, 2)
 #define   MI_MEM_VIRTUAL	(1 << 22) /* 945,g33,965 */
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index f57cfe2fc078..53d6f7fdb50e 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2324,6 +2324,7 @@ void intel_execlists_set_default_submission(struct intel_engine_cs *engine)
 	engine->park = NULL;
 	engine->unpark = NULL;
 
+	engine->flags |= I915_ENGINE_HAS_SEMAPHORES;
 	engine->flags |= I915_ENGINE_SUPPORTS_STATS;
 	if (engine->i915->preempt_context)
 		engine->flags |= I915_ENGINE_HAS_PREEMPTION;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index b8ec7e40a59b..2e7264119ec4 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -499,6 +499,7 @@ struct intel_engine_cs {
 #define I915_ENGINE_NEEDS_CMD_PARSER BIT(0)
 #define I915_ENGINE_SUPPORTS_STATS   BIT(1)
 #define I915_ENGINE_HAS_PREEMPTION   BIT(2)
+#define I915_ENGINE_HAS_SEMAPHORES   BIT(3)
 	unsigned int flags;
 
 	/*
@@ -576,6 +577,12 @@ intel_engine_has_preemption(const struct intel_engine_cs *engine)
 	return engine->flags & I915_ENGINE_HAS_PREEMPTION;
 }
 
+static inline bool
+intel_engine_has_semaphores(const struct intel_engine_cs *engine)
+{
+	return engine->flags & I915_ENGINE_HAS_SEMAPHORES;
+}
+
 void intel_engines_set_scheduler_caps(struct drm_i915_private *i915);
 
 static inline bool __execlists_need_preempt(int prio, int last)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 8304a7f1ec3f..b10eea3f6d24 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -479,6 +479,7 @@ typedef struct drm_i915_irq_wait {
 #define   I915_SCHEDULER_CAP_ENABLED	(1ul << 0)
 #define   I915_SCHEDULER_CAP_PRIORITY	(1ul << 1)
 #define   I915_SCHEDULER_CAP_PREEMPTION	(1ul << 2)
+#define   I915_SCHEDULER_CAP_SEMAPHORES	(1ul << 3)
 
 #define I915_PARAM_HUC_STATUS		 42
 
-- 
2.20.1

* [PATCH 05/38] drm/i915: Prioritise non-busywait semaphore workloads
From: Chris Wilson @ 2019-03-01 14:03 UTC (permalink / raw)
  To: intel-gfx

We don't want to busywait on the GPU if we have other work to do. If we
give non-busywaiting workloads higher (initial) priority than workloads
that require a busywait, we will prioritise work that is ready to run
immediately. We then also have to be careful not to give earlier
semaphores an accidental boost just because later work doesn't wait on
other rings, hence we keep a history of semaphore usage along the
dependency chain.
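
Illustratively, the internal priority bits now compose as (values from
this patch; three bits below the user priority shift):

	prio = (user_prio << I915_USER_PRIORITY_SHIFT)
	     | I915_PRIORITY_NOSEMAPHORE	/* no busywait in the chain */
	     | I915_PRIORITY_NEWCLIENT		/* first request of a flow */
	     | I915_PRIORITY_WAIT		/* explicit wait boost */

so two requests at the same user priority sort such that the one able
to run immediately (no semaphore spin) is dequeued first.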

v2: Stop rolling the bits into a chain and just use a flag in case this
request or any of our dependencies use a semaphore. The rolling around
was contagious as Tvrtko was heard to fall off his chair.

Testcase: igt/gem_exec_schedule/semaphore
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_request.c   | 16 ++++++++++++++++
 drivers/gpu/drm/i915/i915_scheduler.c |  6 ++++++
 drivers/gpu/drm/i915/i915_scheduler.h |  9 ++++++---
 drivers/gpu/drm/i915/intel_lrc.c      |  2 +-
 4 files changed, 29 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 59e30b8c4ee9..bcf3c1a155e2 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -813,6 +813,7 @@ emit_semaphore_wait(struct i915_request *to,
 	*cs++ = 0;
 
 	intel_ring_advance(to, cs);
+	to->sched.flags |= I915_SCHED_HAS_SEMAPHORE;
 	return 0;
 }
 
@@ -1083,6 +1084,21 @@ void i915_request_add(struct i915_request *request)
 	if (engine->schedule) {
 		struct i915_sched_attr attr = request->gem_context->sched;
 
+		/*
+		 * Boost actual workloads past semaphores!
+		 *
+		 * With semaphores we spin on one engine waiting for another,
+		 * simply to reduce the latency of starting our work when
+		 * the signaler completes. However, if there is any other
+		 * work that we could be doing on this engine instead, that
+		 * is better utilisation and will reduce the overall duration
+		 * of the current work. To avoid PI boosting a semaphore
+		 * far in the distance past over useful work, we keep a history
+		 * of any semaphore use along our dependency chain.
+		 */
+		if (!(request->sched.flags & I915_SCHED_HAS_SEMAPHORE))
+			attr.priority |= I915_PRIORITY_NOSEMAPHORE;
+
 		/*
 		 * Boost priorities to new clients (new request flows).
 		 *
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 50018ad30233..8a64748a7912 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -39,6 +39,7 @@ void i915_sched_node_init(struct i915_sched_node *node)
 	INIT_LIST_HEAD(&node->waiters_list);
 	INIT_LIST_HEAD(&node->link);
 	node->attr.priority = I915_PRIORITY_INVALID;
+	node->flags = 0;
 }
 
 static struct i915_dependency *
@@ -69,6 +70,11 @@ bool __i915_sched_node_add_dependency(struct i915_sched_node *node,
 		dep->signaler = signal;
 		dep->flags = flags;
 
+		/* Keep track of whether anyone on this chain has a semaphore */
+		if (signal->flags & I915_SCHED_HAS_SEMAPHORE &&
+		    !node_started(signal))
+			node->flags |= I915_SCHED_HAS_SEMAPHORE;
+
 		ret = true;
 	}
 
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 7d4a49750d92..6ce450cf63fa 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -24,14 +24,15 @@ enum {
 	I915_PRIORITY_INVALID = INT_MIN
 };
 
-#define I915_USER_PRIORITY_SHIFT 2
+#define I915_USER_PRIORITY_SHIFT 3
 #define I915_USER_PRIORITY(x) ((x) << I915_USER_PRIORITY_SHIFT)
 
 #define I915_PRIORITY_COUNT BIT(I915_USER_PRIORITY_SHIFT)
 #define I915_PRIORITY_MASK (I915_PRIORITY_COUNT - 1)
 
-#define I915_PRIORITY_WAIT	((u8)BIT(0))
-#define I915_PRIORITY_NEWCLIENT	((u8)BIT(1))
+#define I915_PRIORITY_WAIT		((u8)BIT(0))
+#define I915_PRIORITY_NEWCLIENT		((u8)BIT(1))
+#define I915_PRIORITY_NOSEMAPHORE	((u8)BIT(2))
 
 #define __NO_PREEMPTION (I915_PRIORITY_WAIT)
 
@@ -74,6 +75,8 @@ struct i915_sched_node {
 	struct list_head waiters_list; /* those after us, they depend upon us */
 	struct list_head link;
 	struct i915_sched_attr attr;
+	unsigned int flags;
+#define I915_SCHED_HAS_SEMAPHORE	BIT(0)
 };
 
 struct i915_dependency {
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 53d6f7fdb50e..2268860cca44 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -164,7 +164,7 @@
 #define WA_TAIL_DWORDS 2
 #define WA_TAIL_BYTES (sizeof(u32) * WA_TAIL_DWORDS)
 
-#define ACTIVE_PRIORITY (I915_PRIORITY_NEWCLIENT)
+#define ACTIVE_PRIORITY (I915_PRIORITY_NEWCLIENT | I915_PRIORITY_NOSEMAPHORE)
 
 static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
 					    struct intel_engine_cs *engine,
-- 
2.20.1

* [PATCH 06/38] drm/i915/selftests: Check that whitelisted registers are accessible
From: Chris Wilson @ 2019-03-01 14:03 UTC (permalink / raw)
  To: intel-gfx

There is no point in whitelisting a register that the user then cannot
write to, so check that the register actually exists (and is writable)
before merging such patches.
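
For background on the write-mask handling in reg_write() below (a
hedged note, but this is the usual convention for "masked" gen
registers): the upper 16 bits of a write act as a per-bit write-enable
for the lower 16 bits, so with rsvd == 0x0000ffff:

	write 0x00010001 -> bit 0 set
	write 0x00010000 -> bit 0 cleared
	write 0x00000001 -> bit 0 unchanged (enable bit not set)

For an ordinary register, rsvd instead records which bits stuck when
written with all ones.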

v2: Mark SLICE_COMMON_ECO_CHICKEN1 [731c] as write-only

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Dale B Stimson <dale.b.stimson@intel.com>
Cc: Michał Winiarski <michal.winiarski@intel.com>
---
 .../drm/i915/selftests/intel_workarounds.c    | 376 ++++++++++++++++++
 1 file changed, 376 insertions(+)

diff --git a/drivers/gpu/drm/i915/selftests/intel_workarounds.c b/drivers/gpu/drm/i915/selftests/intel_workarounds.c
index e6ffc8ac22dc..33b3ced83fde 100644
--- a/drivers/gpu/drm/i915/selftests/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/selftests/intel_workarounds.c
@@ -12,6 +12,14 @@
 #include "igt_spinner.h"
 #include "igt_wedge_me.h"
 #include "mock_context.h"
+#include "mock_drm.h"
+
+static const struct wo_register {
+	enum intel_platform platform;
+	u32 reg;
+} wo_registers[] = {
+	{ INTEL_GEMINILAKE, 0x731c }
+};
 
 #define REF_NAME_MAX (INTEL_ENGINE_CS_MAX_NAME + 4)
 struct wa_lists {
@@ -331,6 +339,373 @@ static int check_whitelist_across_reset(struct intel_engine_cs *engine,
 	return err;
 }
 
+static struct i915_vma *create_scratch(struct i915_gem_context *ctx)
+{
+	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
+	void *ptr;
+	int err;
+
+	obj = i915_gem_object_create_internal(ctx->i915, PAGE_SIZE);
+	if (IS_ERR(obj))
+		return ERR_CAST(obj);
+
+	i915_gem_object_set_cache_level(obj, I915_CACHE_LLC);
+
+	ptr = i915_gem_object_pin_map(obj, I915_MAP_WB);
+	if (IS_ERR(ptr)) {
+		err = PTR_ERR(ptr);
+		goto err_obj;
+	}
+	memset(ptr, 0xc5, PAGE_SIZE);
+	i915_gem_object_unpin_map(obj);
+
+	vma = i915_vma_instance(obj, &ctx->ppgtt->vm, NULL);
+	if (IS_ERR(vma)) {
+		err = PTR_ERR(vma);
+		goto err_obj;
+	}
+
+	err = i915_vma_pin(vma, 0, 0, PIN_USER);
+	if (err)
+		goto err_obj;
+
+	err = i915_gem_object_set_to_cpu_domain(obj, false);
+	if (err)
+		goto err_obj;
+
+	return vma;
+
+err_obj:
+	i915_gem_object_put(obj);
+	return ERR_PTR(err);
+}
+
+static struct i915_vma *create_batch(struct i915_gem_context *ctx)
+{
+	struct drm_i915_gem_object *obj;
+	struct i915_vma *vma;
+	int err;
+
+	obj = i915_gem_object_create_internal(ctx->i915, 16 * PAGE_SIZE);
+	if (IS_ERR(obj))
+		return ERR_CAST(obj);
+
+	vma = i915_vma_instance(obj, &ctx->ppgtt->vm, NULL);
+	if (IS_ERR(vma)) {
+		err = PTR_ERR(vma);
+		goto err_obj;
+	}
+
+	err = i915_vma_pin(vma, 0, 0, PIN_USER);
+	if (err)
+		goto err_obj;
+
+	err = i915_gem_object_set_to_wc_domain(obj, true);
+	if (err)
+		goto err_obj;
+
+	return vma;
+
+err_obj:
+	i915_gem_object_put(obj);
+	return ERR_PTR(err);
+}
+
+static u32 reg_write(u32 old, u32 new, u32 rsvd)
+{
+	if (rsvd == 0x0000ffff) {
+		old &= ~(new >> 16);
+		old |= new & (new >> 16);
+	} else {
+		old &= ~rsvd;
+		old |= new & rsvd;
+	}
+
+	return old;
+}
+
+static bool wo_register(struct intel_engine_cs *engine, u32 reg)
+{
+	enum intel_platform platform = INTEL_INFO(engine->i915)->platform;
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(wo_registers); i++) {
+		if (wo_registers[i].platform == platform &&
+		    wo_registers[i].reg == reg)
+			return true;
+	}
+
+	return false;
+}
+
+static int check_dirty_whitelist(struct i915_gem_context *ctx,
+				 struct intel_engine_cs *engine)
+{
+	const u32 values[] = {
+		0x00000000,
+		0x01010101,
+		0x10100101,
+		0x03030303,
+		0x30300303,
+		0x05050505,
+		0x50500505,
+		0x0f0f0f0f,
+		0xf00ff00f,
+		0x10101010,
+		0xf0f01010,
+		0x30303030,
+		0xa0a03030,
+		0x50505050,
+		0xc0c05050,
+		0xf0f0f0f0,
+		0x11111111,
+		0x33333333,
+		0x55555555,
+		0x0000ffff,
+		0x00ff00ff,
+		0xff0000ff,
+		0xffff00ff,
+		0xffffffff,
+	};
+	struct i915_vma *scratch;
+	struct i915_vma *batch;
+	int err = 0, i, v;
+	u32 *cs;
+
+	scratch = create_scratch(ctx);
+	if (IS_ERR(scratch))
+		return PTR_ERR(scratch);
+
+	batch = create_batch(ctx);
+	if (IS_ERR(batch)) {
+		err = PTR_ERR(batch);
+		goto out_scratch;
+	}
+
+	for (i = 0; i < engine->whitelist.count; i++) {
+		u32 reg = i915_mmio_reg_offset(engine->whitelist.list[i].reg);
+		u64 addr = scratch->node.start;
+		struct i915_request *rq;
+		u32 srm, lrm, rsvd;
+		u32 expect;
+		int idx;
+
+		if (wo_register(engine, reg))
+			continue;
+
+		srm = MI_STORE_REGISTER_MEM;
+		lrm = MI_LOAD_REGISTER_MEM;
+		if (INTEL_GEN(ctx->i915) >= 8)
+			lrm++, srm++;
+
+		pr_debug("%s: Writing garbage to %x {srm=0x%08x, lrm=0x%08x}\n",
+			 engine->name, reg, srm, lrm);
+
+		cs = i915_gem_object_pin_map(batch->obj, I915_MAP_WC);
+		if (IS_ERR(cs)) {
+			err = PTR_ERR(cs);
+			goto out_batch;
+		}
+
+		/* SRM original */
+		*cs++ = srm;
+		*cs++ = reg;
+		*cs++ = lower_32_bits(addr);
+		*cs++ = upper_32_bits(addr);
+
+		idx = 1;
+		for (v = 0; v < ARRAY_SIZE(values); v++) {
+			/* LRI garbage */
+			*cs++ = MI_LOAD_REGISTER_IMM(1);
+			*cs++ = reg;
+			*cs++ = values[v];
+
+			/* SRM result */
+			*cs++ = srm;
+			*cs++ = reg;
+			*cs++ = lower_32_bits(addr + sizeof(u32) * idx);
+			*cs++ = upper_32_bits(addr + sizeof(u32) * idx);
+			idx++;
+		}
+		for (v = 0; v < ARRAY_SIZE(values); v++) {
+			/* LRI garbage */
+			*cs++ = MI_LOAD_REGISTER_IMM(1);
+			*cs++ = reg;
+			*cs++ = ~values[v];
+
+			/* SRM result */
+			*cs++ = srm;
+			*cs++ = reg;
+			*cs++ = lower_32_bits(addr + sizeof(u32) * idx);
+			*cs++ = upper_32_bits(addr + sizeof(u32) * idx);
+			idx++;
+		}
+		GEM_BUG_ON(idx * sizeof(u32) > scratch->size);
+
+		/* LRM original -- don't leave garbage in the context! */
+		*cs++ = lrm;
+		*cs++ = reg;
+		*cs++ = lower_32_bits(addr);
+		*cs++ = upper_32_bits(addr);
+
+		*cs++ = MI_BATCH_BUFFER_END;
+
+		i915_gem_object_unpin_map(batch->obj);
+		i915_gem_chipset_flush(ctx->i915);
+
+		rq = i915_request_alloc(engine, ctx);
+		if (IS_ERR(rq)) {
+			err = PTR_ERR(rq);
+			goto out_batch;
+		}
+
+		if (engine->emit_init_breadcrumb) { /* Be nice if we hang */
+			err = engine->emit_init_breadcrumb(rq);
+			if (err)
+				goto err_request;
+		}
+
+		err = engine->emit_bb_start(rq,
+					    batch->node.start, PAGE_SIZE,
+					    0);
+		if (err)
+			goto err_request;
+
+err_request:
+		i915_request_add(rq);
+		if (err)
+			goto out_batch;
+
+		if (i915_request_wait(rq, I915_WAIT_LOCKED, HZ / 5) < 0) {
+			pr_err("%s: Futzing %x timed out; cancelling test\n",
+			       engine->name, reg);
+			i915_gem_set_wedged(ctx->i915);
+			err = -EIO;
+			goto out_batch;
+		}
+
+		cs = i915_gem_object_pin_map(scratch->obj, I915_MAP_WB);
+		if (IS_ERR(cs)) {
+			err = PTR_ERR(cs);
+			goto out_batch;
+		}
+
+		GEM_BUG_ON(values[ARRAY_SIZE(values) - 1] != 0xffffffff);
+		rsvd = cs[ARRAY_SIZE(values)]; /* detect write masking */
+		if (!rsvd) {
+			pr_err("%s: Unable to write to whitelisted register %x\n",
+			       engine->name, reg);
+			err = -EINVAL;
+			goto out_unpin;
+		}
+
+		expect = cs[0];
+		idx = 1;
+		for (v = 0; v < ARRAY_SIZE(values); v++) {
+			expect = reg_write(expect, values[v], rsvd);
+			if (cs[idx] != expect)
+				err++;
+			idx++;
+		}
+		for (v = 0; v < ARRAY_SIZE(values); v++) {
+			expect = reg_write(expect, ~values[v], rsvd);
+			if (cs[idx] != expect)
+				err++;
+			idx++;
+		}
+		if (err) {
+			pr_err("%s: %d mismatch between values written to whitelisted register [%x], and values read back!\n",
+			       engine->name, err, reg);
+
+			pr_info("%s: Whitelisted register: %x, original value %08x, rsvd %08x\n",
+				engine->name, reg, cs[0], rsvd);
+
+			expect = cs[0];
+			idx = 1;
+			for (v = 0; v < ARRAY_SIZE(values); v++) {
+				u32 w = values[v];
+
+				expect = reg_write(expect, w, rsvd);
+				pr_info("Wrote %08x, read %08x, expect %08x\n",
+					w, cs[idx], expect);
+				idx++;
+			}
+			for (v = 0; v < ARRAY_SIZE(values); v++) {
+				u32 w = ~values[v];
+
+				expect = reg_write(expect, w, rsvd);
+				pr_info("Wrote %08x, read %08x, expect %08x\n",
+					w, cs[idx], expect);
+				idx++;
+			}
+
+			err = -EINVAL;
+		}
+out_unpin:
+		i915_gem_object_unpin_map(scratch->obj);
+		if (err)
+			break;
+	}
+
+	if (igt_flush_test(ctx->i915, I915_WAIT_LOCKED))
+		err = -EIO;
+out_batch:
+	i915_vma_unpin_and_release(&batch, 0);
+out_scratch:
+	i915_vma_unpin_and_release(&scratch, 0);
+	return err;
+}
+
+static int live_dirty_whitelist(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct intel_engine_cs *engine;
+	struct i915_gem_context *ctx;
+	enum intel_engine_id id;
+	intel_wakeref_t wakeref;
+	struct drm_file *file;
+	int err = 0;
+
+	/* Can the user write to the whitelisted registers? */
+
+	if (INTEL_GEN(i915) < 7) /* minimum requirement for LRI, SRM, LRM */
+		return 0;
+
+	wakeref = intel_runtime_pm_get(i915);
+
+	mutex_unlock(&i915->drm.struct_mutex);
+	file = mock_file(i915);
+	mutex_lock(&i915->drm.struct_mutex);
+	if (IS_ERR(file)) {
+		err = PTR_ERR(file);
+		goto out_rpm;
+	}
+
+	ctx = live_context(i915, file);
+	if (IS_ERR(ctx)) {
+		err = PTR_ERR(ctx);
+		goto out_file;
+	}
+
+	for_each_engine(engine, i915, id) {
+		if (engine->whitelist.count == 0)
+			continue;
+
+		err = check_dirty_whitelist(ctx, engine);
+		if (err)
+			goto out_file;
+	}
+
+out_file:
+	mutex_unlock(&i915->drm.struct_mutex);
+	mock_file_free(i915, file);
+	mutex_lock(&i915->drm.struct_mutex);
+out_rpm:
+	intel_runtime_pm_put(i915, wakeref);
+	return err;
+}
+
 static int live_reset_whitelist(void *arg)
 {
 	struct drm_i915_private *i915 = arg;
@@ -504,6 +879,7 @@ live_engine_reset_gt_engine_workarounds(void *arg)
 int intel_workarounds_live_selftests(struct drm_i915_private *i915)
 {
 	static const struct i915_subtest tests[] = {
+		SUBTEST(live_dirty_whitelist),
 		SUBTEST(live_reset_whitelist),
 		SUBTEST(live_gpu_reset_gt_engine_workarounds),
 		SUBTEST(live_engine_reset_gt_engine_workarounds),
-- 
2.20.1
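
A note on the expectation model used above: check_dirty_whitelist()
first probes each whitelisted register by writing 0xffffffff and reading
back the mask of bits that actually stuck (rsvd). Registers that report
0x0000ffff treat the upper 16 bits of every write as a per-bit write
enable for the lower 16; everything else simply drops writes to
read-only bits. reg_write() encodes both behaviours so the test can
predict each readback. A minimal standalone sketch of the same model,
in userspace C with made-up register values (not driver code):

#include <stdio.h>

/* Expected readback after writing 'new' over 'old', given the writable
 * bit mask 'rsvd' probed by writing 0xffffffff to the register.
 */
static unsigned int reg_write(unsigned int old, unsigned int new,
			      unsigned int rsvd)
{
	if (rsvd == 0x0000ffff) {
		/* Masked register: the high word enables low-word bits. */
		old &= ~(new >> 16);
		old |= new & (new >> 16);
	} else {
		/* Plain register: only the writable bits take effect. */
		old &= ~rsvd;
		old |= new & rsvd;
	}

	return old;
}

int main(void)
{
	/* Masked: data bit0 set and its enable bit set -> bit latches. */
	printf("%08x\n", reg_write(0x0, 0x00010001, 0x0000ffff)); /* 00000001 */
	/* Masked: same data bit without its enable bit -> no change. */
	printf("%08x\n", reg_write(0x0, 0x00000001, 0x0000ffff)); /* 00000000 */
	/* Plain: top byte read-only, so it keeps its old contents. */
	printf("%08x\n", reg_write(0xff000000, 0x00ffffff, 0x00ffffff)); /* ffffffff */

	return 0;
}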


* [PATCH 07/38] drm/i915: Force GPU idle on suspend
  2019-03-01 14:03 [PATCH 01/38] drm/i915/execlists: Suppress redundant preemption Chris Wilson
                   ` (4 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 06/38] drm/i915/selftests: Check that whitelisted registers are accessible Chris Wilson
@ 2019-03-01 14:03 ` Chris Wilson
  2019-03-01 14:03 ` [PATCH 08/38] drm/i915/selftests: Improve switch-to-kernel-context checking Chris Wilson
                   ` (34 subsequent siblings)
  40 siblings, 0 replies; 88+ messages in thread
From: Chris Wilson @ 2019-03-01 14:03 UTC (permalink / raw)
  To: intel-gfx

To facilitate the next patch, which allows preemptible kernels not to
incur the wrath of hangcheck, we need to ensure that we can still
suspend and shut down. That is, we will no longer be able to rely on
hangcheck to terminate a blocking kernel and instead must cancel it
ourselves. The advantage is that we can apply more pressure!

As we now perform a GPU reset to clean up any residual kernels, we leave
the GPU in an unknown state, and in particular cannot talk to the GuC
before we reinitialise it following resume. For example, we no longer
need to tell the GuC to suspend itself, as it has already been reset.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 20 +++++---------------
 1 file changed, 5 insertions(+), 15 deletions(-)
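
In outline: the unbounded MAX_SCHEDULE_TIMEOUT wait becomes a short
HZ / 5 flush, and unless a signal interrupts that wait we wedge the
device regardless, so suspend always completes; resume re-sanitises
the GPU with a reset anyway. A hedged sketch of that shape in plain C
(wait_for_idle() and force_cancel() are illustrative stand-ins, not
driver API):

#include <errno.h>
#include <stdio.h>

/* Stand-ins for the driver calls; both names are hypothetical. */
static int wait_for_idle(int timeout_ms)
{
	(void)timeout_ms;
	return 0; /* pretend the GPU drained within the timeout */
}

static void force_cancel(void)
{
	puts("outstanding work cancelled; GPU left quiet");
}

static int suspend(void)
{
	if (wait_for_idle(200) == -EINTR) /* ~HZ / 5 */
		return -EINTR; /* signal delivery: let the caller retry */

	/*
	 * Whether or not the wait completed, forcibly cancel any
	 * remaining work so the GPU is quiet across suspend; it is
	 * reset and reinitialised on resume regardless.
	 */
	force_cancel();
	return 0;
}

int main(void)
{
	return suspend() ? 1 : 0;
}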

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a1ad5e137a97..f59af9567ec9 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3167,13 +3167,6 @@ int i915_gem_wait_for_idle(struct drm_i915_private *i915,
 
 		lockdep_assert_held(&i915->drm.struct_mutex);
 
-		if (GEM_SHOW_DEBUG() && !timeout) {
-			/* Presume that timeout was non-zero to begin with! */
-			dev_warn(&i915->drm.pdev->dev,
-				 "Missed idle-completion interrupt!\n");
-			GEM_TRACE_DUMP();
-		}
-
 		err = wait_for_engines(i915);
 		if (err)
 			return err;
@@ -4421,11 +4414,12 @@ int i915_gem_suspend(struct drm_i915_private *i915)
 					     I915_WAIT_INTERRUPTIBLE |
 					     I915_WAIT_LOCKED |
 					     I915_WAIT_FOR_IDLE_BOOST,
-					     MAX_SCHEDULE_TIMEOUT);
-		if (ret && ret != -EIO)
+					     HZ / 5);
+		if (ret == -EINTR)
 			goto err_unlock;
 
-		assert_kernel_context_is_current(i915);
+		/* Forcibly cancel outstanding work and leave the gpu quiet. */
+		i915_gem_set_wedged(i915);
 	}
 	i915_retire_requests(i915); /* ensure we flush after wedging */
 
@@ -4440,15 +4434,11 @@ int i915_gem_suspend(struct drm_i915_private *i915)
 	 */
 	drain_delayed_work(&i915->gt.idle_work);
 
-	intel_uc_suspend(i915);
-
 	/*
 	 * Assert that we successfully flushed all the work and
 	 * reset the GPU back to its idle, low power state.
 	 */
-	WARN_ON(i915->gt.awake);
-	if (WARN_ON(!intel_engines_are_idle(i915)))
-		i915_gem_set_wedged(i915); /* no hope, discard everything */
+	GEM_BUG_ON(i915->gt.awake);
 
 	intel_runtime_pm_put(i915, wakeref);
 	return 0;
-- 
2.20.1


* [PATCH 08/38] drm/i915/selftests: Improve switch-to-kernel-context checking
  2019-03-01 14:03 [PATCH 01/38] drm/i915/execlists: Suppress redundant preemption Chris Wilson
                   ` (5 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 07/38] drm/i915: Force GPU idle on suspend Chris Wilson
@ 2019-03-01 14:03 ` Chris Wilson
  2019-03-01 14:03 ` [PATCH 09/38] drm/i915: Do a synchronous switch-to-kernel-context on idling Chris Wilson
                   ` (33 subsequent siblings)
  40 siblings, 0 replies; 88+ messages in thread
From: Chris Wilson @ 2019-03-01 14:03 UTC (permalink / raw)
  To: intel-gfx

We can reduce the switch-to-kernel-context selftest to a single loop,
and so trivially test another state transition as well (that of
idle->busy).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/selftests/i915_gem_context.c | 80 ++++++++-----------
 1 file changed, 35 insertions(+), 45 deletions(-)
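
The reworked test runs the same transition four times, alternating the
starting state: even passes first make each engine busy, odd passes
begin from idle, so both busy->kernel-context and idle->kernel-context
are exercised twice. A rough sketch of the pass structure (the helper
names are placeholders, not the selftest's API):

#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-ins for the selftest helpers. */
static void submit_busywork(void) { puts("  submitted request"); }
static int switch_to_kernel_context(void) { return 0; }

int main(void)
{
	/* Once busy; once idle; repeat -- each transition tested twice. */
	for (int pass = 0; pass < 4; pass++) {
		bool from_idle = pass & 1;

		printf("pass %d (%s)\n", pass, from_idle ? "idle" : "busy");
		if (!from_idle)
			submit_busywork();

		if (switch_to_kernel_context())
			return 1;

		/* ...then check no requests remain and drain idle work. */
	}

	return 0;
}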

diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
index 3a238b9628b3..c7ea3535264a 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
@@ -1493,63 +1493,55 @@ static int __igt_switch_to_kernel_context(struct drm_i915_private *i915,
 {
 	struct intel_engine_cs *engine;
 	unsigned int tmp;
-	int err;
+	int pass;
 
 	GEM_TRACE("Testing %s\n", __engine_name(i915, engines));
-	for_each_engine_masked(engine, i915, engines, tmp) {
-		struct i915_request *rq;
+	for (pass = 0; pass < 4; pass++) { /* Once busy; once idle; repeat */
+		bool from_idle = pass & 1;
+		int err;
 
-		rq = i915_request_alloc(engine, ctx);
-		if (IS_ERR(rq))
-			return PTR_ERR(rq);
+		if (!from_idle) {
+			for_each_engine_masked(engine, i915, engines, tmp) {
+				struct i915_request *rq;
 
-		i915_request_add(rq);
-	}
+				rq = i915_request_alloc(engine, ctx);
+				if (IS_ERR(rq))
+					return PTR_ERR(rq);
 
-	err = i915_gem_switch_to_kernel_context(i915);
-	if (err)
-		return err;
-
-	for_each_engine_masked(engine, i915, engines, tmp) {
-		if (!engine_has_kernel_context_barrier(engine)) {
-			pr_err("kernel context not last on engine %s!\n",
-			       engine->name);
-			return -EINVAL;
+				i915_request_add(rq);
+			}
 		}
-	}
 
-	err = i915_gem_wait_for_idle(i915,
-				     I915_WAIT_LOCKED,
-				     MAX_SCHEDULE_TIMEOUT);
-	if (err)
-		return err;
+		err = i915_gem_switch_to_kernel_context(i915);
+		if (err)
+			return err;
 
-	GEM_BUG_ON(i915->gt.active_requests);
-	for_each_engine_masked(engine, i915, engines, tmp) {
-		if (engine->last_retired_context->gem_context != i915->kernel_context) {
-			pr_err("engine %s not idling in kernel context!\n",
-			       engine->name);
+		if (!from_idle) {
+			err = i915_gem_wait_for_idle(i915,
+						     I915_WAIT_LOCKED,
+						     MAX_SCHEDULE_TIMEOUT);
+			if (err)
+				return err;
+		}
+
+		if (i915->gt.active_requests) {
+			pr_err("%d active requests remain after switching to kernel context, pass %d (%s) on %s engine%s\n",
+			       i915->gt.active_requests,
+			       pass, from_idle ? "idle" : "busy",
+			       __engine_name(i915, engines),
+			       is_power_of_2(engines) ? "" : "s");
 			return -EINVAL;
 		}
-	}
 
-	err = i915_gem_switch_to_kernel_context(i915);
-	if (err)
-		return err;
+		/* XXX Bonus points for proving we are the kernel context! */
 
-	if (i915->gt.active_requests) {
-		pr_err("switch-to-kernel-context emitted %d requests even though it should already be idling in the kernel context\n",
-		       i915->gt.active_requests);
-		return -EINVAL;
+		mutex_unlock(&i915->drm.struct_mutex);
+		drain_delayed_work(&i915->gt.idle_work);
+		mutex_lock(&i915->drm.struct_mutex);
 	}
 
-	for_each_engine_masked(engine, i915, engines, tmp) {
-		if (!intel_engine_has_kernel_context(engine)) {
-			pr_err("kernel context not last on engine %s!\n",
-			       engine->name);
-			return -EINVAL;
-		}
-	}
+	if (igt_flush_test(i915, I915_WAIT_LOCKED))
+		return -EIO;
 
 	return 0;
 }
@@ -1593,8 +1585,6 @@ static int igt_switch_to_kernel_context(void *arg)
 
 out_unlock:
 	GEM_TRACE_DUMP_ON(err);
-	if (igt_flush_test(i915, I915_WAIT_LOCKED))
-		err = -EIO;
 
 	intel_runtime_pm_put(i915, wakeref);
 	mutex_unlock(&i915->drm.struct_mutex);
-- 
2.20.1


* [PATCH 09/38] drm/i915: Do a synchronous switch-to-kernel-context on idling
  2019-03-01 14:03 [PATCH 01/38] drm/i915/execlists: Suppress redundant preemption Chris Wilson
                   ` (6 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 08/38] drm/i915/selftests: Improve switch-to-kernel-context checking Chris Wilson
@ 2019-03-01 14:03 ` Chris Wilson
  2019-03-01 14:03 ` [PATCH 10/38] drm/i915: Store the BIT(engine->id) as the engine's mask Chris Wilson
                   ` (32 subsequent siblings)
  40 siblings, 0 replies; 88+ messages in thread
From: Chris Wilson @ 2019-03-01 14:03 UTC (permalink / raw)
  To: intel-gfx

When the system idles, we switch to the kernel context as a defensive
measure (no users are harmed if the kernel context is lost). Currently,
we issue a switch to kernel context and then come back later to see if
the kernel context is still current and the system is idle. However,
if we are no longer privy to the runqueue ordering, then we have to
relax our assumptions about the logical state of the GPU: the only
way to ensure that the kernel context is currently loaded is to issue
a request to run after all others and wait for it to complete, all
while preventing anyone else from issuing their own requests.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.c           |  14 +--
 drivers/gpu/drm/i915/i915_drv.h           |   2 +-
 drivers/gpu/drm/i915/i915_gem.c           | 139 ++++++++--------------
 drivers/gpu/drm/i915/i915_gem_context.c   |   4 +
 drivers/gpu/drm/i915/selftests/i915_gem.c |   9 +-
 5 files changed, 63 insertions(+), 105 deletions(-)
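
One subtlety in the new idle worker: the synchronous switch itself
emits a request, which would otherwise re-arm the idle work against
us, so gt.active_requests is bumped for the duration and the GPU is
only parked if the count truly returns to zero afterwards. A
simplified userspace sketch of that guard (all names illustrative):

#include <stdbool.h>
#include <stdio.h>

static int active_requests; /* stand-in for gt.active_requests */

static bool switch_to_kernel_context_sync(void)
{
	puts("switched to kernel context and waited for idle");
	return true;
}

static void park(void)
{
	puts("parked: powersaving enabled");
}

static void idle_work(void)
{
	if (active_requests)
		return; /* new work arrived; try again later */

	active_requests++; /* don't requeue idle against ourselves */

	if (!switch_to_kernel_context_sync())
		puts("failed to idle engines, declaring wedged!");

	/* requests emitted by the switch are retired here */

	if (!--active_requests)
		park(); /* nothing snuck in: safe to power down */
}

int main(void)
{
	idle_work();
	return 0;
}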

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index c08abdef5eb6..224bb96b7877 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -714,8 +714,7 @@ static int i915_load_modeset_init(struct drm_device *dev)
 	return 0;
 
 cleanup_gem:
-	if (i915_gem_suspend(dev_priv))
-		DRM_ERROR("failed to idle hardware; continuing to unload!\n");
+	i915_gem_suspend(dev_priv);
 	i915_gem_fini(dev_priv);
 cleanup_modeset:
 	intel_modeset_cleanup(dev);
@@ -1787,8 +1786,7 @@ void i915_driver_unload(struct drm_device *dev)
 	/* Flush any external code that still may be under the RCU lock */
 	synchronize_rcu();
 
-	if (i915_gem_suspend(dev_priv))
-		DRM_ERROR("failed to idle hardware; continuing to unload!\n");
+	i915_gem_suspend(dev_priv);
 
 	drm_atomic_helper_shutdown(dev);
 
@@ -1896,7 +1894,6 @@ static bool suspend_to_idle(struct drm_i915_private *dev_priv)
 static int i915_drm_prepare(struct drm_device *dev)
 {
 	struct drm_i915_private *i915 = to_i915(dev);
-	int err;
 
 	/*
 	 * NB intel_display_suspend() may issue new requests after we've
@@ -1904,12 +1901,9 @@ static int i915_drm_prepare(struct drm_device *dev)
 	 * split out that work and pull it forward so that after point,
 	 * the GPU is not woken again.
 	 */
-	err = i915_gem_suspend(i915);
-	if (err)
-		dev_err(&i915->drm.pdev->dev,
-			"GEM idle failed, suspend/resume might fail\n");
+	i915_gem_suspend(i915);
 
-	return err;
+	return 0;
 }
 
 static int i915_drm_suspend(struct drm_device *dev)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 453af7438e67..cf325a00d143 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3046,7 +3046,7 @@ void i915_gem_fini(struct drm_i915_private *dev_priv);
 void i915_gem_cleanup_engines(struct drm_i915_private *dev_priv);
 int i915_gem_wait_for_idle(struct drm_i915_private *dev_priv,
 			   unsigned int flags, long timeout);
-int __must_check i915_gem_suspend(struct drm_i915_private *dev_priv);
+void i915_gem_suspend(struct drm_i915_private *dev_priv);
 void i915_gem_suspend_late(struct drm_i915_private *dev_priv);
 void i915_gem_resume(struct drm_i915_private *dev_priv);
 vm_fault_t i915_gem_fault(struct vm_fault *vmf);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index f59af9567ec9..503b02525c99 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2872,13 +2872,6 @@ i915_gem_retire_work_handler(struct work_struct *work)
 				   round_jiffies_up_relative(HZ));
 }
 
-static inline bool
-new_requests_since_last_retire(const struct drm_i915_private *i915)
-{
-	return (READ_ONCE(i915->gt.active_requests) ||
-		work_pending(&i915->gt.idle_work.work));
-}
-
 static void assert_kernel_context_is_current(struct drm_i915_private *i915)
 {
 	struct intel_engine_cs *engine;
@@ -2887,7 +2880,8 @@ static void assert_kernel_context_is_current(struct drm_i915_private *i915)
 	if (i915_reset_failed(i915))
 		return;
 
-	GEM_BUG_ON(i915->gt.active_requests);
+	i915_retire_requests(i915);
+
 	for_each_engine(engine, i915, id) {
 		GEM_BUG_ON(__i915_active_request_peek(&engine->timeline.last_request));
 		GEM_BUG_ON(engine->last_retired_context !=
@@ -2895,77 +2889,75 @@ static void assert_kernel_context_is_current(struct drm_i915_private *i915)
 	}
 }
 
+static bool switch_to_kernel_context_sync(struct drm_i915_private *i915)
+{
+	if (i915_gem_switch_to_kernel_context(i915))
+		return false;
+
+	if (i915_gem_wait_for_idle(i915,
+				   I915_WAIT_LOCKED |
+				   I915_WAIT_FOR_IDLE_BOOST,
+				   HZ / 10))
+		return false;
+
+	assert_kernel_context_is_current(i915);
+	return true;
+}
+
 static void
 i915_gem_idle_work_handler(struct work_struct *work)
 {
-	struct drm_i915_private *dev_priv =
-		container_of(work, typeof(*dev_priv), gt.idle_work.work);
+	struct drm_i915_private *i915 =
+		container_of(work, typeof(*i915), gt.idle_work.work);
+	typeof(i915->gt) *gt = &i915->gt;
 	bool rearm_hangcheck;
 
-	if (!READ_ONCE(dev_priv->gt.awake))
+	if (!READ_ONCE(gt->awake))
 		return;
 
-	if (READ_ONCE(dev_priv->gt.active_requests))
+	if (READ_ONCE(gt->active_requests))
 		return;
 
-	/*
-	 * Flush out the last user context, leaving only the pinned
-	 * kernel context resident. When we are idling on the kernel_context,
-	 * no more new requests (with a context switch) are emitted and we
-	 * can finally rest. A consequence is that the idle work handler is
-	 * always called at least twice before idling (and if the system is
-	 * idle that implies a round trip through the retire worker).
-	 */
-	mutex_lock(&dev_priv->drm.struct_mutex);
-	i915_gem_switch_to_kernel_context(dev_priv);
-	mutex_unlock(&dev_priv->drm.struct_mutex);
-
-	GEM_TRACE("active_requests=%d (after switch-to-kernel-context)\n",
-		  READ_ONCE(dev_priv->gt.active_requests));
-
-	/*
-	 * Wait for last execlists context complete, but bail out in case a
-	 * new request is submitted. As we don't trust the hardware, we
-	 * continue on if the wait times out. This is necessary to allow
-	 * the machine to suspend even if the hardware dies, and we will
-	 * try to recover in resume (after depriving the hardware of power,
-	 * it may be in a better mmod).
-	 */
-	__wait_for(if (new_requests_since_last_retire(dev_priv)) return,
-		   intel_engines_are_idle(dev_priv),
-		   I915_IDLE_ENGINES_TIMEOUT * 1000,
-		   10, 500);
-
 	rearm_hangcheck =
-		cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work);
+		cancel_delayed_work_sync(&i915->gpu_error.hangcheck_work);
 
-	if (!mutex_trylock(&dev_priv->drm.struct_mutex)) {
+	if (!mutex_trylock(&i915->drm.struct_mutex)) {
 		/* Currently busy, come back later */
-		mod_delayed_work(dev_priv->wq,
-				 &dev_priv->gt.idle_work,
+		mod_delayed_work(i915->wq,
+				 &gt->idle_work,
 				 msecs_to_jiffies(50));
 		goto out_rearm;
 	}
 
 	/*
-	 * New request retired after this work handler started, extend active
-	 * period until next instance of the work.
+	 * Flush out the last user context, leaving only the pinned
+	 * kernel context resident. Should anything unfortunate happen
+	 * while we are idle (such as the GPU being power cycled), no users
+	 * will be harmed.
 	 */
-	if (new_requests_since_last_retire(dev_priv))
-		goto out_unlock;
-
-	__i915_gem_park(dev_priv);
+	if (!gt->active_requests && !work_pending(&gt->idle_work.work)) {
+		++gt->active_requests; /* don't requeue idle */
+
+		if (!switch_to_kernel_context_sync(i915)) {
+			dev_err(i915->drm.dev,
+				"Failed to idle engines, declaring wedged!\n");
+			GEM_TRACE_DUMP();
+			i915_gem_set_wedged(i915);
+		}
+		i915_retire_requests(i915);
 
-	assert_kernel_context_is_current(dev_priv);
+		if (!--gt->active_requests) {
+			__i915_gem_park(i915);
+			rearm_hangcheck = false;
+		}
+	}
 
-	rearm_hangcheck = false;
-out_unlock:
-	mutex_unlock(&dev_priv->drm.struct_mutex);
+	mutex_unlock(&i915->drm.struct_mutex);
 
 out_rearm:
 	if (rearm_hangcheck) {
-		GEM_BUG_ON(!dev_priv->gt.awake);
-		i915_queue_hangcheck(dev_priv);
+		GEM_BUG_ON(!gt->awake);
+		i915_queue_hangcheck(i915);
 	}
 }
 
@@ -3172,7 +3164,6 @@ int i915_gem_wait_for_idle(struct drm_i915_private *i915,
 			return err;
 
 		i915_retire_requests(i915);
-		GEM_BUG_ON(i915->gt.active_requests);
 	}
 
 	return 0;
@@ -4382,10 +4373,9 @@ void i915_gem_sanitize(struct drm_i915_private *i915)
 	mutex_unlock(&i915->drm.struct_mutex);
 }
 
-int i915_gem_suspend(struct drm_i915_private *i915)
+void i915_gem_suspend(struct drm_i915_private *i915)
 {
 	intel_wakeref_t wakeref;
-	int ret;
 
 	GEM_TRACE("\n");
 
@@ -4405,19 +4395,7 @@ int i915_gem_suspend(struct drm_i915_private *i915)
 	 * state. Fortunately, the kernel_context is disposable and we do
 	 * not rely on its state.
 	 */
-	if (!i915_reset_failed(i915)) {
-		ret = i915_gem_switch_to_kernel_context(i915);
-		if (ret)
-			goto err_unlock;
-
-		ret = i915_gem_wait_for_idle(i915,
-					     I915_WAIT_INTERRUPTIBLE |
-					     I915_WAIT_LOCKED |
-					     I915_WAIT_FOR_IDLE_BOOST,
-					     HZ / 5);
-		if (ret == -EINTR)
-			goto err_unlock;
-
+	if (!switch_to_kernel_context_sync(i915)) {
 		/* Forcibly cancel outstanding work and leave the gpu quiet. */
 		i915_gem_set_wedged(i915);
 	}
@@ -4441,12 +4419,6 @@ int i915_gem_suspend(struct drm_i915_private *i915)
 	GEM_BUG_ON(i915->gt.awake);
 
 	intel_runtime_pm_put(i915, wakeref);
-	return 0;
-
-err_unlock:
-	mutex_unlock(&i915->drm.struct_mutex);
-	intel_runtime_pm_put(i915, wakeref);
-	return ret;
 }
 
 void i915_gem_suspend_late(struct drm_i915_private *i915)
@@ -4712,18 +4684,11 @@ static int __intel_engines_record_defaults(struct drm_i915_private *i915)
 			goto err_active;
 	}
 
-	err = i915_gem_switch_to_kernel_context(i915);
-	if (err)
-		goto err_active;
-
-	if (i915_gem_wait_for_idle(i915, I915_WAIT_LOCKED, HZ / 5)) {
-		i915_gem_set_wedged(i915);
+	if (!switch_to_kernel_context_sync(i915)) {
 		err = -EIO; /* Caller will declare us wedged */
 		goto err_active;
 	}
 
-	assert_kernel_context_is_current(i915);
-
 	/*
 	 * Immediately park the GPU so that we enable powersaving and
 	 * treat it as idle. The next time we issue a request, we will
@@ -4967,7 +4932,7 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
 err_init_hw:
 	mutex_unlock(&dev_priv->drm.struct_mutex);
 
-	WARN_ON(i915_gem_suspend(dev_priv));
+	i915_gem_suspend(dev_priv);
 	i915_gem_suspend_late(dev_priv);
 
 	i915_gem_drain_workqueue(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index d266ba3f7210..3c6edebd595b 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -765,6 +765,10 @@ int i915_gem_switch_to_kernel_context(struct drm_i915_private *i915)
 	lockdep_assert_held(&i915->drm.struct_mutex);
 	GEM_BUG_ON(!i915->kernel_context);
 
+	/* Inoperable, so presume the GPU is safely pointing into the void! */
+	if (i915_terminally_wedged(i915))
+		return 0;
+
 	i915_retire_requests(i915);
 
 	for_each_engine(engine, i915, id) {
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem.c b/drivers/gpu/drm/i915/selftests/i915_gem.c
index e77b7ed449ae..50bb7bbd26d3 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem.c
@@ -84,14 +84,9 @@ static void simulate_hibernate(struct drm_i915_private *i915)
 
 static int pm_prepare(struct drm_i915_private *i915)
 {
-	int err = 0;
-
-	if (i915_gem_suspend(i915)) {
-		pr_err("i915_gem_suspend failed\n");
-		err = -EINVAL;
-	}
+	i915_gem_suspend(i915);
 
-	return err;
+	return 0;
 }
 
 static void pm_suspend(struct drm_i915_private *i915)
-- 
2.20.1


* [PATCH 10/38] drm/i915: Store the BIT(engine->id) as the engine's mask
  2019-03-01 14:03 [PATCH 01/38] drm/i915/execlists: Suppress redundant preemption Chris Wilson
                   ` (7 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 09/38] drm/i915: Do a synchronous switch-to-kernel-context on idling Chris Wilson
@ 2019-03-01 14:03 ` Chris Wilson
  2019-03-01 15:25   ` Tvrtko Ursulin
  2019-03-01 14:03 ` [PATCH 11/38] drm/i915: Refactor common code to load initial power context Chris Wilson
                   ` (31 subsequent siblings)
  40 siblings, 1 reply; 88+ messages in thread
From: Chris Wilson @ 2019-03-01 14:03 UTC (permalink / raw)
  To: intel-gfx

In the next patch, we are introducing a broad virtual engine to encompass
multiple physical engines, losing the 1:1 nature of BIT(engine->id). To
reflect the broader set of engines implied by the virtual instance, let's
store the full bitmask.

v2: Use intel_engine_mask_t (s/ring_mask/engine_mask/)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h               |  4 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c           |  2 +-
 drivers/gpu/drm/i915/i915_pci.c               | 39 +++++++++++--------
 drivers/gpu/drm/i915/i915_reset.c             |  8 ++--
 drivers/gpu/drm/i915/intel_device_info.c      |  6 +--
 drivers/gpu/drm/i915/intel_device_info.h      |  6 +--
 drivers/gpu/drm/i915/intel_engine_cs.c        | 15 ++++---
 drivers/gpu/drm/i915/intel_guc_submission.c   |  4 +-
 drivers/gpu/drm/i915/intel_hangcheck.c        |  8 ++--
 drivers/gpu/drm/i915/intel_ringbuffer.c       | 18 ++++-----
 drivers/gpu/drm/i915/intel_ringbuffer.h       | 11 ++----
 .../gpu/drm/i915/selftests/i915_gem_context.c |  6 +--
 drivers/gpu/drm/i915/selftests/i915_request.c |  2 +-
 drivers/gpu/drm/i915/selftests/intel_guc.c    |  4 +-
 .../gpu/drm/i915/selftests/intel_hangcheck.c  |  4 +-
 drivers/gpu/drm/i915/selftests/intel_lrc.c    |  4 +-
 .../drm/i915/selftests/intel_workarounds.c    |  2 +-
 drivers/gpu/drm/i915/selftests/mock_engine.c  |  1 +
 .../gpu/drm/i915/selftests/mock_gem_device.c  |  2 +-
 19 files changed, 75 insertions(+), 71 deletions(-)
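
With the mask stored on the engine, callers no longer recompute
BIT(engine->id), and a future virtual engine is free to report several
bits at once; for_each_engine_masked() then simply walks the set bits.
A standalone sketch of that walk, using __builtin_ctz in place of the
kernel's __mask_next_bit():

#include <stdio.h>

struct engine {
	unsigned int id;
	unsigned int mask;
};

int main(void)
{
	struct engine engines[] = {
		{ .id = 0, .mask = 1u << 0 }, /* rcs */
		{ .id = 1, .mask = 1u << 1 }, /* bcs */
		{ .id = 2, .mask = 1u << 2 }, /* vcs */
	};
	unsigned int requested = (1u << 0) | (1u << 2);

	/* Visit only the engines selected by the mask, lowest bit first. */
	for (unsigned int tmp = requested; tmp; tmp &= tmp - 1) {
		struct engine *e = &engines[__builtin_ctz(tmp)];

		printf("engine %u selected (mask %#x)\n", e->id, e->mask);
	}

	return 0;
}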

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index cf325a00d143..0dd680cdb9ce 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2099,7 +2099,7 @@ static inline struct drm_i915_private *huc_to_i915(struct intel_huc *huc)
 
 /* Iterator over subset of engines selected by mask */
 #define for_each_engine_masked(engine__, dev_priv__, mask__, tmp__) \
-	for ((tmp__) = (mask__) & INTEL_INFO(dev_priv__)->ring_mask; \
+	for ((tmp__) = (mask__) & INTEL_INFO(dev_priv__)->engine_mask; \
 	     (tmp__) ? \
 	     ((engine__) = (dev_priv__)->engine[__mask_next_bit(tmp__)]), 1 : \
 	     0;)
@@ -2432,7 +2432,7 @@ static inline unsigned int i915_sg_segment_size(void)
 #define ALL_ENGINES	(~0)
 
 #define HAS_ENGINE(dev_priv, id) \
-	(!!(INTEL_INFO(dev_priv)->ring_mask & ENGINE_MASK(id)))
+	(!!(INTEL_INFO(dev_priv)->engine_mask & ENGINE_MASK(id)))
 
 #define HAS_BSD(dev_priv)	HAS_ENGINE(dev_priv, VCS)
 #define HAS_BSD2(dev_priv)	HAS_ENGINE(dev_priv, VCS2)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 7e79691664e5..99022738cc89 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -798,7 +798,7 @@ static void gen8_initialize_pml4(struct i915_address_space *vm,
  */
 static void mark_tlbs_dirty(struct i915_hw_ppgtt *ppgtt)
 {
-	ppgtt->pd_dirty_rings = INTEL_INFO(ppgtt->vm.i915)->ring_mask;
+	ppgtt->pd_dirty_rings = INTEL_INFO(ppgtt->vm.i915)->engine_mask;
 }
 
 /* Removes entries from a single page table, releasing it if it's empty.
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index a9211c370cd1..524f55771f23 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -94,7 +94,7 @@
 	.gpu_reset_clobbers_display = true, \
 	.hws_needs_physical = 1, \
 	.unfenced_needs_alignment = 1, \
-	.ring_mask = RENDER_RING, \
+	.engine_mask = RENDER_RING, \
 	.has_snoop = true, \
 	.has_coherent_ggtt = false, \
 	GEN_DEFAULT_PIPEOFFSETS, \
@@ -133,7 +133,7 @@ static const struct intel_device_info intel_i865g_info = {
 	.num_pipes = 2, \
 	.display.has_gmch = 1, \
 	.gpu_reset_clobbers_display = true, \
-	.ring_mask = RENDER_RING, \
+	.engine_mask = RENDER_RING, \
 	.has_snoop = true, \
 	.has_coherent_ggtt = true, \
 	GEN_DEFAULT_PIPEOFFSETS, \
@@ -210,7 +210,7 @@ static const struct intel_device_info intel_pineview_info = {
 	.display.has_hotplug = 1, \
 	.display.has_gmch = 1, \
 	.gpu_reset_clobbers_display = true, \
-	.ring_mask = RENDER_RING, \
+	.engine_mask = RENDER_RING, \
 	.has_snoop = true, \
 	.has_coherent_ggtt = true, \
 	GEN_DEFAULT_PIPEOFFSETS, \
@@ -239,7 +239,7 @@ static const struct intel_device_info intel_i965gm_info = {
 static const struct intel_device_info intel_g45_info = {
 	GEN4_FEATURES,
 	PLATFORM(INTEL_G45),
-	.ring_mask = RENDER_RING | BSD_RING,
+	.engine_mask = RENDER_RING | BSD_RING,
 	.gpu_reset_clobbers_display = false,
 };
 
@@ -249,7 +249,7 @@ static const struct intel_device_info intel_gm45_info = {
 	.is_mobile = 1,
 	.display.has_fbc = 1,
 	.display.supports_tv = 1,
-	.ring_mask = RENDER_RING | BSD_RING,
+	.engine_mask = RENDER_RING | BSD_RING,
 	.gpu_reset_clobbers_display = false,
 };
 
@@ -257,7 +257,7 @@ static const struct intel_device_info intel_gm45_info = {
 	GEN(5), \
 	.num_pipes = 2, \
 	.display.has_hotplug = 1, \
-	.ring_mask = RENDER_RING | BSD_RING, \
+	.engine_mask = RENDER_RING | BSD_RING, \
 	.has_snoop = true, \
 	.has_coherent_ggtt = true, \
 	/* ilk does support rc6, but we do not implement [power] contexts */ \
@@ -283,7 +283,7 @@ static const struct intel_device_info intel_ironlake_m_info = {
 	.num_pipes = 2, \
 	.display.has_hotplug = 1, \
 	.display.has_fbc = 1, \
-	.ring_mask = RENDER_RING | BSD_RING | BLT_RING, \
+	.engine_mask = RENDER_RING | BSD_RING | BLT_RING, \
 	.has_coherent_ggtt = true, \
 	.has_llc = 1, \
 	.has_rc6 = 1, \
@@ -328,7 +328,7 @@ static const struct intel_device_info intel_sandybridge_m_gt2_info = {
 	.num_pipes = 3, \
 	.display.has_hotplug = 1, \
 	.display.has_fbc = 1, \
-	.ring_mask = RENDER_RING | BSD_RING | BLT_RING, \
+	.engine_mask = RENDER_RING | BSD_RING | BLT_RING, \
 	.has_coherent_ggtt = true, \
 	.has_llc = 1, \
 	.has_rc6 = 1, \
@@ -389,7 +389,7 @@ static const struct intel_device_info intel_valleyview_info = {
 	.ppgtt = INTEL_PPGTT_FULL,
 	.has_snoop = true,
 	.has_coherent_ggtt = false,
-	.ring_mask = RENDER_RING | BSD_RING | BLT_RING,
+	.engine_mask = RENDER_RING | BSD_RING | BLT_RING,
 	.display_mmio_offset = VLV_DISPLAY_BASE,
 	GEN_DEFAULT_PAGE_SIZES,
 	GEN_DEFAULT_PIPEOFFSETS,
@@ -398,7 +398,7 @@ static const struct intel_device_info intel_valleyview_info = {
 
 #define G75_FEATURES  \
 	GEN7_FEATURES, \
-	.ring_mask = RENDER_RING | BSD_RING | BLT_RING | VEBOX_RING, \
+	.engine_mask = RENDER_RING | BSD_RING | BLT_RING | VEBOX_RING, \
 	.display.has_ddi = 1, \
 	.has_fpga_dbg = 1, \
 	.display.has_psr = 1, \
@@ -462,7 +462,8 @@ static const struct intel_device_info intel_broadwell_rsvd_info = {
 static const struct intel_device_info intel_broadwell_gt3_info = {
 	BDW_PLATFORM,
 	.gt = 3,
-	.ring_mask = RENDER_RING | BSD_RING | BLT_RING | VEBOX_RING | BSD2_RING,
+	.engine_mask =
+		RENDER_RING | BSD_RING | BLT_RING | VEBOX_RING | BSD2_RING,
 };
 
 static const struct intel_device_info intel_cherryview_info = {
@@ -471,7 +472,7 @@ static const struct intel_device_info intel_cherryview_info = {
 	.num_pipes = 3,
 	.display.has_hotplug = 1,
 	.is_lp = 1,
-	.ring_mask = RENDER_RING | BSD_RING | BLT_RING | VEBOX_RING,
+	.engine_mask = RENDER_RING | BSD_RING | BLT_RING | VEBOX_RING,
 	.has_64bit_reloc = 1,
 	.has_runtime_pm = 1,
 	.has_rc6 = 1,
@@ -521,7 +522,8 @@ static const struct intel_device_info intel_skylake_gt2_info = {
 
 #define SKL_GT3_PLUS_PLATFORM \
 	SKL_PLATFORM, \
-	.ring_mask = RENDER_RING | BSD_RING | BLT_RING | VEBOX_RING | BSD2_RING
+	.engine_mask = \
+		RENDER_RING | BSD_RING | BLT_RING | VEBOX_RING | BSD2_RING
 
 
 static const struct intel_device_info intel_skylake_gt3_info = {
@@ -538,7 +540,7 @@ static const struct intel_device_info intel_skylake_gt4_info = {
 	GEN(9), \
 	.is_lp = 1, \
 	.display.has_hotplug = 1, \
-	.ring_mask = RENDER_RING | BSD_RING | BLT_RING | VEBOX_RING, \
+	.engine_mask = RENDER_RING | BSD_RING | BLT_RING | VEBOX_RING, \
 	.num_pipes = 3, \
 	.has_64bit_reloc = 1, \
 	.display.has_ddi = 1, \
@@ -592,7 +594,8 @@ static const struct intel_device_info intel_kabylake_gt2_info = {
 static const struct intel_device_info intel_kabylake_gt3_info = {
 	KBL_PLATFORM,
 	.gt = 3,
-	.ring_mask = RENDER_RING | BSD_RING | BLT_RING | VEBOX_RING | BSD2_RING,
+	.engine_mask =
+		RENDER_RING | BSD_RING | BLT_RING | VEBOX_RING | BSD2_RING,
 };
 
 #define CFL_PLATFORM \
@@ -612,7 +615,8 @@ static const struct intel_device_info intel_coffeelake_gt2_info = {
 static const struct intel_device_info intel_coffeelake_gt3_info = {
 	CFL_PLATFORM,
 	.gt = 3,
-	.ring_mask = RENDER_RING | BSD_RING | BLT_RING | VEBOX_RING | BSD2_RING,
+	.engine_mask =
+		RENDER_RING | BSD_RING | BLT_RING | VEBOX_RING | BSD2_RING,
 };
 
 #define GEN10_FEATURES \
@@ -655,7 +659,8 @@ static const struct intel_device_info intel_icelake_11_info = {
 	GEN11_FEATURES,
 	PLATFORM(INTEL_ICELAKE),
 	.is_alpha_support = 1,
-	.ring_mask = RENDER_RING | BLT_RING | VEBOX_RING | BSD_RING | BSD3_RING,
+	.engine_mask =
+		RENDER_RING | BLT_RING | VEBOX_RING | BSD_RING | BSD3_RING,
 };
 
 #undef GEN
diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
index 55d6123dbba4..fc5c5a7e4dfb 100644
--- a/drivers/gpu/drm/i915/i915_reset.c
+++ b/drivers/gpu/drm/i915/i915_reset.c
@@ -692,7 +692,7 @@ static int gt_reset(struct drm_i915_private *i915, unsigned int stalled_mask)
 		return err;
 
 	for_each_engine(engine, i915, id)
-		intel_engine_reset(engine, stalled_mask & ENGINE_MASK(id));
+		intel_engine_reset(engine, stalled_mask & engine->mask);
 
 	i915_gem_restore_fences(i915);
 
@@ -1057,7 +1057,7 @@ void i915_reset(struct drm_i915_private *i915,
 static inline int intel_gt_reset_engine(struct drm_i915_private *i915,
 					struct intel_engine_cs *engine)
 {
-	return intel_gpu_reset(i915, intel_engine_flag(engine));
+	return intel_gpu_reset(i915, engine->mask);
 }
 
 /**
@@ -1241,7 +1241,7 @@ void i915_handle_error(struct drm_i915_private *i915,
 	 */
 	wakeref = intel_runtime_pm_get(i915);
 
-	engine_mask &= INTEL_INFO(i915)->ring_mask;
+	engine_mask &= INTEL_INFO(i915)->engine_mask;
 
 	if (flags & I915_ERROR_CAPTURE) {
 		i915_capture_error_state(i915, engine_mask, msg);
@@ -1260,7 +1260,7 @@ void i915_handle_error(struct drm_i915_private *i915,
 				continue;
 
 			if (i915_reset_engine(engine, msg) == 0)
-				engine_mask &= ~intel_engine_flag(engine);
+				engine_mask &= ~engine->mask;
 
 			clear_bit(I915_RESET_ENGINE + engine->id,
 				  &error->flags);
diff --git a/drivers/gpu/drm/i915/intel_device_info.c b/drivers/gpu/drm/i915/intel_device_info.c
index 855a5074ad77..6283506e89b5 100644
--- a/drivers/gpu/drm/i915/intel_device_info.c
+++ b/drivers/gpu/drm/i915/intel_device_info.c
@@ -738,7 +738,7 @@ void intel_device_info_runtime_init(struct drm_i915_private *dev_priv)
 		runtime->num_scalers[PIPE_C] = 1;
 	}
 
-	BUILD_BUG_ON(I915_NUM_ENGINES > BITS_PER_TYPE(intel_ring_mask_t));
+	BUILD_BUG_ON(BITS_PER_TYPE(intel_engine_mask_t) < I915_NUM_ENGINES);
 
 	if (IS_GEN(dev_priv, 11))
 		for_each_pipe(dev_priv, pipe)
@@ -887,7 +887,7 @@ void intel_device_info_init_mmio(struct drm_i915_private *dev_priv)
 			continue;
 
 		if (!(BIT(i) & RUNTIME_INFO(dev_priv)->vdbox_enable)) {
-			info->ring_mask &= ~ENGINE_MASK(_VCS(i));
+			info->engine_mask &= ~ENGINE_MASK(_VCS(i));
 			DRM_DEBUG_DRIVER("vcs%u fused off\n", i);
 			continue;
 		}
@@ -906,7 +906,7 @@ void intel_device_info_init_mmio(struct drm_i915_private *dev_priv)
 			continue;
 
 		if (!(BIT(i) & RUNTIME_INFO(dev_priv)->vebox_enable)) {
-			info->ring_mask &= ~ENGINE_MASK(_VECS(i));
+			info->engine_mask &= ~ENGINE_MASK(_VECS(i));
 			DRM_DEBUG_DRIVER("vecs%u fused off\n", i);
 		}
 	}
diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
index e8b8661df746..047d10bdd455 100644
--- a/drivers/gpu/drm/i915/intel_device_info.h
+++ b/drivers/gpu/drm/i915/intel_device_info.h
@@ -150,14 +150,14 @@ struct sseu_dev_info {
 	u8 eu_mask[GEN_MAX_SLICES * GEN_MAX_SUBSLICES];
 };
 
-typedef u8 intel_ring_mask_t;
+typedef u8 intel_engine_mask_t;
 
 struct intel_device_info {
 	u16 gen_mask;
 
 	u8 gen;
 	u8 gt; /* GT number, 0 if undefined */
-	intel_ring_mask_t ring_mask; /* Rings supported by the HW */
+	intel_engine_mask_t engine_mask; /* Engines supported by the HW */
 
 	enum intel_platform platform;
 	u32 platform_mask;
@@ -200,7 +200,7 @@ struct intel_runtime_info {
 	u8 num_sprites[I915_MAX_PIPES];
 	u8 num_scalers[I915_MAX_PIPES];
 
-	u8 num_rings;
+	u8 num_engines;
 
 	/* Slice/subslice/EU info */
 	struct sseu_dev_info sseu;
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 4fc7e2ac6278..8226871c7781 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -313,7 +313,10 @@ intel_engine_setup(struct drm_i915_private *dev_priv,
 	if (!engine)
 		return -ENOMEM;
 
+	BUILD_BUG_ON(BITS_PER_TYPE(engine->mask) < I915_NUM_ENGINES);
+
 	engine->id = id;
+	engine->mask = BIT(id);
 	engine->i915 = dev_priv;
 	__sprint_engine_name(engine->name, info);
 	engine->hw_id = engine->guc_id = info->hw_id;
@@ -355,15 +358,15 @@ intel_engine_setup(struct drm_i915_private *dev_priv,
 int intel_engines_init_mmio(struct drm_i915_private *dev_priv)
 {
 	struct intel_device_info *device_info = mkwrite_device_info(dev_priv);
-	const unsigned int ring_mask = INTEL_INFO(dev_priv)->ring_mask;
+	const unsigned int engine_mask = INTEL_INFO(dev_priv)->engine_mask;
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
 	unsigned int mask = 0;
 	unsigned int i;
 	int err;
 
-	WARN_ON(ring_mask == 0);
-	WARN_ON(ring_mask &
+	WARN_ON(engine_mask == 0);
+	WARN_ON(engine_mask &
 		GENMASK(BITS_PER_TYPE(mask) - 1, I915_NUM_ENGINES));
 
 	if (i915_inject_load_failure())
@@ -385,8 +388,8 @@ int intel_engines_init_mmio(struct drm_i915_private *dev_priv)
 	 * are added to the driver by a warning and disabling the forgotten
 	 * engines.
 	 */
-	if (WARN_ON(mask != ring_mask))
-		device_info->ring_mask = mask;
+	if (WARN_ON(mask != engine_mask))
+		device_info->engine_mask = mask;
 
 	/* We always presume we have at least RCS available for later probing */
 	if (WARN_ON(!HAS_ENGINE(dev_priv, RCS))) {
@@ -394,7 +397,7 @@ int intel_engines_init_mmio(struct drm_i915_private *dev_priv)
 		goto cleanup;
 	}
 
-	RUNTIME_INFO(dev_priv)->num_rings = hweight32(mask);
+	RUNTIME_INFO(dev_priv)->num_engines = hweight32(mask);
 
 	i915_check_and_clear_faults(dev_priv);
 
diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index 56ba2fcbabe6..119d3326bb5e 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -1030,7 +1030,7 @@ static int guc_clients_create(struct intel_guc *guc)
 	GEM_BUG_ON(guc->preempt_client);
 
 	client = guc_client_alloc(dev_priv,
-				  INTEL_INFO(dev_priv)->ring_mask,
+				  INTEL_INFO(dev_priv)->engine_mask,
 				  GUC_CLIENT_PRIORITY_KMD_NORMAL,
 				  dev_priv->kernel_context);
 	if (IS_ERR(client)) {
@@ -1041,7 +1041,7 @@ static int guc_clients_create(struct intel_guc *guc)
 
 	if (dev_priv->preempt_context) {
 		client = guc_client_alloc(dev_priv,
-					  INTEL_INFO(dev_priv)->ring_mask,
+					  INTEL_INFO(dev_priv)->engine_mask,
 					  GUC_CLIENT_PRIORITY_KMD_HIGH,
 					  dev_priv->preempt_context);
 		if (IS_ERR(client)) {
diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
index f1d8dfc58049..c508875d5236 100644
--- a/drivers/gpu/drm/i915/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/intel_hangcheck.c
@@ -120,7 +120,7 @@ engine_stuck(struct intel_engine_cs *engine, u64 acthd)
 	 */
 	tmp = I915_READ_CTL(engine);
 	if (tmp & RING_WAIT) {
-		i915_handle_error(dev_priv, BIT(engine->id), 0,
+		i915_handle_error(dev_priv, engine->mask, 0,
 				  "stuck wait on %s", engine->name);
 		I915_WRITE_CTL(engine, tmp);
 		return ENGINE_WAIT_KICK;
@@ -282,13 +282,13 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 		hangcheck_store_sample(engine, &hc);
 
 		if (hc.stalled) {
-			hung |= intel_engine_flag(engine);
+			hung |= engine->mask;
 			if (hc.action != ENGINE_DEAD)
-				stuck |= intel_engine_flag(engine);
+				stuck |= engine->mask;
 		}
 
 		if (hc.wedged)
-			wedged |= intel_engine_flag(engine);
+			wedged |= engine->mask;
 	}
 
 	if (GEM_SHOW_DEBUG() && (hung | stuck)) {
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 1b96b0960adc..e78594ecba6f 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1692,8 +1692,8 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
 	struct drm_i915_private *i915 = rq->i915;
 	struct intel_engine_cs *engine = rq->engine;
 	enum intel_engine_id id;
-	const int num_rings =
-		IS_HSW_GT1(i915) ? RUNTIME_INFO(i915)->num_rings - 1 : 0;
+	const int num_engines =
+		IS_HSW_GT1(i915) ? RUNTIME_INFO(i915)->num_engines - 1 : 0;
 	bool force_restore = false;
 	int len;
 	u32 *cs;
@@ -1707,7 +1707,7 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
 
 	len = 4;
 	if (IS_GEN(i915, 7))
-		len += 2 + (num_rings ? 4*num_rings + 6 : 0);
+		len += 2 + (num_engines ? 4 * num_engines + 6 : 0);
 	if (flags & MI_FORCE_RESTORE) {
 		GEM_BUG_ON(flags & MI_RESTORE_INHIBIT);
 		flags &= ~MI_FORCE_RESTORE;
@@ -1722,10 +1722,10 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
 	/* WaProgramMiArbOnOffAroundMiSetContext:ivb,vlv,hsw,bdw,chv */
 	if (IS_GEN(i915, 7)) {
 		*cs++ = MI_ARB_ON_OFF | MI_ARB_DISABLE;
-		if (num_rings) {
+		if (num_engines) {
 			struct intel_engine_cs *signaller;
 
-			*cs++ = MI_LOAD_REGISTER_IMM(num_rings);
+			*cs++ = MI_LOAD_REGISTER_IMM(num_engines);
 			for_each_engine(signaller, i915, id) {
 				if (signaller == engine)
 					continue;
@@ -1768,11 +1768,11 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
 	*cs++ = MI_NOOP;
 
 	if (IS_GEN(i915, 7)) {
-		if (num_rings) {
+		if (num_engines) {
 			struct intel_engine_cs *signaller;
 			i915_reg_t last_reg = {}; /* keep gcc quiet */
 
-			*cs++ = MI_LOAD_REGISTER_IMM(num_rings);
+			*cs++ = MI_LOAD_REGISTER_IMM(num_engines);
 			for_each_engine(signaller, i915, id) {
 				if (signaller == engine)
 					continue;
@@ -1859,8 +1859,8 @@ static int switch_context(struct i915_request *rq)
 				goto err;
 		} while (--loops);
 
-		if (intel_engine_flag(engine) & ppgtt->pd_dirty_rings) {
-			unwind_mm = intel_engine_flag(engine);
+		if (ppgtt->pd_dirty_rings & engine->mask) {
+			unwind_mm = engine->mask;
 			ppgtt->pd_dirty_rings &= ~unwind_mm;
 			hw_flags = MI_FORCE_RESTORE;
 		}
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 2e7264119ec4..850ad892836c 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -10,12 +10,12 @@
 #include <linux/seqlock.h>
 
 #include "i915_gem_batch_pool.h"
-
-#include "i915_reg.h"
 #include "i915_pmu.h"
+#include "i915_reg.h"
 #include "i915_request.h"
 #include "i915_selftest.h"
 #include "i915_timeline.h"
+#include "intel_device_info.h"
 #include "intel_gpu_commands.h"
 #include "intel_workarounds.h"
 
@@ -334,6 +334,7 @@ struct intel_engine_cs {
 	enum intel_engine_id id;
 	unsigned int hw_id;
 	unsigned int guc_id;
+	intel_engine_mask_t mask;
 
 	u8 uabi_id;
 	u8 uabi_class;
@@ -668,12 +669,6 @@ execlists_port_complete(struct intel_engine_execlists * const execlists,
 	return port;
 }
 
-static inline unsigned int
-intel_engine_flag(const struct intel_engine_cs *engine)
-{
-	return BIT(engine->id);
-}
-
 static inline u32
 intel_read_status_page(const struct intel_engine_cs *engine, int reg)
 {
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
index c7ea3535264a..1c608b60c55c 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
@@ -556,7 +556,7 @@ static int igt_ctx_exec(void *arg)
 		ncontexts++;
 	}
 	pr_info("Submitted %lu contexts (across %u engines), filling %lu dwords\n",
-		ncontexts, RUNTIME_INFO(i915)->num_rings, ndwords);
+		ncontexts, RUNTIME_INFO(i915)->num_engines, ndwords);
 
 	dw = 0;
 	list_for_each_entry(obj, &objects, st_link) {
@@ -1126,7 +1126,7 @@ static int igt_ctx_readonly(void *arg)
 		}
 	}
 	pr_info("Submitted %lu dwords (across %u engines)\n",
-		ndwords, RUNTIME_INFO(i915)->num_rings);
+		ndwords, RUNTIME_INFO(i915)->num_engines);
 
 	dw = 0;
 	list_for_each_entry(obj, &objects, st_link) {
@@ -1459,7 +1459,7 @@ static int igt_vm_isolation(void *arg)
 		count += this;
 	}
 	pr_info("Checked %lu scratch offsets across %d engines\n",
-		count, RUNTIME_INFO(i915)->num_rings);
+		count, RUNTIME_INFO(i915)->num_engines);
 
 out_rpm:
 	intel_runtime_pm_put(i915, wakeref);
diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
index 7e1b65b8eb19..29875c28a544 100644
--- a/drivers/gpu/drm/i915/selftests/i915_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_request.c
@@ -1216,7 +1216,7 @@ static int live_breadcrumbs_smoketest(void *arg)
 		num_fences += atomic_long_read(&t[id].num_fences);
 	}
 	pr_info("Completed %lu waits for %lu fences across %d engines and %d cpus\n",
-		num_waits, num_fences, RUNTIME_INFO(i915)->num_rings, ncpus);
+		num_waits, num_fences, RUNTIME_INFO(i915)->num_engines, ncpus);
 
 	mutex_lock(&i915->drm.struct_mutex);
 	ret = igt_live_test_end(&live) ?: ret;
diff --git a/drivers/gpu/drm/i915/selftests/intel_guc.c b/drivers/gpu/drm/i915/selftests/intel_guc.c
index c5e0a0e98fcb..b05a21eaa8f4 100644
--- a/drivers/gpu/drm/i915/selftests/intel_guc.c
+++ b/drivers/gpu/drm/i915/selftests/intel_guc.c
@@ -111,7 +111,7 @@ static int validate_client(struct intel_guc_client *client,
 			dev_priv->preempt_context : dev_priv->kernel_context;
 
 	if (client->owner != ctx_owner ||
-	    client->engines != INTEL_INFO(dev_priv)->ring_mask ||
+	    client->engines != INTEL_INFO(dev_priv)->engine_mask ||
 	    client->priority != client_priority ||
 	    client->doorbell_id == GUC_DOORBELL_INVALID)
 		return -EINVAL;
@@ -261,7 +261,7 @@ static int igt_guc_doorbells(void *arg)
 
 	for (i = 0; i < ATTEMPTS; i++) {
 		clients[i] = guc_client_alloc(dev_priv,
-					      INTEL_INFO(dev_priv)->ring_mask,
+					      INTEL_INFO(dev_priv)->engine_mask,
 					      i % GUC_CLIENT_PRIORITY_NUM,
 					      dev_priv->kernel_context);
 
diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
index 12e047328ab8..5624a238244e 100644
--- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
@@ -1358,7 +1358,7 @@ static int __igt_reset_evict_vma(struct drm_i915_private *i915,
 
 out_reset:
 	igt_global_reset_lock(i915);
-	fake_hangcheck(rq->i915, intel_engine_flag(rq->engine));
+	fake_hangcheck(rq->i915, rq->engine->mask);
 	igt_global_reset_unlock(i915);
 
 	if (tsk) {
@@ -1643,7 +1643,7 @@ static int igt_handle_error(void *arg)
 	/* Temporarily disable error capture */
 	error = xchg(&i915->gpu_error.first_error, (void *)-1);
 
-	i915_handle_error(i915, ENGINE_MASK(engine->id), 0, NULL);
+	i915_handle_error(i915, engine->mask, 0, NULL);
 
 	xchg(&i915->gpu_error.first_error, error);
 
diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
index 0677038a5466..565e949a4722 100644
--- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
+++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
@@ -929,7 +929,7 @@ static int smoke_crescendo(struct preempt_smoke *smoke, unsigned int flags)
 
 	pr_info("Submitted %lu crescendo:%x requests across %d engines and %d contexts\n",
 		count, flags,
-		RUNTIME_INFO(smoke->i915)->num_rings, smoke->ncontext);
+		RUNTIME_INFO(smoke->i915)->num_engines, smoke->ncontext);
 	return 0;
 }
 
@@ -957,7 +957,7 @@ static int smoke_random(struct preempt_smoke *smoke, unsigned int flags)
 
 	pr_info("Submitted %lu random:%x requests across %d engines and %d contexts\n",
 		count, flags,
-		RUNTIME_INFO(smoke->i915)->num_rings, smoke->ncontext);
+		RUNTIME_INFO(smoke->i915)->num_engines, smoke->ncontext);
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/i915/selftests/intel_workarounds.c b/drivers/gpu/drm/i915/selftests/intel_workarounds.c
index 33b3ced83fde..79a836d21abf 100644
--- a/drivers/gpu/drm/i915/selftests/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/selftests/intel_workarounds.c
@@ -222,7 +222,7 @@ static int check_whitelist(struct i915_gem_context *ctx,
 
 static int do_device_reset(struct intel_engine_cs *engine)
 {
-	i915_reset(engine->i915, ENGINE_MASK(engine->id), "live_workarounds");
+	i915_reset(engine->i915, engine->mask, "live_workarounds");
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
index ec1ae948954c..c2c954f64226 100644
--- a/drivers/gpu/drm/i915/selftests/mock_engine.c
+++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
@@ -223,6 +223,7 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
 	engine->base.i915 = i915;
 	snprintf(engine->base.name, sizeof(engine->base.name), "%s", name);
 	engine->base.id = id;
+	engine->base.mask = BIT(id);
 	engine->base.status_page.addr = (void *)(engine + 1);
 
 	engine->base.context_pin = mock_context_pin;
diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
index c27616efc4f8..8581cf5e0e8c 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
@@ -206,7 +206,7 @@ struct drm_i915_private *mock_gem_device(void)
 
 	mock_init_ggtt(i915, &i915->ggtt);
 
-	mkwrite_device_info(i915)->ring_mask = BIT(0);
+	mkwrite_device_info(i915)->engine_mask = BIT(0);
 	i915->kernel_context = mock_context(i915, NULL);
 	if (!i915->kernel_context)
 		goto err_unlock;
-- 
2.20.1


* [PATCH 11/38] drm/i915: Refactor common code to load initial power context
  2019-03-01 14:03 [PATCH 01/38] drm/i915/execlists: Suppress redundant preemption Chris Wilson
                   ` (8 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 10/38] drm/i915: Store the BIT(engine->id) as the engine's mask Chris Wilson
@ 2019-03-01 14:03 ` Chris Wilson
  2019-03-01 14:03 ` [PATCH 12/38] drm/i915: Reduce presumption of request ordering for barriers Chris Wilson
                   ` (30 subsequent siblings)
  40 siblings, 0 replies; 88+ messages in thread
From: Chris Wilson @ 2019-03-01 14:03 UTC (permalink / raw)
  To: intel-gfx

We load a context (the kernel context) on both module load and resume in
order to initialise some logical state on the GPU. We can use the same
routine for both operations, which will become more useful as we
refactor rc6/rps enabling.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 48 ++++++++++++++++-----------------
 1 file changed, 24 insertions(+), 24 deletions(-)
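
The shared helper pairs the synchronous switch with an immediate park,
so the record-defaults path at module load and the resume path behave
identically. A sketch of the call structure (the bodies are stubs;
only the shape mirrors the patch):

#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/* Stub bodies standing in for the driver routines. */
static bool switch_to_kernel_context_sync(void) { return true; }
static void gem_park(void) { puts("parked"); }

static bool load_power_context(void)
{
	if (!switch_to_kernel_context_sync())
		return false;

	/*
	 * Park immediately so powersaving is enabled and the next
	 * request starts from the pinned default state.
	 */
	gem_park();
	return true;
}

static int record_defaults(void) /* module load path */
{
	return load_power_context() ? 0 : -EIO;
}

static int resume(void) /* system resume path */
{
	return load_power_context() ? 0 : -EIO;
}

int main(void)
{
	if (record_defaults() || resume())
		return 1;

	return 0;
}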

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 503b02525c99..7ec2f68218fc 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2904,6 +2904,22 @@ static bool switch_to_kernel_context_sync(struct drm_i915_private *i915)
 	return true;
 }
 
+static bool load_power_context(struct drm_i915_private *i915)
+{
+	if (!switch_to_kernel_context_sync(i915))
+		return false;
+
+	/*
+	 * Immediately park the GPU so that we enable powersaving and
+	 * treat it as idle. The next time we issue a request, we will
+	 * unpark and start using the engine->pinned_default_state, otherwise
+	 * it is in limbo and an early reset may fail.
+	 */
+	__i915_gem_park(i915);
+
+	return true;
+}
+
 static void
 i915_gem_idle_work_handler(struct work_struct *work)
 {
@@ -4486,7 +4502,7 @@ void i915_gem_resume(struct drm_i915_private *i915)
 	intel_uc_resume(i915);
 
 	/* Always reload a context for powersaving. */
-	if (i915_gem_switch_to_kernel_context(i915))
+	if (!load_power_context(i915))
 		goto err_wedged;
 
 out_unlock:
@@ -4651,7 +4667,7 @@ static int __intel_engines_record_defaults(struct drm_i915_private *i915)
 	struct i915_gem_context *ctx;
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
-	int err;
+	int err = 0;
 
 	/*
 	 * As we reset the gpu during very early sanitisation, the current
@@ -4684,19 +4700,12 @@ static int __intel_engines_record_defaults(struct drm_i915_private *i915)
 			goto err_active;
 	}
 
-	if (!switch_to_kernel_context_sync(i915)) {
-		err = -EIO; /* Caller will declare us wedged */
+	/* Flush the default context image to memory, and enable powersaving. */
+	if (!load_power_context(i915)) {
+		err = -EIO;
 		goto err_active;
 	}
 
-	/*
-	 * Immediately park the GPU so that we enable powersaving and
-	 * treat it as idle. The next time we issue a request, we will
-	 * unpark and start using the engine->pinned_default_state, otherwise
-	 * it is in limbo and an early reset may fail.
-	 */
-	__i915_gem_park(i915);
-
 	for_each_engine(engine, i915, id) {
 		struct i915_vma *state;
 		void *vaddr;
@@ -4762,19 +4771,10 @@ static int __intel_engines_record_defaults(struct drm_i915_private *i915)
 err_active:
 	/*
 	 * If we have to abandon now, we expect the engines to be idle
-	 * and ready to be torn-down. First try to flush any remaining
-	 * request, ensure we are pointing at the kernel context and
-	 * then remove it.
+	 * and ready to be torn-down. The quickest way we can accomplish
+	 * this is by declaring ourselves wedged.
 	 */
-	if (WARN_ON(i915_gem_switch_to_kernel_context(i915)))
-		goto out_ctx;
-
-	if (WARN_ON(i915_gem_wait_for_idle(i915,
-					   I915_WAIT_LOCKED,
-					   MAX_SCHEDULE_TIMEOUT)))
-		goto out_ctx;
-
-	i915_gem_contexts_lost(i915);
+	i915_gem_set_wedged(i915);
 	goto out_ctx;
 }
 
-- 
2.20.1

* [PATCH 12/38] drm/i915: Reduce presumption of request ordering for barriers
  2019-03-01 14:03 [PATCH 01/38] drm/i915/execlists: Suppress redundant preemption Chris Wilson
                   ` (9 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 11/38] drm/i915: Refactor common code to load initial power context Chris Wilson
@ 2019-03-01 14:03 ` Chris Wilson
  2019-03-01 14:03 ` [PATCH 13/38] drm/i915: Remove has-kernel-context Chris Wilson
                   ` (29 subsequent siblings)
  40 siblings, 0 replies; 88+ messages in thread
From: Chris Wilson @ 2019-03-01 14:03 UTC (permalink / raw)
  To: intel-gfx

Currently we assume that we know the order in which requests run and so
can determine if we need to reissue a switch-to-kernel-context prior to
idling. That assumption does not hold for the future, so instead of
tracking which barriers have been used, simply track whether we have
ever switched away from the kernel context by using an engine, and
before idling ensure that all engines used since the last idle are
synchronously switched back to the kernel context for safety (and to
allow shrinking of memory while idle).
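
The bookkeeping reduces to three touch points, condensed here from the
hunks below (the mask bits are the engine->mask values introduced in
the previous patch):

/* on submitting a request, mark its engine as in use */
i915->gt.active_engines |= rq->engine->mask;

/* when the GPU parks, nothing is in use any more */
i915->gt.active_engines = 0;

/* when idling, only the engines actually used need flushing */
switch_to_kernel_context_sync(i915, i915->gt.active_engines);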

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h               |  1 +
 drivers/gpu/drm/i915/i915_gem.c               | 13 ++--
 drivers/gpu/drm/i915/i915_gem_context.c       | 66 +------------------
 drivers/gpu/drm/i915/i915_gem_context.h       |  3 +-
 drivers/gpu/drm/i915/i915_gem_evict.c         |  2 +-
 drivers/gpu/drm/i915/i915_request.c           |  1 +
 drivers/gpu/drm/i915/intel_engine_cs.c        |  5 ++
 .../gpu/drm/i915/selftests/i915_gem_context.c |  3 +-
 .../gpu/drm/i915/selftests/igt_flush_test.c   |  2 +-
 .../gpu/drm/i915/selftests/mock_gem_device.c  |  4 ++
 10 files changed, 28 insertions(+), 72 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 0dd680cdb9ce..195e71bb4a4f 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1994,6 +1994,7 @@ struct drm_i915_private {
 
 		struct list_head active_rings;
 		struct list_head closed_vma;
+		unsigned long active_engines;
 		u32 active_requests;
 
 		/**
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 7ec2f68218fc..d17a99f59374 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2889,9 +2889,10 @@ static void assert_kernel_context_is_current(struct drm_i915_private *i915)
 	}
 }
 
-static bool switch_to_kernel_context_sync(struct drm_i915_private *i915)
+static bool switch_to_kernel_context_sync(struct drm_i915_private *i915,
+					  unsigned long mask)
 {
-	if (i915_gem_switch_to_kernel_context(i915))
+	if (i915_gem_switch_to_kernel_context(i915, mask))
 		return false;
 
 	if (i915_gem_wait_for_idle(i915,
@@ -2906,7 +2907,8 @@ static bool switch_to_kernel_context_sync(struct drm_i915_private *i915)
 
 static bool load_power_context(struct drm_i915_private *i915)
 {
-	if (!switch_to_kernel_context_sync(i915))
+	/* Force loading the kernel context on all engines */
+	if (!switch_to_kernel_context_sync(i915, -1))
 		return false;
 
 	/*
@@ -2954,7 +2956,8 @@ i915_gem_idle_work_handler(struct work_struct *work)
 	if (!gt->active_requests && !work_pending(&gt->idle_work.work)) {
 		++gt->active_requests; /* don't requeue idle */
 
-		if (!switch_to_kernel_context_sync(i915)) {
+		if (!switch_to_kernel_context_sync(i915,
+						   i915->gt.active_engines)) {
 			dev_err(i915->drm.dev,
 				"Failed to idle engines, declaring wedged!\n");
 			GEM_TRACE_DUMP();
@@ -4411,7 +4414,7 @@ void i915_gem_suspend(struct drm_i915_private *i915)
 	 * state. Fortunately, the kernel_context is disposable and we do
 	 * not rely on its state.
 	 */
-	if (!switch_to_kernel_context_sync(i915)) {
+	if (!switch_to_kernel_context_sync(i915, i915->gt.active_engines)) {
 		/* Forcibly cancel outstanding work and leave the gpu quiet. */
 		i915_gem_set_wedged(i915);
 	}
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 3c6edebd595b..004ffcfb305d 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -702,63 +702,10 @@ last_request_on_engine(struct i915_timeline *timeline,
 	return NULL;
 }
 
-static bool engine_has_kernel_context_barrier(struct intel_engine_cs *engine)
-{
-	struct drm_i915_private *i915 = engine->i915;
-	const struct intel_context * const ce =
-		to_intel_context(i915->kernel_context, engine);
-	struct i915_timeline *barrier = ce->ring->timeline;
-	struct intel_ring *ring;
-	bool any_active = false;
-
-	lockdep_assert_held(&i915->drm.struct_mutex);
-	list_for_each_entry(ring, &i915->gt.active_rings, active_link) {
-		struct i915_request *rq;
-
-		rq = last_request_on_engine(ring->timeline, engine);
-		if (!rq)
-			continue;
-
-		any_active = true;
-
-		if (rq->hw_context == ce)
-			continue;
-
-		/*
-		 * Was this request submitted after the previous
-		 * switch-to-kernel-context?
-		 */
-		if (!i915_timeline_sync_is_later(barrier, &rq->fence)) {
-			GEM_TRACE("%s needs barrier for %llx:%lld\n",
-				  ring->timeline->name,
-				  rq->fence.context,
-				  rq->fence.seqno);
-			return false;
-		}
-
-		GEM_TRACE("%s has barrier after %llx:%lld\n",
-			  ring->timeline->name,
-			  rq->fence.context,
-			  rq->fence.seqno);
-	}
-
-	/*
-	 * If any other timeline was still active and behind the last barrier,
-	 * then our last switch-to-kernel-context must still be queued and
-	 * will run last (leaving the engine in the kernel context when it
-	 * eventually idles).
-	 */
-	if (any_active)
-		return true;
-
-	/* The engine is idle; check that it is idling in the kernel context. */
-	return engine->last_retired_context == ce;
-}
-
-int i915_gem_switch_to_kernel_context(struct drm_i915_private *i915)
+int i915_gem_switch_to_kernel_context(struct drm_i915_private *i915,
+				      unsigned long mask)
 {
 	struct intel_engine_cs *engine;
-	enum intel_engine_id id;
 
 	GEM_TRACE("awake?=%s\n", yesno(i915->gt.awake));
 
@@ -769,17 +716,11 @@ int i915_gem_switch_to_kernel_context(struct drm_i915_private *i915)
 	if (i915_terminally_wedged(i915))
 		return 0;
 
-	i915_retire_requests(i915);
-
-	for_each_engine(engine, i915, id) {
+	for_each_engine_masked(engine, i915, mask, mask) {
 		struct intel_ring *ring;
 		struct i915_request *rq;
 
 		GEM_BUG_ON(!to_intel_context(i915->kernel_context, engine));
-		if (engine_has_kernel_context_barrier(engine))
-			continue;
-
-		GEM_TRACE("emit barrier on %s\n", engine->name);
 
 		rq = i915_request_alloc(engine, i915->kernel_context);
 		if (IS_ERR(rq))
@@ -803,7 +744,6 @@ int i915_gem_switch_to_kernel_context(struct drm_i915_private *i915)
 			i915_sw_fence_await_sw_fence_gfp(&rq->submit,
 							 &prev->submit,
 							 I915_FENCE_GFP);
-			i915_timeline_sync_set(rq->timeline, &prev->fence);
 		}
 
 		i915_request_add(rq);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index be63666ffaac..c39dbb32a5c6 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -372,7 +372,8 @@ int i915_gem_context_open(struct drm_i915_private *i915,
 void i915_gem_context_close(struct drm_file *file);
 
 int i915_switch_context(struct i915_request *rq);
-int i915_gem_switch_to_kernel_context(struct drm_i915_private *dev_priv);
+int i915_gem_switch_to_kernel_context(struct drm_i915_private *i915,
+				      unsigned long engine_mask);
 
 void i915_gem_context_release(struct kref *ctx_ref);
 struct i915_gem_context *
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index 68d74c50ac39..7d8e90dfca84 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -62,7 +62,7 @@ static int ggtt_flush(struct drm_i915_private *i915)
 	 * the hopes that we can then remove contexts and the like only
 	 * bound by their active reference.
 	 */
-	err = i915_gem_switch_to_kernel_context(i915);
+	err = i915_gem_switch_to_kernel_context(i915, i915->gt.active_engines);
 	if (err)
 		return err;
 
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index bcf3c1a155e2..9d111eedad5a 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1066,6 +1066,7 @@ void i915_request_add(struct i915_request *request)
 		GEM_TRACE("marking %s as active\n", ring->timeline->name);
 		list_add(&ring->active_link, &request->i915->gt.active_rings);
 	}
+	request->i915->gt.active_engines |= request->engine->mask;
 	request->emitted_jiffies = jiffies;
 
 	/*
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 8226871c7781..e283ea693576 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1116,6 +1116,9 @@ bool intel_engine_has_kernel_context(const struct intel_engine_cs *engine)
 
 	lockdep_assert_held(&engine->i915->drm.struct_mutex);
 
+	if (!engine->context_size)
+		return true;
+
 	/*
 	 * Check the last context seen by the engine. If active, it will be
 	 * the last request that remains in the timeline. When idle, it is
@@ -1215,6 +1218,8 @@ void intel_engines_park(struct drm_i915_private *i915)
 		i915_gem_batch_pool_fini(&engine->batch_pool);
 		engine->execlists.no_priolist = false;
 	}
+
+	i915->gt.active_engines = 0;
 }
 
 /**
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
index 1c608b60c55c..7ae5033457b6 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
@@ -1512,7 +1512,8 @@ static int __igt_switch_to_kernel_context(struct drm_i915_private *i915,
 			}
 		}
 
-		err = i915_gem_switch_to_kernel_context(i915);
+		err = i915_gem_switch_to_kernel_context(i915,
+							i915->gt.active_engines);
 		if (err)
 			return err;
 
diff --git a/drivers/gpu/drm/i915/selftests/igt_flush_test.c b/drivers/gpu/drm/i915/selftests/igt_flush_test.c
index e0d3122fd35a..94aee4071a66 100644
--- a/drivers/gpu/drm/i915/selftests/igt_flush_test.c
+++ b/drivers/gpu/drm/i915/selftests/igt_flush_test.c
@@ -14,7 +14,7 @@ int igt_flush_test(struct drm_i915_private *i915, unsigned int flags)
 	cond_resched();
 
 	if (flags & I915_WAIT_LOCKED &&
-	    i915_gem_switch_to_kernel_context(i915)) {
+	    i915_gem_switch_to_kernel_context(i915, i915->gt.active_engines)) {
 		pr_err("Failed to switch back to kernel context; declaring wedged\n");
 		i915_gem_set_wedged(i915);
 	}
diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
index 8581cf5e0e8c..ce384e659220 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
@@ -109,6 +109,10 @@ static void mock_retire_work_handler(struct work_struct *work)
 
 static void mock_idle_work_handler(struct work_struct *work)
 {
+	struct drm_i915_private *i915 =
+		container_of(work, typeof(*i915), gt.idle_work.work);
+
+	i915->gt.active_engines = 0;
 }
 
 static int pm_domain_resume(struct device *dev)
-- 
2.20.1

* [PATCH 13/38] drm/i915: Remove has-kernel-context
  2019-03-01 14:03 [PATCH 01/38] drm/i915/execlists: Suppress redundant preemption Chris Wilson
                   ` (10 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 12/38] drm/i915: Reduce presumption of request ordering for barriers Chris Wilson
@ 2019-03-01 14:03 ` Chris Wilson
  2019-03-01 14:03 ` [PATCH 14/38] drm/i915: Introduce the i915_user_extension_method Chris Wilson
                   ` (28 subsequent siblings)
  40 siblings, 0 replies; 88+ messages in thread
From: Chris Wilson @ 2019-03-01 14:03 UTC (permalink / raw)
  To: intel-gfx

We can no longer assume execution ordering, and in particular we cannot
assume which context will execute last. One side-effect of this is that
we cannot determine if the kernel-context is resident on the GPU, so
remove the routines that claimed to do so.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_active.h      | 13 -----------
 drivers/gpu/drm/i915/i915_gem.c         | 18 --------------
 drivers/gpu/drm/i915/i915_gem_evict.c   | 16 +++----------
 drivers/gpu/drm/i915/intel_engine_cs.c  | 31 -------------------------
 drivers/gpu/drm/i915/intel_ringbuffer.h |  1 -
 5 files changed, 3 insertions(+), 76 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h
index 5fbd9102384b..a049ccd478c6 100644
--- a/drivers/gpu/drm/i915/i915_active.h
+++ b/drivers/gpu/drm/i915/i915_active.h
@@ -108,19 +108,6 @@ i915_active_request_set_retire_fn(struct i915_active_request *active,
 	active->retire = fn ?: i915_active_retire_noop;
 }
 
-static inline struct i915_request *
-__i915_active_request_peek(const struct i915_active_request *active)
-{
-	/*
-	 * Inside the error capture (running with the driver in an unknown
-	 * state), we want to bend the rules slightly (a lot).
-	 *
-	 * Work is in progress to make it safer, in the meantime this keeps
-	 * the known issue from spamming the logs.
-	 */
-	return rcu_dereference_protected(active->request, 1);
-}
-
 /**
  * i915_active_request_raw - return the active request
  * @active - the active tracker
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d17a99f59374..3204f79d7aa6 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2872,23 +2872,6 @@ i915_gem_retire_work_handler(struct work_struct *work)
 				   round_jiffies_up_relative(HZ));
 }
 
-static void assert_kernel_context_is_current(struct drm_i915_private *i915)
-{
-	struct intel_engine_cs *engine;
-	enum intel_engine_id id;
-
-	if (i915_reset_failed(i915))
-		return;
-
-	i915_retire_requests(i915);
-
-	for_each_engine(engine, i915, id) {
-		GEM_BUG_ON(__i915_active_request_peek(&engine->timeline.last_request));
-		GEM_BUG_ON(engine->last_retired_context !=
-			   to_intel_context(i915->kernel_context, engine));
-	}
-}
-
 static bool switch_to_kernel_context_sync(struct drm_i915_private *i915,
 					  unsigned long mask)
 {
@@ -2901,7 +2884,6 @@ static bool switch_to_kernel_context_sync(struct drm_i915_private *i915,
 				   HZ / 10))
 		return false;
 
-	assert_kernel_context_is_current(i915);
 	return true;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index 7d8e90dfca84..060f5903544a 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -38,25 +38,15 @@ I915_SELFTEST_DECLARE(static struct igt_evict_ctl {
 
 static bool ggtt_is_idle(struct drm_i915_private *i915)
 {
-       struct intel_engine_cs *engine;
-       enum intel_engine_id id;
-
-       if (i915->gt.active_requests)
-	       return false;
-
-       for_each_engine(engine, i915, id) {
-	       if (!intel_engine_has_kernel_context(engine))
-		       return false;
-       }
-
-       return true;
+	return !i915->gt.active_requests;
 }
 
 static int ggtt_flush(struct drm_i915_private *i915)
 {
 	int err;
 
-	/* Not everything in the GGTT is tracked via vma (otherwise we
+	/*
+	 * Not everything in the GGTT is tracked via vma (otherwise we
 	 * could evict as required with minimal stalling) so we are forced
 	 * to idle the GPU and explicitly retire outstanding requests in
 	 * the hopes that we can then remove contexts and the like only
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index e283ea693576..c7839c61bb8a 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1100,37 +1100,6 @@ bool intel_engines_are_idle(struct drm_i915_private *i915)
 	return true;
 }
 
-/**
- * intel_engine_has_kernel_context:
- * @engine: the engine
- *
- * Returns true if the last context to be executed on this engine, or has been
- * executed if the engine is already idle, is the kernel context
- * (#i915.kernel_context).
- */
-bool intel_engine_has_kernel_context(const struct intel_engine_cs *engine)
-{
-	const struct intel_context *kernel_context =
-		to_intel_context(engine->i915->kernel_context, engine);
-	struct i915_request *rq;
-
-	lockdep_assert_held(&engine->i915->drm.struct_mutex);
-
-	if (!engine->context_size)
-		return true;
-
-	/*
-	 * Check the last context seen by the engine. If active, it will be
-	 * the last request that remains in the timeline. When idle, it is
-	 * the last executed context as tracked by retirement.
-	 */
-	rq = __i915_active_request_peek(&engine->timeline.last_request);
-	if (rq)
-		return rq->hw_context == kernel_context;
-	else
-		return engine->last_retired_context == kernel_context;
-}
-
 void intel_engines_reset_default_submission(struct drm_i915_private *i915)
 {
 	struct intel_engine_cs *engine;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 850ad892836c..22a345bea13c 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -936,7 +936,6 @@ void intel_engines_sanitize(struct drm_i915_private *i915, bool force);
 bool intel_engine_is_idle(struct intel_engine_cs *engine);
 bool intel_engines_are_idle(struct drm_i915_private *dev_priv);
 
-bool intel_engine_has_kernel_context(const struct intel_engine_cs *engine);
 void intel_engine_lost_context(struct intel_engine_cs *engine);
 
 void intel_engines_park(struct drm_i915_private *i915);
-- 
2.20.1

* [PATCH 14/38] drm/i915: Introduce the i915_user_extension_method
  2019-03-01 14:03 [PATCH 01/38] drm/i915/execlists: Suppress redundant preemption Chris Wilson
                   ` (11 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 13/38] drm/i915: Remove has-kernel-context Chris Wilson
@ 2019-03-01 14:03 ` Chris Wilson
  2019-03-01 15:39   ` Tvrtko Ursulin
  2019-03-01 14:03 ` [PATCH 15/38] drm/i915: Track active engines within a context Chris Wilson
                   ` (27 subsequent siblings)
  40 siblings, 1 reply; 88+ messages in thread
From: Chris Wilson @ 2019-03-01 14:03 UTC (permalink / raw)
  To: intel-gfx

An idea for extending uABI, inspired by Vulkan's extension chains.
Instead of expanding the data struct for each ioctl every time we need
to add a new feature, define an extension chain. As we add optional
interfaces to control the ioctl, we define a new extension struct that
can be linked into the ioctl data only when required by the user. The
key advantage is being able to ignore large control structs for
optional interfaces/extensions, while still being able to process them
in a consistent manner.

In comparison to other extensible ioctls, the key difference is the
use of a linked chain of extension structs vs an array of tagged
pointers. For example,

struct drm_amdgpu_cs_chunk {
        __u32           chunk_id;
        __u32           length_dw;
        __u64           chunk_data;
};

struct drm_amdgpu_cs_in {
        __u32           ctx_id;
        __u32           bo_list_handle;
        __u32           num_chunks;
        __u32           _pad;
        __u64           chunks;
};

allows userspace to pass in an array of pointers to extension structs,
but must therefore keep constructing that array alongside the command
stream. In dynamic situations like that, a linked list is preferred and
does not suffer from extra cache line misses, as the extension structs
themselves must still be loaded separately from the chunks array.
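
For illustration, chaining two extensions from userspace then looks
like the sketch below; struct i915_user_extension is the real uapi
added by this patch, while the FOO/BAR structs and ids are invented:

struct foo_ext {
	struct i915_user_extension base;
	__u64 value;
} foo = {};

struct bar_ext {
	struct i915_user_extension base;
	__u32 flags, pad;
} bar = {};

bar.base.name = BAR_EXT_ID;			/* hypothetical id */
foo.base.name = FOO_EXT_ID;			/* hypothetical id */
foo.base.next_extension = (uintptr_t)&bar;	/* 0 terminates */

args.extensions = (uintptr_t)&foo;	/* kernel walks foo -> bar */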

v2: Apply the tail call optimisation directly to nip the worry of stack
overflow in the bud.
v3: Defend against recursion.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/Makefile               |  1 +
 drivers/gpu/drm/i915/i915_user_extensions.c | 43 +++++++++++++++++++++
 drivers/gpu/drm/i915/i915_user_extensions.h | 20 ++++++++++
 drivers/gpu/drm/i915/i915_utils.h           |  7 ++++
 include/uapi/drm/i915_drm.h                 | 20 ++++++++++
 5 files changed, 91 insertions(+)
 create mode 100644 drivers/gpu/drm/i915/i915_user_extensions.c
 create mode 100644 drivers/gpu/drm/i915/i915_user_extensions.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index a1d834068765..89105b1aaf12 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -46,6 +46,7 @@ i915-y := i915_drv.o \
 	  i915_sw_fence.o \
 	  i915_syncmap.o \
 	  i915_sysfs.o \
+	  i915_user_extensions.o \
 	  intel_csr.o \
 	  intel_device_info.o \
 	  intel_pm.o \
diff --git a/drivers/gpu/drm/i915/i915_user_extensions.c b/drivers/gpu/drm/i915/i915_user_extensions.c
new file mode 100644
index 000000000000..879b4094b2d7
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_user_extensions.c
@@ -0,0 +1,43 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2018 Intel Corporation
+ */
+
+#include <linux/sched/signal.h>
+#include <linux/uaccess.h>
+#include <uapi/drm/i915_drm.h>
+
+#include "i915_user_extensions.h"
+
+int i915_user_extensions(struct i915_user_extension __user *ext,
+			 const i915_user_extension_fn *tbl,
+			 unsigned long count,
+			 void *data)
+{
+	unsigned int stackdepth = 512;
+
+	while (ext) {
+		int err;
+		u64 x;
+
+		if (!stackdepth--) /* recursion vs useful flexibility */
+			return -EINVAL;
+
+		if (get_user(x, &ext->name))
+			return -EFAULT;
+
+		err = -EINVAL;
+		if (x < count && tbl[x])
+			err = tbl[x](ext, data);
+		if (err)
+			return err;
+
+		if (get_user(x, &ext->next_extension))
+			return -EFAULT;
+
+		ext = u64_to_user_ptr(x);
+	}
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/i915/i915_user_extensions.h b/drivers/gpu/drm/i915/i915_user_extensions.h
new file mode 100644
index 000000000000..313a510b068a
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_user_extensions.h
@@ -0,0 +1,20 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2018 Intel Corporation
+ */
+
+#ifndef I915_USER_EXTENSIONS_H
+#define I915_USER_EXTENSIONS_H
+
+struct i915_user_extension;
+
+typedef int (*i915_user_extension_fn)(struct i915_user_extension __user *ext,
+				      void *data);
+
+int i915_user_extensions(struct i915_user_extension __user *ext,
+			 const i915_user_extension_fn *tbl,
+			 unsigned long count,
+			 void *data);
+
+#endif /* I915_USER_EXTENSIONS_H */
diff --git a/drivers/gpu/drm/i915/i915_utils.h b/drivers/gpu/drm/i915/i915_utils.h
index 9726df37c4c4..fcc751aa1ea8 100644
--- a/drivers/gpu/drm/i915/i915_utils.h
+++ b/drivers/gpu/drm/i915/i915_utils.h
@@ -105,6 +105,13 @@
 	__T;								\
 })
 
+#define container_of_user(ptr, type, member) ({				\
+	void __user *__mptr = (void __user *)(ptr);			\
+	BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) &&	\
+			 !__same_type(*(ptr), void),			\
+			 "pointer type mismatch in container_of()");	\
+	((type __user *)(__mptr - offsetof(type, member))); })
+
 static inline u64 ptr_to_u64(const void *ptr)
 {
 	return (uintptr_t)ptr;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index b10eea3f6d24..f4fa0825722a 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -62,6 +62,26 @@ extern "C" {
 #define I915_ERROR_UEVENT		"ERROR"
 #define I915_RESET_UEVENT		"RESET"
 
+/*
+ * i915_user_extension: Base class for defining a chain of extensions
+ *
+ * Many interfaces need to grow over time. In most cases we can simply
+ * extend the struct and have userspace pass in more data. Another option,
+ * as demonstrated by Vulkan's approach to providing extensions for forward
+ * and backward compatibility, is to use a list of optional structs to
+ * provide those extra details.
+ *
+ * The key advantage to using an extension chain is that it allows us to
+ * redefine the interface more easily than an ever growing struct of
+ * increasing complexity, and for large parts of that interface to be
+ * entirely optional. The downside is more pointer chasing; chasing across
+ * the __user boundary with pointers encapsulated inside u64.
+ */
+struct i915_user_extension {
+	__u64 next_extension;
+	__u64 name;
+};
+
 /*
  * MOCS indexes used for GPU surfaces, defining the cacheability of the
  * surface data and the coherency for this data wrt. CPU vs. GPU accesses.
-- 
2.20.1

* [PATCH 15/38] drm/i915: Track active engines within a context
  2019-03-01 14:03 [PATCH 01/38] drm/i915/execlists: Suppress redundant preemption Chris Wilson
                   ` (12 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 14/38] drm/i915: Introduce the i915_user_extension_method Chris Wilson
@ 2019-03-01 14:03 ` Chris Wilson
  2019-03-01 15:46   ` Tvrtko Ursulin
  2019-03-01 14:03 ` [PATCH 16/38] drm/i915: Introduce a context barrier callback Chris Wilson
                   ` (26 subsequent siblings)
  40 siblings, 1 reply; 88+ messages in thread
From: Chris Wilson @ 2019-03-01 14:03 UTC (permalink / raw)
  To: intel-gfx

For use in the next patch: if we track which engines have been used by
the HW, we can reduce the work required to flush our state off the HW
to just those engines.
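
The effect on iteration is easiest to see in the resume path: rather
than visiting every engine and skipping the unused, we walk only the
contexts actually pinned (condensed from the intel_lr_context_resume
hunk below):

struct intel_context *ce;

list_for_each_entry(ce, &ctx->active_engines, active_link) {
	intel_ring_reset(ce->ring, 0);
	__execlists_update_reg_state(ce->engine, ce);
}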

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c           | 18 +++++----------
 drivers/gpu/drm/i915/i915_gem_context.c       |  5 +++++
 drivers/gpu/drm/i915/i915_gem_context.h       |  5 +++++
 drivers/gpu/drm/i915/intel_lrc.c              | 22 +++++++++----------
 drivers/gpu/drm/i915/intel_ringbuffer.c       | 14 +++++++-----
 drivers/gpu/drm/i915/selftests/mock_context.c |  2 ++
 drivers/gpu/drm/i915/selftests/mock_engine.c  |  6 +++++
 7 files changed, 43 insertions(+), 29 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 298371aad445..36c63b087ffd 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -388,12 +388,9 @@ static void print_context_stats(struct seq_file *m,
 	struct i915_gem_context *ctx;
 
 	list_for_each_entry(ctx, &i915->contexts.list, link) {
-		struct intel_engine_cs *engine;
-		enum intel_engine_id id;
-
-		for_each_engine(engine, i915, id) {
-			struct intel_context *ce = to_intel_context(ctx, engine);
+		struct intel_context *ce;
 
+		list_for_each_entry(ce, &ctx->active_engines, active_link) {
 			if (ce->state)
 				per_file_stats(0, ce->state->obj, &kstats);
 			if (ce->ring)
@@ -1880,9 +1877,7 @@ static int i915_context_status(struct seq_file *m, void *unused)
 {
 	struct drm_i915_private *dev_priv = node_to_i915(m->private);
 	struct drm_device *dev = &dev_priv->drm;
-	struct intel_engine_cs *engine;
 	struct i915_gem_context *ctx;
-	enum intel_engine_id id;
 	int ret;
 
 	ret = mutex_lock_interruptible(&dev->struct_mutex);
@@ -1890,6 +1885,8 @@ static int i915_context_status(struct seq_file *m, void *unused)
 		return ret;
 
 	list_for_each_entry(ctx, &dev_priv->contexts.list, link) {
+		struct intel_context *ce;
+
 		seq_puts(m, "HW context ");
 		if (!list_empty(&ctx->hw_id_link))
 			seq_printf(m, "%x [pin %u]", ctx->hw_id,
@@ -1912,11 +1909,8 @@ static int i915_context_status(struct seq_file *m, void *unused)
 		seq_putc(m, ctx->remap_slice ? 'R' : 'r');
 		seq_putc(m, '\n');
 
-		for_each_engine(engine, dev_priv, id) {
-			struct intel_context *ce =
-				to_intel_context(ctx, engine);
-
-			seq_printf(m, "%s: ", engine->name);
+		list_for_each_entry(ce, &ctx->active_engines, active_link) {
+			seq_printf(m, "%s: ", ce->engine->name);
 			if (ce->state)
 				describe_obj(m, ce->state->obj);
 			if (ce->ring)
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 004ffcfb305d..3b5145b30d85 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -224,6 +224,7 @@ static void i915_gem_context_free(struct i915_gem_context *ctx)
 
 	lockdep_assert_held(&ctx->i915->drm.struct_mutex);
 	GEM_BUG_ON(!i915_gem_context_is_closed(ctx));
+	GEM_BUG_ON(!list_empty(&ctx->active_engines));
 
 	release_hw_id(ctx);
 	i915_ppgtt_put(ctx->ppgtt);
@@ -239,6 +240,7 @@ static void i915_gem_context_free(struct i915_gem_context *ctx)
 	put_pid(ctx->pid);
 
 	list_del(&ctx->link);
+	mutex_destroy(&ctx->mutex);
 
 	kfree_rcu(ctx, rcu);
 }
@@ -351,6 +353,7 @@ intel_context_init(struct intel_context *ce,
 		   struct intel_engine_cs *engine)
 {
 	ce->gem_context = ctx;
+	ce->engine = engine;
 
 	INIT_LIST_HEAD(&ce->signal_link);
 	INIT_LIST_HEAD(&ce->signals);
@@ -379,6 +382,8 @@ __create_hw_context(struct drm_i915_private *dev_priv,
 	list_add_tail(&ctx->link, &dev_priv->contexts.list);
 	ctx->i915 = dev_priv;
 	ctx->sched.priority = I915_USER_PRIORITY(I915_PRIORITY_NORMAL);
+	INIT_LIST_HEAD(&ctx->active_engines);
+	mutex_init(&ctx->mutex);
 
 	for (n = 0; n < ARRAY_SIZE(ctx->__engine); n++)
 		intel_context_init(&ctx->__engine[n], ctx, dev_priv->engine[n]);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index c39dbb32a5c6..df48013b581e 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -163,6 +163,9 @@ struct i915_gem_context {
 	atomic_t hw_id_pin_count;
 	struct list_head hw_id_link;
 
+	struct list_head active_engines;
+	struct mutex mutex;
+
 	/**
 	 * @user_handle: userspace identifier
 	 *
@@ -176,7 +179,9 @@ struct i915_gem_context {
 	/** engine: per-engine logical HW state */
 	struct intel_context {
 		struct i915_gem_context *gem_context;
+		struct intel_engine_cs *engine;
 		struct intel_engine_cs *active;
+		struct list_head active_link;
 		struct list_head signal_link;
 		struct list_head signals;
 		struct i915_vma *state;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 2268860cca44..50a7c87705c8 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1276,6 +1276,7 @@ static void execlists_context_unpin(struct intel_context *ce)
 	i915_gem_object_unpin_map(ce->state->obj);
 	i915_vma_unpin(ce->state);
 
+	list_del(&ce->active_link);
 	i915_gem_context_put(ce->gem_context);
 }
 
@@ -1360,6 +1361,11 @@ __execlists_context_pin(struct intel_engine_cs *engine,
 	__execlists_update_reg_state(engine, ce);
 
 	ce->state->obj->pin_global++;
+
+	mutex_lock(&ctx->mutex);
+	list_add(&ce->active_link, &ctx->active_engines);
+	mutex_unlock(&ctx->mutex);
+
 	i915_gem_context_get(ctx);
 	return ce;
 
@@ -2880,9 +2886,8 @@ static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
 
 void intel_lr_context_resume(struct drm_i915_private *i915)
 {
-	struct intel_engine_cs *engine;
 	struct i915_gem_context *ctx;
-	enum intel_engine_id id;
+	struct intel_context *ce;
 
 	/*
 	 * Because we emit WA_TAIL_DWORDS there may be a disparity
@@ -2896,17 +2901,10 @@ void intel_lr_context_resume(struct drm_i915_private *i915)
 	 * simplicity, we just zero everything out.
 	 */
 	list_for_each_entry(ctx, &i915->contexts.list, link) {
-		for_each_engine(engine, i915, id) {
-			struct intel_context *ce =
-				to_intel_context(ctx, engine);
-
-			if (!ce->state)
-				continue;
-
+		list_for_each_entry(ce, &ctx->active_engines, active_link) {
+			GEM_BUG_ON(!ce->ring);
 			intel_ring_reset(ce->ring, 0);
-
-			if (ce->pin_count) /* otherwise done in context_pin */
-				__execlists_update_reg_state(engine, ce);
+			__execlists_update_reg_state(ce->engine, ce);
 		}
 	}
 }
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index e78594ecba6f..764dcc5d5856 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1430,6 +1430,7 @@ static void intel_ring_context_unpin(struct intel_context *ce)
 	__context_unpin_ppgtt(ce->gem_context);
 	__context_unpin(ce);
 
+	list_del(&ce->active_link);
 	i915_gem_context_put(ce->gem_context);
 }
 
@@ -1509,6 +1510,10 @@ __ring_context_pin(struct intel_engine_cs *engine,
 {
 	int err;
 
+	/* One ringbuffer to rule them all */
+	GEM_BUG_ON(!engine->buffer);
+	ce->ring = engine->buffer;
+
 	if (!ce->state && engine->context_size) {
 		struct i915_vma *vma;
 
@@ -1529,12 +1534,11 @@ __ring_context_pin(struct intel_engine_cs *engine,
 	if (err)
 		goto err_unpin;
 
-	i915_gem_context_get(ctx);
-
-	/* One ringbuffer to rule them all */
-	GEM_BUG_ON(!engine->buffer);
-	ce->ring = engine->buffer;
+	mutex_lock(&ctx->mutex);
+	list_add(&ce->active_link, &ctx->active_engines);
+	mutex_unlock(&ctx->mutex);
 
+	i915_gem_context_get(ctx);
 	return ce;
 
 err_unpin:
diff --git a/drivers/gpu/drm/i915/selftests/mock_context.c b/drivers/gpu/drm/i915/selftests/mock_context.c
index b646cdcdd602..353b37b9f78e 100644
--- a/drivers/gpu/drm/i915/selftests/mock_context.c
+++ b/drivers/gpu/drm/i915/selftests/mock_context.c
@@ -44,6 +44,8 @@ mock_context(struct drm_i915_private *i915,
 	INIT_RADIX_TREE(&ctx->handles_vma, GFP_KERNEL);
 	INIT_LIST_HEAD(&ctx->handles_list);
 	INIT_LIST_HEAD(&ctx->hw_id_link);
+	INIT_LIST_HEAD(&ctx->active_engines);
+	mutex_init(&ctx->mutex);
 
 	for (n = 0; n < ARRAY_SIZE(ctx->__engine); n++)
 		intel_context_init(&ctx->__engine[n], ctx, i915->engine[n]);
diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
index c2c954f64226..8032a8a9542f 100644
--- a/drivers/gpu/drm/i915/selftests/mock_engine.c
+++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
@@ -125,6 +125,7 @@ static void hw_delay_complete(struct timer_list *t)
 static void mock_context_unpin(struct intel_context *ce)
 {
 	mock_timeline_unpin(ce->ring->timeline);
+	list_del(&ce->active_link);
 	i915_gem_context_put(ce->gem_context);
 }
 
@@ -160,6 +161,11 @@ mock_context_pin(struct intel_engine_cs *engine,
 	mock_timeline_pin(ce->ring->timeline);
 
 	ce->ops = &mock_context_ops;
+
+	mutex_lock(&ctx->mutex);
+	list_add(&ce->active_link, &ctx->active_engines);
+	mutex_unlock(&ctx->mutex);
+
 	i915_gem_context_get(ctx);
 	return ce;
 
-- 
2.20.1

* [PATCH 16/38] drm/i915: Introduce a context barrier callback
  2019-03-01 14:03 [PATCH 01/38] drm/i915/execlists: Suppress redundant preemption Chris Wilson
                   ` (13 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 15/38] drm/i915: Track active engines within a context Chris Wilson
@ 2019-03-01 14:03 ` Chris Wilson
  2019-03-01 16:12   ` Tvrtko Ursulin
  2019-03-01 14:03 ` [PATCH 17/38] drm/i915: Create/destroy VM (ppGTT) for use with contexts Chris Wilson
                   ` (25 subsequent siblings)
  40 siblings, 1 reply; 88+ messages in thread
From: Chris Wilson @ 2019-03-01 14:03 UTC (permalink / raw)
  To: intel-gfx

In the next patch, we will want to update live state within a context.
As this state may be in use by the GPU and we haven't been explicitly
tracking its activity, we instead attach it to a request we send down
the context to set up its new state, and on retiring that request we
clean up the old state, as we then know that it is no longer live.
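
A sketch of the intended use (retire_old/old_state are illustrative
placeholders, not part of this patch; the next patch supplies the real
callback):

static void retire_old(void *data)
{
	kfree(data); /* runs once every emitted request has retired */
}

err = context_barrier_task(ctx, -1 /* all active engines */,
			   retire_old, old_state);
if (err) {
	/* callback is skipped on error: caller must unwind old_state */
}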

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_context.c       |  74 +++++++++++++
 .../gpu/drm/i915/selftests/i915_gem_context.c | 103 ++++++++++++++++++
 2 files changed, 177 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 3b5145b30d85..91926a407548 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -707,6 +707,80 @@ last_request_on_engine(struct i915_timeline *timeline,
 	return NULL;
 }
 
+struct context_barrier_task {
+	struct i915_active base;
+	void (*task)(void *data);
+	void *data;
+};
+
+static void cb_retire(struct i915_active *base)
+{
+	struct context_barrier_task *cb = container_of(base, typeof(*cb), base);
+
+	if (cb->task)
+		cb->task(cb->data);
+
+	i915_active_fini(&cb->base);
+	kfree(cb);
+}
+
+I915_SELFTEST_DECLARE(static unsigned long context_barrier_inject_fault);
+static int context_barrier_task(struct i915_gem_context *ctx,
+				unsigned long engines,
+				void (*task)(void *data),
+				void *data)
+{
+	struct drm_i915_private *i915 = ctx->i915;
+	struct context_barrier_task *cb;
+	struct intel_context *ce;
+	intel_wakeref_t wakeref;
+	int err = 0;
+
+	lockdep_assert_held(&i915->drm.struct_mutex);
+	GEM_BUG_ON(!task);
+
+	cb = kmalloc(sizeof(*cb), GFP_KERNEL);
+	if (!cb)
+		return -ENOMEM;
+
+	i915_active_init(i915, &cb->base, cb_retire);
+	i915_active_acquire(&cb->base);
+
+	wakeref = intel_runtime_pm_get(i915);
+	list_for_each_entry(ce, &ctx->active_engines, active_link) {
+		struct intel_engine_cs *engine = ce->engine;
+		struct i915_request *rq;
+
+		if (!(ce->engine->mask & engines))
+			continue;
+
+		if (I915_SELFTEST_ONLY(context_barrier_inject_fault &
+				       engine->mask)) {
+			err = -ENXIO;
+			break;
+		}
+
+		rq = i915_request_alloc(engine, ctx);
+		if (IS_ERR(rq)) {
+			err = PTR_ERR(rq);
+			break;
+		}
+
+		err = i915_active_ref(&cb->base, rq->fence.context, rq);
+		i915_request_add(rq);
+		if (err)
+			break;
+	}
+	intel_runtime_pm_put(i915, wakeref);
+
+	cb->task = err ? NULL : task; /* caller needs to unwind instead */
+	cb->data = data;
+
+	i915_active_release(&cb->base);
+
+	return err;
+}
+
 int i915_gem_switch_to_kernel_context(struct drm_i915_private *i915,
 				      unsigned long mask)
 {
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
index 7ae5033457b6..4f7c04247354 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
@@ -1594,10 +1594,113 @@ static int igt_switch_to_kernel_context(void *arg)
 	return err;
 }
 
+static void mock_barrier_task(void *data)
+{
+	unsigned int *counter = data;
+
+	++*counter;
+}
+
+static int mock_context_barrier(void *arg)
+{
+#undef pr_fmt
+#define pr_fmt(x) "context_barrier_task():" # x
+	struct drm_i915_private *i915 = arg;
+	struct i915_gem_context *ctx;
+	struct i915_request *rq;
+	intel_wakeref_t wakeref;
+	unsigned int counter;
+	int err;
+
+	/*
+	 * The context barrier provides us with a callback after it emits
+	 * a request; useful for retiring old state after loading new.
+	 */
+
+	mutex_lock(&i915->drm.struct_mutex);
+
+	ctx = mock_context(i915, "mock");
+	if (IS_ERR(ctx)) {
+		err = PTR_ERR(ctx);
+		goto unlock;
+	}
+
+	counter = 0;
+	err = context_barrier_task(ctx, 0, mock_barrier_task, &counter);
+	if (err) {
+		pr_err("Failed at line %d, err=%d\n", __LINE__, err);
+		goto out;
+	}
+	if (counter == 0) {
+		pr_err("Did not retire immediately with 0 engines\n");
+		err = -EINVAL;
+		goto out;
+	}
+
+	counter = 0;
+	err = context_barrier_task(ctx, -1, mock_barrier_task, &counter);
+	if (err) {
+		pr_err("Failed at line %d, err=%d\n", __LINE__, err);
+		goto out;
+	}
+	if (counter == 0) {
+		pr_err("Did not retire immediately for all inactive engines\n");
+		err = -EINVAL;
+		goto out;
+	}
+
+	rq = ERR_PTR(-ENODEV);
+	with_intel_runtime_pm(i915, wakeref)
+		rq = i915_request_alloc(i915->engine[RCS], ctx);
+	if (IS_ERR(rq)) {
+		err = PTR_ERR(rq);
+		pr_err("Request allocation failed!\n");
+		goto out;
+	}
+	i915_request_add(rq);
+	GEM_BUG_ON(list_empty(&ctx->active_engines));
+
+	counter = 0;
+	context_barrier_inject_fault = BIT(RCS);
+	err = context_barrier_task(ctx, -1, mock_barrier_task, &counter);
+	context_barrier_inject_fault = 0;
+	if (err == -ENXIO)
+		err = 0;
+	else
+		pr_err("Did not hit fault injection!\n");
+	if (counter != 0) {
+		pr_err("Invoked callback on error!\n");
+		err = -EIO;
+	}
+	if (err)
+		goto out;
+
+	counter = 0;
+	err = context_barrier_task(ctx, -1, mock_barrier_task, &counter);
+	if (err) {
+		pr_err("Failed at line %d, err=%d\n", __LINE__, err);
+		goto out;
+	}
+	mock_device_flush(i915);
+	if (counter == 0) {
+		pr_err("Did not retire on each active engines\n");
+		err = -EINVAL;
+		goto out;
+	}
+
+out:
+	mock_context_close(ctx);
+unlock:
+	mutex_unlock(&i915->drm.struct_mutex);
+	return err;
+#undef pr_fmt
+#define pr_fmt(x) x
+}
+
 int i915_gem_context_mock_selftests(void)
 {
 	static const struct i915_subtest tests[] = {
 		SUBTEST(igt_switch_to_kernel_context),
+		SUBTEST(mock_context_barrier),
 	};
 	struct drm_i915_private *i915;
 	int err;
-- 
2.20.1

* [PATCH 17/38] drm/i915: Create/destroy VM (ppGTT) for use with contexts
  2019-03-01 14:03 [PATCH 01/38] drm/i915/execlists: Suppress redundant preemption Chris Wilson
                   ` (14 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 16/38] drm/i915: Introduce a context barrier callback Chris Wilson
@ 2019-03-01 14:03 ` Chris Wilson
  2019-03-06 11:27   ` Tvrtko Ursulin
  2019-03-01 14:03 ` [PATCH 18/38] drm/i915: Extend CONTEXT_CREATE to set parameters upon construction Chris Wilson
                   ` (24 subsequent siblings)
  40 siblings, 1 reply; 88+ messages in thread
From: Chris Wilson @ 2019-03-01 14:03 UTC (permalink / raw)
  To: intel-gfx

In preparation for making the ppGTT binding for a context explicit (to
facilitate reusing the same ppGTT between different contexts), allow
the user to create and destroy named ppGTTs.
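
From userspace the flow is expected to look like the sketch below
(the field layout of drm_i915_gem_vm_control follows the uapi hunk in
this patch; the ioctl wrappers are assumed to be the ones it defines):

struct drm_i915_gem_vm_control ctl = {};
struct drm_i915_gem_context_param arg = {};

ioctl(fd, DRM_IOCTL_I915_GEM_VM_CREATE, &ctl); /* ctl.id = handle */

arg.ctx_id = ctx_id;
arg.param = I915_CONTEXT_PARAM_VM;
arg.value = ctl.id; /* bind the named ppGTT into the context */
ioctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);

/* dropping the handle is fine; the context holds its own reference */
ioctl(fd, DRM_IOCTL_I915_GEM_VM_DESTROY, &ctl);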

v2: Replace global barrier for swapping over the ppgtt and tlbs with a
local context barrier (Tvrtko)
v3: serialise with struct_mutex; it's lazy but required dammit

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.c               |   2 +
 drivers/gpu/drm/i915/i915_drv.h               |   3 +
 drivers/gpu/drm/i915/i915_gem_context.c       | 254 +++++++++++++++++-
 drivers/gpu/drm/i915/i915_gem_context.h       |   5 +
 drivers/gpu/drm/i915/i915_gem_gtt.c           |  17 +-
 drivers/gpu/drm/i915/i915_gem_gtt.h           |  16 +-
 drivers/gpu/drm/i915/selftests/huge_pages.c   |   1 -
 .../gpu/drm/i915/selftests/i915_gem_context.c | 239 ++++++++++++----
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |   1 -
 drivers/gpu/drm/i915/selftests/mock_context.c |   8 +-
 include/uapi/drm/i915_drm.h                   |  36 +++
 11 files changed, 511 insertions(+), 71 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 224bb96b7877..6b75d1b7b8bd 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -3008,6 +3008,8 @@ static const struct drm_ioctl_desc i915_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(I915_PERF_ADD_CONFIG, i915_perf_add_config_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_PERF_REMOVE_CONFIG, i915_perf_remove_config_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_QUERY, i915_query_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(I915_GEM_VM_CREATE, i915_gem_vm_create_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(I915_GEM_VM_DESTROY, i915_gem_vm_destroy_ioctl, DRM_RENDER_ALLOW),
 };
 
 static struct drm_driver driver = {
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 195e71bb4a4f..cbc2b59722ae 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -218,6 +218,9 @@ struct drm_i915_file_private {
 	} mm;
 	struct idr context_idr;
 
+	struct mutex vm_lock;
+	struct idr vm_idr;
+
 	unsigned int bsd_engine;
 
 /*
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 91926a407548..8c35b6019f0d 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -124,6 +124,8 @@ static void lut_close(struct i915_gem_context *ctx)
 		struct i915_vma *vma = rcu_dereference_raw(*slot);
 
 		radix_tree_iter_delete(&ctx->handles_vma, &iter, slot);
+
+		vma->open_count--;
 		__i915_gem_object_release_unless_active(vma->obj);
 	}
 	rcu_read_unlock();
@@ -308,7 +310,7 @@ static void context_close(struct i915_gem_context *ctx)
 	 */
 	lut_close(ctx);
 	if (ctx->ppgtt)
-		i915_ppgtt_close(&ctx->ppgtt->vm);
+		i915_ppgtt_close(ctx->ppgtt);
 
 	ctx->file_priv = ERR_PTR(-EBADF);
 	i915_gem_context_put(ctx);
@@ -447,6 +449,32 @@ static void __destroy_hw_context(struct i915_gem_context *ctx,
 	context_close(ctx);
 }
 
+static struct i915_hw_ppgtt *
+__set_ppgtt(struct i915_gem_context *ctx, struct i915_hw_ppgtt *ppgtt)
+{
+	struct i915_hw_ppgtt *old = ctx->ppgtt;
+
+	i915_ppgtt_open(ppgtt);
+	ctx->ppgtt = i915_ppgtt_get(ppgtt);
+
+	ctx->desc_template = default_desc_template(ctx->i915, ppgtt);
+
+	return old;
+}
+
+static void __assign_ppgtt(struct i915_gem_context *ctx,
+			   struct i915_hw_ppgtt *ppgtt)
+{
+	if (ppgtt == ctx->ppgtt)
+		return;
+
+	ppgtt = __set_ppgtt(ctx, ppgtt);
+	if (ppgtt) {
+		i915_ppgtt_close(ppgtt);
+		i915_ppgtt_put(ppgtt);
+	}
+}
+
 static struct i915_gem_context *
 i915_gem_create_context(struct drm_i915_private *dev_priv,
 			struct drm_i915_file_private *file_priv)
@@ -473,8 +501,8 @@ i915_gem_create_context(struct drm_i915_private *dev_priv,
 			return ERR_CAST(ppgtt);
 		}
 
-		ctx->ppgtt = ppgtt;
-		ctx->desc_template = default_desc_template(dev_priv, ppgtt);
+		__assign_ppgtt(ctx, ppgtt);
+		i915_ppgtt_put(ppgtt);
 	}
 
 	trace_i915_context_create(ctx);
@@ -655,19 +683,29 @@ static int context_idr_cleanup(int id, void *p, void *data)
 	return 0;
 }
 
+static int vm_idr_cleanup(int id, void *p, void *data)
+{
+	i915_ppgtt_put(p);
+	return 0;
+}
+
 int i915_gem_context_open(struct drm_i915_private *i915,
 			  struct drm_file *file)
 {
 	struct drm_i915_file_private *file_priv = file->driver_priv;
 	struct i915_gem_context *ctx;
 
+	mutex_init(&file_priv->vm_lock);
+
 	idr_init(&file_priv->context_idr);
+	idr_init_base(&file_priv->vm_idr, 1);
 
 	mutex_lock(&i915->drm.struct_mutex);
 	ctx = i915_gem_create_context(i915, file_priv);
 	mutex_unlock(&i915->drm.struct_mutex);
 	if (IS_ERR(ctx)) {
 		idr_destroy(&file_priv->context_idr);
+		idr_destroy(&file_priv->vm_idr);
 		return PTR_ERR(ctx);
 	}
 
@@ -684,6 +722,89 @@ void i915_gem_context_close(struct drm_file *file)
 
 	idr_for_each(&file_priv->context_idr, context_idr_cleanup, NULL);
 	idr_destroy(&file_priv->context_idr);
+
+	idr_for_each(&file_priv->vm_idr, vm_idr_cleanup, NULL);
+	idr_destroy(&file_priv->vm_idr);
+
+	mutex_destroy(&file_priv->vm_lock);
+}
+
+int i915_gem_vm_create_ioctl(struct drm_device *dev, void *data,
+			     struct drm_file *file)
+{
+	struct drm_i915_private *i915 = to_i915(dev);
+	struct drm_i915_gem_vm_control *args = data;
+	struct drm_i915_file_private *file_priv = file->driver_priv;
+	struct i915_hw_ppgtt *ppgtt;
+	int err;
+
+	if (!HAS_FULL_PPGTT(i915))
+		return -ENODEV;
+
+	if (args->flags)
+		return -EINVAL;
+
+	if (args->extensions)
+		return -EINVAL;
+
+	ppgtt = i915_ppgtt_create(i915, file_priv);
+	if (IS_ERR(ppgtt))
+		return PTR_ERR(ppgtt);
+
+	err = mutex_lock_interruptible(&file_priv->vm_lock);
+	if (err)
+		goto err_put;
+
+	err = idr_alloc(&file_priv->vm_idr, ppgtt, 0, 0, GFP_KERNEL);
+	mutex_unlock(&file_priv->vm_lock);
+	if (err < 0)
+		goto err_put;
+
+	GEM_BUG_ON(err == 0); /* reserved for default/unassigned ppgtt */
+	ppgtt->user_handle = err;
+	args->id = err;
+	return 0;
+
+err_put:
+	i915_ppgtt_put(ppgtt);
+	return err;
+}
+
+int i915_gem_vm_destroy_ioctl(struct drm_device *dev, void *data,
+			      struct drm_file *file)
+{
+	struct drm_i915_file_private *file_priv = file->driver_priv;
+	struct drm_i915_gem_vm_control *args = data;
+	struct i915_hw_ppgtt *ppgtt;
+	int err;
+	u32 id;
+
+	if (args->flags)
+		return -EINVAL;
+
+	if (args->extensions)
+		return -EINVAL;
+
+	id = args->id;
+	if (!id)
+		return -ENOENT;
+
+	err = mutex_lock_interruptible(&file_priv->vm_lock);
+	if (err)
+		return err;
+
+	ppgtt = idr_remove(&file_priv->vm_idr, id);
+	if (ppgtt) {
+		GEM_BUG_ON(!ppgtt->user_handle);
+		ppgtt->user_handle = 0;
+	}
+
+	mutex_unlock(&file_priv->vm_lock);
+	if (!ppgtt)
+		return -ENOENT;
+
+	i915_ppgtt_put(ppgtt);
+	return 0;
 }
 
 static struct i915_request *
@@ -831,6 +952,120 @@ int i915_gem_switch_to_kernel_context(struct drm_i915_private *i915,
 	return 0;
 }
 
+static int get_ppgtt(struct i915_gem_context *ctx,
+		     struct drm_i915_gem_context_param *args)
+{
+	struct drm_i915_file_private *file_priv = ctx->file_priv;
+	struct i915_hw_ppgtt *ppgtt;
+	int ret;
+
+	if (!ctx->ppgtt)
+		return -ENODEV;
+
+	/* XXX rcu acquire? */
+	ret = mutex_lock_interruptible(&ctx->i915->drm.struct_mutex);
+	if (ret)
+		return ret;
+
+	ppgtt = i915_ppgtt_get(ctx->ppgtt);
+	mutex_unlock(&ctx->i915->drm.struct_mutex);
+
+	ret = mutex_lock_interruptible(&file_priv->vm_lock);
+	if (ret)
+		goto err_put;
+
+	if (!ppgtt->user_handle) {
+		ret = idr_alloc(&file_priv->vm_idr, ppgtt, 0, 0, GFP_KERNEL);
+		GEM_BUG_ON(!ret);
+		if (ret < 0)
+			goto err_unlock;
+
+		ppgtt->user_handle = ret;
+		i915_ppgtt_get(ppgtt);
+	}
+
+	args->size = 0;
+	args->value = ppgtt->user_handle;
+
+	ret = 0;
+err_unlock:
+	mutex_unlock(&file_priv->vm_lock);
+err_put:
+	i915_ppgtt_put(ppgtt);
+	return ret;
+}
+
+static void set_ppgtt_barrier(void *data)
+{
+	struct i915_hw_ppgtt *old = data;
+
+	i915_ppgtt_close(old);
+	i915_ppgtt_put(old);
+}
+
+static int set_ppgtt(struct i915_gem_context *ctx,
+		     struct drm_i915_gem_context_param *args)
+{
+	struct drm_i915_file_private *file_priv = ctx->file_priv;
+	struct i915_hw_ppgtt *ppgtt, *old;
+	int err;
+
+	if (args->size)
+		return -EINVAL;
+
+	if (upper_32_bits(args->value))
+		return -EINVAL;
+
+	if (!ctx->ppgtt)
+		return -ENODEV;
+
+	err = mutex_lock_interruptible(&file_priv->vm_lock);
+	if (err)
+		return err;
+
+	ppgtt = idr_find(&file_priv->vm_idr, args->value);
+	if (ppgtt) {
+		GEM_BUG_ON(ppgtt->user_handle != args->value);
+		i915_ppgtt_get(ppgtt);
+	}
+	mutex_unlock(&file_priv->vm_lock);
+	if (!ppgtt)
+		return -ENOENT;
+
+	err = mutex_lock_interruptible(&ctx->i915->drm.struct_mutex);
+	if (err)
+		goto out;
+
+	if (ppgtt == ctx->ppgtt)
+		goto unlock;
+
+	/* Teardown the existing obj:vma cache, it will have to be rebuilt. */
+	lut_close(ctx);
+
+	old = __set_ppgtt(ctx, ppgtt);
+
+	/*
+	 * We need to flush any requests using the current ppgtt before
+	 * we release it as the requests do not hold a reference themselves,
+	 * only indirectly through the context.
+	 */
+	err = context_barrier_task(ctx, -1, set_ppgtt_barrier, old);
+	if (err) {
+		ctx->ppgtt = old;
+		ctx->desc_template = default_desc_template(ctx->i915, old);
+
+		i915_ppgtt_close(ppgtt);
+		i915_ppgtt_put(ppgtt);
+	}
+
+unlock:
+	mutex_unlock(&ctx->i915->drm.struct_mutex);
+
+out:
+	i915_ppgtt_put(ppgtt);
+	return err;
+}
+
 static bool client_is_banned(struct drm_i915_file_private *file_priv)
 {
 	return atomic_read(&file_priv->ban_score) >= I915_CLIENT_SCORE_BANNED;
@@ -1009,6 +1244,9 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 	case I915_CONTEXT_PARAM_SSEU:
 		ret = get_sseu(ctx, args);
 		break;
+	case I915_CONTEXT_PARAM_VM:
+		ret = get_ppgtt(ctx, args);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
@@ -1306,9 +1544,6 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 		return -ENOENT;
 
 	switch (args->param) {
-	case I915_CONTEXT_PARAM_BAN_PERIOD:
-		ret = -EINVAL;
-		break;
 	case I915_CONTEXT_PARAM_NO_ZEROMAP:
 		if (args->size)
 			ret = -EINVAL;
@@ -1364,9 +1599,16 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 					I915_USER_PRIORITY(priority);
 		}
 		break;
+
 	case I915_CONTEXT_PARAM_SSEU:
 		ret = set_sseu(ctx, args);
 		break;
+
+	case I915_CONTEXT_PARAM_VM:
+		ret = set_ppgtt(ctx, args);
+		break;
+
+	case I915_CONTEXT_PARAM_BAN_PERIOD:
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index df48013b581e..97cf9d3d07ae 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -384,6 +384,11 @@ void i915_gem_context_release(struct kref *ctx_ref);
 struct i915_gem_context *
 i915_gem_context_create_gvt(struct drm_device *dev);
 
+int i915_gem_vm_create_ioctl(struct drm_device *dev, void *data,
+			     struct drm_file *file);
+int i915_gem_vm_destroy_ioctl(struct drm_device *dev, void *data,
+			      struct drm_file *file);
+
 int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
 				  struct drm_file *file);
 int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 99022738cc89..d3f80dbf13ef 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2103,10 +2103,21 @@ i915_ppgtt_create(struct drm_i915_private *i915,
 	return ppgtt;
 }
 
-void i915_ppgtt_close(struct i915_address_space *vm)
+void i915_ppgtt_open(struct i915_hw_ppgtt *ppgtt)
 {
-	GEM_BUG_ON(vm->closed);
-	vm->closed = true;
+	GEM_BUG_ON(ppgtt->vm.closed);
+
+	ppgtt->open_count++;
+}
+
+void i915_ppgtt_close(struct i915_hw_ppgtt *ppgtt)
+{
+	GEM_BUG_ON(!ppgtt->open_count);
+	if (--ppgtt->open_count)
+		return;
+
+	GEM_BUG_ON(ppgtt->vm.closed);
+	ppgtt->vm.closed = true;
 }
 
 static void ppgtt_destroy_vma(struct i915_address_space *vm)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 03ade71b8d9a..bb750318f52a 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -391,11 +391,15 @@ struct i915_hw_ppgtt {
 	struct kref ref;
 
 	unsigned long pd_dirty_rings;
+	unsigned int open_count;
+
 	union {
 		struct i915_pml4 pml4;		/* GEN8+ & 48b PPGTT */
 		struct i915_page_directory_pointer pdp;	/* GEN8+ */
 		struct i915_page_directory pd;		/* GEN6-7 */
 	};
+
+	u32 user_handle;
 };
 
 struct gen6_hw_ppgtt {
@@ -606,12 +610,16 @@ int i915_ppgtt_init_hw(struct drm_i915_private *dev_priv);
 void i915_ppgtt_release(struct kref *kref);
 struct i915_hw_ppgtt *i915_ppgtt_create(struct drm_i915_private *dev_priv,
 					struct drm_i915_file_private *fpriv);
-void i915_ppgtt_close(struct i915_address_space *vm);
-static inline void i915_ppgtt_get(struct i915_hw_ppgtt *ppgtt)
+
+void i915_ppgtt_open(struct i915_hw_ppgtt *ppgtt);
+void i915_ppgtt_close(struct i915_hw_ppgtt *ppgtt);
+
+static inline struct i915_hw_ppgtt *i915_ppgtt_get(struct i915_hw_ppgtt *ppgtt)
 {
-	if (ppgtt)
-		kref_get(&ppgtt->ref);
+	kref_get(&ppgtt->ref);
+	return ppgtt;
 }
+
 static inline void i915_ppgtt_put(struct i915_hw_ppgtt *ppgtt)
 {
 	if (ppgtt)
diff --git a/drivers/gpu/drm/i915/selftests/huge_pages.c b/drivers/gpu/drm/i915/selftests/huge_pages.c
index 4b9ded4ca0f5..c51aa09668cc 100644
--- a/drivers/gpu/drm/i915/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/selftests/huge_pages.c
@@ -1734,7 +1734,6 @@ int i915_gem_huge_page_mock_selftests(void)
 	err = i915_subtests(tests, ppgtt);
 
 out_close:
-	i915_ppgtt_close(&ppgtt->vm);
 	i915_ppgtt_put(ppgtt);
 
 out_unlock:
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
index 4f7c04247354..9133afc03135 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
@@ -372,7 +372,8 @@ static int cpu_fill(struct drm_i915_gem_object *obj, u32 value)
 	return 0;
 }
 
-static int cpu_check(struct drm_i915_gem_object *obj, unsigned int max)
+static noinline int cpu_check(struct drm_i915_gem_object *obj,
+			      unsigned int idx, unsigned int max)
 {
 	unsigned int n, m, needs_flush;
 	int err;
@@ -390,8 +391,10 @@ static int cpu_check(struct drm_i915_gem_object *obj, unsigned int max)
 
 		for (m = 0; m < max; m++) {
 			if (map[m] != m) {
-				pr_err("Invalid value at page %d, offset %d: found %x expected %x\n",
-				       n, m, map[m], m);
+				pr_err("%pS: Invalid value at object %d page %d/%ld, offset %d/%d: found %x expected %x\n",
+				       __builtin_return_address(0), idx,
+				       n, real_page_count(obj), m, max,
+				       map[m], m);
 				err = -EINVAL;
 				goto out_unmap;
 			}
@@ -399,8 +402,9 @@ static int cpu_check(struct drm_i915_gem_object *obj, unsigned int max)
 
 		for (; m < DW_PER_PAGE; m++) {
 			if (map[m] != STACK_MAGIC) {
-				pr_err("Invalid value at page %d, offset %d: found %x expected %x\n",
-				       n, m, map[m], STACK_MAGIC);
+				pr_err("%pS: Invalid value at object %d page %d, offset %d: found %x expected %x (uninitialised)\n",
+				       __builtin_return_address(0), idx, n, m,
+				       map[m], STACK_MAGIC);
 				err = -EINVAL;
 				goto out_unmap;
 			}
@@ -478,12 +482,8 @@ static unsigned long max_dwords(struct drm_i915_gem_object *obj)
 static int igt_ctx_exec(void *arg)
 {
 	struct drm_i915_private *i915 = arg;
-	struct drm_i915_gem_object *obj = NULL;
-	unsigned long ncontexts, ndwords, dw;
-	struct igt_live_test t;
-	struct drm_file *file;
-	IGT_TIMEOUT(end_time);
-	LIST_HEAD(objects);
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
 	int err = -ENODEV;
 
 	/*
@@ -495,38 +495,42 @@ static int igt_ctx_exec(void *arg)
 	if (!DRIVER_CAPS(i915)->has_logical_contexts)
 		return 0;
 
-	file = mock_file(i915);
-	if (IS_ERR(file))
-		return PTR_ERR(file);
+	for_each_engine(engine, i915, id) {
+		struct drm_i915_gem_object *obj = NULL;
+		unsigned long ncontexts, ndwords, dw;
+		struct igt_live_test t;
+		struct drm_file *file;
+		IGT_TIMEOUT(end_time);
+		LIST_HEAD(objects);
 
-	mutex_lock(&i915->drm.struct_mutex);
+		if (!intel_engine_can_store_dword(engine))
+			continue;
 
-	err = igt_live_test_begin(&t, i915, __func__, "");
-	if (err)
-		goto out_unlock;
+		if (!engine->context_size)
+			continue; /* No logical context support in HW */
 
-	ncontexts = 0;
-	ndwords = 0;
-	dw = 0;
-	while (!time_after(jiffies, end_time)) {
-		struct intel_engine_cs *engine;
-		struct i915_gem_context *ctx;
-		unsigned int id;
+		file = mock_file(i915);
+		if (IS_ERR(file))
+			return PTR_ERR(file);
 
-		ctx = i915_gem_create_context(i915, file->driver_priv);
-		if (IS_ERR(ctx)) {
-			err = PTR_ERR(ctx);
+		mutex_lock(&i915->drm.struct_mutex);
+
+		err = igt_live_test_begin(&t, i915, __func__, engine->name);
+		if (err)
 			goto out_unlock;
-		}
 
-		for_each_engine(engine, i915, id) {
+		ncontexts = 0;
+		ndwords = 0;
+		dw = 0;
+		while (!time_after(jiffies, end_time)) {
+			struct i915_gem_context *ctx;
 			intel_wakeref_t wakeref;
 
-			if (!engine->context_size)
-				continue; /* No logical context support in HW */
-
-			if (!intel_engine_can_store_dword(engine))
-				continue;
+			ctx = i915_gem_create_context(i915, file->driver_priv);
+			if (IS_ERR(ctx)) {
+				err = PTR_ERR(ctx);
+				goto out_unlock;
+			}
 
 			if (!obj) {
 				obj = create_test_object(ctx, file, &objects);
@@ -536,7 +540,6 @@ static int igt_ctx_exec(void *arg)
 				}
 			}
 
-			err = 0;
 			with_intel_runtime_pm(i915, wakeref)
 				err = gpu_fill(obj, ctx, engine, dw);
 			if (err) {
@@ -551,32 +554,158 @@ static int igt_ctx_exec(void *arg)
 				obj = NULL;
 				dw = 0;
 			}
+
 			ndwords++;
+			ncontexts++;
 		}
-		ncontexts++;
+
+		pr_info("Submitted %lu contexts to %s, filling %lu dwords\n",
+			ncontexts, engine->name, ndwords);
+
+		ncontexts = dw = 0;
+		list_for_each_entry(obj, &objects, st_link) {
+			unsigned int rem =
+				min_t(unsigned int, ndwords - dw, max_dwords(obj));
+
+			err = cpu_check(obj, ncontexts++, rem);
+			if (err)
+				break;
+
+			dw += rem;
+		}
+
+out_unlock:
+		if (igt_live_test_end(&t))
+			err = -EIO;
+		mutex_unlock(&i915->drm.struct_mutex);
+
+		mock_file_free(i915, file);
+		if (err)
+			return err;
 	}
-	pr_info("Submitted %lu contexts (across %u engines), filling %lu dwords\n",
-		ncontexts, RUNTIME_INFO(i915)->num_engines, ndwords);
 
-	dw = 0;
-	list_for_each_entry(obj, &objects, st_link) {
-		unsigned int rem =
-			min_t(unsigned int, ndwords - dw, max_dwords(obj));
+	return 0;
+}
+
+static int igt_shared_ctx_exec(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	int err = -ENODEV;
+
+	/*
+	 * Create a few different contexts with the same mm and write
+	 * through each ctx using the GPU making sure those writes end
+	 * up in the expected pages of our obj.
+	 */
+
+	for_each_engine(engine, i915, id) {
+		unsigned long ncontexts, ndwords, dw;
+		struct drm_i915_gem_object *obj = NULL;
+		struct i915_gem_context *ctx = NULL;
+		struct i915_gem_context *parent;
+		struct igt_live_test t;
+		struct drm_file *file;
+		IGT_TIMEOUT(end_time);
+		LIST_HEAD(objects);
 
-		err = cpu_check(obj, rem);
+		if (!intel_engine_can_store_dword(engine))
+			continue;
+
+		file = mock_file(i915);
+		if (IS_ERR(file))
+			return PTR_ERR(file);
+
+		mutex_lock(&i915->drm.struct_mutex);
+
+		err = igt_live_test_begin(&t, i915, __func__, engine->name);
 		if (err)
-			break;
+			goto out_unlock;
 
-		dw += rem;
-	}
+		parent = i915_gem_create_context(i915, file->driver_priv);
+		if (IS_ERR(parent)) {
+			err = PTR_ERR(parent);
+			if (err == -ENODEV) /* no logical ctx support */
+				err = 0;
+			goto out_unlock;
+		}
+
+		if (!parent->ppgtt) {
+			err = 0;
+			goto out_unlock;
+		}
+
+		ncontexts = 0;
+		ndwords = 0;
+		dw = 0;
+		while (!time_after(jiffies, end_time)) {
+			intel_wakeref_t wakeref;
+
+			if (ctx)
+				__destroy_hw_context(ctx, file->driver_priv);
+
+			ctx = i915_gem_create_context(i915, file->driver_priv);
+			if (IS_ERR(ctx)) {
+				err = PTR_ERR(ctx);
+				goto out_unlock;
+			}
+
+			__assign_ppgtt(ctx, parent->ppgtt);
+
+			if (!obj) {
+				obj = create_test_object(parent, file, &objects);
+				if (IS_ERR(obj)) {
+					err = PTR_ERR(obj);
+					goto out_unlock;
+				}
+			}
+
+			err = 0;
+			with_intel_runtime_pm(i915, wakeref)
+				err = gpu_fill(obj, ctx, engine, dw);
+			if (err) {
+				pr_err("Failed to fill dword %lu [%lu/%lu] with gpu (%s) in ctx %u [full-ppgtt? %s], err=%d\n",
+				       ndwords, dw, max_dwords(obj),
+				       engine->name, ctx->hw_id,
+				       yesno(!!ctx->ppgtt), err);
+				goto out_unlock;
+			}
+
+			if (++dw == max_dwords(obj)) {
+				obj = NULL;
+				dw = 0;
+			}
+
+			ndwords++;
+			ncontexts++;
+		}
+		pr_info("Submitted %lu contexts to %s, filling %lu dwords\n",
+			ncontexts, engine->name, ndwords);
+
+		ncontexts = dw = 0;
+		list_for_each_entry(obj, &objects, st_link) {
+			unsigned int rem =
+				min_t(unsigned int, ndwords - dw, max_dwords(obj));
+
+			err = cpu_check(obj, ncontexts++, rem);
+			if (err)
+				break;
+
+			dw += rem;
+		}
 
 out_unlock:
-	if (igt_live_test_end(&t))
-		err = -EIO;
-	mutex_unlock(&i915->drm.struct_mutex);
+		if (igt_live_test_end(&t))
+			err = -EIO;
+		mutex_unlock(&i915->drm.struct_mutex);
 
-	mock_file_free(i915, file);
-	return err;
+		mock_file_free(i915, file);
+		if (err)
+			return err;
+	}
+
+	return 0;
 }
 
 static struct i915_vma *rpcs_query_batch(struct i915_vma *vma)
@@ -1048,7 +1177,7 @@ static int igt_ctx_readonly(void *arg)
 	struct drm_i915_gem_object *obj = NULL;
 	struct i915_gem_context *ctx;
 	struct i915_hw_ppgtt *ppgtt;
-	unsigned long ndwords, dw;
+	unsigned long idx, ndwords, dw;
 	struct igt_live_test t;
 	struct drm_file *file;
 	I915_RND_STATE(prng);
@@ -1129,6 +1258,7 @@ static int igt_ctx_readonly(void *arg)
 		ndwords, RUNTIME_INFO(i915)->num_engines);
 
 	dw = 0;
+	idx = 0;
 	list_for_each_entry(obj, &objects, st_link) {
 		unsigned int rem =
 			min_t(unsigned int, ndwords - dw, max_dwords(obj));
@@ -1138,7 +1268,7 @@ static int igt_ctx_readonly(void *arg)
 		if (i915_gem_object_is_readonly(obj))
 			num_writes = 0;
 
-		err = cpu_check(obj, num_writes);
+		err = cpu_check(obj, idx++, num_writes);
 		if (err)
 			break;
 
@@ -1723,6 +1853,7 @@ int i915_gem_context_live_selftests(struct drm_i915_private *dev_priv)
 		SUBTEST(igt_ctx_exec),
 		SUBTEST(igt_ctx_readonly),
 		SUBTEST(igt_ctx_sseu),
+		SUBTEST(igt_shared_ctx_exec),
 		SUBTEST(igt_vm_isolation),
 	};
 
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index 826fd51c331e..57b3d9867070 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -1020,7 +1020,6 @@ static int exercise_ppgtt(struct drm_i915_private *dev_priv,
 
 	err = func(dev_priv, &ppgtt->vm, 0, ppgtt->vm.total, end_time);
 
-	i915_ppgtt_close(&ppgtt->vm);
 	i915_ppgtt_put(ppgtt);
 out_unlock:
 	mutex_unlock(&dev_priv->drm.struct_mutex);
diff --git a/drivers/gpu/drm/i915/selftests/mock_context.c b/drivers/gpu/drm/i915/selftests/mock_context.c
index 353b37b9f78e..5eddf9fcfe8a 100644
--- a/drivers/gpu/drm/i915/selftests/mock_context.c
+++ b/drivers/gpu/drm/i915/selftests/mock_context.c
@@ -55,13 +55,17 @@ mock_context(struct drm_i915_private *i915,
 		goto err_handles;
 
 	if (name) {
+		struct i915_hw_ppgtt *ppgtt;
+
 		ctx->name = kstrdup(name, GFP_KERNEL);
 		if (!ctx->name)
 			goto err_put;
 
-		ctx->ppgtt = mock_ppgtt(i915, name);
-		if (!ctx->ppgtt)
+		ppgtt = mock_ppgtt(i915, name);
+		if (!ppgtt)
 			goto err_put;
+
+		__set_ppgtt(ctx, ppgtt);
 	}
 
 	return ctx;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index f4fa0825722a..9fcfb54a13f2 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -341,6 +341,8 @@ typedef struct _drm_i915_sarea {
 #define DRM_I915_PERF_ADD_CONFIG	0x37
 #define DRM_I915_PERF_REMOVE_CONFIG	0x38
 #define DRM_I915_QUERY			0x39
+#define DRM_I915_GEM_VM_CREATE		0x3a
+#define DRM_I915_GEM_VM_DESTROY		0x3b
 /* Must be kept compact -- no holes */
 
 #define DRM_IOCTL_I915_INIT		DRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT, drm_i915_init_t)
@@ -400,6 +402,8 @@ typedef struct _drm_i915_sarea {
 #define DRM_IOCTL_I915_PERF_ADD_CONFIG	DRM_IOW(DRM_COMMAND_BASE + DRM_I915_PERF_ADD_CONFIG, struct drm_i915_perf_oa_config)
 #define DRM_IOCTL_I915_PERF_REMOVE_CONFIG	DRM_IOW(DRM_COMMAND_BASE + DRM_I915_PERF_REMOVE_CONFIG, __u64)
 #define DRM_IOCTL_I915_QUERY			DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_QUERY, struct drm_i915_query)
+#define DRM_IOCTL_I915_GEM_VM_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_CREATE, struct drm_i915_gem_vm_control)
+#define DRM_IOCTL_I915_GEM_VM_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_VM_DESTROY, struct drm_i915_gem_vm_control)
 
 /* Allow drivers to submit batchbuffers directly to hardware, relying
  * on the security mechanisms provided by hardware.
@@ -1449,6 +1453,26 @@ struct drm_i915_gem_context_destroy {
 	__u32 pad;
 };
 
+/*
+ * DRM_I915_GEM_VM_CREATE -
+ *
+ * Create a new virtual memory address space (ppGTT) for use within a context
+ * on the same file. Extensions can be provided to configure exactly how the
+ * address space is setup upon creation.
+ *
+ * The id of new VM (bound to the fd) for use with I915_CONTEXT_PARAM_VM is
+ * returned.
+ *
+ * DRM_I915_GEM_VM_DESTROY -
+ *
+ * Destroys a previously created VM id.
+ */
+struct drm_i915_gem_vm_control {
+	__u64 extensions;
+	__u32 flags;
+	__u32 id;
+};
+
 struct drm_i915_reg_read {
 	/*
 	 * Register offset.
@@ -1538,7 +1562,19 @@ struct drm_i915_gem_context_param {
  * On creation, all new contexts are marked as recoverable.
  */
 #define I915_CONTEXT_PARAM_RECOVERABLE	0x8
+
+	/*
+	 * The id of the associated virtual memory address space (ppGTT) of
+	 * this context. Can be retrieved and passed to another context
+	 * (on the same fd) for both to use the same ppGTT and so share
+	 * address layouts, and avoid reloading the page tables on context
+	 * switches between themselves.
+	 *
+	 * See DRM_I915_GEM_VM_CREATE and DRM_I915_GEM_VM_DESTROY.
+	 */
+#define I915_CONTEXT_PARAM_VM		0x9
 /* Must be kept compact -- no holes and well documented */
+
 	__u64 value;
 };
 
-- 
2.20.1
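
For illustration, a minimal userspace sketch of driving the interface
added above, assuming libdrm's drmIoctl() wrapper and two contexts
already created on the same fd. The helper name and error handling are
hypothetical, not part of the patch:

	#include <errno.h>
	#include <xf86drm.h>
	#include <drm/i915_drm.h>

	/* Sketch: create one ppGTT and assign it to both contexts so
	 * that they share a single address space. */
	static int share_vm(int fd, __u32 ctx_a, __u32 ctx_b)
	{
		struct drm_i915_gem_vm_control vm = {};
		struct drm_i915_gem_context_param arg = {
			.param = I915_CONTEXT_PARAM_VM,
		};

		if (drmIoctl(fd, DRM_IOCTL_I915_GEM_VM_CREATE, &vm))
			return -errno;

		arg.value = vm.id;

		arg.ctx_id = ctx_a;
		if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg))
			return -errno;

		arg.ctx_id = ctx_b;
		if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg))
			return -errno;

		/* The contexts keep their own references to the ppGTT,
		 * so the user id can be released once assigned. */
		return drmIoctl(fd, DRM_IOCTL_I915_GEM_VM_DESTROY, &vm) ?
		       -errno : 0;
	}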


* [PATCH 18/38] drm/i915: Extend CONTEXT_CREATE to set parameters upon construction
  2019-03-01 14:03 [PATCH 01/38] drm/i915/execlists: Suppress redundant preemption Chris Wilson
                   ` (15 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 17/38] drm/i915: Create/destroy VM (ppGTT) for use with contexts Chris Wilson
@ 2019-03-01 14:03 ` Chris Wilson
  2019-03-01 16:36   ` Tvrtko Ursulin
  2019-03-01 14:03 ` [PATCH 19/38] drm/i915: Allow contexts to share a single timeline across all engines Chris Wilson
                   ` (23 subsequent siblings)
  40 siblings, 1 reply; 88+ messages in thread
From: Chris Wilson @ 2019-03-01 14:03 UTC (permalink / raw)
  To: intel-gfx

It can be useful to have a single ioctl to create a context with all
the initial parameters instead of a series of create + setparam + setparam
ioctls. This extension to context create allows any of those parameters
to be passed in as a linked list of extensions, applied in order to the
newly constructed context.

v2: Make a local copy of user setparam (Tvrtko)
v3: Use flags to detect availability of extension interface
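
As a sketch of the resulting uapi (hypothetical values, assuming
libdrm's drmIoctl() wrapper), a context can now be created with its
scheduling priority already applied:

	struct drm_i915_gem_context_create_ext_setparam p = {
		.base = { .name = I915_CONTEXT_CREATE_EXT_SETPARAM },
		.setparam = {
			/* ctx_id must be zero; the new context is implied */
			.param = I915_CONTEXT_PARAM_PRIORITY,
			.value = 512,
		},
	};
	struct drm_i915_gem_context_create_ext create = {
		.flags = I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS,
		.extensions = (__u64)(uintptr_t)&p,
	};

	drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT, &create);
	/* create.ctx_id now names a context already at the requested
	 * priority; further parameters chain via p.base.next_extension. */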

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.c         |   2 +-
 drivers/gpu/drm/i915/i915_gem_context.c | 447 +++++++++++++-----------
 include/uapi/drm/i915_drm.h             | 166 +++++----
 3 files changed, 339 insertions(+), 276 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 6b75d1b7b8bd..de8effed4381 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -2997,7 +2997,7 @@ static const struct drm_ioctl_desc i915_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(I915_SET_SPRITE_COLORKEY, intel_sprite_set_colorkey_ioctl, DRM_MASTER),
 	DRM_IOCTL_DEF_DRV(I915_GET_SPRITE_COLORKEY, drm_noop, DRM_MASTER),
 	DRM_IOCTL_DEF_DRV(I915_GEM_WAIT, i915_gem_wait_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
-	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_CREATE, i915_gem_context_create_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_CREATE_EXT, i915_gem_context_create_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_DESTROY, i915_gem_context_destroy_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_REG_READ, i915_reg_read_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GET_RESET_STATS, i915_gem_context_reset_stats_ioctl, DRM_RENDER_ALLOW),
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 8c35b6019f0d..f883d99653a3 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -89,6 +89,7 @@
 #include <drm/i915_drm.h>
 #include "i915_drv.h"
 #include "i915_trace.h"
+#include "i915_user_extensions.h"
 #include "intel_lrc_reg.h"
 #include "intel_workarounds.h"
 
@@ -1066,196 +1067,6 @@ static int set_ppgtt(struct i915_gem_context *ctx,
 	return err;
 }
 
-static bool client_is_banned(struct drm_i915_file_private *file_priv)
-{
-	return atomic_read(&file_priv->ban_score) >= I915_CLIENT_SCORE_BANNED;
-}
-
-int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
-				  struct drm_file *file)
-{
-	struct drm_i915_private *i915 = to_i915(dev);
-	struct drm_i915_gem_context_create *args = data;
-	struct drm_i915_file_private *file_priv = file->driver_priv;
-	struct i915_gem_context *ctx;
-	int ret;
-
-	if (!DRIVER_CAPS(i915)->has_logical_contexts)
-		return -ENODEV;
-
-	if (args->pad != 0)
-		return -EINVAL;
-
-	ret = i915_terminally_wedged(i915);
-	if (ret)
-		return ret;
-
-	if (client_is_banned(file_priv)) {
-		DRM_DEBUG("client %s[%d] banned from creating ctx\n",
-			  current->comm,
-			  pid_nr(get_task_pid(current, PIDTYPE_PID)));
-
-		return -EIO;
-	}
-
-	ret = i915_mutex_lock_interruptible(dev);
-	if (ret)
-		return ret;
-
-	ctx = i915_gem_create_context(i915, file_priv);
-	mutex_unlock(&dev->struct_mutex);
-	if (IS_ERR(ctx))
-		return PTR_ERR(ctx);
-
-	GEM_BUG_ON(i915_gem_context_is_kernel(ctx));
-
-	args->ctx_id = ctx->user_handle;
-	DRM_DEBUG("HW context %d created\n", args->ctx_id);
-
-	return 0;
-}
-
-int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
-				   struct drm_file *file)
-{
-	struct drm_i915_gem_context_destroy *args = data;
-	struct drm_i915_file_private *file_priv = file->driver_priv;
-	struct i915_gem_context *ctx;
-	int ret;
-
-	if (args->pad != 0)
-		return -EINVAL;
-
-	if (args->ctx_id == DEFAULT_CONTEXT_HANDLE)
-		return -ENOENT;
-
-	ctx = i915_gem_context_lookup(file_priv, args->ctx_id);
-	if (!ctx)
-		return -ENOENT;
-
-	ret = mutex_lock_interruptible(&dev->struct_mutex);
-	if (ret)
-		goto out;
-
-	__destroy_hw_context(ctx, file_priv);
-	mutex_unlock(&dev->struct_mutex);
-
-out:
-	i915_gem_context_put(ctx);
-	return 0;
-}
-
-static int get_sseu(struct i915_gem_context *ctx,
-		    struct drm_i915_gem_context_param *args)
-{
-	struct drm_i915_gem_context_param_sseu user_sseu;
-	struct intel_engine_cs *engine;
-	struct intel_context *ce;
-	int ret;
-
-	if (args->size == 0)
-		goto out;
-	else if (args->size < sizeof(user_sseu))
-		return -EINVAL;
-
-	if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value),
-			   sizeof(user_sseu)))
-		return -EFAULT;
-
-	if (user_sseu.flags || user_sseu.rsvd)
-		return -EINVAL;
-
-	engine = intel_engine_lookup_user(ctx->i915,
-					  user_sseu.engine_class,
-					  user_sseu.engine_instance);
-	if (!engine)
-		return -EINVAL;
-
-	/* Only use for mutex here is to serialize get_param and set_param. */
-	ret = mutex_lock_interruptible(&ctx->i915->drm.struct_mutex);
-	if (ret)
-		return ret;
-
-	ce = to_intel_context(ctx, engine);
-
-	user_sseu.slice_mask = ce->sseu.slice_mask;
-	user_sseu.subslice_mask = ce->sseu.subslice_mask;
-	user_sseu.min_eus_per_subslice = ce->sseu.min_eus_per_subslice;
-	user_sseu.max_eus_per_subslice = ce->sseu.max_eus_per_subslice;
-
-	mutex_unlock(&ctx->i915->drm.struct_mutex);
-
-	if (copy_to_user(u64_to_user_ptr(args->value), &user_sseu,
-			 sizeof(user_sseu)))
-		return -EFAULT;
-
-out:
-	args->size = sizeof(user_sseu);
-
-	return 0;
-}
-
-int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
-				    struct drm_file *file)
-{
-	struct drm_i915_file_private *file_priv = file->driver_priv;
-	struct drm_i915_gem_context_param *args = data;
-	struct i915_gem_context *ctx;
-	int ret = 0;
-
-	ctx = i915_gem_context_lookup(file_priv, args->ctx_id);
-	if (!ctx)
-		return -ENOENT;
-
-	switch (args->param) {
-	case I915_CONTEXT_PARAM_BAN_PERIOD:
-		ret = -EINVAL;
-		break;
-	case I915_CONTEXT_PARAM_NO_ZEROMAP:
-		args->size = 0;
-		args->value = test_bit(UCONTEXT_NO_ZEROMAP, &ctx->user_flags);
-		break;
-	case I915_CONTEXT_PARAM_GTT_SIZE:
-		args->size = 0;
-
-		if (ctx->ppgtt)
-			args->value = ctx->ppgtt->vm.total;
-		else if (to_i915(dev)->mm.aliasing_ppgtt)
-			args->value = to_i915(dev)->mm.aliasing_ppgtt->vm.total;
-		else
-			args->value = to_i915(dev)->ggtt.vm.total;
-		break;
-	case I915_CONTEXT_PARAM_NO_ERROR_CAPTURE:
-		args->size = 0;
-		args->value = i915_gem_context_no_error_capture(ctx);
-		break;
-	case I915_CONTEXT_PARAM_BANNABLE:
-		args->size = 0;
-		args->value = i915_gem_context_is_bannable(ctx);
-		break;
-	case I915_CONTEXT_PARAM_RECOVERABLE:
-		args->size = 0;
-		args->value = i915_gem_context_is_recoverable(ctx);
-		break;
-	case I915_CONTEXT_PARAM_PRIORITY:
-		args->size = 0;
-		args->value = ctx->sched.priority >> I915_USER_PRIORITY_SHIFT;
-		break;
-	case I915_CONTEXT_PARAM_SSEU:
-		ret = get_sseu(ctx, args);
-		break;
-	case I915_CONTEXT_PARAM_VM:
-		ret = get_ppgtt(ctx, args);
-		break;
-	default:
-		ret = -EINVAL;
-		break;
-	}
-
-	i915_gem_context_put(ctx);
-	return ret;
-}
-
 static int gen8_emit_rpcs_config(struct i915_request *rq,
 				 struct intel_context *ce,
 				 struct intel_sseu sseu)
@@ -1531,18 +1342,11 @@ static int set_sseu(struct i915_gem_context *ctx,
 	return 0;
 }
 
-int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
-				    struct drm_file *file)
+static int ctx_setparam(struct i915_gem_context *ctx,
+			struct drm_i915_gem_context_param *args)
 {
-	struct drm_i915_file_private *file_priv = file->driver_priv;
-	struct drm_i915_gem_context_param *args = data;
-	struct i915_gem_context *ctx;
 	int ret = 0;
 
-	ctx = i915_gem_context_lookup(file_priv, args->ctx_id);
-	if (!ctx)
-		return -ENOENT;
-
 	switch (args->param) {
 	case I915_CONTEXT_PARAM_NO_ZEROMAP:
 		if (args->size)
@@ -1552,6 +1356,7 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 		else
 			clear_bit(UCONTEXT_NO_ZEROMAP, &ctx->user_flags);
 		break;
+
 	case I915_CONTEXT_PARAM_NO_ERROR_CAPTURE:
 		if (args->size)
 			ret = -EINVAL;
@@ -1560,6 +1365,7 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 		else
 			i915_gem_context_clear_no_error_capture(ctx);
 		break;
+
 	case I915_CONTEXT_PARAM_BANNABLE:
 		if (args->size)
 			ret = -EINVAL;
@@ -1586,7 +1392,7 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 
 			if (args->size)
 				ret = -EINVAL;
-			else if (!(to_i915(dev)->caps.scheduler & I915_SCHEDULER_CAP_PRIORITY))
+			else if (!(ctx->i915->caps.scheduler & I915_SCHEDULER_CAP_PRIORITY))
 				ret = -ENODEV;
 			else if (priority > I915_CONTEXT_MAX_USER_PRIORITY ||
 				 priority < I915_CONTEXT_MIN_USER_PRIORITY)
@@ -1614,6 +1420,247 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 		break;
 	}
 
+	return ret;
+}
+
+static int create_setparam(struct i915_user_extension __user *ext, void *data)
+{
+	struct drm_i915_gem_context_create_ext_setparam local;
+
+	if (copy_from_user(&local, ext, sizeof(local)))
+		return -EFAULT;
+
+	if (local.setparam.ctx_id)
+		return -EINVAL;
+
+	return ctx_setparam(data, &local.setparam);
+}
+
+static const i915_user_extension_fn create_extensions[] = {
+	[I915_CONTEXT_CREATE_EXT_SETPARAM] = create_setparam,
+};
+
+static bool client_is_banned(struct drm_i915_file_private *file_priv)
+{
+	return atomic_read(&file_priv->ban_score) >= I915_CLIENT_SCORE_BANNED;
+}
+
+int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
+				  struct drm_file *file)
+{
+	struct drm_i915_private *i915 = to_i915(dev);
+	struct drm_i915_gem_context_create_ext *args = data;
+	struct drm_i915_file_private *file_priv = file->driver_priv;
+	struct i915_gem_context *ctx;
+	int ret;
+
+	if (!DRIVER_CAPS(i915)->has_logical_contexts)
+		return -ENODEV;
+
+	if (args->flags & I915_CONTEXT_CREATE_FLAGS_UNKNOWN)
+		return -EINVAL;
+
+	ret = i915_terminally_wedged(i915);
+	if (ret)
+		return ret;
+
+	if (client_is_banned(file_priv)) {
+		DRM_DEBUG("client %s[%d] banned from creating ctx\n",
+			  current->comm,
+			  pid_nr(get_task_pid(current, PIDTYPE_PID)));
+
+		return -EIO;
+	}
+
+	ret = i915_mutex_lock_interruptible(dev);
+	if (ret)
+		return ret;
+
+	ctx = i915_gem_create_context(i915, file_priv);
+	mutex_unlock(&dev->struct_mutex);
+	if (IS_ERR(ctx))
+		return PTR_ERR(ctx);
+
+	GEM_BUG_ON(i915_gem_context_is_kernel(ctx));
+
+	if (args->flags & I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS) {
+		ret = i915_user_extensions(u64_to_user_ptr(args->extensions),
+					   create_extensions,
+					   ARRAY_SIZE(create_extensions),
+					   ctx);
+		if (ret) {
+			idr_remove(&file_priv->context_idr, ctx->user_handle);
+			context_close(ctx);
+			return ret;
+		}
+	}
+
+	args->ctx_id = ctx->user_handle;
+	DRM_DEBUG("HW context %d created\n", args->ctx_id);
+
+	return 0;
+}
+
+int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
+				   struct drm_file *file)
+{
+	struct drm_i915_gem_context_destroy *args = data;
+	struct drm_i915_file_private *file_priv = file->driver_priv;
+	struct i915_gem_context *ctx;
+	int ret;
+
+	if (args->pad != 0)
+		return -EINVAL;
+
+	if (args->ctx_id == DEFAULT_CONTEXT_HANDLE)
+		return -ENOENT;
+
+	ctx = i915_gem_context_lookup(file_priv, args->ctx_id);
+	if (!ctx)
+		return -ENOENT;
+
+	ret = mutex_lock_interruptible(&dev->struct_mutex);
+	if (ret)
+		goto out;
+
+	__destroy_hw_context(ctx, file_priv);
+	mutex_unlock(&dev->struct_mutex);
+
+out:
+	i915_gem_context_put(ctx);
+	return 0;
+}
+
+static int get_sseu(struct i915_gem_context *ctx,
+		    struct drm_i915_gem_context_param *args)
+{
+	struct drm_i915_gem_context_param_sseu user_sseu;
+	struct intel_engine_cs *engine;
+	struct intel_context *ce;
+	int ret;
+
+	if (args->size == 0)
+		goto out;
+	else if (args->size < sizeof(user_sseu))
+		return -EINVAL;
+
+	if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value),
+			   sizeof(user_sseu)))
+		return -EFAULT;
+
+	if (user_sseu.flags || user_sseu.rsvd)
+		return -EINVAL;
+
+	engine = intel_engine_lookup_user(ctx->i915,
+					  user_sseu.engine_class,
+					  user_sseu.engine_instance);
+	if (!engine)
+		return -EINVAL;
+
+	/* Only use for mutex here is to serialize get_param and set_param. */
+	ret = mutex_lock_interruptible(&ctx->i915->drm.struct_mutex);
+	if (ret)
+		return ret;
+
+	ce = to_intel_context(ctx, engine);
+
+	user_sseu.slice_mask = ce->sseu.slice_mask;
+	user_sseu.subslice_mask = ce->sseu.subslice_mask;
+	user_sseu.min_eus_per_subslice = ce->sseu.min_eus_per_subslice;
+	user_sseu.max_eus_per_subslice = ce->sseu.max_eus_per_subslice;
+
+	mutex_unlock(&ctx->i915->drm.struct_mutex);
+
+	if (copy_to_user(u64_to_user_ptr(args->value), &user_sseu,
+			 sizeof(user_sseu)))
+		return -EFAULT;
+
+out:
+	args->size = sizeof(user_sseu);
+
+	return 0;
+}
+
+int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
+				    struct drm_file *file)
+{
+	struct drm_i915_file_private *file_priv = file->driver_priv;
+	struct drm_i915_gem_context_param *args = data;
+	struct i915_gem_context *ctx;
+	int ret = 0;
+
+	ctx = i915_gem_context_lookup(file_priv, args->ctx_id);
+	if (!ctx)
+		return -ENOENT;
+
+	switch (args->param) {
+	case I915_CONTEXT_PARAM_NO_ZEROMAP:
+		args->size = 0;
+		args->value = test_bit(UCONTEXT_NO_ZEROMAP, &ctx->user_flags);
+		break;
+
+	case I915_CONTEXT_PARAM_GTT_SIZE:
+		args->size = 0;
+		if (ctx->ppgtt)
+			args->value = ctx->ppgtt->vm.total;
+		else if (to_i915(dev)->mm.aliasing_ppgtt)
+			args->value = to_i915(dev)->mm.aliasing_ppgtt->vm.total;
+		else
+			args->value = to_i915(dev)->ggtt.vm.total;
+		break;
+
+	case I915_CONTEXT_PARAM_NO_ERROR_CAPTURE:
+		args->size = 0;
+		args->value = i915_gem_context_no_error_capture(ctx);
+		break;
+
+	case I915_CONTEXT_PARAM_BANNABLE:
+		args->size = 0;
+		args->value = i915_gem_context_is_bannable(ctx);
+		break;
+
+	case I915_CONTEXT_PARAM_RECOVERABLE:
+		args->size = 0;
+		args->value = i915_gem_context_is_recoverable(ctx);
+		break;
+
+	case I915_CONTEXT_PARAM_PRIORITY:
+		args->size = 0;
+		args->value = ctx->sched.priority >> I915_USER_PRIORITY_SHIFT;
+		break;
+
+	case I915_CONTEXT_PARAM_SSEU:
+		ret = get_sseu(ctx, args);
+		break;
+
+	case I915_CONTEXT_PARAM_VM:
+		ret = get_ppgtt(ctx, args);
+		break;
+
+	case I915_CONTEXT_PARAM_BAN_PERIOD:
+	default:
+		ret = -EINVAL;
+		break;
+	}
+
+	i915_gem_context_put(ctx);
+	return ret;
+}
+
+int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
+				    struct drm_file *file)
+{
+	struct drm_i915_file_private *file_priv = file->driver_priv;
+	struct drm_i915_gem_context_param *args = data;
+	struct i915_gem_context *ctx;
+	int ret;
+
+	ctx = i915_gem_context_lookup(file_priv, args->ctx_id);
+	if (!ctx)
+		return -ENOENT;
+
+	ret = ctx_setparam(ctx, args);
+
 	i915_gem_context_put(ctx);
 	return ret;
 }
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 9fcfb54a13f2..eec635fb2e1c 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -392,6 +392,7 @@ typedef struct _drm_i915_sarea {
 #define DRM_IOCTL_I915_GET_SPRITE_COLORKEY DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GET_SPRITE_COLORKEY, struct drm_intel_sprite_colorkey)
 #define DRM_IOCTL_I915_GEM_WAIT		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_WAIT, struct drm_i915_gem_wait)
 #define DRM_IOCTL_I915_GEM_CONTEXT_CREATE	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE, struct drm_i915_gem_context_create)
+#define DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE, struct drm_i915_gem_context_create_ext)
 #define DRM_IOCTL_I915_GEM_CONTEXT_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_DESTROY, struct drm_i915_gem_context_destroy)
 #define DRM_IOCTL_I915_REG_READ			DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_REG_READ, struct drm_i915_reg_read)
 #define DRM_IOCTL_I915_GET_RESET_STATS		DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GET_RESET_STATS, struct drm_i915_reset_stats)
@@ -1443,85 +1444,17 @@ struct drm_i915_gem_wait {
 };
 
 struct drm_i915_gem_context_create {
-	/*  output: id of new context*/
-	__u32 ctx_id;
-	__u32 pad;
-};
-
-struct drm_i915_gem_context_destroy {
-	__u32 ctx_id;
+	__u32 ctx_id; /* output: id of new context*/
 	__u32 pad;
 };
 
-/*
- * DRM_I915_GEM_VM_CREATE -
- *
- * Create a new virtual memory address space (ppGTT) for use within a context
- * on the same file. Extensions can be provided to configure exactly how the
- * address space is setup upon creation.
- *
- * The id of new VM (bound to the fd) for use with I915_CONTEXT_PARAM_VM is
- * returned.
- *
- * DRM_I915_GEM_VM_DESTROY -
- *
- * Destroys a previously created VM id.
- */
-struct drm_i915_gem_vm_control {
-	__u64 extensions;
+struct drm_i915_gem_context_create_ext {
+	__u32 ctx_id; /* output: id of new context*/
 	__u32 flags;
-	__u32 id;
-};
-
-struct drm_i915_reg_read {
-	/*
-	 * Register offset.
-	 * For 64bit wide registers where the upper 32bits don't immediately
-	 * follow the lower 32bits, the offset of the lower 32bits must
-	 * be specified
-	 */
-	__u64 offset;
-#define I915_REG_READ_8B_WA (1ul << 0)
-
-	__u64 val; /* Return value */
-};
-/* Known registers:
- *
- * Render engine timestamp - 0x2358 + 64bit - gen7+
- * - Note this register returns an invalid value if using the default
- *   single instruction 8byte read, in order to workaround that pass
- *   flag I915_REG_READ_8B_WA in offset field.
- *
- */
-
-struct drm_i915_reset_stats {
-	__u32 ctx_id;
-	__u32 flags;
-
-	/* All resets since boot/module reload, for all contexts */
-	__u32 reset_count;
-
-	/* Number of batches lost when active in GPU, for this context */
-	__u32 batch_active;
-
-	/* Number of batches lost pending for execution, for this context */
-	__u32 batch_pending;
-
-	__u32 pad;
-};
-
-struct drm_i915_gem_userptr {
-	__u64 user_ptr;
-	__u64 user_size;
-	__u32 flags;
-#define I915_USERPTR_READ_ONLY 0x1
-#define I915_USERPTR_UNSYNCHRONIZED 0x80000000
-	/**
-	 * Returned handle for the object.
-	 *
-	 * Object handles are nonzero.
-	 */
-	__u32 handle;
+#define I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS	(1u << 0)
+#define I915_CONTEXT_CREATE_FLAGS_UNKNOWN \
+	(-(I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS << 1))
+	__u64 extensions;
 };
 
 struct drm_i915_gem_context_param {
@@ -1637,6 +1570,89 @@ struct drm_i915_gem_context_param_sseu {
 	__u32 rsvd;
 };
 
+struct drm_i915_gem_context_create_ext_setparam {
+#define I915_CONTEXT_CREATE_EXT_SETPARAM 0
+	struct i915_user_extension base;
+	struct drm_i915_gem_context_param setparam;
+};
+
+struct drm_i915_gem_context_destroy {
+	__u32 ctx_id;
+	__u32 pad;
+};
+
+/*
+ * DRM_I915_GEM_VM_CREATE -
+ *
+ * Create a new virtual memory address space (ppGTT) for use within a context
+ * on the same file. Extensions can be provided to configure exactly how the
+ * address space is setup upon creation.
+ *
+ * The id of new VM (bound to the fd) for use with I915_CONTEXT_PARAM_VM is
+ * returned.
+ *
+ * DRM_I915_GEM_VM_DESTROY -
+ *
+ * Destroys a previously created VM id.
+ */
+struct drm_i915_gem_vm_control {
+	__u64 extensions;
+	__u32 flags;
+	__u32 id;
+};
+
+struct drm_i915_reg_read {
+	/*
+	 * Register offset.
+	 * For 64bit wide registers where the upper 32bits don't immediately
+	 * follow the lower 32bits, the offset of the lower 32bits must
+	 * be specified
+	 */
+	__u64 offset;
+#define I915_REG_READ_8B_WA (1ul << 0)
+
+	__u64 val; /* Return value */
+};
+
+/* Known registers:
+ *
+ * Render engine timestamp - 0x2358 + 64bit - gen7+
+ * - Note this register returns an invalid value if using the default
+ *   single instruction 8byte read, in order to workaround that pass
+ *   flag I915_REG_READ_8B_WA in offset field.
+ *
+ */
+
+struct drm_i915_reset_stats {
+	__u32 ctx_id;
+	__u32 flags;
+
+	/* All resets since boot/module reload, for all contexts */
+	__u32 reset_count;
+
+	/* Number of batches lost when active in GPU, for this context */
+	__u32 batch_active;
+
+	/* Number of batches lost pending for execution, for this context */
+	__u32 batch_pending;
+
+	__u32 pad;
+};
+
+struct drm_i915_gem_userptr {
+	__u64 user_ptr;
+	__u64 user_size;
+	__u32 flags;
+#define I915_USERPTR_READ_ONLY 0x1
+#define I915_USERPTR_UNSYNCHRONIZED 0x80000000
+	/**
+	 * Returned handle for the object.
+	 *
+	 * Object handles are nonzero.
+	 */
+	__u32 handle;
+};
+
 enum drm_i915_oa_format {
 	I915_OA_FORMAT_A13 = 1,	    /* HSW only */
 	I915_OA_FORMAT_A29,	    /* HSW only */
-- 
2.20.1


* [PATCH 19/38] drm/i915: Allow contexts to share a single timeline across all engines
  2019-03-01 14:03 [PATCH 01/38] drm/i915/execlists: Suppress redundant preemption Chris Wilson
                   ` (16 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 18/38] drm/i915: Extend CONTEXT_CREATE to set parameters upon construction Chris Wilson
@ 2019-03-01 14:03 ` Chris Wilson
  2019-03-05 15:54   ` Tvrtko Ursulin
  2019-03-01 14:03 ` [PATCH 20/38] drm/i915: Allow userspace to clone contexts on creation Chris Wilson
                   ` (22 subsequent siblings)
  40 siblings, 1 reply; 88+ messages in thread
From: Chris Wilson @ 2019-03-01 14:03 UTC (permalink / raw)
  To: intel-gfx

Previously, our view has always been to run the engines independently
within a context. (Multiple engines happened before we had contexts and
timelines, so they always operated independently and that behaviour
persisted into contexts.) However, at the user level the context often
represents a single timeline (e.g. GL contexts) and userspace must
ensure that the individual engines are serialised to present that
ordering to the client (or forgets about this detail entirely and hopes no
one notices - a fair ploy if the client can only directly control one
engine themselves ;)

In the next patch, we will want to construct a set of engines that
operate as one, that have a single timeline interwoven between them, to
present a single virtual engine to the user. (They submit to the virtual
engine, and we then decide which physical engine to execute on.)

To that end, we want to be able to create contexts which have a single
timeline (fence context) shared between all engines, rather than multiple
timelines.
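
As a sketch of the resulting uapi (assuming the create_ext ioctl from
the previous patch):

	struct drm_i915_gem_context_create_ext create = {
		.flags = I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE,
	};

	drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT, &create);
	/* Requests submitted to any engine of create.ctx_id now execute
	 * in submission order, as if along a single timeline. */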

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_context.c       | 32 ++++++++++++---
 drivers/gpu/drm/i915/i915_gem_context.h       |  3 ++
 drivers/gpu/drm/i915/i915_request.c           | 10 ++++-
 drivers/gpu/drm/i915/i915_request.h           |  5 ++-
 drivers/gpu/drm/i915/i915_sw_fence.c          | 39 ++++++++++++++++---
 drivers/gpu/drm/i915/i915_sw_fence.h          | 13 ++++++-
 drivers/gpu/drm/i915/intel_lrc.c              |  5 ++-
 .../gpu/drm/i915/selftests/i915_gem_context.c | 19 +++++----
 drivers/gpu/drm/i915/selftests/mock_context.c |  2 +-
 include/uapi/drm/i915_drm.h                   |  3 +-
 10 files changed, 104 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index f883d99653a3..d8e2228636ba 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -239,6 +239,9 @@ static void i915_gem_context_free(struct i915_gem_context *ctx)
 			ce->ops->destroy(ce);
 	}
 
+	if (ctx->timeline)
+		i915_timeline_put(ctx->timeline);
+
 	kfree(ctx->name);
 	put_pid(ctx->pid);
 
@@ -478,12 +481,17 @@ static void __assign_ppgtt(struct i915_gem_context *ctx,
 
 static struct i915_gem_context *
 i915_gem_create_context(struct drm_i915_private *dev_priv,
-			struct drm_i915_file_private *file_priv)
+			struct drm_i915_file_private *file_priv,
+			unsigned int flags)
 {
 	struct i915_gem_context *ctx;
 
 	lockdep_assert_held(&dev_priv->drm.struct_mutex);
 
+	if (flags & I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE &&
+	    !HAS_EXECLISTS(dev_priv))
+		return ERR_PTR(-EINVAL);
+
 	/* Reap the most stale context */
 	contexts_free_first(dev_priv);
 
@@ -506,6 +514,18 @@ i915_gem_create_context(struct drm_i915_private *dev_priv,
 		i915_ppgtt_put(ppgtt);
 	}
 
+	if (flags & I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE) {
+		struct i915_timeline *timeline;
+
+		timeline = i915_timeline_create(dev_priv, ctx->name, NULL);
+		if (IS_ERR(timeline)) {
+			__destroy_hw_context(ctx, file_priv);
+			return ERR_CAST(timeline);
+		}
+
+		ctx->timeline = timeline;
+	}
+
 	trace_i915_context_create(ctx);
 
 	return ctx;
@@ -534,7 +554,7 @@ i915_gem_context_create_gvt(struct drm_device *dev)
 	if (ret)
 		return ERR_PTR(ret);
 
-	ctx = i915_gem_create_context(to_i915(dev), NULL);
+	ctx = i915_gem_create_context(to_i915(dev), NULL, 0);
 	if (IS_ERR(ctx))
 		goto out;
 
@@ -570,7 +590,7 @@ i915_gem_context_create_kernel(struct drm_i915_private *i915, int prio)
 	struct i915_gem_context *ctx;
 	int err;
 
-	ctx = i915_gem_create_context(i915, NULL);
+	ctx = i915_gem_create_context(i915, NULL, 0);
 	if (IS_ERR(ctx))
 		return ctx;
 
@@ -702,7 +722,7 @@ int i915_gem_context_open(struct drm_i915_private *i915,
 	idr_init_base(&file_priv->vm_idr, 1);
 
 	mutex_lock(&i915->drm.struct_mutex);
-	ctx = i915_gem_create_context(i915, file_priv);
+	ctx = i915_gem_create_context(i915, file_priv, 0);
 	mutex_unlock(&i915->drm.struct_mutex);
 	if (IS_ERR(ctx)) {
 		idr_destroy(&file_priv->context_idr);
@@ -818,7 +838,7 @@ last_request_on_engine(struct i915_timeline *timeline,
 
 	rq = i915_active_request_raw(&timeline->last_request,
 				     &engine->i915->drm.struct_mutex);
-	if (rq && rq->engine == engine) {
+	if (rq && rq->engine->mask & engine->mask) {
 		GEM_TRACE("last request for %s on engine %s: %llx:%llu\n",
 			  timeline->name, engine->name,
 			  rq->fence.context, rq->fence.seqno);
@@ -1476,7 +1496,7 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
 	if (ret)
 		return ret;
 
-	ctx = i915_gem_create_context(i915, file_priv);
+	ctx = i915_gem_create_context(i915, file_priv, args->flags);
 	mutex_unlock(&dev->struct_mutex);
 	if (IS_ERR(ctx))
 		return PTR_ERR(ctx);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index 97cf9d3d07ae..e1f270c098f0 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -43,6 +43,7 @@ struct drm_i915_private;
 struct drm_i915_file_private;
 struct i915_hw_ppgtt;
 struct i915_request;
+struct i915_timeline;
 struct i915_vma;
 struct intel_ring;
 
@@ -78,6 +79,8 @@ struct i915_gem_context {
 	/** file_priv: owning file descriptor */
 	struct drm_i915_file_private *file_priv;
 
+	struct i915_timeline *timeline;
+
 	/**
 	 * @ppgtt: unique address space (GTT)
 	 *
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 9d111eedad5a..e0807a61dcf4 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1045,8 +1045,14 @@ void i915_request_add(struct i915_request *request)
 	prev = i915_active_request_raw(&timeline->last_request,
 				       &request->i915->drm.struct_mutex);
 	if (prev && !i915_request_completed(prev)) {
-		i915_sw_fence_await_sw_fence(&request->submit, &prev->submit,
-					     &request->submitq);
+		if (is_power_of_2(prev->engine->mask | engine->mask))
+			i915_sw_fence_await_sw_fence(&request->submit,
+						     &prev->submit,
+						     &request->submitq);
+		else
+			__i915_sw_fence_await_dma_fence(&request->submit,
+							&prev->fence,
+							&request->dmaq);
 		if (engine->schedule)
 			__i915_sched_node_add_dependency(&request->sched,
 							 &prev->sched,
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 3e0d62d35226..d4113074a8f6 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -128,7 +128,10 @@ struct i915_request {
 	 * It is used by the driver to then queue the request for execution.
 	 */
 	struct i915_sw_fence submit;
-	wait_queue_entry_t submitq;
+	union {
+		wait_queue_entry_t submitq;
+		struct i915_sw_dma_fence_cb dmaq;
+	};
 	struct list_head execute_cb;
 
 	/*
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c
index 8d1400d378d7..5387aafd3424 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence.c
@@ -359,11 +359,6 @@ int i915_sw_fence_await_sw_fence_gfp(struct i915_sw_fence *fence,
 	return __i915_sw_fence_await_sw_fence(fence, signaler, NULL, gfp);
 }
 
-struct i915_sw_dma_fence_cb {
-	struct dma_fence_cb base;
-	struct i915_sw_fence *fence;
-};
-
 struct i915_sw_dma_fence_cb_timer {
 	struct i915_sw_dma_fence_cb base;
 	struct dma_fence *dma;
@@ -480,6 +475,40 @@ int i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
 	return ret;
 }
 
+static void __dma_i915_sw_fence_wake(struct dma_fence *dma,
+				     struct dma_fence_cb *data)
+{
+	struct i915_sw_dma_fence_cb *cb = container_of(data, typeof(*cb), base);
+
+	i915_sw_fence_complete(cb->fence);
+}
+
+int __i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
+				    struct dma_fence *dma,
+				    struct i915_sw_dma_fence_cb *cb)
+{
+	int ret;
+
+	debug_fence_assert(fence);
+
+	if (dma_fence_is_signaled(dma))
+		return 0;
+
+	cb->fence = fence;
+	i915_sw_fence_await(fence);
+
+	ret = dma_fence_add_callback(dma, &cb->base, __dma_i915_sw_fence_wake);
+	if (ret == 0) {
+		ret = 1;
+	} else {
+		i915_sw_fence_complete(fence);
+		if (ret == -ENOENT) /* fence already signaled */
+			ret = 0;
+	}
+
+	return ret;
+}
+
 int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
 				    struct reservation_object *resv,
 				    const struct dma_fence_ops *exclude,
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.h b/drivers/gpu/drm/i915/i915_sw_fence.h
index 6dec9e1d1102..9cb5c3b307a6 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.h
+++ b/drivers/gpu/drm/i915/i915_sw_fence.h
@@ -9,14 +9,13 @@
 #ifndef _I915_SW_FENCE_H_
 #define _I915_SW_FENCE_H_
 
+#include <linux/dma-fence.h>
 #include <linux/gfp.h>
 #include <linux/kref.h>
 #include <linux/notifier.h> /* for NOTIFY_DONE */
 #include <linux/wait.h>
 
 struct completion;
-struct dma_fence;
-struct dma_fence_ops;
 struct reservation_object;
 
 struct i915_sw_fence {
@@ -68,10 +67,20 @@ int i915_sw_fence_await_sw_fence(struct i915_sw_fence *fence,
 int i915_sw_fence_await_sw_fence_gfp(struct i915_sw_fence *fence,
 				     struct i915_sw_fence *after,
 				     gfp_t gfp);
+
+struct i915_sw_dma_fence_cb {
+	struct dma_fence_cb base;
+	struct i915_sw_fence *fence;
+};
+
+int __i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
+				    struct dma_fence *dma,
+				    struct i915_sw_dma_fence_cb *cb);
 int i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
 				  struct dma_fence *dma,
 				  unsigned long timeout,
 				  gfp_t gfp);
+
 int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
 				    struct reservation_object *resv,
 				    const struct dma_fence_ops *exclude,
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 50a7c87705c8..d50a33c578c5 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2853,7 +2853,10 @@ static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
 		goto error_deref_obj;
 	}
 
-	timeline = i915_timeline_create(ctx->i915, ctx->name, NULL);
+	if (ctx->timeline)
+		timeline = i915_timeline_get(ctx->timeline);
+	else
+		timeline = i915_timeline_create(ctx->i915, ctx->name, NULL);
 	if (IS_ERR(timeline)) {
 		ret = PTR_ERR(timeline);
 		goto error_deref_obj;
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
index 9133afc03135..15f016ca8e0d 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
@@ -76,7 +76,7 @@ static int live_nop_switch(void *arg)
 	}
 
 	for (n = 0; n < nctx; n++) {
-		ctx[n] = i915_gem_create_context(i915, file->driver_priv);
+		ctx[n] = i915_gem_create_context(i915, file->driver_priv, 0);
 		if (IS_ERR(ctx[n])) {
 			err = PTR_ERR(ctx[n]);
 			goto out_unlock;
@@ -526,7 +526,8 @@ static int igt_ctx_exec(void *arg)
 			struct i915_gem_context *ctx;
 			intel_wakeref_t wakeref;
 
-			ctx = i915_gem_create_context(i915, file->driver_priv);
+			ctx = i915_gem_create_context(i915,
+						      file->driver_priv, 0);
 			if (IS_ERR(ctx)) {
 				err = PTR_ERR(ctx);
 				goto out_unlock;
@@ -623,7 +624,8 @@ static int igt_shared_ctx_exec(void *arg)
 		if (err)
 			goto out_unlock;
 
-		parent = i915_gem_create_context(i915, file->driver_priv);
+		parent = i915_gem_create_context(i915,
+						 file->driver_priv, 0);
 		if (IS_ERR(parent)) {
 			err = PTR_ERR(parent);
 			if (err == -ENODEV) /* no logical ctx support */
@@ -645,7 +647,8 @@ static int igt_shared_ctx_exec(void *arg)
 			if (ctx)
 				__destroy_hw_context(ctx, file->driver_priv);
 
-			ctx = i915_gem_create_context(i915, file->driver_priv);
+			ctx = i915_gem_create_context(i915,
+						      file->driver_priv, 0);
 			if (IS_ERR(ctx)) {
 				err = PTR_ERR(ctx);
 				goto out_unlock;
@@ -1091,7 +1094,7 @@ __igt_ctx_sseu(struct drm_i915_private *i915,
 
 	mutex_lock(&i915->drm.struct_mutex);
 
-	ctx = i915_gem_create_context(i915, file->driver_priv);
+	ctx = i915_gem_create_context(i915, file->driver_priv, 0);
 	if (IS_ERR(ctx)) {
 		ret = PTR_ERR(ctx);
 		goto out_unlock;
@@ -1201,7 +1204,7 @@ static int igt_ctx_readonly(void *arg)
 	if (err)
 		goto out_unlock;
 
-	ctx = i915_gem_create_context(i915, file->driver_priv);
+	ctx = i915_gem_create_context(i915, file->driver_priv, 0);
 	if (IS_ERR(ctx)) {
 		err = PTR_ERR(ctx);
 		goto out_unlock;
@@ -1527,13 +1530,13 @@ static int igt_vm_isolation(void *arg)
 	if (err)
 		goto out_unlock;
 
-	ctx_a = i915_gem_create_context(i915, file->driver_priv);
+	ctx_a = i915_gem_create_context(i915, file->driver_priv, 0);
 	if (IS_ERR(ctx_a)) {
 		err = PTR_ERR(ctx_a);
 		goto out_unlock;
 	}
 
-	ctx_b = i915_gem_create_context(i915, file->driver_priv);
+	ctx_b = i915_gem_create_context(i915, file->driver_priv, 0);
 	if (IS_ERR(ctx_b)) {
 		err = PTR_ERR(ctx_b);
 		goto out_unlock;
diff --git a/drivers/gpu/drm/i915/selftests/mock_context.c b/drivers/gpu/drm/i915/selftests/mock_context.c
index 5eddf9fcfe8a..5d0ff2293abc 100644
--- a/drivers/gpu/drm/i915/selftests/mock_context.c
+++ b/drivers/gpu/drm/i915/selftests/mock_context.c
@@ -95,7 +95,7 @@ live_context(struct drm_i915_private *i915, struct drm_file *file)
 {
 	lockdep_assert_held(&i915->drm.struct_mutex);
 
-	return i915_gem_create_context(i915, file->driver_priv);
+	return i915_gem_create_context(i915, file->driver_priv, 0);
 }
 
 struct i915_gem_context *
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index eec635fb2e1c..451d2f36830b 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1452,8 +1452,9 @@ struct drm_i915_gem_context_create_ext {
 	__u32 ctx_id; /* output: id of new context*/
 	__u32 flags;
 #define I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS	(1u << 0)
+#define I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE	(1u << 1)
 #define I915_CONTEXT_CREATE_FLAGS_UNKNOWN \
-	(-(I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS << 1))
+	(-(I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE << 1))
 	__u64 extensions;
 };
 
-- 
2.20.1


* [PATCH 20/38] drm/i915: Allow userspace to clone contexts on creation
  2019-03-01 14:03 [PATCH 01/38] drm/i915/execlists: Suppress redundant preemption Chris Wilson
                   ` (17 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 19/38] drm/i915: Allow contexts to share a single timeline across all engines Chris Wilson
@ 2019-03-01 14:03 ` Chris Wilson
  2019-03-01 14:03 ` [PATCH 21/38] drm/i915: Fix I915_EXEC_RING_MASK Chris Wilson
                   ` (21 subsequent siblings)
  40 siblings, 0 replies; 88+ messages in thread
From: Chris Wilson @ 2019-03-01 14:03 UTC (permalink / raw)
  To: intel-gfx

A use case arose out of handling context recovery in mesa, whereby they
wish to recreate a context with fresh logical state but preserve all
other details of the original. Currently, they create a new context and
iterate over which bits they want to copy across, but it would be much
more convenient if they were able to just pass in a target context to
clone during creation. This essentially extends the setparam during
creation to pull the details from a target context instead of the user
supplied parameters.
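
As a sketch of the intended use (hypothetical flag selection, assuming
libdrm's drmIoctl() wrapper), a recovering client could ask for a new
context with fresh logical state that inherits everything else:

	struct drm_i915_gem_context_create_ext_clone clone = {
		.base = { .name = I915_CONTEXT_CREATE_EXT_CLONE },
		.clone = broken_ctx_id,
		.flags = I915_CONTEXT_CLONE_VM |
			 I915_CONTEXT_CLONE_SCHED |
			 I915_CONTEXT_CLONE_SSEU |
			 I915_CONTEXT_CLONE_FLAGS,
	};
	struct drm_i915_gem_context_create_ext create = {
		.flags = I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS,
		.extensions = (__u64)(uintptr_t)&clone,
	};

	drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT, &create);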

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 71 +++++++++++++++++++++++++
 include/uapi/drm/i915_drm.h             | 14 +++++
 2 files changed, 85 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index d8e2228636ba..48fb2ffb5f8c 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -1456,8 +1456,79 @@ static int create_setparam(struct i915_user_extension __user *ext, void *data)
 	return ctx_setparam(data, &local.setparam);
 }
 
+static void clone_sseu(struct i915_gem_context *dst,
+		       struct i915_gem_context *src)
+{
+	const struct intel_sseu default_sseu =
+		intel_device_default_sseu(dst->i915);
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+
+	for_each_engine(engine, dst->i915, id) {
+		struct intel_context *ce;
+		struct intel_sseu sseu;
+
+		ce = to_intel_context(src, engine);
+		if (!ce)
+			continue;
+
+		sseu = ce->sseu;
+		if (!memcmp(&sseu, &default_sseu, sizeof(sseu)))
+			continue;
+
+		ce = to_intel_context(dst, engine);
+		ce->sseu = sseu;
+	}
+}
+
+static int create_clone(struct i915_user_extension __user *ext, void *data)
+{
+	struct drm_i915_gem_context_create_ext_clone local;
+	struct i915_gem_context *dst = data;
+	struct i915_gem_context *src;
+
+	if (copy_from_user(&local, ext, sizeof(local)))
+		return -EFAULT;
+
+	if (local.flags & I915_CONTEXT_CLONE_UNKNOWN)
+		return -EINVAL;
+
+	if (local.rsvd)
+		return -EINVAL;
+
+	rcu_read_lock();
+	src = __i915_gem_context_lookup_rcu(dst->file_priv, local.clone);
+	rcu_read_unlock();
+	if (!src)
+		return -ENOENT;
+
+	if (local.flags & I915_CONTEXT_CLONE_FLAGS)
+		dst->user_flags = src->user_flags;
+
+	if (local.flags & I915_CONTEXT_CLONE_SCHED)
+		dst->sched = src->sched;
+
+	if (local.flags & I915_CONTEXT_CLONE_SSEU)
+		clone_sseu(dst, src);
+
+	if (local.flags & I915_CONTEXT_CLONE_TIMELINE && src->timeline) {
+		if (dst->timeline)
+			i915_timeline_put(dst->timeline);
+		dst->timeline = i915_timeline_get(src->timeline);
+	}
+
+	if (local.flags & I915_CONTEXT_CLONE_VM && src->ppgtt) {
+		if (dst->ppgtt)
+			i915_ppgtt_put(dst->ppgtt);
+		dst->ppgtt = i915_ppgtt_get(src->ppgtt);
+	}
+
+	return 0;
+}
+
 static const i915_user_extension_fn create_extensions[] = {
 	[I915_CONTEXT_CREATE_EXT_SETPARAM] = create_setparam,
+	[I915_CONTEXT_CREATE_EXT_CLONE] = create_clone,
 };
 
 static bool client_is_banned(struct drm_i915_file_private *file_priv)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 451d2f36830b..60cbb2e4f140 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1577,6 +1577,20 @@ struct drm_i915_gem_context_create_ext_setparam {
 	struct drm_i915_gem_context_param setparam;
 };
 
+struct drm_i915_gem_context_create_ext_clone {
+#define I915_CONTEXT_CREATE_EXT_CLONE 1
+	struct i915_user_extension base;
+	__u32 clone;
+	__u32 flags;
+#define I915_CONTEXT_CLONE_FLAGS	(1u << 0)
+#define I915_CONTEXT_CLONE_SCHED	(1u << 1)
+#define I915_CONTEXT_CLONE_SSEU		(1u << 2)
+#define I915_CONTEXT_CLONE_TIMELINE	(1u << 3)
+#define I915_CONTEXT_CLONE_VM		(1u << 4)
+#define I915_CONTEXT_CLONE_UNKNOWN -(I915_CONTEXT_CLONE_VM << 1)
+	__u64 rsvd;
+};
+
 struct drm_i915_gem_context_destroy {
 	__u32 ctx_id;
 	__u32 pad;
-- 
2.20.1


* [PATCH 21/38] drm/i915: Fix I915_EXEC_RING_MASK
  2019-03-01 14:03 [PATCH 01/38] drm/i915/execlists: Suppress redundant preemption Chris Wilson
                   ` (18 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 20/38] drm/i915: Allow userspace to clone contexts on creation Chris Wilson
@ 2019-03-01 14:03 ` Chris Wilson
  2019-03-01 15:29   ` Tvrtko Ursulin
  2019-03-01 14:03 ` [PATCH 22/38] drm/i915: Remove last traces of exec-id (GEM_BUSY) Chris Wilson
                   ` (20 subsequent siblings)
  40 siblings, 1 reply; 88+ messages in thread
From: Chris Wilson @ 2019-03-01 14:03 UTC (permalink / raw)
  To: intel-gfx; +Cc: tvrtko.ursulin, Chris Wilson, Tvrtko Ursulin, stable

This was supposed to be a mask of all known rings, but it is being used
by execbuffer to filter out invalid rings, and so is instead mapping high
unused values onto valid rings. Instead of a mask of all known rings,
we need it to be the mask of all possible rings.
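
For example, with the old 3-bit mask an out-of-range engine selector
silently aliased onto a real engine instead of being rejected:

	unsigned int ring = flags & I915_EXEC_RING_MASK;

	/* flags = 8, old mask 0x7:  ring = 8 & 0x7  = 0, which is
	 * I915_EXEC_DEFAULT, so the batch silently runs on render.
	 * flags = 8, new mask 0x3f: ring = 8 & 0x3f = 8, which matches
	 * no known ring, so execbuf now rejects it with -EINVAL. */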

Fixes: 549f7365820a ("drm/i915: Enable SandyBridge blitter ring")
Fixes: de1add360522 ("drm/i915: Decouple execbuf uAPI from internal implementation")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v4.6+
---
 include/uapi/drm/i915_drm.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 60cbb2e4f140..4e59cc87527b 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1004,7 +1004,7 @@ struct drm_i915_gem_execbuffer2 {
 	 * struct drm_i915_gem_exec_fence *fences.
 	 */
 	__u64 cliprects_ptr;
-#define I915_EXEC_RING_MASK              (7<<0)
+#define I915_EXEC_RING_MASK              (0x3f)
 #define I915_EXEC_DEFAULT                (0<<0)
 #define I915_EXEC_RENDER                 (1<<0)
 #define I915_EXEC_BSD                    (2<<0)
-- 
2.20.1



* [PATCH 22/38] drm/i915: Remove last traces of exec-id (GEM_BUSY)
  2019-03-01 14:03 [PATCH 01/38] drm/i915/execlists: Suppress redundant preemption Chris Wilson
                   ` (19 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 21/38] drm/i915: Fix I915_EXEC_RING_MASK Chris Wilson
@ 2019-03-01 14:03 ` Chris Wilson
  2019-03-01 14:03 ` [PATCH 23/38] drm/i915: Re-arrange execbuf so context is known before engine Chris Wilson
                   ` (19 subsequent siblings)
  40 siblings, 0 replies; 88+ messages in thread
From: Chris Wilson @ 2019-03-01 14:03 UTC (permalink / raw)
  To: intel-gfx

As we now allow per-context engines, the legacy concept of
I915_EXEC_RING no longer applies universally. We are still exposing the
unrelated exec-id in GEM_BUSY, so transition this ioctl (once more
slightly changing its ABI, but no one cares) over to only reporting the
uabi-class (not the instance, as we cannot foreseeably fit those into the
small bitmask).

The only user of the extended ring information from GEM_BUSY is ddx/sna,
which tries to use the non-rcs busyness information to guide which
engine to use for subsequent operations on foreign bo. All that matters
for it is the decision between rcs and !rcs, so it is unaffected by the
change in higher bits.
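
A sketch of decoding the new report (assuming libdrm's drmIoctl()
wrapper and the uapi engine-class values):

	struct drm_i915_gem_busy busy = { .handle = handle };

	drmIoctl(fd, DRM_IOCTL_I915_GEM_BUSY, &busy);

	/* Low 16 bits: uabi-class of the writer, plus one (0 means idle) */
	int written_by_rcs =
		(busy.busy & 0xffff) == I915_ENGINE_CLASS_RENDER + 1;

	/* High 16 bits: one read flag per uabi-class */
	int read_by_rcs =
		busy.busy & (0x10000 << I915_ENGINE_CLASS_RENDER);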

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c         | 32 +++++++++++++------------
 drivers/gpu/drm/i915/intel_engine_cs.c  | 10 --------
 drivers/gpu/drm/i915/intel_ringbuffer.h |  1 -
 include/uapi/drm/i915_drm.h             | 32 +++++++++++++------------
 4 files changed, 34 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 3204f79d7aa6..90bb951b8c99 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3810,20 +3810,17 @@ i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
 
 static __always_inline unsigned int __busy_read_flag(unsigned int id)
 {
-	/* Note that we could alias engines in the execbuf API, but
-	 * that would be very unwise as it prevents userspace from
-	 * fine control over engine selection. Ahem.
-	 *
-	 * This should be something like EXEC_MAX_ENGINE instead of
-	 * I915_NUM_ENGINES.
-	 */
-	BUILD_BUG_ON(I915_NUM_ENGINES > 16);
+	if (id == I915_ENGINE_CLASS_INVALID)
+		return 0xffff0000;
+
+	GEM_BUG_ON(id >= 16);
 	return 0x10000 << id;
 }
 
 static __always_inline unsigned int __busy_write_id(unsigned int id)
 {
-	/* The uABI guarantees an active writer is also amongst the read
+	/*
+	 * The uABI guarantees an active writer is also amongst the read
 	 * engines. This would be true if we accessed the activity tracking
 	 * under the lock, but as we perform the lookup of the object and
 	 * its activity locklessly we can not guarantee that the last_write
@@ -3831,16 +3828,20 @@ static __always_inline unsigned int __busy_write_id(unsigned int id)
 	 * last_read - hence we always set both read and write busy for
 	 * last_write.
 	 */
-	return id | __busy_read_flag(id);
+	if (id == I915_ENGINE_CLASS_INVALID)
+		return 0xffffffff;
+
+	return (id + 1) | __busy_read_flag(id);
 }
 
 static __always_inline unsigned int
 __busy_set_if_active(const struct dma_fence *fence,
 		     unsigned int (*flag)(unsigned int id))
 {
-	struct i915_request *rq;
+	const struct i915_request *rq;
 
-	/* We have to check the current hw status of the fence as the uABI
+	/*
+	 * We have to check the current hw status of the fence as the uABI
 	 * guarantees forward progress. We could rely on the idle worker
 	 * to eventually flush us, but to minimise latency just ask the
 	 * hardware.
@@ -3851,11 +3852,11 @@ __busy_set_if_active(const struct dma_fence *fence,
 		return 0;
 
 	/* opencode to_request() in order to avoid const warnings */
-	rq = container_of(fence, struct i915_request, fence);
+	rq = container_of(fence, const struct i915_request, fence);
 	if (i915_request_completed(rq))
 		return 0;
 
-	return flag(rq->engine->uabi_id);
+	return flag(rq->engine->uabi_class);
 }
 
 static __always_inline unsigned int
@@ -3889,7 +3890,8 @@ i915_gem_busy_ioctl(struct drm_device *dev, void *data,
 	if (!obj)
 		goto out;
 
-	/* A discrepancy here is that we do not report the status of
+	/*
+	 * A discrepancy here is that we do not report the status of
 	 * non-i915 fences, i.e. even though we may report the object as idle,
 	 * a call to set-domain may still stall waiting for foreign rendering.
 	 * This also means that wait-ioctl may report an object as busy,
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index c7839c61bb8a..2559dcc25a6a 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -84,7 +84,6 @@ static const struct engine_class_info intel_engine_classes[] = {
 #define MAX_MMIO_BASES 3
 struct engine_info {
 	unsigned int hw_id;
-	unsigned int uabi_id;
 	u8 class;
 	u8 instance;
 	/* mmio bases table *must* be sorted in reverse gen order */
@@ -97,7 +96,6 @@ struct engine_info {
 static const struct engine_info intel_engines[] = {
 	[RCS] = {
 		.hw_id = RCS_HW,
-		.uabi_id = I915_EXEC_RENDER,
 		.class = RENDER_CLASS,
 		.instance = 0,
 		.mmio_bases = {
@@ -106,7 +104,6 @@ static const struct engine_info intel_engines[] = {
 	},
 	[BCS] = {
 		.hw_id = BCS_HW,
-		.uabi_id = I915_EXEC_BLT,
 		.class = COPY_ENGINE_CLASS,
 		.instance = 0,
 		.mmio_bases = {
@@ -115,7 +112,6 @@ static const struct engine_info intel_engines[] = {
 	},
 	[VCS] = {
 		.hw_id = VCS_HW,
-		.uabi_id = I915_EXEC_BSD,
 		.class = VIDEO_DECODE_CLASS,
 		.instance = 0,
 		.mmio_bases = {
@@ -126,7 +122,6 @@ static const struct engine_info intel_engines[] = {
 	},
 	[VCS2] = {
 		.hw_id = VCS2_HW,
-		.uabi_id = I915_EXEC_BSD,
 		.class = VIDEO_DECODE_CLASS,
 		.instance = 1,
 		.mmio_bases = {
@@ -136,7 +131,6 @@ static const struct engine_info intel_engines[] = {
 	},
 	[VCS3] = {
 		.hw_id = VCS3_HW,
-		.uabi_id = I915_EXEC_BSD,
 		.class = VIDEO_DECODE_CLASS,
 		.instance = 2,
 		.mmio_bases = {
@@ -145,7 +139,6 @@ static const struct engine_info intel_engines[] = {
 	},
 	[VCS4] = {
 		.hw_id = VCS4_HW,
-		.uabi_id = I915_EXEC_BSD,
 		.class = VIDEO_DECODE_CLASS,
 		.instance = 3,
 		.mmio_bases = {
@@ -154,7 +147,6 @@ static const struct engine_info intel_engines[] = {
 	},
 	[VECS] = {
 		.hw_id = VECS_HW,
-		.uabi_id = I915_EXEC_VEBOX,
 		.class = VIDEO_ENHANCEMENT_CLASS,
 		.instance = 0,
 		.mmio_bases = {
@@ -164,7 +156,6 @@ static const struct engine_info intel_engines[] = {
 	},
 	[VECS2] = {
 		.hw_id = VECS2_HW,
-		.uabi_id = I915_EXEC_VEBOX,
 		.class = VIDEO_ENHANCEMENT_CLASS,
 		.instance = 1,
 		.mmio_bases = {
@@ -324,7 +315,6 @@ intel_engine_setup(struct drm_i915_private *dev_priv,
 	engine->class = info->class;
 	engine->instance = info->instance;
 
-	engine->uabi_id = info->uabi_id;
 	engine->uabi_class = intel_engine_classes[info->class].uabi_class;
 
 	engine->context_size = __intel_engine_context_size(dev_priv,
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 22a345bea13c..13819de57ef9 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -336,7 +336,6 @@ struct intel_engine_cs {
 	unsigned int guc_id;
 	intel_engine_mask_t mask;
 
-	u8 uabi_id;
 	u8 uabi_class;
 
 	u8 class;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 4e59cc87527b..50d154954d5f 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1152,32 +1152,34 @@ struct drm_i915_gem_busy {
 	 * as busy may become idle before the ioctl is completed.
 	 *
 	 * Furthermore, if the object is busy, which engine is busy is only
-	 * provided as a guide. There are race conditions which prevent the
-	 * report of which engines are busy from being always accurate.
-	 * However, the converse is not true. If the object is idle, the
-	 * result of the ioctl, that all engines are idle, is accurate.
+	 * provided as a guide and only indirectly by reporting its class
+	 * (there may be more than one engine in each class). There are race
+	 * conditions which prevent the report of which engines are busy from
+	 * being always accurate.  However, the converse is not true. If the
+	 * object is idle, the result of the ioctl, that all engines are idle,
+	 * is accurate.
 	 *
 	 * The returned dword is split into two fields to indicate both
-	 * the engines on which the object is being read, and the
-	 * engine on which it is currently being written (if any).
+	 * the engine classes on which the object is being read, and the
+	 * engine class on which it is currently being written (if any).
 	 *
 	 * The low word (bits 0:15) indicate if the object is being written
 	 * to by any engine (there can only be one, as the GEM implicit
 	 * synchronisation rules force writes to be serialised). Only the
-	 * engine for the last write is reported.
+	 * engine class (offset by 1, I915_ENGINE_CLASS_RENDER is reported as
+	 * 1 not 0 etc) for the last write is reported.
 	 *
-	 * The high word (bits 16:31) are a bitmask of which engines are
-	 * currently reading from the object. Multiple engines may be
+	 * The high word (bits 16:31) is a bitmask of which engine classes
+	 * are currently reading from the object. Multiple engines may be
 	 * reading from the object simultaneously.
 	 *
-	 * The value of each engine is the same as specified in the
-	 * EXECBUFFER2 ioctl, i.e. I915_EXEC_RENDER, I915_EXEC_BSD etc.
-	 * Note I915_EXEC_DEFAULT is a symbolic value and is mapped to
-	 * the I915_EXEC_RENDER engine for execution, and so it is never
+	 * The value of each engine class is the same as specified in the
+	 * I915_CONTEXT_SET_ENGINES parameter and via perf, i.e.
+	 * I915_ENGINE_CLASS_RENDER, I915_ENGINE_CLASS_COPY, etc.
+	 * Note that I915_EXEC_DEFAULT is a symbolic value and is never
 	 * reported as active itself. Some hardware may have parallel
 	 * execution engines, e.g. multiple media engines, which are
-	 * mapped to the same identifier in the EXECBUFFER2 ioctl and
-	 * so are not separately reported for busyness.
+	 * mapped to the same class identifier and so are not separately
+	 * reported for busyness.
 	 *
 	 * Caveat emptor:
 	 * Only the boolean result of this query is reliable; that is whether
-- 
2.20.1


* [PATCH 23/38] drm/i915: Re-arrange execbuf so context is known before engine
  2019-03-01 14:03 [PATCH 01/38] drm/i915/execlists: Suppress redundant preemption Chris Wilson
                   ` (20 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 22/38] drm/i915: Remove last traces of exec-id (GEM_BUSY) Chris Wilson
@ 2019-03-01 14:03 ` Chris Wilson
  2019-03-01 15:33   ` Tvrtko Ursulin
  2019-03-01 14:03 ` [PATCH 24/38] drm/i915: Allow a context to define its set of engines Chris Wilson
                   ` (18 subsequent siblings)
  40 siblings, 1 reply; 88+ messages in thread
From: Chris Wilson @ 2019-03-01 14:03 UTC (permalink / raw)
  To: intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Needed for a following patch.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 07c0af316f86..53d0d70c97fa 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -2312,10 +2312,6 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	if (args->flags & I915_EXEC_IS_PINNED)
 		eb.batch_flags |= I915_DISPATCH_PINNED;
 
-	eb.engine = eb_select_engine(eb.i915, file, args);
-	if (!eb.engine)
-		return -EINVAL;
-
 	if (args->flags & I915_EXEC_FENCE_IN) {
 		in_fence = sync_file_get_fence(lower_32_bits(args->rsvd2));
 		if (!in_fence)
@@ -2340,6 +2336,12 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	if (unlikely(err))
 		goto err_destroy;
 
+	eb.engine = eb_select_engine(eb.i915, file, args);
+	if (!eb.engine) {
+		err = -EINVAL;
+		goto err_engine;
+	}
+
 	/*
 	 * Take a local wakeref for preparing to dispatch the execbuf as
 	 * we expect to access the hardware fairly frequently in the
@@ -2505,6 +2507,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	mutex_unlock(&dev->struct_mutex);
 err_rpm:
 	intel_runtime_pm_put(eb.i915, wakeref);
+err_engine:
 	i915_gem_context_put(eb.ctx);
 err_destroy:
 	eb_destroy(&eb);
-- 
2.20.1


* [PATCH 24/38] drm/i915: Allow a context to define its set of engines
  2019-03-01 14:03 [PATCH 01/38] drm/i915/execlists: Suppress redundant preemption Chris Wilson
                   ` (21 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 23/38] drm/i915: Re-arrange execbuf so context is known before engine Chris Wilson
@ 2019-03-01 14:03 ` Chris Wilson
  2019-03-01 14:03 ` [PATCH 25/38] drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[] Chris Wilson
                   ` (17 subsequent siblings)
  40 siblings, 0 replies; 88+ messages in thread
From: Chris Wilson @ 2019-03-01 14:03 UTC (permalink / raw)
  To: intel-gfx

Over the last few years, we have debated how to extend the user API to
support an increase in the number of engines, that may be sparse and
even be heterogeneous within a class (not all video decoders created
equal). We settled on using (class, instance) tuples to identify a
specific engine, with an API for the user to construct a map of engines
to capabilities. Into this picture, we then add a challenge of virtual
engines; one user engine that maps behind the scenes to any number of
physical engines. To keep it general, we want the user to have full
control over that mapping. To that end, we allow the user to constrain a
context to define the set of engines that it can access, order fully
controlled by the user via (class, instance). With such precise control
in context setup, we can continue to use the existing execbuf uABI of
specifying a single index; only now it no longer maps automagically onto
the global engines, but instead uses the user-defined engine map from
the context.

The I915_EXEC_DEFAULT slot is left empty, and is invalid for use by
execbuf. Its use will be revealed in the next patch.
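
For illustration, a hedged userspace sketch of programming such a map
(the struct layout mirrors the i915_context_param_engines added below;
fd and ctx_id are assumed, and the engine choices are arbitrary):

	struct {
		__u64 extensions;
		struct {
			__u16 engine_class;
			__u16 engine_instance;
		} engines[3];
	} __attribute__((packed)) map = {
		.engines = {
			/* slot 0 (I915_EXEC_DEFAULT) left empty */
			{ I915_ENGINE_CLASS_INVALID,
			  I915_ENGINE_CLASS_INVALID_NONE },
			{ I915_ENGINE_CLASS_RENDER, 0 },
			{ I915_ENGINE_CLASS_VIDEO, 1 },
		},
	};
	struct drm_i915_gem_context_param arg = {
		.ctx_id = ctx_id,
		.param = I915_CONTEXT_PARAM_ENGINES,
		.size = sizeof(map),
		.value = (__u64)(uintptr_t)&map,
	};

	drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
	/* execbuf ring selector 2 now picks map.engines[2], i.e. vcs1 */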

v2: Fixup freeing of local on success of get_engines()

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c    | 190 ++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_gem_context.h    |   4 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  22 ++-
 include/uapi/drm/i915_drm.h                |  34 +++-
 4 files changed, 237 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 48fb2ffb5f8c..85067a3dc72c 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -99,6 +99,21 @@ static struct i915_global_context {
 	struct kmem_cache *slab_luts;
 } global;
 
+static struct intel_engine_cs *
+lookup_user_engine(struct i915_gem_context *ctx,
+		   unsigned long flags, u16 class, u16 instance)
+#define LOOKUP_USER_INDEX BIT(0)
+{
+	if (flags & LOOKUP_USER_INDEX) {
+		if (instance >= ctx->nengine)
+			return NULL;
+
+		return ctx->engines[instance];
+	}
+
+	return intel_engine_lookup_user(ctx->i915, class, instance);
+}
+
 struct i915_lut_handle *i915_lut_handle_alloc(void)
 {
 	return kmem_cache_alloc(global.slab_luts, GFP_KERNEL);
@@ -232,6 +247,8 @@ static void i915_gem_context_free(struct i915_gem_context *ctx)
 	release_hw_id(ctx);
 	i915_ppgtt_put(ctx->ppgtt);
 
+	kfree(ctx->engines);
+
 	for (n = 0; n < ARRAY_SIZE(ctx->__engine); n++) {
 		struct intel_context *ce = &ctx->__engine[n];
 
@@ -1339,9 +1356,9 @@ static int set_sseu(struct i915_gem_context *ctx,
 	if (user_sseu.flags || user_sseu.rsvd)
 		return -EINVAL;
 
-	engine = intel_engine_lookup_user(i915,
-					  user_sseu.engine_class,
-					  user_sseu.engine_instance);
+	engine = lookup_user_engine(ctx, 0,
+				    user_sseu.engine_class,
+				    user_sseu.engine_instance);
 	if (!engine)
 		return -EINVAL;
 
@@ -1359,9 +1376,154 @@ static int set_sseu(struct i915_gem_context *ctx,
 
 	args->size = sizeof(user_sseu);
 
+	return 0;
+};
+
+struct set_engines {
+	struct i915_gem_context *ctx;
+	struct intel_engine_cs **engines;
+	unsigned int nengine;
+};
+
+static const i915_user_extension_fn set_engines__extensions[] = {
+};
+
+static int
+set_engines(struct i915_gem_context *ctx,
+	    const struct drm_i915_gem_context_param *args)
+{
+	struct i915_context_param_engines __user *user;
+	struct set_engines set = { .ctx = ctx };
+	u64 size, extensions;
+	unsigned int n;
+	int err;
+
+	user = u64_to_user_ptr(args->value);
+	size = args->size;
+	if (!size)
+		goto out;
+
+	BUILD_BUG_ON(!IS_ALIGNED(sizeof(*user), sizeof(*user->class_instance)));
+	if (size < sizeof(*user) || size % sizeof(*user->class_instance))
+		return -EINVAL;
+
+	set.nengine = (size - sizeof(*user)) / sizeof(*user->class_instance);
+	if (set.nengine == 0 || set.nengine > I915_EXEC_RING_MASK)
+		return -EINVAL;
+
+	set.engines = kmalloc_array(set.nengine,
+				    sizeof(*set.engines),
+				    GFP_KERNEL);
+	if (!set.engines)
+		return -ENOMEM;
+
+	for (n = 0; n < set.nengine; n++) {
+		u16 class, inst;
+
+		if (get_user(class, &user->class_instance[n].engine_class) ||
+		    get_user(inst, &user->class_instance[n].engine_instance)) {
+			kfree(set.engines);
+			return -EFAULT;
+		}
+
+		if (class == (u16)I915_ENGINE_CLASS_INVALID &&
+		    inst == (u16)I915_ENGINE_CLASS_INVALID_NONE) {
+			set.engines[n] = NULL;
+			continue;
+		}
+
+		set.engines[n] = lookup_user_engine(ctx, 0, class, inst);
+		if (!set.engines[n]) {
+			kfree(set.engines);
+			return -ENOENT;
+		}
+	}
+
+	err = -EFAULT;
+	if (!get_user(extensions, &user->extensions))
+		err = i915_user_extensions(u64_to_user_ptr(extensions),
+					   set_engines__extensions,
+					   ARRAY_SIZE(set_engines__extensions),
+					   &set);
+	if (err) {
+		kfree(set.engines);
+		return err;
+	}
+
+out:
+	mutex_lock(&ctx->i915->drm.struct_mutex);
+	kfree(ctx->engines);
+	ctx->engines = set.engines;
+	ctx->nengine = set.nengine;
+	mutex_unlock(&ctx->i915->drm.struct_mutex);
+
 	return 0;
 }
 
+static int
+get_engines(struct i915_gem_context *ctx,
+	    struct drm_i915_gem_context_param *args)
+{
+	struct i915_context_param_engines *local;
+	unsigned int n, count, size;
+	int err = 0;
+
+restart:
+	count = READ_ONCE(ctx->nengine);
+	if (count > (INT_MAX - sizeof(*local)) / sizeof(*local->class_instance))
+		return -ENOMEM; /* unrepresentable! */
+
+	size = sizeof(*local) + count * sizeof(*local->class_instance);
+	if (!args->size) {
+		args->size = size;
+		return 0;
+	}
+	if (args->size < size)
+		return -EINVAL;
+
+	local = kmalloc(size, GFP_KERNEL);
+	if (!local)
+		return -ENOMEM;
+
+	if (mutex_lock_interruptible(&ctx->i915->drm.struct_mutex)) {
+		err = -EINTR;
+		goto out;
+	}
+
+	if (READ_ONCE(ctx->nengine) != count) {
+		mutex_unlock(&ctx->i915->drm.struct_mutex);
+		kfree(local);
+		goto restart;
+	}
+
+	local->extensions = 0;
+	for (n = 0; n < count; n++) {
+		if (ctx->engines[n]) {
+			local->class_instance[n].engine_class =
+				ctx->engines[n]->uabi_class;
+			local->class_instance[n].engine_instance =
+				ctx->engines[n]->instance;
+		} else {
+			local->class_instance[n].engine_class =
+				I915_ENGINE_CLASS_INVALID;
+			local->class_instance[n].engine_instance =
+				I915_ENGINE_CLASS_INVALID_NONE;
+		}
+	}
+
+	mutex_unlock(&ctx->i915->drm.struct_mutex);
+
+	if (copy_to_user(u64_to_user_ptr(args->value), local, size)) {
+		err = -EFAULT;
+		goto out;
+	}
+	args->size = size;
+
+out:
+	kfree(local);
+	return err;
+}
+
 static int ctx_setparam(struct i915_gem_context *ctx,
 			struct drm_i915_gem_context_param *args)
 {
@@ -1434,6 +1596,10 @@ static int ctx_setparam(struct i915_gem_context *ctx,
 		ret = set_ppgtt(ctx, args);
 		break;
 
+	case I915_CONTEXT_PARAM_ENGINES:
+		ret = set_engines(ctx, args);
+		break;
+
 	case I915_CONTEXT_PARAM_BAN_PERIOD:
 	default:
 		ret = -EINVAL;
@@ -1523,6 +1689,14 @@ static int create_clone(struct i915_user_extension __user *ext, void *data)
 		dst->ppgtt = i915_ppgtt_get(src->ppgtt);
 	}
 
+	if (local.flags & I915_CONTEXT_CLONE_ENGINES && src->nengine) {
+		dst->engines = kmemdup(src->engines,
+				       sizeof(*src->engines) * src->nengine,
+				       GFP_KERNEL);
+		if (!dst->engines)
+			return -ENOMEM;
+	}
+
 	return 0;
 }
 
@@ -1642,9 +1816,9 @@ static int get_sseu(struct i915_gem_context *ctx,
 	if (user_sseu.flags || user_sseu.rsvd)
 		return -EINVAL;
 
-	engine = intel_engine_lookup_user(ctx->i915,
-					  user_sseu.engine_class,
-					  user_sseu.engine_instance);
+	engine = lookup_user_engine(ctx, 0,
+				    user_sseu.engine_class,
+				    user_sseu.engine_instance);
 	if (!engine)
 		return -EINVAL;
 
@@ -1728,6 +1902,10 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 		ret = get_ppgtt(ctx, args);
 		break;
 
+	case I915_CONTEXT_PARAM_ENGINES:
+		ret = get_engines(ctx, args);
+		break;
+
 	case I915_CONTEXT_PARAM_BAN_PERIOD:
 	default:
 		ret = -EINVAL;
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index e1f270c098f0..f09b5badbe73 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -79,6 +79,8 @@ struct i915_gem_context {
 	/** file_priv: owning file descriptor */
 	struct drm_i915_file_private *file_priv;
 
+	struct intel_engine_cs **engines;
+
 	struct i915_timeline *timeline;
 
 	/**
@@ -148,6 +150,8 @@ struct i915_gem_context {
 #define CONTEXT_CLOSED			1
 #define CONTEXT_FORCE_SINGLE_SUBMISSION	2
 
+	unsigned int nengine;
+
 	/**
 	 * @hw_id: - unique identifier for the context
 	 *
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 53d0d70c97fa..ff5810a932a8 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -2090,13 +2090,23 @@ static const enum intel_engine_id user_ring_map[I915_USER_RINGS + 1] = {
 };
 
 static struct intel_engine_cs *
-eb_select_engine(struct drm_i915_private *dev_priv,
+eb_select_engine(struct i915_execbuffer *eb,
 		 struct drm_file *file,
 		 struct drm_i915_gem_execbuffer2 *args)
 {
 	unsigned int user_ring_id = args->flags & I915_EXEC_RING_MASK;
 	struct intel_engine_cs *engine;
 
+	if (eb->ctx->engines) {
+		if (user_ring_id >= eb->ctx->nengine) {
+			DRM_DEBUG("execbuf with unknown ring: %u\n",
+				  user_ring_id);
+			return NULL;
+		}
+
+		return eb->ctx->engines[user_ring_id];
+	}
+
 	if (user_ring_id > I915_USER_RINGS) {
 		DRM_DEBUG("execbuf with unknown ring: %u\n", user_ring_id);
 		return NULL;
@@ -2109,11 +2119,11 @@ eb_select_engine(struct drm_i915_private *dev_priv,
 		return NULL;
 	}
 
-	if (user_ring_id == I915_EXEC_BSD && HAS_BSD2(dev_priv)) {
+	if (user_ring_id == I915_EXEC_BSD && HAS_BSD2(eb->i915)) {
 		unsigned int bsd_idx = args->flags & I915_EXEC_BSD_MASK;
 
 		if (bsd_idx == I915_EXEC_BSD_DEFAULT) {
-			bsd_idx = gen8_dispatch_bsd_engine(dev_priv, file);
+			bsd_idx = gen8_dispatch_bsd_engine(eb->i915, file);
 		} else if (bsd_idx >= I915_EXEC_BSD_RING1 &&
 			   bsd_idx <= I915_EXEC_BSD_RING2) {
 			bsd_idx >>= I915_EXEC_BSD_SHIFT;
@@ -2124,9 +2134,9 @@ eb_select_engine(struct drm_i915_private *dev_priv,
 			return NULL;
 		}
 
-		engine = dev_priv->engine[_VCS(bsd_idx)];
+		engine = eb->i915->engine[_VCS(bsd_idx)];
 	} else {
-		engine = dev_priv->engine[user_ring_map[user_ring_id]];
+		engine = eb->i915->engine[user_ring_map[user_ring_id]];
 	}
 
 	if (!engine) {
@@ -2336,7 +2346,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	if (unlikely(err))
 		goto err_destroy;
 
-	eb.engine = eb_select_engine(eb.i915, file, args);
+	eb.engine = eb_select_engine(&eb, file, args);
 	if (!eb.engine) {
 		err = -EINVAL;
 		goto err_engine;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 50d154954d5f..85ccba4f04e8 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -124,6 +124,8 @@ enum drm_i915_gem_engine_class {
 	I915_ENGINE_CLASS_INVALID	= -1
 };
 
+#define I915_ENGINE_CLASS_INVALID_NONE -1
+
 /**
  * DOC: perf_events exposed by i915 through /sys/bus/event_sources/drivers/i915
  *
@@ -1509,6 +1511,26 @@ struct drm_i915_gem_context_param {
 	 * See DRM_I915_GEM_VM_CREATE and DRM_I915_GEM_VM_DESTROY.
 	 */
 #define I915_CONTEXT_PARAM_VM		0x9
+
+/*
+ * I915_CONTEXT_PARAM_ENGINES:
+ *
+ * Bind this context to operate on this subset of available engines. Henceforth,
+ * the I915_EXEC_RING selector for DRM_IOCTL_I915_GEM_EXECBUFFER2 operates as
+ * an index into this array of engines, with I915_EXEC_DEFAULT selecting
+ * engine[0] and so on upwards. Slots 0...N are filled in using the
+ * specified (class, instance).
+ * Use
+ *	engine_class: I915_ENGINE_CLASS_INVALID,
+ *	engine_instance: I915_ENGINE_CLASS_INVALID_NONE
+ * to specify a gap in the array that can be filled in later, e.g. by a
+ * virtual engine used for load balancing.
+ *
+ * Setting the number of engines bound to the context to 0, by passing a
+ * zero-sized argument, will revert to the default settings.
+ *
+ * See struct i915_context_param_engines.
+ */
+#define I915_CONTEXT_PARAM_ENGINES	0xa
 /* Must be kept compact -- no holes and well documented */
 
 	__u64 value;
@@ -1573,6 +1595,15 @@ struct drm_i915_gem_context_param_sseu {
 	__u32 rsvd;
 };
 
+struct i915_context_param_engines {
+	__u64 extensions; /* linked chain of extension blocks, 0 terminates */
+
+	struct {
+		__u16 engine_class; /* see enum drm_i915_gem_engine_class */
+		__u16 engine_instance;
+	} class_instance[0];
+} __attribute__((packed));
+
 struct drm_i915_gem_context_create_ext_setparam {
 #define I915_CONTEXT_CREATE_EXT_SETPARAM 0
 	struct i915_user_extension base;
@@ -1589,7 +1620,8 @@ struct drm_i915_gem_context_create_ext_clone {
 #define I915_CONTEXT_CLONE_SSEU		(1u << 2)
 #define I915_CONTEXT_CLONE_TIMELINE	(1u << 3)
 #define I915_CONTEXT_CLONE_VM		(1u << 4)
-#define I915_CONTEXT_CLONE_UNKNOWN -(I915_CONTEXT_CLONE_VM << 1)
+#define I915_CONTEXT_CLONE_ENGINES	(1u << 5)
+#define I915_CONTEXT_CLONE_UNKNOWN -(I915_CONTEXT_CLONE_ENGINES << 1)
 	__u64 rsvd;
 };
 
-- 
2.20.1


* [PATCH 25/38] drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[]
  2019-03-01 14:03 [PATCH 01/38] drm/i915/execlists: Suppress redundant preemption Chris Wilson
                   ` (22 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 24/38] drm/i915: Allow a context to define its set of engines Chris Wilson
@ 2019-03-01 14:03 ` Chris Wilson
  2019-03-01 14:03 ` [PATCH 26/38] drm/i915: Pass around the intel_context Chris Wilson
                   ` (16 subsequent siblings)
  40 siblings, 0 replies; 88+ messages in thread
From: Chris Wilson @ 2019-03-01 14:03 UTC (permalink / raw)
  To: intel-gfx

Allow the user to specify a local engine index (as opposed to a global
class:instance pair) that they can use to refer to a preset engine
inside the ctx->engine[] array defined by an earlier
I915_CONTEXT_PARAM_ENGINES.
This will be useful for setting SSEU parameters on virtual engines that
are local to the context and do not have a valid global class:instance
lookup.
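
A hedged sketch of addressing a map slot by index (assumes an engine
map, as in the previous patch, with the render engine in slot 1; the
sseu mask values are illustrative only):

	struct drm_i915_gem_context_param_sseu sseu = {
		.engine_class = 0,	/* ignored for an index lookup */
		.engine_instance = 1,	/* slot in the ctx->engine[] map */
		.flags = I915_CONTEXT_SSEU_FLAG_ENGINE_INDEX,
		.slice_mask = 0x1,
		.subslice_mask = 0x3,
		.min_eus_per_subslice = 8,
		.max_eus_per_subslice = 8,
	};
	struct drm_i915_gem_context_param arg = {
		.ctx_id = ctx_id,
		.param = I915_CONTEXT_PARAM_SSEU,
		.size = sizeof(sseu),
		.value = (__u64)(uintptr_t)&sseu,
	};

	drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);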

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 24 ++++++++++++++++++++----
 include/uapi/drm/i915_drm.h             |  3 ++-
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 85067a3dc72c..d04fa649bc0e 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -1341,6 +1341,7 @@ static int set_sseu(struct i915_gem_context *ctx,
 	struct drm_i915_gem_context_param_sseu user_sseu;
 	struct intel_engine_cs *engine;
 	struct intel_sseu sseu;
+	unsigned long lookup;
 	int ret;
 
 	if (args->size < sizeof(user_sseu))
@@ -1353,10 +1354,17 @@ static int set_sseu(struct i915_gem_context *ctx,
 			   sizeof(user_sseu)))
 		return -EFAULT;
 
-	if (user_sseu.flags || user_sseu.rsvd)
+	if (user_sseu.rsvd)
 		return -EINVAL;
 
-	engine = lookup_user_engine(ctx, 0,
+	if (user_sseu.flags & ~(I915_CONTEXT_SSEU_FLAG_ENGINE_INDEX))
+		return -EINVAL;
+
+	lookup = 0;
+	if (user_sseu.flags & I915_CONTEXT_SSEU_FLAG_ENGINE_INDEX)
+		lookup |= LOOKUP_USER_INDEX;
+
+	engine = lookup_user_engine(ctx, lookup,
 				    user_sseu.engine_class,
 				    user_sseu.engine_instance);
 	if (!engine)
@@ -1802,6 +1810,7 @@ static int get_sseu(struct i915_gem_context *ctx,
 	struct drm_i915_gem_context_param_sseu user_sseu;
 	struct intel_engine_cs *engine;
 	struct intel_context *ce;
+	unsigned long lookup;
 	int ret;
 
 	if (args->size == 0)
@@ -1813,10 +1822,17 @@ static int get_sseu(struct i915_gem_context *ctx,
 			   sizeof(user_sseu)))
 		return -EFAULT;
 
-	if (user_sseu.flags || user_sseu.rsvd)
+	if (user_sseu.rsvd)
 		return -EINVAL;
 
-	engine = lookup_user_engine(ctx, 0,
+	if (user_sseu.flags & ~(I915_CONTEXT_SSEU_FLAG_ENGINE_INDEX))
+		return -EINVAL;
+
+	lookup = 0;
+	if (user_sseu.flags & I915_CONTEXT_SSEU_FLAG_ENGINE_INDEX)
+		lookup |= LOOKUP_USER_INDEX;
+
+	engine = lookup_user_engine(ctx, lookup,
 				    user_sseu.engine_class,
 				    user_sseu.engine_instance);
 	if (!engine)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 85ccba4f04e8..b68e1ad12c0f 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1565,9 +1565,10 @@ struct drm_i915_gem_context_param_sseu {
 	__u16 engine_instance;
 
 	/*
-	 * Unused for now. Must be cleared to zero.
+	 * Unknown flags must be cleared to zero.
 	 */
 	__u32 flags;
+#define I915_CONTEXT_SSEU_FLAG_ENGINE_INDEX (1u << 0)
 
 	/*
 	 * Mask of slices to enable for the context. Valid values are a subset
-- 
2.20.1


* [PATCH 26/38] drm/i915: Pass around the intel_context
  2019-03-01 14:03 [PATCH 01/38] drm/i915/execlists: Suppress redundant preemption Chris Wilson
                   ` (23 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 25/38] drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[] Chris Wilson
@ 2019-03-01 14:03 ` Chris Wilson
  2019-03-05 16:16   ` Tvrtko Ursulin
  2019-03-01 14:03 ` [PATCH 27/38] drm/i915: Split struct intel_context definition to its own header Chris Wilson
                   ` (15 subsequent siblings)
  40 siblings, 1 reply; 88+ messages in thread
From: Chris Wilson @ 2019-03-01 14:03 UTC (permalink / raw)
  To: intel-gfx

Instead of passing the gem_context and engine in order to look up the
intel_context to use, pass around the intel_context directly. This is
useful for the next few patches, where the intel_context is no longer a
direct lookup.
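
In sketch form (lifted from the i915_perf.c hunk below; only the
calling convention changes):

	/* before: the callee re-derived the intel_context from (ctx, engine) */
	gen8_update_reg_state_unlocked(ctx, regs, oa_config);

	/* after: the caller passes the intel_context it already holds */
	gen8_update_reg_state_unlocked(ce, regs, oa_config);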

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h  |  2 +-
 drivers/gpu/drm/i915/i915_perf.c | 32 ++++++++++++++--------------
 drivers/gpu/drm/i915/intel_lrc.c | 36 +++++++++++++++++---------------
 3 files changed, 36 insertions(+), 34 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index cbc2b59722ae..493cba8a76aa 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3134,7 +3134,7 @@ int i915_perf_add_config_ioctl(struct drm_device *dev, void *data,
 int i915_perf_remove_config_ioctl(struct drm_device *dev, void *data,
 				  struct drm_file *file);
 void i915_oa_init_reg_state(struct intel_engine_cs *engine,
-			    struct i915_gem_context *ctx,
+			    struct intel_context *ce,
 			    u32 *reg_state);
 
 /* i915_gem_evict.c */
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 9ebf99f3d8d3..f969a0512465 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1629,13 +1629,14 @@ static void hsw_disable_metric_set(struct drm_i915_private *dev_priv)
  * It's fine to put out-of-date values into these per-context registers
  * in the case that the OA unit has been disabled.
  */
-static void gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
-					   u32 *reg_state,
-					   const struct i915_oa_config *oa_config)
+static void
+gen8_update_reg_state_unlocked(struct intel_context *ce,
+			       u32 *reg_state,
+			       const struct i915_oa_config *oa_config)
 {
-	struct drm_i915_private *dev_priv = ctx->i915;
-	u32 ctx_oactxctrl = dev_priv->perf.oa.ctx_oactxctrl_offset;
-	u32 ctx_flexeu0 = dev_priv->perf.oa.ctx_flexeu0_offset;
+	struct drm_i915_private *i915 = ce->gem_context->i915;
+	u32 ctx_oactxctrl = i915->perf.oa.ctx_oactxctrl_offset;
+	u32 ctx_flexeu0 = i915->perf.oa.ctx_flexeu0_offset;
 	/* The MMIO offsets for Flex EU registers aren't contiguous */
 	i915_reg_t flex_regs[] = {
 		EU_PERF_CNTL0,
@@ -1649,8 +1650,8 @@ static void gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
 	int i;
 
 	CTX_REG(reg_state, ctx_oactxctrl, GEN8_OACTXCONTROL,
-		(dev_priv->perf.oa.period_exponent << GEN8_OA_TIMER_PERIOD_SHIFT) |
-		(dev_priv->perf.oa.periodic ? GEN8_OA_TIMER_ENABLE : 0) |
+		(i915->perf.oa.period_exponent << GEN8_OA_TIMER_PERIOD_SHIFT) |
+		(i915->perf.oa.periodic ? GEN8_OA_TIMER_ENABLE : 0) |
 		GEN8_OA_COUNTER_RESUME);
 
 	for (i = 0; i < ARRAY_SIZE(flex_regs); i++) {
@@ -1678,10 +1679,9 @@ static void gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
 		CTX_REG(reg_state, state_offset, flex_regs[i], value);
 	}
 
-	CTX_REG(reg_state, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
-		gen8_make_rpcs(dev_priv,
-			       &to_intel_context(ctx,
-						 dev_priv->engine[RCS])->sseu));
+	CTX_REG(reg_state,
+		CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
+		gen8_make_rpcs(i915, &ce->sseu));
 }
 
 /*
@@ -1754,7 +1754,7 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
 		ce->state->obj->mm.dirty = true;
 		regs += LRC_STATE_PN * PAGE_SIZE / sizeof(*regs);
 
-		gen8_update_reg_state_unlocked(ctx, regs, oa_config);
+		gen8_update_reg_state_unlocked(ce, regs, oa_config);
 
 		i915_gem_object_unpin_map(ce->state->obj);
 	}
@@ -2138,8 +2138,8 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 }
 
 void i915_oa_init_reg_state(struct intel_engine_cs *engine,
-			    struct i915_gem_context *ctx,
-			    u32 *reg_state)
+			    struct intel_context *ce,
+			    u32 *regs)
 {
 	struct i915_perf_stream *stream;
 
@@ -2148,7 +2148,7 @@ void i915_oa_init_reg_state(struct intel_engine_cs *engine,
 
 	stream = engine->i915->perf.oa.exclusive_stream;
 	if (stream)
-		gen8_update_reg_state_unlocked(ctx, reg_state, stream->oa_config);
+		gen8_update_reg_state_unlocked(ce, regs, stream->oa_config);
 }
 
 /**
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index d50a33c578c5..a2210f79dc67 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -170,7 +170,7 @@ static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
 					    struct intel_engine_cs *engine,
 					    struct intel_context *ce);
 static void execlists_init_reg_state(u32 *reg_state,
-				     struct i915_gem_context *ctx,
+				     struct intel_context *ce,
 				     struct intel_engine_cs *engine,
 				     struct intel_ring *ring);
 
@@ -1314,9 +1314,10 @@ __execlists_update_reg_state(struct intel_engine_cs *engine,
 	regs[CTX_RING_TAIL + 1] = ring->tail;
 
 	/* RPCS */
-	if (engine->class == RENDER_CLASS)
-		regs[CTX_R_PWR_CLK_STATE + 1] = gen8_make_rpcs(engine->i915,
-							       &ce->sseu);
+	if (engine->class == RENDER_CLASS) {
+		regs[CTX_R_PWR_CLK_STATE + 1] =
+			gen8_make_rpcs(engine->i915, &ce->sseu);
+	}
 }
 
 static struct intel_context *
@@ -2021,7 +2022,7 @@ static void execlists_reset(struct intel_engine_cs *engine, bool stalled)
 	rq->ring->head = intel_ring_wrap(rq->ring, rq->head);
 	intel_ring_update_space(rq->ring);
 
-	execlists_init_reg_state(regs, rq->gem_context, engine, rq->ring);
+	execlists_init_reg_state(regs, rq->hw_context, engine, rq->ring);
 	__execlists_update_reg_state(engine, rq->hw_context);
 
 out_unlock:
@@ -2659,7 +2660,7 @@ static u32 intel_lr_indirect_ctx_offset(struct intel_engine_cs *engine)
 }
 
 static void execlists_init_reg_state(u32 *regs,
-				     struct i915_gem_context *ctx,
+				     struct intel_context *ce,
 				     struct intel_engine_cs *engine,
 				     struct intel_ring *ring)
 {
@@ -2735,24 +2736,24 @@ static void execlists_init_reg_state(u32 *regs,
 	CTX_REG(regs, CTX_PDP0_UDW, GEN8_RING_PDP_UDW(engine, 0), 0);
 	CTX_REG(regs, CTX_PDP0_LDW, GEN8_RING_PDP_LDW(engine, 0), 0);
 
-	if (i915_vm_is_48bit(&ctx->ppgtt->vm)) {
+	if (i915_vm_is_48bit(&ce->gem_context->ppgtt->vm)) {
 		/* 64b PPGTT (48bit canonical)
 		 * PDP0_DESCRIPTOR contains the base address to PML4 and
 		 * other PDP Descriptors are ignored.
 		 */
-		ASSIGN_CTX_PML4(ctx->ppgtt, regs);
+		ASSIGN_CTX_PML4(ce->gem_context->ppgtt, regs);
 	} else {
-		ASSIGN_CTX_PDP(ctx->ppgtt, regs, 3);
-		ASSIGN_CTX_PDP(ctx->ppgtt, regs, 2);
-		ASSIGN_CTX_PDP(ctx->ppgtt, regs, 1);
-		ASSIGN_CTX_PDP(ctx->ppgtt, regs, 0);
+		ASSIGN_CTX_PDP(ce->gem_context->ppgtt, regs, 3);
+		ASSIGN_CTX_PDP(ce->gem_context->ppgtt, regs, 2);
+		ASSIGN_CTX_PDP(ce->gem_context->ppgtt, regs, 1);
+		ASSIGN_CTX_PDP(ce->gem_context->ppgtt, regs, 0);
 	}
 
 	if (rcs) {
 		regs[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
 		CTX_REG(regs, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE, 0);
 
-		i915_oa_init_reg_state(engine, ctx, regs);
+		i915_oa_init_reg_state(engine, ce, regs);
 	}
 
 	regs[CTX_END] = MI_BATCH_BUFFER_END;
@@ -2761,7 +2762,7 @@ static void execlists_init_reg_state(u32 *regs,
 }
 
 static int
-populate_lr_context(struct i915_gem_context *ctx,
+populate_lr_context(struct intel_context *ce,
 		    struct drm_i915_gem_object *ctx_obj,
 		    struct intel_engine_cs *engine,
 		    struct intel_ring *ring)
@@ -2807,11 +2808,12 @@ populate_lr_context(struct i915_gem_context *ctx,
 	/* The second page of the context object contains some fields which must
 	 * be set up prior to the first execution. */
 	regs = vaddr + LRC_STATE_PN * PAGE_SIZE;
-	execlists_init_reg_state(regs, ctx, engine, ring);
+	execlists_init_reg_state(regs, ce, engine, ring);
 	if (!engine->default_state)
 		regs[CTX_CONTEXT_CONTROL + 1] |=
 			_MASKED_BIT_ENABLE(CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT);
-	if (ctx == ctx->i915->preempt_context && INTEL_GEN(engine->i915) < 11)
+	if (ce->gem_context == engine->i915->preempt_context &&
+	    INTEL_GEN(engine->i915) < 11)
 		regs[CTX_CONTEXT_CONTROL + 1] |=
 			_MASKED_BIT_ENABLE(CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT |
 					   CTX_CTRL_ENGINE_CTX_SAVE_INHIBIT);
@@ -2869,7 +2871,7 @@ static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
 		goto error_deref_obj;
 	}
 
-	ret = populate_lr_context(ctx, ctx_obj, engine, ring);
+	ret = populate_lr_context(ce, ctx_obj, engine, ring);
 	if (ret) {
 		DRM_DEBUG_DRIVER("Failed to populate LRC: %d\n", ret);
 		goto error_ring_free;
-- 
2.20.1


* [PATCH 27/38] drm/i915: Split struct intel_context definition to its own header
  2019-03-01 14:03 [PATCH 01/38] drm/i915/execlists: Suppress redundant preemption Chris Wilson
                   ` (24 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 26/38] drm/i915: Pass around the intel_context Chris Wilson
@ 2019-03-01 14:03 ` Chris Wilson
  2019-03-05 16:19   ` Tvrtko Ursulin
  2019-03-01 14:03 ` [PATCH 28/38] drm/i915: Store the intel_context_ops in the intel_engine_cs Chris Wilson
                   ` (14 subsequent siblings)
  40 siblings, 1 reply; 88+ messages in thread
From: Chris Wilson @ 2019-03-01 14:03 UTC (permalink / raw)
  To: intel-gfx

This complex struct pulling in half the driver deserves its own
isolation in preparation for intel_context becoming an outright
complicated class of its own.

Splitting this beast into its own header also requires splitting
several of its dependent types and their dependencies into their own
headers as well.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_context.h       | 245 +-------
 drivers/gpu/drm/i915/i915_gem_context_types.h | 188 +++++++
 drivers/gpu/drm/i915/i915_timeline.h          |  70 +--
 drivers/gpu/drm/i915/i915_timeline_types.h    |  80 +++
 drivers/gpu/drm/i915/intel_context.h          |  47 ++
 drivers/gpu/drm/i915/intel_context_types.h    |  60 ++
 drivers/gpu/drm/i915/intel_engine_types.h     | 521 ++++++++++++++++++
 drivers/gpu/drm/i915/intel_guc.h              |   1 +
 drivers/gpu/drm/i915/intel_ringbuffer.h       | 502 +----------------
 drivers/gpu/drm/i915/intel_workarounds.h      |  13 +-
 .../gpu/drm/i915/intel_workarounds_types.h    |  25 +
 11 files changed, 928 insertions(+), 824 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_gem_context_types.h
 create mode 100644 drivers/gpu/drm/i915/i915_timeline_types.h
 create mode 100644 drivers/gpu/drm/i915/intel_context.h
 create mode 100644 drivers/gpu/drm/i915/intel_context_types.h
 create mode 100644 drivers/gpu/drm/i915/intel_engine_types.h
 create mode 100644 drivers/gpu/drm/i915/intel_workarounds_types.h

diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index f09b5badbe73..110d5881c9de 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -25,225 +25,17 @@
 #ifndef __I915_GEM_CONTEXT_H__
 #define __I915_GEM_CONTEXT_H__
 
-#include <linux/bitops.h>
-#include <linux/list.h>
-#include <linux/radix-tree.h>
+#include "i915_gem_context_types.h"
 
 #include "i915_gem.h"
 #include "i915_scheduler.h"
+#include "intel_context.h"
 #include "intel_device_info.h"
 #include "intel_ringbuffer.h"
 
-struct pid;
-
 struct drm_device;
 struct drm_file;
 
-struct drm_i915_private;
-struct drm_i915_file_private;
-struct i915_hw_ppgtt;
-struct i915_request;
-struct i915_timeline;
-struct i915_vma;
-struct intel_ring;
-
-#define DEFAULT_CONTEXT_HANDLE 0
-
-struct intel_context;
-
-struct intel_context_ops {
-	void (*unpin)(struct intel_context *ce);
-	void (*destroy)(struct intel_context *ce);
-};
-
-/*
- * Powergating configuration for a particular (context,engine).
- */
-struct intel_sseu {
-	u8 slice_mask;
-	u8 subslice_mask;
-	u8 min_eus_per_subslice;
-	u8 max_eus_per_subslice;
-};
-
-/**
- * struct i915_gem_context - client state
- *
- * The struct i915_gem_context represents the combined view of the driver and
- * logical hardware state for a particular client.
- */
-struct i915_gem_context {
-	/** i915: i915 device backpointer */
-	struct drm_i915_private *i915;
-
-	/** file_priv: owning file descriptor */
-	struct drm_i915_file_private *file_priv;
-
-	struct intel_engine_cs **engines;
-
-	struct i915_timeline *timeline;
-
-	/**
-	 * @ppgtt: unique address space (GTT)
-	 *
-	 * In full-ppgtt mode, each context has its own address space ensuring
-	 * complete seperation of one client from all others.
-	 *
-	 * In other modes, this is a NULL pointer with the expectation that
-	 * the caller uses the shared global GTT.
-	 */
-	struct i915_hw_ppgtt *ppgtt;
-
-	/**
-	 * @pid: process id of creator
-	 *
-	 * Note that who created the context may not be the principle user,
-	 * as the context may be shared across a local socket. However,
-	 * that should only affect the default context, all contexts created
-	 * explicitly by the client are expected to be isolated.
-	 */
-	struct pid *pid;
-
-	/**
-	 * @name: arbitrary name
-	 *
-	 * A name is constructed for the context from the creator's process
-	 * name, pid and user handle in order to uniquely identify the
-	 * context in messages.
-	 */
-	const char *name;
-
-	/** link: place with &drm_i915_private.context_list */
-	struct list_head link;
-	struct llist_node free_link;
-
-	/**
-	 * @ref: reference count
-	 *
-	 * A reference to a context is held by both the client who created it
-	 * and on each request submitted to the hardware using the request
-	 * (to ensure the hardware has access to the state until it has
-	 * finished all pending writes). See i915_gem_context_get() and
-	 * i915_gem_context_put() for access.
-	 */
-	struct kref ref;
-
-	/**
-	 * @rcu: rcu_head for deferred freeing.
-	 */
-	struct rcu_head rcu;
-
-	/**
-	 * @user_flags: small set of booleans controlled by the user
-	 */
-	unsigned long user_flags;
-#define UCONTEXT_NO_ZEROMAP		0
-#define UCONTEXT_NO_ERROR_CAPTURE	1
-#define UCONTEXT_BANNABLE		2
-#define UCONTEXT_RECOVERABLE		3
-
-	/**
-	 * @flags: small set of booleans
-	 */
-	unsigned long flags;
-#define CONTEXT_BANNED			0
-#define CONTEXT_CLOSED			1
-#define CONTEXT_FORCE_SINGLE_SUBMISSION	2
-
-	unsigned int nengine;
-
-	/**
-	 * @hw_id: - unique identifier for the context
-	 *
-	 * The hardware needs to uniquely identify the context for a few
-	 * functions like fault reporting, PASID, scheduling. The
-	 * &drm_i915_private.context_hw_ida is used to assign a unqiue
-	 * id for the lifetime of the context.
-	 *
-	 * @hw_id_pin_count: - number of times this context had been pinned
-	 * for use (should be, at most, once per engine).
-	 *
-	 * @hw_id_link: - all contexts with an assigned id are tracked
-	 * for possible repossession.
-	 */
-	unsigned int hw_id;
-	atomic_t hw_id_pin_count;
-	struct list_head hw_id_link;
-
-	struct list_head active_engines;
-	struct mutex mutex;
-
-	/**
-	 * @user_handle: userspace identifier
-	 *
-	 * A unique per-file identifier is generated from
-	 * &drm_i915_file_private.contexts.
-	 */
-	u32 user_handle;
-
-	struct i915_sched_attr sched;
-
-	/** engine: per-engine logical HW state */
-	struct intel_context {
-		struct i915_gem_context *gem_context;
-		struct intel_engine_cs *engine;
-		struct intel_engine_cs *active;
-		struct list_head active_link;
-		struct list_head signal_link;
-		struct list_head signals;
-		struct i915_vma *state;
-		struct intel_ring *ring;
-		u32 *lrc_reg_state;
-		u64 lrc_desc;
-		int pin_count;
-
-		/**
-		 * active_tracker: Active tracker for the external rq activity
-		 * on this intel_context object.
-		 */
-		struct i915_active_request active_tracker;
-
-		const struct intel_context_ops *ops;
-
-		/** sseu: Control eu/slice partitioning */
-		struct intel_sseu sseu;
-	} __engine[I915_NUM_ENGINES];
-
-	/** ring_size: size for allocating the per-engine ring buffer */
-	u32 ring_size;
-	/** desc_template: invariant fields for the HW context descriptor */
-	u32 desc_template;
-
-	/** guilty_count: How many times this context has caused a GPU hang. */
-	atomic_t guilty_count;
-	/**
-	 * @active_count: How many times this context was active during a GPU
-	 * hang, but did not cause it.
-	 */
-	atomic_t active_count;
-
-	/**
-	 * @hang_timestamp: The last time(s) this context caused a GPU hang
-	 */
-	unsigned long hang_timestamp[2];
-#define CONTEXT_FAST_HANG_JIFFIES (120 * HZ) /* 3 hangs within 120s? Banned! */
-
-	/** remap_slice: Bitmask of cache lines that need remapping */
-	u8 remap_slice;
-
-	/** handles_vma: rbtree to look up our context specific obj/vma for
-	 * the user handle. (user handles are per fd, but the binding is
-	 * per vm, which may be one per context or shared with the global GTT)
-	 */
-	struct radix_tree_root handles_vma;
-
-	/** handles_list: reverse list of all the rbtree entries in use for
-	 * this context, which allows us to free all the allocations on
-	 * context close.
-	 */
-	struct list_head handles_list;
-};
-
 static inline bool i915_gem_context_is_closed(const struct i915_gem_context *ctx)
 {
 	return test_bit(CONTEXT_CLOSED, &ctx->flags);
@@ -345,35 +137,6 @@ static inline bool i915_gem_context_is_kernel(struct i915_gem_context *ctx)
 	return !ctx->file_priv;
 }
 
-static inline struct intel_context *
-to_intel_context(struct i915_gem_context *ctx,
-		 const struct intel_engine_cs *engine)
-{
-	return &ctx->__engine[engine->id];
-}
-
-static inline struct intel_context *
-intel_context_pin(struct i915_gem_context *ctx, struct intel_engine_cs *engine)
-{
-	return engine->context_pin(engine, ctx);
-}
-
-static inline void __intel_context_pin(struct intel_context *ce)
-{
-	GEM_BUG_ON(!ce->pin_count);
-	ce->pin_count++;
-}
-
-static inline void intel_context_unpin(struct intel_context *ce)
-{
-	GEM_BUG_ON(!ce->pin_count);
-	if (--ce->pin_count)
-		return;
-
-	GEM_BUG_ON(!ce->ops);
-	ce->ops->unpin(ce);
-}
-
 /* i915_gem_context.c */
 int __must_check i915_gem_contexts_init(struct drm_i915_private *dev_priv);
 void i915_gem_contexts_lost(struct drm_i915_private *dev_priv);
@@ -422,10 +185,6 @@ static inline void i915_gem_context_put(struct i915_gem_context *ctx)
 	kref_put(&ctx->ref, i915_gem_context_release);
 }
 
-void intel_context_init(struct intel_context *ce,
-			struct i915_gem_context *ctx,
-			struct intel_engine_cs *engine);
-
 struct i915_lut_handle *i915_lut_handle_alloc(void);
 void i915_lut_handle_free(struct i915_lut_handle *lut);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_context_types.h b/drivers/gpu/drm/i915/i915_gem_context_types.h
new file mode 100644
index 000000000000..8baa7a5e7fdb
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_gem_context_types.h
@@ -0,0 +1,188 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2019 Intel Corporation
+ */
+
+#ifndef __I915_GEM_CONTEXT_TYPES_H__
+#define __I915_GEM_CONTEXT_TYPES_H__
+
+#include <linux/atomic.h>
+#include <linux/list.h>
+#include <linux/llist.h>
+#include <linux/kref.h>
+#include <linux/mutex.h>
+#include <linux/radix-tree.h>
+#include <linux/rcupdate.h>
+#include <linux/types.h>
+
+#include "i915_gem.h" /* I915_NUM_ENGINES */
+#include "i915_scheduler.h"
+#include "intel_context_types.h"
+
+struct pid;
+
+struct drm_i915_private;
+struct drm_i915_file_private;
+struct i915_hw_ppgtt;
+struct i915_timeline;
+struct intel_ring;
+
+/**
+ * struct i915_gem_context - client state
+ *
+ * The struct i915_gem_context represents the combined view of the driver and
+ * logical hardware state for a particular client.
+ */
+struct i915_gem_context {
+	/** i915: i915 device backpointer */
+	struct drm_i915_private *i915;
+
+	/** file_priv: owning file descriptor */
+	struct drm_i915_file_private *file_priv;
+
+	struct intel_engine_cs **engines;
+
+	struct i915_timeline *timeline;
+
+	/**
+	 * @ppgtt: unique address space (GTT)
+	 *
+	 * In full-ppgtt mode, each context has its own address space ensuring
+	 * complete separation of one client from all others.
+	 *
+	 * In other modes, this is a NULL pointer with the expectation that
+	 * the caller uses the shared global GTT.
+	 */
+	struct i915_hw_ppgtt *ppgtt;
+
+	/**
+	 * @pid: process id of creator
+	 *
+	 * Note that who created the context may not be the principal user,
+	 * as the context may be shared across a local socket. However,
+	 * that should only affect the default context, all contexts created
+	 * explicitly by the client are expected to be isolated.
+	 */
+	struct pid *pid;
+
+	/**
+	 * @name: arbitrary name
+	 *
+	 * A name is constructed for the context from the creator's process
+	 * name, pid and user handle in order to uniquely identify the
+	 * context in messages.
+	 */
+	const char *name;
+
+	/** link: place with &drm_i915_private.context_list */
+	struct list_head link;
+	struct llist_node free_link;
+
+	/**
+	 * @ref: reference count
+	 *
+	 * A reference to a context is held by both the client who created it
+	 * and on each request submitted to the hardware using the request
+	 * (to ensure the hardware has access to the state until it has
+	 * finished all pending writes). See i915_gem_context_get() and
+	 * i915_gem_context_put() for access.
+	 */
+	struct kref ref;
+
+	/**
+	 * @rcu: rcu_head for deferred freeing.
+	 */
+	struct rcu_head rcu;
+
+	/**
+	 * @user_flags: small set of booleans controlled by the user
+	 */
+	unsigned long user_flags;
+#define UCONTEXT_NO_ZEROMAP		0
+#define UCONTEXT_NO_ERROR_CAPTURE	1
+#define UCONTEXT_BANNABLE		2
+#define UCONTEXT_RECOVERABLE		3
+
+	/**
+	 * @flags: small set of booleans
+	 */
+	unsigned long flags;
+#define CONTEXT_BANNED			0
+#define CONTEXT_CLOSED			1
+#define CONTEXT_FORCE_SINGLE_SUBMISSION	2
+
+	unsigned int nengine;
+
+	/**
+	 * @hw_id: - unique identifier for the context
+	 *
+	 * The hardware needs to uniquely identify the context for a few
+	 * functions like fault reporting, PASID, scheduling. The
+	 * &drm_i915_private.context_hw_ida is used to assign a unique
+	 * id for the lifetime of the context.
+	 *
+	 * @hw_id_pin_count: - number of times this context had been pinned
+	 * for use (should be, at most, once per engine).
+	 *
+	 * @hw_id_link: - all contexts with an assigned id are tracked
+	 * for possible repossession.
+	 */
+	unsigned int hw_id;
+	atomic_t hw_id_pin_count;
+	struct list_head hw_id_link;
+
+	struct list_head active_engines;
+	struct mutex mutex;
+
+	/**
+	 * @user_handle: userspace identifier
+	 *
+	 * A unique per-file identifier is generated from
+	 * &drm_i915_file_private.contexts.
+	 */
+	u32 user_handle;
+#define DEFAULT_CONTEXT_HANDLE 0
+
+	struct i915_sched_attr sched;
+
+	/** engine: per-engine logical HW state */
+	struct intel_context __engine[I915_NUM_ENGINES];
+
+	/** ring_size: size for allocating the per-engine ring buffer */
+	u32 ring_size;
+	/** desc_template: invariant fields for the HW context descriptor */
+	u32 desc_template;
+
+	/** guilty_count: How many times this context has caused a GPU hang. */
+	atomic_t guilty_count;
+	/**
+	 * @active_count: How many times this context was active during a GPU
+	 * hang, but did not cause it.
+	 */
+	atomic_t active_count;
+
+	/**
+	 * @hang_timestamp: The last time(s) this context caused a GPU hang
+	 */
+	unsigned long hang_timestamp[2];
+#define CONTEXT_FAST_HANG_JIFFIES (120 * HZ) /* 3 hangs within 120s? Banned! */
+
+	/** remap_slice: Bitmask of cache lines that need remapping */
+	u8 remap_slice;
+
+	/** handles_vma: rbtree to look up our context specific obj/vma for
+	 * the user handle. (user handles are per fd, but the binding is
+	 * per vm, which may be one per context or shared with the global GTT)
+	 */
+	struct radix_tree_root handles_vma;
+
+	/** handles_list: reverse list of all the rbtree entries in use for
+	 * this context, which allows us to free all the allocations on
+	 * context close.
+	 */
+	struct list_head handles_list;
+};
+
+#endif /* __I915_GEM_CONTEXT_TYPES_H__ */
diff --git a/drivers/gpu/drm/i915/i915_timeline.h b/drivers/gpu/drm/i915/i915_timeline.h
index 60b1dfad93ed..9126c8206490 100644
--- a/drivers/gpu/drm/i915/i915_timeline.h
+++ b/drivers/gpu/drm/i915/i915_timeline.h
@@ -25,76 +25,10 @@
 #ifndef I915_TIMELINE_H
 #define I915_TIMELINE_H
 
-#include <linux/list.h>
-#include <linux/kref.h>
+#include <linux/lockdep.h>
 
-#include "i915_active.h"
-#include "i915_request.h"
 #include "i915_syncmap.h"
-#include "i915_utils.h"
-
-struct i915_vma;
-struct i915_timeline_cacheline;
-
-struct i915_timeline {
-	u64 fence_context;
-	u32 seqno;
-
-	spinlock_t lock;
-#define TIMELINE_CLIENT 0 /* default subclass */
-#define TIMELINE_ENGINE 1
-
-	struct mutex mutex; /* protects the flow of requests */
-
-	unsigned int pin_count;
-	const u32 *hwsp_seqno;
-	struct i915_vma *hwsp_ggtt;
-	u32 hwsp_offset;
-
-	struct i915_timeline_cacheline *hwsp_cacheline;
-
-	bool has_initial_breadcrumb;
-
-	/**
-	 * List of breadcrumbs associated with GPU requests currently
-	 * outstanding.
-	 */
-	struct list_head requests;
-
-	/* Contains an RCU guarded pointer to the last request. No reference is
-	 * held to the request, users must carefully acquire a reference to
-	 * the request using i915_active_request_get_request_rcu(), or hold the
-	 * struct_mutex.
-	 */
-	struct i915_active_request last_request;
-
-	/**
-	 * We track the most recent seqno that we wait on in every context so
-	 * that we only have to emit a new await and dependency on a more
-	 * recent sync point. As the contexts may be executed out-of-order, we
-	 * have to track each individually and can not rely on an absolute
-	 * global_seqno. When we know that all tracked fences are completed
-	 * (i.e. when the driver is idle), we know that the syncmap is
-	 * redundant and we can discard it without loss of generality.
-	 */
-	struct i915_syncmap *sync;
-
-	/**
-	 * Barrier provides the ability to serialize ordering between different
-	 * timelines.
-	 *
-	 * Users can call i915_timeline_set_barrier which will make all
-	 * subsequent submissions to this timeline be executed only after the
-	 * barrier has been completed.
-	 */
-	struct i915_active_request barrier;
-
-	struct list_head link;
-	const char *name;
-	struct drm_i915_private *i915;
-
-	struct kref kref;
-};
+#include "i915_timeline_types.h"
 
 int i915_timeline_init(struct drm_i915_private *i915,
 		       struct i915_timeline *tl,
diff --git a/drivers/gpu/drm/i915/i915_timeline_types.h b/drivers/gpu/drm/i915/i915_timeline_types.h
new file mode 100644
index 000000000000..8ff146dc05ba
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_timeline_types.h
@@ -0,0 +1,80 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2016 Intel Corporation
+ */
+
+#ifndef __I915_TIMELINE_TYPES_H__
+#define __I915_TIMELINE_TYPES_H__
+
+#include <linux/list.h>
+#include <linux/kref.h>
+#include <linux/types.h>
+
+#include "i915_active.h"
+
+struct drm_i915_private;
+struct i915_vma;
+struct i915_timeline_cacheline;
+struct i915_syncmap;
+
+struct i915_timeline {
+	u64 fence_context;
+	u32 seqno;
+
+	spinlock_t lock;
+#define TIMELINE_CLIENT 0 /* default subclass */
+#define TIMELINE_ENGINE 1
+	struct mutex mutex; /* protects the flow of requests */
+
+	unsigned int pin_count;
+	const u32 *hwsp_seqno;
+	struct i915_vma *hwsp_ggtt;
+	u32 hwsp_offset;
+
+	struct i915_timeline_cacheline *hwsp_cacheline;
+
+	bool has_initial_breadcrumb;
+
+	/**
+	 * List of breadcrumbs associated with GPU requests currently
+	 * outstanding.
+	 */
+	struct list_head requests;
+
+	/* Contains an RCU guarded pointer to the last request. No reference is
+	 * held to the request, users must carefully acquire a reference to
+	 * the request using i915_active_request_get_request_rcu(), or hold the
+	 * struct_mutex.
+	 */
+	struct i915_active_request last_request;
+
+	/**
+	 * We track the most recent seqno that we wait on in every context so
+	 * that we only have to emit a new await and dependency on a more
+	 * recent sync point. As the contexts may be executed out-of-order, we
+	 * have to track each individually and can not rely on an absolute
+	 * global_seqno. When we know that all tracked fences are completed
+	 * (i.e. when the driver is idle), we know that the syncmap is
+	 * redundant and we can discard it without loss of generality.
+	 */
+	struct i915_syncmap *sync;
+
+	/**
+	 * Barrier provides the ability to serialize ordering between different
+	 * timelines.
+	 *
+	 * Users can call i915_timeline_set_barrier which will make all
+	 * subsequent submissions to this timeline be executed only after the
+	 * barrier has been completed.
+	 */
+	struct i915_active_request barrier;
+
+	struct list_head link;
+	const char *name;
+	struct drm_i915_private *i915;
+
+	struct kref kref;
+};
+
+#endif /* __I915_TIMELINE_TYPES_H__ */
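
For reference, the syncmap above is what lets us skip redundant awaits;
roughly as below (a sketch — await_fence() and emit_await() are
hypothetical stand-ins for the real i915_request_await_dma_fence()
path):

static int await_fence(struct i915_request *rq, struct dma_fence *fence)
{
	struct i915_timeline *tl = rq->timeline;
	int err;

	/* Skip the await if we already wait upon a later sync point */
	if (i915_timeline_sync_is_later(tl, fence))
		return 0;

	err = emit_await(rq, fence); /* hypothetical stand-in */
	if (err)
		return err;

	/* Record the sync point so it can be elided next time */
	return i915_timeline_sync_set(tl, fence);
}
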
diff --git a/drivers/gpu/drm/i915/intel_context.h b/drivers/gpu/drm/i915/intel_context.h
new file mode 100644
index 000000000000..dd947692bb0b
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_context.h
@@ -0,0 +1,47 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2019 Intel Corporation
+ */
+
+#ifndef __INTEL_CONTEXT_H__
+#define __INTEL_CONTEXT_H__
+
+#include "i915_gem_context_types.h"
+#include "intel_context_types.h"
+#include "intel_engine_types.h"
+
+void intel_context_init(struct intel_context *ce,
+			struct i915_gem_context *ctx,
+			struct intel_engine_cs *engine);
+
+static inline struct intel_context *
+to_intel_context(struct i915_gem_context *ctx,
+		 const struct intel_engine_cs *engine)
+{
+	return &ctx->__engine[engine->id];
+}
+
+static inline struct intel_context *
+intel_context_pin(struct i915_gem_context *ctx, struct intel_engine_cs *engine)
+{
+	return engine->context_pin(engine, ctx);
+}
+
+static inline void __intel_context_pin(struct intel_context *ce)
+{
+	GEM_BUG_ON(!ce->pin_count);
+	ce->pin_count++;
+}
+
+static inline void intel_context_unpin(struct intel_context *ce)
+{
+	GEM_BUG_ON(!ce->pin_count);
+	if (--ce->pin_count)
+		return;
+
+	GEM_BUG_ON(!ce->ops);
+	ce->ops->unpin(ce);
+}
+
+#endif /* __INTEL_CONTEXT_H__ */
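
Typical usage pairs a pin with a final unpin (a sketch; the error
handling around the real submission paths is elided):

	struct intel_context *ce;

	ce = intel_context_pin(ctx, engine);	/* first pin loads the ctx */
	if (IS_ERR(ce))
		return PTR_ERR(ce);

	/* ... build and submit requests on ce ... */

	intel_context_unpin(ce);	/* final unpin calls ce->ops->unpin() */
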
diff --git a/drivers/gpu/drm/i915/intel_context_types.h b/drivers/gpu/drm/i915/intel_context_types.h
new file mode 100644
index 000000000000..16e1306e9595
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_context_types.h
@@ -0,0 +1,60 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2019 Intel Corporation
+ */
+
+#ifndef __INTEL_CONTEXT_TYPES__
+#define __INTEL_CONTEXT_TYPES__
+
+#include <linux/list.h>
+#include <linux/types.h>
+
+#include "i915_active_types.h"
+
+struct i915_gem_context;
+struct i915_vma;
+struct intel_context;
+struct intel_ring;
+
+struct intel_context_ops {
+	void (*unpin)(struct intel_context *ce);
+	void (*destroy)(struct intel_context *ce);
+};
+
+/*
+ * Powergating configuration for a particular (context,engine).
+ */
+struct intel_sseu {
+	u8 slice_mask;
+	u8 subslice_mask;
+	u8 min_eus_per_subslice;
+	u8 max_eus_per_subslice;
+};
+
+struct intel_context {
+	struct i915_gem_context *gem_context;
+	struct intel_engine_cs *engine;
+	struct intel_engine_cs *active;
+	struct list_head active_link;
+	struct list_head signal_link;
+	struct list_head signals;
+	struct i915_vma *state;
+	struct intel_ring *ring;
+	u32 *lrc_reg_state;
+	u64 lrc_desc;
+	int pin_count;
+
+	/**
+	 * active_tracker: Active tracker for the external rq activity
+	 * on this intel_context object.
+	 */
+	struct i915_active_request active_tracker;
+
+	const struct intel_context_ops *ops;
+
+	/** sseu: Control eu/slice partitioning */
+	struct intel_sseu sseu;
+};
+
+#endif /* __INTEL_CONTEXT_TYPES__ */
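
Each submission backend supplies its own ops table; for example, the
execlists backend (as later stored in the engine by patch 28 of this
series):

static const struct intel_context_ops execlists_context_ops = {
	.unpin = execlists_context_unpin,
	.destroy = execlists_context_destroy,
};
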
diff --git a/drivers/gpu/drm/i915/intel_engine_types.h b/drivers/gpu/drm/i915/intel_engine_types.h
new file mode 100644
index 000000000000..5ec6e72d0ffb
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_engine_types.h
@@ -0,0 +1,521 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2019 Intel Corporation
+ */
+
+#ifndef __INTEL_ENGINE_TYPES__
+#define __INTEL_ENGINE_TYPES__
+
+#include <linux/hashtable.h>
+#include <linux/irq_work.h>
+#include <linux/list.h>
+#include <linux/types.h>
+
+#include "i915_timeline_types.h"
+#include "intel_device_info.h"
+#include "intel_workarounds_types.h"
+
+#include "i915_gem_batch_pool.h"
+#include "i915_pmu.h"
+
+#define I915_MAX_SLICES	3
+#define I915_MAX_SUBSLICES 8
+
+#define I915_CMD_HASH_ORDER 9
+
+struct drm_i915_reg_table;
+struct i915_gem_context;
+struct i915_request;
+struct i915_sched_attr;
+
+struct intel_hw_status_page {
+	struct i915_vma *vma;
+	u32 *addr;
+};
+
+struct intel_instdone {
+	u32 instdone;
+	/* The following exist only in the RCS engine */
+	u32 slice_common;
+	u32 sampler[I915_MAX_SLICES][I915_MAX_SUBSLICES];
+	u32 row[I915_MAX_SLICES][I915_MAX_SUBSLICES];
+};
+
+struct intel_engine_hangcheck {
+	u64 acthd;
+	u32 last_seqno;
+	u32 next_seqno;
+	unsigned long action_timestamp;
+	struct intel_instdone instdone;
+};
+
+struct intel_ring {
+	struct i915_vma *vma;
+	void *vaddr;
+
+	struct i915_timeline *timeline;
+	struct list_head request_list;
+	struct list_head active_link;
+
+	u32 head;
+	u32 tail;
+	u32 emit;
+
+	u32 space;
+	u32 size;
+	u32 effective_size;
+};
+
+/*
+ * We use a single page to load ctx workarounds, so all of these
+ * values are referred to in terms of dwords.
+ *
+ * struct i915_wa_ctx_bb:
+ *  offset: specifies batch starting position, also helpful in case
+ *    we want to have multiple batches at different offsets based on
+ *    some criteria. It is not a requirement at the moment but provides
+ *    an option for future use.
+ *  size: size of the batch in DWORDS
+ */
+struct i915_ctx_workarounds {
+	struct i915_wa_ctx_bb {
+		u32 offset;
+		u32 size;
+	} indirect_ctx, per_ctx;
+	struct i915_vma *vma;
+};
+
+#define I915_MAX_VCS	4
+#define I915_MAX_VECS	2
+
+/*
+ * Engine IDs definitions.
+ * Keep instances of the same type engine together.
+ */
+enum intel_engine_id {
+	RCS = 0,
+	BCS,
+	VCS,
+	VCS2,
+	VCS3,
+	VCS4,
+#define _VCS(n) (VCS + (n))
+	VECS,
+	VECS2
+#define _VECS(n) (VECS + (n))
+};
+
+struct st_preempt_hang {
+	struct completion completion;
+	unsigned int count;
+	bool inject_hang;
+};
+
+/**
+ * struct intel_engine_execlists - execlist submission queue and port state
+ *
+ * The struct intel_engine_execlists represents the combined logical state of
+ * driver and the hardware state for execlist mode of submission.
+ */
+struct intel_engine_execlists {
+	/**
+	 * @tasklet: softirq tasklet for bottom handler
+	 */
+	struct tasklet_struct tasklet;
+
+	/**
+	 * @default_priolist: priority list for I915_PRIORITY_NORMAL
+	 */
+	struct i915_priolist default_priolist;
+
+	/**
+	 * @no_priolist: priority lists disabled
+	 */
+	bool no_priolist;
+
+	/**
+	 * @submit_reg: gen-specific execlist submission register
+	 * set to the ExecList Submission Port (elsp) register pre-Gen11 and to
+	 * the ExecList Submission Queue Contents register array for Gen11+
+	 */
+	u32 __iomem *submit_reg;
+
+	/**
+	 * @ctrl_reg: the enhanced execlists control register, used to load the
+	 * submit queue on the HW and to request preemptions to idle
+	 */
+	u32 __iomem *ctrl_reg;
+
+	/**
+	 * @port: execlist port states
+	 *
+	 * For each hardware ELSP (ExecList Submission Port) we keep
+	 * track of the last request and the number of times we submitted
+	 * that port to hw. We then count the number of times the hw reports
+	 * a context completion or preemption. As only one context can
+	 * be active on hw, we limit resubmission of a context to port[0]. This
+	 * is called a Lite Restore of the context.
+	 */
+	struct execlist_port {
+		/**
+		 * @request_count: combined request and submission count
+		 */
+		struct i915_request *request_count;
+#define EXECLIST_COUNT_BITS 2
+#define port_request(p) ptr_mask_bits((p)->request_count, EXECLIST_COUNT_BITS)
+#define port_count(p) ptr_unmask_bits((p)->request_count, EXECLIST_COUNT_BITS)
+#define port_pack(rq, count) ptr_pack_bits(rq, count, EXECLIST_COUNT_BITS)
+#define port_unpack(p, count) ptr_unpack_bits((p)->request_count, count, EXECLIST_COUNT_BITS)
+#define port_set(p, packed) ((p)->request_count = (packed))
+#define port_isset(p) ((p)->request_count)
+#define port_index(p, execlists) ((p) - (execlists)->port)
+
+		/**
+		 * @context_id: context ID for port
+		 */
+		GEM_DEBUG_DECL(u32 context_id);
+
+#define EXECLIST_MAX_PORTS 2
+	} port[EXECLIST_MAX_PORTS];
+
+	/**
+	 * @active: is the HW active? We consider the HW as active after
+	 * submitting any context for execution and until we have seen the
+	 * last context completion event. After that, we do not expect any
+	 * more events until we submit, and so can park the HW.
+	 *
+	 * As we have a small number of different sources from which we feed
+	 * the HW, we track the state of each inside a single bitfield.
+	 */
+	unsigned int active;
+#define EXECLISTS_ACTIVE_USER 0
+#define EXECLISTS_ACTIVE_PREEMPT 1
+#define EXECLISTS_ACTIVE_HWACK 2
+
+	/**
+	 * @port_mask: number of execlist ports - 1
+	 */
+	unsigned int port_mask;
+
+	/**
+	 * @queue_priority_hint: Highest pending priority.
+	 *
+	 * When we add requests into the queue, or adjust the priority of
+	 * executing requests, we compute the maximum priority of those
+	 * pending requests. We can then use this value to determine if
+	 * we need to preempt the executing requests to service the queue.
+	 * However, since we may have recorded the priority of an inflight
+	 * request we wanted to preempt but which has since completed, at the
+	 * time of dequeuing the priority hint may no longer match the highest
+	 * available request priority.
+	 */
+	int queue_priority_hint;
+
+	/**
+	 * @queue: queue of requests, in priority lists
+	 */
+	struct rb_root_cached queue;
+
+	/**
+	 * @csb_write: control register for Context Switch buffer
+	 *
+	 * Note this register may be either mmio or HWSP shadow.
+	 */
+	u32 *csb_write;
+
+	/**
+	 * @csb_status: status array for Context Switch buffer
+	 *
+	 * Note these registers may be either mmio or HWSP shadow.
+	 */
+	u32 *csb_status;
+
+	/**
+	 * @preempt_complete_status: expected CSB upon completing preemption
+	 */
+	u32 preempt_complete_status;
+
+	/**
+	 * @csb_head: context status buffer head
+	 */
+	u8 csb_head;
+
+	I915_SELFTEST_DECLARE(struct st_preempt_hang preempt_hang;)
+};
+
+#define INTEL_ENGINE_CS_MAX_NAME 8
+
+struct intel_engine_cs {
+	struct drm_i915_private *i915;
+	char name[INTEL_ENGINE_CS_MAX_NAME];
+
+	enum intel_engine_id id;
+	unsigned int hw_id;
+	unsigned int guc_id;
+	intel_engine_mask_t mask;
+
+	u8 uabi_class;
+
+	u8 class;
+	u8 instance;
+	u32 context_size;
+	u32 mmio_base;
+
+	struct intel_ring *buffer;
+
+	struct i915_timeline timeline;
+
+	struct drm_i915_gem_object *default_state;
+	void *pinned_default_state;
+
+	/* Rather than have every client wait upon all user interrupts,
+	 * with the herd waking after every interrupt and each doing the
+	 * heavyweight seqno dance, we delegate the task (of being the
+	 * bottom-half of the user interrupt) to the first client. After
+	 * every interrupt, we wake up one client, who does the heavyweight
+	 * coherent seqno read and either goes back to sleep (if incomplete),
+	 * or wakes up all the completed clients in parallel, before then
+	 * transferring the bottom-half status to the next client in the queue.
+	 *
+	 * Compared to walking the entire list of waiters in a single dedicated
+	 * bottom-half, we reduce the latency of the first waiter by avoiding
+	 * a context switch, but incur additional coherent seqno reads when
+	 * following the chain of request breadcrumbs. Since it is most likely
+	 * that we have a single client waiting on each seqno, reducing
+	 * the overhead of waking that client is much preferred.
+	 */
+	struct intel_breadcrumbs {
+		spinlock_t irq_lock;
+		struct list_head signalers;
+
+		struct irq_work irq_work; /* for use from inside irq_lock */
+
+		unsigned int irq_enabled;
+
+		bool irq_armed;
+	} breadcrumbs;
+
+	struct intel_engine_pmu {
+		/**
+		 * @enable: Bitmask of enabled sample events on this engine.
+		 *
+		 * Bits correspond to sample event types, for instance
+		 * I915_SAMPLE_QUEUED is bit 0 etc.
+		 */
+		u32 enable;
+		/**
+		 * @enable_count: Reference count for the enabled samplers.
+		 *
+		 * Index number corresponds to @enum drm_i915_pmu_engine_sample.
+		 */
+		unsigned int enable_count[I915_ENGINE_SAMPLE_COUNT];
+		/**
+		 * @sample: Counter values for sampling events.
+		 *
+		 * Our internal timer stores the current counters in this field.
+		 *
+		 * Index number corresponds to @enum drm_i915_pmu_engine_sample.
+		 */
+		struct i915_pmu_sample sample[I915_ENGINE_SAMPLE_COUNT];
+	} pmu;
+
+	/*
+	 * A pool of objects to use as shadow copies of client batch buffers
+	 * when the command parser is enabled. Prevents the client from
+	 * modifying the batch contents after software parsing.
+	 */
+	struct i915_gem_batch_pool batch_pool;
+
+	struct intel_hw_status_page status_page;
+	struct i915_ctx_workarounds wa_ctx;
+	struct i915_wa_list ctx_wa_list;
+	struct i915_wa_list wa_list;
+	struct i915_wa_list whitelist;
+
+	u32             irq_keep_mask; /* always keep these interrupts */
+	u32		irq_enable_mask; /* bitmask to enable ring interrupt */
+	void		(*irq_enable)(struct intel_engine_cs *engine);
+	void		(*irq_disable)(struct intel_engine_cs *engine);
+
+	int		(*init_hw)(struct intel_engine_cs *engine);
+
+	struct {
+		void (*prepare)(struct intel_engine_cs *engine);
+		void (*reset)(struct intel_engine_cs *engine, bool stalled);
+		void (*finish)(struct intel_engine_cs *engine);
+	} reset;
+
+	void		(*park)(struct intel_engine_cs *engine);
+	void		(*unpark)(struct intel_engine_cs *engine);
+
+	void		(*set_default_submission)(struct intel_engine_cs *engine);
+
+	struct intel_context *(*context_pin)(struct intel_engine_cs *engine,
+					     struct i915_gem_context *ctx);
+
+	int		(*request_alloc)(struct i915_request *rq);
+	int		(*init_context)(struct i915_request *rq);
+
+	int		(*emit_flush)(struct i915_request *request, u32 mode);
+#define EMIT_INVALIDATE	BIT(0)
+#define EMIT_FLUSH	BIT(1)
+#define EMIT_BARRIER	(EMIT_INVALIDATE | EMIT_FLUSH)
+	int		(*emit_bb_start)(struct i915_request *rq,
+					 u64 offset, u32 length,
+					 unsigned int dispatch_flags);
+#define I915_DISPATCH_SECURE BIT(0)
+#define I915_DISPATCH_PINNED BIT(1)
+	int		 (*emit_init_breadcrumb)(struct i915_request *rq);
+	u32		*(*emit_fini_breadcrumb)(struct i915_request *rq,
+						 u32 *cs);
+	unsigned int	emit_fini_breadcrumb_dw;
+
+	/* Pass the request to the hardware queue (e.g. directly into
+	 * the legacy ringbuffer or to the end of an execlist).
+	 *
+	 * This is called from an atomic context with irqs disabled; must
+	 * be irq safe.
+	 */
+	void		(*submit_request)(struct i915_request *rq);
+
+	/*
+	 * Call when the priority on a request has changed and it and its
+	 * dependencies may need rescheduling. Note the request itself may
+	 * not be ready to run!
+	 */
+	void		(*schedule)(struct i915_request *request,
+				    const struct i915_sched_attr *attr);
+
+	/*
+	 * Cancel all requests on the hardware, or queued for execution.
+	 * This should only cancel the ready requests that have been
+	 * submitted to the engine (via the engine->submit_request callback).
+	 * This is called when marking the device as wedged.
+	 */
+	void		(*cancel_requests)(struct intel_engine_cs *engine);
+
+	void		(*cleanup)(struct intel_engine_cs *engine);
+
+	struct intel_engine_execlists execlists;
+
+	/* Contexts are pinned whilst they are active on the GPU. The last
+	 * context executed remains active whilst the GPU is idle - the
+	 * switch away and write to the context object only occurs on the
+	 * next execution.  Contexts are only unpinned on retirement of the
+	 * following request ensuring that we can always write to the object
+	 * on the context switch even after idling. Across suspend, we switch
+	 * to the kernel context and trash it as the save may not happen
+	 * before the hardware is powered down.
+	 */
+	struct intel_context *last_retired_context;
+
+	/* status_notifier: list of callbacks for context-switch changes */
+	struct atomic_notifier_head context_status_notifier;
+
+	struct intel_engine_hangcheck hangcheck;
+
+#define I915_ENGINE_NEEDS_CMD_PARSER BIT(0)
+#define I915_ENGINE_SUPPORTS_STATS   BIT(1)
+#define I915_ENGINE_HAS_PREEMPTION   BIT(2)
+#define I915_ENGINE_HAS_SEMAPHORES   BIT(3)
+	unsigned int flags;
+
+	/*
+	 * Table of commands the command parser needs to know about
+	 * for this engine.
+	 */
+	DECLARE_HASHTABLE(cmd_hash, I915_CMD_HASH_ORDER);
+
+	/*
+	 * Table of registers allowed in commands that read/write registers.
+	 */
+	const struct drm_i915_reg_table *reg_tables;
+	int reg_table_count;
+
+	/*
+	 * Returns the bitmask for the length field of the specified command.
+	 * Return 0 for an unrecognized/invalid command.
+	 *
+	 * If the command parser finds an entry for a command in the engine's
+	 * cmd_tables, it gets the command's length based on the table entry.
+	 * If not, it calls this function to determine the per-engine length
+	 * field encoding for the command (i.e. different opcode ranges use
+	 * certain bits to encode the command length in the header).
+	 */
+	u32 (*get_cmd_length_mask)(u32 cmd_header);
+
+	struct {
+		/**
+		 * @lock: Lock protecting the below fields.
+		 */
+		seqlock_t lock;
+		/**
+		 * @enabled: Reference count indicating number of listeners.
+		 */
+		unsigned int enabled;
+		/**
+		 * @active: Number of contexts currently scheduled in.
+		 */
+		unsigned int active;
+		/**
+		 * @enabled_at: Timestamp when busy stats were enabled.
+		 */
+		ktime_t enabled_at;
+		/**
+		 * @start: Timestamp of the last idle to active transition.
+		 *
+		 * Idle is defined as active == 0, active as active > 0.
+		 */
+		ktime_t start;
+		/**
+		 * @total: Total time this engine was busy.
+		 *
+		 * Accumulated time not counting the most recent block in cases
+		 * where engine is currently busy (active > 0).
+		 */
+		ktime_t total;
+	} stats;
+};
+
+static inline bool
+intel_engine_needs_cmd_parser(const struct intel_engine_cs *engine)
+{
+	return engine->flags & I915_ENGINE_NEEDS_CMD_PARSER;
+}
+
+static inline bool
+intel_engine_supports_stats(const struct intel_engine_cs *engine)
+{
+	return engine->flags & I915_ENGINE_SUPPORTS_STATS;
+}
+
+static inline bool
+intel_engine_has_preemption(const struct intel_engine_cs *engine)
+{
+	return engine->flags & I915_ENGINE_HAS_PREEMPTION;
+}
+
+static inline bool
+intel_engine_has_semaphores(const struct intel_engine_cs *engine)
+{
+	return engine->flags & I915_ENGINE_HAS_SEMAPHORES;
+}
+
+#define instdone_slice_mask(dev_priv__) \
+	(IS_GEN(dev_priv__, 7) ? \
+	 1 : RUNTIME_INFO(dev_priv__)->sseu.slice_mask)
+
+#define instdone_subslice_mask(dev_priv__) \
+	(IS_GEN(dev_priv__, 7) ? \
+	 1 : RUNTIME_INFO(dev_priv__)->sseu.subslice_mask[0])
+
+#define for_each_instdone_slice_subslice(dev_priv__, slice__, subslice__) \
+	for ((slice__) = 0, (subslice__) = 0; \
+	     (slice__) < I915_MAX_SLICES; \
+	     (subslice__) = ((subslice__) + 1) < I915_MAX_SUBSLICES ? (subslice__) + 1 : 0, \
+	       (slice__) += ((subslice__) == 0)) \
+		for_each_if((BIT(slice__) & instdone_slice_mask(dev_priv__)) && \
+			    (BIT(subslice__) & instdone_subslice_mask(dev_priv__)))
+
+#endif /* __INTEL_ENGINE_TYPES__ */
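
For reference, the instdone iterator above walks every (slice,
subslice) pair, filtering out the fused-off units; it is consumed
roughly as follows when capturing INSTDONE (a sketch modelled on
intel_engine_get_instdone(); read_subslice_reg() stands in for the
steered mmio accessor there):

	u32 slice, subslice;

	instdone->instdone = I915_READ(RING_INSTDONE(mmio_base));

	for_each_instdone_slice_subslice(dev_priv, slice, subslice) {
		instdone->sampler[slice][subslice] =
			read_subslice_reg(dev_priv, slice, subslice,
					  GEN7_SAMPLER_INSTDONE);
		instdone->row[slice][subslice] =
			read_subslice_reg(dev_priv, slice, subslice,
					  GEN7_ROW_INSTDONE);
	}
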
diff --git a/drivers/gpu/drm/i915/intel_guc.h b/drivers/gpu/drm/i915/intel_guc.h
index 744220296653..77ec1bd4df5a 100644
--- a/drivers/gpu/drm/i915/intel_guc.h
+++ b/drivers/gpu/drm/i915/intel_guc.h
@@ -32,6 +32,7 @@
 #include "intel_guc_log.h"
 #include "intel_guc_reg.h"
 #include "intel_uc_fw.h"
+#include "i915_utils.h"
 #include "i915_vma.h"
 
 struct guc_preempt_work {
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 13819de57ef9..5fe38f74b480 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -15,14 +15,11 @@
 #include "i915_request.h"
 #include "i915_selftest.h"
 #include "i915_timeline.h"
-#include "intel_device_info.h"
+#include "intel_engine_types.h"
 #include "intel_gpu_commands.h"
 #include "intel_workarounds.h"
 
 struct drm_printer;
-struct i915_sched_attr;
-
-#define I915_CMD_HASH_ORDER 9
 
 /* Early gen2 devices have a cacheline of just 32 bytes, using 64 is overkill,
  * but keeps the logic simple. Indeed, the whole purpose of this macro is just
@@ -32,11 +29,6 @@ struct i915_sched_attr;
 #define CACHELINE_BYTES 64
 #define CACHELINE_DWORDS (CACHELINE_BYTES / sizeof(u32))
 
-struct intel_hw_status_page {
-	struct i915_vma *vma;
-	u32 *addr;
-};
-
 #define I915_READ_TAIL(engine) I915_READ(RING_TAIL((engine)->mmio_base))
 #define I915_WRITE_TAIL(engine, val) I915_WRITE(RING_TAIL((engine)->mmio_base), val)
 
@@ -91,498 +83,6 @@ hangcheck_action_to_str(const enum intel_engine_hangcheck_action a)
 	return "unknown";
 }
 
-#define I915_MAX_SLICES	3
-#define I915_MAX_SUBSLICES 8
-
-#define instdone_slice_mask(dev_priv__) \
-	(IS_GEN(dev_priv__, 7) ? \
-	 1 : RUNTIME_INFO(dev_priv__)->sseu.slice_mask)
-
-#define instdone_subslice_mask(dev_priv__) \
-	(IS_GEN(dev_priv__, 7) ? \
-	 1 : RUNTIME_INFO(dev_priv__)->sseu.subslice_mask[0])
-
-#define for_each_instdone_slice_subslice(dev_priv__, slice__, subslice__) \
-	for ((slice__) = 0, (subslice__) = 0; \
-	     (slice__) < I915_MAX_SLICES; \
-	     (subslice__) = ((subslice__) + 1) < I915_MAX_SUBSLICES ? (subslice__) + 1 : 0, \
-	       (slice__) += ((subslice__) == 0)) \
-		for_each_if((BIT(slice__) & instdone_slice_mask(dev_priv__)) && \
-			    (BIT(subslice__) & instdone_subslice_mask(dev_priv__)))
-
-struct intel_instdone {
-	u32 instdone;
-	/* The following exist only in the RCS engine */
-	u32 slice_common;
-	u32 sampler[I915_MAX_SLICES][I915_MAX_SUBSLICES];
-	u32 row[I915_MAX_SLICES][I915_MAX_SUBSLICES];
-};
-
-struct intel_engine_hangcheck {
-	u64 acthd;
-	u32 last_seqno;
-	u32 next_seqno;
-	unsigned long action_timestamp;
-	struct intel_instdone instdone;
-};
-
-struct intel_ring {
-	struct i915_vma *vma;
-	void *vaddr;
-
-	struct i915_timeline *timeline;
-	struct list_head request_list;
-	struct list_head active_link;
-
-	u32 head;
-	u32 tail;
-	u32 emit;
-
-	u32 space;
-	u32 size;
-	u32 effective_size;
-};
-
-struct i915_gem_context;
-struct drm_i915_reg_table;
-
-/*
- * We use a single page to load ctx workarounds, so all of these
- * values are referred to in terms of dwords.
- *
- * struct i915_wa_ctx_bb:
- *  offset: specifies batch starting position, also helpful in case
- *    we want to have multiple batches at different offsets based on
- *    some criteria. It is not a requirement at the moment but provides
- *    an option for future use.
- *  size: size of the batch in DWORDS
- */
-struct i915_ctx_workarounds {
-	struct i915_wa_ctx_bb {
-		u32 offset;
-		u32 size;
-	} indirect_ctx, per_ctx;
-	struct i915_vma *vma;
-};
-
-struct i915_request;
-
-#define I915_MAX_VCS	4
-#define I915_MAX_VECS	2
-
-/*
- * Engine IDs definitions.
- * Keep instances of the same type engine together.
- */
-enum intel_engine_id {
-	RCS = 0,
-	BCS,
-	VCS,
-	VCS2,
-	VCS3,
-	VCS4,
-#define _VCS(n) (VCS + (n))
-	VECS,
-	VECS2
-#define _VECS(n) (VECS + (n))
-};
-
-struct st_preempt_hang {
-	struct completion completion;
-	unsigned int count;
-	bool inject_hang;
-};
-
-/**
- * struct intel_engine_execlists - execlist submission queue and port state
- *
- * The struct intel_engine_execlists represents the combined logical state of
- * driver and the hardware state for execlist mode of submission.
- */
-struct intel_engine_execlists {
-	/**
-	 * @tasklet: softirq tasklet for bottom handler
-	 */
-	struct tasklet_struct tasklet;
-
-	/**
-	 * @default_priolist: priority list for I915_PRIORITY_NORMAL
-	 */
-	struct i915_priolist default_priolist;
-
-	/**
-	 * @no_priolist: priority lists disabled
-	 */
-	bool no_priolist;
-
-	/**
-	 * @submit_reg: gen-specific execlist submission register
-	 * set to the ExecList Submission Port (elsp) register pre-Gen11 and to
-	 * the ExecList Submission Queue Contents register array for Gen11+
-	 */
-	u32 __iomem *submit_reg;
-
-	/**
-	 * @ctrl_reg: the enhanced execlists control register, used to load the
-	 * submit queue on the HW and to request preemptions to idle
-	 */
-	u32 __iomem *ctrl_reg;
-
-	/**
-	 * @port: execlist port states
-	 *
-	 * For each hardware ELSP (ExecList Submission Port) we keep
-	 * track of the last request and the number of times we submitted
-	 * that port to hw. We then count the number of times the hw reports
-	 * a context completion or preemption. As only one context can
-	 * be active on hw, we limit resubmission of a context to port[0]. This
-	 * is called a Lite Restore of the context.
-	 */
-	struct execlist_port {
-		/**
-		 * @request_count: combined request and submission count
-		 */
-		struct i915_request *request_count;
-#define EXECLIST_COUNT_BITS 2
-#define port_request(p) ptr_mask_bits((p)->request_count, EXECLIST_COUNT_BITS)
-#define port_count(p) ptr_unmask_bits((p)->request_count, EXECLIST_COUNT_BITS)
-#define port_pack(rq, count) ptr_pack_bits(rq, count, EXECLIST_COUNT_BITS)
-#define port_unpack(p, count) ptr_unpack_bits((p)->request_count, count, EXECLIST_COUNT_BITS)
-#define port_set(p, packed) ((p)->request_count = (packed))
-#define port_isset(p) ((p)->request_count)
-#define port_index(p, execlists) ((p) - (execlists)->port)
-
-		/**
-		 * @context_id: context ID for port
-		 */
-		GEM_DEBUG_DECL(u32 context_id);
-
-#define EXECLIST_MAX_PORTS 2
-	} port[EXECLIST_MAX_PORTS];
-
-	/**
-	 * @active: is the HW active? We consider the HW as active after
-	 * submitting any context for execution and until we have seen the
-	 * last context completion event. After that, we do not expect any
-	 * more events until we submit, and so can park the HW.
-	 *
-	 * As we have a small number of different sources from which we feed
-	 * the HW, we track the state of each inside a single bitfield.
-	 */
-	unsigned int active;
-#define EXECLISTS_ACTIVE_USER 0
-#define EXECLISTS_ACTIVE_PREEMPT 1
-#define EXECLISTS_ACTIVE_HWACK 2
-
-	/**
-	 * @port_mask: number of execlist ports - 1
-	 */
-	unsigned int port_mask;
-
-	/**
-	 * @queue_priority_hint: Highest pending priority.
-	 *
-	 * When we add requests into the queue, or adjust the priority of
-	 * executing requests, we compute the maximum priority of those
-	 * pending requests. We can then use this value to determine if
-	 * we need to preempt the executing requests to service the queue.
-	 * However, since we may have recorded the priority of an inflight
-	 * request we wanted to preempt but which has since completed, at the
-	 * time of dequeuing the priority hint may no longer match the highest
-	 * available request priority.
-	 */
-	int queue_priority_hint;
-
-	/**
-	 * @queue: queue of requests, in priority lists
-	 */
-	struct rb_root_cached queue;
-
-	/**
-	 * @csb_write: control register for Context Switch buffer
-	 *
-	 * Note this register may be either mmio or HWSP shadow.
-	 */
-	u32 *csb_write;
-
-	/**
-	 * @csb_status: status array for Context Switch buffer
-	 *
-	 * Note these registers may be either mmio or HWSP shadow.
-	 */
-	u32 *csb_status;
-
-	/**
-	 * @preempt_complete_status: expected CSB upon completing preemption
-	 */
-	u32 preempt_complete_status;
-
-	/**
-	 * @csb_head: context status buffer head
-	 */
-	u8 csb_head;
-
-	I915_SELFTEST_DECLARE(struct st_preempt_hang preempt_hang;)
-};
-
-#define INTEL_ENGINE_CS_MAX_NAME 8
-
-struct intel_engine_cs {
-	struct drm_i915_private *i915;
-	char name[INTEL_ENGINE_CS_MAX_NAME];
-
-	enum intel_engine_id id;
-	unsigned int hw_id;
-	unsigned int guc_id;
-	intel_engine_mask_t mask;
-
-	u8 uabi_class;
-
-	u8 class;
-	u8 instance;
-	u32 context_size;
-	u32 mmio_base;
-
-	struct intel_ring *buffer;
-
-	struct i915_timeline timeline;
-
-	struct drm_i915_gem_object *default_state;
-	void *pinned_default_state;
-
-	/* Rather than have every client wait upon all user interrupts,
-	 * with the herd waking after every interrupt and each doing the
-	 * heavyweight seqno dance, we delegate the task (of being the
-	 * bottom-half of the user interrupt) to the first client. After
-	 * every interrupt, we wake up one client, who does the heavyweight
-	 * coherent seqno read and either goes back to sleep (if incomplete),
-	 * or wakes up all the completed clients in parallel, before then
-	 * transferring the bottom-half status to the next client in the queue.
-	 *
-	 * Compared to walking the entire list of waiters in a single dedicated
-	 * bottom-half, we reduce the latency of the first waiter by avoiding
-	 * a context switch, but incur additional coherent seqno reads when
-	 * following the chain of request breadcrumbs. Since it is most likely
-	 * that we have a single client waiting on each seqno, reducing
-	 * the overhead of waking that client is much preferred.
-	 */
-	struct intel_breadcrumbs {
-		spinlock_t irq_lock;
-		struct list_head signalers;
-
-		struct irq_work irq_work; /* for use from inside irq_lock */
-
-		unsigned int irq_enabled;
-
-		bool irq_armed;
-	} breadcrumbs;
-
-	struct intel_engine_pmu {
-		/**
-		 * @enable: Bitmask of enabled sample events on this engine.
-		 *
-		 * Bits correspond to sample event types, for instance
-		 * I915_SAMPLE_QUEUED is bit 0 etc.
-		 */
-		u32 enable;
-		/**
-		 * @enable_count: Reference count for the enabled samplers.
-		 *
-		 * Index number corresponds to @enum drm_i915_pmu_engine_sample.
-		 */
-		unsigned int enable_count[I915_ENGINE_SAMPLE_COUNT];
-		/**
-		 * @sample: Counter values for sampling events.
-		 *
-		 * Our internal timer stores the current counters in this field.
-		 *
-		 * Index number corresponds to @enum drm_i915_pmu_engine_sample.
-		 */
-		struct i915_pmu_sample sample[I915_ENGINE_SAMPLE_COUNT];
-	} pmu;
-
-	/*
-	 * A pool of objects to use as shadow copies of client batch buffers
-	 * when the command parser is enabled. Prevents the client from
-	 * modifying the batch contents after software parsing.
-	 */
-	struct i915_gem_batch_pool batch_pool;
-
-	struct intel_hw_status_page status_page;
-	struct i915_ctx_workarounds wa_ctx;
-	struct i915_wa_list ctx_wa_list;
-	struct i915_wa_list wa_list;
-	struct i915_wa_list whitelist;
-
-	u32             irq_keep_mask; /* always keep these interrupts */
-	u32		irq_enable_mask; /* bitmask to enable ring interrupt */
-	void		(*irq_enable)(struct intel_engine_cs *engine);
-	void		(*irq_disable)(struct intel_engine_cs *engine);
-
-	int		(*init_hw)(struct intel_engine_cs *engine);
-
-	struct {
-		void (*prepare)(struct intel_engine_cs *engine);
-		void (*reset)(struct intel_engine_cs *engine, bool stalled);
-		void (*finish)(struct intel_engine_cs *engine);
-	} reset;
-
-	void		(*park)(struct intel_engine_cs *engine);
-	void		(*unpark)(struct intel_engine_cs *engine);
-
-	void		(*set_default_submission)(struct intel_engine_cs *engine);
-
-	struct intel_context *(*context_pin)(struct intel_engine_cs *engine,
-					     struct i915_gem_context *ctx);
-
-	int		(*request_alloc)(struct i915_request *rq);
-	int		(*init_context)(struct i915_request *rq);
-
-	int		(*emit_flush)(struct i915_request *request, u32 mode);
-#define EMIT_INVALIDATE	BIT(0)
-#define EMIT_FLUSH	BIT(1)
-#define EMIT_BARRIER	(EMIT_INVALIDATE | EMIT_FLUSH)
-	int		(*emit_bb_start)(struct i915_request *rq,
-					 u64 offset, u32 length,
-					 unsigned int dispatch_flags);
-#define I915_DISPATCH_SECURE BIT(0)
-#define I915_DISPATCH_PINNED BIT(1)
-	int		 (*emit_init_breadcrumb)(struct i915_request *rq);
-	u32		*(*emit_fini_breadcrumb)(struct i915_request *rq,
-						 u32 *cs);
-	unsigned int	emit_fini_breadcrumb_dw;
-
-	/* Pass the request to the hardware queue (e.g. directly into
-	 * the legacy ringbuffer or to the end of an execlist).
-	 *
-	 * This is called from an atomic context with irqs disabled; must
-	 * be irq safe.
-	 */
-	void		(*submit_request)(struct i915_request *rq);
-
-	/*
-	 * Call when the priority on a request has changed and it and its
-	 * dependencies may need rescheduling. Note the request itself may
-	 * not be ready to run!
-	 */
-	void		(*schedule)(struct i915_request *request,
-				    const struct i915_sched_attr *attr);
-
-	/*
-	 * Cancel all requests on the hardware, or queued for execution.
-	 * This should only cancel the ready requests that have been
-	 * submitted to the engine (via the engine->submit_request callback).
-	 * This is called when marking the device as wedged.
-	 */
-	void		(*cancel_requests)(struct intel_engine_cs *engine);
-
-	void		(*cleanup)(struct intel_engine_cs *engine);
-
-	struct intel_engine_execlists execlists;
-
-	/* Contexts are pinned whilst they are active on the GPU. The last
-	 * context executed remains active whilst the GPU is idle - the
-	 * switch away and write to the context object only occurs on the
-	 * next execution.  Contexts are only unpinned on retirement of the
-	 * following request ensuring that we can always write to the object
-	 * on the context switch even after idling. Across suspend, we switch
-	 * to the kernel context and trash it as the save may not happen
-	 * before the hardware is powered down.
-	 */
-	struct intel_context *last_retired_context;
-
-	/* status_notifier: list of callbacks for context-switch changes */
-	struct atomic_notifier_head context_status_notifier;
-
-	struct intel_engine_hangcheck hangcheck;
-
-#define I915_ENGINE_NEEDS_CMD_PARSER BIT(0)
-#define I915_ENGINE_SUPPORTS_STATS   BIT(1)
-#define I915_ENGINE_HAS_PREEMPTION   BIT(2)
-#define I915_ENGINE_HAS_SEMAPHORES   BIT(3)
-	unsigned int flags;
-
-	/*
-	 * Table of commands the command parser needs to know about
-	 * for this engine.
-	 */
-	DECLARE_HASHTABLE(cmd_hash, I915_CMD_HASH_ORDER);
-
-	/*
-	 * Table of registers allowed in commands that read/write registers.
-	 */
-	const struct drm_i915_reg_table *reg_tables;
-	int reg_table_count;
-
-	/*
-	 * Returns the bitmask for the length field of the specified command.
-	 * Return 0 for an unrecognized/invalid command.
-	 *
-	 * If the command parser finds an entry for a command in the engine's
-	 * cmd_tables, it gets the command's length based on the table entry.
-	 * If not, it calls this function to determine the per-engine length
-	 * field encoding for the command (i.e. different opcode ranges use
-	 * certain bits to encode the command length in the header).
-	 */
-	u32 (*get_cmd_length_mask)(u32 cmd_header);
-
-	struct {
-		/**
-		 * @lock: Lock protecting the below fields.
-		 */
-		seqlock_t lock;
-		/**
-		 * @enabled: Reference count indicating number of listeners.
-		 */
-		unsigned int enabled;
-		/**
-		 * @active: Number of contexts currently scheduled in.
-		 */
-		unsigned int active;
-		/**
-		 * @enabled_at: Timestamp when busy stats were enabled.
-		 */
-		ktime_t enabled_at;
-		/**
-		 * @start: Timestamp of the last idle to active transition.
-		 *
-		 * Idle is defined as active == 0, active as active > 0.
-		 */
-		ktime_t start;
-		/**
-		 * @total: Total time this engine was busy.
-		 *
-		 * Accumulated time not counting the most recent block in cases
-		 * where engine is currently busy (active > 0).
-		 */
-		ktime_t total;
-	} stats;
-};
-
-static inline bool
-intel_engine_needs_cmd_parser(const struct intel_engine_cs *engine)
-{
-	return engine->flags & I915_ENGINE_NEEDS_CMD_PARSER;
-}
-
-static inline bool
-intel_engine_supports_stats(const struct intel_engine_cs *engine)
-{
-	return engine->flags & I915_ENGINE_SUPPORTS_STATS;
-}
-
-static inline bool
-intel_engine_has_preemption(const struct intel_engine_cs *engine)
-{
-	return engine->flags & I915_ENGINE_HAS_PREEMPTION;
-}
-
-static inline bool
-intel_engine_has_semaphores(const struct intel_engine_cs *engine)
-{
-	return engine->flags & I915_ENGINE_HAS_SEMAPHORES;
-}
-
 void intel_engines_set_scheduler_caps(struct drm_i915_private *i915);
 
 static inline bool __execlists_need_preempt(int prio, int last)
diff --git a/drivers/gpu/drm/i915/intel_workarounds.h b/drivers/gpu/drm/i915/intel_workarounds.h
index 7c734714b05e..a1bf51c611a9 100644
--- a/drivers/gpu/drm/i915/intel_workarounds.h
+++ b/drivers/gpu/drm/i915/intel_workarounds.h
@@ -9,18 +9,7 @@
 
 #include <linux/slab.h>
 
-struct i915_wa {
-	i915_reg_t	  reg;
-	u32		  mask;
-	u32		  val;
-};
-
-struct i915_wa_list {
-	const char	*name;
-	struct i915_wa	*list;
-	unsigned int	count;
-	unsigned int	wa_count;
-};
+#include "intel_workarounds_types.h"
 
 static inline void intel_wa_list_free(struct i915_wa_list *wal)
 {
diff --git a/drivers/gpu/drm/i915/intel_workarounds_types.h b/drivers/gpu/drm/i915/intel_workarounds_types.h
new file mode 100644
index 000000000000..032a0dc49275
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_workarounds_types.h
@@ -0,0 +1,25 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2014-2018 Intel Corporation
+ */
+
+#ifndef __INTEL_WORKAROUNDS_TYPES_H__
+#define __INTEL_WORKAROUNDS_TYPES_H__
+
+#include "i915_reg.h"
+
+struct i915_wa {
+	i915_reg_t	  reg;
+	u32		  mask;
+	u32		  val;
+};
+
+struct i915_wa_list {
+	const char	*name;
+	struct i915_wa	*list;
+	unsigned int	count;
+	unsigned int	wa_count;
+};
+
+#endif /* __INTEL_WORKAROUNDS_TYPES_H__ */
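
For reference, an entry is applied as a read-modify-write: clear the
mask bits, then or in the value (a sketch of the apply loop; the real
one lives in intel_workarounds.c):

static void wa_list_apply_sketch(struct drm_i915_private *dev_priv,
				 const struct i915_wa_list *wal)
{
	unsigned int i;

	for (i = 0; i < wal->count; i++) {
		const struct i915_wa *wa = &wal->list[i];
		u32 val = I915_READ(wa->reg);

		val &= ~wa->mask;	/* clear the affected bits... */
		val |= wa->val;		/* ...then set the required value */

		I915_WRITE(wa->reg, val);
	}
}
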
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH 28/38] drm/i915: Store the intel_context_ops in the intel_engine_cs
  2019-03-01 14:03 [PATCH 01/38] drm/i915/execlists: Suppress redundant preemption Chris Wilson
                   ` (25 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 27/38] drm/i915: Split struct intel_context definition to its own header Chris Wilson
@ 2019-03-01 14:03 ` Chris Wilson
  2019-03-05 16:27   ` Tvrtko Ursulin
  2019-03-01 14:03 ` [PATCH 29/38] drm/i915: Move over to intel_context_lookup() Chris Wilson
                   ` (13 subsequent siblings)
  40 siblings, 1 reply; 88+ messages in thread
From: Chris Wilson @ 2019-03-01 14:03 UTC (permalink / raw)
  To: intel-gfx

If we place a pointer to the engine-specific intel_context_ops in the
engine itself, we can assign the ops pointer when initialising the
context, and then rely on it always being set. This simplifies the code
in later patches.
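
The practical difference (an illustration, not part of the diff): with
ce->ops assigned at context init rather than on first pin, callers no
longer need to test the pointer before use.

	/* before: ops only valid once the context had been pinned */
	if (ce->ops)
		ce->ops->destroy(ce);

	/* after: assigned in intel_context_init(), always valid */
	ce->ops->destroy(ce);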

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_context.c      |  1 +
 drivers/gpu/drm/i915/intel_engine_types.h    |  1 +
 drivers/gpu/drm/i915/intel_lrc.c             | 13 ++++++------
 drivers/gpu/drm/i915/intel_ringbuffer.c      | 22 +++++++++-----------
 drivers/gpu/drm/i915/selftests/mock_engine.c | 13 ++++++------
 5 files changed, 24 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index d04fa649bc0e..04c24caf30d2 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -377,6 +377,7 @@ intel_context_init(struct intel_context *ce,
 {
 	ce->gem_context = ctx;
 	ce->engine = engine;
+	ce->ops = engine->context;
 
 	INIT_LIST_HEAD(&ce->signal_link);
 	INIT_LIST_HEAD(&ce->signals);
diff --git a/drivers/gpu/drm/i915/intel_engine_types.h b/drivers/gpu/drm/i915/intel_engine_types.h
index 5ec6e72d0ffb..546b790871ad 100644
--- a/drivers/gpu/drm/i915/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/intel_engine_types.h
@@ -351,6 +351,7 @@ struct intel_engine_cs {
 
 	void		(*set_default_submission)(struct intel_engine_cs *engine);
 
+	const struct intel_context_ops *context;
 	struct intel_context *(*context_pin)(struct intel_engine_cs *engine,
 					     struct i915_gem_context *ctx);
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index a2210f79dc67..4f6f09137662 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1381,11 +1381,6 @@ __execlists_context_pin(struct intel_engine_cs *engine,
 	return ERR_PTR(ret);
 }
 
-static const struct intel_context_ops execlists_context_ops = {
-	.unpin = execlists_context_unpin,
-	.destroy = execlists_context_destroy,
-};
-
 static struct intel_context *
 execlists_context_pin(struct intel_engine_cs *engine,
 		      struct i915_gem_context *ctx)
@@ -1399,11 +1394,14 @@ execlists_context_pin(struct intel_engine_cs *engine,
 		return ce;
 	GEM_BUG_ON(!ce->pin_count); /* no overflow please! */
 
-	ce->ops = &execlists_context_ops;
-
 	return __execlists_context_pin(engine, ctx, ce);
 }
 
+static const struct intel_context_ops execlists_context_ops = {
+	.unpin = execlists_context_unpin,
+	.destroy = execlists_context_destroy,
+};
+
 static int gen8_emit_init_breadcrumb(struct i915_request *rq)
 {
 	u32 *cs;
@@ -2347,6 +2345,7 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
 	engine->reset.reset = execlists_reset;
 	engine->reset.finish = execlists_reset_finish;
 
+	engine->context = &execlists_context_ops;
 	engine->context_pin = execlists_context_pin;
 	engine->request_alloc = execlists_request_alloc;
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 764dcc5d5856..848b68e090d5 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1348,7 +1348,7 @@ intel_ring_free(struct intel_ring *ring)
 	kfree(ring);
 }
 
-static void intel_ring_context_destroy(struct intel_context *ce)
+static void ring_context_destroy(struct intel_context *ce)
 {
 	GEM_BUG_ON(ce->pin_count);
 
@@ -1425,7 +1425,7 @@ static void __context_unpin(struct intel_context *ce)
 	i915_vma_unpin(vma);
 }
 
-static void intel_ring_context_unpin(struct intel_context *ce)
+static void ring_context_unpin(struct intel_context *ce)
 {
 	__context_unpin_ppgtt(ce->gem_context);
 	__context_unpin(ce);
@@ -1548,14 +1548,8 @@ __ring_context_pin(struct intel_engine_cs *engine,
 	return ERR_PTR(err);
 }
 
-static const struct intel_context_ops ring_context_ops = {
-	.unpin = intel_ring_context_unpin,
-	.destroy = intel_ring_context_destroy,
-};
-
 static struct intel_context *
-intel_ring_context_pin(struct intel_engine_cs *engine,
-		       struct i915_gem_context *ctx)
+ring_context_pin(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 {
 	struct intel_context *ce = to_intel_context(ctx, engine);
 
@@ -1565,11 +1559,14 @@ intel_ring_context_pin(struct intel_engine_cs *engine,
 		return ce;
 	GEM_BUG_ON(!ce->pin_count); /* no overflow please! */
 
-	ce->ops = &ring_context_ops;
-
 	return __ring_context_pin(engine, ctx, ce);
 }
 
+static const struct intel_context_ops ring_context_ops = {
+	.unpin = ring_context_unpin,
+	.destroy = ring_context_destroy,
+};
+
 static int intel_init_ring_buffer(struct intel_engine_cs *engine)
 {
 	struct i915_timeline *timeline;
@@ -2275,7 +2272,8 @@ static void intel_ring_default_vfuncs(struct drm_i915_private *dev_priv,
 	engine->reset.reset = reset_ring;
 	engine->reset.finish = reset_finish;
 
-	engine->context_pin = intel_ring_context_pin;
+	engine->context = &ring_context_ops;
+	engine->context_pin = ring_context_pin;
 	engine->request_alloc = ring_request_alloc;
 
 	/*
diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
index 8032a8a9542f..6c09e5162feb 100644
--- a/drivers/gpu/drm/i915/selftests/mock_engine.c
+++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
@@ -137,11 +137,6 @@ static void mock_context_destroy(struct intel_context *ce)
 		mock_ring_free(ce->ring);
 }
 
-static const struct intel_context_ops mock_context_ops = {
-	.unpin = mock_context_unpin,
-	.destroy = mock_context_destroy,
-};
-
 static struct intel_context *
 mock_context_pin(struct intel_engine_cs *engine,
 		 struct i915_gem_context *ctx)
@@ -160,8 +155,6 @@ mock_context_pin(struct intel_engine_cs *engine,
 
 	mock_timeline_pin(ce->ring->timeline);
 
-	ce->ops = &mock_context_ops;
-
 	mutex_lock(&ctx->mutex);
 	list_add(&ce->active_link, &ctx->active_engines);
 	mutex_unlock(&ctx->mutex);
@@ -174,6 +167,11 @@ mock_context_pin(struct intel_engine_cs *engine,
 	return ERR_PTR(err);
 }
 
+static const struct intel_context_ops mock_context_ops = {
+	.unpin = mock_context_unpin,
+	.destroy = mock_context_destroy,
+};
+
 static int mock_request_alloc(struct i915_request *request)
 {
 	INIT_LIST_HEAD(&request->mock.link);
@@ -232,6 +230,7 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
 	engine->base.mask = BIT(id);
 	engine->base.status_page.addr = (void *)(engine + 1);
 
+	engine->base.context = &mock_context_ops;
 	engine->base.context_pin = mock_context_pin;
 	engine->base.request_alloc = mock_request_alloc;
 	engine->base.emit_flush = mock_emit_flush;
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 88+ messages in thread

* [PATCH 29/38] drm/i915: Move over to intel_context_lookup()
  2019-03-01 14:03 [PATCH 01/38] drm/i915/execlists: Suppress redundant preemption Chris Wilson
                   ` (26 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 28/38] drm/i915: Store the intel_context_ops in the intel_engine_cs Chris Wilson
@ 2019-03-01 14:03 ` Chris Wilson
  2019-03-05 17:01   ` Tvrtko Ursulin
  2019-03-01 14:03 ` [PATCH 30/38] drm/i915: Make context pinning part of intel_context_ops Chris Wilson
                   ` (12 subsequent siblings)
  40 siblings, 1 reply; 88+ messages in thread
From: Chris Wilson @ 2019-03-01 14:03 UTC (permalink / raw)
  To: intel-gfx

In preparation for an ever-growing number of engines, and so an
ever-increasing static array of HW contexts within the GEM context,
move the array over to an rbtree, allocated upon first use.

Unfortunately, this imposes an rbtree lookup at a few frequent callsites,
but we should be able to mitigate those by moving over to using the HW
context as our primary type and so only incur the lookup on the boundary
with the user GEM context and engines.
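
At the lookup boundary, the change amounts to (an illustration, not
part of the diff):

	/* before: fixed-size array indexed by engine id */
	ce = &ctx->__engine[engine->id];

	/*
	 * after: rbtree keyed on the engine pointer, allocated lazily;
	 * intel_context_lookup() may return NULL until first use, while
	 * intel_context_instance() creates the node on demand.
	 */
	ce = intel_context_lookup(ctx, engine);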

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/Makefile                 |   1 +
 drivers/gpu/drm/i915/gvt/mmio_context.c       |   3 +-
 drivers/gpu/drm/i915/i915_gem.c               |   9 +-
 drivers/gpu/drm/i915/i915_gem_context.c       |  78 ++++------
 drivers/gpu/drm/i915/i915_gem_context.h       |   1 -
 drivers/gpu/drm/i915/i915_gem_context_types.h |   7 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c    |   4 +-
 drivers/gpu/drm/i915/i915_perf.c              |   4 +-
 drivers/gpu/drm/i915/intel_context.c          | 139 ++++++++++++++++++
 drivers/gpu/drm/i915/intel_context.h          |  37 ++++-
 drivers/gpu/drm/i915/intel_context_types.h    |   2 +
 drivers/gpu/drm/i915/intel_engine_cs.c        |   2 +-
 drivers/gpu/drm/i915/intel_engine_types.h     |   5 +
 drivers/gpu/drm/i915/intel_guc_ads.c          |   4 +-
 drivers/gpu/drm/i915/intel_guc_submission.c   |   4 +-
 drivers/gpu/drm/i915/intel_lrc.c              |  29 ++--
 drivers/gpu/drm/i915/intel_ringbuffer.c       |  23 ++-
 drivers/gpu/drm/i915/selftests/mock_context.c |   7 +-
 drivers/gpu/drm/i915/selftests/mock_engine.c  |   6 +-
 19 files changed, 271 insertions(+), 94 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/intel_context.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 89105b1aaf12..d7292b349c0d 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -86,6 +86,7 @@ i915-y += \
 	  i915_trace_points.o \
 	  i915_vma.o \
 	  intel_breadcrumbs.o \
+	  intel_context.o \
 	  intel_engine_cs.o \
 	  intel_hangcheck.o \
 	  intel_lrc.o \
diff --git a/drivers/gpu/drm/i915/gvt/mmio_context.c b/drivers/gpu/drm/i915/gvt/mmio_context.c
index 7d84cfb9051a..442a74805129 100644
--- a/drivers/gpu/drm/i915/gvt/mmio_context.c
+++ b/drivers/gpu/drm/i915/gvt/mmio_context.c
@@ -492,7 +492,8 @@ static void switch_mmio(struct intel_vgpu *pre,
 			 * itself.
 			 */
 			if (mmio->in_context &&
-			    !is_inhibit_context(&s->shadow_ctx->__engine[ring_id]))
+			    !is_inhibit_context(intel_context_lookup(s->shadow_ctx,
+								     dev_priv->engine[ring_id])))
 				continue;
 
 			if (mmio->mask)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 90bb951b8c99..c6a25c3276ee 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4694,15 +4694,20 @@ static int __intel_engines_record_defaults(struct drm_i915_private *i915)
 	}
 
 	for_each_engine(engine, i915, id) {
+		struct intel_context *ce;
 		struct i915_vma *state;
 		void *vaddr;
 
-		GEM_BUG_ON(to_intel_context(ctx, engine)->pin_count);
+		ce = intel_context_lookup(ctx, engine);
+		if (!ce)
+			continue;
 
-		state = to_intel_context(ctx, engine)->state;
+		state = ce->state;
 		if (!state)
 			continue;
 
+		GEM_BUG_ON(ce->pin_count);
+
 		/*
 		 * As we will hold a reference to the logical state, it will
 		 * not be torn down with the context, and importantly the
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 04c24caf30d2..60b17f6a727d 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -238,7 +238,7 @@ static void release_hw_id(struct i915_gem_context *ctx)
 
 static void i915_gem_context_free(struct i915_gem_context *ctx)
 {
-	unsigned int n;
+	struct intel_context *it, *n;
 
 	lockdep_assert_held(&ctx->i915->drm.struct_mutex);
 	GEM_BUG_ON(!i915_gem_context_is_closed(ctx));
@@ -249,12 +249,8 @@ static void i915_gem_context_free(struct i915_gem_context *ctx)
 
 	kfree(ctx->engines);
 
-	for (n = 0; n < ARRAY_SIZE(ctx->__engine); n++) {
-		struct intel_context *ce = &ctx->__engine[n];
-
-		if (ce->ops)
-			ce->ops->destroy(ce);
-	}
+	rbtree_postorder_for_each_entry_safe(it, n, &ctx->hw_contexts, node)
+		it->ops->destroy(it);
 
 	if (ctx->timeline)
 		i915_timeline_put(ctx->timeline);
@@ -361,40 +357,11 @@ static u32 default_desc_template(const struct drm_i915_private *i915,
 	return desc;
 }
 
-static void intel_context_retire(struct i915_active_request *active,
-				 struct i915_request *rq)
-{
-	struct intel_context *ce =
-		container_of(active, typeof(*ce), active_tracker);
-
-	intel_context_unpin(ce);
-}
-
-void
-intel_context_init(struct intel_context *ce,
-		   struct i915_gem_context *ctx,
-		   struct intel_engine_cs *engine)
-{
-	ce->gem_context = ctx;
-	ce->engine = engine;
-	ce->ops = engine->context;
-
-	INIT_LIST_HEAD(&ce->signal_link);
-	INIT_LIST_HEAD(&ce->signals);
-
-	/* Use the whole device by default */
-	ce->sseu = intel_device_default_sseu(ctx->i915);
-
-	i915_active_request_init(&ce->active_tracker,
-				 NULL, intel_context_retire);
-}
-
 static struct i915_gem_context *
 __create_hw_context(struct drm_i915_private *dev_priv,
 		    struct drm_i915_file_private *file_priv)
 {
 	struct i915_gem_context *ctx;
-	unsigned int n;
 	int ret;
 	int i;
 
@@ -409,8 +376,8 @@ __create_hw_context(struct drm_i915_private *dev_priv,
 	INIT_LIST_HEAD(&ctx->active_engines);
 	mutex_init(&ctx->mutex);
 
-	for (n = 0; n < ARRAY_SIZE(ctx->__engine); n++)
-		intel_context_init(&ctx->__engine[n], ctx, dev_priv->engine[n]);
+	ctx->hw_contexts = RB_ROOT;
+	spin_lock_init(&ctx->hw_contexts_lock);
 
 	INIT_RADIX_TREE(&ctx->handles_vma, GFP_KERNEL);
 	INIT_LIST_HEAD(&ctx->handles_list);
@@ -959,8 +926,6 @@ int i915_gem_switch_to_kernel_context(struct drm_i915_private *i915,
 		struct intel_ring *ring;
 		struct i915_request *rq;
 
-		GEM_BUG_ON(!to_intel_context(i915->kernel_context, engine));
-
 		rq = i915_request_alloc(engine, i915->kernel_context);
 		if (IS_ERR(rq))
 			return PTR_ERR(rq);
@@ -1195,9 +1160,13 @@ __i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
 				    struct intel_engine_cs *engine,
 				    struct intel_sseu sseu)
 {
-	struct intel_context *ce = to_intel_context(ctx, engine);
+	struct intel_context *ce;
 	int ret = 0;
 
+	ce = intel_context_instance(ctx, engine);
+	if (IS_ERR(ce))
+		return PTR_ERR(ce);
+
 	GEM_BUG_ON(INTEL_GEN(ctx->i915) < 8);
 	GEM_BUG_ON(engine->id != RCS);
 
@@ -1631,8 +1600,8 @@ static int create_setparam(struct i915_user_extension __user *ext, void *data)
 	return ctx_setparam(data, &local.setparam);
 }
 
-static void clone_sseu(struct i915_gem_context *dst,
-		       struct i915_gem_context *src)
+static int clone_sseu(struct i915_gem_context *dst,
+		      struct i915_gem_context *src)
 {
 	const struct intel_sseu default_sseu =
 		intel_device_default_sseu(dst->i915);
@@ -1643,7 +1612,7 @@ static void clone_sseu(struct i915_gem_context *dst,
 		struct intel_context *ce;
 		struct intel_sseu sseu;
 
-		ce = to_intel_context(src, engine);
+		ce = intel_context_lookup(src, engine);
 		if (!ce)
 			continue;
 
@@ -1651,9 +1620,14 @@ static void clone_sseu(struct i915_gem_context *dst,
 		if (!memcmp(&sseu, &default_sseu, sizeof(sseu)))
 			continue;
 
-		ce = to_intel_context(dst, engine);
+		ce = intel_context_instance(dst, engine);
+		if (IS_ERR(ce))
+			return PTR_ERR(ce);
+
 		ce->sseu = sseu;
 	}
+
+	return 0;
 }
 
 static int create_clone(struct i915_user_extension __user *ext, void *data)
@@ -1661,6 +1635,7 @@ static int create_clone(struct i915_user_extension __user *ext, void *data)
 	struct drm_i915_gem_context_create_ext_clone local;
 	struct i915_gem_context *dst = data;
 	struct i915_gem_context *src;
+	int err;
 
 	if (copy_from_user(&local, ext, sizeof(local)))
 		return -EFAULT;
@@ -1683,8 +1658,11 @@ static int create_clone(struct i915_user_extension __user *ext, void *data)
 	if (local.flags & I915_CONTEXT_CLONE_SCHED)
 		dst->sched = src->sched;
 
-	if (local.flags & I915_CONTEXT_CLONE_SSEU)
-		clone_sseu(dst, src);
+	if (local.flags & I915_CONTEXT_CLONE_SSEU) {
+		err = clone_sseu(dst, src);
+		if (err)
+			return err;
+	}
 
 	if (local.flags & I915_CONTEXT_CLONE_TIMELINE && src->timeline) {
 		if (dst->timeline)
@@ -1839,13 +1817,15 @@ static int get_sseu(struct i915_gem_context *ctx,
 	if (!engine)
 		return -EINVAL;
 
+	ce = intel_context_instance(ctx, engine);
+	if (IS_ERR(ce))
+		return PTR_ERR(ce);
+
 	/* Only use for mutex here is to serialize get_param and set_param. */
 	ret = mutex_lock_interruptible(&ctx->i915->drm.struct_mutex);
 	if (ret)
 		return ret;
 
-	ce = to_intel_context(ctx, engine);
-
 	user_sseu.slice_mask = ce->sseu.slice_mask;
 	user_sseu.subslice_mask = ce->sseu.subslice_mask;
 	user_sseu.min_eus_per_subslice = ce->sseu.min_eus_per_subslice;
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index 110d5881c9de..5b0e423edf86 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -31,7 +31,6 @@
 #include "i915_scheduler.h"
 #include "intel_context.h"
 #include "intel_device_info.h"
-#include "intel_ringbuffer.h"
 
 struct drm_device;
 struct drm_file;
diff --git a/drivers/gpu/drm/i915/i915_gem_context_types.h b/drivers/gpu/drm/i915/i915_gem_context_types.h
index 8baa7a5e7fdb..b1df720710f5 100644
--- a/drivers/gpu/drm/i915/i915_gem_context_types.h
+++ b/drivers/gpu/drm/i915/i915_gem_context_types.h
@@ -13,10 +13,10 @@
 #include <linux/kref.h>
 #include <linux/mutex.h>
 #include <linux/radix-tree.h>
+#include <linux/rbtree.h>
 #include <linux/rcupdate.h>
 #include <linux/types.h>
 
-#include "i915_gem.h" /* I915_NUM_ENGINES */
 #include "i915_scheduler.h"
 #include "intel_context_types.h"
 
@@ -146,8 +146,9 @@ struct i915_gem_context {
 
 	struct i915_sched_attr sched;
 
-	/** engine: per-engine logical HW state */
-	struct intel_context __engine[I915_NUM_ENGINES];
+	/** hw_contexts: per-engine logical HW state */
+	struct rb_root hw_contexts;
+	spinlock_t hw_contexts_lock;
 
 	/** ring_size: size for allocating the per-engine ring buffer */
 	u32 ring_size;
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index ff5810a932a8..5ea2e1c8b927 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -794,8 +794,8 @@ static int eb_wait_for_ring(const struct i915_execbuffer *eb)
 	 * keeping all of their resources pinned.
 	 */
 
-	ce = to_intel_context(eb->ctx, eb->engine);
-	if (!ce->ring) /* first use, assume empty! */
+	ce = intel_context_lookup(eb->ctx, eb->engine);
+	if (!ce || !ce->ring) /* first use, assume empty! */
 		return 0;
 
 	rq = __eb_wait_for_ring(ce->ring);
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index f969a0512465..ecca231ca83a 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1740,11 +1740,11 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
 
 	/* Update all contexts now that we've stalled the submission. */
 	list_for_each_entry(ctx, &dev_priv->contexts.list, link) {
-		struct intel_context *ce = to_intel_context(ctx, engine);
+		struct intel_context *ce = intel_context_lookup(ctx, engine);
 		u32 *regs;
 
 		/* OA settings will be set upon first use */
-		if (!ce->state)
+		if (!ce || !ce->state)
 			continue;
 
 		regs = i915_gem_object_pin_map(ce->state->obj, map_type);
diff --git a/drivers/gpu/drm/i915/intel_context.c b/drivers/gpu/drm/i915/intel_context.c
new file mode 100644
index 000000000000..242b1b6ad253
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_context.c
@@ -0,0 +1,139 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2019 Intel Corporation
+ */
+
+#include "i915_drv.h"
+#include "i915_gem_context.h"
+#include "intel_context.h"
+#include "intel_ringbuffer.h"
+
+struct intel_context *
+intel_context_lookup(struct i915_gem_context *ctx,
+		     struct intel_engine_cs *engine)
+{
+	struct intel_context *ce = NULL;
+	struct rb_node *p;
+
+	spin_lock(&ctx->hw_contexts_lock);
+	p = ctx->hw_contexts.rb_node;
+	while (p) {
+		struct intel_context *this =
+			rb_entry(p, struct intel_context, node);
+
+		if (this->engine == engine) {
+			GEM_BUG_ON(this->gem_context != ctx);
+			ce = this;
+			break;
+		}
+
+		if (this->engine < engine)
+			p = p->rb_right;
+		else
+			p = p->rb_left;
+	}
+	spin_unlock(&ctx->hw_contexts_lock);
+
+	return ce;
+}
+
+struct intel_context *
+__intel_context_insert(struct i915_gem_context *ctx,
+		       struct intel_engine_cs *engine,
+		       struct intel_context *ce)
+{
+	struct rb_node **p, *parent;
+	int err = 0;
+
+	spin_lock(&ctx->hw_contexts_lock);
+
+	parent = NULL;
+	p = &ctx->hw_contexts.rb_node;
+	while (*p) {
+		struct intel_context *this;
+
+		parent = *p;
+		this = rb_entry(parent, struct intel_context, node);
+
+		if (this->engine == engine) {
+			err = -EEXIST;
+			ce = this;
+			break;
+		}
+
+		if (this->engine < engine)
+			p = &parent->rb_right;
+		else
+			p = &parent->rb_left;
+	}
+	if (!err) {
+		rb_link_node(&ce->node, parent, p);
+		rb_insert_color(&ce->node, &ctx->hw_contexts);
+	}
+
+	spin_unlock(&ctx->hw_contexts_lock);
+
+	return ce;
+}
+
+void __intel_context_remove(struct intel_context *ce)
+{
+	struct i915_gem_context *ctx = ce->gem_context;
+
+	spin_lock(&ctx->hw_contexts_lock);
+	rb_erase(&ce->node, &ctx->hw_contexts);
+	spin_unlock(&ctx->hw_contexts_lock);
+}
+
+struct intel_context *
+intel_context_instance(struct i915_gem_context *ctx,
+		       struct intel_engine_cs *engine)
+{
+	struct intel_context *ce, *pos;
+
+	ce = intel_context_lookup(ctx, engine);
+	if (likely(ce))
+		return ce;
+
+	ce = kzalloc(sizeof(*ce), GFP_KERNEL);
+	if (!ce)
+		return ERR_PTR(-ENOMEM);
+
+	intel_context_init(ce, ctx, engine);
+
+	pos = __intel_context_insert(ctx, engine, ce);
+	if (unlikely(pos != ce)) /* Beaten! Use their HW context instead */
+		kfree(ce);
+
+	GEM_BUG_ON(intel_context_lookup(ctx, engine) != pos);
+	return pos;
+}
+
+static void intel_context_retire(struct i915_active_request *active,
+				 struct i915_request *rq)
+{
+	struct intel_context *ce =
+		container_of(active, typeof(*ce), active_tracker);
+
+	intel_context_unpin(ce);
+}
+
+void
+intel_context_init(struct intel_context *ce,
+		   struct i915_gem_context *ctx,
+		   struct intel_engine_cs *engine)
+{
+	ce->gem_context = ctx;
+	ce->engine = engine;
+	ce->ops = engine->context;
+
+	INIT_LIST_HEAD(&ce->signal_link);
+	INIT_LIST_HEAD(&ce->signals);
+
+	/* Use the whole device by default */
+	ce->sseu = intel_device_default_sseu(ctx->i915);
+
+	i915_active_request_init(&ce->active_tracker,
+				 NULL, intel_context_retire);
+}
diff --git a/drivers/gpu/drm/i915/intel_context.h b/drivers/gpu/drm/i915/intel_context.h
index dd947692bb0b..c3fffd9b8ae4 100644
--- a/drivers/gpu/drm/i915/intel_context.h
+++ b/drivers/gpu/drm/i915/intel_context.h
@@ -7,7 +7,6 @@
 #ifndef __INTEL_CONTEXT_H__
 #define __INTEL_CONTEXT_H__
 
-#include "i915_gem_context_types.h"
 #include "intel_context_types.h"
 #include "intel_engine_types.h"
 
@@ -15,12 +14,36 @@ void intel_context_init(struct intel_context *ce,
 			struct i915_gem_context *ctx,
 			struct intel_engine_cs *engine);
 
-static inline struct intel_context *
-to_intel_context(struct i915_gem_context *ctx,
-		 const struct intel_engine_cs *engine)
-{
-	return &ctx->__engine[engine->id];
-}
+/**
+ * intel_context_lookup - Find the matching HW context for this (ctx, engine)
+ * @ctx - the parent GEM context
+ * @engine - the target HW engine
+ *
+ * May return NULL if the HW context hasn't been instantiated (i.e. unused).
+ */
+struct intel_context *
+intel_context_lookup(struct i915_gem_context *ctx,
+		     struct intel_engine_cs *engine);
+
+/**
+ * intel_context_instance - Lookup or allocate the HW context for (ctx, engine)
+ * @ctx - the parent GEM context
+ * @engine - the target HW engine
+ *
+ * Returns the existing HW context for this pair of (GEM context, engine), or
+ * allocates and initialises a fresh context. Once allocated, the HW context
+ * remains resident until the GEM context is destroyed.
+ */
+struct intel_context *
+intel_context_instance(struct i915_gem_context *ctx,
+		       struct intel_engine_cs *engine);
+
+struct intel_context *
+__intel_context_insert(struct i915_gem_context *ctx,
+		       struct intel_engine_cs *engine,
+		       struct intel_context *ce);
+void
+__intel_context_remove(struct intel_context *ce);
 
 static inline struct intel_context *
 intel_context_pin(struct i915_gem_context *ctx, struct intel_engine_cs *engine)
diff --git a/drivers/gpu/drm/i915/intel_context_types.h b/drivers/gpu/drm/i915/intel_context_types.h
index 16e1306e9595..857f5c335324 100644
--- a/drivers/gpu/drm/i915/intel_context_types.h
+++ b/drivers/gpu/drm/i915/intel_context_types.h
@@ -8,6 +8,7 @@
 #define __INTEL_CONTEXT_TYPES__
 
 #include <linux/list.h>
+#include <linux/rbtree.h>
 #include <linux/types.h>
 
 #include "i915_active_types.h"
@@ -52,6 +53,7 @@ struct intel_context {
 	struct i915_active_request active_tracker;
 
 	const struct intel_context_ops *ops;
+	struct rb_node node;
 
 	/** sseu: Control eu/slice partitioning */
 	struct intel_sseu sseu;
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 2559dcc25a6a..112301560745 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -644,7 +644,7 @@ void intel_engines_set_scheduler_caps(struct drm_i915_private *i915)
 static void __intel_context_unpin(struct i915_gem_context *ctx,
 				  struct intel_engine_cs *engine)
 {
-	intel_context_unpin(to_intel_context(ctx, engine));
+	intel_context_unpin(intel_context_lookup(ctx, engine));
 }
 
 struct measure_breadcrumb {
diff --git a/drivers/gpu/drm/i915/intel_engine_types.h b/drivers/gpu/drm/i915/intel_engine_types.h
index 546b790871ad..5dac6b439f95 100644
--- a/drivers/gpu/drm/i915/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/intel_engine_types.h
@@ -231,6 +231,11 @@ struct intel_engine_execlists {
 	 */
 	u32 *csb_status;
 
+	/**
+	 * @preempt_context: the HW context for injecting preempt-to-idle
+	 */
+	struct intel_context *preempt_context;
+
 	/**
 	 * @preempt_complete_status: expected CSB upon completing preemption
 	 */
diff --git a/drivers/gpu/drm/i915/intel_guc_ads.c b/drivers/gpu/drm/i915/intel_guc_ads.c
index f0db62887f50..da220561ac41 100644
--- a/drivers/gpu/drm/i915/intel_guc_ads.c
+++ b/drivers/gpu/drm/i915/intel_guc_ads.c
@@ -121,8 +121,8 @@ int intel_guc_ads_create(struct intel_guc *guc)
 	 * to find it. Note that we have to skip our header (1 page),
 	 * because our GuC shared data is there.
 	 */
-	kernel_ctx_vma = to_intel_context(dev_priv->kernel_context,
-					  dev_priv->engine[RCS])->state;
+	kernel_ctx_vma = intel_context_lookup(dev_priv->kernel_context,
+					      dev_priv->engine[RCS])->state;
 	blob->ads.golden_context_lrca =
 		intel_guc_ggtt_offset(guc, kernel_ctx_vma) + skipped_offset;
 
diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index 119d3326bb5e..4935e0555135 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -382,7 +382,7 @@ static void guc_stage_desc_init(struct intel_guc_client *client)
 	desc->db_id = client->doorbell_id;
 
 	for_each_engine_masked(engine, dev_priv, client->engines, tmp) {
-		struct intel_context *ce = to_intel_context(ctx, engine);
+		struct intel_context *ce = intel_context_lookup(ctx, engine);
 		u32 guc_engine_id = engine->guc_id;
 		struct guc_execlist_context *lrc = &desc->lrc[guc_engine_id];
 
@@ -567,7 +567,7 @@ static void inject_preempt_context(struct work_struct *work)
 					     preempt_work[engine->id]);
 	struct intel_guc_client *client = guc->preempt_client;
 	struct guc_stage_desc *stage_desc = __get_stage_desc(client);
-	struct intel_context *ce = to_intel_context(client->owner, engine);
+	struct intel_context *ce = intel_context_lookup(client->owner, engine);
 	u32 data[7];
 
 	if (!ce->ring->emit) { /* recreate upon load/resume */
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 4f6f09137662..0800f8edffeb 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -622,8 +622,7 @@ static void port_assign(struct execlist_port *port, struct i915_request *rq)
 static void inject_preempt_context(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists *execlists = &engine->execlists;
-	struct intel_context *ce =
-		to_intel_context(engine->i915->preempt_context, engine);
+	struct intel_context *ce = execlists->preempt_context;
 	unsigned int n;
 
 	GEM_BUG_ON(execlists->preempt_complete_status !=
@@ -1231,19 +1230,24 @@ static void execlists_submit_request(struct i915_request *request)
 	spin_unlock_irqrestore(&engine->timeline.lock, flags);
 }
 
-static void execlists_context_destroy(struct intel_context *ce)
+static void __execlists_context_fini(struct intel_context *ce)
 {
-	GEM_BUG_ON(ce->pin_count);
-
-	if (!ce->state)
-		return;
-
 	intel_ring_free(ce->ring);
 
 	GEM_BUG_ON(i915_gem_object_is_active(ce->state->obj));
 	i915_gem_object_put(ce->state->obj);
 }
 
+static void execlists_context_destroy(struct intel_context *ce)
+{
+	GEM_BUG_ON(ce->pin_count);
+
+	if (ce->state)
+		__execlists_context_fini(ce);
+
+	kfree(ce);
+}
+
 static void execlists_context_unpin(struct intel_context *ce)
 {
 	struct intel_engine_cs *engine;
@@ -1385,7 +1389,11 @@ static struct intel_context *
 execlists_context_pin(struct intel_engine_cs *engine,
 		      struct i915_gem_context *ctx)
 {
-	struct intel_context *ce = to_intel_context(ctx, engine);
+	struct intel_context *ce;
+
+	ce = intel_context_instance(ctx, engine);
+	if (IS_ERR(ce))
+		return ce;
 
 	lockdep_assert_held(&ctx->i915->drm.struct_mutex);
 	GEM_BUG_ON(!ctx->ppgtt);
@@ -2436,8 +2444,9 @@ static int logical_ring_init(struct intel_engine_cs *engine)
 	execlists->preempt_complete_status = ~0u;
 	if (i915->preempt_context) {
 		struct intel_context *ce =
-			to_intel_context(i915->preempt_context, engine);
+			intel_context_lookup(i915->preempt_context, engine);
 
+		execlists->preempt_context = ce;
 		execlists->preempt_complete_status =
 			upper_32_bits(ce->lrc_desc);
 	}
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 848b68e090d5..9002f7f6d32e 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1348,15 +1348,20 @@ intel_ring_free(struct intel_ring *ring)
 	kfree(ring);
 }
 
+static void __ring_context_fini(struct intel_context *ce)
+{
+	GEM_BUG_ON(i915_gem_object_is_active(ce->state->obj));
+	i915_gem_object_put(ce->state->obj);
+}
+
 static void ring_context_destroy(struct intel_context *ce)
 {
 	GEM_BUG_ON(ce->pin_count);
 
-	if (!ce->state)
-		return;
+	if (ce->state)
+		__ring_context_fini(ce);
 
-	GEM_BUG_ON(i915_gem_object_is_active(ce->state->obj));
-	i915_gem_object_put(ce->state->obj);
+	kfree(ce);
 }
 
 static int __context_pin_ppgtt(struct i915_gem_context *ctx)
@@ -1551,7 +1556,11 @@ __ring_context_pin(struct intel_engine_cs *engine,
 static struct intel_context *
 ring_context_pin(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 {
-	struct intel_context *ce = to_intel_context(ctx, engine);
+	struct intel_context *ce;
+
+	ce = intel_context_instance(ctx, engine);
+	if (IS_ERR(ce))
+		return ce;
 
 	lockdep_assert_held(&ctx->i915->drm.struct_mutex);
 
@@ -1753,8 +1762,8 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
 		 * placeholder we use to flush other contexts.
 		 */
 		*cs++ = MI_SET_CONTEXT;
-		*cs++ = i915_ggtt_offset(to_intel_context(i915->kernel_context,
-							  engine)->state) |
+		*cs++ = i915_ggtt_offset(intel_context_lookup(i915->kernel_context,
+							      engine)->state) |
 			MI_MM_SPACE_GTT |
 			MI_RESTORE_INHIBIT;
 	}
diff --git a/drivers/gpu/drm/i915/selftests/mock_context.c b/drivers/gpu/drm/i915/selftests/mock_context.c
index 5d0ff2293abc..1d6dc2fe36ab 100644
--- a/drivers/gpu/drm/i915/selftests/mock_context.c
+++ b/drivers/gpu/drm/i915/selftests/mock_context.c
@@ -30,7 +30,6 @@ mock_context(struct drm_i915_private *i915,
 	     const char *name)
 {
 	struct i915_gem_context *ctx;
-	unsigned int n;
 	int ret;
 
 	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
@@ -41,15 +40,15 @@ mock_context(struct drm_i915_private *i915,
 	INIT_LIST_HEAD(&ctx->link);
 	ctx->i915 = i915;
 
+	ctx->hw_contexts = RB_ROOT;
+	spin_lock_init(&ctx->hw_contexts_lock);
+
 	INIT_RADIX_TREE(&ctx->handles_vma, GFP_KERNEL);
 	INIT_LIST_HEAD(&ctx->handles_list);
 	INIT_LIST_HEAD(&ctx->hw_id_link);
 	INIT_LIST_HEAD(&ctx->active_engines);
 	mutex_init(&ctx->mutex);
 
-	for (n = 0; n < ARRAY_SIZE(ctx->__engine); n++)
-		intel_context_init(&ctx->__engine[n], ctx, i915->engine[n]);
-
 	ret = i915_gem_context_pin_hw_id(ctx);
 	if (ret < 0)
 		goto err_handles;
diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
index 6c09e5162feb..f1a80006289d 100644
--- a/drivers/gpu/drm/i915/selftests/mock_engine.c
+++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
@@ -141,9 +141,13 @@ static struct intel_context *
 mock_context_pin(struct intel_engine_cs *engine,
 		 struct i915_gem_context *ctx)
 {
-	struct intel_context *ce = to_intel_context(ctx, engine);
+	struct intel_context *ce;
 	int err = -ENOMEM;
 
+	ce = intel_context_instance(ctx, engine);
+	if (IS_ERR(ce))
+		return ce;
+
 	if (ce->pin_count++)
 		return ce;
 
-- 
2.20.1

* [PATCH 30/38] drm/i915: Make context pinning part of intel_context_ops
  2019-03-01 14:03 [PATCH 01/38] drm/i915/execlists: Suppress redundant preemption Chris Wilson
                   ` (27 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 29/38] drm/i915: Move over to intel_context_lookup() Chris Wilson
@ 2019-03-01 14:03 ` Chris Wilson
  2019-03-05 17:31   ` Tvrtko Ursulin
  2019-03-01 14:03 ` [PATCH 31/38] drm/i915: Track the pinned kernel contexts on each engine Chris Wilson
                   ` (11 subsequent siblings)
  40 siblings, 1 reply; 88+ messages in thread
From: Chris Wilson @ 2019-03-01 14:03 UTC (permalink / raw)
  To: intel-gfx

Push the intel_context pin callback down from intel_engine_cs onto the
context itself, now that intel_context_pin() provides a single central
caller able to look up the intel_context itself.
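
As an illustrative sketch (not part of this patch; the foo_* names are
hypothetical), each backend now supplies its pin hook alongside
unpin/destroy in its intel_context_ops, and registers the whole table
on the engine:

	static const struct intel_context_ops foo_context_ops = {
		.pin = foo_context_pin,		/* bind ce to the HW */
		.unpin = foo_context_unpin,
		.destroy = foo_context_destroy,
	};

	/* during engine setup */
	engine->context = &foo_context_ops;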

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_context.c         |  34 ++++++
 drivers/gpu/drm/i915/intel_context.h         |   7 +-
 drivers/gpu/drm/i915/intel_context_types.h   |   1 +
 drivers/gpu/drm/i915/intel_engine_types.h    |   2 -
 drivers/gpu/drm/i915/intel_lrc.c             | 114 ++++++++-----------
 drivers/gpu/drm/i915/intel_ringbuffer.c      |  45 ++------
 drivers/gpu/drm/i915/selftests/mock_engine.c |  32 +-----
 7 files changed, 98 insertions(+), 137 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_context.c b/drivers/gpu/drm/i915/intel_context.c
index 242b1b6ad253..7de02be0b552 100644
--- a/drivers/gpu/drm/i915/intel_context.c
+++ b/drivers/gpu/drm/i915/intel_context.c
@@ -110,6 +110,40 @@ intel_context_instance(struct i915_gem_context *ctx,
 	return pos;
 }
 
+struct intel_context *
+intel_context_pin(struct i915_gem_context *ctx,
+		  struct intel_engine_cs *engine)
+{
+	struct intel_context *ce;
+	int err;
+
+	lockdep_assert_held(&ctx->i915->drm.struct_mutex);
+
+	ce = intel_context_instance(ctx, engine);
+	if (IS_ERR(ce))
+		return ce;
+
+	if (unlikely(!ce->pin_count++)) {
+		err = ce->ops->pin(ce);
+		if (err)
+			goto err_unpin;
+
+		mutex_lock(&ctx->mutex);
+		list_add(&ce->active_link, &ctx->active_engines);
+		mutex_unlock(&ctx->mutex);
+
+		i915_gem_context_get(ctx);
+		GEM_BUG_ON(ce->gem_context != ctx);
+	}
+	GEM_BUG_ON(!ce->pin_count); /* no overflow! */
+
+	return ce;
+
+err_unpin:
+	ce->pin_count = 0;
+	return ERR_PTR(err);
+}
+
 static void intel_context_retire(struct i915_active_request *active,
 				 struct i915_request *rq)
 {
diff --git a/drivers/gpu/drm/i915/intel_context.h b/drivers/gpu/drm/i915/intel_context.h
index c3fffd9b8ae4..aa34311a472e 100644
--- a/drivers/gpu/drm/i915/intel_context.h
+++ b/drivers/gpu/drm/i915/intel_context.h
@@ -45,11 +45,8 @@ __intel_context_insert(struct i915_gem_context *ctx,
 void
 __intel_context_remove(struct intel_context *ce);
 
-static inline struct intel_context *
-intel_context_pin(struct i915_gem_context *ctx, struct intel_engine_cs *engine)
-{
-	return engine->context_pin(engine, ctx);
-}
+struct intel_context *
+intel_context_pin(struct i915_gem_context *ctx, struct intel_engine_cs *engine);
 
 static inline void __intel_context_pin(struct intel_context *ce)
 {
diff --git a/drivers/gpu/drm/i915/intel_context_types.h b/drivers/gpu/drm/i915/intel_context_types.h
index 857f5c335324..804fa93de853 100644
--- a/drivers/gpu/drm/i915/intel_context_types.h
+++ b/drivers/gpu/drm/i915/intel_context_types.h
@@ -19,6 +19,7 @@ struct intel_context;
 struct intel_ring;
 
 struct intel_context_ops {
+	int (*pin)(struct intel_context *ce);
 	void (*unpin)(struct intel_context *ce);
 	void (*destroy)(struct intel_context *ce);
 };
diff --git a/drivers/gpu/drm/i915/intel_engine_types.h b/drivers/gpu/drm/i915/intel_engine_types.h
index 5dac6b439f95..aefc66605729 100644
--- a/drivers/gpu/drm/i915/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/intel_engine_types.h
@@ -357,8 +357,6 @@ struct intel_engine_cs {
 	void		(*set_default_submission)(struct intel_engine_cs *engine);
 
 	const struct intel_context_ops *context;
-	struct intel_context *(*context_pin)(struct intel_engine_cs *engine,
-					     struct i915_gem_context *ctx);
 
 	int		(*request_alloc)(struct i915_request *rq);
 	int		(*init_context)(struct i915_request *rq);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 0800f8edffeb..8dddea8e1c97 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -166,9 +166,8 @@
 
 #define ACTIVE_PRIORITY (I915_PRIORITY_NEWCLIENT | I915_PRIORITY_NOSEMAPHORE)
 
-static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
-					    struct intel_engine_cs *engine,
-					    struct intel_context *ce);
+static int execlists_context_deferred_alloc(struct intel_context *ce,
+					    struct intel_engine_cs *engine);
 static void execlists_init_reg_state(u32 *reg_state,
 				     struct intel_context *ce,
 				     struct intel_engine_cs *engine,
@@ -330,11 +329,10 @@ assert_priority_queue(const struct i915_request *prev,
  * engine info, SW context ID and SW counter need to form a unique number
  * (Context ID) per lrc.
  */
-static void
-intel_lr_context_descriptor_update(struct i915_gem_context *ctx,
-				   struct intel_engine_cs *engine,
-				   struct intel_context *ce)
+static u64
+lrc_descriptor(struct intel_context *ce, struct intel_engine_cs *engine)
 {
+	struct i915_gem_context *ctx = ce->gem_context;
 	u64 desc;
 
 	BUILD_BUG_ON(MAX_CONTEXT_HW_ID > (BIT(GEN8_CTX_ID_WIDTH)));
@@ -352,7 +350,7 @@ intel_lr_context_descriptor_update(struct i915_gem_context *ctx,
 	 * Consider updating oa_get_render_ctx_id in i915_perf.c when changing
 	 * anything below.
 	 */
-	if (INTEL_GEN(ctx->i915) >= 11) {
+	if (INTEL_GEN(engine->i915) >= 11) {
 		GEM_BUG_ON(ctx->hw_id >= BIT(GEN11_SW_CTX_ID_WIDTH));
 		desc |= (u64)ctx->hw_id << GEN11_SW_CTX_ID_SHIFT;
 								/* bits 37-47 */
@@ -369,7 +367,7 @@ intel_lr_context_descriptor_update(struct i915_gem_context *ctx,
 		desc |= (u64)ctx->hw_id << GEN8_CTX_ID_SHIFT;	/* bits 32-52 */
 	}
 
-	ce->lrc_desc = desc;
+	return desc;
 }
 
 static void unwind_wa_tail(struct i915_request *rq)
@@ -1284,7 +1282,7 @@ static void execlists_context_unpin(struct intel_context *ce)
 	i915_gem_context_put(ce->gem_context);
 }
 
-static int __context_pin(struct i915_gem_context *ctx, struct i915_vma *vma)
+static int __context_pin(struct i915_vma *vma)
 {
 	unsigned int flags;
 	int err;
@@ -1307,11 +1305,14 @@ static int __context_pin(struct i915_gem_context *ctx, struct i915_vma *vma)
 }
 
 static void
-__execlists_update_reg_state(struct intel_engine_cs *engine,
-			     struct intel_context *ce)
+__execlists_update_reg_state(struct intel_context *ce,
+			     struct intel_engine_cs *engine)
 {
-	u32 *regs = ce->lrc_reg_state;
 	struct intel_ring *ring = ce->ring;
+	u32 *regs = ce->lrc_reg_state;
+
+	GEM_BUG_ON(!intel_ring_offset_valid(ring, ring->head));
+	GEM_BUG_ON(!intel_ring_offset_valid(ring, ring->tail));
 
 	regs[CTX_RING_BUFFER_START + 1] = i915_ggtt_offset(ring->vma);
 	regs[CTX_RING_HEAD + 1] = ring->head;
@@ -1324,25 +1325,26 @@ __execlists_update_reg_state(struct intel_engine_cs *engine,
 	}
 }
 
-static struct intel_context *
-__execlists_context_pin(struct intel_engine_cs *engine,
-			struct i915_gem_context *ctx,
-			struct intel_context *ce)
+static int
+__execlists_context_pin(struct intel_context *ce,
+			struct intel_engine_cs *engine)
 {
 	void *vaddr;
 	int ret;
 
-	ret = execlists_context_deferred_alloc(ctx, engine, ce);
+	GEM_BUG_ON(!ce->gem_context->ppgtt);
+
+	ret = execlists_context_deferred_alloc(ce, engine);
 	if (ret)
 		goto err;
 	GEM_BUG_ON(!ce->state);
 
-	ret = __context_pin(ctx, ce->state);
+	ret = __context_pin(ce->state);
 	if (ret)
 		goto err;
 
 	vaddr = i915_gem_object_pin_map(ce->state->obj,
-					i915_coherent_map_type(ctx->i915) |
+					i915_coherent_map_type(engine->i915) |
 					I915_MAP_OVERRIDE);
 	if (IS_ERR(vaddr)) {
 		ret = PTR_ERR(vaddr);
@@ -1353,26 +1355,16 @@ __execlists_context_pin(struct intel_engine_cs *engine,
 	if (ret)
 		goto unpin_map;
 
-	ret = i915_gem_context_pin_hw_id(ctx);
+	ret = i915_gem_context_pin_hw_id(ce->gem_context);
 	if (ret)
 		goto unpin_ring;
 
-	intel_lr_context_descriptor_update(ctx, engine, ce);
-
-	GEM_BUG_ON(!intel_ring_offset_valid(ce->ring, ce->ring->head));
-
+	ce->lrc_desc = lrc_descriptor(ce, engine);
 	ce->lrc_reg_state = vaddr + LRC_STATE_PN * PAGE_SIZE;
-
-	__execlists_update_reg_state(engine, ce);
+	__execlists_update_reg_state(ce, engine);
 
 	ce->state->obj->pin_global++;
-
-	mutex_lock(&ctx->mutex);
-	list_add(&ce->active_link, &ctx->active_engines);
-	mutex_unlock(&ctx->mutex);
-
-	i915_gem_context_get(ctx);
-	return ce;
+	return 0;
 
 unpin_ring:
 	intel_ring_unpin(ce->ring);
@@ -1381,31 +1373,16 @@ __execlists_context_pin(struct intel_engine_cs *engine,
 unpin_vma:
 	__i915_vma_unpin(ce->state);
 err:
-	ce->pin_count = 0;
-	return ERR_PTR(ret);
+	return ret;
 }
 
-static struct intel_context *
-execlists_context_pin(struct intel_engine_cs *engine,
-		      struct i915_gem_context *ctx)
+static int execlists_context_pin(struct intel_context *ce)
 {
-	struct intel_context *ce;
-
-	ce = intel_context_instance(ctx, engine);
-	if (IS_ERR(ce))
-		return ce;
-
-	lockdep_assert_held(&ctx->i915->drm.struct_mutex);
-	GEM_BUG_ON(!ctx->ppgtt);
-
-	if (likely(ce->pin_count++))
-		return ce;
-	GEM_BUG_ON(!ce->pin_count); /* no overflow please! */
-
-	return __execlists_context_pin(engine, ctx, ce);
+	return __execlists_context_pin(ce, ce->engine);
 }
 
 static const struct intel_context_ops execlists_context_ops = {
+	.pin = execlists_context_pin,
 	.unpin = execlists_context_unpin,
 	.destroy = execlists_context_destroy,
 };
@@ -2029,7 +2006,7 @@ static void execlists_reset(struct intel_engine_cs *engine, bool stalled)
 	intel_ring_update_space(rq->ring);
 
 	execlists_init_reg_state(regs, rq->hw_context, engine, rq->ring);
-	__execlists_update_reg_state(engine, rq->hw_context);
+	__execlists_update_reg_state(rq->hw_context, engine);
 
 out_unlock:
 	spin_unlock_irqrestore(&engine->timeline.lock, flags);
@@ -2354,7 +2331,6 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
 	engine->reset.finish = execlists_reset_finish;
 
 	engine->context = &execlists_context_ops;
-	engine->context_pin = execlists_context_pin;
 	engine->request_alloc = execlists_request_alloc;
 
 	engine->emit_flush = gen8_emit_flush;
@@ -2831,9 +2807,16 @@ populate_lr_context(struct intel_context *ce,
 	return ret;
 }
 
-static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
-					    struct intel_engine_cs *engine,
-					    struct intel_context *ce)
+static struct i915_timeline *get_timeline(struct i915_gem_context *ctx)
+{
+	if (ctx->timeline)
+		return i915_timeline_get(ctx->timeline);
+	else
+		return i915_timeline_create(ctx->i915, ctx->name, NULL);
+}
+
+static int execlists_context_deferred_alloc(struct intel_context *ce,
+					    struct intel_engine_cs *engine)
 {
 	struct drm_i915_gem_object *ctx_obj;
 	struct i915_vma *vma;
@@ -2853,26 +2836,25 @@ static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
 	 */
 	context_size += LRC_HEADER_PAGES * PAGE_SIZE;
 
-	ctx_obj = i915_gem_object_create(ctx->i915, context_size);
+	ctx_obj = i915_gem_object_create(engine->i915, context_size);
 	if (IS_ERR(ctx_obj))
 		return PTR_ERR(ctx_obj);
 
-	vma = i915_vma_instance(ctx_obj, &ctx->i915->ggtt.vm, NULL);
+	vma = i915_vma_instance(ctx_obj, &engine->i915->ggtt.vm, NULL);
 	if (IS_ERR(vma)) {
 		ret = PTR_ERR(vma);
 		goto error_deref_obj;
 	}
 
-	if (ctx->timeline)
-		timeline = i915_timeline_get(ctx->timeline);
-	else
-		timeline = i915_timeline_create(ctx->i915, ctx->name, NULL);
+	timeline = get_timeline(ce->gem_context);
 	if (IS_ERR(timeline)) {
 		ret = PTR_ERR(timeline);
 		goto error_deref_obj;
 	}
 
-	ring = intel_engine_create_ring(engine, timeline, ctx->ring_size);
+	ring = intel_engine_create_ring(engine,
+					timeline,
+					ce->gem_context->ring_size);
 	i915_timeline_put(timeline);
 	if (IS_ERR(ring)) {
 		ret = PTR_ERR(ring);
@@ -2917,7 +2899,7 @@ void intel_lr_context_resume(struct drm_i915_private *i915)
 		list_for_each_entry(ce, &ctx->active_engines, active_link) {
 			GEM_BUG_ON(!ce->ring);
 			intel_ring_reset(ce->ring, 0);
-			__execlists_update_reg_state(ce->engine, ce);
+			__execlists_update_reg_state(ce, ce->engine);
 		}
 	}
 }
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 9002f7f6d32e..6af2d393b7ff 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1508,11 +1508,9 @@ alloc_context_vma(struct intel_engine_cs *engine)
 	return ERR_PTR(err);
 }
 
-static struct intel_context *
-__ring_context_pin(struct intel_engine_cs *engine,
-		   struct i915_gem_context *ctx,
-		   struct intel_context *ce)
+static int ring_context_pin(struct intel_context *ce)
 {
+	struct intel_engine_cs *engine = ce->engine;
 	int err;
 
 	/* One ringbuffer to rule them all */
@@ -1523,55 +1521,29 @@ __ring_context_pin(struct intel_engine_cs *engine,
 		struct i915_vma *vma;
 
 		vma = alloc_context_vma(engine);
-		if (IS_ERR(vma)) {
-			err = PTR_ERR(vma);
-			goto err;
-		}
+		if (IS_ERR(vma))
+			return PTR_ERR(vma);
 
 		ce->state = vma;
 	}
 
 	err = __context_pin(ce);
 	if (err)
-		goto err;
+		return err;
 
 	err = __context_pin_ppgtt(ce->gem_context);
 	if (err)
 		goto err_unpin;
 
-	mutex_lock(&ctx->mutex);
-	list_add(&ce->active_link, &ctx->active_engines);
-	mutex_unlock(&ctx->mutex);
-
-	i915_gem_context_get(ctx);
-	return ce;
+	return 0;
 
 err_unpin:
 	__context_unpin(ce);
-err:
-	ce->pin_count = 0;
-	return ERR_PTR(err);
-}
-
-static struct intel_context *
-ring_context_pin(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
-{
-	struct intel_context *ce;
-
-	ce = intel_context_instance(ctx, engine);
-	if (IS_ERR(ce))
-		return ce;
-
-	lockdep_assert_held(&ctx->i915->drm.struct_mutex);
-
-	if (likely(ce->pin_count++))
-		return ce;
-	GEM_BUG_ON(!ce->pin_count); /* no overflow please! */
-
-	return __ring_context_pin(engine, ctx, ce);
+	return err;
 }
 
 static const struct intel_context_ops ring_context_ops = {
+	.pin = ring_context_pin,
 	.unpin = ring_context_unpin,
 	.destroy = ring_context_destroy,
 };
@@ -2282,7 +2254,6 @@ static void intel_ring_default_vfuncs(struct drm_i915_private *dev_priv,
 	engine->reset.finish = reset_finish;
 
 	engine->context = &ring_context_ops;
-	engine->context_pin = ring_context_pin;
 	engine->request_alloc = ring_request_alloc;
 
 	/*
diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
index f1a80006289d..5a5c3ba22061 100644
--- a/drivers/gpu/drm/i915/selftests/mock_engine.c
+++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
@@ -137,41 +137,20 @@ static void mock_context_destroy(struct intel_context *ce)
 		mock_ring_free(ce->ring);
 }
 
-static struct intel_context *
-mock_context_pin(struct intel_engine_cs *engine,
-		 struct i915_gem_context *ctx)
+static int mock_context_pin(struct intel_context *ce)
 {
-	struct intel_context *ce;
-	int err = -ENOMEM;
-
-	ce = intel_context_instance(ctx, engine);
-	if (IS_ERR(ce))
-		return ce;
-
-	if (ce->pin_count++)
-		return ce;
-
 	if (!ce->ring) {
-		ce->ring = mock_ring(engine);
+		ce->ring = mock_ring(ce->engine);
 		if (!ce->ring)
-			goto err;
+			return -ENOMEM;
 	}
 
 	mock_timeline_pin(ce->ring->timeline);
-
-	mutex_lock(&ctx->mutex);
-	list_add(&ce->active_link, &ctx->active_engines);
-	mutex_unlock(&ctx->mutex);
-
-	i915_gem_context_get(ctx);
-	return ce;
-
-err:
-	ce->pin_count = 0;
-	return ERR_PTR(err);
+	return 0;
 }
 
 static const struct intel_context_ops mock_context_ops = {
+	.pin = mock_context_pin,
 	.unpin = mock_context_unpin,
 	.destroy = mock_context_destroy,
 };
@@ -235,7 +214,6 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
 	engine->base.status_page.addr = (void *)(engine + 1);
 
 	engine->base.context = &mock_context_ops;
-	engine->base.context_pin = mock_context_pin;
 	engine->base.request_alloc = mock_request_alloc;
 	engine->base.emit_flush = mock_emit_flush;
 	engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb;
-- 
2.20.1

* [PATCH 31/38] drm/i915: Track the pinned kernel contexts on each engine
  2019-03-01 14:03 [PATCH 01/38] drm/i915/execlists: Suppress redundant preemption Chris Wilson
                   ` (28 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 30/38] drm/i915: Make context pinning part of intel_context_ops Chris Wilson
@ 2019-03-01 14:03 ` Chris Wilson
  2019-03-05 18:07   ` Tvrtko Ursulin
  2019-03-01 14:03 ` [PATCH 32/38] drm/i915: Introduce intel_context.pin_mutex for pin management Chris Wilson
                   ` (10 subsequent siblings)
  40 siblings, 1 reply; 88+ messages in thread
From: Chris Wilson @ 2019-03-01 14:03 UTC (permalink / raw)
  To: intel-gfx

Each engine acquires a pin on the kernel contexts (normal and preempt)
so that the logical state is always available on demand. Keep track of
each engine's pin by storing the returned pointer on the engine for quick
access.
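
For example (a sketch based on the intel_guc_ads.c hunk below),
consumers of the kernel context can replace the lookup with a direct
dereference:

	/* before: rbtree lookup on every access */
	ce = intel_context_lookup(i915->kernel_context, engine);
	kernel_ctx_vma = ce->state;

	/* after: the engine already holds a pinned kernel context */
	kernel_ctx_vma = engine->kernel_context->state;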

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_engine_cs.c       | 65 ++++++++++----------
 drivers/gpu/drm/i915/intel_engine_types.h    |  8 +--
 drivers/gpu/drm/i915/intel_guc_ads.c         |  3 +-
 drivers/gpu/drm/i915/intel_lrc.c             | 13 ++--
 drivers/gpu/drm/i915/intel_ringbuffer.c      |  3 +-
 drivers/gpu/drm/i915/selftests/mock_engine.c |  5 +-
 6 files changed, 45 insertions(+), 52 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 112301560745..0f311752ee08 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -641,12 +641,6 @@ void intel_engines_set_scheduler_caps(struct drm_i915_private *i915)
 		i915->caps.scheduler = 0;
 }
 
-static void __intel_context_unpin(struct i915_gem_context *ctx,
-				  struct intel_engine_cs *engine)
-{
-	intel_context_unpin(intel_context_lookup(ctx, engine));
-}
-
 struct measure_breadcrumb {
 	struct i915_request rq;
 	struct i915_timeline timeline;
@@ -697,6 +691,20 @@ static int measure_breadcrumb_dw(struct intel_engine_cs *engine)
 	return dw;
 }
 
+static int pin_context(struct i915_gem_context *ctx,
+		       struct intel_engine_cs *engine,
+		       struct intel_context **out)
+{
+	struct intel_context *ce;
+
+	ce = intel_context_pin(ctx, engine);
+	if (IS_ERR(ce))
+		return PTR_ERR(ce);
+
+	*out = ce;
+	return 0;
+}
+
 /**
  * intel_engines_init_common - initialize engine state which might require hw access
  * @engine: Engine to initialize.
@@ -711,11 +719,8 @@ static int measure_breadcrumb_dw(struct intel_engine_cs *engine)
 int intel_engine_init_common(struct intel_engine_cs *engine)
 {
 	struct drm_i915_private *i915 = engine->i915;
-	struct intel_context *ce;
 	int ret;
 
-	engine->set_default_submission(engine);
-
 	/* We may need to do things with the shrinker which
 	 * require us to immediately switch back to the default
 	 * context. This can cause a problem as pinning the
@@ -723,36 +728,34 @@ int intel_engine_init_common(struct intel_engine_cs *engine)
 	 * be available. To avoid this we always pin the default
 	 * context.
 	 */
-	ce = intel_context_pin(i915->kernel_context, engine);
-	if (IS_ERR(ce))
-		return PTR_ERR(ce);
+	ret = pin_context(i915->kernel_context, engine,
+			  &engine->kernel_context);
+	if (ret)
+		return ret;
 
 	/*
 	 * Similarly the preempt context must always be available so that
-	 * we can interrupt the engine at any time.
+	 * we can interrupt the engine at any time. However, as preemption
+	 * is optional, we allow it to fail.
 	 */
-	if (i915->preempt_context) {
-		ce = intel_context_pin(i915->preempt_context, engine);
-		if (IS_ERR(ce)) {
-			ret = PTR_ERR(ce);
-			goto err_unpin_kernel;
-		}
-	}
+	if (i915->preempt_context)
+		pin_context(i915->preempt_context, engine,
+			    &engine->preempt_context);
 
 	ret = measure_breadcrumb_dw(engine);
 	if (ret < 0)
-		goto err_unpin_preempt;
+		goto err_unpin;
 
 	engine->emit_fini_breadcrumb_dw = ret;
 
-	return 0;
+	engine->set_default_submission(engine);
 
-err_unpin_preempt:
-	if (i915->preempt_context)
-		__intel_context_unpin(i915->preempt_context, engine);
+	return 0;
 
-err_unpin_kernel:
-	__intel_context_unpin(i915->kernel_context, engine);
+err_unpin:
+	if (engine->preempt_context)
+		intel_context_unpin(engine->preempt_context);
+	intel_context_unpin(engine->kernel_context);
 	return ret;
 }
 
@@ -765,8 +768,6 @@ int intel_engine_init_common(struct intel_engine_cs *engine)
  */
 void intel_engine_cleanup_common(struct intel_engine_cs *engine)
 {
-	struct drm_i915_private *i915 = engine->i915;
-
 	cleanup_status_page(engine);
 
 	intel_engine_fini_breadcrumbs(engine);
@@ -776,9 +777,9 @@ void intel_engine_cleanup_common(struct intel_engine_cs *engine)
 	if (engine->default_state)
 		i915_gem_object_put(engine->default_state);
 
-	if (i915->preempt_context)
-		__intel_context_unpin(i915->preempt_context, engine);
-	__intel_context_unpin(i915->kernel_context, engine);
+	if (engine->preempt_context)
+		intel_context_unpin(engine->preempt_context);
+	intel_context_unpin(engine->kernel_context);
 
 	i915_timeline_fini(&engine->timeline);
 
diff --git a/drivers/gpu/drm/i915/intel_engine_types.h b/drivers/gpu/drm/i915/intel_engine_types.h
index aefc66605729..85158e119638 100644
--- a/drivers/gpu/drm/i915/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/intel_engine_types.h
@@ -231,11 +231,6 @@ struct intel_engine_execlists {
 	 */
 	u32 *csb_status;
 
-	/**
-	 * @preempt_context: the HW context for injecting preempt-to-idle
-	 */
-	struct intel_context *preempt_context;
-
 	/**
 	 * @preempt_complete_status: expected CSB upon completing preemption
 	 */
@@ -271,6 +266,9 @@ struct intel_engine_cs {
 
 	struct i915_timeline timeline;
 
+	struct intel_context *kernel_context; /* pinned */
+	struct intel_context *preempt_context; /* pinned; optional */
+
 	struct drm_i915_gem_object *default_state;
 	void *pinned_default_state;
 
diff --git a/drivers/gpu/drm/i915/intel_guc_ads.c b/drivers/gpu/drm/i915/intel_guc_ads.c
index da220561ac41..5fa169dadd67 100644
--- a/drivers/gpu/drm/i915/intel_guc_ads.c
+++ b/drivers/gpu/drm/i915/intel_guc_ads.c
@@ -121,8 +121,7 @@ int intel_guc_ads_create(struct intel_guc *guc)
 	 * to find it. Note that we have to skip our header (1 page),
 	 * because our GuC shared data is there.
 	 */
-	kernel_ctx_vma = intel_context_lookup(dev_priv->kernel_context,
-					      dev_priv->engine[RCS])->state;
+	kernel_ctx_vma = dev_priv->engine[RCS]->kernel_context->state;
 	blob->ads.golden_context_lrca =
 		intel_guc_ggtt_offset(guc, kernel_ctx_vma) + skipped_offset;
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 8dddea8e1c97..473cfea4c503 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -620,7 +620,7 @@ static void port_assign(struct execlist_port *port, struct i915_request *rq)
 static void inject_preempt_context(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists *execlists = &engine->execlists;
-	struct intel_context *ce = execlists->preempt_context;
+	struct intel_context *ce = engine->preempt_context;
 	unsigned int n;
 
 	GEM_BUG_ON(execlists->preempt_complete_status !=
@@ -2316,7 +2316,7 @@ void intel_execlists_set_default_submission(struct intel_engine_cs *engine)
 
 	engine->flags |= I915_ENGINE_HAS_SEMAPHORES;
 	engine->flags |= I915_ENGINE_SUPPORTS_STATS;
-	if (engine->i915->preempt_context)
+	if (engine->preempt_context)
 		engine->flags |= I915_ENGINE_HAS_PREEMPTION;
 }
 
@@ -2418,14 +2418,9 @@ static int logical_ring_init(struct intel_engine_cs *engine)
 	}
 
 	execlists->preempt_complete_status = ~0u;
-	if (i915->preempt_context) {
-		struct intel_context *ce =
-			intel_context_lookup(i915->preempt_context, engine);
-
-		execlists->preempt_context = ce;
+	if (engine->preempt_context)
 		execlists->preempt_complete_status =
-			upper_32_bits(ce->lrc_desc);
-	}
+			upper_32_bits(engine->preempt_context->lrc_desc);
 
 	execlists->csb_status =
 		&engine->status_page.addr[I915_HWS_CSB_BUF0_INDEX];
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 6af2d393b7ff..4f137d0a5316 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1734,8 +1734,7 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
 		 * placeholder we use to flush other contexts.
 		 */
 		*cs++ = MI_SET_CONTEXT;
-		*cs++ = i915_ggtt_offset(intel_context_lookup(i915->kernel_context,
-							      engine)->state) |
+		*cs++ = i915_ggtt_offset(engine->kernel_context->state) |
 			MI_MM_SPACE_GTT |
 			MI_RESTORE_INHIBIT;
 	}
diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
index 5a5c3ba22061..6434e968ecb0 100644
--- a/drivers/gpu/drm/i915/selftests/mock_engine.c
+++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
@@ -233,7 +233,8 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
 	timer_setup(&engine->hw_delay, hw_delay_complete, 0);
 	INIT_LIST_HEAD(&engine->hw_queue);
 
-	if (IS_ERR(intel_context_pin(i915->kernel_context, &engine->base)))
+	if (pin_context(i915->kernel_context, &engine->base,
+			&engine->base.kernel_context))
 		goto err_breadcrumbs;
 
 	return &engine->base;
@@ -276,7 +277,7 @@ void mock_engine_free(struct intel_engine_cs *engine)
 	if (ce)
 		intel_context_unpin(ce);
 
-	__intel_context_unpin(engine->i915->kernel_context, engine);
+	intel_context_unpin(engine->kernel_context);
 
 	intel_engine_fini_breadcrumbs(engine);
 	i915_timeline_fini(&engine->timeline);
-- 
2.20.1

* [PATCH 32/38] drm/i915: Introduce intel_context.pin_mutex for pin management
  2019-03-01 14:03 [PATCH 01/38] drm/i915/execlists: Suppress redundant preemption Chris Wilson
                   ` (29 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 31/38] drm/i915: Track the pinned kernel contexts on each engine Chris Wilson
@ 2019-03-01 14:03 ` Chris Wilson
  2019-03-06  9:45   ` Tvrtko Ursulin
  2019-03-01 14:03 ` [PATCH 33/38] drm/i915: Load balancing across a virtual engine Chris Wilson
                   ` (9 subsequent siblings)
  40 siblings, 1 reply; 88+ messages in thread
From: Chris Wilson @ 2019-03-01 14:03 UTC (permalink / raw)
  To: intel-gfx

Introduce a mutex to start locking the HW contexts independently of
struct_mutex, with a view to reducing our reliance on the coarse
struct_mutex. The intel_context.pin_mutex guards the transition to and
from being pinned on the GPU, and so must be held before starting to
build any request, and is released when the execution is completed.
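
In outline (a simplified sketch of the scheme below; error handling and
the interruptible lock are elided), the pin count becomes an atomic_t
with the mutex taken only for the 0 <-> 1 transitions:

	if (likely(atomic_inc_not_zero(&ce->pin_count)))
		return ce;	/* already pinned: lockless fast path */

	mutex_lock(&ce->pin_mutex);	/* first pin vs last unpin */
	if (!atomic_read(&ce->pin_count))
		err = ce->ops->pin(ce);	/* bind the context to the GPU */
	atomic_inc(&ce->pin_count);
	mutex_unlock(&ce->pin_mutex);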

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c               |  2 +-
 drivers/gpu/drm/i915/i915_gem_context.c       | 76 +++++++------------
 drivers/gpu/drm/i915/intel_context.c          | 56 ++++++++++++--
 drivers/gpu/drm/i915/intel_context.h          | 38 ++++++----
 drivers/gpu/drm/i915/intel_context_types.h    |  8 +-
 drivers/gpu/drm/i915/intel_lrc.c              |  4 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c       |  4 +-
 .../gpu/drm/i915/selftests/i915_gem_context.c |  2 +-
 drivers/gpu/drm/i915/selftests/mock_engine.c  |  2 +-
 9 files changed, 110 insertions(+), 82 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index c6a25c3276ee..4babea524fc8 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4706,7 +4706,7 @@ static int __intel_engines_record_defaults(struct drm_i915_private *i915)
 		if (!state)
 			continue;
 
-		GEM_BUG_ON(ce->pin_count);
+		GEM_BUG_ON(intel_context_is_pinned(ce));
 
 		/*
 		 * As we will hold a reference to the logical state, it will
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 60b17f6a727d..321b26e302e5 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -1096,23 +1096,27 @@ static int gen8_emit_rpcs_config(struct i915_request *rq,
 }
 
 static int
-gen8_modify_rpcs_gpu(struct intel_context *ce,
-		     struct intel_engine_cs *engine,
-		     struct intel_sseu sseu)
+gen8_modify_rpcs_gpu(struct intel_context *ce, struct intel_sseu sseu)
 {
-	struct drm_i915_private *i915 = engine->i915;
+	struct drm_i915_private *i915 = ce->engine->i915;
 	struct i915_request *rq, *prev;
 	intel_wakeref_t wakeref;
 	int ret;
 
-	GEM_BUG_ON(!ce->pin_count);
+	lockdep_assert_held(&ce->pin_mutex);
 
-	lockdep_assert_held(&i915->drm.struct_mutex);
+	/*
+	 * If context is not idle we have to submit an ordered request to modify
+	 * its context image via the kernel context. Pristine and idle contexts
+	 * will be configured on pinning.
+	 */
+	if (!intel_context_is_pinned(ce))
+		return 0;
 
 	/* Submitting requests etc needs the hw awake. */
 	wakeref = intel_runtime_pm_get(i915);
 
-	rq = i915_request_alloc(engine, i915->kernel_context);
+	rq = i915_request_alloc(ce->engine, i915->kernel_context);
 	if (IS_ERR(rq)) {
 		ret = PTR_ERR(rq);
 		goto out_put;
@@ -1156,53 +1160,30 @@ gen8_modify_rpcs_gpu(struct intel_context *ce,
 }
 
 static int
-__i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
-				    struct intel_engine_cs *engine,
-				    struct intel_sseu sseu)
+i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
+				  struct intel_engine_cs *engine,
+				  struct intel_sseu sseu)
 {
 	struct intel_context *ce;
 	int ret = 0;
 
-	ce = intel_context_instance(ctx, engine);
-	if (IS_ERR(ce))
-		return PTR_ERR(ce);
-
 	GEM_BUG_ON(INTEL_GEN(ctx->i915) < 8);
 	GEM_BUG_ON(engine->id != RCS);
 
+	ce = intel_context_pin_lock(ctx, engine);
+	if (IS_ERR(ce))
+		return PTR_ERR(ce);
+
 	/* Nothing to do if unmodified. */
 	if (!memcmp(&ce->sseu, &sseu, sizeof(sseu)))
-		return 0;
-
-	/*
-	 * If context is not idle we have to submit an ordered request to modify
-	 * its context image via the kernel context. Pristine and idle contexts
-	 * will be configured on pinning.
-	 */
-	if (ce->pin_count)
-		ret = gen8_modify_rpcs_gpu(ce, engine, sseu);
+		goto unlock;
 
+	ret = gen8_modify_rpcs_gpu(ce, sseu);
 	if (!ret)
 		ce->sseu = sseu;
 
-	return ret;
-}
-
-static int
-i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
-				  struct intel_engine_cs *engine,
-				  struct intel_sseu sseu)
-{
-	int ret;
-
-	ret = mutex_lock_interruptible(&ctx->i915->drm.struct_mutex);
-	if (ret)
-		return ret;
-
-	ret = __i915_gem_context_reconfigure_sseu(ctx, engine, sseu);
-
-	mutex_unlock(&ctx->i915->drm.struct_mutex);
-
+unlock:
+	intel_context_pin_unlock(ce);
 	return ret;
 }
 
@@ -1620,11 +1601,12 @@ static int clone_sseu(struct i915_gem_context *dst,
 		if (!memcmp(&sseu, &default_sseu, sizeof(sseu)))
 			continue;
 
-		ce = intel_context_instance(dst, engine);
+		ce = intel_context_pin_lock(dst, engine);
 		if (IS_ERR(ce))
 			return PTR_ERR(ce);
 
 		ce->sseu = sseu;
+		intel_context_pin_unlock(ce);
 	}
 
 	return 0;
@@ -1790,7 +1772,6 @@ static int get_sseu(struct i915_gem_context *ctx,
 	struct intel_engine_cs *engine;
 	struct intel_context *ce;
 	unsigned long lookup;
-	int ret;
 
 	if (args->size == 0)
 		goto out;
@@ -1817,21 +1798,16 @@ static int get_sseu(struct i915_gem_context *ctx,
 	if (!engine)
 		return -EINVAL;
 
-	ce = intel_context_instance(ctx, engine);
+	ce = intel_context_pin_lock(ctx, engine); /* serialise with set_sseu */
 	if (IS_ERR(ce))
 		return PTR_ERR(ce);
 
-	/* Only use for mutex here is to serialize get_param and set_param. */
-	ret = mutex_lock_interruptible(&ctx->i915->drm.struct_mutex);
-	if (ret)
-		return ret;
-
 	user_sseu.slice_mask = ce->sseu.slice_mask;
 	user_sseu.subslice_mask = ce->sseu.subslice_mask;
 	user_sseu.min_eus_per_subslice = ce->sseu.min_eus_per_subslice;
 	user_sseu.max_eus_per_subslice = ce->sseu.max_eus_per_subslice;
 
-	mutex_unlock(&ctx->i915->drm.struct_mutex);
+	intel_context_pin_unlock(ce);
 
 	if (copy_to_user(u64_to_user_ptr(args->value), &user_sseu,
 			 sizeof(user_sseu)))
diff --git a/drivers/gpu/drm/i915/intel_context.c b/drivers/gpu/drm/i915/intel_context.c
index 7de02be0b552..7c4171114e5b 100644
--- a/drivers/gpu/drm/i915/intel_context.c
+++ b/drivers/gpu/drm/i915/intel_context.c
@@ -86,7 +86,7 @@ void __intel_context_remove(struct intel_context *ce)
 	spin_unlock(&ctx->hw_contexts_lock);
 }
 
-struct intel_context *
+static struct intel_context *
 intel_context_instance(struct i915_gem_context *ctx,
 		       struct intel_engine_cs *engine)
 {
@@ -110,6 +110,21 @@ intel_context_instance(struct i915_gem_context *ctx,
 	return pos;
 }
 
+struct intel_context *
+intel_context_pin_lock(struct i915_gem_context *ctx,
+		       struct intel_engine_cs *engine)
+	__acquires(ce->pin_mutex)
+{
+	struct intel_context *ce;
+
+	ce = intel_context_instance(ctx, engine);
+	if (IS_ERR(ce))
+		return ce;
+
+	mutex_lock(&ce->pin_mutex);
+	return ce;
+}
+
 struct intel_context *
 intel_context_pin(struct i915_gem_context *ctx,
 		  struct intel_engine_cs *engine)
@@ -117,16 +132,20 @@ intel_context_pin(struct i915_gem_context *ctx,
 	struct intel_context *ce;
 	int err;
 
-	lockdep_assert_held(&ctx->i915->drm.struct_mutex);
-
 	ce = intel_context_instance(ctx, engine);
 	if (IS_ERR(ce))
 		return ce;
 
-	if (unlikely(!ce->pin_count++)) {
+	if (likely(atomic_inc_not_zero(&ce->pin_count)))
+		return ce;
+
+	if (mutex_lock_interruptible(&ce->pin_mutex))
+		return ERR_PTR(-EINTR);
+
+	if (likely(!atomic_read(&ce->pin_count))) {
 		err = ce->ops->pin(ce);
 		if (err)
-			goto err_unpin;
+			goto err;
 
 		mutex_lock(&ctx->mutex);
 		list_add(&ce->active_link, &ctx->active_engines);
@@ -134,16 +153,35 @@ intel_context_pin(struct i915_gem_context *ctx,
 
 		i915_gem_context_get(ctx);
 		GEM_BUG_ON(ce->gem_context != ctx);
+
+		smp_mb__before_atomic();
 	}
-	GEM_BUG_ON(!ce->pin_count); /* no overflow! */
 
+	atomic_inc(&ce->pin_count);
+	GEM_BUG_ON(!intel_context_is_pinned(ce)); /* no overflow! */
+
+	mutex_unlock(&ce->pin_mutex);
 	return ce;
 
-err_unpin:
-	ce->pin_count = 0;
+err:
+	mutex_unlock(&ce->pin_mutex);
 	return ERR_PTR(err);
 }
 
+void intel_context_unpin(struct intel_context *ce)
+{
+	if (likely(atomic_add_unless(&ce->pin_count, -1, 1)))
+		return;
+
+	/* We may be called from inside intel_context_pin() to evict another */
+	mutex_lock_nested(&ce->pin_mutex, SINGLE_DEPTH_NESTING);
+
+	if (likely(atomic_dec_and_test(&ce->pin_count)))
+		ce->ops->unpin(ce);
+
+	mutex_unlock(&ce->pin_mutex);
+}
+
 static void intel_context_retire(struct i915_active_request *active,
 				 struct i915_request *rq)
 {
@@ -165,6 +203,8 @@ intel_context_init(struct intel_context *ce,
 	INIT_LIST_HEAD(&ce->signal_link);
 	INIT_LIST_HEAD(&ce->signals);
 
+	mutex_init(&ce->pin_mutex);
+
 	/* Use the whole device by default */
 	ce->sseu = intel_device_default_sseu(ctx->i915);
 
diff --git a/drivers/gpu/drm/i915/intel_context.h b/drivers/gpu/drm/i915/intel_context.h
index aa34311a472e..3657f3a34f8e 100644
--- a/drivers/gpu/drm/i915/intel_context.h
+++ b/drivers/gpu/drm/i915/intel_context.h
@@ -7,6 +7,8 @@
 #ifndef __INTEL_CONTEXT_H__
 #define __INTEL_CONTEXT_H__
 
+#include <linux/lockdep.h>
+
 #include "intel_context_types.h"
 #include "intel_engine_types.h"
 
@@ -26,18 +28,30 @@ intel_context_lookup(struct i915_gem_context *ctx,
 		     struct intel_engine_cs *engine);
 
 /**
- * intel_context_instance - Lookup or allocate the HW context for (ctx, engine)
+ * intel_context_pin_lock - Stabilises the 'pinned' status of the HW context
  * @ctx - the parent GEM context
  * @engine - the target HW engine
  *
- * Returns the existing HW context for this pair of (GEM context, engine), or
- * allocates and initialises a fresh context. Once allocated, the HW context
- * remains resident until the GEM context is destroyed.
+ * Acquire a lock on the pinned status of the HW context, such that the context
+ * can neither be bound to the GPU nor unbound whilst the lock is held, i.e.
+ * intel_context_is_pinned() remains stable.
  */
 struct intel_context *
-intel_context_instance(struct i915_gem_context *ctx,
+intel_context_pin_lock(struct i915_gem_context *ctx,
 		       struct intel_engine_cs *engine);
 
+static inline bool
+intel_context_is_pinned(struct intel_context *ce)
+{
+	return atomic_read(&ce->pin_count);
+}
+
+static inline void intel_context_pin_unlock(struct intel_context *ce)
+__releases(ce->pin_mutex)
+{
+	mutex_unlock(&ce->pin_mutex);
+}
+
 struct intel_context *
 __intel_context_insert(struct i915_gem_context *ctx,
 		       struct intel_engine_cs *engine,
@@ -50,18 +64,10 @@ intel_context_pin(struct i915_gem_context *ctx, struct intel_engine_cs *engine);
 
 static inline void __intel_context_pin(struct intel_context *ce)
 {
-	GEM_BUG_ON(!ce->pin_count);
-	ce->pin_count++;
+	GEM_BUG_ON(!intel_context_is_pinned(ce));
+	atomic_inc(&ce->pin_count);
 }
 
-static inline void intel_context_unpin(struct intel_context *ce)
-{
-	GEM_BUG_ON(!ce->pin_count);
-	if (--ce->pin_count)
-		return;
-
-	GEM_BUG_ON(!ce->ops);
-	ce->ops->unpin(ce);
-}
+void intel_context_unpin(struct intel_context *ce);
 
 #endif /* __INTEL_CONTEXT_H__ */
diff --git a/drivers/gpu/drm/i915/intel_context_types.h b/drivers/gpu/drm/i915/intel_context_types.h
index 804fa93de853..6dc9b4b9067b 100644
--- a/drivers/gpu/drm/i915/intel_context_types.h
+++ b/drivers/gpu/drm/i915/intel_context_types.h
@@ -8,6 +8,7 @@
 #define __INTEL_CONTEXT_TYPES__
 
 #include <linux/list.h>
+#include <linux/mutex.h>
 #include <linux/rbtree.h>
 #include <linux/types.h>
 
@@ -38,14 +39,19 @@ struct intel_context {
 	struct i915_gem_context *gem_context;
 	struct intel_engine_cs *engine;
 	struct intel_engine_cs *active;
+
 	struct list_head active_link;
 	struct list_head signal_link;
 	struct list_head signals;
+
 	struct i915_vma *state;
 	struct intel_ring *ring;
+
 	u32 *lrc_reg_state;
 	u64 lrc_desc;
-	int pin_count;
+
+	atomic_t pin_count;
+	struct mutex pin_mutex; /* guards pinning and associated on-gpuing */
 
 	/**
 	 * active_tracker: Active tracker for the external rq activity
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 473cfea4c503..e69b52c4d3ce 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1238,7 +1238,7 @@ static void __execlists_context_fini(struct intel_context *ce)
 
 static void execlists_context_destroy(struct intel_context *ce)
 {
-	GEM_BUG_ON(ce->pin_count);
+	GEM_BUG_ON(intel_context_is_pinned(ce));
 
 	if (ce->state)
 		__execlists_context_fini(ce);
@@ -1476,7 +1476,7 @@ static int execlists_request_alloc(struct i915_request *request)
 {
 	int ret;
 
-	GEM_BUG_ON(!request->hw_context->pin_count);
+	GEM_BUG_ON(!intel_context_is_pinned(request->hw_context));
 
 	/*
 	 * Flush enough space to reduce the likelihood of waiting after
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 4f137d0a5316..a2bdf188f3f6 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1356,7 +1356,7 @@ static void __ring_context_fini(struct intel_context *ce)
 
 static void ring_context_destroy(struct intel_context *ce)
 {
-	GEM_BUG_ON(ce->pin_count);
+	GEM_BUG_ON(intel_context_is_pinned(ce));
 
 	if (ce->state)
 		__ring_context_fini(ce);
@@ -1917,7 +1917,7 @@ static int ring_request_alloc(struct i915_request *request)
 {
 	int ret;
 
-	GEM_BUG_ON(!request->hw_context->pin_count);
+	GEM_BUG_ON(!intel_context_is_pinned(request->hw_context));
 	GEM_BUG_ON(request->timeline->has_initial_breadcrumb);
 
 	/*
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
index 15f016ca8e0d..c6c14d1b2ce8 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
@@ -1029,7 +1029,7 @@ __sseu_test(struct drm_i915_private *i915,
 	if (ret)
 		goto out_context;
 
-	ret = __i915_gem_context_reconfigure_sseu(ctx, engine, sseu);
+	ret = i915_gem_context_reconfigure_sseu(ctx, engine, sseu);
 	if (ret)
 		goto out_spin;
 
diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
index 6434e968ecb0..c032c0cf20e6 100644
--- a/drivers/gpu/drm/i915/selftests/mock_engine.c
+++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
@@ -131,7 +131,7 @@ static void mock_context_unpin(struct intel_context *ce)
 
 static void mock_context_destroy(struct intel_context *ce)
 {
-	GEM_BUG_ON(ce->pin_count);
+	GEM_BUG_ON(intel_context_is_pinned(ce));
 
 	if (ce->ring)
 		mock_ring_free(ce->ring);
-- 
2.20.1


* [PATCH 33/38] drm/i915: Load balancing across a virtual engine
  2019-03-01 14:03 [PATCH 01/38] drm/i915/execlists: Suppress redundant preemption Chris Wilson
                   ` (30 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 32/38] drm/i915: Introduce intel_context.pin_mutex for pin management Chris Wilson
@ 2019-03-01 14:03 ` Chris Wilson
  2019-03-01 14:04 ` [PATCH 34/38] drm/i915: Extend execution fence to support a callback Chris Wilson
                   ` (8 subsequent siblings)
  40 siblings, 0 replies; 88+ messages in thread
From: Chris Wilson @ 2019-03-01 14:03 UTC (permalink / raw)
  To: intel-gfx

Having allowed the user to define the set of engines they wish to use,
we go one step further and allow them to bind those engines into a
single virtual instance. Submitting a batch to the virtual engine will
then forward it to any one of the set in a manner that best distributes
load. The virtual engine has a single timeline across all engines (it
operates as a single queue), so it cannot concurrently run batches
across multiple engines by itself; that is left to the user, who may
submit multiple concurrent batches to multiple queues. Multiple users
will be load balanced across the system.

The mechanism used for load balancing in this patch is a late greedy
balancer. When a request is ready for execution, it is added to each
engine's queue, and when an engine is ready for its next request it
claims it from the virtual engine. The first engine to do so wins,
i.e. the request is executed at the earliest opportunity (idle moment)
in the system.
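
To make the claim step concrete, here is a minimal sketch of that race
(illustrative only; the real dequeue below also folds in priority
checks, request merging and context register fixups):

	/*
	 * Sketch: each sibling, on becoming ready for a new request,
	 * races to claim the single pending request from the virtual
	 * engine. The first to take the lock and find ve->request
	 * still set wins; the others see NULL and move on.
	 */
	static struct i915_request *
	sibling_claim(struct virtual_engine *ve,
		      struct intel_engine_cs *sibling)
	{
		struct i915_request *rq;

		spin_lock(&ve->base.timeline.lock);
		rq = ve->request;
		if (rq) {
			ve->request = NULL;   /* claimed; losers see NULL */
			rq->engine = sibling; /* bound to the winning engine */
		}
		spin_unlock(&ve->base.timeline.lock);

		return rq;
	}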

As not all HW is created equal, the user is still able to skip the
virtual engine and execute the batch on a specific engine, all within
the same queue. It will then be executed in order on the correct
engine, with work submitted via the virtual engines steered away from
it by the load detection.
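
For reference, a rough userspace sketch against the uapi added below
(the slot layout is illustrative: the virtual engine is assumed to
occupy engine map slot 0, balancing over the engines in slots 1 and 2):

	struct i915_context_engines_load_balance balance = {
		.base = { .name = I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE },
		.engine_index = 0, /* map slot to hold the virtual engine */
		.engines_mask = (1ull << 1) | (1ull << 2),
	};
	/* chained via the extensions field of i915_context_param_engines */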

A couple of areas for potential improvement left!

- The virtual engine always takes priority over equal-priority tasks.
This is mostly broken up by applying FQ_CODEL rules for prioritising
new clients, and hopefully the virtual and real engines are not then
congested (i.e. all work is via virtual engines, or all work is to the
real engine).

- We require the breadcrumb irq around every virtual engine request. For
normal engines, we eliminate the need for the slow round trip via
interrupt by using the submit fence and queueing in order. For virtual
engines, we have to allow any job to transfer to a new ring, and cannot
coalesce the submissions, so require the completion fence instead,
forcing the persistent use of interrupts.

- We only drip-feed single requests through each virtual engine and
onto the physical engines, even if there was enough work to fill all
ELSP, leaving small stalls with an idle CS event at the end of every
request. Could we be greedy and fill both slots? Being lazy is virtuous
for load distribution on less-than-full workloads though.

Other areas of improvement are more general, such as reducing lock
contention, reducing dispatch overhead, and looking at direct
submission rather than bouncing around tasklets.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.h            |   5 +
 drivers/gpu/drm/i915/i915_gem_context.c    | 152 +++++-
 drivers/gpu/drm/i915/i915_scheduler.c      |  17 +-
 drivers/gpu/drm/i915/i915_timeline_types.h |   1 +
 drivers/gpu/drm/i915/intel_engine_types.h  |   8 +
 drivers/gpu/drm/i915/intel_lrc.c           | 518 ++++++++++++++++++++-
 drivers/gpu/drm/i915/intel_lrc.h           |   9 +
 drivers/gpu/drm/i915/selftests/intel_lrc.c | 165 +++++++
 include/uapi/drm/i915_drm.h                |  30 ++
 9 files changed, 889 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.h b/drivers/gpu/drm/i915/i915_gem.h
index b0e4b976880c..9905fcdd33c8 100644
--- a/drivers/gpu/drm/i915/i915_gem.h
+++ b/drivers/gpu/drm/i915/i915_gem.h
@@ -89,4 +89,9 @@ static inline bool __tasklet_is_enabled(const struct tasklet_struct *t)
 	return !atomic_read(&t->count);
 }
 
+static inline bool __tasklet_is_scheduled(struct tasklet_struct *t)
+{
+	return test_bit(TASKLET_STATE_SCHED, &t->state);
+}
+
 #endif /* __I915_GEM_H__ */
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 321b26e302e5..57c85e990e80 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -91,6 +91,7 @@
 #include "i915_trace.h"
 #include "i915_user_extensions.h"
 #include "intel_lrc_reg.h"
+#include "intel_lrc.h"
 #include "intel_workarounds.h"
 
 #define ALL_L3_SLICES(dev) (1 << NUM_L3_SLICES(dev)) - 1
@@ -236,6 +237,20 @@ static void release_hw_id(struct i915_gem_context *ctx)
 	mutex_unlock(&i915->contexts.mutex);
 }
 
+static void free_engines(struct intel_engine_cs **engines, int count)
+{
+	int i;
+
+	if (!engines)
+		return;
+
+	/* We own the veng we created; regular engines are ignored */
+	for (i = 0; i < count; i++)
+		intel_virtual_engine_destroy(engines[i]);
+
+	kfree(engines);
+}
+
 static void i915_gem_context_free(struct i915_gem_context *ctx)
 {
 	struct intel_context *it, *n;
@@ -246,8 +261,7 @@ static void i915_gem_context_free(struct i915_gem_context *ctx)
 
 	release_hw_id(ctx);
 	i915_ppgtt_put(ctx->ppgtt);
-
-	kfree(ctx->engines);
+	free_engines(ctx->engines, ctx->nengine);
 
 	rbtree_postorder_for_each_entry_safe(it, n, &ctx->hw_contexts, node)
 		it->ops->destroy(it);
@@ -1338,13 +1352,115 @@ static int set_sseu(struct i915_gem_context *ctx,
 	return 0;
 };
 
+static int check_user_mbz16(u16 __user *user)
+{
+	u16 mbz;
+
+	if (get_user(mbz, user))
+		return -EFAULT;
+
+	return mbz ? -EINVAL : 0;
+}
+
+static int check_user_mbz32(u32 __user *user)
+{
+	u32 mbz;
+
+	if (get_user(mbz, user))
+		return -EFAULT;
+
+	return mbz ? -EINVAL : 0;
+}
+
+static int check_user_mbz64(u64 __user *user)
+{
+	u64 mbz;
+
+	if (get_user(mbz, user))
+		return -EFAULT;
+
+	return mbz ? -EINVAL : 0;
+}
+
 struct set_engines {
 	struct i915_gem_context *ctx;
 	struct intel_engine_cs **engines;
 	unsigned int nengine;
 };
 
+static int
+set_engines__load_balance(struct i915_user_extension __user *base, void *data)
+{
+	struct i915_context_engines_load_balance __user *ext =
+		container_of_user(base, typeof(*ext), base);
+	const struct set_engines *set = data;
+	struct intel_engine_cs *ve;
+	unsigned int n;
+	u64 mask;
+	u16 idx;
+	int err;
+
+	if (!HAS_EXECLISTS(set->ctx->i915))
+		return -ENODEV;
+
+	if (USES_GUC_SUBMISSION(set->ctx->i915))
+		return -ENODEV; /* not implemented yet */
+
+	if (get_user(idx, &ext->engine_index))
+		return -EFAULT;
+
+	if (idx >= set->nengine)
+		return -EINVAL;
+
+	if (set->engines[idx])
+		return -EEXIST;
+
+	err = check_user_mbz16(&ext->mbz16);
+	if (err)
+		return err;
+
+	err = check_user_mbz32(&ext->flags);
+	if (err)
+		return err;
+
+	for (n = 0; n < ARRAY_SIZE(ext->mbz64); n++) {
+		err = check_user_mbz64(&ext->mbz64[n]);
+		if (err)
+			return err;
+	}
+
+	if (get_user(mask, &ext->engines_mask))
+		return -EFAULT;
+
+	mask &= GENMASK_ULL(set->nengine - 1, 0) & ~BIT_ULL(idx);
+	if (!mask)
+		return -EINVAL;
+
+	if (is_power_of_2(mask)) {
+		ve = set->engines[__ffs64(mask)];
+	} else {
+		struct intel_engine_cs *stack[64];
+		int bit;
+
+		n = 0;
+		for_each_set_bit(bit, (unsigned long *)&mask, set->nengine)
+			stack[n++] = set->engines[bit];
+
+		ve = intel_execlists_create_virtual(set->ctx, stack, n);
+	}
+	if (IS_ERR(ve))
+		return PTR_ERR(ve);
+
+	if (cmpxchg(&set->engines[idx], NULL, ve)) {
+		intel_virtual_engine_destroy(ve);
+		return -EEXIST;
+	}
+
+	return 0;
+}
+
 static const i915_user_extension_fn set_engines__extensions[] = {
+	[I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE] = set_engines__load_balance,
 };
 
 static int
@@ -1405,13 +1521,13 @@ set_engines(struct i915_gem_context *ctx,
 					   ARRAY_SIZE(set_engines__extensions),
 					   &set);
 	if (err) {
-		kfree(set.engines);
+		free_engines(set.engines, set.nengine);
 		return err;
 	}
 
 out:
 	mutex_lock(&ctx->i915->drm.struct_mutex);
-	kfree(ctx->engines);
+	free_engines(ctx->engines, ctx->nengine);
 	ctx->engines = set.engines;
 	ctx->nengine = set.nengine;
 	mutex_unlock(&ctx->i915->drm.struct_mutex);
@@ -1664,6 +1780,34 @@ static int create_clone(struct i915_user_extension __user *ext, void *data)
 				       GFP_KERNEL);
 		if (!dst->engines)
 			return -ENOMEM;
+
+		/*
+		 * Virtual engines are singletons; they can only exist
+		 * inside a single context, because they embed their
+		 * HW context... As each virtual context implies a single
+		 * timeline (each engine can only dequeue a single request
+		 * at any time), it would be surprising for two contexts
+		 * to use the same engine. So let's create a copy of
+		 * the virtual engine instead.
+		 */
+		if (intel_engine_is_virtual(dst->engines[0])) {
+			struct intel_engine_cs *ve;
+			struct intel_engine_cs **siblings;
+			unsigned long count;
+
+			count = intel_virtual_engine_get_siblings(src->engines[0],
+								  &siblings);
+
+			ve = intel_execlists_create_virtual(dst,
+							    siblings,
+							    count);
+			if (IS_ERR(ve)) {
+				dst->engines[0] = NULL;
+				return PTR_ERR(ve);
+			}
+
+			dst->engines[0] = ve;
+		}
 	}
 
 	return 0;
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 8a64748a7912..5b934b71576d 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -245,17 +245,25 @@ sched_lock_engine(const struct i915_sched_node *node,
 		  struct intel_engine_cs *locked,
 		  struct sched_cache *cache)
 {
-	struct intel_engine_cs *engine = node_to_request(node)->engine;
+	const struct i915_request *rq = node_to_request(node);
+	struct intel_engine_cs *engine;
 
 	GEM_BUG_ON(!locked);
 
-	if (engine != locked) {
+	/*
+	 * Virtual engines complicate acquiring the engine timeline lock,
+	 * as their rq->engine pointer is not stable until under that
+	 * engine lock. The simple ploy we use is to take the lock then
+	 * check that the rq still belongs to the newly locked engine.
+	 */
+	while (locked != (engine = READ_ONCE(rq->engine))) {
 		spin_unlock(&locked->timeline.lock);
 		memset(cache, 0, sizeof(*cache));
 		spin_lock(&engine->timeline.lock);
+		locked = engine;
 	}
 
-	return engine;
+	return locked;
 }
 
 static bool inflight(const struct i915_request *rq,
@@ -368,8 +376,11 @@ static void __i915_schedule(struct i915_request *rq,
 		if (prio <= node->attr.priority || node_signaled(node))
 			continue;
 
+		GEM_BUG_ON(node_to_request(node)->engine != engine);
+
 		node->attr.priority = prio;
 		if (!list_empty(&node->link)) {
+			GEM_BUG_ON(intel_engine_is_virtual(engine));
 			if (!cache.priolist)
 				cache.priolist =
 					i915_sched_lookup_priolist(engine,
diff --git a/drivers/gpu/drm/i915/i915_timeline_types.h b/drivers/gpu/drm/i915/i915_timeline_types.h
index 8ff146dc05ba..5e445f145eb1 100644
--- a/drivers/gpu/drm/i915/i915_timeline_types.h
+++ b/drivers/gpu/drm/i915/i915_timeline_types.h
@@ -25,6 +25,7 @@ struct i915_timeline {
 	spinlock_t lock;
 #define TIMELINE_CLIENT 0 /* default subclass */
 #define TIMELINE_ENGINE 1
+#define TIMELINE_VIRTUAL 2
 	struct mutex mutex; /* protects the flow of requests */
 
 	unsigned int pin_count;
diff --git a/drivers/gpu/drm/i915/intel_engine_types.h b/drivers/gpu/drm/i915/intel_engine_types.h
index 85158e119638..fe59fa24013e 100644
--- a/drivers/gpu/drm/i915/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/intel_engine_types.h
@@ -216,6 +216,7 @@ struct intel_engine_execlists {
 	 * @queue: queue of requests, in priority lists
 	 */
 	struct rb_root_cached queue;
+	struct rb_root_cached virtual;
 
 	/**
 	 * @csb_write: control register for Context Switch buffer
@@ -421,6 +422,7 @@ struct intel_engine_cs {
 #define I915_ENGINE_SUPPORTS_STATS   BIT(1)
 #define I915_ENGINE_HAS_PREEMPTION   BIT(2)
 #define I915_ENGINE_HAS_SEMAPHORES   BIT(3)
+#define I915_ENGINE_IS_VIRTUAL       BIT(4)
 	unsigned int flags;
 
 	/*
@@ -504,6 +506,12 @@ intel_engine_has_semaphores(const struct intel_engine_cs *engine)
 	return engine->flags & I915_ENGINE_HAS_SEMAPHORES;
 }
 
+static inline bool
+intel_engine_is_virtual(const struct intel_engine_cs *engine)
+{
+	return engine->flags & I915_ENGINE_IS_VIRTUAL;
+}
+
 #define instdone_slice_mask(dev_priv__) \
 	(IS_GEN(dev_priv__, 7) ? \
 	 1 : RUNTIME_INFO(dev_priv__)->sseu.slice_mask)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index e69b52c4d3ce..d8b276173463 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -166,6 +166,28 @@
 
 #define ACTIVE_PRIORITY (I915_PRIORITY_NEWCLIENT | I915_PRIORITY_NOSEMAPHORE)
 
+struct virtual_engine {
+	struct intel_engine_cs base;
+
+	struct intel_context context;
+	struct kref kref;
+	struct rcu_head rcu;
+
+	struct i915_request *request;
+	struct ve_node {
+		struct rb_node rb;
+		int prio;
+	} nodes[I915_NUM_ENGINES];
+
+	unsigned int count;
+	struct intel_engine_cs *siblings[0];
+};
+
+static struct virtual_engine *to_virtual_engine(struct intel_engine_cs *engine)
+{
+	return container_of(engine, struct virtual_engine, base);
+}
+
 static int execlists_context_deferred_alloc(struct intel_context *ce,
 					    struct intel_engine_cs *engine);
 static void execlists_init_reg_state(u32 *reg_state,
@@ -235,7 +257,8 @@ static int queue_prio(const struct intel_engine_execlists *execlists)
 }
 
 static inline bool need_preempt(const struct intel_engine_cs *engine,
-				const struct i915_request *rq)
+				const struct i915_request *rq,
+				struct rb_node *rb)
 {
 	int last_prio;
 
@@ -270,6 +293,22 @@ static inline bool need_preempt(const struct intel_engine_cs *engine,
 	    rq_prio(list_next_entry(rq, link)) > last_prio)
 		return true;
 
+	if (rb) { /* XXX virtual precedence */
+		struct virtual_engine *ve =
+			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
+		bool preempt = false;
+
+		if (engine == ve->siblings[0]) { /* only preempt one sibling */
+			spin_lock(&ve->base.timeline.lock);
+			if (ve->request)
+				preempt = rq_prio(ve->request) > last_prio;
+			spin_unlock(&ve->base.timeline.lock);
+		}
+
+		if (preempt)
+			return preempt;
+	}
+
 	/*
 	 * If the inflight context did not trigger the preemption, then maybe
 	 * it was the set of queued requests? Pick the highest priority in
@@ -388,6 +427,8 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
 	list_for_each_entry_safe_reverse(rq, rn,
 					 &engine->timeline.requests,
 					 link) {
+		struct intel_engine_cs *owner;
+
 		if (i915_request_completed(rq))
 			break;
 
@@ -396,14 +437,23 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
 
 		GEM_BUG_ON(rq->hw_context->active);
 
-		GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
-		if (rq_prio(rq) != prio) {
-			prio = rq_prio(rq);
-			pl = i915_sched_lookup_priolist(engine, prio);
-		}
-		GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
+		owner = rq->hw_context->engine;
+		if (likely(owner == engine)) {
+			GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
+			if (rq_prio(rq) != prio) {
+				prio = rq_prio(rq);
+				pl = i915_sched_lookup_priolist(engine, prio);
+			}
+			GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
+
+			list_add(&rq->sched.link, pl);
+		} else {
+			if (__i915_request_has_started(rq))
+				rq->sched.attr.priority |= ACTIVE_PRIORITY;
 
-		list_add(&rq->sched.link, pl);
+			rq->engine = owner;
+			owner->submit_request(rq);
+		}
 
 		active = rq;
 	}
@@ -659,6 +709,50 @@ static void complete_preempt_context(struct intel_engine_execlists *execlists)
 						  execlists));
 }
 
+static void virtual_update_register_offsets(u32 *regs,
+					    struct intel_engine_cs *engine)
+{
+	u32 base = engine->mmio_base;
+
+	regs[CTX_CONTEXT_CONTROL] =
+		i915_mmio_reg_offset(RING_CONTEXT_CONTROL(engine));
+	regs[CTX_RING_HEAD] = i915_mmio_reg_offset(RING_HEAD(base));
+	regs[CTX_RING_TAIL] = i915_mmio_reg_offset(RING_TAIL(base));
+	regs[CTX_RING_BUFFER_START] = i915_mmio_reg_offset(RING_START(base));
+	regs[CTX_RING_BUFFER_CONTROL] = i915_mmio_reg_offset(RING_CTL(base));
+
+	regs[CTX_BB_HEAD_U] = i915_mmio_reg_offset(RING_BBADDR_UDW(base));
+	regs[CTX_BB_HEAD_L] = i915_mmio_reg_offset(RING_BBADDR(base));
+	regs[CTX_BB_STATE] = i915_mmio_reg_offset(RING_BBSTATE(base));
+	regs[CTX_SECOND_BB_HEAD_U] =
+		i915_mmio_reg_offset(RING_SBBADDR_UDW(base));
+	regs[CTX_SECOND_BB_HEAD_L] = i915_mmio_reg_offset(RING_SBBADDR(base));
+	regs[CTX_SECOND_BB_STATE] = i915_mmio_reg_offset(RING_SBBSTATE(base));
+
+	regs[CTX_CTX_TIMESTAMP] =
+		i915_mmio_reg_offset(RING_CTX_TIMESTAMP(base));
+	regs[CTX_PDP3_UDW] = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(engine, 3));
+	regs[CTX_PDP3_LDW] = i915_mmio_reg_offset(GEN8_RING_PDP_LDW(engine, 3));
+	regs[CTX_PDP2_UDW] = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(engine, 2));
+	regs[CTX_PDP2_LDW] = i915_mmio_reg_offset(GEN8_RING_PDP_LDW(engine, 2));
+	regs[CTX_PDP1_UDW] = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(engine, 1));
+	regs[CTX_PDP1_LDW] = i915_mmio_reg_offset(GEN8_RING_PDP_LDW(engine, 1));
+	regs[CTX_PDP0_UDW] = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(engine, 0));
+	regs[CTX_PDP0_LDW] = i915_mmio_reg_offset(GEN8_RING_PDP_LDW(engine, 0));
+
+	if (engine->class == RENDER_CLASS) {
+		regs[CTX_RCS_INDIRECT_CTX] =
+			i915_mmio_reg_offset(RING_INDIRECT_CTX(base));
+		regs[CTX_RCS_INDIRECT_CTX_OFFSET] =
+			i915_mmio_reg_offset(RING_INDIRECT_CTX_OFFSET(base));
+		regs[CTX_BB_PER_CTX_PTR] =
+			i915_mmio_reg_offset(RING_BB_PER_CTX_PTR(base));
+
+		regs[CTX_R_PWR_CLK_STATE] =
+			i915_mmio_reg_offset(GEN8_R_PWR_CLK_STATE);
+	}
+}
+
 static void execlists_dequeue(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
@@ -691,6 +785,28 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	 * and context switches) submission.
 	 */
 
+	for (rb = rb_first_cached(&execlists->virtual); rb; ) {
+		struct virtual_engine *ve =
+			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
+		struct i915_request *rq = READ_ONCE(ve->request);
+		struct intel_engine_cs *active;
+
+		if (!rq) {
+			rb_erase_cached(rb, &execlists->virtual);
+			RB_CLEAR_NODE(rb);
+			rb = rb_first_cached(&execlists->virtual);
+			continue;
+		}
+
+		active = READ_ONCE(ve->context.active);
+		if (active && active != engine) {
+			rb = rb_next(rb);
+			continue;
+		}
+
+		break;
+	}
+
 	if (last) {
 		/*
 		 * Don't resubmit or switch until all outstanding
@@ -712,7 +828,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		if (!execlists_is_active(execlists, EXECLISTS_ACTIVE_HWACK))
 			return;
 
-		if (need_preempt(engine, last)) {
+		if (need_preempt(engine, last, rb)) {
 			inject_preempt_context(engine);
 			return;
 		}
@@ -752,6 +868,72 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		last->tail = last->wa_tail;
 	}
 
+	while (rb) { /* XXX virtual is always taking precedence */
+		struct virtual_engine *ve =
+			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
+		struct i915_request *rq;
+
+		spin_lock(&ve->base.timeline.lock);
+
+		rq = ve->request;
+		if (unlikely(!rq)) { /* lost the race to a sibling */
+			spin_unlock(&ve->base.timeline.lock);
+			rb_erase_cached(rb, &execlists->virtual);
+			RB_CLEAR_NODE(rb);
+			rb = rb_first_cached(&execlists->virtual);
+			continue;
+		}
+
+		if (rq_prio(rq) >= queue_prio(execlists)) {
+			if (last && !can_merge_rq(last, rq)) {
+				spin_unlock(&ve->base.timeline.lock);
+				return; /* leave this rq for another engine */
+			}
+
+			GEM_BUG_ON(rq->engine != &ve->base);
+			ve->request = NULL;
+			ve->base.execlists.queue_priority_hint = INT_MIN;
+			rb_erase_cached(rb, &execlists->virtual);
+			RB_CLEAR_NODE(rb);
+
+			GEM_BUG_ON(rq->hw_context != &ve->context);
+			rq->engine = engine;
+
+			if (engine != ve->siblings[0]) {
+				u32 *regs = ve->context.lrc_reg_state;
+				unsigned int n;
+
+				GEM_BUG_ON(READ_ONCE(ve->context.active));
+				virtual_update_register_offsets(regs, engine);
+
+				/*
+				 * Move the bound engine to the top of the list
+				 * for future execution. We then kick this
+				 * tasklet first before checking others, so that
+				 * we preferentially reuse this set of bound
+				 * registers.
+				 */
+				for (n = 1; n < ve->count; n++) {
+					if (ve->siblings[n] == engine) {
+						swap(ve->siblings[n],
+						     ve->siblings[0]);
+						break;
+					}
+				}
+
+				GEM_BUG_ON(ve->siblings[0] != engine);
+			}
+
+			__i915_request_submit(rq);
+			trace_i915_request_in(rq, port_index(port, execlists));
+			submit = true;
+			last = rq;
+		}
+
+		spin_unlock(&ve->base.timeline.lock);
+		break;
+	}
+
 	while ((rb = rb_first_cached(&execlists->queue))) {
 		struct i915_priolist *p = to_priolist(rb);
 		struct i915_request *rq, *rn;
@@ -2899,6 +3081,301 @@ void intel_lr_context_resume(struct drm_i915_private *i915)
 	}
 }
 
+static void __virtual_engine_free(struct rcu_head *rcu)
+{
+	struct virtual_engine *ve = container_of(rcu, typeof(*ve), rcu);
+
+	kfree(ve);
+}
+
+static void virtual_engine_free(struct kref *kref)
+{
+	struct virtual_engine *ve = container_of(kref, typeof(*ve), kref);
+	unsigned int n;
+
+	GEM_BUG_ON(ve->request);
+	GEM_BUG_ON(ve->context.active);
+
+	for (n = 0; n < ve->count; n++) {
+		struct intel_engine_cs *sibling = ve->siblings[n];
+		struct rb_node *node = &ve->nodes[sibling->id].rb;
+
+		if (RB_EMPTY_NODE(node))
+			continue;
+
+		spin_lock_irq(&sibling->timeline.lock);
+
+		if (!RB_EMPTY_NODE(node))
+			rb_erase_cached(node, &sibling->execlists.virtual);
+
+		spin_unlock_irq(&sibling->timeline.lock);
+	}
+	GEM_BUG_ON(__tasklet_is_scheduled(&ve->base.execlists.tasklet));
+
+	if (ve->context.state)
+		__execlists_context_fini(&ve->context);
+
+	i915_timeline_fini(&ve->base.timeline);
+	call_rcu(&ve->rcu, __virtual_engine_free);
+}
+
+static void virtual_context_unpin(struct intel_context *ce)
+{
+	struct virtual_engine *ve = container_of(ce, typeof(*ve), context);
+
+	execlists_context_unpin(ce);
+
+	kref_put(&ve->kref, virtual_engine_free);
+}
+
+static void virtual_engine_initial_hint(struct virtual_engine *ve)
+{
+	int swp;
+
+	/*
+	 * Pick a random sibling on starting to help spread the load around.
+	 *
+	 * New contexts are typically created with exactly the same order
+	 * of siblings, and often started in batches. Due to the way we iterate
+	 * the array of siblings when submitting requests, sibling[0] is
+	 * prioritised for dequeuing. If we make sure that sibling[0] is fairly
+	 * randomised across the system, we also help spread the load by the
+	 * first engine we inspect being different each time.
+	 *
+	 * NB This does not force us to execute on this engine, it will just
+	 * typically be the first we inspect for submission.
+	 */
+	swp = prandom_u32_max(ve->count);
+	if (!swp)
+		return;
+
+	swap(ve->siblings[swp], ve->siblings[0]);
+	virtual_update_register_offsets(ve->context.lrc_reg_state,
+					ve->siblings[0]);
+}
+
+static int virtual_context_pin(struct intel_context *ce)
+{
+	struct virtual_engine *ve = container_of(ce, typeof(*ve), context);
+	int err;
+
+	/* Note: we must use a real engine class for setting up reg state */
+	err = __execlists_context_pin(ce, ve->siblings[0]);
+	if (err)
+		return err;
+
+	virtual_engine_initial_hint(ve);
+
+	kref_get(&ve->kref);
+	return 0;
+}
+
+static const struct intel_context_ops virtual_context_ops = {
+	.pin = virtual_context_pin,
+	.unpin = virtual_context_unpin,
+};
+
+static void virtual_submission_tasklet(unsigned long data)
+{
+	struct virtual_engine * const ve = (struct virtual_engine *)data;
+	unsigned int n;
+	int prio;
+
+	prio = READ_ONCE(ve->base.execlists.queue_priority_hint);
+	if (prio == INT_MIN)
+		return;
+
+	local_irq_disable();
+	for (n = 0; READ_ONCE(ve->request) && n < ve->count; n++) {
+		struct intel_engine_cs *sibling = ve->siblings[n];
+		struct ve_node * const node = &ve->nodes[sibling->id];
+		struct rb_node **parent, *rb;
+		bool first;
+
+		spin_lock(&sibling->timeline.lock);
+
+		if (!RB_EMPTY_NODE(&node->rb)) {
+			first = rb_first_cached(&sibling->execlists.virtual) == &node->rb;
+			if (prio == node->prio || (prio > node->prio && first))
+				goto submit_engine;
+
+			rb_erase_cached(&node->rb, &sibling->execlists.virtual);
+		}
+
+		rb = NULL;
+		first = true;
+		parent = &sibling->execlists.virtual.rb_root.rb_node;
+		while (*parent) {
+			struct ve_node *other;
+
+			rb = *parent;
+			other = rb_entry(rb, typeof(*other), rb);
+			if (prio > other->prio) {
+				parent = &rb->rb_left;
+			} else {
+				parent = &rb->rb_right;
+				first = false;
+			}
+		}
+
+		rb_link_node(&node->rb, rb, parent);
+		rb_insert_color_cached(&node->rb,
+				       &sibling->execlists.virtual,
+				       first);
+
+submit_engine:
+		GEM_BUG_ON(RB_EMPTY_NODE(&node->rb));
+		node->prio = prio;
+		if (first && prio > sibling->execlists.queue_priority_hint) {
+			sibling->execlists.queue_priority_hint = prio;
+			tasklet_hi_schedule(&sibling->execlists.tasklet);
+		}
+
+		spin_unlock(&sibling->timeline.lock);
+	}
+	local_irq_enable();
+}
+
+static void virtual_submit_request(struct i915_request *request)
+{
+	struct virtual_engine *ve = to_virtual_engine(request->engine);
+
+	GEM_BUG_ON(ve->base.submit_request != virtual_submit_request);
+
+	GEM_BUG_ON(ve->request);
+	ve->base.execlists.queue_priority_hint = rq_prio(request);
+	WRITE_ONCE(ve->request, request);
+
+	tasklet_schedule(&ve->base.execlists.tasklet);
+}
+
+struct intel_engine_cs *
+intel_execlists_create_virtual(struct i915_gem_context *ctx,
+			       struct intel_engine_cs **siblings,
+			       unsigned int count)
+{
+	struct virtual_engine *ve;
+	unsigned int n;
+	int err;
+
+	if (!count)
+		return ERR_PTR(-EINVAL);
+
+	ve = kzalloc(struct_size(ve, siblings, count), GFP_KERNEL);
+	if (!ve)
+		return ERR_PTR(-ENOMEM);
+
+	kref_init(&ve->kref);
+	rcu_head_init(&ve->rcu);
+	ve->base.i915 = ctx->i915;
+	ve->base.id = -1;
+	ve->base.class = OTHER_CLASS;
+	ve->base.uabi_class = I915_ENGINE_CLASS_INVALID;
+	ve->base.instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
+	ve->base.flags = I915_ENGINE_IS_VIRTUAL;
+
+	snprintf(ve->base.name, sizeof(ve->base.name), "virtual");
+
+	err = i915_timeline_init(ctx->i915,
+				 &ve->base.timeline,
+				 ve->base.name,
+				 NULL);
+	if (err)
+		goto err_put;
+	i915_timeline_set_subclass(&ve->base.timeline, TIMELINE_VIRTUAL);
+
+	ve->base.context = &virtual_context_ops;
+	ve->base.request_alloc = execlists_request_alloc;
+
+	ve->base.schedule = i915_schedule;
+	ve->base.submit_request = virtual_submit_request;
+
+	ve->base.execlists.queue_priority_hint = INT_MIN;
+	tasklet_init(&ve->base.execlists.tasklet,
+		     virtual_submission_tasklet,
+		     (unsigned long)ve);
+
+	intel_context_init(&ve->context, ctx, &ve->base);
+
+	for (n = 0; n < count; n++) {
+		struct intel_engine_cs *sibling = siblings[n];
+
+		GEM_BUG_ON(!is_power_of_2(sibling->mask));
+		if (sibling->mask & ve->base.mask)
+			continue;
+
+		if (sibling->execlists.tasklet.func != execlists_submission_tasklet) {
+			err = -ENODEV;
+			goto err_put;
+		}
+
+		GEM_BUG_ON(RB_EMPTY_NODE(&ve->nodes[sibling->id].rb));
+		RB_CLEAR_NODE(&ve->nodes[sibling->id].rb);
+
+		ve->siblings[ve->count++] = sibling;
+		ve->base.mask |= sibling->mask;
+
+		if (ve->base.class != OTHER_CLASS) {
+			if (ve->base.class != sibling->class) {
+				err = -EINVAL;
+				goto err_put;
+			}
+			continue;
+		}
+
+		ve->base.class = sibling->class;
+		snprintf(ve->base.name, sizeof(ve->base.name),
+			 "v%dx%d", ve->base.class, count);
+		ve->base.context_size = sibling->context_size;
+
+		ve->base.emit_bb_start = sibling->emit_bb_start;
+		ve->base.emit_flush = sibling->emit_flush;
+		ve->base.emit_init_breadcrumb = sibling->emit_init_breadcrumb;
+		ve->base.emit_fini_breadcrumb = sibling->emit_fini_breadcrumb;
+		ve->base.emit_fini_breadcrumb_dw =
+			sibling->emit_fini_breadcrumb_dw;
+	}
+
+	/* gracefully replace a degenerate virtual engine */
+	if (is_power_of_2(ve->base.mask)) {
+		struct intel_engine_cs *actual = ve->siblings[0];
+		virtual_engine_free(&ve->kref);
+		return actual;
+	}
+
+	__intel_context_insert(ctx, &ve->base, &ve->context);
+	return &ve->base;
+
+err_put:
+	virtual_engine_free(&ve->kref);
+	return ERR_PTR(err);
+}
+
+unsigned long
+intel_virtual_engine_get_siblings(struct intel_engine_cs *engine,
+				  struct intel_engine_cs ***siblings)
+{
+	struct virtual_engine *ve = to_virtual_engine(engine);
+
+	if (!engine || !intel_engine_is_virtual(engine))
+		return 0;
+
+	*siblings = ve->siblings;
+	return ve->count;
+}
+
+void intel_virtual_engine_destroy(struct intel_engine_cs *engine)
+{
+	struct virtual_engine *ve = to_virtual_engine(engine);
+
+	if (!engine || !intel_engine_is_virtual(engine))
+		return;
+
+	__intel_context_remove(&ve->context);
+
+	kref_put(&ve->kref, virtual_engine_free);
+}
+
 void intel_execlists_show_requests(struct intel_engine_cs *engine,
 				   struct drm_printer *m,
 				   void (*show_request)(struct drm_printer *m,
@@ -2956,6 +3433,29 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine,
 		show_request(m, last, "\t\tQ ");
 	}
 
+	last = NULL;
+	count = 0;
+	for (rb = rb_first_cached(&execlists->virtual); rb; rb = rb_next(rb)) {
+		struct virtual_engine *ve =
+			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
+		struct i915_request *rq = READ_ONCE(ve->request);
+
+		if (rq) {
+			if (count++ < max - 1)
+				show_request(m, rq, "\t\tV ");
+			else
+				last = rq;
+		}
+	}
+	if (last) {
+		if (count > max) {
+			drm_printf(m,
+				   "\t\t...skipping %d virtual requests...\n",
+				   count - max);
+		}
+		show_request(m, last, "\t\tV ");
+	}
+
 	spin_unlock_irqrestore(&engine->timeline.lock, flags);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index f1aec8a6986f..0e69feba4173 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -112,6 +112,15 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine,
 							const char *prefix),
 				   unsigned int max);
 
+struct intel_engine_cs *
+intel_execlists_create_virtual(struct i915_gem_context *ctx,
+			       struct intel_engine_cs **siblings,
+			       unsigned int count);
+unsigned long
+intel_virtual_engine_get_siblings(struct intel_engine_cs *engine,
+				  struct intel_engine_cs ***siblings);
+void intel_virtual_engine_destroy(struct intel_engine_cs *engine);
+
 u32 gen8_make_rpcs(struct drm_i915_private *i915, struct intel_sseu *ctx_sseu);
 
 #endif /* _INTEL_LRC_H_ */
diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
index 565e949a4722..7c0e95527bcb 100644
--- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
+++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
@@ -10,6 +10,7 @@
 
 #include "../i915_selftest.h"
 #include "igt_flush_test.h"
+#include "igt_live_test.h"
 #include "igt_spinner.h"
 #include "i915_random.h"
 
@@ -1042,6 +1043,169 @@ static int live_preempt_smoke(void *arg)
 	return err;
 }
 
+static int nop_virtual_engine(struct drm_i915_private *i915,
+			      struct intel_engine_cs **siblings,
+			      unsigned int nsibling,
+			      unsigned int nctx,
+			      unsigned int flags)
+#define CHAIN BIT(0)
+{
+	IGT_TIMEOUT(end_time);
+	struct i915_request *request[16];
+	struct i915_gem_context *ctx[16];
+	struct intel_engine_cs *ve[16];
+	unsigned long n, prime, nc;
+	struct igt_live_test t;
+	ktime_t times[2] = {};
+	int err;
+
+	GEM_BUG_ON(!nctx || nctx > ARRAY_SIZE(ctx));
+
+	for (n = 0; n < nctx; n++) {
+		ctx[n] = kernel_context(i915);
+		if (!ctx[n])
+			return -ENOMEM;
+
+		ve[n] = intel_execlists_create_virtual(ctx[n],
+						       siblings, nsibling);
+		if (IS_ERR(ve[n]))
+			return PTR_ERR(ve[n]);
+	}
+
+	err = igt_live_test_begin(&t, i915, __func__, ve[0]->name);
+	if (err)
+		goto out;
+
+	for_each_prime_number_from(prime, 1, 8192) {
+		times[1] = ktime_get_raw();
+
+		if (flags & CHAIN) {
+			for (nc = 0; nc < nctx; nc++) {
+				for (n = 0; n < prime; n++) {
+					request[nc] =
+						i915_request_alloc(ve[nc], ctx[nc]);
+					if (IS_ERR(request[nc])) {
+						err = PTR_ERR(request[nc]);
+						goto out;
+					}
+
+					i915_request_add(request[nc]);
+				}
+			}
+		} else {
+			for (n = 0; n < prime; n++) {
+				for (nc = 0; nc < nctx; nc++) {
+					request[nc] =
+						i915_request_alloc(ve[nc], ctx[nc]);
+					if (IS_ERR(request[nc])) {
+						err = PTR_ERR(request[nc]);
+						goto out;
+					}
+
+					i915_request_add(request[nc]);
+				}
+			}
+		}
+
+		for (nc = 0; nc < nctx; nc++) {
+			if (i915_request_wait(request[nc],
+					      I915_WAIT_LOCKED,
+					      HZ / 10) < 0) {
+				pr_err("%s(%s): wait for %llx:%lld timed out\n",
+				       __func__, ve[0]->name,
+				       request[nc]->fence.context,
+				       request[nc]->fence.seqno);
+
+				GEM_TRACE("%s(%s) failed at request %llx:%lld\n",
+					  __func__, ve[0]->name,
+					  request[nc]->fence.context,
+					  request[nc]->fence.seqno);
+				GEM_TRACE_DUMP();
+				i915_gem_set_wedged(i915);
+				break;
+			}
+		}
+
+		times[1] = ktime_sub(ktime_get_raw(), times[1]);
+		if (prime == 1)
+			times[0] = times[1];
+
+		if (__igt_timeout(end_time, NULL))
+			break;
+	}
+
+	err = igt_live_test_end(&t);
+	if (err)
+		goto out;
+
+	pr_info("Requestx%d latencies on %s: 1 = %lluns, %lu = %lluns\n",
+		nctx, ve[0]->name, ktime_to_ns(times[0]),
+		prime, div64_u64(ktime_to_ns(times[1]), prime));
+
+out:
+	if (igt_flush_test(i915, I915_WAIT_LOCKED))
+		err = -EIO;
+
+	for (nc = 0; nc < nctx; nc++) {
+		intel_virtual_engine_destroy(ve[nc]);
+		kernel_context_close(ctx[nc]);
+	}
+	return err;
+}
+
+static int live_virtual_engine(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct intel_engine_cs *siblings[MAX_ENGINE_INSTANCE + 1];
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	unsigned int class, inst;
+	int err = -ENODEV;
+
+	if (USES_GUC_SUBMISSION(i915))
+		return 0;
+
+	mutex_lock(&i915->drm.struct_mutex);
+
+	for_each_engine(engine, i915, id) {
+		err = nop_virtual_engine(i915, &engine, 1, 1, 0);
+		if (err) {
+			pr_err("Failed to wrap engine %s: err=%d\n",
+			       engine->name, err);
+			goto out_unlock;
+		}
+	}
+
+	for (class = 0; class <= MAX_ENGINE_CLASS; class++) {
+		int nsibling, n;
+
+		nsibling = 0;
+		for (inst = 0; inst <= MAX_ENGINE_INSTANCE; inst++) {
+			if (!i915->engine_class[class][inst])
+				break;
+
+			siblings[nsibling++] = i915->engine_class[class][inst];
+		}
+		if (nsibling < 2)
+			continue;
+
+		for (n = 1; n <= nsibling + 1; n++) {
+			err = nop_virtual_engine(i915, siblings, nsibling,
+						 n, 0);
+			if (err)
+				goto out_unlock;
+		}
+
+		err = nop_virtual_engine(i915, siblings, nsibling, n, CHAIN);
+		if (err)
+			goto out_unlock;
+	}
+
+out_unlock:
+	mutex_unlock(&i915->drm.struct_mutex);
+	return err;
+}
+
 int intel_execlists_live_selftests(struct drm_i915_private *i915)
 {
 	static const struct i915_subtest tests[] = {
@@ -1053,6 +1217,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_chain_preempt),
 		SUBTEST(live_preempt_hang),
 		SUBTEST(live_preempt_smoke),
+		SUBTEST(live_virtual_engine),
 	};
 
 	if (!HAS_EXECLISTS(i915))
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index b68e1ad12c0f..102a2f9de443 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -125,6 +125,7 @@ enum drm_i915_gem_engine_class {
 };
 
 #define I915_ENGINE_CLASS_INVALID_NONE -1
+#define I915_ENGINE_CLASS_INVALID_VIRTUAL 0
 
 /**
  * DOC: perf_events exposed by i915 through /sys/bus/event_sources/drivers/i915
@@ -1596,8 +1597,37 @@ struct drm_i915_gem_context_param_sseu {
 	__u32 rsvd;
 };
 
+/*
+ * i915_context_engines_load_balance:
+ *
+ * Enable load balancing across this set of engines.
+ *
+ * Into the I915_EXEC_DEFAULT slot [0], a virtual engine is created that,
+ * when used, will proxy the execbuffer request onto one of the set of
+ * engines in such a way as to distribute the load evenly across the set.
+ *
+ * The set of engines must be compatible (e.g. the same HW class) as they
+ * will share the same logical GPU context and ring.
+ *
+ * To intermix rendering with the virtual engine and direct rendering onto
+ * the backing engines (bypassing the load balancing proxy), the context must
+ * be defined to use a single timeline for all engines.
+ */
+struct i915_context_engines_load_balance {
+	struct i915_user_extension base;
+
+	__u16 engine_index;
+	__u16 mbz16; /* reserved for future use; must be zero */
+	__u32 flags; /* all undefined flags must be zero */
+
+	__u64 engines_mask; /* selection mask of engines[] */
+
+	__u64 mbz64[4]; /* reserved for future use; must be zero */
+};
+
 struct i915_context_param_engines {
 	__u64 extensions; /* linked chain of extension blocks, 0 terminates */
+#define I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE 0
 
 	struct {
 		__u16 engine_class; /* see enum drm_i915_gem_engine_class */
-- 
2.20.1


* [PATCH 34/38] drm/i915: Extend execution fence to support a callback
  2019-03-01 14:03 [PATCH 01/38] drm/i915/execlists: Suppress redundant preemption Chris Wilson
                   ` (31 preceding siblings ...)
  2019-03-01 14:03 ` [PATCH 33/38] drm/i915: Load balancing across a virtual engine Chris Wilson
@ 2019-03-01 14:04 ` Chris Wilson
  2019-03-01 14:04 ` [PATCH 35/38] drm/i915/execlists: Virtual engine bonding Chris Wilson
                   ` (7 subsequent siblings)
  40 siblings, 0 replies; 88+ messages in thread
From: Chris Wilson @ 2019-03-01 14:04 UTC (permalink / raw)
  To: intel-gfx

In the next patch, we will want to configure the slave request
depending on which physical engine the master request is executed on.
For this, we introduce a callback from the execute fence to convey this
information.
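
As a rough sketch of the intended use (my_hook and master are
illustrative names; the real consumer is the bonding hook in the next
patch), the callback fires once the signaler is submitted to hardware,
at which point its rq->engine is stable:

	static void my_hook(struct i915_request *rq,
			    struct dma_fence *signal)
	{
		/* to_request(signal)->engine now tells us where it runs */
	}

	err = i915_request_await_execution(rq, &master->fence, my_hook);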

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_request.c | 84 +++++++++++++++++++++++++++--
 drivers/gpu/drm/i915/i915_request.h |  4 ++
 2 files changed, 83 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index e0807a61dcf4..db98b4bf5d5e 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -37,6 +37,8 @@ struct execute_cb {
 	struct list_head link;
 	struct irq_work work;
 	struct i915_sw_fence *fence;
+	void (*hook)(struct i915_request *rq, struct dma_fence *signal);
+	struct i915_request *signal;
 };
 
 static struct i915_global_request {
@@ -341,6 +343,17 @@ static void irq_execute_cb(struct irq_work *wrk)
 	kmem_cache_free(global.slab_execute_cbs, cb);
 }
 
+static void irq_execute_cb_hook(struct irq_work *wrk)
+{
+	struct execute_cb *cb = container_of(wrk, typeof(*cb), work);
+
+	cb->hook(container_of(cb->fence, struct i915_request, submit),
+		 &cb->signal->fence);
+	i915_request_put(cb->signal);
+
+	irq_execute_cb(wrk);
+}
+
 static void __notify_execute_cb(struct i915_request *rq)
 {
 	struct execute_cb *cb;
@@ -367,14 +380,19 @@ static void __notify_execute_cb(struct i915_request *rq)
 }
 
 static int
-i915_request_await_execution(struct i915_request *rq,
-			     struct i915_request *signal,
-			     gfp_t gfp)
+__i915_request_await_execution(struct i915_request *rq,
+			       struct i915_request *signal,
+			       void (*hook)(struct i915_request *rq,
+					    struct dma_fence *signal),
+			       gfp_t gfp)
 {
 	struct execute_cb *cb;
 
-	if (i915_request_is_active(signal))
+	if (i915_request_is_active(signal)) {
+		if (hook)
+			hook(rq, &signal->fence);
 		return 0;
+	}
 
 	cb = kmem_cache_alloc(global.slab_execute_cbs, gfp);
 	if (!cb)
@@ -384,8 +402,18 @@ i915_request_await_execution(struct i915_request *rq,
 	i915_sw_fence_await(cb->fence);
 	init_irq_work(&cb->work, irq_execute_cb);
 
+	if (hook) {
+		cb->hook = hook;
+		cb->signal = i915_request_get(signal);
+		cb->work.func = irq_execute_cb_hook;
+	}
+
 	spin_lock_irq(&signal->lock);
 	if (i915_request_is_active(signal)) {
+		if (hook) {
+			hook(rq, &signal->fence);
+			i915_request_put(signal);
+		}
 		i915_sw_fence_complete(cb->fence);
 		kmem_cache_free(global.slab_execute_cbs, cb);
 	} else {
@@ -788,7 +816,7 @@ emit_semaphore_wait(struct i915_request *to,
 		return err;
 
 	/* Only submit our spinner after the signaler is running! */
-	err = i915_request_await_execution(to, from, gfp);
+	err = __i915_request_await_execution(to, from, NULL, gfp);
 	if (err)
 		return err;
 
@@ -908,6 +936,52 @@ i915_request_await_dma_fence(struct i915_request *rq, struct dma_fence *fence)
 	return 0;
 }
 
+int
+i915_request_await_execution(struct i915_request *rq,
+			     struct dma_fence *fence,
+			     void (*hook)(struct i915_request *rq,
+					  struct dma_fence *signal))
+{
+	struct dma_fence **child = &fence;
+	unsigned int nchild = 1;
+	int ret;
+
+	if (dma_fence_is_array(fence)) {
+		struct dma_fence_array *array = to_dma_fence_array(fence);
+
+		/* XXX Error for signal-on-any fence arrays */
+
+		child = array->fences;
+		nchild = array->num_fences;
+		GEM_BUG_ON(!nchild);
+	}
+
+	do {
+		fence = *child++;
+		if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
+			continue;
+
+		/*
+		 * We don't squash repeated fence dependencies here as we
+		 * want to run our callback in all cases.
+		 */
+
+		if (dma_fence_is_i915(fence))
+			ret = __i915_request_await_execution(rq,
+							     to_request(fence),
+							     hook,
+							     I915_FENCE_GFP);
+		else
+			ret = i915_sw_fence_await_dma_fence(&rq->submit, fence,
+							    I915_FENCE_TIMEOUT,
+							    GFP_KERNEL);
+		if (ret < 0)
+			return ret;
+	} while (--nchild);
+
+	return 0;
+}
+
 /**
  * i915_request_await_object - set this request to (async) wait upon a bo
  * @to: request we are wishing to use
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index d4113074a8f6..fef6aabbb1a4 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -265,6 +265,10 @@ int i915_request_await_object(struct i915_request *to,
 			      bool write);
 int i915_request_await_dma_fence(struct i915_request *rq,
 				 struct dma_fence *fence);
+int i915_request_await_execution(struct i915_request *rq,
+				 struct dma_fence *fence,
+				 void (*hook)(struct i915_request *rq,
+					      struct dma_fence *signal));
 
 void i915_request_add(struct i915_request *rq);
 
-- 
2.20.1


* [PATCH 35/38] drm/i915/execlists: Virtual engine bonding
  2019-03-01 14:03 [PATCH 01/38] drm/i915/execlists: Suppress redundant preemption Chris Wilson
                   ` (32 preceding siblings ...)
  2019-03-01 14:04 ` [PATCH 34/38] drm/i915: Extend execution fence to support a callback Chris Wilson
@ 2019-03-01 14:04 ` Chris Wilson
  2019-03-01 14:04 ` [PATCH 36/38] drm/i915: Allow specification of parallel execbuf Chris Wilson
                   ` (6 subsequent siblings)
  40 siblings, 0 replies; 88+ messages in thread
From: Chris Wilson @ 2019-03-01 14:04 UTC (permalink / raw)
  To: intel-gfx

Some users require that when a master batch is executed on one particular
engine, a companion batch is run simultaneously on a specific slave
engine. For this purpose, we introduce virtual engine bonding, allowing
maps of master:slaves to be constructed to constrain which physical
engines a virtual engine may select given a fence on a master engine.

For the moment, we continue to ignore the issue of preemption deferring
the master request for later. Ideally, we would like to then also remove
the slave and run something else rather than have it stall the pipeline.
With load balancing, we should be able to move workload around it, but
there is a similar stall on the master pipeline while it may wait for
the slave to be executed. At the cost of more latency for the bonded
request, it may be interesting to launch both on their engines in
lockstep. (Bubbles abound.)

Opens: Also what about bonding an engine as its own master? It doesn't
break anything internally, so allow the silliness.
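
As a rough userspace sketch against the uapi added below (the slot
layout is illustrative: the virtual engine is assumed to sit in engine
map slot 0, with vcs0 as the master):

	struct i915_context_engines_bond bond = {
		.base = { .name = I915_CONTEXT_ENGINES_EXT_BOND },
		.engine_index = 0, /* the virtual engine in the map */
		.master_class = I915_ENGINE_CLASS_VIDEO,
		.master_instance = 0,
		.sibling_mask = 1ull << 0, /* allowed siblings of the veng */
	};

Whenever a request on the virtual engine carries a submit fence from a
request executing on vcs0, its execution_mask is narrowed to sibling 0.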

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_context.c    |  46 ++++++
 drivers/gpu/drm/i915/i915_request.c        |   1 +
 drivers/gpu/drm/i915/i915_request.h        |   1 +
 drivers/gpu/drm/i915/intel_engine_types.h  |   7 +
 drivers/gpu/drm/i915/intel_lrc.c           |  97 ++++++++++++
 drivers/gpu/drm/i915/intel_lrc.h           |   3 +
 drivers/gpu/drm/i915/selftests/intel_lrc.c | 167 +++++++++++++++++++++
 include/uapi/drm/i915_drm.h                |  22 +++
 8 files changed, 344 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 57c85e990e80..c21460c7f842 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -1459,8 +1459,54 @@ set_engines__load_balance(struct i915_user_extension __user *base, void *data)
 	return 0;
 }
 
+static int
+set_engines__bond(struct i915_user_extension __user *base, void *data)
+{
+	struct i915_context_engines_bond __user *ext =
+		container_of_user(base, typeof(*ext), base);
+	const struct set_engines *set = data;
+	struct intel_engine_cs *master;
+	u32 class, instance, siblings;
+	u16 idx;
+	int err;
+
+	if (get_user(idx, &ext->engine_index))
+		return -EFAULT;
+
+	if (idx >= set->nengine || !set->engines[idx])
+		return -EINVAL;
+
+	/*
+	 * A non-virtual engine has 0 siblings to choose between; the submit
+	 * fence will always be directed to the one engine.
+	 */
+	if (!intel_engine_is_virtual(set->engines[idx]))
+		return 0;
+
+	err = check_user_mbz16(&ext->mbz);
+	if (err)
+		return err;
+
+	if (get_user(class, &ext->master_class))
+		return -EFAULT;
+
+	if (get_user(instance, &ext->master_instance))
+		return -EFAULT;
+
+	master = intel_engine_lookup_user(set->ctx->i915, class, instance);
+	if (!master)
+		return -EINVAL;
+
+	if (get_user(siblings, &ext->sibling_mask))
+		return -EFAULT;
+
+	return intel_virtual_engine_attach_bond(set->engines[idx],
+						master, siblings);
+}
+
 static const i915_user_extension_fn set_engines__extensions[] = {
 	[I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE] = set_engines__load_balance,
+	[I915_CONTEXT_ENGINES_EXT_BOND] = set_engines__bond,
 };
 
 static int
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index db98b4bf5d5e..ccee1cff8961 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -741,6 +741,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	rq->batch = NULL;
 	rq->capture_list = NULL;
 	rq->waitboost = false;
+	rq->execution_mask = ~0u;
 
 	/*
 	 * Reserve space in the ring buffer for all the commands required to
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index fef6aabbb1a4..01de566c8e73 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -145,6 +145,7 @@ struct i915_request {
 	 */
 	struct i915_sched_node sched;
 	struct i915_dependency dep;
+	unsigned int execution_mask;
 
 	/*
 	 * A convenience pointer to the current breadcrumb value stored in
diff --git a/drivers/gpu/drm/i915/intel_engine_types.h b/drivers/gpu/drm/i915/intel_engine_types.h
index fe59fa24013e..e455a09a7100 100644
--- a/drivers/gpu/drm/i915/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/intel_engine_types.h
@@ -382,6 +382,13 @@ struct intel_engine_cs {
 	 */
 	void		(*submit_request)(struct i915_request *rq);
 
+	/*
+	 * Called on signaling of a SUBMIT_FENCE, passing along the signaling
+	 * request down to the bonded pairs.
+	 */
+	void            (*bond_execute)(struct i915_request *rq,
+					struct dma_fence *signal);
+
 	/*
 	 * Call when the priority on a request has changed and it and its
 	 * dependencies may need rescheduling. Note the request itself may
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index d8b276173463..a14f9d90bf7c 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -179,6 +179,12 @@ struct virtual_engine {
 		int prio;
 	} nodes[I915_NUM_ENGINES];
 
+	struct ve_bond {
+		struct intel_engine_cs *master;
+		unsigned int sibling_mask;
+	} *bonds;
+	unsigned int nbond;
+
 	unsigned int count;
 	struct intel_engine_cs *siblings[0];
 };
@@ -3178,6 +3184,7 @@ static const struct intel_context_ops virtual_context_ops = {
 static void virtual_submission_tasklet(unsigned long data)
 {
 	struct virtual_engine * const ve = (struct virtual_engine *)data;
+	unsigned int mask;
 	unsigned int n;
 	int prio;
 
@@ -3186,12 +3193,30 @@ static void virtual_submission_tasklet(unsigned long data)
 		return;
 
 	local_irq_disable();
+
+	mask = 0;
+	spin_lock(&ve->base.timeline.lock);
+	if (ve->request)
+		mask = ve->request->execution_mask;
+	spin_unlock(&ve->base.timeline.lock);
+
 	for (n = 0; READ_ONCE(ve->request) && n < ve->count; n++) {
 		struct intel_engine_cs *sibling = ve->siblings[n];
 		struct ve_node * const node = &ve->nodes[sibling->id];
 		struct rb_node **parent, *rb;
 		bool first;
 
+		if (unlikely(!(mask & sibling->mask))) {
+			if (!RB_EMPTY_NODE(&node->rb)) {
+				spin_lock(&sibling->timeline.lock);
+				rb_erase_cached(&node->rb,
+						&sibling->execlists.virtual);
+				RB_CLEAR_NODE(&node->rb);
+				spin_unlock(&sibling->timeline.lock);
+			}
+			continue;
+		}
+
 		spin_lock(&sibling->timeline.lock);
 
 		if (!RB_EMPTY_NODE(&node->rb)) {
@@ -3249,6 +3274,30 @@ static void virtual_submit_request(struct i915_request *request)
 	tasklet_schedule(&ve->base.execlists.tasklet);
 }
 
+static struct ve_bond *
+virtual_find_bond(struct virtual_engine *ve, struct intel_engine_cs *master)
+{
+	int i;
+
+	for (i = 0; i < ve->nbond; i++) {
+		if (ve->bonds[i].master == master)
+			return &ve->bonds[i];
+	}
+
+	return NULL;
+}
+
+static void
+virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal)
+{
+	struct virtual_engine *ve = to_virtual_engine(rq->engine);
+	struct ve_bond *bond;
+
+	bond = virtual_find_bond(ve, to_request(signal)->engine);
+	if (bond) /* XXX serialise with rq->lock? */
+		rq->execution_mask &= bond->sibling_mask;
+}
+
 struct intel_engine_cs *
 intel_execlists_create_virtual(struct i915_gem_context *ctx,
 			       struct intel_engine_cs **siblings,
@@ -3289,6 +3338,7 @@ intel_execlists_create_virtual(struct i915_gem_context *ctx,
 
 	ve->base.schedule = i915_schedule;
 	ve->base.submit_request = virtual_submit_request;
+	ve->base.bond_execute = virtual_bond_execute;
 
 	ve->base.execlists.queue_priority_hint = INT_MIN;
 	tasklet_init(&ve->base.execlists.tasklet,
@@ -3364,6 +3414,53 @@ intel_virtual_engine_get_siblings(struct intel_engine_cs *engine,
 	return ve->count;
 }
 
+static unsigned long
+virtual_execution_mask(struct virtual_engine *ve, unsigned long mask)
+{
+	unsigned long emask = 0;
+	int bit;
+
+	for_each_set_bit(bit, &mask, ve->count)
+		emask |= ve->siblings[bit]->mask;
+
+	return emask;
+}
+
+int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine,
+				     struct intel_engine_cs *master,
+				     unsigned long mask)
+{
+	struct virtual_engine *ve = to_virtual_engine(engine);
+	struct ve_bond *bond;
+
+	if (mask >> ve->count)
+		return -EINVAL;
+
+	mask = virtual_execution_mask(ve, mask);
+	if (!mask)
+		return -EINVAL;
+
+	bond = virtual_find_bond(ve, master);
+	if (bond) {
+		bond->sibling_mask |= mask;
+		return 0;
+	}
+
+	bond = krealloc(ve->bonds,
+			sizeof(*bond) * (ve->nbond + 1),
+			GFP_KERNEL);
+	if (!bond)
+		return -ENOMEM;
+
+	bond[ve->nbond].master = master;
+	bond[ve->nbond].sibling_mask = mask;
+
+	ve->bonds = bond;
+	ve->nbond++;
+
+	return 0;
+}
+
 void intel_virtual_engine_destroy(struct intel_engine_cs *engine)
 {
 	struct virtual_engine *ve = to_virtual_engine(engine);
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 0e69feba4173..d32f5ed0b769 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -119,6 +119,9 @@ intel_execlists_create_virtual(struct i915_gem_context *ctx,
 unsigned long
 intel_virtual_engine_get_siblings(struct intel_engine_cs *engine,
 				  struct intel_engine_cs ***siblings);
+int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine,
+				     struct intel_engine_cs *master,
+				     unsigned long siblings);
 void intel_virtual_engine_destroy(struct intel_engine_cs *engine);
 
 u32 gen8_make_rpcs(struct drm_i915_private *i915, struct intel_sseu *ctx_sseu);
diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
index 7c0e95527bcb..b6329280a59f 100644
--- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
+++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
@@ -13,6 +13,7 @@
 #include "igt_live_test.h"
 #include "igt_spinner.h"
 #include "i915_random.h"
+#include "lib_sw_fence.h"
 
 #include "mock_context.h"
 
@@ -1206,6 +1207,171 @@ static int live_virtual_engine(void *arg)
 	return err;
 }
 
+static int bond_virtual_engine(struct drm_i915_private *i915,
+			       unsigned int class,
+			       struct intel_engine_cs **siblings,
+			       unsigned int nsibling,
+			       unsigned int flags)
+#define BOND_SCHEDULE BIT(0)
+{
+	struct intel_engine_cs *master;
+	struct i915_gem_context *ctx;
+	struct i915_request *rq[16];
+	enum intel_engine_id id;
+	unsigned long n;
+	int err;
+
+	GEM_BUG_ON(nsibling >= ARRAY_SIZE(rq) - 1);
+
+	ctx = kernel_context(i915);
+	if (!ctx)
+		return -ENOMEM;
+
+	err = 0;
+	rq[0] = ERR_PTR(-ENOMEM);
+	for_each_engine(master, i915, id) {
+		struct i915_sw_fence fence;
+
+		if (master->class == class)
+			continue;
+
+		rq[0] = i915_request_alloc(master, ctx);
+		if (IS_ERR(rq[0])) {
+			err = PTR_ERR(rq[0]);
+			goto out;
+		}
+
+		if (flags & BOND_SCHEDULE)
+			onstack_fence_init(&fence);
+
+		i915_request_get(rq[0]);
+		i915_request_add(rq[0]);
+
+		for (n = 0; n < nsibling; n++) {
+			struct intel_engine_cs *engine;
+
+			engine = intel_execlists_create_virtual(ctx,
+								siblings,
+								nsibling);
+			if (IS_ERR(engine)) {
+				err = PTR_ERR(engine);
+				goto out;
+			}
+
+			err = intel_virtual_engine_attach_bond(engine,
+							       master,
+							       BIT(n));
+			if (err) {
+				intel_virtual_engine_destroy(engine);
+				goto out;
+			}
+
+			rq[n + 1] = i915_request_alloc(engine, ctx);
+			if (IS_ERR(rq[n + 1])) {
+				err = PTR_ERR(rq[n + 1]);
+				intel_virtual_engine_destroy(engine);
+				goto out;
+			}
+			i915_request_get(rq[n + 1]);
+
+			err = i915_request_await_execution(rq[n + 1],
+							   &rq[0]->fence,
+							   engine->bond_execute);
+			i915_request_add(rq[n + 1]);
+			intel_virtual_engine_destroy(engine);
+			if (err < 0)
+				goto out;
+		}
+		rq[n + 1] = ERR_PTR(-EINVAL);
+
+		if (flags & BOND_SCHEDULE)
+			onstack_fence_fini(&fence);
+
+		for (n = 0; n < nsibling; n++) {
+			if (i915_request_wait(rq[n + 1],
+					      I915_WAIT_LOCKED,
+					      MAX_SCHEDULE_TIMEOUT) < 0) {
+				err = -EIO;
+				goto out;
+			}
+
+			if (rq[n + 1]->engine != siblings[n]) {
+				pr_err("Bonded request did not execute on target engine: expected %s, used %s; master was %s\n",
+				       siblings[n]->name,
+				       rq[n + 1]->engine->name,
+				       rq[0]->engine->name);
+				err = -EINVAL;
+				goto out;
+			}
+		}
+
+		for (n = 0; !IS_ERR(rq[n]); n++)
+			i915_request_put(rq[n]);
+		rq[0] = ERR_PTR(-ENOMEM);
+	}
+
+out:
+	for (n = 0; !IS_ERR(rq[n]); n++)
+		i915_request_put(rq[n]);
+	if (igt_flush_test(i915, I915_WAIT_LOCKED))
+		err = -EIO;
+
+	kernel_context_close(ctx);
+	return err;
+}
+
+static int live_virtual_bond(void *arg)
+{
+	static const struct phase {
+		const char *name;
+		unsigned int flags;
+	} phases[] = {
+		{ "", 0 },
+		{ "schedule", BOND_SCHEDULE },
+		{ },
+	};
+	struct drm_i915_private *i915 = arg;
+	struct intel_engine_cs *siblings[MAX_ENGINE_INSTANCE + 1];
+	unsigned int class, inst;
+	int err = 0;
+
+	if (USES_GUC_SUBMISSION(i915))
+		return 0;
+
+	mutex_lock(&i915->drm.struct_mutex);
+
+	for (class = 0; class <= MAX_ENGINE_CLASS; class++) {
+		const struct phase *p;
+		int nsibling;
+
+		nsibling = 0;
+		for (inst = 0; inst <= MAX_ENGINE_INSTANCE; inst++) {
+			if (!i915->engine_class[class][inst])
+				break;
+
+			GEM_BUG_ON(nsibling == ARRAY_SIZE(siblings));
+			siblings[nsibling++] = i915->engine_class[class][inst];
+		}
+		if (nsibling < 2)
+			continue;
+
+		for (p = phases; p->name; p++) {
+			err = bond_virtual_engine(i915,
+						  class, siblings, nsibling,
+						  p->flags);
+			if (err) {
+				pr_err("%s(%s): failed class=%d, nsibling=%d, err=%d\n",
+				       __func__, p->name, class, nsibling, err);
+				goto out_unlock;
+			}
+		}
+	}
+
+out_unlock:
+	mutex_unlock(&i915->drm.struct_mutex);
+	return err;
+}
+
 int intel_execlists_live_selftests(struct drm_i915_private *i915)
 {
 	static const struct i915_subtest tests[] = {
@@ -1218,6 +1384,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_preempt_hang),
 		SUBTEST(live_preempt_smoke),
 		SUBTEST(live_virtual_engine),
+		SUBTEST(live_virtual_bond),
 	};
 
 	if (!HAS_EXECLISTS(i915))
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 102a2f9de443..c9683b19aaf2 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1530,6 +1530,10 @@ struct drm_i915_gem_context_param {
  * sized argument, will revert back to default settings.
  *
  * See struct i915_context_param_engines.
+ *
+ * Extensions:
+ *   i915_context_engines_load_balance (I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE)
+ *   i915_context_engines_bond (I915_CONTEXT_ENGINES_EXT_BOND)
  */
 #define I915_CONTEXT_PARAM_ENGINES	0xa
 /* Must be kept compact -- no holes and well documented */
@@ -1625,9 +1629,27 @@ struct i915_context_engines_load_balance {
 	__u64 mbz64[4]; /* reserved for future use; must be zero */
 };
 
+/*
+ * i915_context_engines_bond:
+ * Restrict the virtual engine (at engine_index) to the sibling_mask when executing a request in parallel with its nominated master engine.
+ */
+struct i915_context_engines_bond {
+	struct i915_user_extension base;
+
+	__u16 engine_index;
+	__u16 mbz;
+
+	__u16 master_class;
+	__u16 master_instance;
+
+	__u64 sibling_mask;
+	__u64 flags; /* all undefined flags must be zero */
+};
+
 struct i915_context_param_engines {
 	__u64 extensions; /* linked chain of extension blocks, 0 terminates */
 #define I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE 0
+#define I915_CONTEXT_ENGINES_EXT_BOND 1
 
 	struct {
 		__u16 engine_class; /* see enum drm_i915_gem_engine_class */
-- 
2.20.1
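
For illustration, a rough sketch of how userspace might use the bond
extension above (values and set-up are hypothetical; construction of the
load-balance extension is elided). The bond is chained behind the
load-balance extension via base.next_extension and names the master
engine by class/instance:

	struct i915_context_engines_bond bond = {
		.base.name = I915_CONTEXT_ENGINES_EXT_BOND,
		.engine_index = 0, /* the virtual engine in the engine map */
		.master_class = I915_ENGINE_CLASS_VIDEO,
		.master_instance = 0,
		.sibling_mask = 0x2, /* only the second sibling may pair up */
	};

With such a bond installed, a request submitted to the virtual engine
for execution in parallel with a request on the master is confined to
the siblings named in sibling_mask.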


* [PATCH 36/38] drm/i915: Allow specification of parallel execbuf
  2019-03-01 14:03 [PATCH 01/38] drm/i915/execlists: Suppress redundant preemption Chris Wilson
                   ` (33 preceding siblings ...)
  2019-03-01 14:04 ` [PATCH 35/38] drm/i915/execlists: Virtual engine bonding Chris Wilson
@ 2019-03-01 14:04 ` Chris Wilson
  2019-03-01 14:04 ` [PATCH 37/38] drm/i915/selftests: Check preemption support on each engine Chris Wilson
                   ` (5 subsequent siblings)
  40 siblings, 0 replies; 88+ messages in thread
From: Chris Wilson @ 2019-03-01 14:04 UTC (permalink / raw)
  To: intel-gfx

There is a desire to split a task onto two engines and have them run at
the same time, e.g. scanline interleaving to spread the workload evenly.
Through the use of the out-fence from the first execbuf, we can
coordinate the secondary execbuf to only become ready simultaneously
with the first, so that, with all things idle, the second execbuf is
executed in parallel with the first. The key difference between the new
EXEC_FENCE_SUBMIT and the existing EXEC_FENCE_IN is that the in-fence
waits for the completion of the first request (so that all of its
rendering results are visible to the second execbuf, the more common
userspace fence requirement).

Since we only have a single input fence slot, userspace cannot mix an
in-fence and a submit-fence. It has to use one or the other! This is not
such a harsh requirement, since by virtue of the submit-fence, the
secondary execbuf inherits all of the dependencies from the first
request, and for the application the dependencies should be common
between the primary and secondary execbuf.
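
As a rough userspace sketch (assuming libdrm's drmIoctl() and the
existing DRM_IOCTL_I915_GEM_EXECBUFFER2_WR for returning the out-fence;
buffer setup and error handling elided), the two execbufs are chained
through rsvd2, with the out-fence fd returned in its upper 32 bits and
the submit-fence fd passed in its lower 32 bits:

	static void submit_pair(int fd,
				struct drm_i915_gem_execbuffer2 *eb1,
				struct drm_i915_gem_execbuffer2 *eb2)
	{
		eb1->flags |= I915_EXEC_FENCE_OUT;
		drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2_WR, eb1);

		/* Start the second batch with, not after, the first. */
		eb2->flags |= I915_EXEC_FENCE_SUBMIT;
		eb2->rsvd2 = eb1->rsvd2 >> 32; /* out-fence fd from eb1 */
		drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, eb2);
	}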

Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Testcase: igt/gem_exec_fence/parallel
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c            |  1 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 25 +++++++++++++++++++++-
 include/uapi/drm/i915_drm.h                | 17 ++++++++++++++-
 3 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index de8effed4381..a7f99f4b8bf8 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -421,6 +421,7 @@ static int i915_getparam_ioctl(struct drm_device *dev, void *data,
 	case I915_PARAM_HAS_EXEC_CAPTURE:
 	case I915_PARAM_HAS_EXEC_BATCH_FIRST:
 	case I915_PARAM_HAS_EXEC_FENCE_ARRAY:
+	case I915_PARAM_HAS_EXEC_SUBMIT_FENCE:
 		/* For the time being all of these are always true;
 		 * if some supported hardware does not have one of these
 		 * features this value needs to be provided from
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 5ea2e1c8b927..6150f4db877d 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -2285,6 +2285,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 {
 	struct i915_execbuffer eb;
 	struct dma_fence *in_fence = NULL;
+	struct dma_fence *exec_fence = NULL;
 	struct sync_file *out_fence = NULL;
 	intel_wakeref_t wakeref;
 	int out_fence_fd = -1;
@@ -2328,11 +2329,24 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 			return -EINVAL;
 	}
 
+	if (args->flags & I915_EXEC_FENCE_SUBMIT) {
+		if (in_fence) {
+			err = -EINVAL;
+			goto err_in_fence;
+		}
+
+		exec_fence = sync_file_get_fence(lower_32_bits(args->rsvd2));
+		if (!exec_fence) {
+			err = -EINVAL;
+			goto err_in_fence;
+		}
+	}
+
 	if (args->flags & I915_EXEC_FENCE_OUT) {
 		out_fence_fd = get_unused_fd_flags(O_CLOEXEC);
 		if (out_fence_fd < 0) {
 			err = out_fence_fd;
-			goto err_in_fence;
+			goto err_exec_fence;
 		}
 	}
 
@@ -2464,6 +2478,13 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 			goto err_request;
 	}
 
+	if (exec_fence) {
+		err = i915_request_await_execution(eb.request, exec_fence,
+						   eb.engine->bond_execute);
+		if (err < 0)
+			goto err_request;
+	}
+
 	if (fences) {
 		err = await_fence_array(&eb, fences);
 		if (err)
@@ -2524,6 +2545,8 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 err_out_fence:
 	if (out_fence_fd != -1)
 		put_unused_fd(out_fence_fd);
+err_exec_fence:
+	dma_fence_put(exec_fence);
 err_in_fence:
 	dma_fence_put(in_fence);
 	return err;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index c9683b19aaf2..677d435ac64b 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -591,6 +591,12 @@ typedef struct drm_i915_irq_wait {
  */
 #define I915_PARAM_MMAP_GTT_COHERENT	52
 
+/*
+ * Query whether DRM_I915_GEM_EXECBUFFER2 supports coordination of parallel
+ * execution through use of explicit fence support.
+ * See I915_EXEC_FENCE_OUT and I915_EXEC_FENCE_SUBMIT.
+ */
+#define I915_PARAM_HAS_EXEC_SUBMIT_FENCE 53
 /* Must be kept compact -- no holes and well documented */
 
 typedef struct drm_i915_getparam {
@@ -1113,7 +1119,16 @@ struct drm_i915_gem_execbuffer2 {
  */
 #define I915_EXEC_FENCE_ARRAY   (1<<19)
 
-#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_FENCE_ARRAY<<1))
+/*
+ * Setting I915_EXEC_FENCE_SUBMIT implies that lower_32_bits(rsvd2) represent
+ * a sync_file fd to wait upon (in a nonblocking manner) prior to executing
+ * the batch.
+ *
+ * Returns -EINVAL if the sync_file fd cannot be found.
+ */
+#define I915_EXEC_FENCE_SUBMIT		(1 << 20)
+
+#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_FENCE_SUBMIT << 1))
 
 #define I915_EXEC_CONTEXT_ID_MASK	(0xffffffff)
 #define i915_execbuffer2_set_context_id(eb2, context) \
-- 
2.20.1


* [PATCH 37/38] drm/i915/selftests: Check preemption support on each engine
  2019-03-01 14:03 [PATCH 01/38] drm/i915/execlists: Suppress redundant preemption Chris Wilson
                   ` (34 preceding siblings ...)
  2019-03-01 14:04 ` [PATCH 36/38] drm/i915: Allow specification of parallel execbuf Chris Wilson
@ 2019-03-01 14:04 ` Chris Wilson
  2019-03-06 11:29   ` Tvrtko Ursulin
  2019-03-01 14:04 ` [PATCH 38/38] drm/i915/execlists: Skip direct submission if only lite-restore Chris Wilson
                   ` (4 subsequent siblings)
  40 siblings, 1 reply; 88+ messages in thread
From: Chris Wilson @ 2019-03-01 14:04 UTC (permalink / raw)
  To: intel-gfx

Check that preemption has been set up on each engine before testing it,
and warn instead if it is not enabled on supposedly supported HW.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/selftests/intel_lrc.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
index b6329280a59f..a7de7a8fc24a 100644
--- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
+++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
@@ -90,6 +90,9 @@ static int live_preempt(void *arg)
 	if (!HAS_LOGICAL_RING_PREEMPTION(i915))
 		return 0;
 
+	if (!(i915->caps.scheduler & I915_SCHEDULER_CAP_PREEMPTION))
+		pr_err("Logical preemption supported, but not exposed\n");
+
 	mutex_lock(&i915->drm.struct_mutex);
 	wakeref = intel_runtime_pm_get(i915);
 
@@ -114,6 +117,9 @@ static int live_preempt(void *arg)
 	for_each_engine(engine, i915, id) {
 		struct i915_request *rq;
 
+		if (!intel_engine_has_preemption(engine))
+			continue;
+
 		rq = igt_spinner_create_request(&spin_lo, ctx_lo, engine,
 						MI_ARB_CHECK);
 		if (IS_ERR(rq)) {
@@ -205,6 +211,9 @@ static int live_late_preempt(void *arg)
 	for_each_engine(engine, i915, id) {
 		struct i915_request *rq;
 
+		if (!intel_engine_has_preemption(engine))
+			continue;
+
 		rq = igt_spinner_create_request(&spin_lo, ctx_lo, engine,
 						MI_ARB_CHECK);
 		if (IS_ERR(rq)) {
@@ -337,6 +346,9 @@ static int live_suppress_self_preempt(void *arg)
 		struct i915_request *rq_a, *rq_b;
 		int depth;
 
+		if (!intel_engine_has_preemption(engine))
+			continue;
+
 		engine->execlists.preempt_hang.count = 0;
 
 		rq_a = igt_spinner_create_request(&a.spin,
@@ -483,6 +495,9 @@ static int live_suppress_wait_preempt(void *arg)
 	for_each_engine(engine, i915, id) {
 		int depth;
 
+		if (!intel_engine_has_preemption(engine))
+			continue;
+
 		if (!engine->emit_init_breadcrumb)
 			continue;
 
@@ -604,6 +619,9 @@ static int live_chain_preempt(void *arg)
 		};
 		int count, i;
 
+		if (!intel_engine_has_preemption(engine))
+			continue;
+
 		for_each_prime_number_from(count, 1, 32) { /* must fit ring! */
 			struct i915_request *rq;
 
-- 
2.20.1


* [PATCH 38/38] drm/i915/execlists: Skip direct submission if only lite-restore
  2019-03-01 14:03 [PATCH 01/38] drm/i915/execlists: Suppress redundant preemption Chris Wilson
                   ` (35 preceding siblings ...)
  2019-03-01 14:04 ` [PATCH 37/38] drm/i915/selftests: Check preemption support on each engine Chris Wilson
@ 2019-03-01 14:04 ` Chris Wilson
  2019-03-01 14:15 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/38] drm/i915/execlists: Suppress redundant preemption Patchwork
                   ` (3 subsequent siblings)
  40 siblings, 0 replies; 88+ messages in thread
From: Chris Wilson @ 2019-03-01 14:04 UTC (permalink / raw)
  To: intel-gfx

If we are resubmitting the active context, simply skip the submission,
as performing the submission from the interrupt handler has higher
throughput than continually provoking lite-restores (resubmissions of
the already-running context with merely an updated ring tail). If,
however, we find ourselves with a new client, we check whether we can
dequeue into the second port or need to resolve preemption.
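
In pseudocode (a simplified restatement of the hunk below), the new
submission decision is:

	if (rq_prio(rq) <= execlists->queue_priority_hint)
		return; /* no more urgent than what is already queued */

	execlists->queue_priority_hint = rq_prio(rq);
	if (!inflight(execlists, rq)) /* not just a lite-restore? */
		__submit_queue_imm(engine);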

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c | 24 +++++++++++++++++++-----
 1 file changed, 19 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index a14f9d90bf7c..507eefd55676 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1390,12 +1390,26 @@ static void __submit_queue_imm(struct intel_engine_cs *engine)
 		tasklet_hi_schedule(&execlists->tasklet);
 }
 
-static void submit_queue(struct intel_engine_cs *engine, int prio)
+static bool inflight(const struct intel_engine_execlists *execlists,
+		     const struct i915_request *rq)
 {
-	if (prio > engine->execlists.queue_priority_hint) {
-		engine->execlists.queue_priority_hint = prio;
+	const struct i915_request *active = port_request(execlists->port);
+
+	return active && active->hw_context == rq->hw_context;
+}
+
+static void submit_queue(struct intel_engine_cs *engine,
+			 const struct i915_request *rq)
+{
+	struct intel_engine_execlists *execlists = &engine->execlists;
+
+	if (rq_prio(rq) <= execlists->queue_priority_hint)
+		return;
+
+	execlists->queue_priority_hint = rq_prio(rq);
+
+	if (!inflight(execlists, rq))
 		__submit_queue_imm(engine);
-	}
 }
 
 static void execlists_submit_request(struct i915_request *request)
@@ -1411,7 +1425,7 @@ static void execlists_submit_request(struct i915_request *request)
 	GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
 	GEM_BUG_ON(list_empty(&request->sched.link));
 
-	submit_queue(engine, rq_prio(request));
+	submit_queue(engine, request);
 
 	spin_unlock_irqrestore(&engine->timeline.lock, flags);
 }
-- 
2.20.1


* ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/38] drm/i915/execlists: Suppress redundant preemption
  2019-03-01 14:03 [PATCH 01/38] drm/i915/execlists: Suppress redundant preemption Chris Wilson
                   ` (36 preceding siblings ...)
  2019-03-01 14:04 ` [PATCH 38/38] drm/i915/execlists: Skip direct submission if only lite-restore Chris Wilson
@ 2019-03-01 14:15 ` Patchwork
  2019-03-01 14:32 ` ✗ Fi.CI.SPARSE: " Patchwork
                   ` (2 subsequent siblings)
  40 siblings, 0 replies; 88+ messages in thread
From: Patchwork @ 2019-03-01 14:15 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/38] drm/i915/execlists: Suppress redundant preemption
URL   : https://patchwork.freedesktop.org/series/57427/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
5e23d902d35d drm/i915/execlists: Suppress redundant preemption
2bf4e3894cc3 drm/i915: Introduce i915_timeline.mutex
bf3a70770514 drm/i915: Keep timeline HWSP allocated until idle across the system
183c1d46d005 drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+
f1ab995ada5f drm/i915: Prioritise non-busywait semaphore workloads
7da75240f892 drm/i915/selftests: Check that whitelisted registers are accessible
fb6dd69a3faf drm/i915: Force GPU idle on suspend
bd82e2da337c drm/i915/selftests: Improve switch-to-kernel-context checking
c3674ddd02ad drm/i915: Do a synchronous switch-to-kernel-context on idling
e210c30d893c drm/i915: Store the BIT(engine->id) as the engine's mask
-:262: CHECK:SPACING: No space is necessary after a cast
#262: FILE: drivers/gpu/drm/i915/intel_device_info.c:741:
+	BUILD_BUG_ON(BITS_PER_TYPE(intel_engine_mask_t) < I915_NUM_ENGINES);

total: 0 errors, 0 warnings, 1 checks, 520 lines checked
40c040d7f36a drm/i915: Refactor common code to load initial power context
3344a6270d75 drm/i915: Reduce presumption of request ordering for barriers
0fbe8b734da4 drm/i915: Remove has-kernel-context
14c910b68d83 drm/i915: Introduce the i915_user_extension_method
-:58: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#58: 
new file mode 100644

-:63: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#63: FILE: drivers/gpu/drm/i915/i915_user_extensions.c:1:
+/*

-:112: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#112: FILE: drivers/gpu/drm/i915/i915_user_extensions.h:1:
+/*

-:140: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'ptr' - possible side-effects?
#140: FILE: drivers/gpu/drm/i915/i915_utils.h:108:
+#define container_of_user(ptr, type, member) ({				\
+	void __user *__mptr = (void __user *)(ptr);			\
+	BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) &&	\
+			 !__same_type(*(ptr), void),			\
+			 "pointer type mismatch in container_of()");	\
+	((type __user *)(__mptr - offsetof(type, member))); })

-:140: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'member' - possible side-effects?
#140: FILE: drivers/gpu/drm/i915/i915_utils.h:108:
+#define container_of_user(ptr, type, member) ({				\
+	void __user *__mptr = (void __user *)(ptr);			\
+	BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) &&	\
+			 !__same_type(*(ptr), void),			\
+			 "pointer type mismatch in container_of()");	\
+	((type __user *)(__mptr - offsetof(type, member))); })

-:140: CHECK:MACRO_ARG_PRECEDENCE: Macro argument 'member' may be better as '(member)' to avoid precedence issues
#140: FILE: drivers/gpu/drm/i915/i915_utils.h:108:
+#define container_of_user(ptr, type, member) ({				\
+	void __user *__mptr = (void __user *)(ptr);			\
+	BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) &&	\
+			 !__same_type(*(ptr), void),			\
+			 "pointer type mismatch in container_of()");	\
+	((type __user *)(__mptr - offsetof(type, member))); })

total: 0 errors, 3 warnings, 3 checks, 109 lines checked
0b25c64b182f drm/i915: Track active engines within a context
-:110: CHECK:UNCOMMENTED_DEFINITION: struct mutex definition without comment
#110: FILE: drivers/gpu/drm/i915/i915_gem_context.h:167:
+	struct mutex mutex;

total: 0 errors, 0 warnings, 1 checks, 198 lines checked
0baaabfdc6c8 drm/i915: Introduce a context barrier callback
f8ec2c8c0830 drm/i915: Create/destroy VM (ppGTT) for use with contexts
-:37: CHECK:UNCOMMENTED_DEFINITION: struct mutex definition without comment
#37: FILE: drivers/gpu/drm/i915/i915_drv.h:221:
+	struct mutex vm_lock;

-:551: WARNING:LINE_SPACING: Missing a blank line after declarations
#551: FILE: drivers/gpu/drm/i915/selftests/i915_gem_context.c:503:
+		struct drm_file *file;
+		IGT_TIMEOUT(end_time);

-:627: CHECK:MULTIPLE_ASSIGNMENTS: multiple assignments should be avoided
#627: FILE: drivers/gpu/drm/i915/selftests/i915_gem_context.c:565:
+		ncontexts = dw = 0;

-:678: WARNING:LINE_SPACING: Missing a blank line after declarations
#678: FILE: drivers/gpu/drm/i915/selftests/i915_gem_context.c:610:
+		struct drm_file *file;
+		IGT_TIMEOUT(end_time);

-:758: CHECK:MULTIPLE_ASSIGNMENTS: multiple assignments should be avoided
#758: FILE: drivers/gpu/drm/i915/selftests/i915_gem_context.c:686:
+		ncontexts = dw = 0;

-:876: WARNING:LONG_LINE: line over 100 characters
#876: FILE: include/uapi/drm/i915_drm.h:405:
+#define DRM_IOCTL_I915_GEM_VM_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_CREATE, struct drm_i915_gem_vm_control)

-:877: WARNING:LONG_LINE: line over 100 characters
#877: FILE: include/uapi/drm/i915_drm.h:406:
+#define DRM_IOCTL_I915_GEM_VM_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_VM_DESTROY, struct drm_i915_gem_vm_control)

-:877: WARNING:SPACING: space prohibited between function name and open parenthesis '('
#877: FILE: include/uapi/drm/i915_drm.h:406:
+#define DRM_IOCTL_I915_GEM_VM_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_VM_DESTROY, struct drm_i915_gem_vm_control)

-:877: ERROR:COMPLEX_MACRO: Macros with complex values should be enclosed in parentheses
#877: FILE: include/uapi/drm/i915_drm.h:406:
+#define DRM_IOCTL_I915_GEM_VM_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_VM_DESTROY, struct drm_i915_gem_vm_control)

total: 1 errors, 5 warnings, 3 checks, 834 lines checked
efcf633f1dbb drm/i915: Extend CONTEXT_CREATE to set parameters upon construction
-:27: WARNING:LONG_LINE: line over 100 characters
#27: FILE: drivers/gpu/drm/i915/i915_drv.c:3000:
+	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_CREATE_EXT, i915_gem_context_create_ioctl, DRM_RENDER_ALLOW),

-:542: WARNING:LONG_LINE: line over 100 characters
#542: FILE: include/uapi/drm/i915_drm.h:395:
+#define DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE, struct drm_i915_gem_context_create_ext)

-:542: WARNING:SPACING: space prohibited between function name and open parenthesis '('
#542: FILE: include/uapi/drm/i915_drm.h:395:
+#define DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE, struct drm_i915_gem_context_create_ext)

-:542: ERROR:COMPLEX_MACRO: Macros with complex values should be enclosed in parentheses
#542: FILE: include/uapi/drm/i915_drm.h:395:
+#define DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE, struct drm_i915_gem_context_create_ext)

total: 1 errors, 3 warnings, 0 checks, 688 lines checked
2e12e535df62 drm/i915: Allow contexts to share a single timeline across all engines
4b3dbd308a4d drm/i915: Allow userspace to clone contexts on creation
5aae2005f1de drm/i915: Fix I915_EXEC_RING_MASK
e7c03aa42717 drm/i915: Remove last traces of exec-id (GEM_BUSY)
fa712cd9fcd4 drm/i915: Re-arrange execbuf so context is known before engine
481ec94ce3cc drm/i915: Allow a context to define its set of engines
40e466b00076 drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[]
ae2d6298740e drm/i915: Pass around the intel_context
53195d9716e1 drm/i915: Split struct intel_context definition to its own header
-:297: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#297: 
new file mode 100644

-:302: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#302: FILE: drivers/gpu/drm/i915/i915_gem_context_types.h:1:
+/*

-:437: CHECK:UNCOMMENTED_DEFINITION: struct mutex definition without comment
#437: FILE: drivers/gpu/drm/i915/i915_gem_context_types.h:136:
+	struct mutex mutex;

-:472: CHECK:LINE_SPACING: Please don't use multiple blank lines
#472: FILE: drivers/gpu/drm/i915/i915_gem_context_types.h:171:
+
+

-:579: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#579: FILE: drivers/gpu/drm/i915/i915_timeline_types.h:1:
+/*

-:603: CHECK:UNCOMMENTED_DEFINITION: spinlock_t definition without comment
#603: FILE: drivers/gpu/drm/i915/i915_timeline_types.h:25:
+	spinlock_t lock;

-:665: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#665: FILE: drivers/gpu/drm/i915/intel_context.h:1:
+/*

-:718: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#718: FILE: drivers/gpu/drm/i915/intel_context_types.h:1:
+/*

-:784: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#784: FILE: drivers/gpu/drm/i915/intel_engine_types.h:1:
+/*

-:1072: CHECK:UNCOMMENTED_DEFINITION: spinlock_t definition without comment
#1072: FILE: drivers/gpu/drm/i915/intel_engine_types.h:289:
+		spinlock_t irq_lock;

-:1288: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'dev_priv__' - possible side-effects?
#1288: FILE: drivers/gpu/drm/i915/intel_engine_types.h:505:
+#define instdone_slice_mask(dev_priv__) \
+	(IS_GEN(dev_priv__, 7) ? \
+	 1 : RUNTIME_INFO(dev_priv__)->sseu.slice_mask)

-:1292: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'dev_priv__' - possible side-effects?
#1292: FILE: drivers/gpu/drm/i915/intel_engine_types.h:509:
+#define instdone_subslice_mask(dev_priv__) \
+	(IS_GEN(dev_priv__, 7) ? \
+	 1 : RUNTIME_INFO(dev_priv__)->sseu.subslice_mask[0])

-:1296: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'dev_priv__' - possible side-effects?
#1296: FILE: drivers/gpu/drm/i915/intel_engine_types.h:513:
+#define for_each_instdone_slice_subslice(dev_priv__, slice__, subslice__) \
+	for ((slice__) = 0, (subslice__) = 0; \
+	     (slice__) < I915_MAX_SLICES; \
+	     (subslice__) = ((subslice__) + 1) < I915_MAX_SUBSLICES ? (subslice__) + 1 : 0, \
+	       (slice__) += ((subslice__) == 0)) \
+		for_each_if((BIT(slice__) & instdone_slice_mask(dev_priv__)) && \
+			    (BIT(subslice__) & instdone_subslice_mask(dev_priv__)))

-:1296: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'slice__' - possible side-effects?
#1296: FILE: drivers/gpu/drm/i915/intel_engine_types.h:513:
+#define for_each_instdone_slice_subslice(dev_priv__, slice__, subslice__) \
+	for ((slice__) = 0, (subslice__) = 0; \
+	     (slice__) < I915_MAX_SLICES; \
+	     (subslice__) = ((subslice__) + 1) < I915_MAX_SUBSLICES ? (subslice__) + 1 : 0, \
+	       (slice__) += ((subslice__) == 0)) \
+		for_each_if((BIT(slice__) & instdone_slice_mask(dev_priv__)) && \
+			    (BIT(subslice__) & instdone_subslice_mask(dev_priv__)))

-:1296: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'subslice__' - possible side-effects?
#1296: FILE: drivers/gpu/drm/i915/intel_engine_types.h:513:
+#define for_each_instdone_slice_subslice(dev_priv__, slice__, subslice__) \
+	for ((slice__) = 0, (subslice__) = 0; \
+	     (slice__) < I915_MAX_SLICES; \
+	     (subslice__) = ((subslice__) + 1) < I915_MAX_SUBSLICES ? (subslice__) + 1 : 0, \
+	       (slice__) += ((subslice__) == 0)) \
+		for_each_if((BIT(slice__) & instdone_slice_mask(dev_priv__)) && \
+			    (BIT(subslice__) & instdone_subslice_mask(dev_priv__)))

-:1878: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#1878: FILE: drivers/gpu/drm/i915/intel_workarounds_types.h:1:
+/*

total: 0 errors, 7 warnings, 9 checks, 1821 lines checked
c34e28878e18 drm/i915: Store the intel_context_ops in the intel_engine_cs
b2525b1af2fa drm/i915: Move over to intel_context_lookup()
-:286: CHECK:UNCOMMENTED_DEFINITION: spinlock_t definition without comment
#286: FILE: drivers/gpu/drm/i915/i915_gem_context_types.h:151:
+	spinlock_t hw_contexts_lock;

-:324: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#324: 
new file mode 100644

-:329: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#329: FILE: drivers/gpu/drm/i915/intel_context.c:1:
+/*

total: 0 errors, 2 warnings, 1 checks, 647 lines checked
631bd21a494b drm/i915: Make context pinning part of intel_context_ops
b8e8c0a9719d drm/i915: Track the pinned kernel contexts on each engine
6ccc7beebb9c drm/i915: Introduce intel_context.pin_mutex for pin management
-:243: WARNING:MEMORY_BARRIER: memory barrier without comment
#243: FILE: drivers/gpu/drm/i915/intel_context.c:157:
+		smp_mb__before_atomic();

total: 0 errors, 1 warnings, 0 checks, 387 lines checked
9a20e5c80942 drm/i915: Load balancing across a virtual engine
-:900: WARNING:LINE_SPACING: Missing a blank line after declarations
#900: FILE: drivers/gpu/drm/i915/intel_lrc.c:3342:
+		struct intel_engine_cs *actual = ve->siblings[0];
+		virtual_engine_free(&ve->kref);

total: 0 errors, 1 warnings, 0 checks, 1108 lines checked
eb7f9c7d8fed drm/i915: Extend execution fence to support a callback
43190fadc3e7 drm/i915/execlists: Virtual engine bonding
7988f06b0790 drm/i915: Allow specification of parallel execbuf
3b2bef0b3c17 drm/i915/selftests: Check preemption support on each engine
800b0544e514 drm/i915/execlists: Skip direct submission if only lite-restore


* ✗ Fi.CI.SPARSE: warning for series starting with [01/38] drm/i915/execlists: Suppress redundant preemption
  2019-03-01 14:03 [PATCH 01/38] drm/i915/execlists: Suppress redundant preemption Chris Wilson
                   ` (37 preceding siblings ...)
  2019-03-01 14:15 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/38] drm/i915/execlists: Suppress redundant preemption Patchwork
@ 2019-03-01 14:32 ` Patchwork
  2019-03-01 14:36 ` ✓ Fi.CI.BAT: success " Patchwork
  2019-03-01 18:03 ` ✗ Fi.CI.IGT: failure " Patchwork
  40 siblings, 0 replies; 88+ messages in thread
From: Patchwork @ 2019-03-01 14:32 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/38] drm/i915/execlists: Suppress redundant preemption
URL   : https://patchwork.freedesktop.org/series/57427/
State : warning

== Summary ==

$ dim sparse origin/drm-tip
Sparse version: v0.5.2
Commit: drm/i915/execlists: Suppress redundant preemption
Okay!

Commit: drm/i915: Introduce i915_timeline.mutex
Okay!

Commit: drm/i915: Keep timeline HWSP allocated until idle across the system
Okay!

Commit: drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+
-O:drivers/gpu/drm/i915/i915_drv.c:351:25: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/i915_drv.c:351:25: warning: expression using sizeof(void)

Commit: drm/i915: Prioritise non-busywait semaphore workloads
Okay!

Commit: drm/i915/selftests: Check that whitelisted registers are accessible
Okay!

Commit: drm/i915: Force GPU idle on suspend
Okay!

Commit: drm/i915/selftests: Improve switch-to-kernel-context checking
Okay!

Commit: drm/i915: Do a synchronous switch-to-kernel-context on idling
Okay!

Commit: drm/i915: Store the BIT(engine->id) as the engine's mask
Okay!

Commit: drm/i915: Refactor common code to load initial power context
Okay!

Commit: drm/i915: Reduce presumption of request ordering for barriers
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3566:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3567:16: warning: expression using sizeof(void)

Commit: drm/i915: Remove has-kernel-context
Okay!

Commit: drm/i915: Introduce the i915_user_extension_method
Okay!

Commit: drm/i915: Track active engines within a context
Okay!

Commit: drm/i915: Introduce a context barrier callback
Okay!

Commit: drm/i915: Create/destroy VM (ppGTT) for use with contexts
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3567:16: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/selftests/i915_gem_context.c:1134:25: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/selftests/i915_gem_context.c:1134:25: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3570:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_gem_context.c:1264:25: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_gem_context.c:1264:25: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/selftests/i915_gem_context.c:564:25: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/selftests/i915_gem_context.c:564:25: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_gem_context.c:568:33: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_gem_context.c:568:33: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_gem_context.c:689:33: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_gem_context.c:689:33: warning: expression using sizeof(void)

Commit: drm/i915: Extend CONTEXT_CREATE to set parameters upon construction
Okay!

Commit: drm/i915: Allow contexts to share a single timeline across all engines
Okay!

Commit: drm/i915: Allow userspace to clone contexts on creation
Okay!

Commit: drm/i915: Fix I915_EXEC_RING_MASK
Okay!

Commit: drm/i915: Remove last traces of exec-id (GEM_BUSY)
Okay!

Commit: drm/i915: Re-arrange execbuf so context is known before engine
Okay!

Commit: drm/i915: Allow a context to define its set of engines
+./include/linux/slab.h:664:13: error: not a function <noident>

Commit: drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[]
Okay!

Commit: drm/i915: Pass around the intel_context
Okay!

Commit: drm/i915: Split struct intel_context definition to its own header
Okay!

Commit: drm/i915: Store the intel_context_ops in the intel_engine_cs
Okay!

Commit: drm/i915: Move over to intel_context_lookup()
+./include/uapi/linux/perf_event.h:147:56: warning: cast truncates bits from constant value (8000000000000000 becomes 0)

Commit: drm/i915: Make context pinning part of intel_context_ops
Okay!

Commit: drm/i915: Track the pinned kernel contexts on each engine
Okay!

Commit: drm/i915: Introduce intel_context.pin_mutex for pin management
+drivers/gpu/drm/i915/intel_context.c:113:22: warning: context imbalance in 'intel_context_pin_lock' - wrong count at exit

Commit: drm/i915: Load balancing across a virtual engine
+./include/linux/overflow.h:285:13: error: incorrect type in conditional
+./include/linux/overflow.h:285:13: error: undefined identifier '__builtin_mul_overflow'
+./include/linux/overflow.h:285:13:    got void
+./include/linux/overflow.h:285:13: warning: call with no type!
+./include/linux/overflow.h:287:13: error: incorrect type in conditional
+./include/linux/overflow.h:287:13: error: undefined identifier '__builtin_add_overflow'
+./include/linux/overflow.h:287:13:    got void
+./include/linux/overflow.h:287:13: warning: call with no type!

Commit: drm/i915: Extend execution fence to support a callback
Okay!

Commit: drm/i915/execlists: Virtual engine bonding
Okay!

Commit: drm/i915: Allow specification of parallel execbuf
Okay!

Commit: drm/i915/selftests: Check preemption support on each engine
Okay!

Commit: drm/i915/execlists: Skip direct submission if only lite-restore
Okay!


* ✓ Fi.CI.BAT: success for series starting with [01/38] drm/i915/execlists: Suppress redundant preemption
  2019-03-01 14:03 [PATCH 01/38] drm/i915/execlists: Suppress redundant preemption Chris Wilson
                   ` (38 preceding siblings ...)
  2019-03-01 14:32 ` ✗ Fi.CI.SPARSE: " Patchwork
@ 2019-03-01 14:36 ` Patchwork
  2019-03-01 18:03 ` ✗ Fi.CI.IGT: failure " Patchwork
  40 siblings, 0 replies; 88+ messages in thread
From: Patchwork @ 2019-03-01 14:36 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/38] drm/i915/execlists: Suppress redundant preemption
URL   : https://patchwork.freedesktop.org/series/57427/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_5676 -> Patchwork_12343
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/57427/revisions/1/mbox/

Known issues
------------

  Here are the changes found in Patchwork_12343 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@amdgpu/amd_cs_nop@fork-gfx0:
    - fi-icl-u2:          NOTRUN -> SKIP [fdo#109315] +17

  * igt@gem_exec_basic@readonly-bsd1:
    - fi-icl-u2:          NOTRUN -> SKIP [fdo#109276] +7

  * igt@gem_exec_basic@readonly-bsd2:
    - fi-pnv-d510:        NOTRUN -> SKIP [fdo#109271] +76

  * igt@gem_exec_parse@basic-allowed:
    - fi-icl-u2:          NOTRUN -> SKIP [fdo#109289] +1

  * igt@i915_selftest@live_contexts:
    - fi-icl-u2:          NOTRUN -> DMESG-FAIL [fdo#108569]

  * igt@kms_busy@basic-flip-c:
    - fi-blb-e6850:       NOTRUN -> SKIP [fdo#109271] / [fdo#109278]
    - fi-pnv-d510:        NOTRUN -> SKIP [fdo#109271] / [fdo#109278]

  * igt@kms_chamelium@dp-edid-read:
    - fi-icl-u2:          NOTRUN -> SKIP [fdo#109316] +2

  * igt@kms_chamelium@vga-hpd-fast:
    - fi-icl-u2:          NOTRUN -> SKIP [fdo#109309] +1

  * igt@kms_force_connector_basic@prune-stale-modes:
    - fi-icl-u2:          NOTRUN -> SKIP [fdo#109285] +3

  * igt@kms_frontbuffer_tracking@basic:
    - fi-icl-u2:          NOTRUN -> FAIL [fdo#103167]

  * igt@kms_pipe_crc_basic@hang-read-crc-pipe-c:
    - fi-blb-e6850:       NOTRUN -> SKIP [fdo#109271] +48

  
#### Possible fixes ####

  * igt@gem_exec_suspend@basic-s3:
    - fi-blb-e6850:       INCOMPLETE [fdo#107718] -> PASS

  * igt@i915_pm_rpm@basic-pci-d3-state:
    - fi-bsw-kefka:       SKIP [fdo#109271] -> PASS

  * igt@i915_pm_rpm@basic-rte:
    - fi-bsw-kefka:       FAIL [fdo#108800] -> PASS

  
  [fdo#103167]: https://bugs.freedesktop.org/show_bug.cgi?id=103167
  [fdo#107718]: https://bugs.freedesktop.org/show_bug.cgi?id=107718
  [fdo#108569]: https://bugs.freedesktop.org/show_bug.cgi?id=108569
  [fdo#108800]: https://bugs.freedesktop.org/show_bug.cgi?id=108800
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109276]: https://bugs.freedesktop.org/show_bug.cgi?id=109276
  [fdo#109278]: https://bugs.freedesktop.org/show_bug.cgi?id=109278
  [fdo#109285]: https://bugs.freedesktop.org/show_bug.cgi?id=109285
  [fdo#109289]: https://bugs.freedesktop.org/show_bug.cgi?id=109289
  [fdo#109309]: https://bugs.freedesktop.org/show_bug.cgi?id=109309
  [fdo#109315]: https://bugs.freedesktop.org/show_bug.cgi?id=109315
  [fdo#109316]: https://bugs.freedesktop.org/show_bug.cgi?id=109316


Participating hosts (44 -> 38)
------------------------------

  Additional (2): fi-icl-u2 fi-pnv-d510 
  Missing    (8): fi-ilk-m540 fi-hsw-4200u fi-byt-j1900 fi-byt-squawks fi-bsw-cyan fi-gdg-551 fi-icl-y fi-byt-clapper 


Build changes
-------------

    * Linux: CI_DRM_5676 -> Patchwork_12343

  CI_DRM_5676: 3911a5d7d3de6d8e491868bb0cd506346131d71b @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4866: 189956af183c245eb237b3be4fa22953ec93bbe0 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_12343: 800b0544e5143742e54cbcab646cdbd525895733 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

800b0544e514 drm/i915/execlists: Skip direct submission if only lite-restore
3b2bef0b3c17 drm/i915/selftests: Check preemption support on each engine
7988f06b0790 drm/i915: Allow specification of parallel execbuf
43190fadc3e7 drm/i915/execlists: Virtual engine bonding
eb7f9c7d8fed drm/i915: Extend execution fence to support a callback
9a20e5c80942 drm/i915: Load balancing across a virtual engine
6ccc7beebb9c drm/i915: Introduce intel_context.pin_mutex for pin management
b8e8c0a9719d drm/i915: Track the pinned kernel contexts on each engine
631bd21a494b drm/i915: Make context pinning part of intel_context_ops
b2525b1af2fa drm/i915: Move over to intel_context_lookup()
c34e28878e18 drm/i915: Store the intel_context_ops in the intel_engine_cs
53195d9716e1 drm/i915: Split struct intel_context definition to its own header
ae2d6298740e drm/i915: Pass around the intel_context
40e466b00076 drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[]
481ec94ce3cc drm/i915: Allow a context to define its set of engines
fa712cd9fcd4 drm/i915: Re-arrange execbuf so context is known before engine
e7c03aa42717 drm/i915: Remove last traces of exec-id (GEM_BUSY)
5aae2005f1de drm/i915: Fix I915_EXEC_RING_MASK
4b3dbd308a4d drm/i915: Allow userspace to clone contexts on creation
2e12e535df62 drm/i915: Allow contexts to share a single timeline across all engines
efcf633f1dbb drm/i915: Extend CONTEXT_CREATE to set parameters upon construction
f8ec2c8c0830 drm/i915: Create/destroy VM (ppGTT) for use with contexts
0baaabfdc6c8 drm/i915: Introduce a context barrier callback
0b25c64b182f drm/i915: Track active engines within a context
14c910b68d83 drm/i915: Introduce the i915_user_extension_method
0fbe8b734da4 drm/i915: Remove has-kernel-context
3344a6270d75 drm/i915: Reduce presumption of request ordering for barriers
40c040d7f36a drm/i915: Refactor common code to load initial power context
e210c30d893c drm/i915: Store the BIT(engine->id) as the engine's mask
c3674ddd02ad drm/i915: Do a synchronous switch-to-kernel-context on idling
bd82e2da337c drm/i915/selftests: Improve switch-to-kernel-context checking
fb6dd69a3faf drm/i915: Force GPU idle on suspend
7da75240f892 drm/i915/selftests: Check that whitelisted registers are accessible
f1ab995ada5f drm/i915: Prioritise non-busywait semaphore workloads
183c1d46d005 drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+
bf3a70770514 drm/i915: Keep timeline HWSP allocated until idle across the system
2bf4e3894cc3 drm/i915: Introduce i915_timeline.mutex
5e23d902d35d drm/i915/execlists: Suppress redundant preemption

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12343/

* Re: [PATCH 02/38] drm/i915: Introduce i915_timeline.mutex
  2019-03-01 14:03 ` [PATCH 02/38] drm/i915: Introduce i915_timeline.mutex Chris Wilson
@ 2019-03-01 15:09   ` Tvrtko Ursulin
  0 siblings, 0 replies; 88+ messages in thread
From: Tvrtko Ursulin @ 2019-03-01 15:09 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 01/03/2019 14:03, Chris Wilson wrote:
> A simple mutex used for guarding the flow of requests in and out of the
> timeline. In the short-term, it will be used only to guard the addition
> of requests into the timeline, taken on alloc and released on commit so
> that only one caller can construct a request into the timeline
> (important as the seqno and ring pointers must be serialised). This will
> be used by observers to ensure that the seqno/hwsp is stable. Later,
> when we have reduced retiring to only operate on a single timeline at a
> time, we can then use the mutex as the sole guard required for retiring.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_request.c            | 6 +++++-
>   drivers/gpu/drm/i915/i915_timeline.c           | 1 +
>   drivers/gpu/drm/i915/i915_timeline.h           | 2 ++
>   drivers/gpu/drm/i915/selftests/i915_request.c  | 4 +---
>   drivers/gpu/drm/i915/selftests/mock_timeline.c | 1 +
>   5 files changed, 10 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index c65f6c990fdd..719d1a5ab082 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -563,6 +563,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>   		return ERR_CAST(ce);
>   
>   	reserve_gt(i915);
> +	mutex_lock(&ce->ring->timeline->mutex);
>   
>   	/* Move our oldest request to the slab-cache (if not in use!) */
>   	rq = list_first_entry(&ce->ring->request_list, typeof(*rq), ring_link);
> @@ -688,6 +689,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>   
>   	kmem_cache_free(global.slab_requests, rq);
>   err_unreserve:
> +	mutex_unlock(&ce->ring->timeline->mutex);
>   	unreserve_gt(i915);
>   	intel_context_unpin(ce);
>   	return ERR_PTR(ret);
> @@ -880,7 +882,7 @@ void i915_request_add(struct i915_request *request)
>   	GEM_TRACE("%s fence %llx:%lld\n",
>   		  engine->name, request->fence.context, request->fence.seqno);
>   
> -	lockdep_assert_held(&request->i915->drm.struct_mutex);
> +	lockdep_assert_held(&request->timeline->mutex);
>   	trace_i915_request_add(request);
>   
>   	/*
> @@ -991,6 +993,8 @@ void i915_request_add(struct i915_request *request)
>   	 */
>   	if (prev && i915_request_completed(prev))
>   		i915_request_retire_upto(prev);
> +
> +	mutex_unlock(&request->timeline->mutex);
>   }
>   
>   static unsigned long local_clock_us(unsigned int *cpu)
> diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
> index b2202d2e58a2..87a80558da28 100644
> --- a/drivers/gpu/drm/i915/i915_timeline.c
> +++ b/drivers/gpu/drm/i915/i915_timeline.c
> @@ -162,6 +162,7 @@ int i915_timeline_init(struct drm_i915_private *i915,
>   	timeline->fence_context = dma_fence_context_alloc(1);
>   
>   	spin_lock_init(&timeline->lock);
> +	mutex_init(&timeline->mutex);
>   
>   	INIT_ACTIVE_REQUEST(&timeline->barrier);
>   	INIT_ACTIVE_REQUEST(&timeline->last_request);
> diff --git a/drivers/gpu/drm/i915/i915_timeline.h b/drivers/gpu/drm/i915/i915_timeline.h
> index 7bec7d2e45bf..36c3849f7108 100644
> --- a/drivers/gpu/drm/i915/i915_timeline.h
> +++ b/drivers/gpu/drm/i915/i915_timeline.h
> @@ -44,6 +44,8 @@ struct i915_timeline {
>   #define TIMELINE_CLIENT 0 /* default subclass */
>   #define TIMELINE_ENGINE 1
>   
> +	struct mutex mutex; /* protects the flow of requests */
> +
>   	unsigned int pin_count;
>   	const u32 *hwsp_seqno;
>   	struct i915_vma *hwsp_ggtt;
> diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
> index 7da52e3d67af..7e1b65b8eb19 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_request.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_request.c
> @@ -141,14 +141,12 @@ static int igt_fence_wait(void *arg)
>   		err = -ENOMEM;
>   		goto out_locked;
>   	}
> -	mutex_unlock(&i915->drm.struct_mutex); /* safe as we are single user */
>   
>   	if (dma_fence_wait_timeout(&request->fence, false, T) != -ETIME) {
>   		pr_err("fence wait success before submit (expected timeout)!\n");
> -		goto out_device;
> +		goto out_locked;
>   	}
>   
> -	mutex_lock(&i915->drm.struct_mutex);
>   	i915_request_add(request);
>   	mutex_unlock(&i915->drm.struct_mutex);
>   
> diff --git a/drivers/gpu/drm/i915/selftests/mock_timeline.c b/drivers/gpu/drm/i915/selftests/mock_timeline.c
> index d2de9ece2118..416d85233263 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_timeline.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_timeline.c
> @@ -14,6 +14,7 @@ void mock_timeline_init(struct i915_timeline *timeline, u64 context)
>   	timeline->fence_context = context;
>   
>   	spin_lock_init(&timeline->lock);
> +	mutex_init(&timeline->mutex);
>   
>   	INIT_ACTIVE_REQUEST(&timeline->barrier);
>   	INIT_ACTIVE_REQUEST(&timeline->last_request);
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko


* Re: [PATCH 04/38] drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+
  2019-03-01 14:03 ` [PATCH 04/38] drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+ Chris Wilson
@ 2019-03-01 15:11   ` Tvrtko Ursulin
  0 siblings, 0 replies; 88+ messages in thread
From: Tvrtko Ursulin @ 2019-03-01 15:11 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 01/03/2019 14:03, Chris Wilson wrote:
> Having introduced per-context seqno, we now have a means to identify
> progress across the system without fear of rollback, as befell the
> global_seqno. That is, we can program a MI_SEMAPHORE_WAIT operation in
> advance of submission, safe in the knowledge that our target seqno and
> address are stable.
> 
> However, since we are telling the GPU to busy-spin on the target address
> until it matches the signaling seqno, we only want to do so when we are
> sure that busy-spin will be completed quickly. To achieve this we only
> submit the request to HW once the signaler is itself executing (modulo
> preemption causing us to wait longer), and we only do so for default and
> above priority requests (so that idle priority tasks never themselves
> hog the GPU waiting for others).
> 
> As might be reasonably expected, HW semaphores excel in inter-engine
> synchronisation microbenchmarks (where the 3x reduced latency / increased
> throughput more than offset the power cost of spinning on a second ring)
> and show a significant improvement (up to ~10%; most see no change)
> for single clients that utilize multiple engines (typically media players
> and transcoders), without regressing multiple clients that can saturate
> the system or changing the power envelope dramatically.
> 
> v3: Drop the older NEQ branch, now we pin the signaler's HWSP anyway.
> v4: Tell the world and include it as part of scheduler caps.
> 
> Testcase: igt/gem_exec_whisper
> Testcase: igt/benchmarks/gem_wsim
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_drv.c           |   2 +-
>   drivers/gpu/drm/i915/i915_request.c       | 138 +++++++++++++++++++++-
>   drivers/gpu/drm/i915/i915_request.h       |   1 +
>   drivers/gpu/drm/i915/i915_sw_fence.c      |   4 +-
>   drivers/gpu/drm/i915/i915_sw_fence.h      |   3 +
>   drivers/gpu/drm/i915/intel_engine_cs.c    |   1 +
>   drivers/gpu/drm/i915/intel_gpu_commands.h |   9 +-
>   drivers/gpu/drm/i915/intel_lrc.c          |   1 +
>   drivers/gpu/drm/i915/intel_ringbuffer.h   |   7 ++
>   include/uapi/drm/i915_drm.h               |   1 +
>   10 files changed, 160 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index c6354f6cdbdb..c08abdef5eb6 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -351,7 +351,7 @@ static int i915_getparam_ioctl(struct drm_device *dev, void *data,
>   		value = min_t(int, INTEL_PPGTT(dev_priv), I915_GEM_PPGTT_FULL);
>   		break;
>   	case I915_PARAM_HAS_SEMAPHORES:
> -		value = 0;
> +		value = !!(dev_priv->caps.scheduler & I915_SCHEDULER_CAP_SEMAPHORES);
>   		break;
>   	case I915_PARAM_HAS_SECURE_BATCHES:
>   		value = capable(CAP_SYS_ADMIN);
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index d354967d6ae8..59e30b8c4ee9 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -22,8 +22,9 @@
>    *
>    */
>   
> -#include <linux/prefetch.h>
>   #include <linux/dma-fence-array.h>
> +#include <linux/irq_work.h>
> +#include <linux/prefetch.h>
>   #include <linux/sched.h>
>   #include <linux/sched/clock.h>
>   #include <linux/sched/signal.h>
> @@ -32,9 +33,16 @@
>   #include "i915_active.h"
>   #include "i915_reset.h"
>   
> +struct execute_cb {
> +	struct list_head link;
> +	struct irq_work work;
> +	struct i915_sw_fence *fence;
> +};
> +
>   static struct i915_global_request {
>   	struct kmem_cache *slab_requests;
>   	struct kmem_cache *slab_dependencies;
> +	struct kmem_cache *slab_execute_cbs;
>   } global;
>   
>   static const char *i915_fence_get_driver_name(struct dma_fence *fence)
> @@ -325,6 +333,69 @@ void i915_request_retire_upto(struct i915_request *rq)
>   	} while (tmp != rq);
>   }
>   
> +static void irq_execute_cb(struct irq_work *wrk)
> +{
> +	struct execute_cb *cb = container_of(wrk, typeof(*cb), work);
> +
> +	i915_sw_fence_complete(cb->fence);
> +	kmem_cache_free(global.slab_execute_cbs, cb);
> +}
> +
> +static void __notify_execute_cb(struct i915_request *rq)
> +{
> +	struct execute_cb *cb;
> +
> +	lockdep_assert_held(&rq->lock);
> +
> +	if (list_empty(&rq->execute_cb))
> +		return;
> +
> +	list_for_each_entry(cb, &rq->execute_cb, link)
> +		irq_work_queue(&cb->work);
> +
> +	/*
> +	 * XXX Rollback on __i915_request_unsubmit()
> +	 *
> +	 * In the future, perhaps when we have an active time-slicing scheduler,
> +	 * it will be interesting to unsubmit parallel execution and remove
> +	 * busywaits from the GPU until their master is restarted. This is
> +	 * quite hairy, we have to carefully rollback the fence and do a
> +	 * preempt-to-idle cycle on the target engine, all the while the
> +	 * master execute_cb may refire.
> +	 */
> +	INIT_LIST_HEAD(&rq->execute_cb);
> +}
> +
> +static int
> +i915_request_await_execution(struct i915_request *rq,
> +			     struct i915_request *signal,
> +			     gfp_t gfp)
> +{
> +	struct execute_cb *cb;
> +
> +	if (i915_request_is_active(signal))
> +		return 0;
> +
> +	cb = kmem_cache_alloc(global.slab_execute_cbs, gfp);
> +	if (!cb)
> +		return -ENOMEM;
> +
> +	cb->fence = &rq->submit;
> +	i915_sw_fence_await(cb->fence);
> +	init_irq_work(&cb->work, irq_execute_cb);
> +
> +	spin_lock_irq(&signal->lock);
> +	if (i915_request_is_active(signal)) {
> +		i915_sw_fence_complete(cb->fence);
> +		kmem_cache_free(global.slab_execute_cbs, cb);
> +	} else {
> +		list_add_tail(&cb->link, &signal->execute_cb);
> +	}
> +	spin_unlock_irq(&signal->lock);
> +
> +	return 0;
> +}
> +
>   static void move_to_timeline(struct i915_request *request,
>   			     struct i915_timeline *timeline)
>   {
> @@ -361,6 +432,8 @@ void __i915_request_submit(struct i915_request *request)
>   	    !i915_request_enable_breadcrumb(request))
>   		intel_engine_queue_breadcrumbs(engine);
>   
> +	__notify_execute_cb(request);
> +
>   	spin_unlock(&request->lock);
>   
>   	engine->emit_fini_breadcrumb(request,
> @@ -608,6 +681,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>   	}
>   
>   	INIT_LIST_HEAD(&rq->active_list);
> +	INIT_LIST_HEAD(&rq->execute_cb);
>   
>   	tl = ce->ring->timeline;
>   	ret = i915_timeline_get_seqno(tl, rq, &seqno);
> @@ -696,6 +770,52 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>   	return ERR_PTR(ret);
>   }
>   
> +static int
> +emit_semaphore_wait(struct i915_request *to,
> +		    struct i915_request *from,
> +		    gfp_t gfp)
> +{
> +	u32 hwsp_offset;
> +	u32 *cs;
> +	int err;
> +
> +	GEM_BUG_ON(!from->timeline->has_initial_breadcrumb);
> +	GEM_BUG_ON(INTEL_GEN(to->i915) < 8);
> +
> +	/* We need to pin the signaler's HWSP until we are finished reading. */
> +	err = i915_timeline_read_hwsp(from, to, &hwsp_offset);
> +	if (err)
> +		return err;
> +
> +	/* Only submit our spinner after the signaler is running! */
> +	err = i915_request_await_execution(to, from, gfp);
> +	if (err)
> +		return err;
> +
> +	cs = intel_ring_begin(to, 4);
> +	if (IS_ERR(cs))
> +		return PTR_ERR(cs);
> +
> +	/*
> +	 * Using greater-than-or-equal here means we have to worry
> +	 * about seqno wraparound. To side step that issue, we swap
> +	 * the timeline HWSP upon wrapping, so that everyone listening
> +	 * for the old (pre-wrap) values do not see the much smaller
> +	 * (post-wrap) values than they were expecting (and so wait
> +	 * forever).
> +	 */
> +	*cs++ = MI_SEMAPHORE_WAIT |
> +		MI_SEMAPHORE_GLOBAL_GTT |
> +		MI_SEMAPHORE_POLL |
> +		MI_SEMAPHORE_SAD_GTE_SDD;
> +	*cs++ = from->fence.seqno;
> +	*cs++ = hwsp_offset;
> +	*cs++ = 0;
> +
> +	intel_ring_advance(to, cs);
> +	return 0;
> +}
> +
>   static int
>   i915_request_await_request(struct i915_request *to, struct i915_request *from)
>   {
> @@ -717,6 +837,9 @@ i915_request_await_request(struct i915_request *to, struct i915_request *from)
>   		ret = i915_sw_fence_await_sw_fence_gfp(&to->submit,
>   						       &from->submit,
>   						       I915_FENCE_GFP);
> +	} else if (intel_engine_has_semaphores(to->engine) &&
> +		   to->gem_context->sched.priority >= I915_PRIORITY_NORMAL) {
> +		ret = emit_semaphore_wait(to, from, I915_FENCE_GFP);
>   	} else {
>   		ret = i915_sw_fence_await_dma_fence(&to->submit,
>   						    &from->fence, 0,
> @@ -1208,14 +1331,23 @@ int __init i915_global_request_init(void)
>   	if (!global.slab_requests)
>   		return -ENOMEM;
>   
> +	global.slab_execute_cbs = KMEM_CACHE(execute_cb,
> +					     SLAB_HWCACHE_ALIGN |
> +					     SLAB_RECLAIM_ACCOUNT |
> +					     SLAB_TYPESAFE_BY_RCU);
> +	if (!global.slab_execute_cbs)
> +		goto err_requests;
> +
>   	global.slab_dependencies = KMEM_CACHE(i915_dependency,
>   					      SLAB_HWCACHE_ALIGN |
>   					      SLAB_RECLAIM_ACCOUNT);
>   	if (!global.slab_dependencies)
> -		goto err_requests;
> +		goto err_execute_cbs;
>   
>   	return 0;
>   
> +err_execute_cbs:
> +	kmem_cache_destroy(global.slab_execute_cbs);
>   err_requests:
>   	kmem_cache_destroy(global.slab_requests);
>   	return -ENOMEM;
> @@ -1224,11 +1356,13 @@ int __init i915_global_request_init(void)
>   void i915_global_request_shrink(void)
>   {
>   	kmem_cache_shrink(global.slab_dependencies);
> +	kmem_cache_shrink(global.slab_execute_cbs);
>   	kmem_cache_shrink(global.slab_requests);
>   }
>   
>   void i915_global_request_exit(void)
>   {
>   	kmem_cache_destroy(global.slab_dependencies);
> +	kmem_cache_destroy(global.slab_execute_cbs);
>   	kmem_cache_destroy(global.slab_requests);
>   }
> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> index ea1e6f0ade53..3e0d62d35226 100644
> --- a/drivers/gpu/drm/i915/i915_request.h
> +++ b/drivers/gpu/drm/i915/i915_request.h
> @@ -129,6 +129,7 @@ struct i915_request {
>   	 */
>   	struct i915_sw_fence submit;
>   	wait_queue_entry_t submitq;
> +	struct list_head execute_cb;
>   
>   	/*
>   	 * A list of everyone we wait upon, and everyone who waits upon us.
> diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c
> index 7c58b049ecb5..8d1400d378d7 100644
> --- a/drivers/gpu/drm/i915/i915_sw_fence.c
> +++ b/drivers/gpu/drm/i915/i915_sw_fence.c
> @@ -192,7 +192,7 @@ static void __i915_sw_fence_complete(struct i915_sw_fence *fence,
>   	__i915_sw_fence_notify(fence, FENCE_FREE);
>   }
>   
> -static void i915_sw_fence_complete(struct i915_sw_fence *fence)
> +void i915_sw_fence_complete(struct i915_sw_fence *fence)
>   {
>   	debug_fence_assert(fence);
>   
> @@ -202,7 +202,7 @@ static void i915_sw_fence_complete(struct i915_sw_fence *fence)
>   	__i915_sw_fence_complete(fence, NULL);
>   }
>   
> -static void i915_sw_fence_await(struct i915_sw_fence *fence)
> +void i915_sw_fence_await(struct i915_sw_fence *fence)
>   {
>   	debug_fence_assert(fence);
>   	WARN_ON(atomic_inc_return(&fence->pending) <= 1);
> diff --git a/drivers/gpu/drm/i915/i915_sw_fence.h b/drivers/gpu/drm/i915/i915_sw_fence.h
> index 0e055ea0179f..6dec9e1d1102 100644
> --- a/drivers/gpu/drm/i915/i915_sw_fence.h
> +++ b/drivers/gpu/drm/i915/i915_sw_fence.h
> @@ -79,6 +79,9 @@ int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
>   				    unsigned long timeout,
>   				    gfp_t gfp);
>   
> +void i915_sw_fence_await(struct i915_sw_fence *fence);
> +void i915_sw_fence_complete(struct i915_sw_fence *fence);
> +
>   static inline bool i915_sw_fence_signaled(const struct i915_sw_fence *fence)
>   {
>   	return atomic_read(&fence->pending) <= 0;
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index df8f88142f1d..4fc7e2ac6278 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -616,6 +616,7 @@ void intel_engines_set_scheduler_caps(struct drm_i915_private *i915)
>   	} map[] = {
>   #define MAP(x, y) { ilog2(I915_ENGINE_HAS_##x), ilog2(I915_SCHEDULER_CAP_##y) }
>   		MAP(PREEMPTION, PREEMPTION),
> +		MAP(SEMAPHORES, SEMAPHORES),
>   #undef MAP
>   	};
>   	struct intel_engine_cs *engine;
> diff --git a/drivers/gpu/drm/i915/intel_gpu_commands.h b/drivers/gpu/drm/i915/intel_gpu_commands.h
> index b96a31bc1080..a34ece53a771 100644
> --- a/drivers/gpu/drm/i915/intel_gpu_commands.h
> +++ b/drivers/gpu/drm/i915/intel_gpu_commands.h
> @@ -105,8 +105,13 @@
>   #define MI_SEMAPHORE_SIGNAL	MI_INSTR(0x1b, 0) /* GEN8+ */
>   #define   MI_SEMAPHORE_TARGET(engine)	((engine)<<15)
>   #define MI_SEMAPHORE_WAIT	MI_INSTR(0x1c, 2) /* GEN8+ */
> -#define   MI_SEMAPHORE_POLL		(1<<15)
> -#define   MI_SEMAPHORE_SAD_GTE_SDD	(1<<12)
> +#define   MI_SEMAPHORE_POLL		(1 << 15)
> +#define   MI_SEMAPHORE_SAD_GT_SDD	(0 << 12)
> +#define   MI_SEMAPHORE_SAD_GTE_SDD	(1 << 12)
> +#define   MI_SEMAPHORE_SAD_LT_SDD	(2 << 12)
> +#define   MI_SEMAPHORE_SAD_LTE_SDD	(3 << 12)
> +#define   MI_SEMAPHORE_SAD_EQ_SDD	(4 << 12)
> +#define   MI_SEMAPHORE_SAD_NEQ_SDD	(5 << 12)
>   #define MI_STORE_DWORD_IMM	MI_INSTR(0x20, 1)
>   #define MI_STORE_DWORD_IMM_GEN4	MI_INSTR(0x20, 2)
>   #define   MI_MEM_VIRTUAL	(1 << 22) /* 945,g33,965 */
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index f57cfe2fc078..53d6f7fdb50e 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -2324,6 +2324,7 @@ void intel_execlists_set_default_submission(struct intel_engine_cs *engine)
>   	engine->park = NULL;
>   	engine->unpark = NULL;
>   
> +	engine->flags |= I915_ENGINE_HAS_SEMAPHORES;
>   	engine->flags |= I915_ENGINE_SUPPORTS_STATS;
>   	if (engine->i915->preempt_context)
>   		engine->flags |= I915_ENGINE_HAS_PREEMPTION;
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index b8ec7e40a59b..2e7264119ec4 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -499,6 +499,7 @@ struct intel_engine_cs {
>   #define I915_ENGINE_NEEDS_CMD_PARSER BIT(0)
>   #define I915_ENGINE_SUPPORTS_STATS   BIT(1)
>   #define I915_ENGINE_HAS_PREEMPTION   BIT(2)
> +#define I915_ENGINE_HAS_SEMAPHORES   BIT(3)
>   	unsigned int flags;
>   
>   	/*
> @@ -576,6 +577,12 @@ intel_engine_has_preemption(const struct intel_engine_cs *engine)
>   	return engine->flags & I915_ENGINE_HAS_PREEMPTION;
>   }
>   
> +static inline bool
> +intel_engine_has_semaphores(const struct intel_engine_cs *engine)
> +{
> +	return engine->flags & I915_ENGINE_HAS_SEMAPHORES;
> +}
> +
>   void intel_engines_set_scheduler_caps(struct drm_i915_private *i915);
>   
>   static inline bool __execlists_need_preempt(int prio, int last)
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 8304a7f1ec3f..b10eea3f6d24 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -479,6 +479,7 @@ typedef struct drm_i915_irq_wait {
>   #define   I915_SCHEDULER_CAP_ENABLED	(1ul << 0)
>   #define   I915_SCHEDULER_CAP_PRIORITY	(1ul << 1)
>   #define   I915_SCHEDULER_CAP_PREEMPTION	(1ul << 2)
> +#define   I915_SCHEDULER_CAP_SEMAPHORES	(1ul << 3)
>   
>   #define I915_PARAM_HUC_STATUS		 42
>   
> 
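
As an aside, MI_SEMAPHORE_SAD_GTE_SDD makes the CS poll an unsigned
greater-or-equal compare against the dword at hwsp_offset; in CPU
terms the predicate is simply (illustrative sketch, not driver code):

	static bool semaphore_passed(u32 hwsp_value, u32 wait_seqno)
	{
		/* unsigned >=, hence the wraparound worry above */
		return hwsp_value >= wait_seqno;
	}

which is why the timeline swaps out its HWSP on wrap instead of
letting old waiters compare against much smaller post-wrap seqnos.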

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

* Re: [PATCH 05/38] drm/i915: Prioritise non-busywait semaphore workloads
  2019-03-01 14:03 ` [PATCH 05/38] drm/i915: Prioritise non-busywait semaphore workloads Chris Wilson
@ 2019-03-01 15:12   ` Tvrtko Ursulin
  0 siblings, 0 replies; 88+ messages in thread
From: Tvrtko Ursulin @ 2019-03-01 15:12 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 01/03/2019 14:03, Chris Wilson wrote:
> We don't want to busywait on the GPU if we have other work to do. If we
> give non-busywaiting workloads higher (initial) priority than workloads
> that require a busywait, we will prioritise work that is ready to run
> immediately. We then also have to be careful that we don't give earlier
> semaphores an accidental boost because later work doesn't wait on other
> rings, hence we keep a history of semaphore usage along the dependency chain.
> 
> v2: Stop rolling the bits into a chain and just use a flag in case this
> request or any of our dependencies use a semaphore. The rolling around
> was contagious as Tvrtko was heard to fall off his chair.
> 
> Testcase: igt/gem_exec_schedule/semaphore
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_request.c   | 16 ++++++++++++++++
>   drivers/gpu/drm/i915/i915_scheduler.c |  6 ++++++
>   drivers/gpu/drm/i915/i915_scheduler.h |  9 ++++++---
>   drivers/gpu/drm/i915/intel_lrc.c      |  2 +-
>   4 files changed, 29 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 59e30b8c4ee9..bcf3c1a155e2 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -813,6 +813,7 @@ emit_semaphore_wait(struct i915_request *to,
>   	*cs++ = 0;
>   
>   	intel_ring_advance(to, cs);
> +	to->sched.flags |= I915_SCHED_HAS_SEMAPHORE;
>   	return 0;
>   }
>   
> @@ -1083,6 +1084,21 @@ void i915_request_add(struct i915_request *request)
>   	if (engine->schedule) {
>   		struct i915_sched_attr attr = request->gem_context->sched;
>   
> +		/*
> +		 * Boost actual workloads past semaphores!
> +		 *
> +		 * With semaphores we spin on one engine waiting for another,
> +		 * simply to reduce the latency of starting our work when
> +		 * the signaler completes. However, if there is any other
> +		 * work that we could be doing on this engine instead, that
> +		 * is better utilisation and will reduce the overall duration
> +		 * of the current work. To avoid PI boosting a semaphore
> +		 * far in the distance ahead of more useful work, we keep a history
> +		 * of any semaphore use along our dependency chain.
> +		 */
> +		if (!(request->sched.flags & I915_SCHED_HAS_SEMAPHORE))
> +			attr.priority |= I915_PRIORITY_NOSEMAPHORE;
> +
>   		/*
>   		 * Boost priorities to new clients (new request flows).
>   		 *
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> index 50018ad30233..8a64748a7912 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.c
> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> @@ -39,6 +39,7 @@ void i915_sched_node_init(struct i915_sched_node *node)
>   	INIT_LIST_HEAD(&node->waiters_list);
>   	INIT_LIST_HEAD(&node->link);
>   	node->attr.priority = I915_PRIORITY_INVALID;
> +	node->flags = 0;
>   }
>   
>   static struct i915_dependency *
> @@ -69,6 +70,11 @@ bool __i915_sched_node_add_dependency(struct i915_sched_node *node,
>   		dep->signaler = signal;
>   		dep->flags = flags;
>   
> +		/* Keep track of whether anyone on this chain has a semaphore */
> +		if (signal->flags & I915_SCHED_HAS_SEMAPHORE &&
> +		    !node_started(signal))
> +			node->flags |= I915_SCHED_HAS_SEMAPHORE;
> +
>   		ret = true;
>   	}
>   
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
> index 7d4a49750d92..6ce450cf63fa 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.h
> +++ b/drivers/gpu/drm/i915/i915_scheduler.h
> @@ -24,14 +24,15 @@ enum {
>   	I915_PRIORITY_INVALID = INT_MIN
>   };
>   
> -#define I915_USER_PRIORITY_SHIFT 2
> +#define I915_USER_PRIORITY_SHIFT 3
>   #define I915_USER_PRIORITY(x) ((x) << I915_USER_PRIORITY_SHIFT)
>   
>   #define I915_PRIORITY_COUNT BIT(I915_USER_PRIORITY_SHIFT)
>   #define I915_PRIORITY_MASK (I915_PRIORITY_COUNT - 1)
>   
> -#define I915_PRIORITY_WAIT	((u8)BIT(0))
> -#define I915_PRIORITY_NEWCLIENT	((u8)BIT(1))
> +#define I915_PRIORITY_WAIT		((u8)BIT(0))
> +#define I915_PRIORITY_NEWCLIENT		((u8)BIT(1))
> +#define I915_PRIORITY_NOSEMAPHORE	((u8)BIT(2))
>   
>   #define __NO_PREEMPTION (I915_PRIORITY_WAIT)
>   
> @@ -74,6 +75,8 @@ struct i915_sched_node {
>   	struct list_head waiters_list; /* those after us, they depend upon us */
>   	struct list_head link;
>   	struct i915_sched_attr attr;
> +	unsigned int flags;
> +#define I915_SCHED_HAS_SEMAPHORE	BIT(0)
>   };
>   
>   struct i915_dependency {
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 53d6f7fdb50e..2268860cca44 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -164,7 +164,7 @@
>   #define WA_TAIL_DWORDS 2
>   #define WA_TAIL_BYTES (sizeof(u32) * WA_TAIL_DWORDS)
>   
> -#define ACTIVE_PRIORITY (I915_PRIORITY_NEWCLIENT)
> +#define ACTIVE_PRIORITY (I915_PRIORITY_NEWCLIENT | I915_PRIORITY_NOSEMAPHORE)
>   
>   static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
>   					    struct intel_engine_cs *engine,
> 
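
It is perhaps easier to see the resulting layout spelled out; a rough
sketch using the values from this patch:

	/* bits 0-2 are internal boosts, user priority starts at bit 3 */
	int prio = I915_USER_PRIORITY(user_prio);	/* user_prio << 3 */

	if (!(rq->sched.flags & I915_SCHED_HAS_SEMAPHORE))
		prio |= I915_PRIORITY_NOSEMAPHORE;	/* BIT(2) */

so, between requests of equal user priority, NOSEMAPHORE outranks
NEWCLIENT (BIT(1)), which in turn outranks WAIT (BIT(0)).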

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

* Re: [PATCH 06/38] drm/i915/selftests: Check that whitelisted registers are accessible
  2019-03-01 14:03 ` [PATCH 06/38] drm/i915/selftests: Check that whitelisted registers are accessible Chris Wilson
@ 2019-03-01 15:18   ` Michał Winiarski
  2019-03-01 15:25     ` Chris Wilson
  0 siblings, 1 reply; 88+ messages in thread
From: Michał Winiarski @ 2019-03-01 15:18 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

On Fri, Mar 01, 2019 at 02:03:32PM +0000, Chris Wilson wrote:
> There is no point in whitelisting a register that the user then cannot
> write to, so check the register exists before merging such patches.
> 
> v2: Mark SLICE_COMMON_ECO_CHICKEN1 [731c] as write-only
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Dale B Stimson <dale.b.stimson@intel.com>
> Cc: Michał Winiarski <michal.winiarski@intel.com>
> ---
>  .../drm/i915/selftests/intel_workarounds.c    | 376 ++++++++++++++++++
>  1 file changed, 376 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/selftests/intel_workarounds.c b/drivers/gpu/drm/i915/selftests/intel_workarounds.c
> index e6ffc8ac22dc..33b3ced83fde 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_workarounds.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_workarounds.c
> @@ -12,6 +12,14 @@
>  #include "igt_spinner.h"
>  #include "igt_wedge_me.h"
>  #include "mock_context.h"
> +#include "mock_drm.h"
> +
> +static const struct wo_register {
> +	enum intel_platform platform;
> +	u32 reg;
> +} wo_registers[] = {
> +	{ INTEL_GEMINILAKE, 0x731c }
> +};
>  
>  #define REF_NAME_MAX (INTEL_ENGINE_CS_MAX_NAME + 4)
>  struct wa_lists {
> @@ -331,6 +339,373 @@ static int check_whitelist_across_reset(struct intel_engine_cs *engine,
>  	return err;
>  }
>  
> +static struct i915_vma *create_scratch(struct i915_gem_context *ctx)
> +{
> +	struct drm_i915_gem_object *obj;
> +	struct i915_vma *vma;
> +	void *ptr;
> +	int err;
> +
> +	obj = i915_gem_object_create_internal(ctx->i915, PAGE_SIZE);
> +	if (IS_ERR(obj))
> +		return ERR_CAST(obj);
> +
> +	i915_gem_object_set_cache_level(obj, I915_CACHE_LLC);

Check return value.

> +
> +	ptr = i915_gem_object_pin_map(obj, I915_MAP_WB);
> +	if (IS_ERR(ptr)) {
> +		err = PTR_ERR(ptr);
> +		goto err_obj;
> +	}
> +	memset(ptr, 0xc5, PAGE_SIZE);
> +	i915_gem_object_unpin_map(obj);
> +
> +	vma = i915_vma_instance(obj, &ctx->ppgtt->vm, NULL);
> +	if (IS_ERR(vma)) {
> +		err = PTR_ERR(vma);
> +		goto err_obj;
> +	}
> +
> +	err = i915_vma_pin(vma, 0, 0, PIN_USER);
> +	if (err)
> +		goto err_obj;
> +
> +	err = i915_gem_object_set_to_cpu_domain(obj, false);
> +	if (err)
> +		goto err_obj;
> +
> +	return vma;
> +
> +err_obj:
> +	i915_gem_object_put(obj);
> +	return ERR_PTR(err);
> +}
> +

[SNIP]

> +static int check_dirty_whitelist(struct i915_gem_context *ctx,
> +				 struct intel_engine_cs *engine)
> +{
> +	const u32 values[] = {
> +		0x00000000,
> +		0x01010101,
> +		0x10100101,
> +		0x03030303,
> +		0x30300303,
> +		0x05050505,
> +		0x50500505,
> +		0x0f0f0f0f,
> +		0xf00ff00f,
> +		0x10101010,
> +		0xf0f01010,
> +		0x30303030,
> +		0xa0a03030,
> +		0x50505050,
> +		0xc0c05050,
> +		0xf0f0f0f0,
> +		0x11111111,
> +		0x33333333,
> +		0x55555555,
> +		0x0000ffff,
> +		0x00ff00ff,
> +		0xff0000ff,
> +		0xffff00ff,
> +		0xffffffff,
> +	};
> +	struct i915_vma *scratch;
> +	struct i915_vma *batch;
> +	int err = 0, i, v;
> +	u32 *cs;
> +
> +	scratch = create_scratch(ctx);
> +	if (IS_ERR(scratch))
> +		return PTR_ERR(scratch);
> +
> +	batch = create_batch(ctx);
> +	if (IS_ERR(batch)) {
> +		err = PTR_ERR(batch);
> +		goto out_scratch;
> +	}
> +
> +	for (i = 0; i < engine->whitelist.count; i++) {
> +		u32 reg = i915_mmio_reg_offset(engine->whitelist.list[i].reg);
> +		u64 addr = scratch->node.start;
> +		struct i915_request *rq;
> +		u32 srm, lrm, rsvd;
> +		u32 expect;
> +		int idx;
> +
> +		if (wo_register(engine, reg))
> +			continue;
> +
> +		srm = MI_STORE_REGISTER_MEM;
> +		lrm = MI_LOAD_REGISTER_MEM;
> +		if (INTEL_GEN(ctx->i915) >= 8)
> +			lrm++, srm++;
> +
> +		pr_debug("%s: Writing garbage to %x {srm=0x%08x, lrm=0x%08x}\n",
> +			 engine->name, reg, srm, lrm);

Why are we printing opcodes (srm/lrm)?

> +
> +		cs = i915_gem_object_pin_map(batch->obj, I915_MAP_WC);
> +		if (IS_ERR(cs)) {
> +			err = PTR_ERR(cs);
> +			goto out_batch;
> +		}
> +
> +		/* SRM original */
> +		*cs++ = srm;
> +		*cs++ = reg;
> +		*cs++ = lower_32_bits(addr);
> +		*cs++ = upper_32_bits(addr);
> +
> +		idx = 1;
> +		for (v = 0; v < ARRAY_SIZE(values); v++) {
> +			/* LRI garbage */
> +			*cs++ = MI_LOAD_REGISTER_IMM(1);
> +			*cs++ = reg;
> +			*cs++ = values[v];
> +
> +			/* SRM result */
> +			*cs++ = srm;
> +			*cs++ = reg;
> +			*cs++ = lower_32_bits(addr + sizeof(u32) * idx);
> +			*cs++ = upper_32_bits(addr + sizeof(u32) * idx);
> +			idx++;
> +		}
> +		for (v = 0; v < ARRAY_SIZE(values); v++) {
> +			/* LRI garbage */
> +			*cs++ = MI_LOAD_REGISTER_IMM(1);
> +			*cs++ = reg;
> +			*cs++ = ~values[v];
> +
> +			/* SRM result */
> +			*cs++ = srm;
> +			*cs++ = reg;
> +			*cs++ = lower_32_bits(addr + sizeof(u32) * idx);
> +			*cs++ = upper_32_bits(addr + sizeof(u32) * idx);
> +			idx++;
> +		}
> +		GEM_BUG_ON(idx * sizeof(u32) > scratch->size);
> +
> +		/* LRM original -- don't leave garbage in the context! */
> +		*cs++ = lrm;
> +		*cs++ = reg;
> +		*cs++ = lower_32_bits(addr);
> +		*cs++ = upper_32_bits(addr);
> +
> +		*cs++ = MI_BATCH_BUFFER_END;
> +
> +		i915_gem_object_unpin_map(batch->obj);
> +		i915_gem_chipset_flush(ctx->i915);
> +
> +		rq = i915_request_alloc(engine, ctx);
> +		if (IS_ERR(rq)) {
> +			err = PTR_ERR(rq);
> +			goto out_batch;
> +		}
> +
> +		if (engine->emit_init_breadcrumb) { /* Be nice if we hang */
> +			err = engine->emit_init_breadcrumb(rq);
> +			if (err)
> +				goto err_request;
> +		}
> +
> +		err = engine->emit_bb_start(rq,
> +					    batch->node.start, PAGE_SIZE,
> +					    0);
> +		if (err)
> +			goto err_request;
> +
> +err_request:
> +		i915_request_add(rq);
> +		if (err)
> +			goto out_batch;
> +
> +		if (i915_request_wait(rq, I915_WAIT_LOCKED, HZ / 5) < 0) {
> +			pr_err("%s: Futzing %x timedout; cancelling test\n",
> +			       engine->name, reg);
> +			i915_gem_set_wedged(ctx->i915);
> +			err = -EIO;
> +			goto out_batch;
> +		}
> +
> +		cs = i915_gem_object_pin_map(scratch->obj, I915_MAP_WB);
> +		if (IS_ERR(cs)) {
> +			err = PTR_ERR(cs);
> +			goto out_batch;
> +		}

We're already using cs for batch! Extra pointer pls.

> +
> +		GEM_BUG_ON(values[ARRAY_SIZE(values) - 1] != 0xffffffff);
> +		rsvd = cs[ARRAY_SIZE(values)]; /* detect write masking */

So we're writing 0xffffffff to get the mask. And there's a comment. And it will
explode if someone changes the last value.
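
For reference, the masking model this relies on, assuming the
reg_write() helper (snipped above) behaves as its name suggests, is
roughly:

	static u32 reg_write(u32 old, u32 new, u32 rsvd)
	{
		if (rsvd == 0x0000ffff) {
			/* masked register: the high half of each
			 * write selects which low bits take effect */
			old &= ~(new >> 16);
			old |= new & (new >> 16);
		} else {
			/* rsvd marks the bits that accepted the
			 * all-ones write, i.e. the writable bits */
			old = new & rsvd;
		}
		return old;
	}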

Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>

> +		if (!rsvd) {
> +			pr_err("%s: Unable to write to whitelisted register %x\n",
> +			       engine->name, reg);
> +			err = -EINVAL;
> +			goto out_unpin;
> +		}
> +
> +		expect = cs[0];
> +		idx = 1;
> +		for (v = 0; v < ARRAY_SIZE(values); v++) {
> +			expect = reg_write(expect, values[v], rsvd);
> +			if (cs[idx] != expect)
> +				err++;
> +			idx++;
> +		}
> +		for (v = 0; v < ARRAY_SIZE(values); v++) {
> +			expect = reg_write(expect, ~values[v], rsvd);
> +			if (cs[idx] != expect)
> +				err++;
> +			idx++;
> +		}
> +		if (err) {
> +			pr_err("%s: %d mismatch between values written to whitelisted register [%x], and values read back!\n",
> +			       engine->name, err, reg);
> +
> +			pr_info("%s: Whitelisted register: %x, original value %08x, rsvd %08x\n",
> +				engine->name, reg, cs[0], rsvd);
> +
> +			expect = cs[0];
> +			idx = 1;
> +			for (v = 0; v < ARRAY_SIZE(values); v++) {
> +				u32 w = values[v];
> +
> +				expect = reg_write(expect, w, rsvd);
> +				pr_info("Wrote %08x, read %08x, expect %08x\n",
> +					w, cs[idx], expect);
> +				idx++;
> +			}
> +			for (v = 0; v < ARRAY_SIZE(values); v++) {
> +				u32 w = ~values[v];
> +
> +				expect = reg_write(expect, w, rsvd);
> +				pr_info("Wrote %08x, read %08x, expect %08x\n",
> +					w, cs[idx], expect);
> +				idx++;
> +			}
> +
> +			err = -EINVAL;
> +		}
> +out_unpin:
> +		i915_gem_object_unpin_map(scratch->obj);
> +		if (err)
> +			break;
> +	}
> +
> +	if (igt_flush_test(ctx->i915, I915_WAIT_LOCKED))
> +		err = -EIO;
> +out_batch:
> +	i915_vma_unpin_and_release(&batch, 0);
> +out_scratch:
> +	i915_vma_unpin_and_release(&scratch, 0);
> +	return err;
> +}
> +
> +static int live_dirty_whitelist(void *arg)
> +{
> +	struct drm_i915_private *i915 = arg;
> +	struct intel_engine_cs *engine;
> +	struct i915_gem_context *ctx;
> +	enum intel_engine_id id;
> +	intel_wakeref_t wakeref;
> +	struct drm_file *file;
> +	int err = 0;
> +
> +	/* Can the user write to the whitelisted registers? */
> +
> +	if (INTEL_GEN(i915) < 7) /* minimum requirement for LRI, SRM, LRM */
> +		return 0;
> +
> +	wakeref = intel_runtime_pm_get(i915);
> +
> +	mutex_unlock(&i915->drm.struct_mutex);
> +	file = mock_file(i915);
> +	mutex_lock(&i915->drm.struct_mutex);
> +	if (IS_ERR(file)) {
> +		err = PTR_ERR(file);
> +		goto out_rpm;
> +	}
> +
> +	ctx = live_context(i915, file);
> +	if (IS_ERR(ctx)) {
> +		err = PTR_ERR(ctx);
> +		goto out_file;
> +	}
> +
> +	for_each_engine(engine, i915, id) {
> +		if (engine->whitelist.count == 0)
> +			continue;
> +
> +		err = check_dirty_whitelist(ctx, engine);
> +		if (err)
> +			goto out_file;
> +	}
> +
> +out_file:
> +	mutex_unlock(&i915->drm.struct_mutex);
> +	mock_file_free(i915, file);
> +	mutex_lock(&i915->drm.struct_mutex);
> +out_rpm:
> +	intel_runtime_pm_put(i915, wakeref);
> +	return err;
> +}
> +
>  static int live_reset_whitelist(void *arg)
>  {
>  	struct drm_i915_private *i915 = arg;
> @@ -504,6 +879,7 @@ live_engine_reset_gt_engine_workarounds(void *arg)
>  int intel_workarounds_live_selftests(struct drm_i915_private *i915)
>  {
>  	static const struct i915_subtest tests[] = {
> +		SUBTEST(live_dirty_whitelist),
>  		SUBTEST(live_reset_whitelist),
>  		SUBTEST(live_gpu_reset_gt_engine_workarounds),
>  		SUBTEST(live_engine_reset_gt_engine_workarounds),
> -- 
> 2.20.1
> 

* Re: [PATCH 10/38] drm/i915: Store the BIT(engine->id) as the engine's mask
  2019-03-01 14:03 ` [PATCH 10/38] drm/i915: Store the BIT(engine->id) as the engine's mask Chris Wilson
@ 2019-03-01 15:25   ` Tvrtko Ursulin
  0 siblings, 0 replies; 88+ messages in thread
From: Tvrtko Ursulin @ 2019-03-01 15:25 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 01/03/2019 14:03, Chris Wilson wrote:
> In the next patch, we are introducing a broad virtual engine to encompass
> multiple physical engines, losing the 1:1 nature of BIT(engine->id). To
> reflect the broader set of engines implied by the virtual instance, lets
> store the full bitmask.
> 
> v2: Use intel_engine_mask_t (s/ring_mask/engine_mask/)
> 
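
The mask only becomes interesting once it stops being a single bit; a
sketch of where the following patch is heading (hypothetical names):

	intel_engine_mask_t mask = 0;
	unsigned int n;

	for (n = 0; n < nsibling; n++)
		mask |= siblings[n]->mask;	/* union of the physical engines */
	ve->base.mask = mask;

so tests such as (stalled_mask & engine->mask) keep working even when
"engine" may execute on any one of its siblings.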
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_drv.h               |  4 +-
>   drivers/gpu/drm/i915/i915_gem_gtt.c           |  2 +-
>   drivers/gpu/drm/i915/i915_pci.c               | 39 +++++++++++--------
>   drivers/gpu/drm/i915/i915_reset.c             |  8 ++--
>   drivers/gpu/drm/i915/intel_device_info.c      |  6 +--
>   drivers/gpu/drm/i915/intel_device_info.h      |  6 +--
>   drivers/gpu/drm/i915/intel_engine_cs.c        | 15 ++++---
>   drivers/gpu/drm/i915/intel_guc_submission.c   |  4 +-
>   drivers/gpu/drm/i915/intel_hangcheck.c        |  8 ++--
>   drivers/gpu/drm/i915/intel_ringbuffer.c       | 18 ++++-----
>   drivers/gpu/drm/i915/intel_ringbuffer.h       | 11 ++----
>   .../gpu/drm/i915/selftests/i915_gem_context.c |  6 +--
>   drivers/gpu/drm/i915/selftests/i915_request.c |  2 +-
>   drivers/gpu/drm/i915/selftests/intel_guc.c    |  4 +-
>   .../gpu/drm/i915/selftests/intel_hangcheck.c  |  4 +-
>   drivers/gpu/drm/i915/selftests/intel_lrc.c    |  4 +-
>   .../drm/i915/selftests/intel_workarounds.c    |  2 +-
>   drivers/gpu/drm/i915/selftests/mock_engine.c  |  1 +
>   .../gpu/drm/i915/selftests/mock_gem_device.c  |  2 +-
>   19 files changed, 75 insertions(+), 71 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index cf325a00d143..0dd680cdb9ce 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2099,7 +2099,7 @@ static inline struct drm_i915_private *huc_to_i915(struct intel_huc *huc)
>   
>   /* Iterator over subset of engines selected by mask */
>   #define for_each_engine_masked(engine__, dev_priv__, mask__, tmp__) \
> -	for ((tmp__) = (mask__) & INTEL_INFO(dev_priv__)->ring_mask; \
> +	for ((tmp__) = (mask__) & INTEL_INFO(dev_priv__)->engine_mask; \
>   	     (tmp__) ? \
>   	     ((engine__) = (dev_priv__)->engine[__mask_next_bit(tmp__)]), 1 : \
>   	     0;)
> @@ -2432,7 +2432,7 @@ static inline unsigned int i915_sg_segment_size(void)
>   #define ALL_ENGINES	(~0)
>   
>   #define HAS_ENGINE(dev_priv, id) \
> -	(!!(INTEL_INFO(dev_priv)->ring_mask & ENGINE_MASK(id)))
> +	(!!(INTEL_INFO(dev_priv)->engine_mask & ENGINE_MASK(id)))
>   
>   #define HAS_BSD(dev_priv)	HAS_ENGINE(dev_priv, VCS)
>   #define HAS_BSD2(dev_priv)	HAS_ENGINE(dev_priv, VCS2)
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 7e79691664e5..99022738cc89 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -798,7 +798,7 @@ static void gen8_initialize_pml4(struct i915_address_space *vm,
>    */
>   static void mark_tlbs_dirty(struct i915_hw_ppgtt *ppgtt)
>   {
> -	ppgtt->pd_dirty_rings = INTEL_INFO(ppgtt->vm.i915)->ring_mask;
> +	ppgtt->pd_dirty_rings = INTEL_INFO(ppgtt->vm.i915)->engine_mask;

Could rename pd_dirty_rings as well.

>   }
>   
>   /* Removes entries from a single page table, releasing it if it's empty.
> diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
> index a9211c370cd1..524f55771f23 100644
> --- a/drivers/gpu/drm/i915/i915_pci.c
> +++ b/drivers/gpu/drm/i915/i915_pci.c
> @@ -94,7 +94,7 @@
>   	.gpu_reset_clobbers_display = true, \
>   	.hws_needs_physical = 1, \
>   	.unfenced_needs_alignment = 1, \
> -	.ring_mask = RENDER_RING, \
> +	.engine_mask = RENDER_RING, \

What about RENDER_*RING* & co, while we are churning?

Could we use the now little-used ENGINE_MASK in here? Like:

   .engine_mask = ENGINE_MASK(RCS) | .. ;

>   	.has_snoop = true, \
>   	.has_coherent_ggtt = false, \
>   	GEN_DEFAULT_PIPEOFFSETS, \
> @@ -133,7 +133,7 @@ static const struct intel_device_info intel_i865g_info = {
>   	.num_pipes = 2, \
>   	.display.has_gmch = 1, \
>   	.gpu_reset_clobbers_display = true, \
> -	.ring_mask = RENDER_RING, \
> +	.engine_mask = RENDER_RING, \
>   	.has_snoop = true, \
>   	.has_coherent_ggtt = true, \
>   	GEN_DEFAULT_PIPEOFFSETS, \
> @@ -210,7 +210,7 @@ static const struct intel_device_info intel_pineview_info = {
>   	.display.has_hotplug = 1, \
>   	.display.has_gmch = 1, \
>   	.gpu_reset_clobbers_display = true, \
> -	.ring_mask = RENDER_RING, \
> +	.engine_mask = RENDER_RING, \
>   	.has_snoop = true, \
>   	.has_coherent_ggtt = true, \
>   	GEN_DEFAULT_PIPEOFFSETS, \
> @@ -239,7 +239,7 @@ static const struct intel_device_info intel_i965gm_info = {
>   static const struct intel_device_info intel_g45_info = {
>   	GEN4_FEATURES,
>   	PLATFORM(INTEL_G45),
> -	.ring_mask = RENDER_RING | BSD_RING,
> +	.engine_mask = RENDER_RING | BSD_RING,
>   	.gpu_reset_clobbers_display = false,
>   };
>   
> @@ -249,7 +249,7 @@ static const struct intel_device_info intel_gm45_info = {
>   	.is_mobile = 1,
>   	.display.has_fbc = 1,
>   	.display.supports_tv = 1,
> -	.ring_mask = RENDER_RING | BSD_RING,
> +	.engine_mask = RENDER_RING | BSD_RING,
>   	.gpu_reset_clobbers_display = false,
>   };
>   
> @@ -257,7 +257,7 @@ static const struct intel_device_info intel_gm45_info = {
>   	GEN(5), \
>   	.num_pipes = 2, \
>   	.display.has_hotplug = 1, \
> -	.ring_mask = RENDER_RING | BSD_RING, \
> +	.engine_mask = RENDER_RING | BSD_RING, \
>   	.has_snoop = true, \
>   	.has_coherent_ggtt = true, \
>   	/* ilk does support rc6, but we do not implement [power] contexts */ \
> @@ -283,7 +283,7 @@ static const struct intel_device_info intel_ironlake_m_info = {
>   	.num_pipes = 2, \
>   	.display.has_hotplug = 1, \
>   	.display.has_fbc = 1, \
> -	.ring_mask = RENDER_RING | BSD_RING | BLT_RING, \
> +	.engine_mask = RENDER_RING | BSD_RING | BLT_RING, \
>   	.has_coherent_ggtt = true, \
>   	.has_llc = 1, \
>   	.has_rc6 = 1, \
> @@ -328,7 +328,7 @@ static const struct intel_device_info intel_sandybridge_m_gt2_info = {
>   	.num_pipes = 3, \
>   	.display.has_hotplug = 1, \
>   	.display.has_fbc = 1, \
> -	.ring_mask = RENDER_RING | BSD_RING | BLT_RING, \
> +	.engine_mask = RENDER_RING | BSD_RING | BLT_RING, \
>   	.has_coherent_ggtt = true, \
>   	.has_llc = 1, \
>   	.has_rc6 = 1, \
> @@ -389,7 +389,7 @@ static const struct intel_device_info intel_valleyview_info = {
>   	.ppgtt = INTEL_PPGTT_FULL,
>   	.has_snoop = true,
>   	.has_coherent_ggtt = false,
> -	.ring_mask = RENDER_RING | BSD_RING | BLT_RING,
> +	.engine_mask = RENDER_RING | BSD_RING | BLT_RING,
>   	.display_mmio_offset = VLV_DISPLAY_BASE,
>   	GEN_DEFAULT_PAGE_SIZES,
>   	GEN_DEFAULT_PIPEOFFSETS,
> @@ -398,7 +398,7 @@ static const struct intel_device_info intel_valleyview_info = {
>   
>   #define G75_FEATURES  \
>   	GEN7_FEATURES, \
> -	.ring_mask = RENDER_RING | BSD_RING | BLT_RING | VEBOX_RING, \
> +	.engine_mask = RENDER_RING | BSD_RING | BLT_RING | VEBOX_RING, \
>   	.display.has_ddi = 1, \
>   	.has_fpga_dbg = 1, \
>   	.display.has_psr = 1, \
> @@ -462,7 +462,8 @@ static const struct intel_device_info intel_broadwell_rsvd_info = {
>   static const struct intel_device_info intel_broadwell_gt3_info = {
>   	BDW_PLATFORM,
>   	.gt = 3,
> -	.ring_mask = RENDER_RING | BSD_RING | BLT_RING | VEBOX_RING | BSD2_RING,
> +	.engine_mask =
> +		RENDER_RING | BSD_RING | BLT_RING | VEBOX_RING | BSD2_RING,
>   };
>   
>   static const struct intel_device_info intel_cherryview_info = {
> @@ -471,7 +472,7 @@ static const struct intel_device_info intel_cherryview_info = {
>   	.num_pipes = 3,
>   	.display.has_hotplug = 1,
>   	.is_lp = 1,
> -	.ring_mask = RENDER_RING | BSD_RING | BLT_RING | VEBOX_RING,
> +	.engine_mask = RENDER_RING | BSD_RING | BLT_RING | VEBOX_RING,
>   	.has_64bit_reloc = 1,
>   	.has_runtime_pm = 1,
>   	.has_rc6 = 1,
> @@ -521,7 +522,8 @@ static const struct intel_device_info intel_skylake_gt2_info = {
>   
>   #define SKL_GT3_PLUS_PLATFORM \
>   	SKL_PLATFORM, \
> -	.ring_mask = RENDER_RING | BSD_RING | BLT_RING | VEBOX_RING | BSD2_RING
> +	.engine_mask = \
> +		RENDER_RING | BSD_RING | BLT_RING | VEBOX_RING | BSD2_RING
>   
>   
>   static const struct intel_device_info intel_skylake_gt3_info = {
> @@ -538,7 +540,7 @@ static const struct intel_device_info intel_skylake_gt4_info = {
>   	GEN(9), \
>   	.is_lp = 1, \
>   	.display.has_hotplug = 1, \
> -	.ring_mask = RENDER_RING | BSD_RING | BLT_RING | VEBOX_RING, \
> +	.engine_mask = RENDER_RING | BSD_RING | BLT_RING | VEBOX_RING, \
>   	.num_pipes = 3, \
>   	.has_64bit_reloc = 1, \
>   	.display.has_ddi = 1, \
> @@ -592,7 +594,8 @@ static const struct intel_device_info intel_kabylake_gt2_info = {
>   static const struct intel_device_info intel_kabylake_gt3_info = {
>   	KBL_PLATFORM,
>   	.gt = 3,
> -	.ring_mask = RENDER_RING | BSD_RING | BLT_RING | VEBOX_RING | BSD2_RING,
> +	.engine_mask =
> +		RENDER_RING | BSD_RING | BLT_RING | VEBOX_RING | BSD2_RING,
>   };
>   
>   #define CFL_PLATFORM \
> @@ -612,7 +615,8 @@ static const struct intel_device_info intel_coffeelake_gt2_info = {
>   static const struct intel_device_info intel_coffeelake_gt3_info = {
>   	CFL_PLATFORM,
>   	.gt = 3,
> -	.ring_mask = RENDER_RING | BSD_RING | BLT_RING | VEBOX_RING | BSD2_RING,
> +	.engine_mask =
> +		RENDER_RING | BSD_RING | BLT_RING | VEBOX_RING | BSD2_RING,
>   };
>   
>   #define GEN10_FEATURES \
> @@ -655,7 +659,8 @@ static const struct intel_device_info intel_icelake_11_info = {
>   	GEN11_FEATURES,
>   	PLATFORM(INTEL_ICELAKE),
>   	.is_alpha_support = 1,
> -	.ring_mask = RENDER_RING | BLT_RING | VEBOX_RING | BSD_RING | BSD3_RING,
> +	.engine_mask =
> +		RENDER_RING | BLT_RING | VEBOX_RING | BSD_RING | BSD3_RING,
>   };
>   
>   #undef GEN
> diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
> index 55d6123dbba4..fc5c5a7e4dfb 100644
> --- a/drivers/gpu/drm/i915/i915_reset.c
> +++ b/drivers/gpu/drm/i915/i915_reset.c
> @@ -692,7 +692,7 @@ static int gt_reset(struct drm_i915_private *i915, unsigned int stalled_mask)
>   		return err;
>   
>   	for_each_engine(engine, i915, id)
> -		intel_engine_reset(engine, stalled_mask & ENGINE_MASK(id));
> +		intel_engine_reset(engine, stalled_mask & engine->mask);
>   
>   	i915_gem_restore_fences(i915);
>   
> @@ -1057,7 +1057,7 @@ void i915_reset(struct drm_i915_private *i915,
>   static inline int intel_gt_reset_engine(struct drm_i915_private *i915,
>   					struct intel_engine_cs *engine)
>   {
> -	return intel_gpu_reset(i915, intel_engine_flag(engine));
> +	return intel_gpu_reset(i915, engine->mask);
>   }
>   
>   /**
> @@ -1241,7 +1241,7 @@ void i915_handle_error(struct drm_i915_private *i915,
>   	 */
>   	wakeref = intel_runtime_pm_get(i915);
>   
> -	engine_mask &= INTEL_INFO(i915)->ring_mask;
> +	engine_mask &= INTEL_INFO(i915)->engine_mask;
>   
>   	if (flags & I915_ERROR_CAPTURE) {
>   		i915_capture_error_state(i915, engine_mask, msg);
> @@ -1260,7 +1260,7 @@ void i915_handle_error(struct drm_i915_private *i915,
>   				continue;
>   
>   			if (i915_reset_engine(engine, msg) == 0)
> -				engine_mask &= ~intel_engine_flag(engine);
> +				engine_mask &= ~engine->mask;
>   
>   			clear_bit(I915_RESET_ENGINE + engine->id,
>   				  &error->flags);
> diff --git a/drivers/gpu/drm/i915/intel_device_info.c b/drivers/gpu/drm/i915/intel_device_info.c
> index 855a5074ad77..6283506e89b5 100644
> --- a/drivers/gpu/drm/i915/intel_device_info.c
> +++ b/drivers/gpu/drm/i915/intel_device_info.c
> @@ -738,7 +738,7 @@ void intel_device_info_runtime_init(struct drm_i915_private *dev_priv)
>   		runtime->num_scalers[PIPE_C] = 1;
>   	}
>   
> -	BUILD_BUG_ON(I915_NUM_ENGINES > BITS_PER_TYPE(intel_ring_mask_t));
> +	BUILD_BUG_ON(BITS_PER_TYPE(intel_engine_mask_t) < I915_NUM_ENGINES);
>   
>   	if (IS_GEN(dev_priv, 11))
>   		for_each_pipe(dev_priv, pipe)
> @@ -887,7 +887,7 @@ void intel_device_info_init_mmio(struct drm_i915_private *dev_priv)
>   			continue;
>   
>   		if (!(BIT(i) & RUNTIME_INFO(dev_priv)->vdbox_enable)) {
> -			info->ring_mask &= ~ENGINE_MASK(_VCS(i));
> +			info->engine_mask &= ~ENGINE_MASK(_VCS(i));
>   			DRM_DEBUG_DRIVER("vcs%u fused off\n", i);
>   			continue;
>   		}
> @@ -906,7 +906,7 @@ void intel_device_info_init_mmio(struct drm_i915_private *dev_priv)
>   			continue;
>   
>   		if (!(BIT(i) & RUNTIME_INFO(dev_priv)->vebox_enable)) {
> -			info->ring_mask &= ~ENGINE_MASK(_VECS(i));
> +			info->engine_mask &= ~ENGINE_MASK(_VECS(i));
>   			DRM_DEBUG_DRIVER("vecs%u fused off\n", i);
>   		}
>   	}
> diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
> index e8b8661df746..047d10bdd455 100644
> --- a/drivers/gpu/drm/i915/intel_device_info.h
> +++ b/drivers/gpu/drm/i915/intel_device_info.h
> @@ -150,14 +150,14 @@ struct sseu_dev_info {
>   	u8 eu_mask[GEN_MAX_SLICES * GEN_MAX_SUBSLICES];
>   };
>   
> -typedef u8 intel_ring_mask_t;
> +typedef u8 intel_engine_mask_t;
>   
>   struct intel_device_info {
>   	u16 gen_mask;
>   
>   	u8 gen;
>   	u8 gt; /* GT number, 0 if undefined */
> -	intel_ring_mask_t ring_mask; /* Rings supported by the HW */
> +	intel_engine_mask_t engine_mask; /* Engines supported by the HW */
>   
>   	enum intel_platform platform;
>   	u32 platform_mask;
> @@ -200,7 +200,7 @@ struct intel_runtime_info {
>   	u8 num_sprites[I915_MAX_PIPES];
>   	u8 num_scalers[I915_MAX_PIPES];
>   
> -	u8 num_rings;
> +	u8 num_engines;
>   
>   	/* Slice/subslice/EU info */
>   	struct sseu_dev_info sseu;
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index 4fc7e2ac6278..8226871c7781 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -313,7 +313,10 @@ intel_engine_setup(struct drm_i915_private *dev_priv,
>   	if (!engine)
>   		return -ENOMEM;
>   
> +	BUILD_BUG_ON(BITS_PER_TYPE(engine->mask) < I915_NUM_ENGINES);
> +
>   	engine->id = id;
> +	engine->mask = BIT(id);

Could use ENGINE_MASK just because we can?

>   	engine->i915 = dev_priv;
>   	__sprint_engine_name(engine->name, info);
>   	engine->hw_id = engine->guc_id = info->hw_id;
> @@ -355,15 +358,15 @@ intel_engine_setup(struct drm_i915_private *dev_priv,
>   int intel_engines_init_mmio(struct drm_i915_private *dev_priv)
>   {
>   	struct intel_device_info *device_info = mkwrite_device_info(dev_priv);
> -	const unsigned int ring_mask = INTEL_INFO(dev_priv)->ring_mask;
> +	const unsigned int engine_mask = INTEL_INFO(dev_priv)->engine_mask;
>   	struct intel_engine_cs *engine;
>   	enum intel_engine_id id;
>   	unsigned int mask = 0;
>   	unsigned int i;
>   	int err;
>   
> -	WARN_ON(ring_mask == 0);
> -	WARN_ON(ring_mask &
> +	WARN_ON(engine_mask == 0);
> +	WARN_ON(engine_mask &
>   		GENMASK(BITS_PER_TYPE(mask) - 1, I915_NUM_ENGINES));
>   
>   	if (i915_inject_load_failure())
> @@ -385,8 +388,8 @@ int intel_engines_init_mmio(struct drm_i915_private *dev_priv)
>   	 * are added to the driver by a warning and disabling the forgotten
>   	 * engines.
>   	 */
> -	if (WARN_ON(mask != ring_mask))
> -		device_info->ring_mask = mask;
> +	if (WARN_ON(mask != engine_mask))
> +		device_info->engine_mask = mask;
>   
>   	/* We always presume we have at least RCS available for later probing */
>   	if (WARN_ON(!HAS_ENGINE(dev_priv, RCS))) {
> @@ -394,7 +397,7 @@ int intel_engines_init_mmio(struct drm_i915_private *dev_priv)
>   		goto cleanup;
>   	}
>   
> -	RUNTIME_INFO(dev_priv)->num_rings = hweight32(mask);
> +	RUNTIME_INFO(dev_priv)->num_engines = hweight32(mask);
>   
>   	i915_check_and_clear_faults(dev_priv);
>   
> diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
> index 56ba2fcbabe6..119d3326bb5e 100644
> --- a/drivers/gpu/drm/i915/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/intel_guc_submission.c
> @@ -1030,7 +1030,7 @@ static int guc_clients_create(struct intel_guc *guc)
>   	GEM_BUG_ON(guc->preempt_client);
>   
>   	client = guc_client_alloc(dev_priv,
> -				  INTEL_INFO(dev_priv)->ring_mask,
> +				  INTEL_INFO(dev_priv)->engine_mask,
>   				  GUC_CLIENT_PRIORITY_KMD_NORMAL,
>   				  dev_priv->kernel_context);
>   	if (IS_ERR(client)) {
> @@ -1041,7 +1041,7 @@ static int guc_clients_create(struct intel_guc *guc)
>   
>   	if (dev_priv->preempt_context) {
>   		client = guc_client_alloc(dev_priv,
> -					  INTEL_INFO(dev_priv)->ring_mask,
> +					  INTEL_INFO(dev_priv)->engine_mask,
>   					  GUC_CLIENT_PRIORITY_KMD_HIGH,
>   					  dev_priv->preempt_context);
>   		if (IS_ERR(client)) {
> diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
> index f1d8dfc58049..c508875d5236 100644
> --- a/drivers/gpu/drm/i915/intel_hangcheck.c
> +++ b/drivers/gpu/drm/i915/intel_hangcheck.c
> @@ -120,7 +120,7 @@ engine_stuck(struct intel_engine_cs *engine, u64 acthd)
>   	 */
>   	tmp = I915_READ_CTL(engine);
>   	if (tmp & RING_WAIT) {
> -		i915_handle_error(dev_priv, BIT(engine->id), 0,
> +		i915_handle_error(dev_priv, engine->mask, 0,
>   				  "stuck wait on %s", engine->name);
>   		I915_WRITE_CTL(engine, tmp);
>   		return ENGINE_WAIT_KICK;
> @@ -282,13 +282,13 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
>   		hangcheck_store_sample(engine, &hc);
>   
>   		if (hc.stalled) {
> -			hung |= intel_engine_flag(engine);
> +			hung |= engine->mask;
>   			if (hc.action != ENGINE_DEAD)
> -				stuck |= intel_engine_flag(engine);
> +				stuck |= engine->mask;
>   		}
>   
>   		if (hc.wedged)
> -			wedged |= intel_engine_flag(engine);
> +			wedged |= engine->mask;
>   	}
>   
>   	if (GEM_SHOW_DEBUG() && (hung | stuck)) {
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 1b96b0960adc..e78594ecba6f 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1692,8 +1692,8 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
>   	struct drm_i915_private *i915 = rq->i915;
>   	struct intel_engine_cs *engine = rq->engine;
>   	enum intel_engine_id id;
> -	const int num_rings =
> -		IS_HSW_GT1(i915) ? RUNTIME_INFO(i915)->num_rings - 1 : 0;
> +	const int num_engines =
> +		IS_HSW_GT1(i915) ? RUNTIME_INFO(i915)->num_engines - 1 : 0;
>   	bool force_restore = false;
>   	int len;
>   	u32 *cs;
> @@ -1707,7 +1707,7 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
>   
>   	len = 4;
>   	if (IS_GEN(i915, 7))
> -		len += 2 + (num_rings ? 4*num_rings + 6 : 0);
> +		len += 2 + (num_engines ? 4 * num_engines + 6 : 0);
>   	if (flags & MI_FORCE_RESTORE) {
>   		GEM_BUG_ON(flags & MI_RESTORE_INHIBIT);
>   		flags &= ~MI_FORCE_RESTORE;
> @@ -1722,10 +1722,10 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
>   	/* WaProgramMiArbOnOffAroundMiSetContext:ivb,vlv,hsw,bdw,chv */
>   	if (IS_GEN(i915, 7)) {
>   		*cs++ = MI_ARB_ON_OFF | MI_ARB_DISABLE;
> -		if (num_rings) {
> +		if (num_engines) {
>   			struct intel_engine_cs *signaller;
>   
> -			*cs++ = MI_LOAD_REGISTER_IMM(num_rings);
> +			*cs++ = MI_LOAD_REGISTER_IMM(num_engines);
>   			for_each_engine(signaller, i915, id) {
>   				if (signaller == engine)
>   					continue;
> @@ -1768,11 +1768,11 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
>   	*cs++ = MI_NOOP;
>   
>   	if (IS_GEN(i915, 7)) {
> -		if (num_rings) {
> +		if (num_engines) {
>   			struct intel_engine_cs *signaller;
>   			i915_reg_t last_reg = {}; /* keep gcc quiet */
>   
> -			*cs++ = MI_LOAD_REGISTER_IMM(num_rings);
> +			*cs++ = MI_LOAD_REGISTER_IMM(num_engines);
>   			for_each_engine(signaller, i915, id) {
>   				if (signaller == engine)
>   					continue;
> @@ -1859,8 +1859,8 @@ static int switch_context(struct i915_request *rq)
>   				goto err;
>   		} while (--loops);
>   
> -		if (intel_engine_flag(engine) & ppgtt->pd_dirty_rings) {
> -			unwind_mm = intel_engine_flag(engine);
> +		if (ppgtt->pd_dirty_rings & engine->mask) {
> +			unwind_mm = engine->mask;
>   			ppgtt->pd_dirty_rings &= ~unwind_mm;
>   			hw_flags = MI_FORCE_RESTORE;
>   		}
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 2e7264119ec4..850ad892836c 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -10,12 +10,12 @@
>   #include <linux/seqlock.h>
>   
>   #include "i915_gem_batch_pool.h"
> -
> -#include "i915_reg.h"
>   #include "i915_pmu.h"
> +#include "i915_reg.h"
>   #include "i915_request.h"
>   #include "i915_selftest.h"
>   #include "i915_timeline.h"
> +#include "intel_device_info.h"
>   #include "intel_gpu_commands.h"
>   #include "intel_workarounds.h"
>   
> @@ -334,6 +334,7 @@ struct intel_engine_cs {
>   	enum intel_engine_id id;
>   	unsigned int hw_id;
>   	unsigned int guc_id;
> +	intel_engine_mask_t mask;
>   
>   	u8 uabi_id;
>   	u8 uabi_class;
> @@ -668,12 +669,6 @@ execlists_port_complete(struct intel_engine_execlists * const execlists,
>   	return port;
>   }
>   
> -static inline unsigned int
> -intel_engine_flag(const struct intel_engine_cs *engine)
> -{
> -	return BIT(engine->id);
> -}
> -
>   static inline u32
>   intel_read_status_page(const struct intel_engine_cs *engine, int reg)
>   {
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
> index c7ea3535264a..1c608b60c55c 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
> @@ -556,7 +556,7 @@ static int igt_ctx_exec(void *arg)
>   		ncontexts++;
>   	}
>   	pr_info("Submitted %lu contexts (across %u engines), filling %lu dwords\n",
> -		ncontexts, RUNTIME_INFO(i915)->num_rings, ndwords);
> +		ncontexts, RUNTIME_INFO(i915)->num_engines, ndwords);
>   
>   	dw = 0;
>   	list_for_each_entry(obj, &objects, st_link) {
> @@ -1126,7 +1126,7 @@ static int igt_ctx_readonly(void *arg)
>   		}
>   	}
>   	pr_info("Submitted %lu dwords (across %u engines)\n",
> -		ndwords, RUNTIME_INFO(i915)->num_rings);
> +		ndwords, RUNTIME_INFO(i915)->num_engines);
>   
>   	dw = 0;
>   	list_for_each_entry(obj, &objects, st_link) {
> @@ -1459,7 +1459,7 @@ static int igt_vm_isolation(void *arg)
>   		count += this;
>   	}
>   	pr_info("Checked %lu scratch offsets across %d engines\n",
> -		count, RUNTIME_INFO(i915)->num_rings);
> +		count, RUNTIME_INFO(i915)->num_engines);
>   
>   out_rpm:
>   	intel_runtime_pm_put(i915, wakeref);
> diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
> index 7e1b65b8eb19..29875c28a544 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_request.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_request.c
> @@ -1216,7 +1216,7 @@ static int live_breadcrumbs_smoketest(void *arg)
>   		num_fences += atomic_long_read(&t[id].num_fences);
>   	}
>   	pr_info("Completed %lu waits for %lu fences across %d engines and %d cpus\n",
> -		num_waits, num_fences, RUNTIME_INFO(i915)->num_rings, ncpus);
> +		num_waits, num_fences, RUNTIME_INFO(i915)->num_engines, ncpus);
>   
>   	mutex_lock(&i915->drm.struct_mutex);
>   	ret = igt_live_test_end(&live) ?: ret;
> diff --git a/drivers/gpu/drm/i915/selftests/intel_guc.c b/drivers/gpu/drm/i915/selftests/intel_guc.c
> index c5e0a0e98fcb..b05a21eaa8f4 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_guc.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_guc.c
> @@ -111,7 +111,7 @@ static int validate_client(struct intel_guc_client *client,
>   			dev_priv->preempt_context : dev_priv->kernel_context;
>   
>   	if (client->owner != ctx_owner ||
> -	    client->engines != INTEL_INFO(dev_priv)->ring_mask ||
> +	    client->engines != INTEL_INFO(dev_priv)->engine_mask ||
>   	    client->priority != client_priority ||
>   	    client->doorbell_id == GUC_DOORBELL_INVALID)
>   		return -EINVAL;
> @@ -261,7 +261,7 @@ static int igt_guc_doorbells(void *arg)
>   
>   	for (i = 0; i < ATTEMPTS; i++) {
>   		clients[i] = guc_client_alloc(dev_priv,
> -					      INTEL_INFO(dev_priv)->ring_mask,
> +					      INTEL_INFO(dev_priv)->engine_mask,
>   					      i % GUC_CLIENT_PRIORITY_NUM,
>   					      dev_priv->kernel_context);
>   
> diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> index 12e047328ab8..5624a238244e 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> @@ -1358,7 +1358,7 @@ static int __igt_reset_evict_vma(struct drm_i915_private *i915,
>   
>   out_reset:
>   	igt_global_reset_lock(i915);
> -	fake_hangcheck(rq->i915, intel_engine_flag(rq->engine));
> +	fake_hangcheck(rq->i915, rq->engine->mask);
>   	igt_global_reset_unlock(i915);
>   
>   	if (tsk) {
> @@ -1643,7 +1643,7 @@ static int igt_handle_error(void *arg)
>   	/* Temporarily disable error capture */
>   	error = xchg(&i915->gpu_error.first_error, (void *)-1);
>   
> -	i915_handle_error(i915, ENGINE_MASK(engine->id), 0, NULL);
> +	i915_handle_error(i915, engine->mask, 0, NULL);
>   
>   	xchg(&i915->gpu_error.first_error, error);
>   
> diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
> index 0677038a5466..565e949a4722 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
> @@ -929,7 +929,7 @@ static int smoke_crescendo(struct preempt_smoke *smoke, unsigned int flags)
>   
>   	pr_info("Submitted %lu crescendo:%x requests across %d engines and %d contexts\n",
>   		count, flags,
> -		RUNTIME_INFO(smoke->i915)->num_rings, smoke->ncontext);
> +		RUNTIME_INFO(smoke->i915)->num_engines, smoke->ncontext);
>   	return 0;
>   }
>   
> @@ -957,7 +957,7 @@ static int smoke_random(struct preempt_smoke *smoke, unsigned int flags)
>   
>   	pr_info("Submitted %lu random:%x requests across %d engines and %d contexts\n",
>   		count, flags,
> -		RUNTIME_INFO(smoke->i915)->num_rings, smoke->ncontext);
> +		RUNTIME_INFO(smoke->i915)->num_engines, smoke->ncontext);
>   	return 0;
>   }
>   
> diff --git a/drivers/gpu/drm/i915/selftests/intel_workarounds.c b/drivers/gpu/drm/i915/selftests/intel_workarounds.c
> index 33b3ced83fde..79a836d21abf 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_workarounds.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_workarounds.c
> @@ -222,7 +222,7 @@ static int check_whitelist(struct i915_gem_context *ctx,
>   
>   static int do_device_reset(struct intel_engine_cs *engine)
>   {
> -	i915_reset(engine->i915, ENGINE_MASK(engine->id), "live_workarounds");
> +	i915_reset(engine->i915, engine->mask, "live_workarounds");
>   	return 0;
>   }
>   
> diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
> index ec1ae948954c..c2c954f64226 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_engine.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
> @@ -223,6 +223,7 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
>   	engine->base.i915 = i915;
>   	snprintf(engine->base.name, sizeof(engine->base.name), "%s", name);
>   	engine->base.id = id;
> +	engine->base.mask = BIT(id);
>   	engine->base.status_page.addr = (void *)(engine + 1);
>   
>   	engine->base.context_pin = mock_context_pin;
> diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> index c27616efc4f8..8581cf5e0e8c 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> @@ -206,7 +206,7 @@ struct drm_i915_private *mock_gem_device(void)
>   
>   	mock_init_ggtt(i915, &i915->ggtt);
>   
> -	mkwrite_device_info(i915)->ring_mask = BIT(0);
> +	mkwrite_device_info(i915)->engine_mask = BIT(0);
>   	i915->kernel_context = mock_context(i915, NULL);
>   	if (!i915->kernel_context)
>   		goto err_unlock;
> 

Looks good. I would consider making the patch bigger since one merge 
conflict is better than two.

Regards,

Tvrtko


* Re: [PATCH 06/38] drm/i915/selftests: Check that whitelisted registers are accessible
  2019-03-01 15:18   ` Michał Winiarski
@ 2019-03-01 15:25     ` Chris Wilson
  0 siblings, 0 replies; 88+ messages in thread
From: Chris Wilson @ 2019-03-01 15:25 UTC (permalink / raw)
  To: Michał Winiarski; +Cc: intel-gfx

Quoting Michał Winiarski (2019-03-01 15:18:58)
> On Fri, Mar 01, 2019 at 02:03:32PM +0000, Chris Wilson wrote:
> > +static struct i915_vma *create_scratch(struct i915_gem_context *ctx)
> > +{
> > +     struct drm_i915_gem_object *obj;
> > +     struct i915_vma *vma;
> > +     void *ptr;
> > +     int err;
> > +
> > +     obj = i915_gem_object_create_internal(ctx->i915, PAGE_SIZE);
> > +     if (IS_ERR(obj))
> > +             return ERR_CAST(obj);
> > +
> > +     i915_gem_object_set_cache_level(obj, I915_CACHE_LLC);
> 
> Check return value.

Can't fail, but I will transform it into set_coherency, which is void,
so you can't complain :-p

> > +     ptr = i915_gem_object_pin_map(obj, I915_MAP_WB);
> > +     if (IS_ERR(ptr)) {
> > +             err = PTR_ERR(ptr);
> > +             goto err_obj;
> > +     }
> > +     memset(ptr, 0xc5, PAGE_SIZE);
> > +     i915_gem_object_unpin_map(obj);
> > +
> > +     vma = i915_vma_instance(obj, &ctx->ppgtt->vm, NULL);
> > +     if (IS_ERR(vma)) {
> > +             err = PTR_ERR(vma);
> > +             goto err_obj;
> > +     }
> > +
> > +     err = i915_vma_pin(vma, 0, 0, PIN_USER);
> > +     if (err)
> > +             goto err_obj;
> > +
> > +     err = i915_gem_object_set_to_cpu_domain(obj, false);
> > +     if (err)
> > +             goto err_obj;
> > +
> > +     return vma;
> > +
> > +err_obj:
> > +     i915_gem_object_put(obj);
> > +     return ERR_PTR(err);
> > +}
> > +

[more snip]

> > +             if (wo_register(engine, reg))
> > +                     continue;
> > +
> > +             srm = MI_STORE_REGISTER_MEM;
> > +             lrm = MI_LOAD_REGISTER_MEM;
> > +             if (INTEL_GEN(ctx->i915) >= 8)
> > +                     lrm++, srm++;
> > +
> > +             pr_debug("%s: Writing garbage to %x {srm=0x%08x, lrm=0x%08x}\n",
> > +                      engine->name, reg, srm, lrm);
> 
> Why are we printing opcodes (srm/lrm)?

It is only in a pr_debug, can you guess why? Because despite making an
lrm variable I used MI_LRM later on and spent a few runs wondering why
the GPU kept hanging with the wrong opcode. Consider it gone.

> > +             cs = i915_gem_object_pin_map(scratch->obj, I915_MAP_WB);
> > +             if (IS_ERR(cs)) {
> > +                     err = PTR_ERR(cs);
> > +                     goto out_batch;
> > +             }
> 
> We're already using cs for batch! Extra pointer pls.

Will someone think of the poor electrons! Or is it more jobs for all!

> > +
> > +             GEM_BUG_ON(values[ARRAY_SIZE(values) - 1] != 0xffffffff);
> > +             rsvd = cs[ARRAY_SIZE(values)]; /* detect write masking */
> 
> So we're writing 0xffffffff to get the mask. And there's a comment. And it will
> explode if someone changes the last value.
> 
> Reviewed-by: Michał Winiarski <michal.winiarski@intel.com>

It'll do for now, there's a bit more I think we can improve on, but
incremental steps.
-Chris

* Re: [PATCH 21/38] drm/i915: Fix I915_EXEC_RING_MASK
  2019-03-01 14:03 ` [PATCH 21/38] drm/i915: Fix I915_EXEC_RING_MASK Chris Wilson
@ 2019-03-01 15:29   ` Tvrtko Ursulin
  0 siblings, 0 replies; 88+ messages in thread
From: Tvrtko Ursulin @ 2019-03-01 15:29 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Tvrtko Ursulin, stable


On 01/03/2019 14:03, Chris Wilson wrote:
> This was supposed to be a mask of all known rings, but it is being used
> by execbuffer to filter out invalid rings, and so is instead mapping high
> unused values onto valid rings. Instead of a mask of all known rings,
> we need it to be the mask of all possible rings.
> 
> Fixes: 549f7365820a ("drm/i915: Enable SandyBridge blitter ring")
> Fixes: de1add360522 ("drm/i915: Decouple execbuf uAPI from internal implementation")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: <stable@vger.kernel.org> # v4.6+
> ---
>   include/uapi/drm/i915_drm.h | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 60cbb2e4f140..4e59cc87527b 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1004,7 +1004,7 @@ struct drm_i915_gem_execbuffer2 {
>   	 * struct drm_i915_gem_exec_fence *fences.
>   	 */
>   	__u64 cliprects_ptr;
> -#define I915_EXEC_RING_MASK              (7<<0)
> +#define I915_EXEC_RING_MASK              (0x3f)
>   #define I915_EXEC_DEFAULT                (0<<0)
>   #define I915_EXEC_RENDER                 (1<<0)
>   #define I915_EXEC_BSD                    (2<<0)
> 

Easy pickings first.
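
The aliasing is easy to demonstrate (illustrative values, not driver
code):

	unsigned int ring = args->flags & I915_EXEC_RING_MASK;

	/* old mask: 9 & 7    == 1 == I915_EXEC_RENDER, silently accepted */
	/* new mask: 9 & 0x3f == 9, out of range, so it can be rejected   */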

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko


* Re: [PATCH 23/38] drm/i915: Re-arrange execbuf so context is known before engine
  2019-03-01 14:03 ` [PATCH 23/38] drm/i915: Re-arrange execbuf so context is known before engine Chris Wilson
@ 2019-03-01 15:33   ` Tvrtko Ursulin
  2019-03-01 19:11     ` Chris Wilson
  0 siblings, 1 reply; 88+ messages in thread
From: Tvrtko Ursulin @ 2019-03-01 15:33 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 01/03/2019 14:03, Chris Wilson wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Needed for a following patch.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

I'll do yours, you do mine. Criss-cross. Now that's an olden but golden 
reference. :)

Regards,

Tvrtko

> ---
>   drivers/gpu/drm/i915/i915_gem_execbuffer.c | 11 +++++++----
>   1 file changed, 7 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 07c0af316f86..53d0d70c97fa 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -2312,10 +2312,6 @@ i915_gem_do_execbuffer(struct drm_device *dev,
>   	if (args->flags & I915_EXEC_IS_PINNED)
>   		eb.batch_flags |= I915_DISPATCH_PINNED;
>   
> -	eb.engine = eb_select_engine(eb.i915, file, args);
> -	if (!eb.engine)
> -		return -EINVAL;
> -
>   	if (args->flags & I915_EXEC_FENCE_IN) {
>   		in_fence = sync_file_get_fence(lower_32_bits(args->rsvd2));
>   		if (!in_fence)
> @@ -2340,6 +2336,12 @@ i915_gem_do_execbuffer(struct drm_device *dev,
>   	if (unlikely(err))
>   		goto err_destroy;
>   
> +	eb.engine = eb_select_engine(eb.i915, file, args);
> +	if (!eb.engine) {
> +		err = -EINVAL;
> +		goto err_engine;
> +	}
> +
>   	/*
>   	 * Take a local wakeref for preparing to dispatch the execbuf as
>   	 * we expect to access the hardware fairly frequently in the
> @@ -2505,6 +2507,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
>   	mutex_unlock(&dev->struct_mutex);
>   err_rpm:
>   	intel_runtime_pm_put(eb.i915, wakeref);
> +err_engine:
>   	i915_gem_context_put(eb.ctx);
>   err_destroy:
>   	eb_destroy(&eb);
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 14/38] drm/i915: Introduce the i915_user_extension_method
  2019-03-01 14:03 ` [PATCH 14/38] drm/i915: Introduce the i915_user_extension_method Chris Wilson
@ 2019-03-01 15:39   ` Tvrtko Ursulin
  2019-03-01 18:57     ` Chris Wilson
  0 siblings, 1 reply; 88+ messages in thread
From: Tvrtko Ursulin @ 2019-03-01 15:39 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 01/03/2019 14:03, Chris Wilson wrote:
> An idea for extending uABI inspired by Vulkan's extension chains.
> Instead of expanding the data struct for each ioctl every time we need
> to add a new feature, define an extension chain instead. As we add
> optional interfaces to control the ioctl, we define a new extension
> struct that can be linked into the ioctl data only when required by the
> user. The key advantage being able to ignore large control structs for
> optional interfaces/extensions, while being able to process them in a
> consistent manner.
> 
> In comparison to other extensible ioctls, the key difference is the
> use of a linked chain of extension structs vs an array of tagged
> pointers. For example,
> 
> struct drm_amdgpu_cs_chunk {
>          __u32           chunk_id;
>          __u32           length_dw;
>          __u64           chunk_data;
> };
> 
> struct drm_amdgpu_cs_in {
>          __u32           ctx_id;
>          __u32           bo_list_handle;
>          __u32           num_chunks;
>          __u32           _pad;
>          __u64           chunks;
> };
> 
> allows userspace to pass in an array of pointers to extension structs, but
> must therefore keep constructing that array alongside the command stream.
> In dynamic situations like that, a linked list is preferred and does not
> suffer from extra cache line misses, as the extension structs themselves
> must still be loaded separately from the chunks array.
> 
> v2: Apply the tail call optimisation directly to nip the worry of stack
> overflow in the bud.
> v3: Defend against recursion.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/Makefile               |  1 +
>   drivers/gpu/drm/i915/i915_user_extensions.c | 43 +++++++++++++++++++++
>   drivers/gpu/drm/i915/i915_user_extensions.h | 20 ++++++++++
>   drivers/gpu/drm/i915/i915_utils.h           |  7 ++++
>   include/uapi/drm/i915_drm.h                 | 20 ++++++++++
>   5 files changed, 91 insertions(+)
>   create mode 100644 drivers/gpu/drm/i915/i915_user_extensions.c
>   create mode 100644 drivers/gpu/drm/i915/i915_user_extensions.h
> 
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index a1d834068765..89105b1aaf12 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -46,6 +46,7 @@ i915-y := i915_drv.o \
>   	  i915_sw_fence.o \
>   	  i915_syncmap.o \
>   	  i915_sysfs.o \
> +	  i915_user_extensions.o \
>   	  intel_csr.o \
>   	  intel_device_info.o \
>   	  intel_pm.o \
> diff --git a/drivers/gpu/drm/i915/i915_user_extensions.c b/drivers/gpu/drm/i915/i915_user_extensions.c
> new file mode 100644
> index 000000000000..879b4094b2d7
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/i915_user_extensions.c
> @@ -0,0 +1,43 @@
> +/*
> + * SPDX-License-Identifier: MIT
> + *
> + * Copyright © 2018 Intel Corporation
> + */
> +
> +#include <linux/sched/signal.h>
> +#include <linux/uaccess.h>
> +#include <uapi/drm/i915_drm.h>
> +
> +#include "i915_user_extensions.h"
> +
> +int i915_user_extensions(struct i915_user_extension __user *ext,
> +			 const i915_user_extension_fn *tbl,
> +			 unsigned long count,
> +			 void *data)
> +{
> +	unsigned int stackdepth = 512;
> +
> +	while (ext) {
> +		int err;
> +		u64 x;
> +
> +		if (!stackdepth--) /* recursion vs useful flexibility */
> +			return -EINVAL;

I don't get this. What stack? Did you mean "static unsigned int 
stackdepth" in case someone puts i915_user_extension into the extension 
table? Or is it just a limit on the number of chained extensions? But you 
are not processing them recursively here.
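For context, the chain itself is just a singly linked list assembled in 
userspace memory; a sketch, with hypothetical extension names:

	struct i915_user_extension ext_b = {
		.next_extension = 0,	/* end of chain */
		.name = HYPOTHETICAL_EXT_B,
	};
	struct i915_user_extension ext_a = {
		.next_extension = (uintptr_t)&ext_b,
		.name = HYPOTHETICAL_EXT_A,
	};
	/* the ioctl then carries (uintptr_t)&ext_a and the kernel
	 * walks ->next_extension until it reads 0 */

So presumably the 512 is a guard against a self-referencing chain 
spinning forever, rather than against any real stack growth.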

Regards,

Tvrtko

> +
> +		if (get_user(x, &ext->name))
> +			return -EFAULT;
> +
> +		err = -EINVAL;
> +		if (x < count && tbl[x])
> +			err = tbl[x](ext, data);
> +		if (err)
> +			return err;
> +
> +		if (get_user(x, &ext->next_extension))
> +			return -EFAULT;
> +
> +		ext = u64_to_user_ptr(x);
> +	}
> +
> +	return 0;
> +}
> diff --git a/drivers/gpu/drm/i915/i915_user_extensions.h b/drivers/gpu/drm/i915/i915_user_extensions.h
> new file mode 100644
> index 000000000000..313a510b068a
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/i915_user_extensions.h
> @@ -0,0 +1,20 @@
> +/*
> + * SPDX-License-Identifier: MIT
> + *
> + * Copyright © 2018 Intel Corporation
> + */
> +
> +#ifndef I915_USER_EXTENSIONS_H
> +#define I915_USER_EXTENSIONS_H
> +
> +struct i915_user_extension;
> +
> +typedef int (*i915_user_extension_fn)(struct i915_user_extension __user *ext,
> +				      void *data);
> +
> +int i915_user_extensions(struct i915_user_extension __user *ext,
> +			 const i915_user_extension_fn *tbl,
> +			 unsigned long count,
> +			 void *data);
> +
> +#endif /* I915_USER_EXTENSIONS_H */
> diff --git a/drivers/gpu/drm/i915/i915_utils.h b/drivers/gpu/drm/i915/i915_utils.h
> index 9726df37c4c4..fcc751aa1ea8 100644
> --- a/drivers/gpu/drm/i915/i915_utils.h
> +++ b/drivers/gpu/drm/i915/i915_utils.h
> @@ -105,6 +105,13 @@
>   	__T;								\
>   })
>   
> +#define container_of_user(ptr, type, member) ({				\
> +	void __user *__mptr = (void __user *)(ptr);			\
> +	BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) &&	\
> +			 !__same_type(*(ptr), void),			\
> +			 "pointer type mismatch in container_of()");	\
> +	((type __user *)(__mptr - offsetof(type, member))); })
> +
>   static inline u64 ptr_to_u64(const void *ptr)
>   {
>   	return (uintptr_t)ptr;
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index b10eea3f6d24..f4fa0825722a 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -62,6 +62,26 @@ extern "C" {
>   #define I915_ERROR_UEVENT		"ERROR"
>   #define I915_RESET_UEVENT		"RESET"
>   
> +/*
> + * i915_user_extension: Base class for defining a chain of extensions
> + *
> + * Many interfaces need to grow over time. In most cases we can simply
> + * extend the struct and have userspace pass in more data. Another option,
> + * as demonstrated by Vulkan's approach to providing extensions for forward
> + * and backward compatibility, is to use a list of optional structs to
> + * provide those extra details.
> + *
> + * The key advantage to using an extension chain is that it allows us to
> + * redefine the interface more easily than an ever growing struct of
> + * increasing complexity, and for large parts of that interface to be
> + * entirely optional. The downside is more pointer chasing; chasing across
> + * the __user boundary with pointers encapsulated inside u64.
> + */
> +struct i915_user_extension {
> +	__u64 next_extension;
> +	__u64 name;
> +};
> +
>   /*
>    * MOCS indexes used for GPU surfaces, defining the cacheability of the
>    * surface data and the coherency for this data wrt. CPU vs. GPU accesses.
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 15/38] drm/i915: Track active engines within a context
  2019-03-01 14:03 ` [PATCH 15/38] drm/i915: Track active engines within a context Chris Wilson
@ 2019-03-01 15:46   ` Tvrtko Ursulin
  0 siblings, 0 replies; 88+ messages in thread
From: Tvrtko Ursulin @ 2019-03-01 15:46 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 01/03/2019 14:03, Chris Wilson wrote:
> For use in the next patch, if we track which engines have been used by
> the HW, we can reduce the work required to flush our state off the HW to
> those engines.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_debugfs.c           | 18 +++++----------
>   drivers/gpu/drm/i915/i915_gem_context.c       |  5 +++++
>   drivers/gpu/drm/i915/i915_gem_context.h       |  5 +++++
>   drivers/gpu/drm/i915/intel_lrc.c              | 22 +++++++++----------
>   drivers/gpu/drm/i915/intel_ringbuffer.c       | 14 +++++++-----
>   drivers/gpu/drm/i915/selftests/mock_context.c |  2 ++
>   drivers/gpu/drm/i915/selftests/mock_engine.c  |  6 +++++
>   7 files changed, 43 insertions(+), 29 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 298371aad445..36c63b087ffd 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -388,12 +388,9 @@ static void print_context_stats(struct seq_file *m,
>   	struct i915_gem_context *ctx;
>   
>   	list_for_each_entry(ctx, &i915->contexts.list, link) {
> -		struct intel_engine_cs *engine;
> -		enum intel_engine_id id;
> -
> -		for_each_engine(engine, i915, id) {
> -			struct intel_context *ce = to_intel_context(ctx, engine);
> +		struct intel_context *ce;
>   
> +		list_for_each_entry(ce, &ctx->active_engines, active_link) {
>   			if (ce->state)
>   				per_file_stats(0, ce->state->obj, &kstats);
>   			if (ce->ring)
> @@ -1880,9 +1877,7 @@ static int i915_context_status(struct seq_file *m, void *unused)
>   {
>   	struct drm_i915_private *dev_priv = node_to_i915(m->private);
>   	struct drm_device *dev = &dev_priv->drm;
> -	struct intel_engine_cs *engine;
>   	struct i915_gem_context *ctx;
> -	enum intel_engine_id id;
>   	int ret;
>   
>   	ret = mutex_lock_interruptible(&dev->struct_mutex);
> @@ -1890,6 +1885,8 @@ static int i915_context_status(struct seq_file *m, void *unused)
>   		return ret;
>   
>   	list_for_each_entry(ctx, &dev_priv->contexts.list, link) {
> +		struct intel_context *ce;
> +
>   		seq_puts(m, "HW context ");
>   		if (!list_empty(&ctx->hw_id_link))
>   			seq_printf(m, "%x [pin %u]", ctx->hw_id,
> @@ -1912,11 +1909,8 @@ static int i915_context_status(struct seq_file *m, void *unused)
>   		seq_putc(m, ctx->remap_slice ? 'R' : 'r');
>   		seq_putc(m, '\n');
>   
> -		for_each_engine(engine, dev_priv, id) {
> -			struct intel_context *ce =
> -				to_intel_context(ctx, engine);
> -
> -			seq_printf(m, "%s: ", engine->name);
> +		list_for_each_entry(ce, &ctx->active_engines, active_link) {
> +			seq_printf(m, "%s: ", ce->engine->name);
>   			if (ce->state)
>   				describe_obj(m, ce->state->obj);
>   			if (ce->ring)
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 004ffcfb305d..3b5145b30d85 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -224,6 +224,7 @@ static void i915_gem_context_free(struct i915_gem_context *ctx)
>   
>   	lockdep_assert_held(&ctx->i915->drm.struct_mutex);
>   	GEM_BUG_ON(!i915_gem_context_is_closed(ctx));
> +	GEM_BUG_ON(!list_empty(&ctx->active_engines));
>   
>   	release_hw_id(ctx);
>   	i915_ppgtt_put(ctx->ppgtt);
> @@ -239,6 +240,7 @@ static void i915_gem_context_free(struct i915_gem_context *ctx)
>   	put_pid(ctx->pid);
>   
>   	list_del(&ctx->link);
> +	mutex_destroy(&ctx->mutex);
>   
>   	kfree_rcu(ctx, rcu);
>   }
> @@ -351,6 +353,7 @@ intel_context_init(struct intel_context *ce,
>   		   struct intel_engine_cs *engine)
>   {
>   	ce->gem_context = ctx;
> +	ce->engine = engine;
>   
>   	INIT_LIST_HEAD(&ce->signal_link);
>   	INIT_LIST_HEAD(&ce->signals);
> @@ -379,6 +382,8 @@ __create_hw_context(struct drm_i915_private *dev_priv,
>   	list_add_tail(&ctx->link, &dev_priv->contexts.list);
>   	ctx->i915 = dev_priv;
>   	ctx->sched.priority = I915_USER_PRIORITY(I915_PRIORITY_NORMAL);
> +	INIT_LIST_HEAD(&ctx->active_engines);
> +	mutex_init(&ctx->mutex);
>   
>   	for (n = 0; n < ARRAY_SIZE(ctx->__engine); n++)
>   		intel_context_init(&ctx->__engine[n], ctx, dev_priv->engine[n]);
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
> index c39dbb32a5c6..df48013b581e 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.h
> +++ b/drivers/gpu/drm/i915/i915_gem_context.h
> @@ -163,6 +163,9 @@ struct i915_gem_context {
>   	atomic_t hw_id_pin_count;
>   	struct list_head hw_id_link;
>   
> +	struct list_head active_engines;
> +	struct mutex mutex;
> +
>   	/**
>   	 * @user_handle: userspace identifier
>   	 *
> @@ -176,7 +179,9 @@ struct i915_gem_context {
>   	/** engine: per-engine logical HW state */
>   	struct intel_context {
>   		struct i915_gem_context *gem_context;
> +		struct intel_engine_cs *engine;
>   		struct intel_engine_cs *active;
> +		struct list_head active_link;
>   		struct list_head signal_link;
>   		struct list_head signals;
>   		struct i915_vma *state;
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 2268860cca44..50a7c87705c8 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1276,6 +1276,7 @@ static void execlists_context_unpin(struct intel_context *ce)
>   	i915_gem_object_unpin_map(ce->state->obj);
>   	i915_vma_unpin(ce->state);
>   
> +	list_del(&ce->active_link);
>   	i915_gem_context_put(ce->gem_context);
>   }
>   
> @@ -1360,6 +1361,11 @@ __execlists_context_pin(struct intel_engine_cs *engine,
>   	__execlists_update_reg_state(engine, ce);
>   
>   	ce->state->obj->pin_global++;
> +
> +	mutex_lock(&ctx->mutex);
> +	list_add(&ce->active_link, &ctx->active_engines);
> +	mutex_unlock(&ctx->mutex);
> +
>   	i915_gem_context_get(ctx);
>   	return ce;
>   
> @@ -2880,9 +2886,8 @@ static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
>   
>   void intel_lr_context_resume(struct drm_i915_private *i915)
>   {
> -	struct intel_engine_cs *engine;
>   	struct i915_gem_context *ctx;
> -	enum intel_engine_id id;
> +	struct intel_context *ce;
>   
>   	/*
>   	 * Because we emit WA_TAIL_DWORDS there may be a disparity
> @@ -2896,17 +2901,10 @@ void intel_lr_context_resume(struct drm_i915_private *i915)
>   	 * simplicity, we just zero everything out.
>   	 */
>   	list_for_each_entry(ctx, &i915->contexts.list, link) {
> -		for_each_engine(engine, i915, id) {
> -			struct intel_context *ce =
> -				to_intel_context(ctx, engine);
> -
> -			if (!ce->state)
> -				continue;
> -
> +		list_for_each_entry(ce, &ctx->active_engines, active_link) {
> +			GEM_BUG_ON(!ce->ring);
>   			intel_ring_reset(ce->ring, 0);
> -
> -			if (ce->pin_count) /* otherwise done in context_pin */
> -				__execlists_update_reg_state(engine, ce);
> +			__execlists_update_reg_state(ce->engine, ce);
>   		}
>   	}
>   }
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index e78594ecba6f..764dcc5d5856 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1430,6 +1430,7 @@ static void intel_ring_context_unpin(struct intel_context *ce)
>   	__context_unpin_ppgtt(ce->gem_context);
>   	__context_unpin(ce);
>   
> +	list_del(&ce->active_link);
>   	i915_gem_context_put(ce->gem_context);
>   }
>   
> @@ -1509,6 +1510,10 @@ __ring_context_pin(struct intel_engine_cs *engine,
>   {
>   	int err;
>   
> +	/* One ringbuffer to rule them all */
> +	GEM_BUG_ON(!engine->buffer);
> +	ce->ring = engine->buffer;
> +
>   	if (!ce->state && engine->context_size) {
>   		struct i915_vma *vma;
>   
> @@ -1529,12 +1534,11 @@ __ring_context_pin(struct intel_engine_cs *engine,
>   	if (err)
>   		goto err_unpin;
>   
> -	i915_gem_context_get(ctx);
> -
> -	/* One ringbuffer to rule them all */
> -	GEM_BUG_ON(!engine->buffer);
> -	ce->ring = engine->buffer;
> +	mutex_lock(&ctx->mutex);
> +	list_add(&ce->active_link, &ctx->active_engines);
> +	mutex_unlock(&ctx->mutex);
>   
> +	i915_gem_context_get(ctx);
>   	return ce;
>   
>   err_unpin:
> diff --git a/drivers/gpu/drm/i915/selftests/mock_context.c b/drivers/gpu/drm/i915/selftests/mock_context.c
> index b646cdcdd602..353b37b9f78e 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_context.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_context.c
> @@ -44,6 +44,8 @@ mock_context(struct drm_i915_private *i915,
>   	INIT_RADIX_TREE(&ctx->handles_vma, GFP_KERNEL);
>   	INIT_LIST_HEAD(&ctx->handles_list);
>   	INIT_LIST_HEAD(&ctx->hw_id_link);
> +	INIT_LIST_HEAD(&ctx->active_engines);
> +	mutex_init(&ctx->mutex);
>   
>   	for (n = 0; n < ARRAY_SIZE(ctx->__engine); n++)
>   		intel_context_init(&ctx->__engine[n], ctx, i915->engine[n]);
> diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
> index c2c954f64226..8032a8a9542f 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_engine.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
> @@ -125,6 +125,7 @@ static void hw_delay_complete(struct timer_list *t)
>   static void mock_context_unpin(struct intel_context *ce)
>   {
>   	mock_timeline_unpin(ce->ring->timeline);
> +	list_del(&ce->active_link);
>   	i915_gem_context_put(ce->gem_context);
>   }
>   
> @@ -160,6 +161,11 @@ mock_context_pin(struct intel_engine_cs *engine,
>   	mock_timeline_pin(ce->ring->timeline);
>   
>   	ce->ops = &mock_context_ops;
> +
> +	mutex_lock(&ctx->mutex);
> +	list_add(&ce->active_link, &ctx->active_engines);
> +	mutex_unlock(&ctx->mutex);
> +
>   	i915_gem_context_get(ctx);
>   	return ce;
>   
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 16/38] drm/i915: Introduce a context barrier callback
  2019-03-01 14:03 ` [PATCH 16/38] drm/i915: Introduce a context barrier callback Chris Wilson
@ 2019-03-01 16:12   ` Tvrtko Ursulin
  2019-03-01 19:03     ` Chris Wilson
  2019-03-02 10:01     ` Chris Wilson
  0 siblings, 2 replies; 88+ messages in thread
From: Tvrtko Ursulin @ 2019-03-01 16:12 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 01/03/2019 14:03, Chris Wilson wrote:
> In the next patch, we will want to update live state within a context.
> As this state may be in use by the GPU and we haven't been explicitly
> tracking its activity, we instead attach it to a request we send down
> the context setup with its new state, and on retiring that request we
> clean up the old state, as we then know that it is no longer live.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_gem_context.c       |  74 +++++++++++++
>   .../gpu/drm/i915/selftests/i915_gem_context.c | 103 ++++++++++++++++++
>   2 files changed, 177 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 3b5145b30d85..91926a407548 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -707,6 +707,80 @@ last_request_on_engine(struct i915_timeline *timeline,
>   	return NULL;
>   }
>   
> +struct context_barrier_task {
> +	struct i915_active base;
> +	void (*task)(void *data);
> +	void *data;
> +};
> +
> +static void cb_retire(struct i915_active *base)
> +{
> +	struct context_barrier_task *cb = container_of(base, typeof(*cb), base);
> +
> +	if (cb->task)
> +		cb->task(cb->data);
> +
> +	i915_active_fini(&cb->base);
> +	kfree(cb);
> +}
> +
> +I915_SELFTEST_DECLARE(static unsigned long context_barrier_inject_fault);
> +static int context_barrier_task(struct i915_gem_context *ctx,
> +				unsigned long engines,

I'm in two minds about the usefulness of intel_engine_mask_t.

> +				void (*task)(void *data),
> +				void *data)
> +{
> +	struct drm_i915_private *i915 = ctx->i915;
> +	struct context_barrier_task *cb;
> +	struct intel_context *ce;
> +	intel_wakeref_t wakeref;
> +	int err = 0;
> +
> +	lockdep_assert_held(&i915->drm.struct_mutex);
> +	GEM_BUG_ON(!task);
> +
> +	cb = kmalloc(sizeof(*cb), GFP_KERNEL);
> +	if (!cb)
> +		return -ENOMEM;
> +
> +	i915_active_init(i915, &cb->base, cb_retire);
> +	i915_active_acquire(&cb->base);
> +
> +	wakeref = intel_runtime_pm_get(i915);
> +	list_for_each_entry(ce, &ctx->active_engines, active_link) {
> +		struct intel_engine_cs *engine = ce->engine;
> +		struct i915_request *rq;
> +
> +		if (!(ce->engine->mask & engines))
> +			continue;
> +
> +		if (I915_SELFTEST_ONLY(context_barrier_inject_fault &
> +				       engine->mask)) {
> +			err = -ENXIO;
> +			break;
> +		}
> +
> +		rq = i915_request_alloc(engine, ctx);
> +		if (IS_ERR(rq)) {
> +			err = PTR_ERR(rq);
> +			break;
> +		}
> +
> +		err = i915_active_ref(&cb->base, rq->fence.context, rq);
> +		i915_request_add(rq);
> +		if (err)
> +			break;
> +	}
> +	intel_runtime_pm_put(i915, wakeref);
> +
> +	cb->task = err ? NULL : task; /* caller needs to unwind instead */
> +	cb->data = data;
> +
> +	i915_active_release(&cb->base);
> +
> +	return err;
> +}
> +
>   int i915_gem_switch_to_kernel_context(struct drm_i915_private *i915,
>   				      unsigned long mask)
>   {
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
> index 7ae5033457b6..4f7c04247354 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
> @@ -1594,10 +1594,113 @@ static int igt_switch_to_kernel_context(void *arg)
>   	return err;
>   }
>   
> +static void mock_barrier_task(void *data)
> +{
> +	unsigned int *counter = data;
> +
> +	++*counter;
> +}
> +
> +static int mock_context_barrier(void *arg)
> +{
> +#undef pr_fmt
> +#define pr_fmt(x) "context_barrier_task():" # x
> +	struct drm_i915_private *i915 = arg;
> +	struct i915_gem_context *ctx;
> +	struct i915_request *rq;
> +	intel_wakeref_t wakeref;
> +	unsigned int counter;
> +	int err;
> +
> +	/*
> +	 * The context barrier provides us with a callback after it emits
> +	 * a request; useful for retiring old state after loading new.
> +	 */
> +
> +	mutex_lock(&i915->drm.struct_mutex);
> +
> +	ctx = mock_context(i915, "mock");
> +	if (IS_ERR(ctx)) {
> +		err = PTR_ERR(ctx);
> +		goto unlock;
> +	}
> +
> +	counter = 0;
> +	err = context_barrier_task(ctx, 0, mock_barrier_task, &counter);
> +	if (err) {
> +		pr_err("Failed at line %d, err=%d\n", __LINE__, err);
> +		goto out;
> +	}
> +	if (counter == 0) {
> +		pr_err("Did not retire immediately with 0 engines\n");
> +		err = -EINVAL;
> +		goto out;
> +	}
> +
> +	counter = 0;
> +	err = context_barrier_task(ctx, -1, mock_barrier_task, &counter);
> +	if (err) {
> +		pr_err("Failed at line %d, err=%d\n", __LINE__, err);
> +		goto out;
> +	}
> +	if (counter == 0) {
> +		pr_err("Did not retire immediately for all inactive engines\n");

Why would this one retire immediately? It will send requests down the 
pipe, no? So don't you actually need to wait for the tracker to be 
signalled, and check that counter == num_engines?

> +		err = -EINVAL;
> +		goto out;
> +	}
> +
> +	rq = ERR_PTR(-ENODEV);
> +	with_intel_runtime_pm(i915, wakeref)
> +		rq = i915_request_alloc(i915->engine[RCS], ctx);
> +	if (IS_ERR(rq)) {
> +		pr_err("Request allocation failed!\n");
> +		goto out;
> +	}
> +	i915_request_add(rq);

Doesn't this need to go under the wakeref as well?

> +	GEM_BUG_ON(list_empty(&ctx->active_engines));
> +
> +	counter = 0;
> +	context_barrier_inject_fault = BIT(RCS);
> +	err = context_barrier_task(ctx, -1, mock_barrier_task, &counter);
> +	context_barrier_inject_fault = 0;
> +	if (err == -ENXIO)
> +		err = 0;
> +	else
> +		pr_err("Did not hit fault injection!\n");
> +	if (counter != 0) {
> +		pr_err("Invoked callback on error!\n");
> +		err = -EIO;
> +	}
> +	if (err)
> +		goto out;
> +
> +	counter = 0;
> +	err = context_barrier_task(ctx, -1, mock_barrier_task, &counter);
> +	if (err) {
> +		pr_err("Failed at line %d, err=%d\n", __LINE__, err);
> +		goto out;
> +	}
> +	mock_device_flush(i915);
> +	if (counter == 0) {
> +		pr_err("Did not retire on each active engines\n");
> +		err = -EINVAL;
> +		goto out;
> +	}

This one is in line with my understanding, and the context_barrier_task 
arguments are the same as the ones above... hm... I am confused.
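For what it's worth, my mental model of the intended usage is roughly 
the following sketch, with a hypothetical free_old_state() callback:

	static void free_old_state(void *data)
	{
		kfree(data); /* the barrier retired, so the state is idle */
	}

	/* emit a barrier request on every engine the context has used */
	err = context_barrier_task(ctx, -1, free_old_state, old_state);
	if (err)
		return err; /* on error the callback is not invoked;
			     * the caller must unwind instead */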

> +
> +out:
> +	mock_context_close(ctx);
> +unlock:
> +	mutex_unlock(&i915->drm.struct_mutex);
> +	return err;
> +#undef pr_fmt
> +#define pr_fmt(x) x
> +}
> +
>   int i915_gem_context_mock_selftests(void)
>   {
>   	static const struct i915_subtest tests[] = {
>   		SUBTEST(igt_switch_to_kernel_context),
> +		SUBTEST(mock_context_barrier),
>   	};
>   	struct drm_i915_private *i915;
>   	int err;
> 

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 18/38] drm/i915: Extend CONTEXT_CREATE to set parameters upon construction
  2019-03-01 14:03 ` [PATCH 18/38] drm/i915: Extend CONTEXT_CREATE to set parameters upon construction Chris Wilson
@ 2019-03-01 16:36   ` Tvrtko Ursulin
  2019-03-01 19:10     ` Chris Wilson
  0 siblings, 1 reply; 88+ messages in thread
From: Tvrtko Ursulin @ 2019-03-01 16:36 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 01/03/2019 14:03, Chris Wilson wrote:
> It can be useful to have a single ioctl to create a context with all
> the initial parameters instead of a series of create + setparam + setparam
> ioctls. This extension to create context allows any of the parameters
> to be passed in as a linked list to be applied to the newly constructed
> context.
> 
> v2: Make a local copy of user setparam (Tvrtko)
> v3: Use flags to detect availability of extension interface

Looks good to me.

Why have you changed to using flags rather than just checking whether 
the extension field is non-null?
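For anyone following along, the one-shot usage then looks roughly like 
this (a sketch; the priority value is made up):

	struct drm_i915_gem_context_create_ext_setparam p = {
		.base = { .name = I915_CONTEXT_CREATE_EXT_SETPARAM },
		.setparam = {
			.param = I915_CONTEXT_PARAM_PRIORITY,
			.value = 512,
		},
	};
	struct drm_i915_gem_context_create_ext create = {
		.flags = I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS,
		.extensions = (uintptr_t)&p,
	};

	/* one DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT ioctl now replaces
	 * the old create + setparam sequence */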

Regards,

Tvrtko


> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_drv.c         |   2 +-
>   drivers/gpu/drm/i915/i915_gem_context.c | 447 +++++++++++++-----------
>   include/uapi/drm/i915_drm.h             | 166 +++++----
>   3 files changed, 339 insertions(+), 276 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 6b75d1b7b8bd..de8effed4381 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -2997,7 +2997,7 @@ static const struct drm_ioctl_desc i915_ioctls[] = {
>   	DRM_IOCTL_DEF_DRV(I915_SET_SPRITE_COLORKEY, intel_sprite_set_colorkey_ioctl, DRM_MASTER),
>   	DRM_IOCTL_DEF_DRV(I915_GET_SPRITE_COLORKEY, drm_noop, DRM_MASTER),
>   	DRM_IOCTL_DEF_DRV(I915_GEM_WAIT, i915_gem_wait_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
> -	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_CREATE, i915_gem_context_create_ioctl, DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_CREATE_EXT, i915_gem_context_create_ioctl, DRM_RENDER_ALLOW),
>   	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_DESTROY, i915_gem_context_destroy_ioctl, DRM_RENDER_ALLOW),
>   	DRM_IOCTL_DEF_DRV(I915_REG_READ, i915_reg_read_ioctl, DRM_RENDER_ALLOW),
>   	DRM_IOCTL_DEF_DRV(I915_GET_RESET_STATS, i915_gem_context_reset_stats_ioctl, DRM_RENDER_ALLOW),
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 8c35b6019f0d..f883d99653a3 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -89,6 +89,7 @@
>   #include <drm/i915_drm.h>
>   #include "i915_drv.h"
>   #include "i915_trace.h"
> +#include "i915_user_extensions.h"
>   #include "intel_lrc_reg.h"
>   #include "intel_workarounds.h"
>   
> @@ -1066,196 +1067,6 @@ static int set_ppgtt(struct i915_gem_context *ctx,
>   	return err;
>   }
>   
> -static bool client_is_banned(struct drm_i915_file_private *file_priv)
> -{
> -	return atomic_read(&file_priv->ban_score) >= I915_CLIENT_SCORE_BANNED;
> -}
> -
> -int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
> -				  struct drm_file *file)
> -{
> -	struct drm_i915_private *i915 = to_i915(dev);
> -	struct drm_i915_gem_context_create *args = data;
> -	struct drm_i915_file_private *file_priv = file->driver_priv;
> -	struct i915_gem_context *ctx;
> -	int ret;
> -
> -	if (!DRIVER_CAPS(i915)->has_logical_contexts)
> -		return -ENODEV;
> -
> -	if (args->pad != 0)
> -		return -EINVAL;
> -
> -	ret = i915_terminally_wedged(i915);
> -	if (ret)
> -		return ret;
> -
> -	if (client_is_banned(file_priv)) {
> -		DRM_DEBUG("client %s[%d] banned from creating ctx\n",
> -			  current->comm,
> -			  pid_nr(get_task_pid(current, PIDTYPE_PID)));
> -
> -		return -EIO;
> -	}
> -
> -	ret = i915_mutex_lock_interruptible(dev);
> -	if (ret)
> -		return ret;
> -
> -	ctx = i915_gem_create_context(i915, file_priv);
> -	mutex_unlock(&dev->struct_mutex);
> -	if (IS_ERR(ctx))
> -		return PTR_ERR(ctx);
> -
> -	GEM_BUG_ON(i915_gem_context_is_kernel(ctx));
> -
> -	args->ctx_id = ctx->user_handle;
> -	DRM_DEBUG("HW context %d created\n", args->ctx_id);
> -
> -	return 0;
> -}
> -
> -int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
> -				   struct drm_file *file)
> -{
> -	struct drm_i915_gem_context_destroy *args = data;
> -	struct drm_i915_file_private *file_priv = file->driver_priv;
> -	struct i915_gem_context *ctx;
> -	int ret;
> -
> -	if (args->pad != 0)
> -		return -EINVAL;
> -
> -	if (args->ctx_id == DEFAULT_CONTEXT_HANDLE)
> -		return -ENOENT;
> -
> -	ctx = i915_gem_context_lookup(file_priv, args->ctx_id);
> -	if (!ctx)
> -		return -ENOENT;
> -
> -	ret = mutex_lock_interruptible(&dev->struct_mutex);
> -	if (ret)
> -		goto out;
> -
> -	__destroy_hw_context(ctx, file_priv);
> -	mutex_unlock(&dev->struct_mutex);
> -
> -out:
> -	i915_gem_context_put(ctx);
> -	return 0;
> -}
> -
> -static int get_sseu(struct i915_gem_context *ctx,
> -		    struct drm_i915_gem_context_param *args)
> -{
> -	struct drm_i915_gem_context_param_sseu user_sseu;
> -	struct intel_engine_cs *engine;
> -	struct intel_context *ce;
> -	int ret;
> -
> -	if (args->size == 0)
> -		goto out;
> -	else if (args->size < sizeof(user_sseu))
> -		return -EINVAL;
> -
> -	if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value),
> -			   sizeof(user_sseu)))
> -		return -EFAULT;
> -
> -	if (user_sseu.flags || user_sseu.rsvd)
> -		return -EINVAL;
> -
> -	engine = intel_engine_lookup_user(ctx->i915,
> -					  user_sseu.engine_class,
> -					  user_sseu.engine_instance);
> -	if (!engine)
> -		return -EINVAL;
> -
> -	/* Only use for mutex here is to serialize get_param and set_param. */
> -	ret = mutex_lock_interruptible(&ctx->i915->drm.struct_mutex);
> -	if (ret)
> -		return ret;
> -
> -	ce = to_intel_context(ctx, engine);
> -
> -	user_sseu.slice_mask = ce->sseu.slice_mask;
> -	user_sseu.subslice_mask = ce->sseu.subslice_mask;
> -	user_sseu.min_eus_per_subslice = ce->sseu.min_eus_per_subslice;
> -	user_sseu.max_eus_per_subslice = ce->sseu.max_eus_per_subslice;
> -
> -	mutex_unlock(&ctx->i915->drm.struct_mutex);
> -
> -	if (copy_to_user(u64_to_user_ptr(args->value), &user_sseu,
> -			 sizeof(user_sseu)))
> -		return -EFAULT;
> -
> -out:
> -	args->size = sizeof(user_sseu);
> -
> -	return 0;
> -}
> -
> -int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
> -				    struct drm_file *file)
> -{
> -	struct drm_i915_file_private *file_priv = file->driver_priv;
> -	struct drm_i915_gem_context_param *args = data;
> -	struct i915_gem_context *ctx;
> -	int ret = 0;
> -
> -	ctx = i915_gem_context_lookup(file_priv, args->ctx_id);
> -	if (!ctx)
> -		return -ENOENT;
> -
> -	switch (args->param) {
> -	case I915_CONTEXT_PARAM_BAN_PERIOD:
> -		ret = -EINVAL;
> -		break;
> -	case I915_CONTEXT_PARAM_NO_ZEROMAP:
> -		args->size = 0;
> -		args->value = test_bit(UCONTEXT_NO_ZEROMAP, &ctx->user_flags);
> -		break;
> -	case I915_CONTEXT_PARAM_GTT_SIZE:
> -		args->size = 0;
> -
> -		if (ctx->ppgtt)
> -			args->value = ctx->ppgtt->vm.total;
> -		else if (to_i915(dev)->mm.aliasing_ppgtt)
> -			args->value = to_i915(dev)->mm.aliasing_ppgtt->vm.total;
> -		else
> -			args->value = to_i915(dev)->ggtt.vm.total;
> -		break;
> -	case I915_CONTEXT_PARAM_NO_ERROR_CAPTURE:
> -		args->size = 0;
> -		args->value = i915_gem_context_no_error_capture(ctx);
> -		break;
> -	case I915_CONTEXT_PARAM_BANNABLE:
> -		args->size = 0;
> -		args->value = i915_gem_context_is_bannable(ctx);
> -		break;
> -	case I915_CONTEXT_PARAM_RECOVERABLE:
> -		args->size = 0;
> -		args->value = i915_gem_context_is_recoverable(ctx);
> -		break;
> -	case I915_CONTEXT_PARAM_PRIORITY:
> -		args->size = 0;
> -		args->value = ctx->sched.priority >> I915_USER_PRIORITY_SHIFT;
> -		break;
> -	case I915_CONTEXT_PARAM_SSEU:
> -		ret = get_sseu(ctx, args);
> -		break;
> -	case I915_CONTEXT_PARAM_VM:
> -		ret = get_ppgtt(ctx, args);
> -		break;
> -	default:
> -		ret = -EINVAL;
> -		break;
> -	}
> -
> -	i915_gem_context_put(ctx);
> -	return ret;
> -}
> -
>   static int gen8_emit_rpcs_config(struct i915_request *rq,
>   				 struct intel_context *ce,
>   				 struct intel_sseu sseu)
> @@ -1531,18 +1342,11 @@ static int set_sseu(struct i915_gem_context *ctx,
>   	return 0;
>   }
>   
> -int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
> -				    struct drm_file *file)
> +static int ctx_setparam(struct i915_gem_context *ctx,
> +			struct drm_i915_gem_context_param *args)
>   {
> -	struct drm_i915_file_private *file_priv = file->driver_priv;
> -	struct drm_i915_gem_context_param *args = data;
> -	struct i915_gem_context *ctx;
>   	int ret = 0;
>   
> -	ctx = i915_gem_context_lookup(file_priv, args->ctx_id);
> -	if (!ctx)
> -		return -ENOENT;
> -
>   	switch (args->param) {
>   	case I915_CONTEXT_PARAM_NO_ZEROMAP:
>   		if (args->size)
> @@ -1552,6 +1356,7 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>   		else
>   			clear_bit(UCONTEXT_NO_ZEROMAP, &ctx->user_flags);
>   		break;
> +
>   	case I915_CONTEXT_PARAM_NO_ERROR_CAPTURE:
>   		if (args->size)
>   			ret = -EINVAL;
> @@ -1560,6 +1365,7 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>   		else
>   			i915_gem_context_clear_no_error_capture(ctx);
>   		break;
> +
>   	case I915_CONTEXT_PARAM_BANNABLE:
>   		if (args->size)
>   			ret = -EINVAL;
> @@ -1586,7 +1392,7 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>   
>   			if (args->size)
>   				ret = -EINVAL;
> -			else if (!(to_i915(dev)->caps.scheduler & I915_SCHEDULER_CAP_PRIORITY))
> +			else if (!(ctx->i915->caps.scheduler & I915_SCHEDULER_CAP_PRIORITY))
>   				ret = -ENODEV;
>   			else if (priority > I915_CONTEXT_MAX_USER_PRIORITY ||
>   				 priority < I915_CONTEXT_MIN_USER_PRIORITY)
> @@ -1614,6 +1420,247 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>   		break;
>   	}
>   
> +	return ret;
> +}
> +
> +static int create_setparam(struct i915_user_extension __user *ext, void *data)
> +{
> +	struct drm_i915_gem_context_create_ext_setparam local;
> +
> +	if (copy_from_user(&local, ext, sizeof(local)))
> +		return -EFAULT;
> +
> +	if (local.setparam.ctx_id)
> +		return -EINVAL;
> +
> +	return ctx_setparam(data, &local.setparam);
> +}
> +
> +static const i915_user_extension_fn create_extensions[] = {
> +	[I915_CONTEXT_CREATE_EXT_SETPARAM] = create_setparam,
> +};
> +
> +static bool client_is_banned(struct drm_i915_file_private *file_priv)
> +{
> +	return atomic_read(&file_priv->ban_score) >= I915_CLIENT_SCORE_BANNED;
> +}
> +
> +int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
> +				  struct drm_file *file)
> +{
> +	struct drm_i915_private *i915 = to_i915(dev);
> +	struct drm_i915_gem_context_create_ext *args = data;
> +	struct drm_i915_file_private *file_priv = file->driver_priv;
> +	struct i915_gem_context *ctx;
> +	int ret;
> +
> +	if (!DRIVER_CAPS(i915)->has_logical_contexts)
> +		return -ENODEV;
> +
> +	if (args->flags & I915_CONTEXT_CREATE_FLAGS_UNKNOWN)
> +		return -EINVAL;
> +
> +	ret = i915_terminally_wedged(i915);
> +	if (ret)
> +		return ret;
> +
> +	if (client_is_banned(file_priv)) {
> +		DRM_DEBUG("client %s[%d] banned from creating ctx\n",
> +			  current->comm,
> +			  pid_nr(get_task_pid(current, PIDTYPE_PID)));
> +
> +		return -EIO;
> +	}
> +
> +	ret = i915_mutex_lock_interruptible(dev);
> +	if (ret)
> +		return ret;
> +
> +	ctx = i915_gem_create_context(i915, file_priv);
> +	mutex_unlock(&dev->struct_mutex);
> +	if (IS_ERR(ctx))
> +		return PTR_ERR(ctx);
> +
> +	GEM_BUG_ON(i915_gem_context_is_kernel(ctx));
> +
> +	if (args->flags & I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS) {
> +		ret = i915_user_extensions(u64_to_user_ptr(args->extensions),
> +					   create_extensions,
> +					   ARRAY_SIZE(create_extensions),
> +					   ctx);
> +		if (ret) {
> +			idr_remove(&file_priv->context_idr, ctx->user_handle);
> +			context_close(ctx);
> +			return ret;
> +		}
> +	}
> +
> +	args->ctx_id = ctx->user_handle;
> +	DRM_DEBUG("HW context %d created\n", args->ctx_id);
> +
> +	return 0;
> +}
> +
> +int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
> +				   struct drm_file *file)
> +{
> +	struct drm_i915_gem_context_destroy *args = data;
> +	struct drm_i915_file_private *file_priv = file->driver_priv;
> +	struct i915_gem_context *ctx;
> +	int ret;
> +
> +	if (args->pad != 0)
> +		return -EINVAL;
> +
> +	if (args->ctx_id == DEFAULT_CONTEXT_HANDLE)
> +		return -ENOENT;
> +
> +	ctx = i915_gem_context_lookup(file_priv, args->ctx_id);
> +	if (!ctx)
> +		return -ENOENT;
> +
> +	ret = mutex_lock_interruptible(&dev->struct_mutex);
> +	if (ret)
> +		goto out;
> +
> +	__destroy_hw_context(ctx, file_priv);
> +	mutex_unlock(&dev->struct_mutex);
> +
> +out:
> +	i915_gem_context_put(ctx);
> +	return 0;
> +}
> +
> +static int get_sseu(struct i915_gem_context *ctx,
> +		    struct drm_i915_gem_context_param *args)
> +{
> +	struct drm_i915_gem_context_param_sseu user_sseu;
> +	struct intel_engine_cs *engine;
> +	struct intel_context *ce;
> +	int ret;
> +
> +	if (args->size == 0)
> +		goto out;
> +	else if (args->size < sizeof(user_sseu))
> +		return -EINVAL;
> +
> +	if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value),
> +			   sizeof(user_sseu)))
> +		return -EFAULT;
> +
> +	if (user_sseu.flags || user_sseu.rsvd)
> +		return -EINVAL;
> +
> +	engine = intel_engine_lookup_user(ctx->i915,
> +					  user_sseu.engine_class,
> +					  user_sseu.engine_instance);
> +	if (!engine)
> +		return -EINVAL;
> +
> +	/* Only use for mutex here is to serialize get_param and set_param. */
> +	ret = mutex_lock_interruptible(&ctx->i915->drm.struct_mutex);
> +	if (ret)
> +		return ret;
> +
> +	ce = to_intel_context(ctx, engine);
> +
> +	user_sseu.slice_mask = ce->sseu.slice_mask;
> +	user_sseu.subslice_mask = ce->sseu.subslice_mask;
> +	user_sseu.min_eus_per_subslice = ce->sseu.min_eus_per_subslice;
> +	user_sseu.max_eus_per_subslice = ce->sseu.max_eus_per_subslice;
> +
> +	mutex_unlock(&ctx->i915->drm.struct_mutex);
> +
> +	if (copy_to_user(u64_to_user_ptr(args->value), &user_sseu,
> +			 sizeof(user_sseu)))
> +		return -EFAULT;
> +
> +out:
> +	args->size = sizeof(user_sseu);
> +
> +	return 0;
> +}
> +
> +int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
> +				    struct drm_file *file)
> +{
> +	struct drm_i915_file_private *file_priv = file->driver_priv;
> +	struct drm_i915_gem_context_param *args = data;
> +	struct i915_gem_context *ctx;
> +	int ret = 0;
> +
> +	ctx = i915_gem_context_lookup(file_priv, args->ctx_id);
> +	if (!ctx)
> +		return -ENOENT;
> +
> +	switch (args->param) {
> +	case I915_CONTEXT_PARAM_NO_ZEROMAP:
> +		args->size = 0;
> +		args->value = test_bit(UCONTEXT_NO_ZEROMAP, &ctx->user_flags);
> +		break;
> +
> +	case I915_CONTEXT_PARAM_GTT_SIZE:
> +		args->size = 0;
> +		if (ctx->ppgtt)
> +			args->value = ctx->ppgtt->vm.total;
> +		else if (to_i915(dev)->mm.aliasing_ppgtt)
> +			args->value = to_i915(dev)->mm.aliasing_ppgtt->vm.total;
> +		else
> +			args->value = to_i915(dev)->ggtt.vm.total;
> +		break;
> +
> +	case I915_CONTEXT_PARAM_NO_ERROR_CAPTURE:
> +		args->size = 0;
> +		args->value = i915_gem_context_no_error_capture(ctx);
> +		break;
> +
> +	case I915_CONTEXT_PARAM_BANNABLE:
> +		args->size = 0;
> +		args->value = i915_gem_context_is_bannable(ctx);
> +		break;
> +
> +	case I915_CONTEXT_PARAM_RECOVERABLE:
> +		args->size = 0;
> +		args->value = i915_gem_context_is_recoverable(ctx);
> +		break;
> +
> +	case I915_CONTEXT_PARAM_PRIORITY:
> +		args->size = 0;
> +		args->value = ctx->sched.priority >> I915_USER_PRIORITY_SHIFT;
> +		break;
> +
> +	case I915_CONTEXT_PARAM_SSEU:
> +		ret = get_sseu(ctx, args);
> +		break;
> +
> +	case I915_CONTEXT_PARAM_VM:
> +		ret = get_ppgtt(ctx, args);
> +		break;
> +
> +	case I915_CONTEXT_PARAM_BAN_PERIOD:
> +	default:
> +		ret = -EINVAL;
> +		break;
> +	}
> +
> +	i915_gem_context_put(ctx);
> +	return ret;
> +}
> +
> +int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
> +				    struct drm_file *file)
> +{
> +	struct drm_i915_file_private *file_priv = file->driver_priv;
> +	struct drm_i915_gem_context_param *args = data;
> +	struct i915_gem_context *ctx;
> +	int ret;
> +
> +	ctx = i915_gem_context_lookup(file_priv, args->ctx_id);
> +	if (!ctx)
> +		return -ENOENT;
> +
> +	ret = ctx_setparam(ctx, args);
> +
>   	i915_gem_context_put(ctx);
>   	return ret;
>   }
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 9fcfb54a13f2..eec635fb2e1c 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -392,6 +392,7 @@ typedef struct _drm_i915_sarea {
>   #define DRM_IOCTL_I915_GET_SPRITE_COLORKEY DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GET_SPRITE_COLORKEY, struct drm_intel_sprite_colorkey)
>   #define DRM_IOCTL_I915_GEM_WAIT		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_WAIT, struct drm_i915_gem_wait)
>   #define DRM_IOCTL_I915_GEM_CONTEXT_CREATE	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE, struct drm_i915_gem_context_create)
> +#define DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE, struct drm_i915_gem_context_create_ext)
>   #define DRM_IOCTL_I915_GEM_CONTEXT_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_DESTROY, struct drm_i915_gem_context_destroy)
>   #define DRM_IOCTL_I915_REG_READ			DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_REG_READ, struct drm_i915_reg_read)
>   #define DRM_IOCTL_I915_GET_RESET_STATS		DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GET_RESET_STATS, struct drm_i915_reset_stats)
> @@ -1443,85 +1444,17 @@ struct drm_i915_gem_wait {
>   };
>   
>   struct drm_i915_gem_context_create {
> -	/*  output: id of new context*/
> -	__u32 ctx_id;
> -	__u32 pad;
> -};
> -
> -struct drm_i915_gem_context_destroy {
> -	__u32 ctx_id;
> +	__u32 ctx_id; /* output: id of new context*/
>   	__u32 pad;
>   };
>   
> -/*
> - * DRM_I915_GEM_VM_CREATE -
> - *
> - * Create a new virtual memory address space (ppGTT) for use within a context
> - * on the same file. Extensions can be provided to configure exactly how the
> - * address space is setup upon creation.
> - *
> - * The id of new VM (bound to the fd) for use with I915_CONTEXT_PARAM_VM is
> - * returned.
> - *
> - * DRM_I915_GEM_VM_DESTROY -
> - *
> - * Destroys a previously created VM id.
> - */
> -struct drm_i915_gem_vm_control {
> -	__u64 extensions;
> +struct drm_i915_gem_context_create_ext {
> +	__u32 ctx_id; /* output: id of new context*/
>   	__u32 flags;
> -	__u32 id;
> -};
> -
> -struct drm_i915_reg_read {
> -	/*
> -	 * Register offset.
> -	 * For 64bit wide registers where the upper 32bits don't immediately
> -	 * follow the lower 32bits, the offset of the lower 32bits must
> -	 * be specified
> -	 */
> -	__u64 offset;
> -#define I915_REG_READ_8B_WA (1ul << 0)
> -
> -	__u64 val; /* Return value */
> -};
> -/* Known registers:
> - *
> - * Render engine timestamp - 0x2358 + 64bit - gen7+
> - * - Note this register returns an invalid value if using the default
> - *   single instruction 8byte read, in order to workaround that pass
> - *   flag I915_REG_READ_8B_WA in offset field.
> - *
> - */
> -
> -struct drm_i915_reset_stats {
> -	__u32 ctx_id;
> -	__u32 flags;
> -
> -	/* All resets since boot/module reload, for all contexts */
> -	__u32 reset_count;
> -
> -	/* Number of batches lost when active in GPU, for this context */
> -	__u32 batch_active;
> -
> -	/* Number of batches lost pending for execution, for this context */
> -	__u32 batch_pending;
> -
> -	__u32 pad;
> -};
> -
> -struct drm_i915_gem_userptr {
> -	__u64 user_ptr;
> -	__u64 user_size;
> -	__u32 flags;
> -#define I915_USERPTR_READ_ONLY 0x1
> -#define I915_USERPTR_UNSYNCHRONIZED 0x80000000
> -	/**
> -	 * Returned handle for the object.
> -	 *
> -	 * Object handles are nonzero.
> -	 */
> -	__u32 handle;
> +#define I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS	(1u << 0)
> +#define I915_CONTEXT_CREATE_FLAGS_UNKNOWN \
> +	(-(I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS << 1))
> +	__u64 extensions;
>   };
>   
>   struct drm_i915_gem_context_param {
> @@ -1637,6 +1570,89 @@ struct drm_i915_gem_context_param_sseu {
>   	__u32 rsvd;
>   };
>   
> +struct drm_i915_gem_context_create_ext_setparam {
> +#define I915_CONTEXT_CREATE_EXT_SETPARAM 0
> +	struct i915_user_extension base;
> +	struct drm_i915_gem_context_param setparam;
> +};
> +
> +struct drm_i915_gem_context_destroy {
> +	__u32 ctx_id;
> +	__u32 pad;
> +};
> +
> +/*
> + * DRM_I915_GEM_VM_CREATE -
> + *
> + * Create a new virtual memory address space (ppGTT) for use within a context
> + * on the same file. Extensions can be provided to configure exactly how the
> + * address space is setup upon creation.
> + *
> + * The id of new VM (bound to the fd) for use with I915_CONTEXT_PARAM_VM is
> + * returned.
> + *
> + * DRM_I915_GEM_VM_DESTROY -
> + *
> + * Destroys a previously created VM id.
> + */
> +struct drm_i915_gem_vm_control {
> +	__u64 extensions;
> +	__u32 flags;
> +	__u32 id;
> +};
> +
> +struct drm_i915_reg_read {
> +	/*
> +	 * Register offset.
> +	 * For 64bit wide registers where the upper 32bits don't immediately
> +	 * follow the lower 32bits, the offset of the lower 32bits must
> +	 * be specified
> +	 */
> +	__u64 offset;
> +#define I915_REG_READ_8B_WA (1ul << 0)
> +
> +	__u64 val; /* Return value */
> +};
> +
> +/* Known registers:
> + *
> + * Render engine timestamp - 0x2358 + 64bit - gen7+
> + * - Note this register returns an invalid value if using the default
> + *   single instruction 8byte read, in order to workaround that pass
> + *   flag I915_REG_READ_8B_WA in offset field.
> + *
> + */
> +
> +struct drm_i915_reset_stats {
> +	__u32 ctx_id;
> +	__u32 flags;
> +
> +	/* All resets since boot/module reload, for all contexts */
> +	__u32 reset_count;
> +
> +	/* Number of batches lost when active in GPU, for this context */
> +	__u32 batch_active;
> +
> +	/* Number of batches lost pending for execution, for this context */
> +	__u32 batch_pending;
> +
> +	__u32 pad;
> +};
> +
> +struct drm_i915_gem_userptr {
> +	__u64 user_ptr;
> +	__u64 user_size;
> +	__u32 flags;
> +#define I915_USERPTR_READ_ONLY 0x1
> +#define I915_USERPTR_UNSYNCHRONIZED 0x80000000
> +	/**
> +	 * Returned handle for the object.
> +	 *
> +	 * Object handles are nonzero.
> +	 */
> +	__u32 handle;
> +};
> +
>   enum drm_i915_oa_format {
>   	I915_OA_FORMAT_A13 = 1,	    /* HSW only */
>   	I915_OA_FORMAT_A29,	    /* HSW only */
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 88+ messages in thread

* ✗ Fi.CI.IGT: failure for series starting with [01/38] drm/i915/execlists: Suppress redundant preemption
  2019-03-01 14:03 [PATCH 01/38] drm/i915/execlists: Suppress redundant preemption Chris Wilson
                   ` (39 preceding siblings ...)
  2019-03-01 14:36 ` ✓ Fi.CI.BAT: success " Patchwork
@ 2019-03-01 18:03 ` Patchwork
  40 siblings, 0 replies; 88+ messages in thread
From: Patchwork @ 2019-03-01 18:03 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/38] drm/i915/execlists: Suppress redundant preemption
URL   : https://patchwork.freedesktop.org/series/57427/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_5676_full -> Patchwork_12343_full
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_12343_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_12343_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_12343_full:

### IGT changes ###

#### Possible regressions ####

  * igt@gem_busy@extended-blt:
    - shard-glk:          PASS -> FAIL +7
    - shard-snb:          PASS -> FAIL +3

  * igt@gem_busy@extended-parallel-bsd:
    - shard-iclb:         PASS -> FAIL +7
    - shard-skl:          NOTRUN -> FAIL

  * igt@gem_busy@extended-parallel-bsd2:
    - shard-kbl:          PASS -> FAIL +9

  * igt@gem_busy@extended-parallel-vebox:
    - shard-apl:          PASS -> FAIL +7
    - shard-skl:          PASS -> FAIL +2
    - shard-hsw:          NOTRUN -> FAIL

  * igt@gem_busy@extended-render:
    - shard-hsw:          PASS -> FAIL +6

  * igt@gem_busy@extended-semaphore-render:
    - shard-iclb:         NOTRUN -> FAIL

  
#### Warnings ####

  * igt@gem_busy@extended-semaphore-blt:
    - shard-iclb:         SKIP [fdo#109275] -> FAIL +2
    - shard-kbl:          SKIP [fdo#109271] -> FAIL +5
    - shard-glk:          SKIP [fdo#109271] -> FAIL +3

  * igt@gem_busy@extended-semaphore-vebox:
    - shard-apl:          SKIP [fdo#109271] -> FAIL +3
    - shard-skl:          SKIP [fdo#109271] -> FAIL +3

  
Known issues
------------

  Here are the changes found in Patchwork_12343_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_busy@extended-parallel-bsd2:
    - shard-iclb:         NOTRUN -> SKIP [fdo#109276] +4

  * igt@gem_exec_params@no-bsd:
    - shard-iclb:         NOTRUN -> SKIP [fdo#109283]

  * igt@gem_exec_store@pages-bsd1:
    - shard-apl:          NOTRUN -> SKIP [fdo#109271] +17

  * igt@gem_mocs_settings@mocs-reset-ctx-dirty-render:
    - shard-iclb:         NOTRUN -> SKIP [fdo#109287]

  * igt@gem_pread@stolen-display:
    - shard-iclb:         NOTRUN -> SKIP [fdo#109277]

  * igt@i915_missed_irq:
    - shard-iclb:         NOTRUN -> SKIP [fdo#109503]

  * igt@i915_pm_backlight@fade_with_suspend:
    - shard-skl:          NOTRUN -> FAIL [fdo#107847]

  * igt@i915_pm_rpm@modeset-lpsp-stress-no-wait:
    - shard-iclb:         PASS -> DMESG-WARN [fdo#107724]

  * igt@i915_pm_rpm@system-suspend-devices:
    - shard-skl:          PASS -> INCOMPLETE [fdo#107807]

  * igt@kms_atomic_transition@4x-modeset-transitions:
    - shard-apl:          NOTRUN -> SKIP [fdo#109271] / [fdo#109278]

  * igt@kms_busy@extended-modeset-hang-newfb-render-a:
    - shard-snb:          PASS -> DMESG-WARN [fdo#107956]

  * igt@kms_busy@extended-modeset-hang-newfb-with-reset-render-b:
    - shard-iclb:         PASS -> DMESG-WARN [fdo#107956]

  * igt@kms_busy@extended-modeset-hang-newfb-with-reset-render-c:
    - shard-skl:          NOTRUN -> DMESG-WARN [fdo#107956]

  * igt@kms_busy@extended-modeset-hang-oldfb-render-d:
    - shard-skl:          NOTRUN -> SKIP [fdo#109271] / [fdo#109278] +10

  * igt@kms_busy@extended-pageflip-hang-oldfb-render-f:
    - shard-hsw:          NOTRUN -> SKIP [fdo#109271] / [fdo#109278]

  * igt@kms_busy@extended-pageflip-modeset-hang-oldfb-render-c:
    - shard-hsw:          NOTRUN -> DMESG-WARN [fdo#107956]

  * igt@kms_busy@extended-pageflip-modeset-hang-oldfb-render-f:
    - shard-iclb:         NOTRUN -> SKIP [fdo#109278]

  * igt@kms_chamelium@vga-edid-read:
    - shard-iclb:         NOTRUN -> SKIP [fdo#109284] +1

  * igt@kms_color@pipe-b-legacy-gamma:
    - shard-iclb:         NOTRUN -> FAIL [fdo#104782]

  * igt@kms_cursor_crc@cursor-64x21-onscreen:
    - shard-iclb:         NOTRUN -> FAIL [fdo#103232]

  * igt@kms_cursor_crc@cursor-64x64-onscreen:
    - shard-apl:          PASS -> FAIL [fdo#103232] +3

  * igt@kms_cursor_crc@cursor-64x64-sliding:
    - shard-skl:          NOTRUN -> FAIL [fdo#103232]

  * igt@kms_cursor_legacy@2x-flip-vs-cursor-legacy:
    - shard-iclb:         NOTRUN -> SKIP [fdo#109274] +3

  * igt@kms_cursor_legacy@2x-long-cursor-vs-flip-atomic:
    - shard-hsw:          PASS -> FAIL [fdo#105767]

  * igt@kms_fbcon_fbt@fbc-suspend:
    - shard-skl:          NOTRUN -> INCOMPLETE [fdo#104108] / [fdo#107773]

  * igt@kms_fbcon_fbt@psr:
    - shard-skl:          NOTRUN -> FAIL [fdo#103833]

  * igt@kms_flip@basic-flip-vs-dpms:
    - shard-hsw:          PASS -> DMESG-WARN [fdo#102614]

  * igt@kms_flip@flip-vs-expired-vblank:
    - shard-skl:          PASS -> FAIL [fdo#105363]

  * igt@kms_flip@flip-vs-panning-vs-hang:
    - shard-glk:          PASS -> INCOMPLETE [fdo#103359] / [k.org#198133]

  * igt@kms_flip@modeset-vs-vblank-race-interruptible:
    - shard-apl:          PASS -> FAIL [fdo#103060]

  * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-cur-indfb-draw-blt:
    - shard-apl:          PASS -> FAIL [fdo#103167]

  * igt@kms_frontbuffer_tracking@fbc-2p-primscrn-spr-indfb-draw-pwrite:
    - shard-glk:          PASS -> FAIL [fdo#103167]

  * igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-pri-indfb-draw-blt:
    - shard-skl:          NOTRUN -> SKIP [fdo#109271] +72

  * igt@kms_frontbuffer_tracking@fbcpsr-2p-primscrn-spr-indfb-draw-mmap-cpu:
    - shard-iclb:         NOTRUN -> SKIP [fdo#109280] +7

  * igt@kms_frontbuffer_tracking@fbcpsr-2p-scndscrn-spr-indfb-draw-blt:
    - shard-hsw:          NOTRUN -> SKIP [fdo#109271] +19

  * igt@kms_plane@pixel-format-pipe-b-planes-source-clamping:
    - shard-skl:          NOTRUN -> DMESG-WARN [fdo#106885]

  * igt@kms_plane@plane-panning-bottom-right-suspend-pipe-b-planes:
    - shard-skl:          PASS -> INCOMPLETE [fdo#104108]

  * igt@kms_plane@plane-panning-bottom-right-suspend-pipe-c-planes:
    - shard-iclb:         PASS -> INCOMPLETE [fdo#107713]

  * igt@kms_plane_alpha_blend@pipe-b-alpha-7efc:
    - shard-apl:          PASS -> FAIL [fdo#108145]

  * igt@kms_plane_alpha_blend@pipe-c-alpha-basic:
    - shard-skl:          NOTRUN -> FAIL [fdo#107815] / [fdo#108145]

  * igt@kms_plane_alpha_blend@pipe-c-alpha-opaque-fb:
    - shard-skl:          NOTRUN -> FAIL [fdo#108145] +1

  * igt@kms_plane_multiple@atomic-pipe-c-tiling-x:
    - shard-glk:          PASS -> FAIL [fdo#103166] +1

  * igt@kms_rotation_crc@multiplane-rotation-cropping-bottom:
    - shard-kbl:          PASS -> DMESG-FAIL [fdo#105763]

  * igt@prime_nv_api@nv_i915_reimport_twice_check_flink_name:
    - shard-iclb:         NOTRUN -> SKIP [fdo#109291]

  
#### Possible fixes ####

  * igt@gem_eio@in-flight-suspend:
    - shard-apl:          INCOMPLETE [fdo#103927] -> PASS

  * igt@gem_exec_suspend@basic-s3:
    - shard-hsw:          INCOMPLETE [fdo#103540] -> PASS

  * igt@i915_pm_rpm@basic-pci-d3-state:
    - shard-skl:          INCOMPLETE [fdo#107807] -> PASS

  * igt@i915_pm_rpm@fences-dpms:
    - shard-iclb:         DMESG-WARN [fdo#107724] -> PASS

  * igt@i915_pm_rpm@universal-planes:
    - shard-iclb:         DMESG-WARN [fdo#108654] -> PASS

  * igt@kms_color@pipe-c-ctm-max:
    - shard-skl:          FAIL [fdo#108147] -> PASS

  * igt@kms_cursor_crc@cursor-128x128-suspend:
    - shard-skl:          INCOMPLETE [fdo#104108] -> PASS
    - shard-kbl:          DMESG-WARN [fdo#108566] -> PASS

  * igt@kms_cursor_crc@cursor-256x256-onscreen:
    - shard-skl:          FAIL [fdo#103232] -> PASS

  * igt@kms_cursor_crc@cursor-64x21-random:
    - shard-apl:          FAIL [fdo#103232] -> PASS +2

  * igt@kms_cursor_crc@cursor-64x64-suspend:
    - shard-apl:          FAIL [fdo#103191] / [fdo#103232] -> PASS

  * igt@kms_flip@modeset-vs-vblank-race-interruptible:
    - shard-kbl:          FAIL [fdo#103060] -> PASS

  * igt@kms_flip@plain-flip-fb-recreate:
    - shard-skl:          FAIL [fdo#100368] -> PASS

  * igt@kms_frontbuffer_tracking@basic:
    - shard-iclb:         FAIL [fdo#103167] -> PASS +2

  * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-draw-pwrite:
    - shard-glk:          FAIL [fdo#103167] -> PASS +1

  * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-fullscreen:
    - shard-apl:          FAIL [fdo#103167] -> PASS +1

  * igt@kms_plane@pixel-format-pipe-b-planes:
    - shard-apl:          FAIL [fdo#103166] -> PASS

  * igt@kms_plane@pixel-format-pipe-b-planes-source-clamping:
    - shard-glk:          FAIL [fdo#108948] -> PASS

  * igt@kms_plane_alpha_blend@pipe-c-coverage-7efc:
    - shard-skl:          FAIL [fdo#107815] -> PASS

  * igt@kms_plane_multiple@atomic-pipe-b-tiling-yf:
    - shard-iclb:         FAIL [fdo#103166] -> PASS

  * igt@kms_setmode@basic:
    - shard-kbl:          FAIL [fdo#99912] -> PASS

  
#### Warnings ####

  * igt@i915_pm_rpm@gem-execbuf-stress-pc8:
    - shard-iclb:         SKIP [fdo#109506] -> INCOMPLETE [fdo#107713] / [fdo#108840]

  * igt@runner@aborted:
    - shard-iclb:         ( 2 FAIL ) [fdo#108654] / [fdo#109593] -> FAIL [fdo#109593]

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#100368]: https://bugs.freedesktop.org/show_bug.cgi?id=100368
  [fdo#102614]: https://bugs.freedesktop.org/show_bug.cgi?id=102614
  [fdo#103060]: https://bugs.freedesktop.org/show_bug.cgi?id=103060
  [fdo#103166]: https://bugs.freedesktop.org/show_bug.cgi?id=103166
  [fdo#103167]: https://bugs.freedesktop.org/show_bug.cgi?id=103167
  [fdo#103191]: https://bugs.freedesktop.org/show_bug.cgi?id=103191
  [fdo#103232]: https://bugs.freedesktop.org/show_bug.cgi?id=103232
  [fdo#103359]: https://bugs.freedesktop.org/show_bug.cgi?id=103359
  [fdo#103540]: https://bugs.freedesktop.org/show_bug.cgi?id=103540
  [fdo#103833]: https://bugs.freedesktop.org/show_bug.cgi?id=103833
  [fdo#103927]: https://bugs.freedesktop.org/show_bug.cgi?id=103927
  [fdo#104108]: https://bugs.freedesktop.org/show_bug.cgi?id=104108
  [fdo#104782]: https://bugs.freedesktop.org/show_bug.cgi?id=104782
  [fdo#105363]: https://bugs.freedesktop.org/show_bug.cgi?id=105363
  [fdo#105763]: https://bugs.freedesktop.org/show_bug.cgi?id=105763
  [fdo#105767]: https://bugs.freedesktop.org/show_bug.cgi?id=105767
  [fdo#106885]: https://bugs.freedesktop.org/show_bug.cgi?id=106885
  [fdo#107713]: https://bugs.freedesktop.org/show_bug.cgi?id=107713
  [fdo#107724]: https://bugs.freedesktop.org/show_bug.cgi?id=107724
  [fdo#107773]: https://bugs.freedesktop.org/show_bug.cgi?id=107773
  [fdo#107807]: https://bugs.freedesktop.org/show_bug.cgi?id=107807
  [fdo#107815]: https://bugs.freedesktop.org/show_bug.cgi?id=107815
  [fdo#107847]: https://bugs.freedesktop.org/show_bug.cgi?id=107847
  [fdo#107956]: https://bugs.freedesktop.org/show_bug.cgi?id=107956
  [fdo#108145]: https://bugs.freedesktop.org/show_bug.cgi?id=108145
  [fdo#108147]: https://bugs.freedesktop.org/show_bug.cgi?id=108147
  [fdo#108566]: https://bugs.freedesktop.org/show_bug.cgi?id=108566
  [fdo#108654]: https://bugs.freedesktop.org/show_bug.cgi?id=108654
  [fdo#108840]: https://bugs.freedesktop.org/show_bug.cgi?id=108840
  [fdo#108948]: https://bugs.freedesktop.org/show_bug.cgi?id=108948
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109274]: https://bugs.freedesktop.org/show_bug.cgi?id=109274
  [fdo#109275]: https://bugs.freedesktop.org/show_bug.cgi?id=109275
  [fdo#109276]: https://bugs.freedesktop.org/show_bug.cgi?id=109276
  [fdo#109277]: https://bugs.freedesktop.org/show_bug.cgi?id=109277
  [fdo#109278]: https://bugs.freedesktop.org/show_bug.cgi?id=109278
  [fdo#109280]: https://bugs.freedesktop.org/show_bug.cgi?id=109280
  [fdo#109283]: https://bugs.freedesktop.org/show_bug.cgi?id=109283
  [fdo#109284]: https://bugs.freedesktop.org/show_bug.cgi?id=109284
  [fdo#109287]: https://bugs.freedesktop.org/show_bug.cgi?id=109287
  [fdo#109291]: https://bugs.freedesktop.org/show_bug.cgi?id=109291
  [fdo#109503]: https://bugs.freedesktop.org/show_bug.cgi?id=109503
  [fdo#109506]: https://bugs.freedesktop.org/show_bug.cgi?id=109506
  [fdo#109593]: https://bugs.freedesktop.org/show_bug.cgi?id=109593
  [fdo#99912]: https://bugs.freedesktop.org/show_bug.cgi?id=99912
  [k.org#198133]: https://bugzilla.kernel.org/show_bug.cgi?id=198133


Participating hosts (7 -> 7)
------------------------------

  No changes in participating hosts


Build changes
-------------

    * Linux: CI_DRM_5676 -> Patchwork_12343

  CI_DRM_5676: 3911a5d7d3de6d8e491868bb0cd506346131d71b @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4866: 189956af183c245eb237b3be4fa22953ec93bbe0 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_12343: 800b0544e5143742e54cbcab646cdbd525895733 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12343/

* Re: [PATCH 14/38] drm/i915: Introduce the i915_user_extension_method
  2019-03-01 15:39   ` Tvrtko Ursulin
@ 2019-03-01 18:57     ` Chris Wilson
  2019-03-04  8:54       ` Tvrtko Ursulin
  0 siblings, 1 reply; 88+ messages in thread
From: Chris Wilson @ 2019-03-01 18:57 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-03-01 15:39:13)
> 
> On 01/03/2019 14:03, Chris Wilson wrote:
> > +int i915_user_extensions(struct i915_user_extension __user *ext,
> > +                      const i915_user_extension_fn *tbl,
> > +                      unsigned long count,
> > +                      void *data)
> > +{
> > +     unsigned int stackdepth = 512;
> > +
> > +     while (ext) {
> > +             int err;
> > +             u64 x;
> > +
> > +             if (!stackdepth--) /* recursion vs useful flexibility */
> > +                     return -EINVAL;
> 
> I don't get this. What stack? Did you mean "static unsigned int 
> stackdepth" in case someone puts i915_user_extension into the extension 
> table? Or just a limit on number of chained extensions? But you are not 
> processing them recursively here.

It's iterative recursion :)

I still think of this loop in terms of its simple tail recursion.

And if we need the individual levels for unwind, that is a manual stack.
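
(A self-contained sketch of that "iterative recursion" - names are
illustrative and plain pointers stand in for user pointers; only the 512
bound and the shape of the loop are taken from the patch:)

#include <errno.h>
#include <stddef.h>
#include <stdint.h>

struct user_extension {
	struct user_extension *next;	/* NULL terminates the chain */
	uint32_t name;			/* index into the handler table */
};

typedef int (*extension_fn)(struct user_extension *ext, void *data);

int walk_extensions(struct user_extension *ext,
		    const extension_fn *tbl, size_t count, void *data)
{
	unsigned int depth = 512;	/* recursion vs useful flexibility */

	while (ext) {
		int err;

		if (!depth--)	/* an unbounded chain would spin forever */
			return -EINVAL;

		if (ext->name >= count || !tbl[ext->name])
			return -EINVAL;

		err = tbl[ext->name](ext, data);
		if (err)
			return err;

		ext = ext->next;	/* the "tail call" of the recursion */
	}

	return 0;
}
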
-Chris

* Re: [PATCH 16/38] drm/i915: Introduce a context barrier callback
  2019-03-01 16:12   ` Tvrtko Ursulin
@ 2019-03-01 19:03     ` Chris Wilson
  2019-03-04  8:55       ` Tvrtko Ursulin
  2019-03-02 10:01     ` Chris Wilson
  1 sibling, 1 reply; 88+ messages in thread
From: Chris Wilson @ 2019-03-01 19:03 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-03-01 16:12:33)
> 
> On 01/03/2019 14:03, Chris Wilson wrote:
> > +     counter = 0;
> > +     err = context_barrier_task(ctx, 0, mock_barrier_task, &counter);
> > +     if (err) {
> > +             pr_err("Failed at line %d, err=%d\n", __LINE__, err);
> > +             goto out;
> > +     }
> > +     if (counter == 0) {
> > +             pr_err("Did not retire immediately with 0 engines\n");
> > +             err = -EINVAL;
> > +             goto out;
> > +     }
> > +
> > +     counter = 0;
> > +     err = context_barrier_task(ctx, -1, mock_barrier_task, &counter);
> > +     if (err) {
> > +             pr_err("Failed at line %d, err=%d\n", __LINE__, err);
> > +             goto out;
> > +     }
> > +     if (counter == 0) {
> > +             pr_err("Did not retire immediately for all inactive engines\n");
> 
> Why would this one retire immediately? It will send requests down the 
> pipe, no? So don't you actually need to wait for the tracker to be 
> signalled and that counter == num_engines?

Nothing has used the context at this point, so we don't emit a request
on any engine, and the barrier can be immediately executed.

> > +             err = -EINVAL;
> > +             goto out;
> > +     }
> > +
> > +     rq = ERR_PTR(-ENODEV);
> > +     with_intel_runtime_pm(i915, wakeref)
> > +             rq = i915_request_alloc(i915->engine[RCS], ctx);
> > +     if (IS_ERR(rq)) {
> > +             pr_err("Request allocation failed!\n");
> > +             goto out;
> > +     }
> > +     i915_request_add(rq);
> 
> Doesn't this need to go under the wakeref as well?

No, we only need the wakeref to start (that's only to avoid blocking
inside request_alloc --- yeah, that's not exactly how it works, we'll be
back later to fix that!). The request then carries the GT wakeref with it.
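
(The quoted snippet again, with that lifetime spelled out in comments;
the annotations are my reading, not part of the patch:)

	rq = ERR_PTR(-ENODEV);
	with_intel_runtime_pm(i915, wakeref)	/* awake just for the alloc */
		rq = i915_request_alloc(i915->engine[RCS], ctx);
	if (IS_ERR(rq))
		goto out;
	i915_request_add(rq);	/* now carries its own GT wakeref */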

> > +     GEM_BUG_ON(list_empty(&ctx->active_engines));
> > +
> > +     counter = 0;
> > +     context_barrier_inject_fault = BIT(RCS);
> > +     err = context_barrier_task(ctx, -1, mock_barrier_task, &counter);
> > +     context_barrier_inject_fault = 0;
> > +     if (err == -ENXIO)
> > +             err = 0;
> > +     else
> > +             pr_err("Did not hit fault injection!\n");
> > +     if (counter != 0) {
> > +             pr_err("Invoked callback on error!\n");
> > +             err = -EIO;
> > +     }
> > +     if (err)
> > +             goto out;
> > +
> > +     counter = 0;
> > +     err = context_barrier_task(ctx, -1, mock_barrier_task, &counter);
> > +     if (err) {
> > +             pr_err("Failed at line %d, err=%d\n", __LINE__, err);
> > +             goto out;
> > +     }
> > +     mock_device_flush(i915);
> > +     if (counter == 0) {
> > +             pr_err("Did not retire on each active engines\n");
> > +             err = -EINVAL;
> > +             goto out;
> > +     }
> 
> This one is in line with my understanding, and the context_barrier_task
> arguments are the same as the one above... hm... I am confused.

This time it is active, so we actually have to wait as the barrier waits
for the GPU before firing.
-Chris

* Re: [PATCH 18/38] drm/i915: Extend CONTEXT_CREATE to set parameters upon construction
  2019-03-01 16:36   ` Tvrtko Ursulin
@ 2019-03-01 19:10     ` Chris Wilson
  2019-03-04  8:57       ` Tvrtko Ursulin
  0 siblings, 1 reply; 88+ messages in thread
From: Chris Wilson @ 2019-03-01 19:10 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-03-01 16:36:45)
> 
> On 01/03/2019 14:03, Chris Wilson wrote:
> > It can be useful to have a single ioctl to create a context with all
> > the initial parameters instead of a series of create + setparam + setparam
> > ioctls. This extension to create context allows any of the parameters
> > to be passed in as a linked list to be applied to the newly constructed
> > context.
> > 
> > v2: Make a local copy of user setparam (Tvrtko)
> > v3: Use flags to detect availability of extension interface
> 
> Looks good to me.
> 
> Why have you changed to use flags and not just check the extension field 
> being non-null?

Hmm. I was thinking about how new userspace would use it on an old kernel.
As the extensions field sits in a new part of the struct that won't be
passed to the old ioctl, the old kernel would create a context and not
report any error despite not processing the extensions (userspace would
be none the wiser that the context was invalid). So a simple answer was
to use the flags field to indicate that we want the extensions processed;
the old kernel will reject the ioctl due to pad!=0, while a new kernel
will be happy. New userspace on an old kernel can then fall back gracefully.

+uint32_t
+brw_clone_hw_context(struct brw_bufmgr *bufmgr, uint32_t ctx_id)
+{
+   struct drm_i915_gem_context_create_ext_clone ext_clone = {
+      .base = { I915_CONTEXT_CREATE_EXT_CLONE },
+      .clone = ctx_id,
+      .flags = ~I915_CONTEXT_CLONE_UNKNOWN,
+   };
+   struct drm_i915_gem_context_create_ext arg = {
+      .flags = I915_CONTEXT_CREATE_USE_EXTENSIONS,
+      .extensions = (uintptr_t)&ext_clone
+   };
+   if (drmIoctl(bufmgr->fd, DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT, &arg) == 0)
+      return arg.ctx_id;
+
+   return __brw_clone_hw_context(bufmgr, ctx_id);
+}

-Chris

* Re: [PATCH 23/38] drm/i915: Re-arrange execbuf so context is known before engine
  2019-03-01 15:33   ` Tvrtko Ursulin
@ 2019-03-01 19:11     ` Chris Wilson
  0 siblings, 0 replies; 88+ messages in thread
From: Chris Wilson @ 2019-03-01 19:11 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-03-01 15:33:50)
> 
> On 01/03/2019 14:03, Chris Wilson wrote:
> > From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > 
> > Needed for a following patch.
> > 
> > Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> I'll do yours, you do mine. Criss-cross. Now that's an olden but golden
> reference. :)

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>

Must remember to s-o-b, or at least check that dim checks it this time.
-Chris

* Re: [PATCH 16/38] drm/i915: Introduce a context barrier callback
  2019-03-01 16:12   ` Tvrtko Ursulin
  2019-03-01 19:03     ` Chris Wilson
@ 2019-03-02 10:01     ` Chris Wilson
  1 sibling, 0 replies; 88+ messages in thread
From: Chris Wilson @ 2019-03-02 10:01 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-03-01 16:12:33)
> 
> On 01/03/2019 14:03, Chris Wilson wrote:
> > +I915_SELFTEST_DECLARE(static unsigned long context_barrier_inject_fault);
> > +static int context_barrier_task(struct i915_gem_context *ctx,
> > +                             unsigned long engines,
> 
I'm in two minds about the usefulness of intel_engine_mask_t.

I'm thinking if we store it in a struct, then yes, always use
intel_engine_mask_t. For function parameters, the natural type is
unsigned long and we have some time before worrying about the lack of
bits. So it's a problem for future self and I'm ambivalent about the
need to enforce it just yet.
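
(A sketch of the convention being described; the typedef width and the
struct are assumptions purely for illustration:)

#include <stdint.h>

typedef uint8_t intel_engine_mask_t;	/* width here is a guess */

struct barrier_state {			/* hypothetical container */
	intel_engine_mask_t engines;	/* stored state: use the typedef */
};

/* function parameters keep the natural unsigned long, for now */
int barrier_task(void *ctx, unsigned long engines, void *data);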
-Chris

* Re: [PATCH 14/38] drm/i915: Introduce the i915_user_extension_method
  2019-03-01 18:57     ` Chris Wilson
@ 2019-03-04  8:54       ` Tvrtko Ursulin
  2019-03-04  9:04         ` Chris Wilson
  0 siblings, 1 reply; 88+ messages in thread
From: Tvrtko Ursulin @ 2019-03-04  8:54 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 01/03/2019 18:57, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-03-01 15:39:13)
>>
>> On 01/03/2019 14:03, Chris Wilson wrote:
>>> +int i915_user_extensions(struct i915_user_extension __user *ext,
>>> +                      const i915_user_extension_fn *tbl,
>>> +                      unsigned long count,
>>> +                      void *data)
>>> +{
>>> +     unsigned int stackdepth = 512;
>>> +
>>> +     while (ext) {
>>> +             int err;
>>> +             u64 x;
>>> +
>>> +             if (!stackdepth--) /* recursion vs useful flexibility */
>>> +                     return -EINVAL;
>>
>> I don't get this. What stack? Did you mean "static unsigned int
>> stackdepth" in case someone puts i915_user_extension into the extension
>> table? Or just a limit on number of chained extensions? But you are not
>> processing them recursively here.
> 
> It's iterative recursion :)
> 
> I still think of this loop in terms of its simple tail recursion.
> 
> And if we need the individual levels for unwind, that is a manual stack.
Okay..

One thought I had - if unwind is too unwieldy, could we mandate that each
user of user extensions has an additional output field in the embedding
struct which would hold the last processed extension id?

This way the state of the object would be less undefined after failure. 
Userspace would be able to tell how far in the extension chain i915 
managed to get to.
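
(Presumably something of this hypothetical shape - not proposed uAPI,
just to make the idea concrete:)

#include <linux/types.h>

struct create_ext_with_report {		/* made-up embedding struct */
	__u64 extensions;		/* in: head of the extension chain */
	__u64 last_extension;		/* out: last link i915 processed */
};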

Regards,

Tvrtko

* Re: [PATCH 16/38] drm/i915: Introduce a context barrier callback
  2019-03-01 19:03     ` Chris Wilson
@ 2019-03-04  8:55       ` Tvrtko Ursulin
  0 siblings, 0 replies; 88+ messages in thread
From: Tvrtko Ursulin @ 2019-03-04  8:55 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 01/03/2019 19:03, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-03-01 16:12:33)
>>
>> On 01/03/2019 14:03, Chris Wilson wrote:
>>> +     counter = 0;
>>> +     err = context_barrier_task(ctx, 0, mock_barrier_task, &counter);
>>> +     if (err) {
>>> +             pr_err("Failed at line %d, err=%d\n", __LINE__, err);
>>> +             goto out;
>>> +     }
>>> +     if (counter == 0) {
>>> +             pr_err("Did not retire immediately with 0 engines\n");
>>> +             err = -EINVAL;
>>> +             goto out;
>>> +     }
>>> +
>>> +     counter = 0;
>>> +     err = context_barrier_task(ctx, -1, mock_barrier_task, &counter);
>>> +     if (err) {
>>> +             pr_err("Failed at line %d, err=%d\n", __LINE__, err);
>>> +             goto out;
>>> +     }
>>> +     if (counter == 0) {
>>> +             pr_err("Did not retire immediately for all inactive engines\n");
>>
>> Why would this one retire immediately? It will send requests down the
>> pipe, no? So don't you actually need to wait for the tracker to be
>> signalled and that counter == num_engines?
> 
> Nothing has used the context at this point, so we don't emit a request
> on any engine, and the barrier can be immediately executed.

Yep, that it uses ctx->active_engines slipped my mind.

>>> +             err = -EINVAL;
>>> +             goto out;
>>> +     }
>>> +
>>> +     rq = ERR_PTR(-ENODEV);
>>> +     with_intel_runtime_pm(i915, wakeref)
>>> +             rq = i915_request_alloc(i915->engine[RCS], ctx);
>>> +     if (IS_ERR(rq)) {
>>> +             pr_err("Request allocation failed!\n");
>>> +             goto out;
>>> +     }
>>> +     i915_request_add(rq);
>>
>> Doesn't this need to go under the wakeref as well?
> 
> No, we only need the wakeref to start (that's only to avoid blocking
> inside request_alloc --- yeah, that's not exactly how it works, we'll be
> back later to fix that!). The request then carries the GT wakeref with it.
> 
>>> +     GEM_BUG_ON(list_empty(&ctx->active_engines));
>>> +
>>> +     counter = 0;
>>> +     context_barrier_inject_fault = BIT(RCS);
>>> +     err = context_barrier_task(ctx, -1, mock_barrier_task, &counter);
>>> +     context_barrier_inject_fault = 0;
>>> +     if (err == -ENXIO)
>>> +             err = 0;
>>> +     else
>>> +             pr_err("Did not hit fault injection!\n");
>>> +     if (counter != 0) {
>>> +             pr_err("Invoked callback on error!\n");
>>> +             err = -EIO;
>>> +     }
>>> +     if (err)
>>> +             goto out;
>>> +
>>> +     counter = 0;
>>> +     err = context_barrier_task(ctx, -1, mock_barrier_task, &counter);
>>> +     if (err) {
>>> +             pr_err("Failed at line %d, err=%d\n", __LINE__, err);
>>> +             goto out;
>>> +     }
>>> +     mock_device_flush(i915);
>>> +     if (counter == 0) {
>>> +             pr_err("Did not retire on each active engines\n");
>>> +             err = -EINVAL;
>>> +             goto out;
>>> +     }
>>
>> This one is in line with my understanding, and the context_barrier_task
>> arguments are the same as the one above... hm... I am confused.
> 
> This time it is active, so we actually have to wait as the barrier waits
> for the GPU before firing.

Yep.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko



* Re: [PATCH 18/38] drm/i915: Extend CONTEXT_CREATE to set parameters upon construction
  2019-03-01 19:10     ` Chris Wilson
@ 2019-03-04  8:57       ` Tvrtko Ursulin
  0 siblings, 0 replies; 88+ messages in thread
From: Tvrtko Ursulin @ 2019-03-04  8:57 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 01/03/2019 19:10, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-03-01 16:36:45)
>>
>> On 01/03/2019 14:03, Chris Wilson wrote:
>>> It can be useful to have a single ioctl to create a context with all
>>> the initial parameters instead of a series of create + setparam + setparam
>>> ioctls. This extension to create context allows any of the parameters
>>> to be passed in as a linked list to be applied to the newly constructed
>>> context.
>>>
>>> v2: Make a local copy of user setparam (Tvrtko)
>>> v3: Use flags to detect availability of extension interface
>>
>> Looks good to me.
>>
>> Why have you changed to use flags and not just check the extension field
>> being non-null?
> 
> Hmm. I was thinking about how new userspace would use it on an old kernel.
> As the extensions field sits in a new part of the struct that won't be
> passed to the old ioctl, the old kernel would create a context and not
> report any error despite not processing the extensions (userspace would
> be none the wiser that the context was invalid). So a simple answer was
> to use the flags field to indicate that we want the extensions processed;
> the old kernel will reject the ioctl due to pad!=0, while a new kernel
> will be happy. New userspace on an old kernel can then fall back gracefully.
> 
> +uint32_t
> +brw_clone_hw_context(struct brw_bufmgr *bufmgr, uint32_t ctx_id)
> +{
> +   struct drm_i915_gem_context_create_ext_clone ext_clone = {
> +      .base = { I915_CONTEXT_CREATE_EXT_CLONE },
> +      .clone = ctx_id,
> +      .flags = ~I915_CONTEXT_CLONE_UNKNOWN,
> +   };
> +   struct drm_i915_gem_context_create_ext arg = {
> +      .flags = I915_CONTEXT_CREATE_USE_EXTENSIONS,
> +      .extensions = (uintptr_t)&ext_clone
> +   };
> +   if (drmIoctl(bufmgr->fd, DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT, &arg) == 0)
> +      return arg.ctx_id;
> +
> +   return __brw_clone_hw_context(bufmgr, ctx_id);
> +}

Yeah, makes sense.

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

* Re: [PATCH 14/38] drm/i915: Introduce the i915_user_extension_method
  2019-03-04  8:54       ` Tvrtko Ursulin
@ 2019-03-04  9:04         ` Chris Wilson
  2019-03-04  9:35           ` Tvrtko Ursulin
  0 siblings, 1 reply; 88+ messages in thread
From: Chris Wilson @ 2019-03-04  9:04 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-03-04 08:54:10)
> 
> On 01/03/2019 18:57, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2019-03-01 15:39:13)
> >>
> >> On 01/03/2019 14:03, Chris Wilson wrote:
> >>> +int i915_user_extensions(struct i915_user_extension __user *ext,
> >>> +                      const i915_user_extension_fn *tbl,
> >>> +                      unsigned long count,
> >>> +                      void *data)
> >>> +{
> >>> +     unsigned int stackdepth = 512;
> >>> +
> >>> +     while (ext) {
> >>> +             int err;
> >>> +             u64 x;
> >>> +
> >>> +             if (!stackdepth--) /* recursion vs useful flexibility */
> >>> +                     return -EINVAL;
> >>
> >> I don't get this. What stack? Did you mean "static unsigned int
> >> stackdepth" in case someone puts i915_user_extension into the extension
> >> table? Or just a limit on number of chained extensions? But you are not
> >> processing them recursively here.
> > 
> > It's iterative recursion :)
> > 
> > I still think of this loop in terms of its simple tail recursion.
> > 
> > And if we need the individual levels for unwind, that is a manual stack.
> Okay..
> 
> One thought I had - if unwind is too unwieldy, could we mandate that each
> user of user extensions has an additional output field in the embedding
> struct which would hold the last processed extension id?

Id? I guess we could do depth, though a pointer is more practical.
I don't think you mean ext.name, as we expect that to repeat a few times.
 
> This way the state of the object would be less undefined after failure. 
> Userspace would be able to tell how far in the extension chain i915 
> managed to get to.

But I'm not convinced how practical that is. The answer is pretty much
always -EINVAL (anything else is much less ambiguous) and even then you
have no idea which field tripped EINVAL or why?
-Chris

* Re: [PATCH 14/38] drm/i915: Introduce the i915_user_extension_method
  2019-03-04  9:04         ` Chris Wilson
@ 2019-03-04  9:35           ` Tvrtko Ursulin
  2019-03-04  9:45             ` Tvrtko Ursulin
  0 siblings, 1 reply; 88+ messages in thread
From: Tvrtko Ursulin @ 2019-03-04  9:35 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 04/03/2019 09:04, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-03-04 08:54:10)
>>
>> On 01/03/2019 18:57, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2019-03-01 15:39:13)
>>>>
>>>> On 01/03/2019 14:03, Chris Wilson wrote:
>>>>> +int i915_user_extensions(struct i915_user_extension __user *ext,
>>>>> +                      const i915_user_extension_fn *tbl,
>>>>> +                      unsigned long count,
>>>>> +                      void *data)
>>>>> +{
>>>>> +     unsigned int stackdepth = 512;
>>>>> +
>>>>> +     while (ext) {
>>>>> +             int err;
>>>>> +             u64 x;
>>>>> +
>>>>> +             if (!stackdepth--) /* recursion vs useful flexibility */
>>>>> +                     return -EINVAL;
>>>>
>>>> I don't get this. What stack? Did you mean "static unsigned int
>>>> stackdepth" in case someone puts i915_user_extension into the extension
>>>> table? Or just a limit on number of chained extensions? But you are not
>>>> processing them recursively here.
>>>
>>> It's iterative recursion :)
>>>
>>> I still think of this loop in terms of its simple tail recursion.
>>>
>>> And if we need the individual levels for unwind, that is a manual stack.
>> Okay..
>>
>> One thought I had - if unwind is too unwieldy, could we mandate that each
>> user of user extensions has an additional output field in the embedding
>> struct which would hold the last processed extension id?
> 
> Id? I guess we could do depth, though a pointer is more practical.
> I don't think you mean ext.name, as we expect that to repeat a few times.

Pointer yes, I started with a pointer but then for some reason wrote id, 
where I meant name.

>> This way the state of the object would be less undefined after failure.
>> Userspace would be able to tell how far in the extension chain i915
>> managed to get to.
> 
> But I'm not convinced how practical that is. The answer is pretty much
> always -EINVAL (anything else is much less ambiguous) and even then you
> have no idea which field tripped EINVAL or why?

That you have no idea which field tripped -EINVAL and why is always true.

I am not 100% convinced myself, I just worry about usability. But I don't
have a use case at the moment which would be practically helped by
knowing which extension failed. In all cases I can think of it's a
programming error, i.e. it should be caught during development and not
happen in production.

Regards,

Tvrtko

* Re: [PATCH 14/38] drm/i915: Introduce the i915_user_extension_method
  2019-03-04  9:35           ` Tvrtko Ursulin
@ 2019-03-04  9:45             ` Tvrtko Ursulin
  0 siblings, 0 replies; 88+ messages in thread
From: Tvrtko Ursulin @ 2019-03-04  9:45 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 04/03/2019 09:35, Tvrtko Ursulin wrote:
> 
> On 04/03/2019 09:04, Chris Wilson wrote:
>> Quoting Tvrtko Ursulin (2019-03-04 08:54:10)
>>>
>>> On 01/03/2019 18:57, Chris Wilson wrote:
>>>> Quoting Tvrtko Ursulin (2019-03-01 15:39:13)
>>>>>
>>>>> On 01/03/2019 14:03, Chris Wilson wrote:
>>>>>> +int i915_user_extensions(struct i915_user_extension __user *ext,
>>>>>> +                      const i915_user_extension_fn *tbl,
>>>>>> +                      unsigned long count,
>>>>>> +                      void *data)
>>>>>> +{
>>>>>> +     unsigned int stackdepth = 512;
>>>>>> +
>>>>>> +     while (ext) {
>>>>>> +             int err;
>>>>>> +             u64 x;
>>>>>> +
>>>>>> +             if (!stackdepth--) /* recursion vs useful flexibility */
>>>>>> +                     return -EINVAL;
>>>>>
>>>>> I don't get this. What stack? Did you mean "static unsigned int
>>>>> stackdepth" in case someone puts i915_user_extension into the extension
>>>>> table? Or just a limit on number of chained extensions? But you are not
>>>>> processing them recursively here.
>>>>
>>>> It's iterative recursion :)
>>>>
>>>> I still think of this loop in terms of its simple tail recursion.
>>>>
>>>> And if we need to individual levels for unwind, that is a manual stack.
>>> Okay..
>>>
>>> One thought I had - if unwind is too unwieldy, could we mandate that each
>>> user of user extensions has an additional output field in the embedding
>>> struct which would hold the last processed extension id?
>>
>> Id? I guess we could do depth, though a pointer is more practical.
>> I don't think you mean ext.name, as we expect that to repeat a few times.
> 
> Pointer yes, I started with a pointer but then for some reason wrote id, 
> where I meant name.
> 
>>> This way the state of the object would be less undefined after failure.
>>> Userspace would be able to tell how far in the extension chain i915
>>> managed to get to.
>>
>> But I'm not convinced how practical that is. The answer is pretty much
>> always -EINVAL (anything else is much less ambiguous) and even then you
>> have no idea which field tripped EINVAL or why?
> 
> That you have no idea which field tripped -EINVAL and why is always true.
> 
> I am not 100% convinced myself, I just worry about usability. But I don't
> have a use case at the moment which would be practically helped by
> knowing which extension failed. In all cases I can think of it's a
> programming error, i.e. it should be caught during development and not
> happen in production.

To add - if we are adding a new extensible uAPI, my gut feeling is we
should make provision for reporting which stage failed. I think it is
easy now, while later it would be much harder.

Regards,

Tvrtko

* Re: [PATCH 19/38] drm/i915: Allow contexts to share a single timeline across all engines
  2019-03-01 14:03 ` [PATCH 19/38] drm/i915: Allow contexts to share a single timeline across all engines Chris Wilson
@ 2019-03-05 15:54   ` Tvrtko Ursulin
  2019-03-05 16:26     ` Chris Wilson
  0 siblings, 1 reply; 88+ messages in thread
From: Tvrtko Ursulin @ 2019-03-05 15:54 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 01/03/2019 14:03, Chris Wilson wrote:
> Previously, our view has been always to run the engines independently
> within a context. (Multiple engines happened before we had contexts and
> timelines, so they always operated independently and that behaviour
> persisted into contexts.) However, at the user level the context often
> represents a single timeline (e.g. GL contexts) and userspace must
> ensure that the individual engines are serialised to present that
> ordering to the client (or forget about this detail entirely and hope no
> one notices - a fair ploy if the client can only directly control one
> engine themselves ;)
> 
> In the next patch, we will want to construct a set of engines that
> operate as one, that have a single timeline interwoven between them, to
> present a single virtual engine to the user. (They submit to the virtual
> engine, then we decide which engine to execute on based on load.)
> 
> To that end, we want to be able to create contexts which have a single
> timeline (fence context) shared between all engines, rather than multiple
> timelines.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_gem_context.c       | 32 ++++++++++++---
>   drivers/gpu/drm/i915/i915_gem_context.h       |  3 ++
>   drivers/gpu/drm/i915/i915_request.c           | 10 ++++-
>   drivers/gpu/drm/i915/i915_request.h           |  5 ++-
>   drivers/gpu/drm/i915/i915_sw_fence.c          | 39 ++++++++++++++++---
>   drivers/gpu/drm/i915/i915_sw_fence.h          | 13 ++++++-
>   drivers/gpu/drm/i915/intel_lrc.c              |  5 ++-
>   .../gpu/drm/i915/selftests/i915_gem_context.c | 19 +++++----
>   drivers/gpu/drm/i915/selftests/mock_context.c |  2 +-
>   include/uapi/drm/i915_drm.h                   |  3 +-
>   10 files changed, 104 insertions(+), 27 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index f883d99653a3..d8e2228636ba 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -239,6 +239,9 @@ static void i915_gem_context_free(struct i915_gem_context *ctx)
>   			ce->ops->destroy(ce);
>   	}
>   
> +	if (ctx->timeline)
> +		i915_timeline_put(ctx->timeline);
> +
>   	kfree(ctx->name);
>   	put_pid(ctx->pid);
>   
> @@ -478,12 +481,17 @@ static void __assign_ppgtt(struct i915_gem_context *ctx,
>   
>   static struct i915_gem_context *
>   i915_gem_create_context(struct drm_i915_private *dev_priv,
> -			struct drm_i915_file_private *file_priv)
> +			struct drm_i915_file_private *file_priv,
> +			unsigned int flags)
>   {
>   	struct i915_gem_context *ctx;
>   
>   	lockdep_assert_held(&dev_priv->drm.struct_mutex);
>   
> +	if (flags & I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE &&
> +	    !HAS_EXECLISTS(dev_priv))
> +		return ERR_PTR(-EINVAL);
> +
>   	/* Reap the most stale context */
>   	contexts_free_first(dev_priv);
>   
> @@ -506,6 +514,18 @@ i915_gem_create_context(struct drm_i915_private *dev_priv,
>   		i915_ppgtt_put(ppgtt);
>   	}
>   
> +	if (flags & I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE) {
> +		struct i915_timeline *timeline;
> +
> +		timeline = i915_timeline_create(dev_priv, ctx->name, NULL);
> +		if (IS_ERR(timeline)) {
> +			__destroy_hw_context(ctx, file_priv);
> +			return ERR_CAST(timeline);
> +		}
> +
> +		ctx->timeline = timeline;
> +	}
> +
>   	trace_i915_context_create(ctx);
>   
>   	return ctx;
> @@ -534,7 +554,7 @@ i915_gem_context_create_gvt(struct drm_device *dev)
>   	if (ret)
>   		return ERR_PTR(ret);
>   
> -	ctx = i915_gem_create_context(to_i915(dev), NULL);
> +	ctx = i915_gem_create_context(to_i915(dev), NULL, 0);
>   	if (IS_ERR(ctx))
>   		goto out;
>   
> @@ -570,7 +590,7 @@ i915_gem_context_create_kernel(struct drm_i915_private *i915, int prio)
>   	struct i915_gem_context *ctx;
>   	int err;
>   
> -	ctx = i915_gem_create_context(i915, NULL);
> +	ctx = i915_gem_create_context(i915, NULL, 0);
>   	if (IS_ERR(ctx))
>   		return ctx;
>   
> @@ -702,7 +722,7 @@ int i915_gem_context_open(struct drm_i915_private *i915,
>   	idr_init_base(&file_priv->vm_idr, 1);
>   
>   	mutex_lock(&i915->drm.struct_mutex);
> -	ctx = i915_gem_create_context(i915, file_priv);
> +	ctx = i915_gem_create_context(i915, file_priv, 0);
>   	mutex_unlock(&i915->drm.struct_mutex);
>   	if (IS_ERR(ctx)) {
>   		idr_destroy(&file_priv->context_idr);
> @@ -818,7 +838,7 @@ last_request_on_engine(struct i915_timeline *timeline,
>   
>   	rq = i915_active_request_raw(&timeline->last_request,
>   				     &engine->i915->drm.struct_mutex);
> -	if (rq && rq->engine == engine) {
> +	if (rq && rq->engine->mask & engine->mask) {
>   		GEM_TRACE("last request for %s on engine %s: %llx:%llu\n",
>   			  timeline->name, engine->name,
>   			  rq->fence.context, rq->fence.seqno);
> @@ -1476,7 +1496,7 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
>   	if (ret)
>   		return ret;
>   
> -	ctx = i915_gem_create_context(i915, file_priv);
> +	ctx = i915_gem_create_context(i915, file_priv, args->flags);
>   	mutex_unlock(&dev->struct_mutex);
>   	if (IS_ERR(ctx))
>   		return PTR_ERR(ctx);
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
> index 97cf9d3d07ae..e1f270c098f0 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.h
> +++ b/drivers/gpu/drm/i915/i915_gem_context.h
> @@ -43,6 +43,7 @@ struct drm_i915_private;
>   struct drm_i915_file_private;
>   struct i915_hw_ppgtt;
>   struct i915_request;
> +struct i915_timeline;
>   struct i915_vma;
>   struct intel_ring;
>   
> @@ -78,6 +79,8 @@ struct i915_gem_context {
>   	/** file_priv: owning file descriptor */
>   	struct drm_i915_file_private *file_priv;
>   
> +	struct i915_timeline *timeline;
> +
>   	/**
>   	 * @ppgtt: unique address space (GTT)
>   	 *
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 9d111eedad5a..e0807a61dcf4 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -1045,8 +1045,14 @@ void i915_request_add(struct i915_request *request)
>   	prev = i915_active_request_raw(&timeline->last_request,
>   				       &request->i915->drm.struct_mutex);
>   	if (prev && !i915_request_completed(prev)) {
> -		i915_sw_fence_await_sw_fence(&request->submit, &prev->submit,
> -					     &request->submitq);
> +		if (is_power_of_2(prev->engine->mask | engine->mask))
> +			i915_sw_fence_await_sw_fence(&request->submit,
> +						     &prev->submit,
> +						     &request->submitq);
> +		else
> +			__i915_sw_fence_await_dma_fence(&request->submit,
> +							&prev->fence,
> +							&request->dmaq);

Drop a comment here explaining what's happening in this if block.

The subtlety of why we need special flavours of the await helper, the new
one which uses the built-in callback storage vs the existing ones which
allocate it, totally escapes me at the moment.

It's probably a good idea to put a paragraph in the commit message 
explaining what new sw fence facility needs to be added to implement 
this and why.
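
For what it's worth, my reading of that if block, written as the kind of
comment being asked for (a guess, not authoritative):

	/*
	 * If prev and this request can only run on the same engine (the
	 * union of the two engine masks is a single bit, i.e. a power of
	 * two), submission order on that engine already implies execution
	 * order, so the cheap submit-fence coupling with the embedded
	 * wait_queue_entry_t is enough. Otherwise the two may execute on
	 * different engines, and we must wait on prev's dma-fence to keep
	 * the timeline ordered, using the embedded i915_sw_dma_fence_cb
	 * so that no allocation (and no new failure path) is needed here.
	 */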

>   		if (engine->schedule)
>   			__i915_sched_node_add_dependency(&request->sched,
>   							 &prev->sched,
> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> index 3e0d62d35226..d4113074a8f6 100644
> --- a/drivers/gpu/drm/i915/i915_request.h
> +++ b/drivers/gpu/drm/i915/i915_request.h
> @@ -128,7 +128,10 @@ struct i915_request {
>   	 * It is used by the driver to then queue the request for execution.
>   	 */
>   	struct i915_sw_fence submit;
> -	wait_queue_entry_t submitq;
> +	union {
> +		wait_queue_entry_t submitq;
> +		struct i915_sw_dma_fence_cb dmaq;
> +	};

The union deserves a comment as well I think, like: this is for that and
that is for this, and only one can be in use at a time because of the
third thing.
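
Something along these lines, perhaps (my guess at the semantics):

	union {
		/* coupled to another i915_sw_fence: embedded wait entry */
		wait_queue_entry_t submitq;
		/* coupled to an external dma-fence: embedded callback */
		struct i915_sw_dma_fence_cb dmaq;
		/* a request waits through only one mechanism at a time,
		 * hence the two can share storage */
	};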

>   	struct list_head execute_cb;
>   
>   	/*
> diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c
> index 8d1400d378d7..5387aafd3424 100644
> --- a/drivers/gpu/drm/i915/i915_sw_fence.c
> +++ b/drivers/gpu/drm/i915/i915_sw_fence.c
> @@ -359,11 +359,6 @@ int i915_sw_fence_await_sw_fence_gfp(struct i915_sw_fence *fence,
>   	return __i915_sw_fence_await_sw_fence(fence, signaler, NULL, gfp);
>   }
>   
> -struct i915_sw_dma_fence_cb {
> -	struct dma_fence_cb base;
> -	struct i915_sw_fence *fence;
> -};
> -
>   struct i915_sw_dma_fence_cb_timer {
>   	struct i915_sw_dma_fence_cb base;
>   	struct dma_fence *dma;
> @@ -480,6 +475,40 @@ int i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
>   	return ret;
>   }
>   
> +static void __dma_i915_sw_fence_wake(struct dma_fence *dma,
> +				     struct dma_fence_cb *data)
> +{
> +	struct i915_sw_dma_fence_cb *cb = container_of(data, typeof(*cb), base);
> +
> +	i915_sw_fence_complete(cb->fence);
> +}
> +
> +int __i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
> +				    struct dma_fence *dma,
> +				    struct i915_sw_dma_fence_cb *cb)
> +{
> +	int ret;
> +
> +	debug_fence_assert(fence);
> +
> +	if (dma_fence_is_signaled(dma))
> +		return 0;
> +
> +	cb->fence = fence;
> +	i915_sw_fence_await(fence);
> +
> +	ret = dma_fence_add_callback(dma, &cb->base, __dma_i915_sw_fence_wake);
> +	if (ret == 0) {
> +		ret = 1;
> +	} else {
> +		i915_sw_fence_complete(fence);
> +		if (ret == -ENOENT) /* fence already signaled */
> +			ret = 0;
> +	}
> +
> +	return ret;
> +}
> +
>   int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
>   				    struct reservation_object *resv,
>   				    const struct dma_fence_ops *exclude,
> diff --git a/drivers/gpu/drm/i915/i915_sw_fence.h b/drivers/gpu/drm/i915/i915_sw_fence.h
> index 6dec9e1d1102..9cb5c3b307a6 100644
> --- a/drivers/gpu/drm/i915/i915_sw_fence.h
> +++ b/drivers/gpu/drm/i915/i915_sw_fence.h
> @@ -9,14 +9,13 @@
>   #ifndef _I915_SW_FENCE_H_
>   #define _I915_SW_FENCE_H_
>   
> +#include <linux/dma-fence.h>
>   #include <linux/gfp.h>
>   #include <linux/kref.h>
>   #include <linux/notifier.h> /* for NOTIFY_DONE */
>   #include <linux/wait.h>
>   
>   struct completion;
> -struct dma_fence;
> -struct dma_fence_ops;
>   struct reservation_object;
>   
>   struct i915_sw_fence {
> @@ -68,10 +67,20 @@ int i915_sw_fence_await_sw_fence(struct i915_sw_fence *fence,
>   int i915_sw_fence_await_sw_fence_gfp(struct i915_sw_fence *fence,
>   				     struct i915_sw_fence *after,
>   				     gfp_t gfp);
> +
> +struct i915_sw_dma_fence_cb {
> +	struct dma_fence_cb base;
> +	struct i915_sw_fence *fence;
> +};
> +
> +int __i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
> +				    struct dma_fence *dma,
> +				    struct i915_sw_dma_fence_cb *cb);
>   int i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
>   				  struct dma_fence *dma,
>   				  unsigned long timeout,
>   				  gfp_t gfp);
> +
>   int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
>   				    struct reservation_object *resv,
>   				    const struct dma_fence_ops *exclude,
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 50a7c87705c8..d50a33c578c5 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -2853,7 +2853,10 @@ static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
>   		goto error_deref_obj;
>   	}
>   
> -	timeline = i915_timeline_create(ctx->i915, ctx->name, NULL);
> +	if (ctx->timeline)
> +		timeline = i915_timeline_get(ctx->timeline);
> +	else
> +		timeline = i915_timeline_create(ctx->i915, ctx->name, NULL);
>   	if (IS_ERR(timeline)) {
>   		ret = PTR_ERR(timeline);
>   		goto error_deref_obj;
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
> index 9133afc03135..15f016ca8e0d 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
> @@ -76,7 +76,7 @@ static int live_nop_switch(void *arg)
>   	}
>   
>   	for (n = 0; n < nctx; n++) {
> -		ctx[n] = i915_gem_create_context(i915, file->driver_priv);
> +		ctx[n] = i915_gem_create_context(i915, file->driver_priv, 0);
>   		if (IS_ERR(ctx[n])) {
>   			err = PTR_ERR(ctx[n]);
>   			goto out_unlock;
> @@ -526,7 +526,8 @@ static int igt_ctx_exec(void *arg)
>   			struct i915_gem_context *ctx;
>   			intel_wakeref_t wakeref;
>   
> -			ctx = i915_gem_create_context(i915, file->driver_priv);
> +			ctx = i915_gem_create_context(i915,
> +						      file->driver_priv, 0);
>   			if (IS_ERR(ctx)) {
>   				err = PTR_ERR(ctx);
>   				goto out_unlock;
> @@ -623,7 +624,8 @@ static int igt_shared_ctx_exec(void *arg)
>   		if (err)
>   			goto out_unlock;
>   
> -		parent = i915_gem_create_context(i915, file->driver_priv);
> +		parent = i915_gem_create_context(i915,
> +						 file->driver_priv, 0);
>   		if (IS_ERR(parent)) {
>   			err = PTR_ERR(parent);
>   			if (err == -ENODEV) /* no logical ctx support */
> @@ -645,7 +647,8 @@ static int igt_shared_ctx_exec(void *arg)
>   			if (ctx)
>   				__destroy_hw_context(ctx, file->driver_priv);
>   
> -			ctx = i915_gem_create_context(i915, file->driver_priv);
> +			ctx = i915_gem_create_context(i915,
> +						      file->driver_priv, 0);
>   			if (IS_ERR(ctx)) {
>   				err = PTR_ERR(ctx);
>   				goto out_unlock;
> @@ -1091,7 +1094,7 @@ __igt_ctx_sseu(struct drm_i915_private *i915,
>   
>   	mutex_lock(&i915->drm.struct_mutex);
>   
> -	ctx = i915_gem_create_context(i915, file->driver_priv);
> +	ctx = i915_gem_create_context(i915, file->driver_priv, 0);
>   	if (IS_ERR(ctx)) {
>   		ret = PTR_ERR(ctx);
>   		goto out_unlock;
> @@ -1201,7 +1204,7 @@ static int igt_ctx_readonly(void *arg)
>   	if (err)
>   		goto out_unlock;
>   
> -	ctx = i915_gem_create_context(i915, file->driver_priv);
> +	ctx = i915_gem_create_context(i915, file->driver_priv, 0);
>   	if (IS_ERR(ctx)) {
>   		err = PTR_ERR(ctx);
>   		goto out_unlock;
> @@ -1527,13 +1530,13 @@ static int igt_vm_isolation(void *arg)
>   	if (err)
>   		goto out_unlock;
>   
> -	ctx_a = i915_gem_create_context(i915, file->driver_priv);
> +	ctx_a = i915_gem_create_context(i915, file->driver_priv, 0);
>   	if (IS_ERR(ctx_a)) {
>   		err = PTR_ERR(ctx_a);
>   		goto out_unlock;
>   	}
>   
> -	ctx_b = i915_gem_create_context(i915, file->driver_priv);
> +	ctx_b = i915_gem_create_context(i915, file->driver_priv, 0);
>   	if (IS_ERR(ctx_b)) {
>   		err = PTR_ERR(ctx_b);
>   		goto out_unlock;
> diff --git a/drivers/gpu/drm/i915/selftests/mock_context.c b/drivers/gpu/drm/i915/selftests/mock_context.c
> index 5eddf9fcfe8a..5d0ff2293abc 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_context.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_context.c
> @@ -95,7 +95,7 @@ live_context(struct drm_i915_private *i915, struct drm_file *file)
>   {
>   	lockdep_assert_held(&i915->drm.struct_mutex);
>   
> -	return i915_gem_create_context(i915, file->driver_priv);
> +	return i915_gem_create_context(i915, file->driver_priv, 0);
>   }
>   
>   struct i915_gem_context *
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index eec635fb2e1c..451d2f36830b 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1452,8 +1452,9 @@ struct drm_i915_gem_context_create_ext {
>   	__u32 ctx_id; /* output: id of new context*/
>   	__u32 flags;
>   #define I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS	(1u << 0)
> +#define I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE	(1u << 1)

And some kernel doc please.
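
Perhaps with a userspace usage sketch to go with it (fd, ctx_id and error
handling elided; only the uAPI names above are real):

	struct drm_i915_gem_context_create_ext arg = {
		.flags = I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE,
	};

	/* all engines of the new context share one timeline/fence context */
	if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT, &arg) == 0)
		ctx_id = arg.ctx_id;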

>   #define I915_CONTEXT_CREATE_FLAGS_UNKNOWN \
> -	(-(I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS << 1))
> +	(-(I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE << 1))
>   	__u64 extensions;
>   };
>   
> 

Looks fine, no complaints, apart from needing help to remind me how some
things work.

Regards,

Tvrtko

* Re: [PATCH 26/38] drm/i915: Pass around the intel_context
  2019-03-01 14:03 ` [PATCH 26/38] drm/i915: Pass around the intel_context Chris Wilson
@ 2019-03-05 16:16   ` Tvrtko Ursulin
  2019-03-05 16:33     ` Chris Wilson
  0 siblings, 1 reply; 88+ messages in thread
From: Tvrtko Ursulin @ 2019-03-05 16:16 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 01/03/2019 14:03, Chris Wilson wrote:
> Instead of passing the gem_context and engine to find the instance of
> the intel_context to use, pass around the intel_context instead. This is
> useful for the next few patches, where the intel_context is no longer a
> direct lookup.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_drv.h  |  2 +-
>   drivers/gpu/drm/i915/i915_perf.c | 32 ++++++++++++++--------------
>   drivers/gpu/drm/i915/intel_lrc.c | 36 +++++++++++++++++---------------
>   3 files changed, 36 insertions(+), 34 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index cbc2b59722ae..493cba8a76aa 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -3134,7 +3134,7 @@ int i915_perf_add_config_ioctl(struct drm_device *dev, void *data,
>   int i915_perf_remove_config_ioctl(struct drm_device *dev, void *data,
>   				  struct drm_file *file);
>   void i915_oa_init_reg_state(struct intel_engine_cs *engine,
> -			    struct i915_gem_context *ctx,
> +			    struct intel_context *ce,
>   			    u32 *reg_state);
>   
>   /* i915_gem_evict.c */
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index 9ebf99f3d8d3..f969a0512465 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -1629,13 +1629,14 @@ static void hsw_disable_metric_set(struct drm_i915_private *dev_priv)
>    * It's fine to put out-of-date values into these per-context registers
>    * in the case that the OA unit has been disabled.
>    */
> -static void gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
> -					   u32 *reg_state,
> -					   const struct i915_oa_config *oa_config)
> +static void
> +gen8_update_reg_state_unlocked(struct intel_context *ce,
> +			       u32 *reg_state,
> +			       const struct i915_oa_config *oa_config)
>   {
> -	struct drm_i915_private *dev_priv = ctx->i915;
> -	u32 ctx_oactxctrl = dev_priv->perf.oa.ctx_oactxctrl_offset;
> -	u32 ctx_flexeu0 = dev_priv->perf.oa.ctx_flexeu0_offset;
> +	struct drm_i915_private *i915 = ce->gem_context->i915;
> +	u32 ctx_oactxctrl = i915->perf.oa.ctx_oactxctrl_offset;
> +	u32 ctx_flexeu0 = i915->perf.oa.ctx_flexeu0_offset;
>   	/* The MMIO offsets for Flex EU registers aren't contiguous */
>   	i915_reg_t flex_regs[] = {
>   		EU_PERF_CNTL0,
> @@ -1649,8 +1650,8 @@ static void gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
>   	int i;
>   
>   	CTX_REG(reg_state, ctx_oactxctrl, GEN8_OACTXCONTROL,
> -		(dev_priv->perf.oa.period_exponent << GEN8_OA_TIMER_PERIOD_SHIFT) |
> -		(dev_priv->perf.oa.periodic ? GEN8_OA_TIMER_ENABLE : 0) |
> +		(i915->perf.oa.period_exponent << GEN8_OA_TIMER_PERIOD_SHIFT) |
> +		(i915->perf.oa.periodic ? GEN8_OA_TIMER_ENABLE : 0) |
>   		GEN8_OA_COUNTER_RESUME);
>   
>   	for (i = 0; i < ARRAY_SIZE(flex_regs); i++) {
> @@ -1678,10 +1679,9 @@ static void gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
>   		CTX_REG(reg_state, state_offset, flex_regs[i], value);
>   	}
>   
> -	CTX_REG(reg_state, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
> -		gen8_make_rpcs(dev_priv,
> -			       &to_intel_context(ctx,
> -						 dev_priv->engine[RCS])->sseu));
> +	CTX_REG(reg_state,
> +		CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
> +		gen8_make_rpcs(i915, &ce->sseu));
>   }
>   
>   /*
> @@ -1754,7 +1754,7 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
>   		ce->state->obj->mm.dirty = true;
>   		regs += LRC_STATE_PN * PAGE_SIZE / sizeof(*regs);
>   
> -		gen8_update_reg_state_unlocked(ctx, regs, oa_config);
> +		gen8_update_reg_state_unlocked(ce, regs, oa_config);
>   
>   		i915_gem_object_unpin_map(ce->state->obj);
>   	}
> @@ -2138,8 +2138,8 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
>   }
>   
>   void i915_oa_init_reg_state(struct intel_engine_cs *engine,
> -			    struct i915_gem_context *ctx,
> -			    u32 *reg_state)
> +			    struct intel_context *ce,
> +			    u32 *regs)
>   {
>   	struct i915_perf_stream *stream;
>   
> @@ -2148,7 +2148,7 @@ void i915_oa_init_reg_state(struct intel_engine_cs *engine,
>   
>   	stream = engine->i915->perf.oa.exclusive_stream;
>   	if (stream)
> -		gen8_update_reg_state_unlocked(ctx, reg_state, stream->oa_config);
> +		gen8_update_reg_state_unlocked(ce, regs, stream->oa_config);
>   }
>   
>   /**
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index d50a33c578c5..a2210f79dc67 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -170,7 +170,7 @@ static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
>   					    struct intel_engine_cs *engine,
>   					    struct intel_context *ce);
>   static void execlists_init_reg_state(u32 *reg_state,
> -				     struct i915_gem_context *ctx,
> +				     struct intel_context *ce,
>   				     struct intel_engine_cs *engine,
>   				     struct intel_ring *ring);
>   
> @@ -1314,9 +1314,10 @@ __execlists_update_reg_state(struct intel_engine_cs *engine,
>   	regs[CTX_RING_TAIL + 1] = ring->tail;
>   
>   	/* RPCS */
> -	if (engine->class == RENDER_CLASS)
> -		regs[CTX_R_PWR_CLK_STATE + 1] = gen8_make_rpcs(engine->i915,
> -							       &ce->sseu);
> +	if (engine->class == RENDER_CLASS) {
> +		regs[CTX_R_PWR_CLK_STATE + 1] =
> +			gen8_make_rpcs(engine->i915, &ce->sseu);
> +	}

Rebasing artifact I guess.

>   }
>   
>   static struct intel_context *
> @@ -2021,7 +2022,7 @@ static void execlists_reset(struct intel_engine_cs *engine, bool stalled)
>   	rq->ring->head = intel_ring_wrap(rq->ring, rq->head);
>   	intel_ring_update_space(rq->ring);
>   
> -	execlists_init_reg_state(regs, rq->gem_context, engine, rq->ring);
> +	execlists_init_reg_state(regs, rq->hw_context, engine, rq->ring);
>   	__execlists_update_reg_state(engine, rq->hw_context);
>   
>   out_unlock:
> @@ -2659,7 +2660,7 @@ static u32 intel_lr_indirect_ctx_offset(struct intel_engine_cs *engine)
>   }
>   
>   static void execlists_init_reg_state(u32 *regs,
> -				     struct i915_gem_context *ctx,
> +				     struct intel_context *ce,
>   				     struct intel_engine_cs *engine,

I can't remember if the separate engine param is still needed now that
there's ce->engine. Maybe it becomes useful later in the series.
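
If it turns out to be redundant, dropping it would be a small follow-up.
A rough sketch (untested, and assuming ce->engine is always populated
before any caller runs, which I have not checked):

static void execlists_init_reg_state(u32 *regs,
				     struct intel_context *ce,
				     struct intel_ring *ring)
{
	struct intel_engine_cs *engine = ce->engine;

	/* body unchanged, keeps using the local 'engine' */
	...
}

with the callers shrinking to, e.g.:

	execlists_init_reg_state(regs, rq->hw_context, rq->ring);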

>   				     struct intel_ring *ring)
>   {
> @@ -2735,24 +2736,24 @@ static void execlists_init_reg_state(u32 *regs,
>   	CTX_REG(regs, CTX_PDP0_UDW, GEN8_RING_PDP_UDW(engine, 0), 0);
>   	CTX_REG(regs, CTX_PDP0_LDW, GEN8_RING_PDP_LDW(engine, 0), 0);
>   
> -	if (i915_vm_is_48bit(&ctx->ppgtt->vm)) {
> +	if (i915_vm_is_48bit(&ce->gem_context->ppgtt->vm)) {
>   		/* 64b PPGTT (48bit canonical)
>   		 * PDP0_DESCRIPTOR contains the base address to PML4 and
>   		 * other PDP Descriptors are ignored.
>   		 */
> -		ASSIGN_CTX_PML4(ctx->ppgtt, regs);
> +		ASSIGN_CTX_PML4(ce->gem_context->ppgtt, regs);
>   	} else {
> -		ASSIGN_CTX_PDP(ctx->ppgtt, regs, 3);
> -		ASSIGN_CTX_PDP(ctx->ppgtt, regs, 2);
> -		ASSIGN_CTX_PDP(ctx->ppgtt, regs, 1);
> -		ASSIGN_CTX_PDP(ctx->ppgtt, regs, 0);
> +		ASSIGN_CTX_PDP(ce->gem_context->ppgtt, regs, 3);
> +		ASSIGN_CTX_PDP(ce->gem_context->ppgtt, regs, 2);
> +		ASSIGN_CTX_PDP(ce->gem_context->ppgtt, regs, 1);
> +		ASSIGN_CTX_PDP(ce->gem_context->ppgtt, regs, 0);

Could be worth caching the ppgtt in a local for aesthetics.
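
Something like this, perhaps (behaviour unchanged, untested):

	struct i915_hw_ppgtt * const ppgtt = ce->gem_context->ppgtt;

	if (i915_vm_is_48bit(&ppgtt->vm)) {
		/* 64b PPGTT: PDP0 holds the PML4, other PDPs are ignored */
		ASSIGN_CTX_PML4(ppgtt, regs);
	} else {
		ASSIGN_CTX_PDP(ppgtt, regs, 3);
		ASSIGN_CTX_PDP(ppgtt, regs, 2);
		ASSIGN_CTX_PDP(ppgtt, regs, 1);
		ASSIGN_CTX_PDP(ppgtt, regs, 0);
	}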

>   	}
>   
>   	if (rcs) {
>   		regs[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
>   		CTX_REG(regs, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE, 0);
>   
> -		i915_oa_init_reg_state(engine, ctx, regs);
> +		i915_oa_init_reg_state(engine, ce, regs);
>   	}
>   
>   	regs[CTX_END] = MI_BATCH_BUFFER_END;
> @@ -2761,7 +2762,7 @@ static void execlists_init_reg_state(u32 *regs,
>   }
>   
>   static int
> -populate_lr_context(struct i915_gem_context *ctx,
> +populate_lr_context(struct intel_context *ce,
>   		    struct drm_i915_gem_object *ctx_obj,
>   		    struct intel_engine_cs *engine,
>   		    struct intel_ring *ring)
> @@ -2807,11 +2808,12 @@ populate_lr_context(struct i915_gem_context *ctx,
>   	/* The second page of the context object contains some fields which must
>   	 * be set up prior to the first execution. */
>   	regs = vaddr + LRC_STATE_PN * PAGE_SIZE;
> -	execlists_init_reg_state(regs, ctx, engine, ring);
> +	execlists_init_reg_state(regs, ce, engine, ring);
>   	if (!engine->default_state)
>   		regs[CTX_CONTEXT_CONTROL + 1] |=
>   			_MASKED_BIT_ENABLE(CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT);
> -	if (ctx == ctx->i915->preempt_context && INTEL_GEN(engine->i915) < 11)
> +	if (ce->gem_context == engine->i915->preempt_context &&
> +	    INTEL_GEN(engine->i915) < 11)
>   		regs[CTX_CONTEXT_CONTROL + 1] |=
>   			_MASKED_BIT_ENABLE(CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT |
>   					   CTX_CTRL_ENGINE_CTX_SAVE_INHIBIT);
> @@ -2869,7 +2871,7 @@ static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
>   		goto error_deref_obj;
>   	}
>   
> -	ret = populate_lr_context(ctx, ctx_obj, engine, ring);
> +	ret = populate_lr_context(ce, ctx_obj, engine, ring);
>   	if (ret) {
>   		DRM_DEBUG_DRIVER("Failed to populate LRC: %d\n", ret);
>   		goto error_ring_free;
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

^ permalink raw reply	[flat|nested] 88+ messages in thread

* Re: [PATCH 27/38] drm/i915: Split struct intel_context definition to its own header
  2019-03-01 14:03 ` [PATCH 27/38] drm/i915: Split struct intel_context definition to its own header Chris Wilson
@ 2019-03-05 16:19   ` Tvrtko Ursulin
  2019-03-05 16:35     ` Chris Wilson
  0 siblings, 1 reply; 88+ messages in thread
From: Tvrtko Ursulin @ 2019-03-05 16:19 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 01/03/2019 14:03, Chris Wilson wrote:
> This complex struct pulling in half the driver deserves its own
> isolation in preparation for intel_context becoming an outright
> complicated class of its own.
> 
> Splitting this beast into its own header also requires splitting
> several of its dependent types and their dependencies into their own
> headers as well.

I don't feel I need to read this one in detail. It is almost entirely
code movement, so if it compiles, it should be good.
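
For anyone following along, the _types.h split in miniature (a
hypothetical foo, not code from the patch): the types header carries
only the struct layout with minimal includes, so other headers can
embed the struct without dragging in everything the inline helpers
need.

/* foo_types.h: layout only, minimal includes */
#ifndef __FOO_TYPES_H__
#define __FOO_TYPES_H__

#include <linux/types.h>

struct foo {
	u32 x;
};

#endif /* __FOO_TYPES_H__ */

/* foo.h: inline helpers, free to pull in heavier headers */
#ifndef __FOO_H__
#define __FOO_H__

#include "foo_types.h"

static inline u32 foo_x(const struct foo *f)
{
	return f->x;
}

#endif /* __FOO_H__ */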

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_gem_context.h       | 245 +-------
>   drivers/gpu/drm/i915/i915_gem_context_types.h | 188 +++++++
>   drivers/gpu/drm/i915/i915_timeline.h          |  70 +--
>   drivers/gpu/drm/i915/i915_timeline_types.h    |  80 +++
>   drivers/gpu/drm/i915/intel_context.h          |  47 ++
>   drivers/gpu/drm/i915/intel_context_types.h    |  60 ++
>   drivers/gpu/drm/i915/intel_engine_types.h     | 521 ++++++++++++++++++
>   drivers/gpu/drm/i915/intel_guc.h              |   1 +
>   drivers/gpu/drm/i915/intel_ringbuffer.h       | 502 +----------------
>   drivers/gpu/drm/i915/intel_workarounds.h      |  13 +-
>   .../gpu/drm/i915/intel_workarounds_types.h    |  25 +
>   11 files changed, 928 insertions(+), 824 deletions(-)
>   create mode 100644 drivers/gpu/drm/i915/i915_gem_context_types.h
>   create mode 100644 drivers/gpu/drm/i915/i915_timeline_types.h
>   create mode 100644 drivers/gpu/drm/i915/intel_context.h
>   create mode 100644 drivers/gpu/drm/i915/intel_context_types.h
>   create mode 100644 drivers/gpu/drm/i915/intel_engine_types.h
>   create mode 100644 drivers/gpu/drm/i915/intel_workarounds_types.h
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
> index f09b5badbe73..110d5881c9de 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.h
> +++ b/drivers/gpu/drm/i915/i915_gem_context.h
> @@ -25,225 +25,17 @@
>   #ifndef __I915_GEM_CONTEXT_H__
>   #define __I915_GEM_CONTEXT_H__
>   
> -#include <linux/bitops.h>
> -#include <linux/list.h>
> -#include <linux/radix-tree.h>
> +#include "i915_gem_context_types.h"
>   
>   #include "i915_gem.h"
>   #include "i915_scheduler.h"
> +#include "intel_context.h"
>   #include "intel_device_info.h"
>   #include "intel_ringbuffer.h"
>   
> -struct pid;
> -
>   struct drm_device;
>   struct drm_file;
>   
> -struct drm_i915_private;
> -struct drm_i915_file_private;
> -struct i915_hw_ppgtt;
> -struct i915_request;
> -struct i915_timeline;
> -struct i915_vma;
> -struct intel_ring;
> -
> -#define DEFAULT_CONTEXT_HANDLE 0
> -
> -struct intel_context;
> -
> -struct intel_context_ops {
> -	void (*unpin)(struct intel_context *ce);
> -	void (*destroy)(struct intel_context *ce);
> -};
> -
> -/*
> - * Powergating configuration for a particular (context,engine).
> - */
> -struct intel_sseu {
> -	u8 slice_mask;
> -	u8 subslice_mask;
> -	u8 min_eus_per_subslice;
> -	u8 max_eus_per_subslice;
> -};
> -
> -/**
> - * struct i915_gem_context - client state
> - *
> - * The struct i915_gem_context represents the combined view of the driver and
> - * logical hardware state for a particular client.
> - */
> -struct i915_gem_context {
> -	/** i915: i915 device backpointer */
> -	struct drm_i915_private *i915;
> -
> -	/** file_priv: owning file descriptor */
> -	struct drm_i915_file_private *file_priv;
> -
> -	struct intel_engine_cs **engines;
> -
> -	struct i915_timeline *timeline;
> -
> -	/**
> -	 * @ppgtt: unique address space (GTT)
> -	 *
> -	 * In full-ppgtt mode, each context has its own address space ensuring
> -	 * complete seperation of one client from all others.
> -	 *
> -	 * In other modes, this is a NULL pointer with the expectation that
> -	 * the caller uses the shared global GTT.
> -	 */
> -	struct i915_hw_ppgtt *ppgtt;
> -
> -	/**
> -	 * @pid: process id of creator
> -	 *
> -	 * Note that who created the context may not be the principle user,
> -	 * as the context may be shared across a local socket. However,
> -	 * that should only affect the default context, all contexts created
> -	 * explicitly by the client are expected to be isolated.
> -	 */
> -	struct pid *pid;
> -
> -	/**
> -	 * @name: arbitrary name
> -	 *
> -	 * A name is constructed for the context from the creator's process
> -	 * name, pid and user handle in order to uniquely identify the
> -	 * context in messages.
> -	 */
> -	const char *name;
> -
> -	/** link: place with &drm_i915_private.context_list */
> -	struct list_head link;
> -	struct llist_node free_link;
> -
> -	/**
> -	 * @ref: reference count
> -	 *
> -	 * A reference to a context is held by both the client who created it
> -	 * and on each request submitted to the hardware using the request
> -	 * (to ensure the hardware has access to the state until it has
> -	 * finished all pending writes). See i915_gem_context_get() and
> -	 * i915_gem_context_put() for access.
> -	 */
> -	struct kref ref;
> -
> -	/**
> -	 * @rcu: rcu_head for deferred freeing.
> -	 */
> -	struct rcu_head rcu;
> -
> -	/**
> -	 * @user_flags: small set of booleans controlled by the user
> -	 */
> -	unsigned long user_flags;
> -#define UCONTEXT_NO_ZEROMAP		0
> -#define UCONTEXT_NO_ERROR_CAPTURE	1
> -#define UCONTEXT_BANNABLE		2
> -#define UCONTEXT_RECOVERABLE		3
> -
> -	/**
> -	 * @flags: small set of booleans
> -	 */
> -	unsigned long flags;
> -#define CONTEXT_BANNED			0
> -#define CONTEXT_CLOSED			1
> -#define CONTEXT_FORCE_SINGLE_SUBMISSION	2
> -
> -	unsigned int nengine;
> -
> -	/**
> -	 * @hw_id: - unique identifier for the context
> -	 *
> -	 * The hardware needs to uniquely identify the context for a few
> -	 * functions like fault reporting, PASID, scheduling. The
> -	 * &drm_i915_private.context_hw_ida is used to assign a unqiue
> -	 * id for the lifetime of the context.
> -	 *
> -	 * @hw_id_pin_count: - number of times this context had been pinned
> -	 * for use (should be, at most, once per engine).
> -	 *
> -	 * @hw_id_link: - all contexts with an assigned id are tracked
> -	 * for possible repossession.
> -	 */
> -	unsigned int hw_id;
> -	atomic_t hw_id_pin_count;
> -	struct list_head hw_id_link;
> -
> -	struct list_head active_engines;
> -	struct mutex mutex;
> -
> -	/**
> -	 * @user_handle: userspace identifier
> -	 *
> -	 * A unique per-file identifier is generated from
> -	 * &drm_i915_file_private.contexts.
> -	 */
> -	u32 user_handle;
> -
> -	struct i915_sched_attr sched;
> -
> -	/** engine: per-engine logical HW state */
> -	struct intel_context {
> -		struct i915_gem_context *gem_context;
> -		struct intel_engine_cs *engine;
> -		struct intel_engine_cs *active;
> -		struct list_head active_link;
> -		struct list_head signal_link;
> -		struct list_head signals;
> -		struct i915_vma *state;
> -		struct intel_ring *ring;
> -		u32 *lrc_reg_state;
> -		u64 lrc_desc;
> -		int pin_count;
> -
> -		/**
> -		 * active_tracker: Active tracker for the external rq activity
> -		 * on this intel_context object.
> -		 */
> -		struct i915_active_request active_tracker;
> -
> -		const struct intel_context_ops *ops;
> -
> -		/** sseu: Control eu/slice partitioning */
> -		struct intel_sseu sseu;
> -	} __engine[I915_NUM_ENGINES];
> -
> -	/** ring_size: size for allocating the per-engine ring buffer */
> -	u32 ring_size;
> -	/** desc_template: invariant fields for the HW context descriptor */
> -	u32 desc_template;
> -
> -	/** guilty_count: How many times this context has caused a GPU hang. */
> -	atomic_t guilty_count;
> -	/**
> -	 * @active_count: How many times this context was active during a GPU
> -	 * hang, but did not cause it.
> -	 */
> -	atomic_t active_count;
> -
> -	/**
> -	 * @hang_timestamp: The last time(s) this context caused a GPU hang
> -	 */
> -	unsigned long hang_timestamp[2];
> -#define CONTEXT_FAST_HANG_JIFFIES (120 * HZ) /* 3 hangs within 120s? Banned! */
> -
> -	/** remap_slice: Bitmask of cache lines that need remapping */
> -	u8 remap_slice;
> -
> -	/** handles_vma: rbtree to look up our context specific obj/vma for
> -	 * the user handle. (user handles are per fd, but the binding is
> -	 * per vm, which may be one per context or shared with the global GTT)
> -	 */
> -	struct radix_tree_root handles_vma;
> -
> -	/** handles_list: reverse list of all the rbtree entries in use for
> -	 * this context, which allows us to free all the allocations on
> -	 * context close.
> -	 */
> -	struct list_head handles_list;
> -};
> -
>   static inline bool i915_gem_context_is_closed(const struct i915_gem_context *ctx)
>   {
>   	return test_bit(CONTEXT_CLOSED, &ctx->flags);
> @@ -345,35 +137,6 @@ static inline bool i915_gem_context_is_kernel(struct i915_gem_context *ctx)
>   	return !ctx->file_priv;
>   }
>   
> -static inline struct intel_context *
> -to_intel_context(struct i915_gem_context *ctx,
> -		 const struct intel_engine_cs *engine)
> -{
> -	return &ctx->__engine[engine->id];
> -}
> -
> -static inline struct intel_context *
> -intel_context_pin(struct i915_gem_context *ctx, struct intel_engine_cs *engine)
> -{
> -	return engine->context_pin(engine, ctx);
> -}
> -
> -static inline void __intel_context_pin(struct intel_context *ce)
> -{
> -	GEM_BUG_ON(!ce->pin_count);
> -	ce->pin_count++;
> -}
> -
> -static inline void intel_context_unpin(struct intel_context *ce)
> -{
> -	GEM_BUG_ON(!ce->pin_count);
> -	if (--ce->pin_count)
> -		return;
> -
> -	GEM_BUG_ON(!ce->ops);
> -	ce->ops->unpin(ce);
> -}
> -
>   /* i915_gem_context.c */
>   int __must_check i915_gem_contexts_init(struct drm_i915_private *dev_priv);
>   void i915_gem_contexts_lost(struct drm_i915_private *dev_priv);
> @@ -422,10 +185,6 @@ static inline void i915_gem_context_put(struct i915_gem_context *ctx)
>   	kref_put(&ctx->ref, i915_gem_context_release);
>   }
>   
> -void intel_context_init(struct intel_context *ce,
> -			struct i915_gem_context *ctx,
> -			struct intel_engine_cs *engine);
> -
>   struct i915_lut_handle *i915_lut_handle_alloc(void);
>   void i915_lut_handle_free(struct i915_lut_handle *lut);
>   
> diff --git a/drivers/gpu/drm/i915/i915_gem_context_types.h b/drivers/gpu/drm/i915/i915_gem_context_types.h
> new file mode 100644
> index 000000000000..8baa7a5e7fdb
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/i915_gem_context_types.h
> @@ -0,0 +1,188 @@
> +/*
> + * SPDX-License-Identifier: MIT
> + *
> + * Copyright © 2019 Intel Corporation
> + */
> +
> +#ifndef __I915_GEM_CONTEXT_TYPES_H__
> +#define __I915_GEM_CONTEXT_TYPES_H__
> +
> +#include <linux/atomic.h>
> +#include <linux/list.h>
> +#include <linux/llist.h>
> +#include <linux/kref.h>
> +#include <linux/mutex.h>
> +#include <linux/radix-tree.h>
> +#include <linux/rcupdate.h>
> +#include <linux/types.h>
> +
> +#include "i915_gem.h" /* I915_NUM_ENGINES */
> +#include "i915_scheduler.h"
> +#include "intel_context_types.h"
> +
> +struct pid;
> +
> +struct drm_i915_private;
> +struct drm_i915_file_private;
> +struct i915_hw_ppgtt;
> +struct i915_timeline;
> +struct intel_ring;
> +
> +/**
> + * struct i915_gem_context - client state
> + *
> + * The struct i915_gem_context represents the combined view of the driver and
> + * logical hardware state for a particular client.
> + */
> +struct i915_gem_context {
> +	/** i915: i915 device backpointer */
> +	struct drm_i915_private *i915;
> +
> +	/** file_priv: owning file descriptor */
> +	struct drm_i915_file_private *file_priv;
> +
> +	struct intel_engine_cs **engines;
> +
> +	struct i915_timeline *timeline;
> +
> +	/**
> +	 * @ppgtt: unique address space (GTT)
> +	 *
> +	 * In full-ppgtt mode, each context has its own address space ensuring
> +	 * complete separation of one client from all others.
> +	 *
> +	 * In other modes, this is a NULL pointer with the expectation that
> +	 * the caller uses the shared global GTT.
> +	 */
> +	struct i915_hw_ppgtt *ppgtt;
> +
> +	/**
> +	 * @pid: process id of creator
> +	 *
> +	 * Note that who created the context may not be the principal user,
> +	 * as the context may be shared across a local socket. However,
> +	 * that should only affect the default context, all contexts created
> +	 * explicitly by the client are expected to be isolated.
> +	 */
> +	struct pid *pid;
> +
> +	/**
> +	 * @name: arbitrary name
> +	 *
> +	 * A name is constructed for the context from the creator's process
> +	 * name, pid and user handle in order to uniquely identify the
> +	 * context in messages.
> +	 */
> +	const char *name;
> +
> +	/** link: place with &drm_i915_private.context_list */
> +	struct list_head link;
> +	struct llist_node free_link;
> +
> +	/**
> +	 * @ref: reference count
> +	 *
> +	 * A reference to a context is held by both the client who created it
> +	 * and on each request submitted to the hardware using the request
> +	 * (to ensure the hardware has access to the state until it has
> +	 * finished all pending writes). See i915_gem_context_get() and
> +	 * i915_gem_context_put() for access.
> +	 */
> +	struct kref ref;
> +
> +	/**
> +	 * @rcu: rcu_head for deferred freeing.
> +	 */
> +	struct rcu_head rcu;
> +
> +	/**
> +	 * @user_flags: small set of booleans controlled by the user
> +	 */
> +	unsigned long user_flags;
> +#define UCONTEXT_NO_ZEROMAP		0
> +#define UCONTEXT_NO_ERROR_CAPTURE	1
> +#define UCONTEXT_BANNABLE		2
> +#define UCONTEXT_RECOVERABLE		3
> +
> +	/**
> +	 * @flags: small set of booleans
> +	 */
> +	unsigned long flags;
> +#define CONTEXT_BANNED			0
> +#define CONTEXT_CLOSED			1
> +#define CONTEXT_FORCE_SINGLE_SUBMISSION	2
> +
> +	unsigned int nengine;
> +
> +	/**
> +	 * @hw_id: - unique identifier for the context
> +	 *
> +	 * The hardware needs to uniquely identify the context for a few
> +	 * functions like fault reporting, PASID, scheduling. The
> +	 * &drm_i915_private.context_hw_ida is used to assign a unique
> +	 * id for the lifetime of the context.
> +	 *
> +	 * @hw_id_pin_count: - number of times this context had been pinned
> +	 * for use (should be, at most, once per engine).
> +	 *
> +	 * @hw_id_link: - all contexts with an assigned id are tracked
> +	 * for possible repossession.
> +	 */
> +	unsigned int hw_id;
> +	atomic_t hw_id_pin_count;
> +	struct list_head hw_id_link;
> +
> +	struct list_head active_engines;
> +	struct mutex mutex;
> +
> +	/**
> +	 * @user_handle: userspace identifier
> +	 *
> +	 * A unique per-file identifier is generated from
> +	 * &drm_i915_file_private.contexts.
> +	 */
> +	u32 user_handle;
> +#define DEFAULT_CONTEXT_HANDLE 0
> +
> +	struct i915_sched_attr sched;
> +
> +	/** engine: per-engine logical HW state */
> +	struct intel_context __engine[I915_NUM_ENGINES];
> +
> +	/** ring_size: size for allocating the per-engine ring buffer */
> +	u32 ring_size;
> +	/** desc_template: invariant fields for the HW context descriptor */
> +	u32 desc_template;
> +
> +	/** guilty_count: How many times this context has caused a GPU hang. */
> +	atomic_t guilty_count;
> +	/**
> +	 * @active_count: How many times this context was active during a GPU
> +	 * hang, but did not cause it.
> +	 */
> +	atomic_t active_count;
> +
> +	/**
> +	 * @hang_timestamp: The last time(s) this context caused a GPU hang
> +	 */
> +	unsigned long hang_timestamp[2];
> +#define CONTEXT_FAST_HANG_JIFFIES (120 * HZ) /* 3 hangs within 120s? Banned! */
> +
> +	/** remap_slice: Bitmask of cache lines that need remapping */
> +	u8 remap_slice;
> +
> +	/** handles_vma: rbtree to look up our context specific obj/vma for
> +	 * the user handle. (user handles are per fd, but the binding is
> +	 * per vm, which may be one per context or shared with the global GTT)
> +	 */
> +	struct radix_tree_root handles_vma;
> +
> +	/** handles_list: reverse list of all the rbtree entries in use for
> +	 * this context, which allows us to free all the allocations on
> +	 * context close.
> +	 */
> +	struct list_head handles_list;
> +};
> +
> +#endif /* __I915_GEM_CONTEXT_TYPES_H__ */
> diff --git a/drivers/gpu/drm/i915/i915_timeline.h b/drivers/gpu/drm/i915/i915_timeline.h
> index 60b1dfad93ed..9126c8206490 100644
> --- a/drivers/gpu/drm/i915/i915_timeline.h
> +++ b/drivers/gpu/drm/i915/i915_timeline.h
> @@ -25,76 +25,10 @@
>   #ifndef I915_TIMELINE_H
>   #define I915_TIMELINE_H
>   
> -#include <linux/list.h>
> -#include <linux/kref.h>
> +#include <linux/lockdep.h>
>   
> -#include "i915_active.h"
> -#include "i915_request.h"
>   #include "i915_syncmap.h"
> -#include "i915_utils.h"
> -
> -struct i915_vma;
> -struct i915_timeline_cacheline;
> -
> -struct i915_timeline {
> -	u64 fence_context;
> -	u32 seqno;
> -
> -	spinlock_t lock;
> -#define TIMELINE_CLIENT 0 /* default subclass */
> -#define TIMELINE_ENGINE 1
> -
> -	struct mutex mutex; /* protects the flow of requests */
> -
> -	unsigned int pin_count;
> -	const u32 *hwsp_seqno;
> -	struct i915_vma *hwsp_ggtt;
> -	u32 hwsp_offset;
> -
> -	struct i915_timeline_cacheline *hwsp_cacheline;
> -
> -	bool has_initial_breadcrumb;
> -
> -	/**
> -	 * List of breadcrumbs associated with GPU requests currently
> -	 * outstanding.
> -	 */
> -	struct list_head requests;
> -
> -	/* Contains an RCU guarded pointer to the last request. No reference is
> -	 * held to the request, users must carefully acquire a reference to
> -	 * the request using i915_active_request_get_request_rcu(), or hold the
> -	 * struct_mutex.
> -	 */
> -	struct i915_active_request last_request;
> -
> -	/**
> -	 * We track the most recent seqno that we wait on in every context so
> -	 * that we only have to emit a new await and dependency on a more
> -	 * recent sync point. As the contexts may be executed out-of-order, we
> -	 * have to track each individually and can not rely on an absolute
> -	 * global_seqno. When we know that all tracked fences are completed
> -	 * (i.e. when the driver is idle), we know that the syncmap is
> -	 * redundant and we can discard it without loss of generality.
> -	 */
> -	struct i915_syncmap *sync;
> -
> -	/**
> -	 * Barrier provides the ability to serialize ordering between different
> -	 * timelines.
> -	 *
> -	 * Users can call i915_timeline_set_barrier which will make all
> -	 * subsequent submissions to this timeline be executed only after the
> -	 * barrier has been completed.
> -	 */
> -	struct i915_active_request barrier;
> -
> -	struct list_head link;
> -	const char *name;
> -	struct drm_i915_private *i915;
> -
> -	struct kref kref;
> -};
> +#include "i915_timeline_types.h"
>   
>   int i915_timeline_init(struct drm_i915_private *i915,
>   		       struct i915_timeline *tl,
> diff --git a/drivers/gpu/drm/i915/i915_timeline_types.h b/drivers/gpu/drm/i915/i915_timeline_types.h
> new file mode 100644
> index 000000000000..8ff146dc05ba
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/i915_timeline_types.h
> @@ -0,0 +1,80 @@
> +/*
> + * SPDX-License-Identifier: MIT
> + *
> + * Copyright © 2016 Intel Corporation
> + */
> +
> +#ifndef __I915_TIMELINE_TYPES_H__
> +#define __I915_TIMELINE_TYPES_H__
> +
> +#include <linux/list.h>
> +#include <linux/kref.h>
> +#include <linux/types.h>
> +
> +#include "i915_active.h"
> +
> +struct drm_i915_private;
> +struct i915_vma;
> +struct i915_timeline_cacheline;
> +struct i915_syncmap;
> +
> +struct i915_timeline {
> +	u64 fence_context;
> +	u32 seqno;
> +
> +	spinlock_t lock;
> +#define TIMELINE_CLIENT 0 /* default subclass */
> +#define TIMELINE_ENGINE 1
> +	struct mutex mutex; /* protects the flow of requests */
> +
> +	unsigned int pin_count;
> +	const u32 *hwsp_seqno;
> +	struct i915_vma *hwsp_ggtt;
> +	u32 hwsp_offset;
> +
> +	struct i915_timeline_cacheline *hwsp_cacheline;
> +
> +	bool has_initial_breadcrumb;
> +
> +	/**
> +	 * List of breadcrumbs associated with GPU requests currently
> +	 * outstanding.
> +	 */
> +	struct list_head requests;
> +
> +	/* Contains an RCU guarded pointer to the last request. No reference is
> +	 * held to the request, users must carefully acquire a reference to
> +	 * the request using i915_active_request_get_request_rcu(), or hold the
> +	 * struct_mutex.
> +	 */
> +	struct i915_active_request last_request;
> +
> +	/**
> +	 * We track the most recent seqno that we wait on in every context so
> +	 * that we only have to emit a new await and dependency on a more
> +	 * recent sync point. As the contexts may be executed out-of-order, we
> +	 * have to track each individually and can not rely on an absolute
> +	 * global_seqno. When we know that all tracked fences are completed
> +	 * (i.e. when the driver is idle), we know that the syncmap is
> +	 * redundant and we can discard it without loss of generality.
> +	 */
> +	struct i915_syncmap *sync;
> +
> +	/**
> +	 * Barrier provides the ability to serialize ordering between different
> +	 * timelines.
> +	 *
> +	 * Users can call i915_timeline_set_barrier which will make all
> +	 * subsequent submissions to this timeline be executed only after the
> +	 * barrier has been completed.
> +	 */
> +	struct i915_active_request barrier;
> +
> +	struct list_head link;
> +	const char *name;
> +	struct drm_i915_private *i915;
> +
> +	struct kref kref;
> +};
> +
> +#endif /* __I915_TIMELINE_TYPES_H__ */
> diff --git a/drivers/gpu/drm/i915/intel_context.h b/drivers/gpu/drm/i915/intel_context.h
> new file mode 100644
> index 000000000000..dd947692bb0b
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/intel_context.h
> @@ -0,0 +1,47 @@
> +/*
> + * SPDX-License-Identifier: MIT
> + *
> + * Copyright © 2019 Intel Corporation
> + */
> +
> +#ifndef __INTEL_CONTEXT_H__
> +#define __INTEL_CONTEXT_H__
> +
> +#include "i915_gem_context_types.h"
> +#include "intel_context_types.h"
> +#include "intel_engine_types.h"
> +
> +void intel_context_init(struct intel_context *ce,
> +			struct i915_gem_context *ctx,
> +			struct intel_engine_cs *engine);
> +
> +static inline struct intel_context *
> +to_intel_context(struct i915_gem_context *ctx,
> +		 const struct intel_engine_cs *engine)
> +{
> +	return &ctx->__engine[engine->id];
> +}
> +
> +static inline struct intel_context *
> +intel_context_pin(struct i915_gem_context *ctx, struct intel_engine_cs *engine)
> +{
> +	return engine->context_pin(engine, ctx);
> +}
> +
> +static inline void __intel_context_pin(struct intel_context *ce)
> +{
> +	GEM_BUG_ON(!ce->pin_count);
> +	ce->pin_count++;
> +}
> +
> +static inline void intel_context_unpin(struct intel_context *ce)
> +{
> +	GEM_BUG_ON(!ce->pin_count);
> +	if (--ce->pin_count)
> +		return;
> +
> +	GEM_BUG_ON(!ce->ops);
> +	ce->ops->unpin(ce);
> +}
> +
> +#endif /* __INTEL_CONTEXT_H__ */
> diff --git a/drivers/gpu/drm/i915/intel_context_types.h b/drivers/gpu/drm/i915/intel_context_types.h
> new file mode 100644
> index 000000000000..16e1306e9595
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/intel_context_types.h
> @@ -0,0 +1,60 @@
> +/*
> + * SPDX-License-Identifier: MIT
> + *
> + * Copyright © 2019 Intel Corporation
> + */
> +
> +#ifndef __INTEL_CONTEXT_TYPES__
> +#define __INTEL_CONTEXT_TYPES__
> +
> +#include <linux/list.h>
> +#include <linux/types.h>
> +
> +#include "i915_active_types.h"
> +
> +struct i915_gem_context;
> +struct i915_vma;
> +struct intel_context;
> +struct intel_ring;
> +
> +struct intel_context_ops {
> +	void (*unpin)(struct intel_context *ce);
> +	void (*destroy)(struct intel_context *ce);
> +};
> +
> +/*
> + * Powergating configuration for a particular (context,engine).
> + */
> +struct intel_sseu {
> +	u8 slice_mask;
> +	u8 subslice_mask;
> +	u8 min_eus_per_subslice;
> +	u8 max_eus_per_subslice;
> +};
> +
> +struct intel_context {
> +	struct i915_gem_context *gem_context;
> +	struct intel_engine_cs *engine;
> +	struct intel_engine_cs *active;
> +	struct list_head active_link;
> +	struct list_head signal_link;
> +	struct list_head signals;
> +	struct i915_vma *state;
> +	struct intel_ring *ring;
> +	u32 *lrc_reg_state;
> +	u64 lrc_desc;
> +	int pin_count;
> +
> +	/**
> +	 * active_tracker: Active tracker for the external rq activity
> +	 * on this intel_context object.
> +	 */
> +	struct i915_active_request active_tracker;
> +
> +	const struct intel_context_ops *ops;
> +
> +	/** sseu: Control eu/slice partitioning */
> +	struct intel_sseu sseu;
> +};
> +
> +#endif /* __INTEL_CONTEXT_TYPES__ */
> diff --git a/drivers/gpu/drm/i915/intel_engine_types.h b/drivers/gpu/drm/i915/intel_engine_types.h
> new file mode 100644
> index 000000000000..5ec6e72d0ffb
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/intel_engine_types.h
> @@ -0,0 +1,521 @@
> +/*
> + * SPDX-License-Identifier: MIT
> + *
> + * Copyright © 2019 Intel Corporation
> + */
> +
> +#ifndef __INTEL_ENGINE_TYPES__
> +#define __INTEL_ENGINE_TYPES__
> +
> +#include <linux/hashtable.h>
> +#include <linux/irq_work.h>
> +#include <linux/list.h>
> +#include <linux/types.h>
> +
> +#include "i915_timeline_types.h"
> +#include "intel_device_info.h"
> +#include "intel_workarounds_types.h"
> +
> +#include "i915_gem_batch_pool.h"
> +#include "i915_pmu.h"
> +
> +#define I915_MAX_SLICES	3
> +#define I915_MAX_SUBSLICES 8
> +
> +#define I915_CMD_HASH_ORDER 9
> +
> +struct drm_i915_reg_table;
> +struct i915_gem_context;
> +struct i915_request;
> +struct i915_sched_attr;
> +
> +struct intel_hw_status_page {
> +	struct i915_vma *vma;
> +	u32 *addr;
> +};
> +
> +struct intel_instdone {
> +	u32 instdone;
> +	/* The following exist only in the RCS engine */
> +	u32 slice_common;
> +	u32 sampler[I915_MAX_SLICES][I915_MAX_SUBSLICES];
> +	u32 row[I915_MAX_SLICES][I915_MAX_SUBSLICES];
> +};
> +
> +struct intel_engine_hangcheck {
> +	u64 acthd;
> +	u32 last_seqno;
> +	u32 next_seqno;
> +	unsigned long action_timestamp;
> +	struct intel_instdone instdone;
> +};
> +
> +struct intel_ring {
> +	struct i915_vma *vma;
> +	void *vaddr;
> +
> +	struct i915_timeline *timeline;
> +	struct list_head request_list;
> +	struct list_head active_link;
> +
> +	u32 head;
> +	u32 tail;
> +	u32 emit;
> +
> +	u32 space;
> +	u32 size;
> +	u32 effective_size;
> +};
> +
> +/*
> + * we use a single page to load ctx workarounds so all of these
> + * values are referred in terms of dwords
> + *
> + * struct i915_wa_ctx_bb:
> + *  offset: specifies batch starting position, also helpful in case
> + *    if we want to have multiple batches at different offsets based on
> + *    some criteria. It is not a requirement at the moment but provides
> + *    an option for future use.
> + *  size: size of the batch in DWORDS
> + */
> +struct i915_ctx_workarounds {
> +	struct i915_wa_ctx_bb {
> +		u32 offset;
> +		u32 size;
> +	} indirect_ctx, per_ctx;
> +	struct i915_vma *vma;
> +};
> +
> +#define I915_MAX_VCS	4
> +#define I915_MAX_VECS	2
> +
> +/*
> + * Engine IDs definitions.
> + * Keep instances of the same type engine together.
> + */
> +enum intel_engine_id {
> +	RCS = 0,
> +	BCS,
> +	VCS,
> +	VCS2,
> +	VCS3,
> +	VCS4,
> +#define _VCS(n) (VCS + (n))
> +	VECS,
> +	VECS2
> +#define _VECS(n) (VECS + (n))
> +};
> +
> +struct st_preempt_hang {
> +	struct completion completion;
> +	unsigned int count;
> +	bool inject_hang;
> +};
> +
> +/**
> + * struct intel_engine_execlists - execlist submission queue and port state
> + *
> + * The struct intel_engine_execlists represents the combined logical state of
> + * driver and the hardware state for execlist mode of submission.
> + */
> +struct intel_engine_execlists {
> +	/**
> +	 * @tasklet: softirq tasklet for bottom handler
> +	 */
> +	struct tasklet_struct tasklet;
> +
> +	/**
> +	 * @default_priolist: priority list for I915_PRIORITY_NORMAL
> +	 */
> +	struct i915_priolist default_priolist;
> +
> +	/**
> +	 * @no_priolist: priority lists disabled
> +	 */
> +	bool no_priolist;
> +
> +	/**
> +	 * @submit_reg: gen-specific execlist submission register
> +	 * set to the ExecList Submission Port (elsp) register pre-Gen11 and to
> +	 * the ExecList Submission Queue Contents register array for Gen11+
> +	 */
> +	u32 __iomem *submit_reg;
> +
> +	/**
> +	 * @ctrl_reg: the enhanced execlists control register, used to load the
> +	 * submit queue on the HW and to request preemptions to idle
> +	 */
> +	u32 __iomem *ctrl_reg;
> +
> +	/**
> +	 * @port: execlist port states
> +	 *
> +	 * For each hardware ELSP (ExecList Submission Port) we keep
> +	 * track of the last request and the number of times we submitted
> +	 * that port to hw. We then count the number of times the hw reports
> +	 * a context completion or preemption. As only one context can
> +	 * be active on hw, we limit resubmission of context to port[0]. This
> +	 * is called a Lite Restore of the context.
> +	 */
> +	struct execlist_port {
> +		/**
> +		 * @request_count: combined request and submission count
> +		 */
> +		struct i915_request *request_count;
> +#define EXECLIST_COUNT_BITS 2
> +#define port_request(p) ptr_mask_bits((p)->request_count, EXECLIST_COUNT_BITS)
> +#define port_count(p) ptr_unmask_bits((p)->request_count, EXECLIST_COUNT_BITS)
> +#define port_pack(rq, count) ptr_pack_bits(rq, count, EXECLIST_COUNT_BITS)
> +#define port_unpack(p, count) ptr_unpack_bits((p)->request_count, count, EXECLIST_COUNT_BITS)
> +#define port_set(p, packed) ((p)->request_count = (packed))
> +#define port_isset(p) ((p)->request_count)
> +#define port_index(p, execlists) ((p) - (execlists)->port)
> +
> +		/**
> +		 * @context_id: context ID for port
> +		 */
> +		GEM_DEBUG_DECL(u32 context_id);
> +
> +#define EXECLIST_MAX_PORTS 2
> +	} port[EXECLIST_MAX_PORTS];
> +
> +	/**
> +	 * @active: is the HW active? We consider the HW as active after
> +	 * submitting any context for execution and until we have seen the
> +	 * last context completion event. After that, we do not expect any
> +	 * more events until we submit, and so can park the HW.
> +	 *
> +	 * As we have a small number of different sources from which we feed
> +	 * the HW, we track the state of each inside a single bitfield.
> +	 */
> +	unsigned int active;
> +#define EXECLISTS_ACTIVE_USER 0
> +#define EXECLISTS_ACTIVE_PREEMPT 1
> +#define EXECLISTS_ACTIVE_HWACK 2
> +
> +	/**
> +	 * @port_mask: number of execlist ports - 1
> +	 */
> +	unsigned int port_mask;
> +
> +	/**
> +	 * @queue_priority_hint: Highest pending priority.
> +	 *
> +	 * When we add requests into the queue, or adjust the priority of
> +	 * executing requests, we compute the maximum priority of those
> +	 * pending requests. We can then use this value to determine if
> +	 * we need to preempt the executing requests to service the queue.
> +	 * However, since we may have recorded the priority of an inflight
> +	 * request we wanted to preempt but since completed, at the time of
> +	 * dequeuing the priority hint may no longer match the highest
> +	 * available request priority.
> +	 */
> +	int queue_priority_hint;
> +
> +	/**
> +	 * @queue: queue of requests, in priority lists
> +	 */
> +	struct rb_root_cached queue;
> +
> +	/**
> +	 * @csb_write: control register for Context Switch buffer
> +	 *
> +	 * Note this register may be either mmio or HWSP shadow.
> +	 */
> +	u32 *csb_write;
> +
> +	/**
> +	 * @csb_status: status array for Context Switch buffer
> +	 *
> +	 * Note these registers may be either mmio or HWSP shadow.
> +	 */
> +	u32 *csb_status;
> +
> +	/**
> +	 * @preempt_complete_status: expected CSB upon completing preemption
> +	 */
> +	u32 preempt_complete_status;
> +
> +	/**
> +	 * @csb_head: context status buffer head
> +	 */
> +	u8 csb_head;
> +
> +	I915_SELFTEST_DECLARE(struct st_preempt_hang preempt_hang;)
> +};
> +
> +#define INTEL_ENGINE_CS_MAX_NAME 8
> +
> +struct intel_engine_cs {
> +	struct drm_i915_private *i915;
> +	char name[INTEL_ENGINE_CS_MAX_NAME];
> +
> +	enum intel_engine_id id;
> +	unsigned int hw_id;
> +	unsigned int guc_id;
> +	intel_engine_mask_t mask;
> +
> +	u8 uabi_class;
> +
> +	u8 class;
> +	u8 instance;
> +	u32 context_size;
> +	u32 mmio_base;
> +
> +	struct intel_ring *buffer;
> +
> +	struct i915_timeline timeline;
> +
> +	struct drm_i915_gem_object *default_state;
> +	void *pinned_default_state;
> +
> +	/* Rather than have every client wait upon all user interrupts,
> +	 * with the herd waking after every interrupt and each doing the
> +	 * heavyweight seqno dance, we delegate the task (of being the
> +	 * bottom-half of the user interrupt) to the first client. After
> +	 * every interrupt, we wake up one client, who does the heavyweight
> +	 * coherent seqno read and either goes back to sleep (if incomplete),
> +	 * or wakes up all the completed clients in parallel, before then
> +	 * transferring the bottom-half status to the next client in the queue.
> +	 *
> +	 * Compared to walking the entire list of waiters in a single dedicated
> +	 * bottom-half, we reduce the latency of the first waiter by avoiding
> +	 * a context switch, but incur additional coherent seqno reads when
> +	 * following the chain of request breadcrumbs. Since it is most likely
> +	 * that we have a single client waiting on each seqno, reducing
> +	 * the overhead of waking that client is much preferred.
> +	 */
> +	struct intel_breadcrumbs {
> +		spinlock_t irq_lock;
> +		struct list_head signalers;
> +
> +		struct irq_work irq_work; /* for use from inside irq_lock */
> +
> +		unsigned int irq_enabled;
> +
> +		bool irq_armed;
> +	} breadcrumbs;
> +
> +	struct intel_engine_pmu {
> +		/**
> +		 * @enable: Bitmask of enable sample events on this engine.
> +		 *
> +		 * Bits correspond to sample event types, for instance
> +		 * I915_SAMPLE_QUEUED is bit 0 etc.
> +		 */
> +		u32 enable;
> +		/**
> +		 * @enable_count: Reference count for the enabled samplers.
> +		 *
> +		 * Index number corresponds to @enum drm_i915_pmu_engine_sample.
> +		 */
> +		unsigned int enable_count[I915_ENGINE_SAMPLE_COUNT];
> +		/**
> +		 * @sample: Counter values for sampling events.
> +		 *
> +		 * Our internal timer stores the current counters in this field.
> +		 *
> +		 * Index number corresponds to @enum drm_i915_pmu_engine_sample.
> +		 */
> +		struct i915_pmu_sample sample[I915_ENGINE_SAMPLE_COUNT];
> +	} pmu;
> +
> +	/*
> +	 * A pool of objects to use as shadow copies of client batch buffers
> +	 * when the command parser is enabled. Prevents the client from
> +	 * modifying the batch contents after software parsing.
> +	 */
> +	struct i915_gem_batch_pool batch_pool;
> +
> +	struct intel_hw_status_page status_page;
> +	struct i915_ctx_workarounds wa_ctx;
> +	struct i915_wa_list ctx_wa_list;
> +	struct i915_wa_list wa_list;
> +	struct i915_wa_list whitelist;
> +
> +	u32             irq_keep_mask; /* always keep these interrupts */
> +	u32		irq_enable_mask; /* bitmask to enable ring interrupt */
> +	void		(*irq_enable)(struct intel_engine_cs *engine);
> +	void		(*irq_disable)(struct intel_engine_cs *engine);
> +
> +	int		(*init_hw)(struct intel_engine_cs *engine);
> +
> +	struct {
> +		void (*prepare)(struct intel_engine_cs *engine);
> +		void (*reset)(struct intel_engine_cs *engine, bool stalled);
> +		void (*finish)(struct intel_engine_cs *engine);
> +	} reset;
> +
> +	void		(*park)(struct intel_engine_cs *engine);
> +	void		(*unpark)(struct intel_engine_cs *engine);
> +
> +	void		(*set_default_submission)(struct intel_engine_cs *engine);
> +
> +	struct intel_context *(*context_pin)(struct intel_engine_cs *engine,
> +					     struct i915_gem_context *ctx);
> +
> +	int		(*request_alloc)(struct i915_request *rq);
> +	int		(*init_context)(struct i915_request *rq);
> +
> +	int		(*emit_flush)(struct i915_request *request, u32 mode);
> +#define EMIT_INVALIDATE	BIT(0)
> +#define EMIT_FLUSH	BIT(1)
> +#define EMIT_BARRIER	(EMIT_INVALIDATE | EMIT_FLUSH)
> +	int		(*emit_bb_start)(struct i915_request *rq,
> +					 u64 offset, u32 length,
> +					 unsigned int dispatch_flags);
> +#define I915_DISPATCH_SECURE BIT(0)
> +#define I915_DISPATCH_PINNED BIT(1)
> +	int		 (*emit_init_breadcrumb)(struct i915_request *rq);
> +	u32		*(*emit_fini_breadcrumb)(struct i915_request *rq,
> +						 u32 *cs);
> +	unsigned int	emit_fini_breadcrumb_dw;
> +
> +	/* Pass the request to the hardware queue (e.g. directly into
> +	 * the legacy ringbuffer or to the end of an execlist).
> +	 *
> +	 * This is called from an atomic context with irqs disabled; must
> +	 * be irq safe.
> +	 */
> +	void		(*submit_request)(struct i915_request *rq);
> +
> +	/*
> +	 * Call when the priority on a request has changed and it and its
> +	 * dependencies may need rescheduling. Note the request itself may
> +	 * not be ready to run!
> +	 */
> +	void		(*schedule)(struct i915_request *request,
> +				    const struct i915_sched_attr *attr);
> +
> +	/*
> +	 * Cancel all requests on the hardware, or queued for execution.
> +	 * This should only cancel the ready requests that have been
> +	 * submitted to the engine (via the engine->submit_request callback).
> +	 * This is called when marking the device as wedged.
> +	 */
> +	void		(*cancel_requests)(struct intel_engine_cs *engine);
> +
> +	void		(*cleanup)(struct intel_engine_cs *engine);
> +
> +	struct intel_engine_execlists execlists;
> +
> +	/* Contexts are pinned whilst they are active on the GPU. The last
> +	 * context executed remains active whilst the GPU is idle - the
> +	 * switch away and write to the context object only occurs on the
> +	 * next execution.  Contexts are only unpinned on retirement of the
> +	 * following request ensuring that we can always write to the object
> +	 * on the context switch even after idling. Across suspend, we switch
> +	 * to the kernel context and trash it as the save may not happen
> +	 * before the hardware is powered down.
> +	 */
> +	struct intel_context *last_retired_context;
> +
> +	/* status_notifier: list of callbacks for context-switch changes */
> +	struct atomic_notifier_head context_status_notifier;
> +
> +	struct intel_engine_hangcheck hangcheck;
> +
> +#define I915_ENGINE_NEEDS_CMD_PARSER BIT(0)
> +#define I915_ENGINE_SUPPORTS_STATS   BIT(1)
> +#define I915_ENGINE_HAS_PREEMPTION   BIT(2)
> +#define I915_ENGINE_HAS_SEMAPHORES   BIT(3)
> +	unsigned int flags;
> +
> +	/*
> +	 * Table of commands the command parser needs to know about
> +	 * for this engine.
> +	 */
> +	DECLARE_HASHTABLE(cmd_hash, I915_CMD_HASH_ORDER);
> +
> +	/*
> +	 * Table of registers allowed in commands that read/write registers.
> +	 */
> +	const struct drm_i915_reg_table *reg_tables;
> +	int reg_table_count;
> +
> +	/*
> +	 * Returns the bitmask for the length field of the specified command.
> +	 * Return 0 for an unrecognized/invalid command.
> +	 *
> +	 * If the command parser finds an entry for a command in the engine's
> +	 * cmd_tables, it gets the command's length based on the table entry.
> +	 * If not, it calls this function to determine the per-engine length
> +	 * field encoding for the command (i.e. different opcode ranges use
> +	 * certain bits to encode the command length in the header).
> +	 */
> +	u32 (*get_cmd_length_mask)(u32 cmd_header);
> +
> +	struct {
> +		/**
> +		 * @lock: Lock protecting the below fields.
> +		 */
> +		seqlock_t lock;
> +		/**
> +		 * @enabled: Reference count indicating number of listeners.
> +		 */
> +		unsigned int enabled;
> +		/**
> +		 * @active: Number of contexts currently scheduled in.
> +		 */
> +		unsigned int active;
> +		/**
> +		 * @enabled_at: Timestamp when busy stats were enabled.
> +		 */
> +		ktime_t enabled_at;
> +		/**
> +		 * @start: Timestamp of the last idle to active transition.
> +		 *
> +		 * Idle is defined as active == 0, active is active > 0.
> +		 */
> +		ktime_t start;
> +		/**
> +		 * @total: Total time this engine was busy.
> +		 *
> +		 * Accumulated time not counting the most recent block in cases
> +		 * where engine is currently busy (active > 0).
> +		 */
> +		ktime_t total;
> +	} stats;
> +};
> +
> +static inline bool
> +intel_engine_needs_cmd_parser(const struct intel_engine_cs *engine)
> +{
> +	return engine->flags & I915_ENGINE_NEEDS_CMD_PARSER;
> +}
> +
> +static inline bool
> +intel_engine_supports_stats(const struct intel_engine_cs *engine)
> +{
> +	return engine->flags & I915_ENGINE_SUPPORTS_STATS;
> +}
> +
> +static inline bool
> +intel_engine_has_preemption(const struct intel_engine_cs *engine)
> +{
> +	return engine->flags & I915_ENGINE_HAS_PREEMPTION;
> +}
> +
> +static inline bool
> +intel_engine_has_semaphores(const struct intel_engine_cs *engine)
> +{
> +	return engine->flags & I915_ENGINE_HAS_SEMAPHORES;
> +}
> +
> +#define instdone_slice_mask(dev_priv__) \
> +	(IS_GEN(dev_priv__, 7) ? \
> +	 1 : RUNTIME_INFO(dev_priv__)->sseu.slice_mask)
> +
> +#define instdone_subslice_mask(dev_priv__) \
> +	(IS_GEN(dev_priv__, 7) ? \
> +	 1 : RUNTIME_INFO(dev_priv__)->sseu.subslice_mask[0])
> +
> +#define for_each_instdone_slice_subslice(dev_priv__, slice__, subslice__) \
> +	for ((slice__) = 0, (subslice__) = 0; \
> +	     (slice__) < I915_MAX_SLICES; \
> +	     (subslice__) = ((subslice__) + 1) < I915_MAX_SUBSLICES ? (subslice__) + 1 : 0, \
> +	       (slice__) += ((subslice__) == 0)) \
> +		for_each_if((BIT(slice__) & instdone_slice_mask(dev_priv__)) && \
> +			    (BIT(subslice__) & instdone_subslice_mask(dev_priv__)))
> +
> +#endif /* __INTEL_ENGINE_TYPES__ */
> diff --git a/drivers/gpu/drm/i915/intel_guc.h b/drivers/gpu/drm/i915/intel_guc.h
> index 744220296653..77ec1bd4df5a 100644
> --- a/drivers/gpu/drm/i915/intel_guc.h
> +++ b/drivers/gpu/drm/i915/intel_guc.h
> @@ -32,6 +32,7 @@
>   #include "intel_guc_log.h"
>   #include "intel_guc_reg.h"
>   #include "intel_uc_fw.h"
> +#include "i915_utils.h"
>   #include "i915_vma.h"
>   
>   struct guc_preempt_work {
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 13819de57ef9..5fe38f74b480 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -15,14 +15,11 @@
>   #include "i915_request.h"
>   #include "i915_selftest.h"
>   #include "i915_timeline.h"
> -#include "intel_device_info.h"
> +#include "intel_engine_types.h"
>   #include "intel_gpu_commands.h"
>   #include "intel_workarounds.h"
>   
>   struct drm_printer;
> -struct i915_sched_attr;
> -
> -#define I915_CMD_HASH_ORDER 9
>   
>   /* Early gen2 devices have a cacheline of just 32 bytes, using 64 is overkill,
>    * but keeps the logic simple. Indeed, the whole purpose of this macro is just
> @@ -32,11 +29,6 @@ struct i915_sched_attr;
>   #define CACHELINE_BYTES 64
>   #define CACHELINE_DWORDS (CACHELINE_BYTES / sizeof(u32))
>   
> -struct intel_hw_status_page {
> -	struct i915_vma *vma;
> -	u32 *addr;
> -};
> -
>   #define I915_READ_TAIL(engine) I915_READ(RING_TAIL((engine)->mmio_base))
>   #define I915_WRITE_TAIL(engine, val) I915_WRITE(RING_TAIL((engine)->mmio_base), val)
>   
> @@ -91,498 +83,6 @@ hangcheck_action_to_str(const enum intel_engine_hangcheck_action a)
>   	return "unknown";
>   }
>   
> -#define I915_MAX_SLICES	3
> -#define I915_MAX_SUBSLICES 8
> -
> -#define instdone_slice_mask(dev_priv__) \
> -	(IS_GEN(dev_priv__, 7) ? \
> -	 1 : RUNTIME_INFO(dev_priv__)->sseu.slice_mask)
> -
> -#define instdone_subslice_mask(dev_priv__) \
> -	(IS_GEN(dev_priv__, 7) ? \
> -	 1 : RUNTIME_INFO(dev_priv__)->sseu.subslice_mask[0])
> -
> -#define for_each_instdone_slice_subslice(dev_priv__, slice__, subslice__) \
> -	for ((slice__) = 0, (subslice__) = 0; \
> -	     (slice__) < I915_MAX_SLICES; \
> -	     (subslice__) = ((subslice__) + 1) < I915_MAX_SUBSLICES ? (subslice__) + 1 : 0, \
> -	       (slice__) += ((subslice__) == 0)) \
> -		for_each_if((BIT(slice__) & instdone_slice_mask(dev_priv__)) && \
> -			    (BIT(subslice__) & instdone_subslice_mask(dev_priv__)))
> -
> -struct intel_instdone {
> -	u32 instdone;
> -	/* The following exist only in the RCS engine */
> -	u32 slice_common;
> -	u32 sampler[I915_MAX_SLICES][I915_MAX_SUBSLICES];
> -	u32 row[I915_MAX_SLICES][I915_MAX_SUBSLICES];
> -};
> -
> -struct intel_engine_hangcheck {
> -	u64 acthd;
> -	u32 last_seqno;
> -	u32 next_seqno;
> -	unsigned long action_timestamp;
> -	struct intel_instdone instdone;
> -};
> -
> -struct intel_ring {
> -	struct i915_vma *vma;
> -	void *vaddr;
> -
> -	struct i915_timeline *timeline;
> -	struct list_head request_list;
> -	struct list_head active_link;
> -
> -	u32 head;
> -	u32 tail;
> -	u32 emit;
> -
> -	u32 space;
> -	u32 size;
> -	u32 effective_size;
> -};
> -
> -struct i915_gem_context;
> -struct drm_i915_reg_table;
> -
> -/*
> - * we use a single page to load ctx workarounds so all of these
> - * values are referred in terms of dwords
> - *
> - * struct i915_wa_ctx_bb:
> - *  offset: specifies batch starting position, also helpful in case
> - *    if we want to have multiple batches at different offsets based on
> - *    some criteria. It is not a requirement at the moment but provides
> - *    an option for future use.
> - *  size: size of the batch in DWORDS
> - */
> -struct i915_ctx_workarounds {
> -	struct i915_wa_ctx_bb {
> -		u32 offset;
> -		u32 size;
> -	} indirect_ctx, per_ctx;
> -	struct i915_vma *vma;
> -};
> -
> -struct i915_request;
> -
> -#define I915_MAX_VCS	4
> -#define I915_MAX_VECS	2
> -
> -/*
> - * Engine IDs definitions.
> - * Keep instances of the same type engine together.
> - */
> -enum intel_engine_id {
> -	RCS = 0,
> -	BCS,
> -	VCS,
> -	VCS2,
> -	VCS3,
> -	VCS4,
> -#define _VCS(n) (VCS + (n))
> -	VECS,
> -	VECS2
> -#define _VECS(n) (VECS + (n))
> -};
> -
> -struct st_preempt_hang {
> -	struct completion completion;
> -	unsigned int count;
> -	bool inject_hang;
> -};
> -
> -/**
> - * struct intel_engine_execlists - execlist submission queue and port state
> - *
> - * The struct intel_engine_execlists represents the combined logical state of
> - * driver and the hardware state for execlist mode of submission.
> - */
> -struct intel_engine_execlists {
> -	/**
> -	 * @tasklet: softirq tasklet for bottom handler
> -	 */
> -	struct tasklet_struct tasklet;
> -
> -	/**
> -	 * @default_priolist: priority list for I915_PRIORITY_NORMAL
> -	 */
> -	struct i915_priolist default_priolist;
> -
> -	/**
> -	 * @no_priolist: priority lists disabled
> -	 */
> -	bool no_priolist;
> -
> -	/**
> -	 * @submit_reg: gen-specific execlist submission register
> -	 * set to the ExecList Submission Port (elsp) register pre-Gen11 and to
> -	 * the ExecList Submission Queue Contents register array for Gen11+
> -	 */
> -	u32 __iomem *submit_reg;
> -
> -	/**
> -	 * @ctrl_reg: the enhanced execlists control register, used to load the
> -	 * submit queue on the HW and to request preemptions to idle
> -	 */
> -	u32 __iomem *ctrl_reg;
> -
> -	/**
> -	 * @port: execlist port states
> -	 *
> -	 * For each hardware ELSP (ExecList Submission Port) we keep
> -	 * track of the last request and the number of times we submitted
> -	 * that port to hw. We then count the number of times the hw reports
> -	 * a context completion or preemption. As only one context can
> -	 * be active on hw, we limit resubmission of context to port[0]. This
> -	 * is called Lite Restore, of the context.
> -	 */
> -	struct execlist_port {
> -		/**
> -		 * @request_count: combined request and submission count
> -		 */
> -		struct i915_request *request_count;
> -#define EXECLIST_COUNT_BITS 2
> -#define port_request(p) ptr_mask_bits((p)->request_count, EXECLIST_COUNT_BITS)
> -#define port_count(p) ptr_unmask_bits((p)->request_count, EXECLIST_COUNT_BITS)
> -#define port_pack(rq, count) ptr_pack_bits(rq, count, EXECLIST_COUNT_BITS)
> -#define port_unpack(p, count) ptr_unpack_bits((p)->request_count, count, EXECLIST_COUNT_BITS)
> -#define port_set(p, packed) ((p)->request_count = (packed))
> -#define port_isset(p) ((p)->request_count)
> -#define port_index(p, execlists) ((p) - (execlists)->port)
> -
> -		/**
> -		 * @context_id: context ID for port
> -		 */
> -		GEM_DEBUG_DECL(u32 context_id);
> -
> -#define EXECLIST_MAX_PORTS 2
> -	} port[EXECLIST_MAX_PORTS];
> -
> -	/**
> -	 * @active: is the HW active? We consider the HW as active after
> -	 * submitting any context for execution and until we have seen the
> -	 * last context completion event. After that, we do not expect any
> -	 * more events until we submit, and so can park the HW.
> -	 *
> -	 * As we have a small number of different sources from which we feed
> -	 * the HW, we track the state of each inside a single bitfield.
> -	 */
> -	unsigned int active;
> -#define EXECLISTS_ACTIVE_USER 0
> -#define EXECLISTS_ACTIVE_PREEMPT 1
> -#define EXECLISTS_ACTIVE_HWACK 2
> -
> -	/**
> -	 * @port_mask: number of execlist ports - 1
> -	 */
> -	unsigned int port_mask;
> -
> -	/**
> -	 * @queue_priority_hint: Highest pending priority.
> -	 *
> -	 * When we add requests into the queue, or adjust the priority of
> -	 * executing requests, we compute the maximum priority of those
> -	 * pending requests. We can then use this value to determine if
> -	 * we need to preempt the executing requests to service the queue.
> -	 * However, since the we may have recorded the priority of an inflight
> -	 * request we wanted to preempt but since completed, at the time of
> -	 * dequeuing the priority hint may no longer may match the highest
> -	 * available request priority.
> -	 */
> -	int queue_priority_hint;
> -
> -	/**
> -	 * @queue: queue of requests, in priority lists
> -	 */
> -	struct rb_root_cached queue;
> -
> -	/**
> -	 * @csb_write: control register for Context Switch buffer
> -	 *
> -	 * Note this register may be either mmio or HWSP shadow.
> -	 */
> -	u32 *csb_write;
> -
> -	/**
> -	 * @csb_status: status array for Context Switch buffer
> -	 *
> -	 * Note these register may be either mmio or HWSP shadow.
> -	 */
> -	u32 *csb_status;
> -
> -	/**
> -	 * @preempt_complete_status: expected CSB upon completing preemption
> -	 */
> -	u32 preempt_complete_status;
> -
> -	/**
> -	 * @csb_head: context status buffer head
> -	 */
> -	u8 csb_head;
> -
> -	I915_SELFTEST_DECLARE(struct st_preempt_hang preempt_hang;)
> -};
> -
> -#define INTEL_ENGINE_CS_MAX_NAME 8
> -
> -struct intel_engine_cs {
> -	struct drm_i915_private *i915;
> -	char name[INTEL_ENGINE_CS_MAX_NAME];
> -
> -	enum intel_engine_id id;
> -	unsigned int hw_id;
> -	unsigned int guc_id;
> -	intel_engine_mask_t mask;
> -
> -	u8 uabi_class;
> -
> -	u8 class;
> -	u8 instance;
> -	u32 context_size;
> -	u32 mmio_base;
> -
> -	struct intel_ring *buffer;
> -
> -	struct i915_timeline timeline;
> -
> -	struct drm_i915_gem_object *default_state;
> -	void *pinned_default_state;
> -
> -	/* Rather than have every client wait upon all user interrupts,
> -	 * with the herd waking after every interrupt and each doing the
> -	 * heavyweight seqno dance, we delegate the task (of being the
> -	 * bottom-half of the user interrupt) to the first client. After
> -	 * every interrupt, we wake up one client, who does the heavyweight
> -	 * coherent seqno read and either goes back to sleep (if incomplete),
> -	 * or wakes up all the completed clients in parallel, before then
> -	 * transferring the bottom-half status to the next client in the queue.
> -	 *
> -	 * Compared to walking the entire list of waiters in a single dedicated
> -	 * bottom-half, we reduce the latency of the first waiter by avoiding
> -	 * a context switch, but incur additional coherent seqno reads when
> -	 * following the chain of request breadcrumbs. Since it is most likely
> -	 * that we have a single client waiting on each seqno, then reducing
> -	 * the overhead of waking that client is much preferred.
> -	 */
> -	struct intel_breadcrumbs {
> -		spinlock_t irq_lock;
> -		struct list_head signalers;
> -
> -		struct irq_work irq_work; /* for use from inside irq_lock */
> -
> -		unsigned int irq_enabled;
> -
> -		bool irq_armed;
> -	} breadcrumbs;
> -
> -	struct intel_engine_pmu {
> -		/**
> -		 * @enable: Bitmask of enable sample events on this engine.
> -		 *
> -		 * Bits correspond to sample event types, for instance
> -		 * I915_SAMPLE_QUEUED is bit 0 etc.
> -		 */
> -		u32 enable;
> -		/**
> -		 * @enable_count: Reference count for the enabled samplers.
> -		 *
> -		 * Index number corresponds to @enum drm_i915_pmu_engine_sample.
> -		 */
> -		unsigned int enable_count[I915_ENGINE_SAMPLE_COUNT];
> -		/**
> -		 * @sample: Counter values for sampling events.
> -		 *
> -		 * Our internal timer stores the current counters in this field.
> -		 *
> -		 * Index number corresponds to @enum drm_i915_pmu_engine_sample.
> -		 */
> -		struct i915_pmu_sample sample[I915_ENGINE_SAMPLE_COUNT];
> -	} pmu;
> -
> -	/*
> -	 * A pool of objects to use as shadow copies of client batch buffers
> -	 * when the command parser is enabled. Prevents the client from
> -	 * modifying the batch contents after software parsing.
> -	 */
> -	struct i915_gem_batch_pool batch_pool;
> -
> -	struct intel_hw_status_page status_page;
> -	struct i915_ctx_workarounds wa_ctx;
> -	struct i915_wa_list ctx_wa_list;
> -	struct i915_wa_list wa_list;
> -	struct i915_wa_list whitelist;
> -
> -	u32             irq_keep_mask; /* always keep these interrupts */
> -	u32		irq_enable_mask; /* bitmask to enable ring interrupt */
> -	void		(*irq_enable)(struct intel_engine_cs *engine);
> -	void		(*irq_disable)(struct intel_engine_cs *engine);
> -
> -	int		(*init_hw)(struct intel_engine_cs *engine);
> -
> -	struct {
> -		void (*prepare)(struct intel_engine_cs *engine);
> -		void (*reset)(struct intel_engine_cs *engine, bool stalled);
> -		void (*finish)(struct intel_engine_cs *engine);
> -	} reset;
> -
> -	void		(*park)(struct intel_engine_cs *engine);
> -	void		(*unpark)(struct intel_engine_cs *engine);
> -
> -	void		(*set_default_submission)(struct intel_engine_cs *engine);
> -
> -	struct intel_context *(*context_pin)(struct intel_engine_cs *engine,
> -					     struct i915_gem_context *ctx);
> -
> -	int		(*request_alloc)(struct i915_request *rq);
> -	int		(*init_context)(struct i915_request *rq);
> -
> -	int		(*emit_flush)(struct i915_request *request, u32 mode);
> -#define EMIT_INVALIDATE	BIT(0)
> -#define EMIT_FLUSH	BIT(1)
> -#define EMIT_BARRIER	(EMIT_INVALIDATE | EMIT_FLUSH)
> -	int		(*emit_bb_start)(struct i915_request *rq,
> -					 u64 offset, u32 length,
> -					 unsigned int dispatch_flags);
> -#define I915_DISPATCH_SECURE BIT(0)
> -#define I915_DISPATCH_PINNED BIT(1)
> -	int		 (*emit_init_breadcrumb)(struct i915_request *rq);
> -	u32		*(*emit_fini_breadcrumb)(struct i915_request *rq,
> -						 u32 *cs);
> -	unsigned int	emit_fini_breadcrumb_dw;
> -
> -	/* Pass the request to the hardware queue (e.g. directly into
> -	 * the legacy ringbuffer or to the end of an execlist).
> -	 *
> -	 * This is called from an atomic context with irqs disabled; must
> -	 * be irq safe.
> -	 */
> -	void		(*submit_request)(struct i915_request *rq);
> -
> -	/*
> -	 * Call when the priority on a request has changed and it and its
> -	 * dependencies may need rescheduling. Note the request itself may
> -	 * not be ready to run!
> -	 */
> -	void		(*schedule)(struct i915_request *request,
> -				    const struct i915_sched_attr *attr);
> -
> -	/*
> -	 * Cancel all requests on the hardware, or queued for execution.
> -	 * This should only cancel the ready requests that have been
> -	 * submitted to the engine (via the engine->submit_request callback).
> -	 * This is called when marking the device as wedged.
> -	 */
> -	void		(*cancel_requests)(struct intel_engine_cs *engine);
> -
> -	void		(*cleanup)(struct intel_engine_cs *engine);
> -
> -	struct intel_engine_execlists execlists;
> -
> -	/* Contexts are pinned whilst they are active on the GPU. The last
> -	 * context executed remains active whilst the GPU is idle - the
> -	 * switch away and write to the context object only occurs on the
> -	 * next execution.  Contexts are only unpinned on retirement of the
> -	 * following request ensuring that we can always write to the object
> -	 * on the context switch even after idling. Across suspend, we switch
> -	 * to the kernel context and trash it as the save may not happen
> -	 * before the hardware is powered down.
> -	 */
> -	struct intel_context *last_retired_context;
> -
> -	/* status_notifier: list of callbacks for context-switch changes */
> -	struct atomic_notifier_head context_status_notifier;
> -
> -	struct intel_engine_hangcheck hangcheck;
> -
> -#define I915_ENGINE_NEEDS_CMD_PARSER BIT(0)
> -#define I915_ENGINE_SUPPORTS_STATS   BIT(1)
> -#define I915_ENGINE_HAS_PREEMPTION   BIT(2)
> -#define I915_ENGINE_HAS_SEMAPHORES   BIT(3)
> -	unsigned int flags;
> -
> -	/*
> -	 * Table of commands the command parser needs to know about
> -	 * for this engine.
> -	 */
> -	DECLARE_HASHTABLE(cmd_hash, I915_CMD_HASH_ORDER);
> -
> -	/*
> -	 * Table of registers allowed in commands that read/write registers.
> -	 */
> -	const struct drm_i915_reg_table *reg_tables;
> -	int reg_table_count;
> -
> -	/*
> -	 * Returns the bitmask for the length field of the specified command.
> -	 * Return 0 for an unrecognized/invalid command.
> -	 *
> -	 * If the command parser finds an entry for a command in the engine's
> -	 * cmd_tables, it gets the command's length based on the table entry.
> -	 * If not, it calls this function to determine the per-engine length
> -	 * field encoding for the command (i.e. different opcode ranges use
> -	 * certain bits to encode the command length in the header).
> -	 */
> -	u32 (*get_cmd_length_mask)(u32 cmd_header);
> -
> -	struct {
> -		/**
> -		 * @lock: Lock protecting the below fields.
> -		 */
> -		seqlock_t lock;
> -		/**
> -		 * @enabled: Reference count indicating number of listeners.
> -		 */
> -		unsigned int enabled;
> -		/**
> -		 * @active: Number of contexts currently scheduled in.
> -		 */
> -		unsigned int active;
> -		/**
> -		 * @enabled_at: Timestamp when busy stats were enabled.
> -		 */
> -		ktime_t enabled_at;
> -		/**
> -		 * @start: Timestamp of the last idle to active transition.
> -		 *
> -		 * Idle is defined as active == 0, active is active > 0.
> -		 */
> -		ktime_t start;
> -		/**
> -		 * @total: Total time this engine was busy.
> -		 *
> -		 * Accumulated time not counting the most recent block in cases
> -		 * where engine is currently busy (active > 0).
> -		 */
> -		ktime_t total;
> -	} stats;
> -};
> -
> -static inline bool
> -intel_engine_needs_cmd_parser(const struct intel_engine_cs *engine)
> -{
> -	return engine->flags & I915_ENGINE_NEEDS_CMD_PARSER;
> -}
> -
> -static inline bool
> -intel_engine_supports_stats(const struct intel_engine_cs *engine)
> -{
> -	return engine->flags & I915_ENGINE_SUPPORTS_STATS;
> -}
> -
> -static inline bool
> -intel_engine_has_preemption(const struct intel_engine_cs *engine)
> -{
> -	return engine->flags & I915_ENGINE_HAS_PREEMPTION;
> -}
> -
> -static inline bool
> -intel_engine_has_semaphores(const struct intel_engine_cs *engine)
> -{
> -	return engine->flags & I915_ENGINE_HAS_SEMAPHORES;
> -}
> -
>   void intel_engines_set_scheduler_caps(struct drm_i915_private *i915);
>   
>   static inline bool __execlists_need_preempt(int prio, int last)
> diff --git a/drivers/gpu/drm/i915/intel_workarounds.h b/drivers/gpu/drm/i915/intel_workarounds.h
> index 7c734714b05e..a1bf51c611a9 100644
> --- a/drivers/gpu/drm/i915/intel_workarounds.h
> +++ b/drivers/gpu/drm/i915/intel_workarounds.h
> @@ -9,18 +9,7 @@
>   
>   #include <linux/slab.h>
>   
> -struct i915_wa {
> -	i915_reg_t	  reg;
> -	u32		  mask;
> -	u32		  val;
> -};
> -
> -struct i915_wa_list {
> -	const char	*name;
> -	struct i915_wa	*list;
> -	unsigned int	count;
> -	unsigned int	wa_count;
> -};
> +#include "intel_workarounds_types.h"
>   
>   static inline void intel_wa_list_free(struct i915_wa_list *wal)
>   {
> diff --git a/drivers/gpu/drm/i915/intel_workarounds_types.h b/drivers/gpu/drm/i915/intel_workarounds_types.h
> new file mode 100644
> index 000000000000..032a0dc49275
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/intel_workarounds_types.h
> @@ -0,0 +1,25 @@
> +/*
> + * SPDX-License-Identifier: MIT
> + *
> + * Copyright © 2014-2018 Intel Corporation
> + */
> +
> +#ifndef __INTEL_WORKAROUNDS_TYPES_H__
> +#define __INTEL_WORKAROUNDS_TYPES_H__
> +
> +#include "i915_reg.h"
> +
> +struct i915_wa {
> +	i915_reg_t	  reg;
> +	u32		  mask;
> +	u32		  val;
> +};
> +
> +struct i915_wa_list {
> +	const char	*name;
> +	struct i915_wa	*list;
> +	unsigned int	count;
> +	unsigned int	wa_count;
> +};
> +
> +#endif /* __INTEL_WORKAROUNDS_TYPES_H__ */
> 

* Re: [PATCH 19/38] drm/i915: Allow contexts to share a single timeline across all engines
  2019-03-05 15:54   ` Tvrtko Ursulin
@ 2019-03-05 16:26     ` Chris Wilson
  0 siblings, 0 replies; 88+ messages in thread
From: Chris Wilson @ 2019-03-05 16:26 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-03-05 15:54:25)
> 
> On 01/03/2019 14:03, Chris Wilson wrote:
> > diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> > index 9d111eedad5a..e0807a61dcf4 100644
> > --- a/drivers/gpu/drm/i915/i915_request.c
> > +++ b/drivers/gpu/drm/i915/i915_request.c
> > @@ -1045,8 +1045,14 @@ void i915_request_add(struct i915_request *request)
> >       prev = i915_active_request_raw(&timeline->last_request,
> >                                      &request->i915->drm.struct_mutex);
> >       if (prev && !i915_request_completed(prev)) {
> > -             i915_sw_fence_await_sw_fence(&request->submit, &prev->submit,
> > -                                          &request->submitq);
> > +             if (is_power_of_2(prev->engine->mask | engine->mask))
> > +                     i915_sw_fence_await_sw_fence(&request->submit,
> > +                                                  &prev->submit,
> > +                                                  &request->submitq);
> > +             else
> > +                     __i915_sw_fence_await_dma_fence(&request->submit,
> > +                                                     &prev->fence,
> > +                                                     &request->dmaq);
> 
> Drop a comment here explaining what's happening in this if block.
> 
> The subtlety of why we need special flavours of await helper (the new
> ones which use the built-in callback storage, vs the existing ones
> which allocate it) totally escapes me at the moment.
> 
> It's probably a good idea to put a paragraph in the commit message 
> explaining what new sw fence facility needs to be added to implement 
> this and why.

Alternatively, we could add this fence during request_alloc and use an
actual malloc rather than rely on the embedded struct. We still have to
bypass the usual await call, as that filters out awaits on the same
timeline (because we know we have this dependency). But since we always
have to do this allocation, why not keep on embedding it in the request
itself.

I'm leaning towards keeping the embedded fence for tracking the
dependency along the timeline (rather than kmalloc) as surely there will
be others later. Surely.
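
For reference, the is_power_of_2() check above works because each
engine->mask is a single bit (e.g. BIT(id), as in mock_engine()); as a
standalone sketch, with a made-up helper name:

	#include <linux/log2.h>

	/*
	 * Sketch only: OR-ing two single-bit engine masks yields a
	 * power of two iff both requests run on the same engine, and
	 * only then can the cheaper submit-fence ordering be used.
	 */
	static bool same_engine(const struct i915_request *prev,
				const struct i915_request *next)
	{
		return is_power_of_2(prev->engine->mask | next->engine->mask);
	}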
-Chris

* Re: [PATCH 28/38] drm/i915: Store the intel_context_ops in the intel_engine_cs
  2019-03-01 14:03 ` [PATCH 28/38] drm/i915: Store the intel_context_ops in the intel_engine_cs Chris Wilson
@ 2019-03-05 16:27   ` Tvrtko Ursulin
  2019-03-05 16:45     ` Chris Wilson
  0 siblings, 1 reply; 88+ messages in thread
From: Tvrtko Ursulin @ 2019-03-05 16:27 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 01/03/2019 14:03, Chris Wilson wrote:
> If we place a pointer to the engine specific intel_context_ops in the
> engine itself, we can assign the ops pointer on initialising the
> context, and then rely on it being set. This simplifies the code in
> later patches.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_gem_context.c      |  1 +
>   drivers/gpu/drm/i915/intel_engine_types.h    |  1 +
>   drivers/gpu/drm/i915/intel_lrc.c             | 13 ++++++------
>   drivers/gpu/drm/i915/intel_ringbuffer.c      | 22 +++++++++-----------
>   drivers/gpu/drm/i915/selftests/mock_engine.c | 13 ++++++------
>   5 files changed, 24 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index d04fa649bc0e..04c24caf30d2 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -377,6 +377,7 @@ intel_context_init(struct intel_context *ce,
>   {
>   	ce->gem_context = ctx;
>   	ce->engine = engine;
> +	ce->ops = engine->context;
>   
>   	INIT_LIST_HEAD(&ce->signal_link);
>   	INIT_LIST_HEAD(&ce->signals);
> diff --git a/drivers/gpu/drm/i915/intel_engine_types.h b/drivers/gpu/drm/i915/intel_engine_types.h
> index 5ec6e72d0ffb..546b790871ad 100644
> --- a/drivers/gpu/drm/i915/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/intel_engine_types.h
> @@ -351,6 +351,7 @@ struct intel_engine_cs {
>   
>   	void		(*set_default_submission)(struct intel_engine_cs *engine);
>   
> +	const struct intel_context_ops *context;

Calling it ce_ops / context_ops / hw_context_ops? Anything but context! :)

>   	struct intel_context *(*context_pin)(struct intel_engine_cs *engine,
>   					     struct i915_gem_context *ctx);
>   
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index a2210f79dc67..4f6f09137662 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1381,11 +1381,6 @@ __execlists_context_pin(struct intel_engine_cs *engine,
>   	return ERR_PTR(ret);
>   }
>   
> -static const struct intel_context_ops execlists_context_ops = {
> -	.unpin = execlists_context_unpin,
> -	.destroy = execlists_context_destroy,
> -};
> -
>   static struct intel_context *
>   execlists_context_pin(struct intel_engine_cs *engine,
>   		      struct i915_gem_context *ctx)
> @@ -1399,11 +1394,14 @@ execlists_context_pin(struct intel_engine_cs *engine,
>   		return ce;
>   	GEM_BUG_ON(!ce->pin_count); /* no overflow please! */
>   
> -	ce->ops = &execlists_context_ops;
> -
>   	return __execlists_context_pin(engine, ctx, ce);
>   }
>   
> +static const struct intel_context_ops execlists_context_ops = {
> +	.unpin = execlists_context_unpin,
> +	.destroy = execlists_context_destroy,
> +};
> +
>   static int gen8_emit_init_breadcrumb(struct i915_request *rq)
>   {
>   	u32 *cs;
> @@ -2347,6 +2345,7 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
>   	engine->reset.reset = execlists_reset;
>   	engine->reset.finish = execlists_reset_finish;
>   
> +	engine->context = &execlists_context_ops;
>   	engine->context_pin = execlists_context_pin;
>   	engine->request_alloc = execlists_request_alloc;
>   
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 764dcc5d5856..848b68e090d5 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1348,7 +1348,7 @@ intel_ring_free(struct intel_ring *ring)
>   	kfree(ring);
>   }
>   
> -static void intel_ring_context_destroy(struct intel_context *ce)
> +static void ring_context_destroy(struct intel_context *ce)
>   {
>   	GEM_BUG_ON(ce->pin_count);
>   
> @@ -1425,7 +1425,7 @@ static void __context_unpin(struct intel_context *ce)
>   	i915_vma_unpin(vma);
>   }
>   
> -static void intel_ring_context_unpin(struct intel_context *ce)
> +static void ring_context_unpin(struct intel_context *ce)
>   {
>   	__context_unpin_ppgtt(ce->gem_context);
>   	__context_unpin(ce);
> @@ -1548,14 +1548,8 @@ __ring_context_pin(struct intel_engine_cs *engine,
>   	return ERR_PTR(err);
>   }
>   
> -static const struct intel_context_ops ring_context_ops = {
> -	.unpin = intel_ring_context_unpin,
> -	.destroy = intel_ring_context_destroy,
> -};
> -
>   static struct intel_context *
> -intel_ring_context_pin(struct intel_engine_cs *engine,
> -		       struct i915_gem_context *ctx)
> +ring_context_pin(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>   {
>   	struct intel_context *ce = to_intel_context(ctx, engine);
>   
> @@ -1565,11 +1559,14 @@ intel_ring_context_pin(struct intel_engine_cs *engine,
>   		return ce;
>   	GEM_BUG_ON(!ce->pin_count); /* no overflow please! */
>   
> -	ce->ops = &ring_context_ops;
> -
>   	return __ring_context_pin(engine, ctx, ce);
>   }
>   
> +static const struct intel_context_ops ring_context_ops = {
> +	.unpin = ring_context_unpin,
> +	.destroy = ring_context_destroy,
> +};
> +
>   static int intel_init_ring_buffer(struct intel_engine_cs *engine)
>   {
>   	struct i915_timeline *timeline;
> @@ -2275,7 +2272,8 @@ static void intel_ring_default_vfuncs(struct drm_i915_private *dev_priv,
>   	engine->reset.reset = reset_ring;
>   	engine->reset.finish = reset_finish;
>   
> -	engine->context_pin = intel_ring_context_pin;
> +	engine->context = &ring_context_ops;
> +	engine->context_pin = ring_context_pin;

Consolidate context pin into ops?

>   	engine->request_alloc = ring_request_alloc;
>   
>   	/*
> diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
> index 8032a8a9542f..6c09e5162feb 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_engine.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
> @@ -137,11 +137,6 @@ static void mock_context_destroy(struct intel_context *ce)
>   		mock_ring_free(ce->ring);
>   }
>   
> -static const struct intel_context_ops mock_context_ops = {
> -	.unpin = mock_context_unpin,
> -	.destroy = mock_context_destroy,
> -};
> -
>   static struct intel_context *
>   mock_context_pin(struct intel_engine_cs *engine,
>   		 struct i915_gem_context *ctx)
> @@ -160,8 +155,6 @@ mock_context_pin(struct intel_engine_cs *engine,
>   
>   	mock_timeline_pin(ce->ring->timeline);
>   
> -	ce->ops = &mock_context_ops;
> -
>   	mutex_lock(&ctx->mutex);
>   	list_add(&ce->active_link, &ctx->active_engines);
>   	mutex_unlock(&ctx->mutex);
> @@ -174,6 +167,11 @@ mock_context_pin(struct intel_engine_cs *engine,
>   	return ERR_PTR(err);
>   }
>   
> +static const struct intel_context_ops mock_context_ops = {
> +	.unpin = mock_context_unpin,
> +	.destroy = mock_context_destroy,
> +};
> +
>   static int mock_request_alloc(struct i915_request *request)
>   {
>   	INIT_LIST_HEAD(&request->mock.link);
> @@ -232,6 +230,7 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
>   	engine->base.mask = BIT(id);
>   	engine->base.status_page.addr = (void *)(engine + 1);
>   
> +	engine->base.context = &mock_context_ops;
>   	engine->base.context_pin = mock_context_pin;
>   	engine->base.request_alloc = mock_request_alloc;
>   	engine->base.emit_flush = mock_emit_flush;
> 

Regards,

Tvrtko


* Re: [PATCH 26/38] drm/i915: Pass around the intel_context
  2019-03-05 16:16   ` Tvrtko Ursulin
@ 2019-03-05 16:33     ` Chris Wilson
  2019-03-05 19:23       ` Chris Wilson
  0 siblings, 1 reply; 88+ messages in thread
From: Chris Wilson @ 2019-03-05 16:33 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-03-05 16:16:01)
> 
> On 01/03/2019 14:03, Chris Wilson wrote:
> > Instead of passing the gem_context and engine to find the instance of
> > the intel_context to use, pass around the intel_context instead. This is
> > useful for the next few patches, where the intel_context is no longer a
> > direct lookup.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> > @@ -1314,9 +1314,10 @@ __execlists_update_reg_state(struct intel_engine_cs *engine,
> >       regs[CTX_RING_TAIL + 1] = ring->tail;
> >   
> >       /* RPCS */
> > -     if (engine->class == RENDER_CLASS)
> > -             regs[CTX_R_PWR_CLK_STATE + 1] = gen8_make_rpcs(engine->i915,
> > -                                                            &ce->sseu);
> > +     if (engine->class == RENDER_CLASS) {
> > +             regs[CTX_R_PWR_CLK_STATE + 1] =
> > +                     gen8_make_rpcs(engine->i915, &ce->sseu);
> > +     }
> 
> Rebasing artifact I guess.

I must have slipped :-p

> > @@ -2659,7 +2660,7 @@ static u32 intel_lr_indirect_ctx_offset(struct intel_engine_cs *engine)
> >   }
> >   
> >   static void execlists_init_reg_state(u32 *regs,
> > -                                  struct i915_gem_context *ctx,
> > +                                  struct intel_context *ce,
> >                                    struct intel_engine_cs *engine,
> 
> I can't remember if the separate engine param is needed since now
> there's ce->engine. Maybe it becomes useful later.

We could even initialise ce->ring, or just pull that from
ce->gem_context->ring_size.

I didn't have a good reason to keep engine, nor a good enough reason to
kill it -- although having ce is enough of a change I guess.
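
(If it did go, the function could recover it locally; a sketch:)

	static void execlists_init_reg_state(u32 *regs,
					     struct intel_context *ce,
					     struct intel_ring *ring)
	{
		struct intel_engine_cs *engine = ce->engine;

		/* ... body unchanged ... */
	}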

> >                                    struct intel_ring *ring)
> >   {
> > @@ -2735,24 +2736,24 @@ static void execlists_init_reg_state(u32 *regs,
> >       CTX_REG(regs, CTX_PDP0_UDW, GEN8_RING_PDP_UDW(engine, 0), 0);
> >       CTX_REG(regs, CTX_PDP0_LDW, GEN8_RING_PDP_LDW(engine, 0), 0);
> >   
> > -     if (i915_vm_is_48bit(&ctx->ppgtt->vm)) {
> > +     if (i915_vm_is_48bit(&ce->gem_context->ppgtt->vm)) {
> >               /* 64b PPGTT (48bit canonical)
> >                * PDP0_DESCRIPTOR contains the base address to PML4 and
> >                * other PDP Descriptors are ignored.
> >                */
> > -             ASSIGN_CTX_PML4(ctx->ppgtt, regs);
> > +             ASSIGN_CTX_PML4(ce->gem_context->ppgtt, regs);
> >       } else {
> > -             ASSIGN_CTX_PDP(ctx->ppgtt, regs, 3);
> > -             ASSIGN_CTX_PDP(ctx->ppgtt, regs, 2);
> > -             ASSIGN_CTX_PDP(ctx->ppgtt, regs, 1);
> > -             ASSIGN_CTX_PDP(ctx->ppgtt, regs, 0);
> > +             ASSIGN_CTX_PDP(ce->gem_context->ppgtt, regs, 3);
> > +             ASSIGN_CTX_PDP(ce->gem_context->ppgtt, regs, 2);
> > +             ASSIGN_CTX_PDP(ce->gem_context->ppgtt, regs, 1);
> > +             ASSIGN_CTX_PDP(ce->gem_context->ppgtt, regs, 0);
> 
> Could be worth caching the ppgtt for aesthetics.

My plan is to switch to ce->ppgtt at some point in the near future.
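In the meantime, caching it locally would just be (a sketch, assuming
the i915_hw_ppgtt type used by this series):

	struct i915_hw_ppgtt *ppgtt = ce->gem_context->ppgtt;

	if (i915_vm_is_48bit(&ppgtt->vm)) {
		ASSIGN_CTX_PML4(ppgtt, regs);
	} else {
		ASSIGN_CTX_PDP(ppgtt, regs, 3);
		ASSIGN_CTX_PDP(ppgtt, regs, 2);
		ASSIGN_CTX_PDP(ppgtt, regs, 1);
		ASSIGN_CTX_PDP(ppgtt, regs, 0);
	}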
-Chris

* Re: [PATCH 27/38] drm/i915: Split struct intel_context definition to its own header
  2019-03-05 16:19   ` Tvrtko Ursulin
@ 2019-03-05 16:35     ` Chris Wilson
  0 siblings, 0 replies; 88+ messages in thread
From: Chris Wilson @ 2019-03-05 16:35 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-03-05 16:19:12)
> 
> On 01/03/2019 14:03, Chris Wilson wrote:
> > This complex struct pulling in half the driver deserves its own
> > isolation in preparation for intel_context becoming an outright
> > complicated class of its own.
> > 
> In order to split this beast into its own header, several of its
> dependent types and their dependencies need to be split into their own
> headers as well.
> 
> I don't feel like I need to read this one in detail. If it compiles it 
> should be good.

Hmm, one thing I started doing for the next series is

# Test the headers are compilable as standalone units
i915-$(CONFIG_DRM_I915_WERROR) += \
          gem/test_i915_gemfs_standalone.o \
          gem/test_i915_gem_clflush_standalone.o \
          gem/test_i915_gem_context_standalone.o \
          gem/test_i915_gem_context_types_standalone.o \
          gem/test_i915_gem_ioctls_standalone.o \
          gem/test_i915_gem_object_standalone.o \
          gem/test_i915_gem_object_types_standalone.o

which would be good here as it gives that warm fuzzy feeling that the
compiler is doing all the checking.
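
Each test_*_standalone.c is then presumably nothing more than a single
include, so the build breaks whenever a header stops being
self-contained, e.g.:

	/* gem/test_i915_gem_context_types_standalone.c (sketch) */
	#include "i915_gem_context_types.h"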
-Chris

* Re: [PATCH 28/38] drm/i915: Store the intel_context_ops in the intel_engine_cs
  2019-03-05 16:27   ` Tvrtko Ursulin
@ 2019-03-05 16:45     ` Chris Wilson
  2019-03-05 18:27       ` Chris Wilson
  0 siblings, 1 reply; 88+ messages in thread
From: Chris Wilson @ 2019-03-05 16:45 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-03-05 16:27:34)
> 
> On 01/03/2019 14:03, Chris Wilson wrote:
> > diff --git a/drivers/gpu/drm/i915/intel_engine_types.h b/drivers/gpu/drm/i915/intel_engine_types.h
> > index 5ec6e72d0ffb..546b790871ad 100644
> > --- a/drivers/gpu/drm/i915/intel_engine_types.h
> > +++ b/drivers/gpu/drm/i915/intel_engine_types.h
> > @@ -351,6 +351,7 @@ struct intel_engine_cs {
> >   
> >       void            (*set_default_submission)(struct intel_engine_cs *engine);
> >   
> > +     const struct intel_context_ops *context;
> 
> Calling ce_ops / context_ops / hw_context_ops ? Anything but context! :)

cops.

> > -     engine->context_pin = intel_ring_context_pin;
> > +     engine->context = &ring_context_ops;
> > +     engine->context_pin = ring_context_pin;
> 
> Consolidate context pin into ops?

In the next-but-one patch you get your wish.
-Chris

* Re: [PATCH 29/38] drm/i915: Move over to intel_context_lookup()
  2019-03-01 14:03 ` [PATCH 29/38] drm/i915: Move over to intel_context_lookup() Chris Wilson
@ 2019-03-05 17:01   ` Tvrtko Ursulin
  2019-03-05 17:10     ` Chris Wilson
  0 siblings, 1 reply; 88+ messages in thread
From: Tvrtko Ursulin @ 2019-03-05 17:01 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 01/03/2019 14:03, Chris Wilson wrote:
> In preparation for an ever growing number of engines and so ever
> increasing static array of HW contexts within the GEM context, move the
> array over to an rbtree, allocated upon first use.
> 
> Unfortunately, this imposes an rbtree lookup at a few frequent callsites,
> but we should be able to mitigate those by moving over to using the HW
> context as our primary type and so only incur the lookup on the boundary
> with the user GEM context and engines.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/Makefile                 |   1 +
>   drivers/gpu/drm/i915/gvt/mmio_context.c       |   3 +-
>   drivers/gpu/drm/i915/i915_gem.c               |   9 +-
>   drivers/gpu/drm/i915/i915_gem_context.c       |  78 ++++------
>   drivers/gpu/drm/i915/i915_gem_context.h       |   1 -
>   drivers/gpu/drm/i915/i915_gem_context_types.h |   7 +-
>   drivers/gpu/drm/i915/i915_gem_execbuffer.c    |   4 +-
>   drivers/gpu/drm/i915/i915_perf.c              |   4 +-
>   drivers/gpu/drm/i915/intel_context.c          | 139 ++++++++++++++++++
>   drivers/gpu/drm/i915/intel_context.h          |  37 ++++-
>   drivers/gpu/drm/i915/intel_context_types.h    |   2 +
>   drivers/gpu/drm/i915/intel_engine_cs.c        |   2 +-
>   drivers/gpu/drm/i915/intel_engine_types.h     |   5 +
>   drivers/gpu/drm/i915/intel_guc_ads.c          |   4 +-
>   drivers/gpu/drm/i915/intel_guc_submission.c   |   4 +-
>   drivers/gpu/drm/i915/intel_lrc.c              |  29 ++--
>   drivers/gpu/drm/i915/intel_ringbuffer.c       |  23 ++-
>   drivers/gpu/drm/i915/selftests/mock_context.c |   7 +-
>   drivers/gpu/drm/i915/selftests/mock_engine.c  |   6 +-
>   19 files changed, 271 insertions(+), 94 deletions(-)
>   create mode 100644 drivers/gpu/drm/i915/intel_context.c
> 
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index 89105b1aaf12..d7292b349c0d 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -86,6 +86,7 @@ i915-y += \
>   	  i915_trace_points.o \
>   	  i915_vma.o \
>   	  intel_breadcrumbs.o \
> +	  intel_context.o \
>   	  intel_engine_cs.o \
>   	  intel_hangcheck.o \
>   	  intel_lrc.o \
> diff --git a/drivers/gpu/drm/i915/gvt/mmio_context.c b/drivers/gpu/drm/i915/gvt/mmio_context.c
> index 7d84cfb9051a..442a74805129 100644
> --- a/drivers/gpu/drm/i915/gvt/mmio_context.c
> +++ b/drivers/gpu/drm/i915/gvt/mmio_context.c
> @@ -492,7 +492,8 @@ static void switch_mmio(struct intel_vgpu *pre,
>   			 * itself.
>   			 */
>   			if (mmio->in_context &&
> -			    !is_inhibit_context(&s->shadow_ctx->__engine[ring_id]))
> +			    !is_inhibit_context(intel_context_lookup(s->shadow_ctx,
> +								     dev_priv->engine[ring_id])))
>   				continue;
>   
>   			if (mmio->mask)
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 90bb951b8c99..c6a25c3276ee 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4694,15 +4694,20 @@ static int __intel_engines_record_defaults(struct drm_i915_private *i915)
>   	}
>   
>   	for_each_engine(engine, i915, id) {
> +		struct intel_context *ce;
>   		struct i915_vma *state;
>   		void *vaddr;
>   
> -		GEM_BUG_ON(to_intel_context(ctx, engine)->pin_count);
> +		ce = intel_context_lookup(ctx, engine);
> +		if (!ce)
> +			continue;
>   
> -		state = to_intel_context(ctx, engine)->state;
> +		state = ce->state;
>   		if (!state)
>   			continue;
>   
> +		GEM_BUG_ON(ce->pin_count);
> +
>   		/*
>   		 * As we will hold a reference to the logical state, it will
>   		 * not be torn down with the context, and importantly the
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 04c24caf30d2..60b17f6a727d 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -238,7 +238,7 @@ static void release_hw_id(struct i915_gem_context *ctx)
>   
>   static void i915_gem_context_free(struct i915_gem_context *ctx)
>   {
> -	unsigned int n;
> +	struct intel_context *it, *n;
>   
>   	lockdep_assert_held(&ctx->i915->drm.struct_mutex);
>   	GEM_BUG_ON(!i915_gem_context_is_closed(ctx));
> @@ -249,12 +249,8 @@ static void i915_gem_context_free(struct i915_gem_context *ctx)
>   
>   	kfree(ctx->engines);
>   
> -	for (n = 0; n < ARRAY_SIZE(ctx->__engine); n++) {
> -		struct intel_context *ce = &ctx->__engine[n];
> -
> -		if (ce->ops)
> -			ce->ops->destroy(ce);
> -	}

/*
  * No need to take lock since no one can access the context at this
  * point.
  */

?

> +	rbtree_postorder_for_each_entry_safe(it, n, &ctx->hw_contexts, node)
> +		it->ops->destroy(it);
>   
>   	if (ctx->timeline)
>   		i915_timeline_put(ctx->timeline);
> @@ -361,40 +357,11 @@ static u32 default_desc_template(const struct drm_i915_private *i915,
>   	return desc;
>   }
>   
> -static void intel_context_retire(struct i915_active_request *active,
> -				 struct i915_request *rq)
> -{
> -	struct intel_context *ce =
> -		container_of(active, typeof(*ce), active_tracker);
> -
> -	intel_context_unpin(ce);
> -}
> -
> -void
> -intel_context_init(struct intel_context *ce,
> -		   struct i915_gem_context *ctx,
> -		   struct intel_engine_cs *engine)
> -{
> -	ce->gem_context = ctx;
> -	ce->engine = engine;
> -	ce->ops = engine->context;
> -
> -	INIT_LIST_HEAD(&ce->signal_link);
> -	INIT_LIST_HEAD(&ce->signals);
> -
> -	/* Use the whole device by default */
> -	ce->sseu = intel_device_default_sseu(ctx->i915);
> -
> -	i915_active_request_init(&ce->active_tracker,
> -				 NULL, intel_context_retire);
> -}
> -
>   static struct i915_gem_context *
>   __create_hw_context(struct drm_i915_private *dev_priv,
>   		    struct drm_i915_file_private *file_priv)
>   {
>   	struct i915_gem_context *ctx;
> -	unsigned int n;
>   	int ret;
>   	int i;
>   
> @@ -409,8 +376,8 @@ __create_hw_context(struct drm_i915_private *dev_priv,
>   	INIT_LIST_HEAD(&ctx->active_engines);
>   	mutex_init(&ctx->mutex);
>   
> -	for (n = 0; n < ARRAY_SIZE(ctx->__engine); n++)
> -		intel_context_init(&ctx->__engine[n], ctx, dev_priv->engine[n]);
> +	ctx->hw_contexts = RB_ROOT;
> +	spin_lock_init(&ctx->hw_contexts_lock);
>   
>   	INIT_RADIX_TREE(&ctx->handles_vma, GFP_KERNEL);
>   	INIT_LIST_HEAD(&ctx->handles_list);
> @@ -959,8 +926,6 @@ int i915_gem_switch_to_kernel_context(struct drm_i915_private *i915,
>   		struct intel_ring *ring;
>   		struct i915_request *rq;
>   
> -		GEM_BUG_ON(!to_intel_context(i915->kernel_context, engine));
> -
>   		rq = i915_request_alloc(engine, i915->kernel_context);
>   		if (IS_ERR(rq))
>   			return PTR_ERR(rq);
> @@ -1195,9 +1160,13 @@ __i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
>   				    struct intel_engine_cs *engine,
>   				    struct intel_sseu sseu)
>   {
> -	struct intel_context *ce = to_intel_context(ctx, engine);
> +	struct intel_context *ce;
>   	int ret = 0;
>   
> +	ce = intel_context_instance(ctx, engine);
> +	if (IS_ERR(ce))
> +		return PTR_ERR(ce);
> +
>   	GEM_BUG_ON(INTEL_GEN(ctx->i915) < 8);
>   	GEM_BUG_ON(engine->id != RCS);
>   
> @@ -1631,8 +1600,8 @@ static int create_setparam(struct i915_user_extension __user *ext, void *data)
>   	return ctx_setparam(data, &local.setparam);
>   }
>   
> -static void clone_sseu(struct i915_gem_context *dst,
> -		       struct i915_gem_context *src)
> +static int clone_sseu(struct i915_gem_context *dst,
> +		      struct i915_gem_context *src)
>   {
>   	const struct intel_sseu default_sseu =
>   		intel_device_default_sseu(dst->i915);
> @@ -1643,7 +1612,7 @@ static void clone_sseu(struct i915_gem_context *dst,
>   		struct intel_context *ce;
>   		struct intel_sseu sseu;
>   
> -		ce = to_intel_context(src, engine);
> +		ce = intel_context_lookup(src, engine);
>   		if (!ce)
>   			continue;
>   
> @@ -1651,9 +1620,14 @@ static void clone_sseu(struct i915_gem_context *dst,
>   		if (!memcmp(&sseu, &default_sseu, sizeof(sseu)))
>   			continue;
>   
> -		ce = to_intel_context(dst, engine);
> +		ce = intel_context_instance(dst, engine);
> +		if (IS_ERR(ce))
> +			return PTR_ERR(ce);
> +
>   		ce->sseu = sseu;
>   	}
> +
> +	return 0;
>   }
>   
>   static int create_clone(struct i915_user_extension __user *ext, void *data)
> @@ -1661,6 +1635,7 @@ static int create_clone(struct i915_user_extension __user *ext, void *data)
>   	struct drm_i915_gem_context_create_ext_clone local;
>   	struct i915_gem_context *dst = data;
>   	struct i915_gem_context *src;
> +	int err;
>   
>   	if (copy_from_user(&local, ext, sizeof(local)))
>   		return -EFAULT;
> @@ -1683,8 +1658,11 @@ static int create_clone(struct i915_user_extension __user *ext, void *data)
>   	if (local.flags & I915_CONTEXT_CLONE_SCHED)
>   		dst->sched = src->sched;
>   
> -	if (local.flags & I915_CONTEXT_CLONE_SSEU)
> -		clone_sseu(dst, src);
> +	if (local.flags & I915_CONTEXT_CLONE_SSEU) {
> +		err = clone_sseu(dst, src);
> +		if (err)
> +			return err;
> +	}
>   
>   	if (local.flags & I915_CONTEXT_CLONE_TIMELINE && src->timeline) {
>   		if (dst->timeline)
> @@ -1839,13 +1817,15 @@ static int get_sseu(struct i915_gem_context *ctx,
>   	if (!engine)
>   		return -EINVAL;
>   
> +	ce = intel_context_instance(ctx, engine);
> +	if (IS_ERR(ce))
> +		return PTR_ERR(ce);

Interesting! Never mind...

> +
>   	/* Only use for mutex here is to serialize get_param and set_param. */
>   	ret = mutex_lock_interruptible(&ctx->i915->drm.struct_mutex);
>   	if (ret)
>   		return ret;
>   
> -	ce = to_intel_context(ctx, engine);
> -
>   	user_sseu.slice_mask = ce->sseu.slice_mask;
>   	user_sseu.subslice_mask = ce->sseu.subslice_mask;
>   	user_sseu.min_eus_per_subslice = ce->sseu.min_eus_per_subslice;
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
> index 110d5881c9de..5b0e423edf86 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.h
> +++ b/drivers/gpu/drm/i915/i915_gem_context.h
> @@ -31,7 +31,6 @@
>   #include "i915_scheduler.h"
>   #include "intel_context.h"
>   #include "intel_device_info.h"
> -#include "intel_ringbuffer.h"
>   
>   struct drm_device;
>   struct drm_file;
> diff --git a/drivers/gpu/drm/i915/i915_gem_context_types.h b/drivers/gpu/drm/i915/i915_gem_context_types.h
> index 8baa7a5e7fdb..b1df720710f5 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context_types.h
> +++ b/drivers/gpu/drm/i915/i915_gem_context_types.h
> @@ -13,10 +13,10 @@
>   #include <linux/kref.h>
>   #include <linux/mutex.h>
>   #include <linux/radix-tree.h>
> +#include <linux/rbtree.h>
>   #include <linux/rcupdate.h>
>   #include <linux/types.h>
>   
> -#include "i915_gem.h" /* I915_NUM_ENGINES */
>   #include "i915_scheduler.h"
>   #include "intel_context_types.h"
>   
> @@ -146,8 +146,9 @@ struct i915_gem_context {
>   
>   	struct i915_sched_attr sched;
>   
> -	/** engine: per-engine logical HW state */
> -	struct intel_context __engine[I915_NUM_ENGINES];
> +	/** hw_contexts: per-engine logical HW state */
> +	struct rb_root hw_contexts;
> +	spinlock_t hw_contexts_lock;
>   
>   	/** ring_size: size for allocating the per-engine ring buffer */
>   	u32 ring_size;
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index ff5810a932a8..5ea2e1c8b927 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -794,8 +794,8 @@ static int eb_wait_for_ring(const struct i915_execbuffer *eb)
>   	 * keeping all of their resources pinned.
>   	 */
>   
> -	ce = to_intel_context(eb->ctx, eb->engine);
> -	if (!ce->ring) /* first use, assume empty! */
> +	ce = intel_context_lookup(eb->ctx, eb->engine);
> +	if (!ce || !ce->ring) /* first use, assume empty! */
>   		return 0;
>   
>   	rq = __eb_wait_for_ring(ce->ring);
> diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
> index f969a0512465..ecca231ca83a 100644
> --- a/drivers/gpu/drm/i915/i915_perf.c
> +++ b/drivers/gpu/drm/i915/i915_perf.c
> @@ -1740,11 +1740,11 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
>   
>   	/* Update all contexts now that we've stalled the submission. */
>   	list_for_each_entry(ctx, &dev_priv->contexts.list, link) {
> -		struct intel_context *ce = to_intel_context(ctx, engine);
> +		struct intel_context *ce = intel_context_lookup(ctx, engine);
>   		u32 *regs;
>   
>   		/* OA settings will be set upon first use */
> -		if (!ce->state)
> +		if (!ce || !ce->state)
>   			continue;
>   
>   		regs = i915_gem_object_pin_map(ce->state->obj, map_type);
> diff --git a/drivers/gpu/drm/i915/intel_context.c b/drivers/gpu/drm/i915/intel_context.c
> new file mode 100644
> index 000000000000..242b1b6ad253
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/intel_context.c
> @@ -0,0 +1,139 @@
> +/*
> + * SPDX-License-Identifier: MIT
> + *
> + * Copyright © 2019 Intel Corporation
> + */
> +
> +#include "i915_drv.h"
> +#include "i915_gem_context.h"
> +#include "intel_context.h"
> +#include "intel_ringbuffer.h"
> +
> +struct intel_context *
> +intel_context_lookup(struct i915_gem_context *ctx,
> +		     struct intel_engine_cs *engine)
> +{
> +	struct intel_context *ce = NULL;
> +	struct rb_node *p;
> +
> +	spin_lock(&ctx->hw_contexts_lock);
> +	p = ctx->hw_contexts.rb_node;
> +	while (p) {
> +		struct intel_context *this =
> +			rb_entry(p, struct intel_context, node);
> +
> +		if (this->engine == engine) {
> +			GEM_BUG_ON(this->gem_context != ctx);
> +			ce = this;
> +			break;
> +		}
> +
> +		if (this->engine < engine)
> +			p = p->rb_right;
> +		else
> +			p = p->rb_left;
> +	}
> +	spin_unlock(&ctx->hw_contexts_lock);
> +
> +	return ce;
> +}
> +
> +struct intel_context *
> +__intel_context_insert(struct i915_gem_context *ctx,
> +		       struct intel_engine_cs *engine,
> +		       struct intel_context *ce)
> +{
> +	struct rb_node **p, *parent;
> +	int err = 0;
> +
> +	spin_lock(&ctx->hw_contexts_lock);
> +
> +	parent = NULL;
> +	p = &ctx->hw_contexts.rb_node;
> +	while (*p) {
> +		struct intel_context *this;
> +
> +		parent = *p;
> +		this = rb_entry(parent, struct intel_context, node);
> +
> +		if (this->engine == engine) {
> +			err = -EEXIST;
> +			ce = this;
> +			break;
> +		}
> +
> +		if (this->engine < engine)
> +			p = &parent->rb_right;
> +		else
> +			p = &parent->rb_left;
> +	}
> +	if (!err) {
> +		rb_link_node(&ce->node, parent, p);
> +		rb_insert_color(&ce->node, &ctx->hw_contexts);
> +	}
> +
> +	spin_unlock(&ctx->hw_contexts_lock);
> +
> +	return ce;
> +}
> +
> +void __intel_context_remove(struct intel_context *ce)
> +{
> +	struct i915_gem_context *ctx = ce->gem_context;
> +
> +	spin_lock(&ctx->hw_contexts_lock);
> +	rb_erase(&ce->node, &ctx->hw_contexts);
> +	spin_unlock(&ctx->hw_contexts_lock);
> +}

Gets used in a later patch?

> +
> +struct intel_context *
> +intel_context_instance(struct i915_gem_context *ctx,
> +		       struct intel_engine_cs *engine)
> +{
> +	struct intel_context *ce, *pos;
> +
> +	ce = intel_context_lookup(ctx, engine);
> +	if (likely(ce))
> +		return ce;
> +
> +	ce = kzalloc(sizeof(*ce), GFP_KERNEL);

At some point you'll move this to a global slab?

> +	if (!ce)
> +		return ERR_PTR(-ENOMEM);
> +
> +	intel_context_init(ce, ctx, engine);
> +
> +	pos = __intel_context_insert(ctx, engine, ce);
> +	if (unlikely(pos != ce)) /* Beaten! Use their HW context instead */
> +		kfree(ce);
> +
> +	GEM_BUG_ON(intel_context_lookup(ctx, engine) != pos);
> +	return pos;
> +}
> +
> +static void intel_context_retire(struct i915_active_request *active,
> +				 struct i915_request *rq)
> +{
> +	struct intel_context *ce =
> +		container_of(active, typeof(*ce), active_tracker);
> +
> +	intel_context_unpin(ce);
> +}
> +
> +void
> +intel_context_init(struct intel_context *ce,
> +		   struct i915_gem_context *ctx,
> +		   struct intel_engine_cs *engine)
> +{
> +	ce->gem_context = ctx;
> +	ce->engine = engine;
> +	ce->ops = engine->context;
> +
> +	INIT_LIST_HEAD(&ce->signal_link);
> +	INIT_LIST_HEAD(&ce->signals);
> +
> +	/* Use the whole device by default */
> +	ce->sseu = intel_device_default_sseu(ctx->i915);
> +
> +	i915_active_request_init(&ce->active_tracker,
> +				 NULL, intel_context_retire);
> +}
> diff --git a/drivers/gpu/drm/i915/intel_context.h b/drivers/gpu/drm/i915/intel_context.h
> index dd947692bb0b..c3fffd9b8ae4 100644
> --- a/drivers/gpu/drm/i915/intel_context.h
> +++ b/drivers/gpu/drm/i915/intel_context.h
> @@ -7,7 +7,6 @@
>   #ifndef __INTEL_CONTEXT_H__
>   #define __INTEL_CONTEXT_H__
>   
> -#include "i915_gem_context_types.h"
>   #include "intel_context_types.h"
>   #include "intel_engine_types.h"
>   
> @@ -15,12 +14,36 @@ void intel_context_init(struct intel_context *ce,
>   			struct i915_gem_context *ctx,
>   			struct intel_engine_cs *engine);
>   
> -static inline struct intel_context *
> -to_intel_context(struct i915_gem_context *ctx,
> -		 const struct intel_engine_cs *engine)
> -{
> -	return &ctx->__engine[engine->id];
> -}
> +/**
> + * intel_context_lookup - Find the matching HW context for this (ctx, engine)
> + * @ctx - the parent GEM context
> + * @engine - the target HW engine
> + *
> + * May return NULL if the HW context hasn't been instantiated (i.e. unused).
> + */
> +struct intel_context *
> +intel_context_lookup(struct i915_gem_context *ctx,
> +		     struct intel_engine_cs *engine);
> +
> +/**
> + * intel_context_instance - Lookup or allocate the HW context for (ctx, engine)
> + * @ctx - the parent GEM context
> + * @engine - the target HW engine
> + *
> + * Returns the existing HW context for this pair of (GEM context, engine), or
> + * allocates and initialises a fresh context. Once allocated, the HW context
> + * remains resident until the GEM context is destroyed.
> + */
> +struct intel_context *
> +intel_context_instance(struct i915_gem_context *ctx,
> +		       struct intel_engine_cs *engine);
> +
> +struct intel_context *
> +__intel_context_insert(struct i915_gem_context *ctx,
> +		       struct intel_engine_cs *engine,
> +		       struct intel_context *ce);
> +void
> +__intel_context_remove(struct intel_context *ce);
>   
>   static inline struct intel_context *
>   intel_context_pin(struct i915_gem_context *ctx, struct intel_engine_cs *engine)
> diff --git a/drivers/gpu/drm/i915/intel_context_types.h b/drivers/gpu/drm/i915/intel_context_types.h
> index 16e1306e9595..857f5c335324 100644
> --- a/drivers/gpu/drm/i915/intel_context_types.h
> +++ b/drivers/gpu/drm/i915/intel_context_types.h
> @@ -8,6 +8,7 @@
>   #define __INTEL_CONTEXT_TYPES__
>   
>   #include <linux/list.h>
> +#include <linux/rbtree.h>
>   #include <linux/types.h>
>   
>   #include "i915_active_types.h"
> @@ -52,6 +53,7 @@ struct intel_context {
>   	struct i915_active_request active_tracker;
>   
>   	const struct intel_context_ops *ops;
> +	struct rb_node node;
>   
>   	/** sseu: Control eu/slice partitioning */
>   	struct intel_sseu sseu;
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index 2559dcc25a6a..112301560745 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -644,7 +644,7 @@ void intel_engines_set_scheduler_caps(struct drm_i915_private *i915)
>   static void __intel_context_unpin(struct i915_gem_context *ctx,
>   				  struct intel_engine_cs *engine)
>   {
> -	intel_context_unpin(to_intel_context(ctx, engine));
> +	intel_context_unpin(intel_context_lookup(ctx, engine));
>   }
>   
>   struct measure_breadcrumb {
> diff --git a/drivers/gpu/drm/i915/intel_engine_types.h b/drivers/gpu/drm/i915/intel_engine_types.h
> index 546b790871ad..5dac6b439f95 100644
> --- a/drivers/gpu/drm/i915/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/intel_engine_types.h
> @@ -231,6 +231,11 @@ struct intel_engine_execlists {
>   	 */
>   	u32 *csb_status;
>   
> +	/**
> +	 * @preempt_context: the HW context for injecting preempt-to-idle
> +	 */
> +	struct intel_context *preempt_context;
> +
>   	/**
>   	 * @preempt_complete_status: expected CSB upon completing preemption
>   	 */
> diff --git a/drivers/gpu/drm/i915/intel_guc_ads.c b/drivers/gpu/drm/i915/intel_guc_ads.c
> index f0db62887f50..da220561ac41 100644
> --- a/drivers/gpu/drm/i915/intel_guc_ads.c
> +++ b/drivers/gpu/drm/i915/intel_guc_ads.c
> @@ -121,8 +121,8 @@ int intel_guc_ads_create(struct intel_guc *guc)
>   	 * to find it. Note that we have to skip our header (1 page),
>   	 * because our GuC shared data is there.
>   	 */
> -	kernel_ctx_vma = to_intel_context(dev_priv->kernel_context,
> -					  dev_priv->engine[RCS])->state;
> +	kernel_ctx_vma = intel_context_lookup(dev_priv->kernel_context,
> +					      dev_priv->engine[RCS])->state;
>   	blob->ads.golden_context_lrca =
>   		intel_guc_ggtt_offset(guc, kernel_ctx_vma) + skipped_offset;

Scary moment, but the kernel context is perma-pinned, ok.

>   
> diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
> index 119d3326bb5e..4935e0555135 100644
> --- a/drivers/gpu/drm/i915/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/intel_guc_submission.c
> @@ -382,7 +382,7 @@ static void guc_stage_desc_init(struct intel_guc_client *client)
>   	desc->db_id = client->doorbell_id;
>   
>   	for_each_engine_masked(engine, dev_priv, client->engines, tmp) {
> -		struct intel_context *ce = to_intel_context(ctx, engine);
> +		struct intel_context *ce = intel_context_lookup(ctx, engine);

This one seems to be handling !ce->state a bit below, so it feels like
you have to add the !ce test as well.

>   		u32 guc_engine_id = engine->guc_id;
>   		struct guc_execlist_context *lrc = &desc->lrc[guc_engine_id];
>   
> @@ -567,7 +567,7 @@ static void inject_preempt_context(struct work_struct *work)
>   					     preempt_work[engine->id]);
>   	struct intel_guc_client *client = guc->preempt_client;
>   	struct guc_stage_desc *stage_desc = __get_stage_desc(client);
> -	struct intel_context *ce = to_intel_context(client->owner, engine);
> +	struct intel_context *ce = intel_context_lookup(client->owner, engine);
>   	u32 data[7];
>   
>   	if (!ce->ring->emit) { /* recreate upon load/resume */
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 4f6f09137662..0800f8edffeb 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -622,8 +622,7 @@ static void port_assign(struct execlist_port *port, struct i915_request *rq)
>   static void inject_preempt_context(struct intel_engine_cs *engine)
>   {
>   	struct intel_engine_execlists *execlists = &engine->execlists;
> -	struct intel_context *ce =
> -		to_intel_context(engine->i915->preempt_context, engine);
> +	struct intel_context *ce = execlists->preempt_context;
>   	unsigned int n;
>   
>   	GEM_BUG_ON(execlists->preempt_complete_status !=
> @@ -1231,19 +1230,24 @@ static void execlists_submit_request(struct i915_request *request)
>   	spin_unlock_irqrestore(&engine->timeline.lock, flags);
>   }
>   
> -static void execlists_context_destroy(struct intel_context *ce)
> +static void __execlists_context_fini(struct intel_context *ce)
>   {
> -	GEM_BUG_ON(ce->pin_count);
> -
> -	if (!ce->state)
> -		return;
> -
>   	intel_ring_free(ce->ring);
>   
>   	GEM_BUG_ON(i915_gem_object_is_active(ce->state->obj));
>   	i915_gem_object_put(ce->state->obj);
>   }
>   
> +static void execlists_context_destroy(struct intel_context *ce)
> +{
> +	GEM_BUG_ON(ce->pin_count);
> +
> +	if (ce->state)
> +		__execlists_context_fini(ce);
> +
> +	kfree(ce);
> +}
> +
>   static void execlists_context_unpin(struct intel_context *ce)
>   {
>   	struct intel_engine_cs *engine;
> @@ -1385,7 +1389,11 @@ static struct intel_context *
>   execlists_context_pin(struct intel_engine_cs *engine,
>   		      struct i915_gem_context *ctx)
>   {
> -	struct intel_context *ce = to_intel_context(ctx, engine);
> +	struct intel_context *ce;
> +
> +	ce = intel_context_instance(ctx, engine);
> +	if (IS_ERR(ce))
> +		return ce;
>   
>   	lockdep_assert_held(&ctx->i915->drm.struct_mutex);
>   	GEM_BUG_ON(!ctx->ppgtt);
> @@ -2436,8 +2444,9 @@ static int logical_ring_init(struct intel_engine_cs *engine)
>   	execlists->preempt_complete_status = ~0u;
>   	if (i915->preempt_context) {
>   		struct intel_context *ce =
> -			to_intel_context(i915->preempt_context, engine);
> +			intel_context_lookup(i915->preempt_context, engine);
>   
> +		execlists->preempt_context = ce;
>   		execlists->preempt_complete_status =
>   			upper_32_bits(ce->lrc_desc);
>   	}
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 848b68e090d5..9002f7f6d32e 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1348,15 +1348,20 @@ intel_ring_free(struct intel_ring *ring)
>   	kfree(ring);
>   }
>   
> +static void __ring_context_fini(struct intel_context *ce)
> +{
> +	GEM_BUG_ON(i915_gem_object_is_active(ce->state->obj));
> +	i915_gem_object_put(ce->state->obj);
> +}
> +
>   static void ring_context_destroy(struct intel_context *ce)
>   {
>   	GEM_BUG_ON(ce->pin_count);
>   
> -	if (!ce->state)
> -		return;
> +	if (ce->state)
> +		__ring_context_fini(ce);
>   
> -	GEM_BUG_ON(i915_gem_object_is_active(ce->state->obj));
> -	i915_gem_object_put(ce->state->obj);
> +	kfree(ce);
>   }
>   
>   static int __context_pin_ppgtt(struct i915_gem_context *ctx)
> @@ -1551,7 +1556,11 @@ __ring_context_pin(struct intel_engine_cs *engine,
>   static struct intel_context *
>   ring_context_pin(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>   {
> -	struct intel_context *ce = to_intel_context(ctx, engine);
> +	struct intel_context *ce;
> +
> +	ce = intel_context_instance(ctx, engine);
> +	if (IS_ERR(ce))
> +		return ce;
>   
>   	lockdep_assert_held(&ctx->i915->drm.struct_mutex);
>   
> @@ -1753,8 +1762,8 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
>   		 * placeholder we use to flush other contexts.
>   		 */
>   		*cs++ = MI_SET_CONTEXT;
> -		*cs++ = i915_ggtt_offset(to_intel_context(i915->kernel_context,
> -							  engine)->state) |
> +		*cs++ = i915_ggtt_offset(intel_context_lookup(i915->kernel_context,
> +							      engine)->state) |
>   			MI_MM_SPACE_GTT |
>   			MI_RESTORE_INHIBIT;
>   	}
> diff --git a/drivers/gpu/drm/i915/selftests/mock_context.c b/drivers/gpu/drm/i915/selftests/mock_context.c
> index 5d0ff2293abc..1d6dc2fe36ab 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_context.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_context.c
> @@ -30,7 +30,6 @@ mock_context(struct drm_i915_private *i915,
>   	     const char *name)
>   {
>   	struct i915_gem_context *ctx;
> -	unsigned int n;
>   	int ret;
>   
>   	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
> @@ -41,15 +40,15 @@ mock_context(struct drm_i915_private *i915,
>   	INIT_LIST_HEAD(&ctx->link);
>   	ctx->i915 = i915;
>   
> +	ctx->hw_contexts = RB_ROOT;
> +	spin_lock_init(&ctx->hw_contexts_lock);
> +
>   	INIT_RADIX_TREE(&ctx->handles_vma, GFP_KERNEL);
>   	INIT_LIST_HEAD(&ctx->handles_list);
>   	INIT_LIST_HEAD(&ctx->hw_id_link);
>   	INIT_LIST_HEAD(&ctx->active_engines);
>   	mutex_init(&ctx->mutex);
>   
> -	for (n = 0; n < ARRAY_SIZE(ctx->__engine); n++)
> -		intel_context_init(&ctx->__engine[n], ctx, i915->engine[n]);
> -
>   	ret = i915_gem_context_pin_hw_id(ctx);
>   	if (ret < 0)
>   		goto err_handles;
> diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
> index 6c09e5162feb..f1a80006289d 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_engine.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
> @@ -141,9 +141,13 @@ static struct intel_context *
>   mock_context_pin(struct intel_engine_cs *engine,
>   		 struct i915_gem_context *ctx)
>   {
> -	struct intel_context *ce = to_intel_context(ctx, engine);
> +	struct intel_context *ce;
>   	int err = -ENOMEM;
>   
> +	ce = intel_context_instance(ctx, engine);
> +	if (IS_ERR(ce))
> +		return ce;
> +
>   	if (ce->pin_count++)
>   		return ce;
>   
> 

Regards,

Tvrtko

* Re: [PATCH 29/38] drm/i915: Move over to intel_context_lookup()
  2019-03-05 17:01   ` Tvrtko Ursulin
@ 2019-03-05 17:10     ` Chris Wilson
  0 siblings, 0 replies; 88+ messages in thread
From: Chris Wilson @ 2019-03-05 17:10 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-03-05 17:01:09)
> 
> On 01/03/2019 14:03, Chris Wilson wrote:
> > @@ -249,12 +249,8 @@ static void i915_gem_context_free(struct i915_gem_context *ctx)
> >   
> >       kfree(ctx->engines);
> >   
> > -     for (n = 0; n < ARRAY_SIZE(ctx->__engine); n++) {
> > -             struct intel_context *ce = &ctx->__engine[n];
> > -
> > -             if (ce->ops)
> > -                     ce->ops->destroy(ce);
> > -     }
> 
> /*
>   * No need to take lock since no one can access the context at this
>   * point.
>   */
> 
> ?

I hope that's self-evident from being the free function. Hmm, perhaps
not if we move to a HW-context-centric viewpoint. Maybe worth revisiting
in the future when the GEM context withers away.

> > @@ -1839,13 +1817,15 @@ static int get_sseu(struct i915_gem_context *ctx,
> >       if (!engine)
> >               return -EINVAL;
> >   
> > +     ce = intel_context_instance(ctx, engine);
> > +     if (IS_ERR(ce))
> > +             return PTR_ERR(ce);
> 
> Interesting! Nevermind..

Wait for the next patch; that one is just for you :)

> > +void __intel_context_remove(struct intel_context *ce)
> > +{
> > +     struct i915_gem_context *ctx = ce->gem_context;
> > +
> > +     spin_lock(&ctx->hw_contexts_lock);
> > +     rb_erase(&ce->node, &ctx->hw_contexts);
> > +     spin_unlock(&ctx->hw_contexts_lock);
> > +}
> 
> Gets used in a later patch?

For the embedded context within virtual_engine.
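
There it would look roughly like this (the virtual_engine layout is an
assumption about the later series):

	struct virtual_engine {
		struct intel_engine_cs base;
		struct intel_context context; /* embedded, not kmalloc'ed */
	};

	static void virtual_context_destroy(struct intel_context *ce)
	{
		struct virtual_engine *ve =
			container_of(ce, typeof(*ve), context);

		__intel_context_remove(ce); /* unlink from ctx->hw_contexts */
		/* ... then free ve as a whole instead of kfree(ce) ... */
	}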

> > +struct intel_context *
> > +intel_context_instance(struct i915_gem_context *ctx,
> > +                    struct intel_engine_cs *engine)
> > +{
> > +     struct intel_context *ce, *pos;
> > +
> > +     ce = intel_context_lookup(ctx, engine);
> > +     if (likely(ce))
> > +             return ce;
> > +
> > +     ce = kzalloc(sizeof(*ce), GFP_KERNEL);
> 
> At some point you'll move this to global slab?

Can do so now; I didn't think these would be frequent enough to justify
a slab. We expect a few hundred on a system, whereas we expect
thousands, even tens of thousands, of requests and objects.

Still, slabs have the secondary benefit of helping debug allocation
patterns.
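
Something like this sketch (cache name made up):

	static struct kmem_cache *slab_ce;

	/* at module init */
	slab_ce = KMEM_CACHE(intel_context, SLAB_HWCACHE_ALIGN);

	/* in intel_context_instance() */
	ce = kmem_cache_zalloc(slab_ce, GFP_KERNEL);

	/* ... and kmem_cache_free(slab_ce, ce) in the destroy paths */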

> > diff --git a/drivers/gpu/drm/i915/intel_guc_ads.c b/drivers/gpu/drm/i915/intel_guc_ads.c
> > index f0db62887f50..da220561ac41 100644
> > --- a/drivers/gpu/drm/i915/intel_guc_ads.c
> > +++ b/drivers/gpu/drm/i915/intel_guc_ads.c
> > @@ -121,8 +121,8 @@ int intel_guc_ads_create(struct intel_guc *guc)
> >        * to find it. Note that we have to skip our header (1 page),
> >        * because our GuC shared data is there.
> >        */
> > -     kernel_ctx_vma = to_intel_context(dev_priv->kernel_context,
> > -                                       dev_priv->engine[RCS])->state;
> > +     kernel_ctx_vma = intel_context_lookup(dev_priv->kernel_context,
> > +                                           dev_priv->engine[RCS])->state;
> >       blob->ads.golden_context_lrca =
> >               intel_guc_ggtt_offset(guc, kernel_ctx_vma) + skipped_offset;
> 
> Scary moment but kernel context is perma pinned, ok.

Scary moment, but it's GuC.

> > diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
> > index 119d3326bb5e..4935e0555135 100644
> > --- a/drivers/gpu/drm/i915/intel_guc_submission.c
> > +++ b/drivers/gpu/drm/i915/intel_guc_submission.c
> > @@ -382,7 +382,7 @@ static void guc_stage_desc_init(struct intel_guc_client *client)
> >       desc->db_id = client->doorbell_id;
> >   
> >       for_each_engine_masked(engine, dev_priv, client->engines, tmp) {
> > -             struct intel_context *ce = to_intel_context(ctx, engine);
> > +             struct intel_context *ce = intel_context_lookup(ctx, engine);
> 
> This one seems to be handling !ce->state a bit below, so it feels like
> you have to add the !ce test as well.

I'm not allowed to explode here, shame.
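
Something along these lines, presumably (a sketch; the rest of the loop
body stays as is):

	for_each_engine_masked(engine, dev_priv, client->engines, tmp) {
		struct intel_context *ce = intel_context_lookup(ctx, engine);

		/*
		 * The context may never have been pinned on this engine,
		 * in which case the lookup now returns NULL.
		 */
		if (!ce || !ce->state)
			continue;

		/* remainder of the loop body unchanged */
	}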
-Chris

* Re: [PATCH 30/38] drm/i915: Make context pinning part of intel_context_ops
  2019-03-01 14:03 ` [PATCH 30/38] drm/i915: Make context pinning part of intel_context_ops Chris Wilson
@ 2019-03-05 17:31   ` Tvrtko Ursulin
  2019-03-05 18:00     ` Chris Wilson
  0 siblings, 1 reply; 88+ messages in thread
From: Tvrtko Ursulin @ 2019-03-05 17:31 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 01/03/2019 14:03, Chris Wilson wrote:
> Push the intel_context pin callback down from intel_engine_cs onto the
> context itself by virtue of having a central caller for
> intel_context_pin() being able to lookup the intel_context itself.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/intel_context.c         |  34 ++++++
>   drivers/gpu/drm/i915/intel_context.h         |   7 +-
>   drivers/gpu/drm/i915/intel_context_types.h   |   1 +
>   drivers/gpu/drm/i915/intel_engine_types.h    |   2 -
>   drivers/gpu/drm/i915/intel_lrc.c             | 114 ++++++++-----------
>   drivers/gpu/drm/i915/intel_ringbuffer.c      |  45 ++------
>   drivers/gpu/drm/i915/selftests/mock_engine.c |  32 +-----
>   7 files changed, 98 insertions(+), 137 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_context.c b/drivers/gpu/drm/i915/intel_context.c
> index 242b1b6ad253..7de02be0b552 100644
> --- a/drivers/gpu/drm/i915/intel_context.c
> +++ b/drivers/gpu/drm/i915/intel_context.c
> @@ -110,6 +110,40 @@ intel_context_instance(struct i915_gem_context *ctx,
>   	return pos;
>   }
>   
> +struct intel_context *
> +intel_context_pin(struct i915_gem_context *ctx,
> +		  struct intel_engine_cs *engine)
> +{
> +	struct intel_context *ce;
> +	int err;
> +
> +	lockdep_assert_held(&ctx->i915->drm.struct_mutex);
> +
> +	ce = intel_context_instance(ctx, engine);
> +	if (IS_ERR(ce))
> +		return ce;
> +
> +	if (unlikely(!ce->pin_count++)) {
> +		err = ce->ops->pin(ce);
> +		if (err)
> +			goto err_unpin;
> +
> +		mutex_lock(&ctx->mutex);
> +		list_add(&ce->active_link, &ctx->active_engines);
> +		mutex_unlock(&ctx->mutex);
> +
> +		i915_gem_context_get(ctx);
> +		GEM_BUG_ON(ce->gem_context != ctx);
> +	}
> +	GEM_BUG_ON(!ce->pin_count); /* no overflow! */
> +
> +	return ce;
> +
> +err_unpin:
> +	ce->pin_count = 0;
> +	return ERR_PTR(err);
> +}
> +
>   static void intel_context_retire(struct i915_active_request *active,
>   				 struct i915_request *rq)
>   {
> diff --git a/drivers/gpu/drm/i915/intel_context.h b/drivers/gpu/drm/i915/intel_context.h
> index c3fffd9b8ae4..aa34311a472e 100644
> --- a/drivers/gpu/drm/i915/intel_context.h
> +++ b/drivers/gpu/drm/i915/intel_context.h
> @@ -45,11 +45,8 @@ __intel_context_insert(struct i915_gem_context *ctx,
>   void
>   __intel_context_remove(struct intel_context *ce);
>   
> -static inline struct intel_context *
> -intel_context_pin(struct i915_gem_context *ctx, struct intel_engine_cs *engine)
> -{
> -	return engine->context_pin(engine, ctx);
> -}
> +struct intel_context *
> +intel_context_pin(struct i915_gem_context *ctx, struct intel_engine_cs *engine);
>   
>   static inline void __intel_context_pin(struct intel_context *ce)
>   {
> diff --git a/drivers/gpu/drm/i915/intel_context_types.h b/drivers/gpu/drm/i915/intel_context_types.h
> index 857f5c335324..804fa93de853 100644
> --- a/drivers/gpu/drm/i915/intel_context_types.h
> +++ b/drivers/gpu/drm/i915/intel_context_types.h
> @@ -19,6 +19,7 @@ struct intel_context;
>   struct intel_ring;
>   
>   struct intel_context_ops {
> +	int (*pin)(struct intel_context *ce);
>   	void (*unpin)(struct intel_context *ce);
>   	void (*destroy)(struct intel_context *ce);
>   };
> diff --git a/drivers/gpu/drm/i915/intel_engine_types.h b/drivers/gpu/drm/i915/intel_engine_types.h
> index 5dac6b439f95..aefc66605729 100644
> --- a/drivers/gpu/drm/i915/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/intel_engine_types.h
> @@ -357,8 +357,6 @@ struct intel_engine_cs {
>   	void		(*set_default_submission)(struct intel_engine_cs *engine);
>   
>   	const struct intel_context_ops *context;
> -	struct intel_context *(*context_pin)(struct intel_engine_cs *engine,
> -					     struct i915_gem_context *ctx);
>   
>   	int		(*request_alloc)(struct i915_request *rq);
>   	int		(*init_context)(struct i915_request *rq);
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 0800f8edffeb..8dddea8e1c97 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -166,9 +166,8 @@
>   
>   #define ACTIVE_PRIORITY (I915_PRIORITY_NEWCLIENT | I915_PRIORITY_NOSEMAPHORE)
>   
> -static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
> -					    struct intel_engine_cs *engine,
> -					    struct intel_context *ce);
> +static int execlists_context_deferred_alloc(struct intel_context *ce,
> +					    struct intel_engine_cs *engine);
>   static void execlists_init_reg_state(u32 *reg_state,
>   				     struct intel_context *ce,
>   				     struct intel_engine_cs *engine,
> @@ -330,11 +329,10 @@ assert_priority_queue(const struct i915_request *prev,
>    * engine info, SW context ID and SW counter need to form a unique number
>    * (Context ID) per lrc.
>    */
> -static void
> -intel_lr_context_descriptor_update(struct i915_gem_context *ctx,
> -				   struct intel_engine_cs *engine,
> -				   struct intel_context *ce)
> +static u64
> +lrc_descriptor(struct intel_context *ce, struct intel_engine_cs *engine)
>   {
> +	struct i915_gem_context *ctx = ce->gem_context;
>   	u64 desc;
>   
>   	BUILD_BUG_ON(MAX_CONTEXT_HW_ID > (BIT(GEN8_CTX_ID_WIDTH)));
> @@ -352,7 +350,7 @@ intel_lr_context_descriptor_update(struct i915_gem_context *ctx,
>   	 * Consider updating oa_get_render_ctx_id in i915_perf.c when changing
>   	 * anything below.
>   	 */
> -	if (INTEL_GEN(ctx->i915) >= 11) {
> +	if (INTEL_GEN(engine->i915) >= 11) {
>   		GEM_BUG_ON(ctx->hw_id >= BIT(GEN11_SW_CTX_ID_WIDTH));
>   		desc |= (u64)ctx->hw_id << GEN11_SW_CTX_ID_SHIFT;
>   								/* bits 37-47 */
> @@ -369,7 +367,7 @@ intel_lr_context_descriptor_update(struct i915_gem_context *ctx,
>   		desc |= (u64)ctx->hw_id << GEN8_CTX_ID_SHIFT;	/* bits 32-52 */
>   	}
>   
> -	ce->lrc_desc = desc;
> +	return desc;
>   }
>   
>   static void unwind_wa_tail(struct i915_request *rq)
> @@ -1284,7 +1282,7 @@ static void execlists_context_unpin(struct intel_context *ce)
>   	i915_gem_context_put(ce->gem_context);
>   }
>   
> -static int __context_pin(struct i915_gem_context *ctx, struct i915_vma *vma)
> +static int __context_pin(struct i915_vma *vma)
>   {
>   	unsigned int flags;
>   	int err;
> @@ -1307,11 +1305,14 @@ static int __context_pin(struct i915_gem_context *ctx, struct i915_vma *vma)
>   }
>   
>   static void
> -__execlists_update_reg_state(struct intel_engine_cs *engine,
> -			     struct intel_context *ce)
> +__execlists_update_reg_state(struct intel_context *ce,
> +			     struct intel_engine_cs *engine)
>   {
> -	u32 *regs = ce->lrc_reg_state;
>   	struct intel_ring *ring = ce->ring;
> +	u32 *regs = ce->lrc_reg_state;
> +
> +	GEM_BUG_ON(!intel_ring_offset_valid(ring, ring->head));
> +	GEM_BUG_ON(!intel_ring_offset_valid(ring, ring->tail));
>   
>   	regs[CTX_RING_BUFFER_START + 1] = i915_ggtt_offset(ring->vma);
>   	regs[CTX_RING_HEAD + 1] = ring->head;
> @@ -1324,25 +1325,26 @@ __execlists_update_reg_state(struct intel_engine_cs *engine,
>   	}
>   }
>   
> -static struct intel_context *
> -__execlists_context_pin(struct intel_engine_cs *engine,
> -			struct i915_gem_context *ctx,
> -			struct intel_context *ce)
> +static int
> +__execlists_context_pin(struct intel_context *ce,
> +			struct intel_engine_cs *engine)
>   {
>   	void *vaddr;
>   	int ret;
>   
> -	ret = execlists_context_deferred_alloc(ctx, engine, ce);
> +	GEM_BUG_ON(!ce->gem_context->ppgtt);
> +
> +	ret = execlists_context_deferred_alloc(ce, engine);
>   	if (ret)
>   		goto err;
>   	GEM_BUG_ON(!ce->state);
>   
> -	ret = __context_pin(ctx, ce->state);
> +	ret = __context_pin(ce->state);
>   	if (ret)
>   		goto err;
>   
>   	vaddr = i915_gem_object_pin_map(ce->state->obj,
> -					i915_coherent_map_type(ctx->i915) |
> +					i915_coherent_map_type(engine->i915) |
>   					I915_MAP_OVERRIDE);
>   	if (IS_ERR(vaddr)) {
>   		ret = PTR_ERR(vaddr);
> @@ -1353,26 +1355,16 @@ __execlists_context_pin(struct intel_engine_cs *engine,
>   	if (ret)
>   		goto unpin_map;
>   
> -	ret = i915_gem_context_pin_hw_id(ctx);
> +	ret = i915_gem_context_pin_hw_id(ce->gem_context);
>   	if (ret)
>   		goto unpin_ring;
>   
> -	intel_lr_context_descriptor_update(ctx, engine, ce);
> -
> -	GEM_BUG_ON(!intel_ring_offset_valid(ce->ring, ce->ring->head));
> -
> +	ce->lrc_desc = lrc_descriptor(ce, engine);
>   	ce->lrc_reg_state = vaddr + LRC_STATE_PN * PAGE_SIZE;
> -
> -	__execlists_update_reg_state(engine, ce);
> +	__execlists_update_reg_state(ce, engine);
>   
>   	ce->state->obj->pin_global++;
> -
> -	mutex_lock(&ctx->mutex);
> -	list_add(&ce->active_link, &ctx->active_engines);
> -	mutex_unlock(&ctx->mutex);
> -
> -	i915_gem_context_get(ctx);
> -	return ce;
> +	return 0;
>   
>   unpin_ring:
>   	intel_ring_unpin(ce->ring);
> @@ -1381,31 +1373,16 @@ __execlists_context_pin(struct intel_engine_cs *engine,
>   unpin_vma:
>   	__i915_vma_unpin(ce->state);
>   err:
> -	ce->pin_count = 0;
> -	return ERR_PTR(ret);
> +	return ret;
>   }
>   
> -static struct intel_context *
> -execlists_context_pin(struct intel_engine_cs *engine,
> -		      struct i915_gem_context *ctx)
> +static int execlists_context_pin(struct intel_context *ce)
>   {
> -	struct intel_context *ce;
> -
> -	ce = intel_context_instance(ctx, engine);
> -	if (IS_ERR(ce))
> -		return ce;
> -
> -	lockdep_assert_held(&ctx->i915->drm.struct_mutex);
> -	GEM_BUG_ON(!ctx->ppgtt);
> -
> -	if (likely(ce->pin_count++))
> -		return ce;
> -	GEM_BUG_ON(!ce->pin_count); /* no overflow please! */
> -
> -	return __execlists_context_pin(engine, ctx, ce);
> +	return __execlists_context_pin(ce, ce->engine);
>   }

Do you need this trivial wrapper for later?

>   
>   static const struct intel_context_ops execlists_context_ops = {
> +	.pin = execlists_context_pin,
>   	.unpin = execlists_context_unpin,
>   	.destroy = execlists_context_destroy,
>   };
> @@ -2029,7 +2006,7 @@ static void execlists_reset(struct intel_engine_cs *engine, bool stalled)
>   	intel_ring_update_space(rq->ring);
>   
>   	execlists_init_reg_state(regs, rq->hw_context, engine, rq->ring);
> -	__execlists_update_reg_state(engine, rq->hw_context);
> +	__execlists_update_reg_state(rq->hw_context, engine);
>   
>   out_unlock:
>   	spin_unlock_irqrestore(&engine->timeline.lock, flags);
> @@ -2354,7 +2331,6 @@ logical_ring_default_vfuncs(struct intel_engine_cs *engine)
>   	engine->reset.finish = execlists_reset_finish;
>   
>   	engine->context = &execlists_context_ops;
> -	engine->context_pin = execlists_context_pin;
>   	engine->request_alloc = execlists_request_alloc;
>   
>   	engine->emit_flush = gen8_emit_flush;
> @@ -2831,9 +2807,16 @@ populate_lr_context(struct intel_context *ce,
>   	return ret;
>   }
>   
> -static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
> -					    struct intel_engine_cs *engine,
> -					    struct intel_context *ce)
> +static struct i915_timeline *get_timeline(struct i915_gem_context *ctx)
> +{
> +	if (ctx->timeline)
> +		return i915_timeline_get(ctx->timeline);
> +	else
> +		return i915_timeline_create(ctx->i915, ctx->name, NULL);
> +}
> +
> +static int execlists_context_deferred_alloc(struct intel_context *ce,
> +					    struct intel_engine_cs *engine)
>   {
>   	struct drm_i915_gem_object *ctx_obj;
>   	struct i915_vma *vma;
> @@ -2853,26 +2836,25 @@ static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
>   	 */
>   	context_size += LRC_HEADER_PAGES * PAGE_SIZE;
>   
> -	ctx_obj = i915_gem_object_create(ctx->i915, context_size);
> +	ctx_obj = i915_gem_object_create(engine->i915, context_size);
>   	if (IS_ERR(ctx_obj))
>   		return PTR_ERR(ctx_obj);
>   
> -	vma = i915_vma_instance(ctx_obj, &ctx->i915->ggtt.vm, NULL);
> +	vma = i915_vma_instance(ctx_obj, &engine->i915->ggtt.vm, NULL);
>   	if (IS_ERR(vma)) {
>   		ret = PTR_ERR(vma);
>   		goto error_deref_obj;
>   	}
>   
> -	if (ctx->timeline)
> -		timeline = i915_timeline_get(ctx->timeline);
> -	else
> -		timeline = i915_timeline_create(ctx->i915, ctx->name, NULL);
> +	timeline = get_timeline(ce->gem_context);
>   	if (IS_ERR(timeline)) {
>   		ret = PTR_ERR(timeline);
>   		goto error_deref_obj;
>   	}
>   
> -	ring = intel_engine_create_ring(engine, timeline, ctx->ring_size);
> +	ring = intel_engine_create_ring(engine,
> +					timeline,
> +					ce->gem_context->ring_size);
>   	i915_timeline_put(timeline);
>   	if (IS_ERR(ring)) {
>   		ret = PTR_ERR(ring);
> @@ -2917,7 +2899,7 @@ void intel_lr_context_resume(struct drm_i915_private *i915)
>   		list_for_each_entry(ce, &ctx->active_engines, active_link) {
>   			GEM_BUG_ON(!ce->ring);
>   			intel_ring_reset(ce->ring, 0);
> -			__execlists_update_reg_state(ce->engine, ce);
> +			__execlists_update_reg_state(ce, ce->engine);
>   		}
>   	}
>   }
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 9002f7f6d32e..6af2d393b7ff 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1508,11 +1508,9 @@ alloc_context_vma(struct intel_engine_cs *engine)
>   	return ERR_PTR(err);
>   }
>   
> -static struct intel_context *
> -__ring_context_pin(struct intel_engine_cs *engine,
> -		   struct i915_gem_context *ctx,
> -		   struct intel_context *ce)
> +static int ring_context_pin(struct intel_context *ce)
>   {
> +	struct intel_engine_cs *engine = ce->engine;
>   	int err;
>   
>   	/* One ringbuffer to rule them all */
> @@ -1523,55 +1521,29 @@ __ring_context_pin(struct intel_engine_cs *engine,
>   		struct i915_vma *vma;
>   
>   		vma = alloc_context_vma(engine);
> -		if (IS_ERR(vma)) {
> -			err = PTR_ERR(vma);
> -			goto err;
> -		}
> +		if (IS_ERR(vma))
> +			return PTR_ERR(vma);
>   
>   		ce->state = vma;
>   	}
>   
>   	err = __context_pin(ce);
>   	if (err)
> -		goto err;
> +		return err;
>   
>   	err = __context_pin_ppgtt(ce->gem_context);
>   	if (err)
>   		goto err_unpin;
>   
> -	mutex_lock(&ctx->mutex);
> -	list_add(&ce->active_link, &ctx->active_engines);
> -	mutex_unlock(&ctx->mutex);
> -
> -	i915_gem_context_get(ctx);
> -	return ce;
> +	return 0;
>   
>   err_unpin:
>   	__context_unpin(ce);
> -err:
> -	ce->pin_count = 0;
> -	return ERR_PTR(err);
> -}
> -
> -static struct intel_context *
> -ring_context_pin(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
> -{
> -	struct intel_context *ce;
> -
> -	ce = intel_context_instance(ctx, engine);
> -	if (IS_ERR(ce))
> -		return ce;
> -
> -	lockdep_assert_held(&ctx->i915->drm.struct_mutex);
> -
> -	if (likely(ce->pin_count++))
> -		return ce;
> -	GEM_BUG_ON(!ce->pin_count); /* no overflow please! */
> -
> -	return __ring_context_pin(engine, ctx, ce);
> +	return err;
>   }
>   
>   static const struct intel_context_ops ring_context_ops = {
> +	.pin = ring_context_pin,
>   	.unpin = ring_context_unpin,
>   	.destroy = ring_context_destroy,
>   };
> @@ -2282,7 +2254,6 @@ static void intel_ring_default_vfuncs(struct drm_i915_private *dev_priv,
>   	engine->reset.finish = reset_finish;
>   
>   	engine->context = &ring_context_ops;
> -	engine->context_pin = ring_context_pin;
>   	engine->request_alloc = ring_request_alloc;
>   
>   	/*
> diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
> index f1a80006289d..5a5c3ba22061 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_engine.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
> @@ -137,41 +137,20 @@ static void mock_context_destroy(struct intel_context *ce)
>   		mock_ring_free(ce->ring);
>   }
>   
> -static struct intel_context *
> -mock_context_pin(struct intel_engine_cs *engine,
> -		 struct i915_gem_context *ctx)
> +static int mock_context_pin(struct intel_context *ce)
>   {
> -	struct intel_context *ce;
> -	int err = -ENOMEM;
> -
> -	ce = intel_context_instance(ctx, engine);
> -	if (IS_ERR(ce))
> -		return ce;
> -
> -	if (ce->pin_count++)
> -		return ce;
> -
>   	if (!ce->ring) {
> -		ce->ring = mock_ring(engine);
> +		ce->ring = mock_ring(ce->engine);
>   		if (!ce->ring)
> -			goto err;
> +			return -ENOMEM;
>   	}
>   
>   	mock_timeline_pin(ce->ring->timeline);
> -
> -	mutex_lock(&ctx->mutex);
> -	list_add(&ce->active_link, &ctx->active_engines);
> -	mutex_unlock(&ctx->mutex);
> -
> -	i915_gem_context_get(ctx);
> -	return ce;
> -
> -err:
> -	ce->pin_count = 0;
> -	return ERR_PTR(err);
> +	return 0;
>   }
>   
>   static const struct intel_context_ops mock_context_ops = {
> +	.pin = mock_context_pin,
>   	.unpin = mock_context_unpin,
>   	.destroy = mock_context_destroy,
>   };
> @@ -235,7 +214,6 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
>   	engine->base.status_page.addr = (void *)(engine + 1);
>   
>   	engine->base.context = &mock_context_ops;
> -	engine->base.context_pin = mock_context_pin;
>   	engine->base.request_alloc = mock_request_alloc;
>   	engine->base.emit_flush = mock_emit_flush;
>   	engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb;
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

* Re: [PATCH 30/38] drm/i915: Make context pinning part of intel_context_ops
  2019-03-05 17:31   ` Tvrtko Ursulin
@ 2019-03-05 18:00     ` Chris Wilson
  0 siblings, 0 replies; 88+ messages in thread
From: Chris Wilson @ 2019-03-05 18:00 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-03-05 17:31:23)
> 
> 
> On 01/03/2019 14:03, Chris Wilson wrote:
> > Push the intel_context pin callback down from intel_engine_cs onto the
> > context itself by virtue of having a central caller for
> > intel_context_pin() being able to lookup the intel_context itself.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> > -static struct intel_context *
> > -execlists_context_pin(struct intel_engine_cs *engine,
> > -                   struct i915_gem_context *ctx)
> > +static int execlists_context_pin(struct intel_context *ce)
> >   {
> > -     struct intel_context *ce;
> > -
> > -     ce = intel_context_instance(ctx, engine);
> > -     if (IS_ERR(ce))
> > -             return ce;
> > -
> > -     lockdep_assert_held(&ctx->i915->drm.struct_mutex);
> > -     GEM_BUG_ON(!ctx->ppgtt);
> > -
> > -     if (likely(ce->pin_count++))
> > -             return ce;
> > -     GEM_BUG_ON(!ce->pin_count); /* no overflow please! */
> > -
> > -     return __execlists_context_pin(engine, ctx, ce);
> > +     return __execlists_context_pin(ce, ce->engine);
> >   }
> 
> Do you need this trivial wrapper for later?

Yes, it's being kept because the virtual engine wants to feed a
different engine into __execlists_context_pin().
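
i.e. something of this shape (a guess at the later patch; the
virtual_engine names are hypothetical):

	static int virtual_context_pin(struct intel_context *ce)
	{
		struct virtual_engine *ve =
			container_of(ce, struct virtual_engine, context);

		/*
		 * Reuse the common pin path, but feed it the physical
		 * sibling backing this submission rather than ce->engine
		 * (which would be the virtual engine itself).
		 */
		return __execlists_context_pin(ce, ve->siblings[0]);
	}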
-Chris

* Re: [PATCH 31/38] drm/i915: Track the pinned kernel contexts on each engine
  2019-03-01 14:03 ` [PATCH 31/38] drm/i915: Track the pinned kernel contexts on each engine Chris Wilson
@ 2019-03-05 18:07   ` Tvrtko Ursulin
  2019-03-05 18:10     ` Chris Wilson
  0 siblings, 1 reply; 88+ messages in thread
From: Tvrtko Ursulin @ 2019-03-05 18:07 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 01/03/2019 14:03, Chris Wilson wrote:
> Each engine acquires a pin on the kernel contexts (normal and preempt)
> so that the logical state is always available on demand. Keep track of
> each engine's pin by storing the returned pointer on the engine for quick
> access.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/intel_engine_cs.c       | 65 ++++++++++----------
>   drivers/gpu/drm/i915/intel_engine_types.h    |  8 +--
>   drivers/gpu/drm/i915/intel_guc_ads.c         |  3 +-
>   drivers/gpu/drm/i915/intel_lrc.c             | 13 ++--
>   drivers/gpu/drm/i915/intel_ringbuffer.c      |  3 +-
>   drivers/gpu/drm/i915/selftests/mock_engine.c |  5 +-
>   6 files changed, 45 insertions(+), 52 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index 112301560745..0f311752ee08 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -641,12 +641,6 @@ void intel_engines_set_scheduler_caps(struct drm_i915_private *i915)
>   		i915->caps.scheduler = 0;
>   }
>   
> -static void __intel_context_unpin(struct i915_gem_context *ctx,
> -				  struct intel_engine_cs *engine)
> -{
> -	intel_context_unpin(intel_context_lookup(ctx, engine));
> -}
> -
>   struct measure_breadcrumb {
>   	struct i915_request rq;
>   	struct i915_timeline timeline;
> @@ -697,6 +691,20 @@ static int measure_breadcrumb_dw(struct intel_engine_cs *engine)
>   	return dw;
>   }
>   
> +static int pin_context(struct i915_gem_context *ctx,
> +		       struct intel_engine_cs *engine,
> +		       struct intel_context **out)
> +{
> +	struct intel_context *ce;
> +
> +	ce = intel_context_pin(ctx, engine);
> +	if (IS_ERR(ce))
> +		return PTR_ERR(ce);
> +
> +	*out = ce;
> +	return 0;
> +}
> +
>   /**
>    * intel_engine_init_common - initialize engine state which might require hw access
>    * @engine: Engine to initialize.
> @@ -711,11 +719,8 @@ static int measure_breadcrumb_dw(struct intel_engine_cs *engine)
>   int intel_engine_init_common(struct intel_engine_cs *engine)
>   {
>   	struct drm_i915_private *i915 = engine->i915;
> -	struct intel_context *ce;
>   	int ret;
>   
> -	engine->set_default_submission(engine);
> -
>   	/* We may need to do things with the shrinker which
>   	 * require us to immediately switch back to the default
>   	 * context. This can cause a problem as pinning the
> @@ -723,36 +728,34 @@ int intel_engine_init_common(struct intel_engine_cs *engine)
>   	 * be available. To avoid this we always pin the default
>   	 * context.
>   	 */
> -	ce = intel_context_pin(i915->kernel_context, engine);
> -	if (IS_ERR(ce))
> -		return PTR_ERR(ce);
> +	ret = pin_context(i915->kernel_context, engine,
> +			  &engine->kernel_context);
> +	if (ret)
> +		return ret;
>   
>   	/*
>   	 * Similarly the preempt context must always be available so that
> -	 * we can interrupt the engine at any time.
> +	 * we can interrupt the engine at any time. However, as preemption
> +	 * is optional, we allow it to fail.
>   	 */
> -	if (i915->preempt_context) {
> -		ce = intel_context_pin(i915->preempt_context, engine);
> -		if (IS_ERR(ce)) {
> -			ret = PTR_ERR(ce);
> -			goto err_unpin_kernel;
> -		}
> -	}
> +	if (i915->preempt_context)
> +		pin_context(i915->preempt_context, engine,
> +			    &engine->preempt_context);

You lost the failure path here. I suspect deliberately? But I am not 
convinced we want to silently lose preemption when keeping the failure 
path is so easy.

>   
>   	ret = measure_breadcrumb_dw(engine);
>   	if (ret < 0)
> -		goto err_unpin_preempt;
> +		goto err_unpin;
>   
>   	engine->emit_fini_breadcrumb_dw = ret;
>   
> -	return 0;
> +	engine->set_default_submission(engine);
>   
> -err_unpin_preempt:
> -	if (i915->preempt_context)
> -		__intel_context_unpin(i915->preempt_context, engine);
> +	return 0;
>   
> -err_unpin_kernel:
> -	__intel_context_unpin(i915->kernel_context, engine);
> +err_unpin:
> +	if (engine->preempt_context)
> +		intel_context_unpin(engine->preempt_context);
> +	intel_context_unpin(engine->kernel_context);
>   	return ret;
>   }
>   
> @@ -765,8 +768,6 @@ int intel_engine_init_common(struct intel_engine_cs *engine)
>    */
>   void intel_engine_cleanup_common(struct intel_engine_cs *engine)
>   {
> -	struct drm_i915_private *i915 = engine->i915;
> -
>   	cleanup_status_page(engine);
>   
>   	intel_engine_fini_breadcrumbs(engine);
> @@ -776,9 +777,9 @@ void intel_engine_cleanup_common(struct intel_engine_cs *engine)
>   	if (engine->default_state)
>   		i915_gem_object_put(engine->default_state);
>   
> -	if (i915->preempt_context)
> -		__intel_context_unpin(i915->preempt_context, engine);
> -	__intel_context_unpin(i915->kernel_context, engine);
> +	if (engine->preempt_context)
> +		intel_context_unpin(engine->preempt_context);
> +	intel_context_unpin(engine->kernel_context);
>   
>   	i915_timeline_fini(&engine->timeline);
>   
> diff --git a/drivers/gpu/drm/i915/intel_engine_types.h b/drivers/gpu/drm/i915/intel_engine_types.h
> index aefc66605729..85158e119638 100644
> --- a/drivers/gpu/drm/i915/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/intel_engine_types.h
> @@ -231,11 +231,6 @@ struct intel_engine_execlists {
>   	 */
>   	u32 *csb_status;
>   
> -	/**
> -	 * @preempt_context: the HW context for injecting preempt-to-idle
> -	 */
> -	struct intel_context *preempt_context;
> -
>   	/**
>   	 * @preempt_complete_status: expected CSB upon completing preemption
>   	 */
> @@ -271,6 +266,9 @@ struct intel_engine_cs {
>   
>   	struct i915_timeline timeline;
>   
> +	struct intel_context *kernel_context; /* pinned */
> +	struct intel_context *preempt_context; /* pinned; optional */
> +
>   	struct drm_i915_gem_object *default_state;
>   	void *pinned_default_state;
>   
> diff --git a/drivers/gpu/drm/i915/intel_guc_ads.c b/drivers/gpu/drm/i915/intel_guc_ads.c
> index da220561ac41..5fa169dadd67 100644
> --- a/drivers/gpu/drm/i915/intel_guc_ads.c
> +++ b/drivers/gpu/drm/i915/intel_guc_ads.c
> @@ -121,8 +121,7 @@ int intel_guc_ads_create(struct intel_guc *guc)
>   	 * to find it. Note that we have to skip our header (1 page),
>   	 * because our GuC shared data is there.
>   	 */
> -	kernel_ctx_vma = intel_context_lookup(dev_priv->kernel_context,
> -					      dev_priv->engine[RCS])->state;
> +	kernel_ctx_vma = dev_priv->engine[RCS]->kernel_context->state;
>   	blob->ads.golden_context_lrca =
>   		intel_guc_ggtt_offset(guc, kernel_ctx_vma) + skipped_offset;
>   
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 8dddea8e1c97..473cfea4c503 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -620,7 +620,7 @@ static void port_assign(struct execlist_port *port, struct i915_request *rq)
>   static void inject_preempt_context(struct intel_engine_cs *engine)
>   {
>   	struct intel_engine_execlists *execlists = &engine->execlists;
> -	struct intel_context *ce = execlists->preempt_context;
> +	struct intel_context *ce = engine->preempt_context;
>   	unsigned int n;
>   
>   	GEM_BUG_ON(execlists->preempt_complete_status !=
> @@ -2316,7 +2316,7 @@ void intel_execlists_set_default_submission(struct intel_engine_cs *engine)
>   
>   	engine->flags |= I915_ENGINE_HAS_SEMAPHORES;
>   	engine->flags |= I915_ENGINE_SUPPORTS_STATS;
> -	if (engine->i915->preempt_context)
> +	if (engine->preempt_context)
>   		engine->flags |= I915_ENGINE_HAS_PREEMPTION;
>   }
>   
> @@ -2418,14 +2418,9 @@ static int logical_ring_init(struct intel_engine_cs *engine)
>   	}
>   
>   	execlists->preempt_complete_status = ~0u;
> -	if (i915->preempt_context) {
> -		struct intel_context *ce =
> -			intel_context_lookup(i915->preempt_context, engine);
> -
> -		execlists->preempt_context = ce;
> +	if (engine->preempt_context)
>   		execlists->preempt_complete_status =
> -			upper_32_bits(ce->lrc_desc);
> -	}
> +			upper_32_bits(engine->preempt_context->lrc_desc);
>   
>   	execlists->csb_status =
>   		&engine->status_page.addr[I915_HWS_CSB_BUF0_INDEX];
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 6af2d393b7ff..4f137d0a5316 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1734,8 +1734,7 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
>   		 * placeholder we use to flush other contexts.
>   		 */
>   		*cs++ = MI_SET_CONTEXT;
> -		*cs++ = i915_ggtt_offset(intel_context_lookup(i915->kernel_context,
> -							      engine)->state) |
> +		*cs++ = i915_ggtt_offset(engine->kernel_context->state) |
>   			MI_MM_SPACE_GTT |
>   			MI_RESTORE_INHIBIT;
>   	}
> diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
> index 5a5c3ba22061..6434e968ecb0 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_engine.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
> @@ -233,7 +233,8 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
>   	timer_setup(&engine->hw_delay, hw_delay_complete, 0);
>   	INIT_LIST_HEAD(&engine->hw_queue);
>   
> -	if (IS_ERR(intel_context_pin(i915->kernel_context, &engine->base)))
> +	if (pin_context(i915->kernel_context, &engine->base,
> +			&engine->base.kernel_context))
>   		goto err_breadcrumbs;
>   
>   	return &engine->base;
> @@ -276,7 +277,7 @@ void mock_engine_free(struct intel_engine_cs *engine)
>   	if (ce)
>   		intel_context_unpin(ce);
>   
> -	__intel_context_unpin(engine->i915->kernel_context, engine);
> +	intel_context_unpin(engine->kernel_context);
>   
>   	intel_engine_fini_breadcrumbs(engine);
>   	i915_timeline_fini(&engine->timeline);
> 

Regards,

Tvrtko

* Re: [PATCH 31/38] drm/i915: Track the pinned kernel contexts on each engine
  2019-03-05 18:07   ` Tvrtko Ursulin
@ 2019-03-05 18:10     ` Chris Wilson
  2019-03-05 18:17       ` Tvrtko Ursulin
  0 siblings, 1 reply; 88+ messages in thread
From: Chris Wilson @ 2019-03-05 18:10 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-03-05 18:07:39)
> >       /*
> >        * Similarly the preempt context must always be available so that
> > -      * we can interrupt the engine at any time.
> > +      * we can interrupt the engine at any time. However, as preemption
> > +      * is optional, we allow it to fail.
> >        */
> > -     if (i915->preempt_context) {
> > -             ce = intel_context_pin(i915->preempt_context, engine);
> > -             if (IS_ERR(ce)) {
> > -                     ret = PTR_ERR(ce);
> > -                     goto err_unpin_kernel;
> > -             }
> > -     }
> > +     if (i915->preempt_context)
> > +             pin_context(i915->preempt_context, engine,
> > +                         &engine->preempt_context);
> 
> You lost the failure path here. I suspect deliberately? But I am not 
> convinced we want to silently lose preemption when keeping the failure 
> path is so easy.

The failure path kills the module. Whereas we can quite happily survive
without preemption.
-Chris

* Re: [PATCH 31/38] drm/i915: Track the pinned kernel contexts on each engine
  2019-03-05 18:10     ` Chris Wilson
@ 2019-03-05 18:17       ` Tvrtko Ursulin
  2019-03-05 19:26         ` Chris Wilson
  0 siblings, 1 reply; 88+ messages in thread
From: Tvrtko Ursulin @ 2019-03-05 18:17 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 05/03/2019 18:10, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-03-05 18:07:39)
>>>        /*
>>>         * Similarly the preempt context must always be available so that
>>> -      * we can interrupt the engine at any time.
>>> +      * we can interrupt the engine at any time. However, as preemption
>>> +      * is optional, we allow it to fail.
>>>         */
>>> -     if (i915->preempt_context) {
>>> -             ce = intel_context_pin(i915->preempt_context, engine);
>>> -             if (IS_ERR(ce)) {
>>> -                     ret = PTR_ERR(ce);
>>> -                     goto err_unpin_kernel;
>>> -             }
>>> -     }
>>> +     if (i915->preempt_context)
>>> +             pin_context(i915->preempt_context, engine,
>>> +                         &engine->preempt_context);
>>
>> You lost the failure path here. I suspect deliberately? But I am not
>> convinced we want to silently lose preemption when keeping the failure
>> path is so easy.
> 
> The failure path kills the module. Whereas we can quite happily survive
> without preemption.

Yes, it is hard to decide which is worse: a modprobe failure, which never
happens, or a change in performance profile, which also never happens. :)

For something so unlikely I'd rather see it fail than silently change 
behaviour. Perhaps it has some relevance during development and platform 
bringup if nowhere else.

Regards,

Tvrtko

* Re: [PATCH 28/38] drm/i915: Store the intel_context_ops in the intel_engine_cs
  2019-03-05 16:45     ` Chris Wilson
@ 2019-03-05 18:27       ` Chris Wilson
  0 siblings, 0 replies; 88+ messages in thread
From: Chris Wilson @ 2019-03-05 18:27 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Chris Wilson (2019-03-05 16:45:32)
> Quoting Tvrtko Ursulin (2019-03-05 16:27:34)
> > 
> > On 01/03/2019 14:03, Chris Wilson wrote:
> > > diff --git a/drivers/gpu/drm/i915/intel_engine_types.h b/drivers/gpu/drm/i915/intel_engine_types.h
> > > index 5ec6e72d0ffb..546b790871ad 100644
> > > --- a/drivers/gpu/drm/i915/intel_engine_types.h
> > > +++ b/drivers/gpu/drm/i915/intel_engine_types.h
> > > @@ -351,6 +351,7 @@ struct intel_engine_cs {
> > >   
> > >       void            (*set_default_submission)(struct intel_engine_cs *engine);
> > >   
> > > +     const struct intel_context_ops *context;
> > 
> > Calling ce_ops / context_ops / hw_context_ops ? Anything but context! :)
> 
> cops.

Or khufu (via cheops).
-Chris

* Re: [PATCH 26/38] drm/i915: Pass around the intel_context
  2019-03-05 16:33     ` Chris Wilson
@ 2019-03-05 19:23       ` Chris Wilson
  0 siblings, 0 replies; 88+ messages in thread
From: Chris Wilson @ 2019-03-05 19:23 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Chris Wilson (2019-03-05 16:33:21)
> Quoting Tvrtko Ursulin (2019-03-05 16:16:01)
> > 
> > On 01/03/2019 14:03, Chris Wilson wrote:
> > > Instead of passing the gem_context and engine to find the instance of
> > > the intel_context to use, pass around the intel_context instead. This is
> > > useful for the next few patches, where the intel_context is no longer a
> > > direct lookup.
> > > 
> > > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > > ---
> > > @@ -1314,9 +1314,10 @@ __execlists_update_reg_state(struct intel_engine_cs *engine,
> > >       regs[CTX_RING_TAIL + 1] = ring->tail;
> > >   
> > >       /* RPCS */
> > > -     if (engine->class == RENDER_CLASS)
> > > -             regs[CTX_R_PWR_CLK_STATE + 1] = gen8_make_rpcs(engine->i915,
> > > -                                                            &ce->sseu);
> > > +     if (engine->class == RENDER_CLASS) {
> > > +             regs[CTX_R_PWR_CLK_STATE + 1] =
> > > +                     gen8_make_rpcs(engine->i915, &ce->sseu);
> > > +     }
> > 
> > Rebasing artifact I guess.
> 
> I must have slipped :-p
> 
> > > @@ -2659,7 +2660,7 @@ static u32 intel_lr_indirect_ctx_offset(struct intel_engine_cs *engine)
> > >   }
> > >   
> > >   static void execlists_init_reg_state(u32 *regs,
> > > -                                  struct i915_gem_context *ctx,
> > > +                                  struct intel_context *ce,
> > >                                    struct intel_engine_cs *engine,
> > 
> > I can't remember if separate engine param is needed since now there's 
> > ce->engine. Maybe it comes useful later.
> 
> We could even initialise ce->ring, or just pull that from
> ce->gem_context->ring_size.
> 
> I didn't have a good reason to keep engine, nor a good enough reason to
> kill it -- although having ce is enough of a change I guess.

Ah, yes there was. It's for virtual engine.
-Chris

* Re: [PATCH 31/38] drm/i915: Track the pinned kernel contexts on each engine
  2019-03-05 18:17       ` Tvrtko Ursulin
@ 2019-03-05 19:26         ` Chris Wilson
  0 siblings, 0 replies; 88+ messages in thread
From: Chris Wilson @ 2019-03-05 19:26 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-03-05 18:17:35)
> 
> On 05/03/2019 18:10, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2019-03-05 18:07:39)
> >>>        /*
> >>>         * Similarly the preempt context must always be available so that
> >>> -      * we can interrupt the engine at any time.
> >>> +      * we can interrupt the engine at any time. However, as preemption
> >>> +      * is optional, we allow it to fail.
> >>>         */
> >>> -     if (i915->preempt_context) {
> >>> -             ce = intel_context_pin(i915->preempt_context, engine);
> >>> -             if (IS_ERR(ce)) {
> >>> -                     ret = PTR_ERR(ce);
> >>> -                     goto err_unpin_kernel;
> >>> -             }
> >>> -     }
> >>> +     if (i915->preempt_context)
> >>> +             pin_context(i915->preempt_context, engine,
> >>> +                         &engine->preempt_context);
> >>
> >> You lost the failure path here. I suspect deliberately? But I am not
> >> convinced we want to silently lose preemption when keeping the failure
> >> path is so easy.
> > 
> > The failure path kills the module. Whereas we can quite happily survive
> > without preemption.
> 
> Yes, it is hard to decide which is worse: a modprobe failure, which never
> happens, or a change in performance profile, which also never happens. :)
> 
> For something so unlikely I'd rather see it fail than silently change 
> behaviour. Perhaps it has some relevance during development and platform 
> bringup if nowhere else.

I err on the opposite side: keep going at all costs so that the user
isn't staring at a blank screen of doom. It's not exactly a silent
failure; we tell anyone who asks whether preemption is enabled :)
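
If we wanted it to be audible during bringup, a sketch of a middle
ground (not in the patch):

	if (i915->preempt_context &&
	    pin_context(i915->preempt_context, engine,
			&engine->preempt_context))
		/* keep going, but make the lost capability obvious */
		DRM_ERROR("%s: failed to pin preempt context, no preemption\n",
			  engine->name);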
-Chris

* Re: [PATCH 32/38] drm/i915: Introduce intel_context.pin_mutex for pin management
  2019-03-01 14:03 ` [PATCH 32/38] drm/i915: Introduce intel_context.pin_mutex for pin management Chris Wilson
@ 2019-03-06  9:45   ` Tvrtko Ursulin
  2019-03-06 10:15     ` Chris Wilson
  0 siblings, 1 reply; 88+ messages in thread
From: Tvrtko Ursulin @ 2019-03-06  9:45 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 01/03/2019 14:03, Chris Wilson wrote:
> Introduce a mutex to start locking the HW contexts independently of
> struct_mutex, with a view to reducing the coarse struct_mutex. The
> intel_context.pin_mutex is used to guard the transition to and from being
> pinned on the gpu, and so is required before starting to build any
> request, and released when the execution is completed.

execution is completed or construction/emission is completed?

> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_gem.c               |  2 +-
>   drivers/gpu/drm/i915/i915_gem_context.c       | 76 +++++++------------
>   drivers/gpu/drm/i915/intel_context.c          | 56 ++++++++++++--
>   drivers/gpu/drm/i915/intel_context.h          | 38 ++++++----
>   drivers/gpu/drm/i915/intel_context_types.h    |  8 +-
>   drivers/gpu/drm/i915/intel_lrc.c              |  4 +-
>   drivers/gpu/drm/i915/intel_ringbuffer.c       |  4 +-
>   .../gpu/drm/i915/selftests/i915_gem_context.c |  2 +-
>   drivers/gpu/drm/i915/selftests/mock_engine.c  |  2 +-
>   9 files changed, 110 insertions(+), 82 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index c6a25c3276ee..4babea524fc8 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4706,7 +4706,7 @@ static int __intel_engines_record_defaults(struct drm_i915_private *i915)
>   		if (!state)
>   			continue;
>   
> -		GEM_BUG_ON(ce->pin_count);
> +		GEM_BUG_ON(intel_context_is_pinned(ce));
>   
>   		/*
>   		 * As we will hold a reference to the logical state, it will
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 60b17f6a727d..321b26e302e5 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -1096,23 +1096,27 @@ static int gen8_emit_rpcs_config(struct i915_request *rq,
>   }
>   
>   static int
> -gen8_modify_rpcs_gpu(struct intel_context *ce,
> -		     struct intel_engine_cs *engine,
> -		     struct intel_sseu sseu)
> +gen8_modify_rpcs_gpu(struct intel_context *ce, struct intel_sseu sseu)
>   {
> -	struct drm_i915_private *i915 = engine->i915;
> +	struct drm_i915_private *i915 = ce->engine->i915;
>   	struct i915_request *rq, *prev;
>   	intel_wakeref_t wakeref;
>   	int ret;
>   
> -	GEM_BUG_ON(!ce->pin_count);
> +	lockdep_assert_held(&ce->pin_mutex);
>   
> -	lockdep_assert_held(&i915->drm.struct_mutex);
> +	/*
> +	 * If context is not idle we have to submit an ordered request to modify
> +	 * its context image via the kernel context. Pristine and idle contexts
> +	 * will be configured on pinning.
> +	 */
> +	if (!intel_context_is_pinned(ce))
> +		return 0;
>   
>   	/* Submitting requests etc needs the hw awake. */
>   	wakeref = intel_runtime_pm_get(i915);
>   
> -	rq = i915_request_alloc(engine, i915->kernel_context);
> +	rq = i915_request_alloc(ce->engine, i915->kernel_context);
>   	if (IS_ERR(rq)) {
>   		ret = PTR_ERR(rq);
>   		goto out_put;
> @@ -1156,53 +1160,30 @@ gen8_modify_rpcs_gpu(struct intel_context *ce,
>   }
>   
>   static int
> -__i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
> -				    struct intel_engine_cs *engine,
> -				    struct intel_sseu sseu)
> +i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
> +				  struct intel_engine_cs *engine,
> +				  struct intel_sseu sseu)
>   {
>   	struct intel_context *ce;
>   	int ret = 0;
>   
> -	ce = intel_context_instance(ctx, engine);
> -	if (IS_ERR(ce))
> -		return PTR_ERR(ce);
> -
>   	GEM_BUG_ON(INTEL_GEN(ctx->i915) < 8);
>   	GEM_BUG_ON(engine->id != RCS);
>   
> +	ce = intel_context_pin_lock(ctx, engine);
> +	if (IS_ERR(ce))
> +		return PTR_ERR(ce);
> +
>   	/* Nothing to do if unmodified. */
>   	if (!memcmp(&ce->sseu, &sseu, sizeof(sseu)))
> -		return 0;
> -
> -	/*
> -	 * If context is not idle we have to submit an ordered request to modify
> -	 * its context image via the kernel context. Pristine and idle contexts
> -	 * will be configured on pinning.
> -	 */
> -	if (ce->pin_count)
> -		ret = gen8_modify_rpcs_gpu(ce, engine, sseu);
> +		goto unlock;
>   
> +	ret = gen8_modify_rpcs_gpu(ce, sseu);

The _gpu suffix is not true any more so I'd drop it.

>   	if (!ret)
>   		ce->sseu = sseu;
>   
> -	return ret;
> -}
> -
> -static int
> -i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
> -				  struct intel_engine_cs *engine,
> -				  struct intel_sseu sseu)
> -{
> -	int ret;
> -
> -	ret = mutex_lock_interruptible(&ctx->i915->drm.struct_mutex);
> -	if (ret)
> -		return ret;
> -
> -	ret = __i915_gem_context_reconfigure_sseu(ctx, engine, sseu);
> -
> -	mutex_unlock(&ctx->i915->drm.struct_mutex);
> -
> +unlock:
> +	intel_context_pin_unlock(ce);
>   	return ret;
>   }
>   
> @@ -1620,11 +1601,12 @@ static int clone_sseu(struct i915_gem_context *dst,
>   		if (!memcmp(&sseu, &default_sseu, sizeof(sseu)))
>   			continue;
>   
> -		ce = intel_context_instance(dst, engine);
> +		ce = intel_context_pin_lock(dst, engine);
>   		if (IS_ERR(ce))
>   			return PTR_ERR(ce);
>   
>   		ce->sseu = sseu;
> +		intel_context_pin_unlock(ce);
>   	}
>   
>   	return 0;
> @@ -1790,7 +1772,6 @@ static int get_sseu(struct i915_gem_context *ctx,
>   	struct intel_engine_cs *engine;
>   	struct intel_context *ce;
>   	unsigned long lookup;
> -	int ret;
>   
>   	if (args->size == 0)
>   		goto out;
> @@ -1817,21 +1798,16 @@ static int get_sseu(struct i915_gem_context *ctx,
>   	if (!engine)
>   		return -EINVAL;
>   
> -	ce = intel_context_instance(ctx, engine);
> +	ce = intel_context_pin_lock(ctx, engine); /* serialise with set_sseu */
>   	if (IS_ERR(ce))
>   		return PTR_ERR(ce);
>   
> -	/* Only use for mutex here is to serialize get_param and set_param. */
> -	ret = mutex_lock_interruptible(&ctx->i915->drm.struct_mutex);
> -	if (ret)
> -		return ret;
> -
>   	user_sseu.slice_mask = ce->sseu.slice_mask;
>   	user_sseu.subslice_mask = ce->sseu.subslice_mask;
>   	user_sseu.min_eus_per_subslice = ce->sseu.min_eus_per_subslice;
>   	user_sseu.max_eus_per_subslice = ce->sseu.max_eus_per_subslice;
>   
> -	mutex_unlock(&ctx->i915->drm.struct_mutex);
> +	intel_context_pin_unlock(ce);
>   
>   	if (copy_to_user(u64_to_user_ptr(args->value), &user_sseu,
>   			 sizeof(user_sseu)))
> diff --git a/drivers/gpu/drm/i915/intel_context.c b/drivers/gpu/drm/i915/intel_context.c
> index 7de02be0b552..7c4171114e5b 100644
> --- a/drivers/gpu/drm/i915/intel_context.c
> +++ b/drivers/gpu/drm/i915/intel_context.c
> @@ -86,7 +86,7 @@ void __intel_context_remove(struct intel_context *ce)
>   	spin_unlock(&ctx->hw_contexts_lock);
>   }
>   
> -struct intel_context *
> +static struct intel_context *
>   intel_context_instance(struct i915_gem_context *ctx,
>   		       struct intel_engine_cs *engine)
>   {
> @@ -110,6 +110,21 @@ intel_context_instance(struct i915_gem_context *ctx,
>   	return pos;
>   }
>   
> +struct intel_context *
> +intel_context_pin_lock(struct i915_gem_context *ctx,
> +		       struct intel_engine_cs *engine)
> +	__acquires(ce->pin_mutex)
> +{
> +	struct intel_context *ce;
> +
> +	ce = intel_context_instance(ctx, engine);
> +	if (IS_ERR(ce))
> +		return ce;
> +
> +	mutex_lock(&ce->pin_mutex);
> +	return ce;
> +}
> +
>   struct intel_context *
>   intel_context_pin(struct i915_gem_context *ctx,
>   		  struct intel_engine_cs *engine)
> @@ -117,16 +132,20 @@ intel_context_pin(struct i915_gem_context *ctx,
>   	struct intel_context *ce;
>   	int err;
>   
> -	lockdep_assert_held(&ctx->i915->drm.struct_mutex);
> -
>   	ce = intel_context_instance(ctx, engine);
>   	if (IS_ERR(ce))
>   		return ce;
>   
> -	if (unlikely(!ce->pin_count++)) {
> +	if (likely(atomic_inc_not_zero(&ce->pin_count)))
> +		return ce;
> +
> +	if (mutex_lock_interruptible(&ce->pin_mutex))
> +		return ERR_PTR(-EINTR);
> +
> +	if (likely(!atomic_read(&ce->pin_count))) {
>   		err = ce->ops->pin(ce);
>   		if (err)
> -			goto err_unpin;
> +			goto err;
>   
>   		mutex_lock(&ctx->mutex);
>   		list_add(&ce->active_link, &ctx->active_engines);
> @@ -134,16 +153,35 @@ intel_context_pin(struct i915_gem_context *ctx,
>   
>   		i915_gem_context_get(ctx);
>   		GEM_BUG_ON(ce->gem_context != ctx);
> +
> +		smp_mb__before_atomic();

Why is this needed?

>   	}
> -	GEM_BUG_ON(!ce->pin_count); /* no overflow! */
>   
> +	atomic_inc(&ce->pin_count);
> +	GEM_BUG_ON(!intel_context_is_pinned(ce)); /* no overflow! */
> +
> +	mutex_unlock(&ce->pin_mutex);
>   	return ce;
>   
> -err_unpin:
> -	ce->pin_count = 0;
> +err:
> +	mutex_unlock(&ce->pin_mutex);
>   	return ERR_PTR(err);
>   }
>   
> +void intel_context_unpin(struct intel_context *ce)
> +{
> +	if (likely(atomic_add_unless(&ce->pin_count, -1, 1)))
> +		return;
> +
> +	/* We may be called from inside intel_context_pin() to evict another */
> +	mutex_lock_nested(&ce->pin_mutex, SINGLE_DEPTH_NESTING);
> +
> +	if (likely(atomic_dec_and_test(&ce->pin_count)))
> +		ce->ops->unpin(ce);
> +
> +	mutex_unlock(&ce->pin_mutex);
> +}
> +
>   static void intel_context_retire(struct i915_active_request *active,
>   				 struct i915_request *rq)
>   {
> @@ -165,6 +203,8 @@ intel_context_init(struct intel_context *ce,
>   	INIT_LIST_HEAD(&ce->signal_link);
>   	INIT_LIST_HEAD(&ce->signals);
>   
> +	mutex_init(&ce->pin_mutex);
> +
>   	/* Use the whole device by default */
>   	ce->sseu = intel_device_default_sseu(ctx->i915);
>   
> diff --git a/drivers/gpu/drm/i915/intel_context.h b/drivers/gpu/drm/i915/intel_context.h
> index aa34311a472e..3657f3a34f8e 100644
> --- a/drivers/gpu/drm/i915/intel_context.h
> +++ b/drivers/gpu/drm/i915/intel_context.h
> @@ -7,6 +7,8 @@
>   #ifndef __INTEL_CONTEXT_H__
>   #define __INTEL_CONTEXT_H__
>   
> +#include <linux/lockdep.h>
> +
>   #include "intel_context_types.h"
>   #include "intel_engine_types.h"
>   
> @@ -26,18 +28,30 @@ intel_context_lookup(struct i915_gem_context *ctx,
>   		     struct intel_engine_cs *engine);
>   
>   /**
> - * intel_context_instance - Lookup or allocate the HW context for (ctx, engine)
> + * intel_context_pin_lock - Stablises the 'pinned' status of the HW context
>    * @ctx - the parent GEM context
>    * @engine - the target HW engine
>    *
> - * Returns the existing HW context for this pair of (GEM context, engine), or
> - * allocates and initialises a fresh context. Once allocated, the HW context
> - * remains resident until the GEM context is destroyed.
> + * Acquire a lock on the pinned status of the HW context, such that the context
> + * can neither be bound to the GPU or unbound whilst the lock is held, i.e.
> + * intel_context_is_pinned() remains stable.
>    */
>   struct intel_context *
> -intel_context_instance(struct i915_gem_context *ctx,
> +intel_context_pin_lock(struct i915_gem_context *ctx,
>   		       struct intel_engine_cs *engine);
>   
> +static inline bool
> +intel_context_is_pinned(struct intel_context *ce)
> +{
> +	return atomic_read(&ce->pin_count);
> +}
> +
> +static inline void intel_context_pin_unlock(struct intel_context *ce)
> +__releases(ce->pin_mutex)
> +{
> +	mutex_unlock(&ce->pin_mutex);
> +}
> +
>   struct intel_context *
>   __intel_context_insert(struct i915_gem_context *ctx,
>   		       struct intel_engine_cs *engine,
> @@ -50,18 +64,10 @@ intel_context_pin(struct i915_gem_context *ctx, struct intel_engine_cs *engine);
>   
>   static inline void __intel_context_pin(struct intel_context *ce)
>   {
> -	GEM_BUG_ON(!ce->pin_count);
> -	ce->pin_count++;
> +	GEM_BUG_ON(!intel_context_is_pinned(ce));
> +	atomic_inc(&ce->pin_count);

This can be called without the mutex held but only if not zero, right?

BUG_ON(!atomic_inc_not_zero(&ce->pin_count)) ?

>   }
>   
> -static inline void intel_context_unpin(struct intel_context *ce)
> -{
> -	GEM_BUG_ON(!ce->pin_count);
> -	if (--ce->pin_count)
> -		return;
> -
> -	GEM_BUG_ON(!ce->ops);
> -	ce->ops->unpin(ce);
> -}
> +void intel_context_unpin(struct intel_context *ce);
>   
>   #endif /* __INTEL_CONTEXT_H__ */
> diff --git a/drivers/gpu/drm/i915/intel_context_types.h b/drivers/gpu/drm/i915/intel_context_types.h
> index 804fa93de853..6dc9b4b9067b 100644
> --- a/drivers/gpu/drm/i915/intel_context_types.h
> +++ b/drivers/gpu/drm/i915/intel_context_types.h
> @@ -8,6 +8,7 @@
>   #define __INTEL_CONTEXT_TYPES__
>   
>   #include <linux/list.h>
> +#include <linux/mutex.h>
>   #include <linux/rbtree.h>
>   #include <linux/types.h>
>   
> @@ -38,14 +39,19 @@ struct intel_context {
>   	struct i915_gem_context *gem_context;
>   	struct intel_engine_cs *engine;
>   	struct intel_engine_cs *active;
> +
>   	struct list_head active_link;
>   	struct list_head signal_link;
>   	struct list_head signals;
> +
>   	struct i915_vma *state;
>   	struct intel_ring *ring;
> +
>   	u32 *lrc_reg_state;
>   	u64 lrc_desc;
> -	int pin_count;
> +
> +	atomic_t pin_count;
> +	struct mutex pin_mutex; /* guards pinning and associated on-gpuing */

Not pinning but transition from pinned to unpinned state AFAIU.

>   
>   	/**
>   	 * active_tracker: Active tracker for the external rq activity
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 473cfea4c503..e69b52c4d3ce 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1238,7 +1238,7 @@ static void __execlists_context_fini(struct intel_context *ce)
>   
>   static void execlists_context_destroy(struct intel_context *ce)
>   {
> -	GEM_BUG_ON(ce->pin_count);
> +	GEM_BUG_ON(intel_context_is_pinned(ce));
>   
>   	if (ce->state)
>   		__execlists_context_fini(ce);
> @@ -1476,7 +1476,7 @@ static int execlists_request_alloc(struct i915_request *request)
>   {
>   	int ret;
>   
> -	GEM_BUG_ON(!request->hw_context->pin_count);
> +	GEM_BUG_ON(!intel_context_is_pinned(request->hw_context));
>   
>   	/*
>   	 * Flush enough space to reduce the likelihood of waiting after
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 4f137d0a5316..a2bdf188f3f6 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1356,7 +1356,7 @@ static void __ring_context_fini(struct intel_context *ce)
>   
>   static void ring_context_destroy(struct intel_context *ce)
>   {
> -	GEM_BUG_ON(ce->pin_count);
> +	GEM_BUG_ON(intel_context_is_pinned(ce));
>   
>   	if (ce->state)
>   		__ring_context_fini(ce);
> @@ -1917,7 +1917,7 @@ static int ring_request_alloc(struct i915_request *request)
>   {
>   	int ret;
>   
> -	GEM_BUG_ON(!request->hw_context->pin_count);
> +	GEM_BUG_ON(!intel_context_is_pinned(request->hw_context));
>   	GEM_BUG_ON(request->timeline->has_initial_breadcrumb);
>   
>   	/*
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
> index 15f016ca8e0d..c6c14d1b2ce8 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
> @@ -1029,7 +1029,7 @@ __sseu_test(struct drm_i915_private *i915,
>   	if (ret)
>   		goto out_context;
>   
> -	ret = __i915_gem_context_reconfigure_sseu(ctx, engine, sseu);
> +	ret = i915_gem_context_reconfigure_sseu(ctx, engine, sseu);
>   	if (ret)
>   		goto out_spin;
>   
> diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
> index 6434e968ecb0..c032c0cf20e6 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_engine.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
> @@ -131,7 +131,7 @@ static void mock_context_unpin(struct intel_context *ce)
>   
>   static void mock_context_destroy(struct intel_context *ce)
>   {
> -	GEM_BUG_ON(ce->pin_count);
> +	GEM_BUG_ON(intel_context_is_pinned(ce));
>   
>   	if (ce->ring)
>   		mock_ring_free(ce->ring);
> 

Regards,

Tvrtko

* Re: [PATCH 32/38] drm/i915: Introduce intel_context.pin_mutex for pin management
  2019-03-06  9:45   ` Tvrtko Ursulin
@ 2019-03-06 10:15     ` Chris Wilson
  0 siblings, 0 replies; 88+ messages in thread
From: Chris Wilson @ 2019-03-06 10:15 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-03-06 09:45:27)
> 
> On 01/03/2019 14:03, Chris Wilson wrote:
> > Introduce a mutex to start locking the HW contexts independently of
> > struct_mutex, with a view to reducing the coarse struct_mutex. The
> > intel_context.pin_mutex is used to guard the transition to and from being
> > pinned on the gpu, and so is required before starting to build any
> > request, and released when the execution is completed.
> 
> execution is completed or construction/emission is completed?

We keep the context pinned until the end of emission, but we only need
to hold the lock while pinning the request. Yup, that last sentence
changed direction half-way through and is confused.

> >   static int
> > -__i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
> > -                                 struct intel_engine_cs *engine,
> > -                                 struct intel_sseu sseu)
> > +i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
> > +                               struct intel_engine_cs *engine,
> > +                               struct intel_sseu sseu)
> >   {
> >       struct intel_context *ce;
> >       int ret = 0;
> >   
> > -     ce = intel_context_instance(ctx, engine);
> > -     if (IS_ERR(ce))
> > -             return PTR_ERR(ce);
> > -
> >       GEM_BUG_ON(INTEL_GEN(ctx->i915) < 8);
> >       GEM_BUG_ON(engine->id != RCS);
> >   
> > +     ce = intel_context_pin_lock(ctx, engine);
> > +     if (IS_ERR(ce))
> > +             return PTR_ERR(ce);
> > +
> >       /* Nothing to do if unmodified. */
> >       if (!memcmp(&ce->sseu, &sseu, sizeof(sseu)))
> > -             return 0;
> > -
> > -     /*
> > -      * If context is not idle we have to submit an ordered request to modify
> > -      * its context image via the kernel context. Pristine and idle contexts
> > -      * will be configured on pinning.
> > -      */
> > -     if (ce->pin_count)
> > -             ret = gen8_modify_rpcs_gpu(ce, engine, sseu);
> > +             goto unlock;
> >   
> > +     ret = gen8_modify_rpcs_gpu(ce, sseu);
> 
> The _gpu suffix is not true any more so I'd drop it.

Ok.

> > @@ -134,16 +153,35 @@ intel_context_pin(struct i915_gem_context *ctx,
> >   
> >               i915_gem_context_get(ctx);
> >               GEM_BUG_ON(ce->gem_context != ctx);
> > +
> > +             smp_mb__before_atomic();
> 
> Why is this needed?

It's a no-op on Intel, but since we use pin_count == 0 as the barrier
for others, we need to be sure that the writes into ce (for the pinned
state) are visible before the count is incremented.
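
Roughly, the ordering relied upon is (a sketch only, not the exact
patch code; setup_context_state() is a made-up stand-in for the real
pinning work):

	static int sketch_context_pin(struct intel_context *ce)
	{
		int err = 0;

		/* steady state: already pinned, state already published */
		if (likely(atomic_inc_not_zero(&ce->pin_count)))
			return 0;

		mutex_lock(&ce->pin_mutex);
		if (!atomic_read(&ce->pin_count)) {
			err = setup_context_state(ce); /* writes into ce */
			if (err)
				goto unlock;

			/* publish those writes before the count */
			smp_mb__before_atomic();
		}
		atomic_inc(&ce->pin_count);
	unlock:
		mutex_unlock(&ce->pin_mutex);
		return err;
	}

A lockless reader that observes pin_count > 0 is then also guaranteed
to observe the pinned state of ce.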

> > @@ -50,18 +64,10 @@ intel_context_pin(struct i915_gem_context *ctx, struct intel_engine_cs *engine);
> >   
> >   static inline void __intel_context_pin(struct intel_context *ce)
> >   {
> > -     GEM_BUG_ON(!ce->pin_count);
> > -     ce->pin_count++;
> > +     GEM_BUG_ON(!intel_context_is_pinned(ce));
> > +     atomic_inc(&ce->pin_count);
> 
> This can be called without the mutex held but only if not zero, right?
> 
> BUG_ON(!atomic_inc_not_zero(&ce->pin_count)) ?

That's quite heavy (it's a locked cmpxchg), whereas a plain locked inc
doesn't involve a roundtrip.

This is a function that should only be used at controlled points, and
for that I think the GEM_BUG_ON(!intel_context_is_pinned()) is the
right documentation (i.e. you should only be using this if you know you
have already pinned the ce yourself).
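
i.e. the only valid pattern is along the lines of (sketch):

	ce = intel_context_pin(ctx, engine);	/* may take pin_mutex */
	if (IS_ERR(ce))
		return PTR_ERR(ce);

	__intel_context_pin(ce);	/* plain locked inc, no cmpxchg loop */
	/* ... hand the extra pin to another user ... */

	intel_context_unpin(ce);
	intel_context_unpin(ce);	/* drops the original pin */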

> >       u32 *lrc_reg_state;
> >       u64 lrc_desc;
> > -     int pin_count;
> > +
> > +     atomic_t pin_count;
> > +     struct mutex pin_mutex; /* guards pinning and associated on-gpuing */
> 
> Not pinning but transition from pinned to unpinned state AFAIU.

unpinned to pinned! A lot may happen the first time we pin, and every
user needs to be serialised across that.

pinned to unpinned is less tricky, but still requires serialisation with a
potential concurrent pinner.

The only cheap (as cheap as any cmpxchg operation may be) path is for
already pinned ce, which fortunately is our steady state.
-Chris

* Re: [PATCH 17/38] drm/i915: Create/destroy VM (ppGTT) for use with contexts
  2019-03-01 14:03 ` [PATCH 17/38] drm/i915: Create/destroy VM (ppGTT) for use with contexts Chris Wilson
@ 2019-03-06 11:27   ` Tvrtko Ursulin
  2019-03-06 11:36     ` Chris Wilson
  0 siblings, 1 reply; 88+ messages in thread
From: Tvrtko Ursulin @ 2019-03-06 11:27 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 01/03/2019 14:03, Chris Wilson wrote:
> In preparation to making the ppGTT binding for a context explicit (to
> facilitate reusing the same ppGTT between different contexts), allow the
> user to create and destroy named ppGTT.
> 
> v2: Replace global barrier for swapping over the ppgtt and tlbs with a
> local context barrier (Tvrtko)
> v3: serialise with struct_mutex; it's lazy but required dammit
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_drv.c               |   2 +
>   drivers/gpu/drm/i915/i915_drv.h               |   3 +
>   drivers/gpu/drm/i915/i915_gem_context.c       | 254 +++++++++++++++++-
>   drivers/gpu/drm/i915/i915_gem_context.h       |   5 +
>   drivers/gpu/drm/i915/i915_gem_gtt.c           |  17 +-
>   drivers/gpu/drm/i915/i915_gem_gtt.h           |  16 +-
>   drivers/gpu/drm/i915/selftests/huge_pages.c   |   1 -
>   .../gpu/drm/i915/selftests/i915_gem_context.c | 239 ++++++++++++----
>   drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |   1 -
>   drivers/gpu/drm/i915/selftests/mock_context.c |   8 +-
>   include/uapi/drm/i915_drm.h                   |  36 +++
>   11 files changed, 511 insertions(+), 71 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 224bb96b7877..6b75d1b7b8bd 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -3008,6 +3008,8 @@ static const struct drm_ioctl_desc i915_ioctls[] = {
>   	DRM_IOCTL_DEF_DRV(I915_PERF_ADD_CONFIG, i915_perf_add_config_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
>   	DRM_IOCTL_DEF_DRV(I915_PERF_REMOVE_CONFIG, i915_perf_remove_config_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
>   	DRM_IOCTL_DEF_DRV(I915_QUERY, i915_query_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(I915_GEM_VM_CREATE, i915_gem_vm_create_ioctl, DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(I915_GEM_VM_DESTROY, i915_gem_vm_destroy_ioctl, DRM_RENDER_ALLOW),
>   };
>   
>   static struct drm_driver driver = {
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 195e71bb4a4f..cbc2b59722ae 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -218,6 +218,9 @@ struct drm_i915_file_private {
>   	} mm;
>   	struct idr context_idr;
>   
> +	struct mutex vm_lock;
> +	struct idr vm_idr;
> +
>   	unsigned int bsd_engine;
>   
>   /*
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 91926a407548..8c35b6019f0d 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -124,6 +124,8 @@ static void lut_close(struct i915_gem_context *ctx)
>   		struct i915_vma *vma = rcu_dereference_raw(*slot);
>   
>   		radix_tree_iter_delete(&ctx->handles_vma, &iter, slot);
> +
> +		vma->open_count--;

Assuming none of the new code in this patch gets called, this change 
looks standalone? What am I missing?

>   		__i915_gem_object_release_unless_active(vma->obj);
>   	}
>   	rcu_read_unlock();
> @@ -308,7 +310,7 @@ static void context_close(struct i915_gem_context *ctx)
>   	 */
>   	lut_close(ctx);
>   	if (ctx->ppgtt)
> -		i915_ppgtt_close(&ctx->ppgtt->vm);
> +		i915_ppgtt_close(ctx->ppgtt);
>   
>   	ctx->file_priv = ERR_PTR(-EBADF);
>   	i915_gem_context_put(ctx);
> @@ -447,6 +449,32 @@ static void __destroy_hw_context(struct i915_gem_context *ctx,
>   	context_close(ctx);
>   }
>   
> +static struct i915_hw_ppgtt *
> +__set_ppgtt(struct i915_gem_context *ctx, struct i915_hw_ppgtt *ppgtt)
> +{
> +	struct i915_hw_ppgtt *old = ctx->ppgtt;
> +
> +	i915_ppgtt_open(ppgtt);
> +	ctx->ppgtt = i915_ppgtt_get(ppgtt);
> +
> +	ctx->desc_template = default_desc_template(ctx->i915, ppgtt);
> +
> +	return old;
> +}
> +
> +static void __assign_ppgtt(struct i915_gem_context *ctx,
> +			   struct i915_hw_ppgtt *ppgtt)
> +{
> +	if (ppgtt == ctx->ppgtt)
> +		return;
> +
> +	ppgtt = __set_ppgtt(ctx, ppgtt);
> +	if (ppgtt) {
> +		i915_ppgtt_close(ppgtt);
> +		i915_ppgtt_put(ppgtt);
> +	}
> +}
> +
>   static struct i915_gem_context *
>   i915_gem_create_context(struct drm_i915_private *dev_priv,
>   			struct drm_i915_file_private *file_priv)
> @@ -473,8 +501,8 @@ i915_gem_create_context(struct drm_i915_private *dev_priv,
>   			return ERR_CAST(ppgtt);
>   		}
>   
> -		ctx->ppgtt = ppgtt;
> -		ctx->desc_template = default_desc_template(dev_priv, ppgtt);
> +		__assign_ppgtt(ctx, ppgtt);
> +		i915_ppgtt_put(ppgtt);
>   	}
>   
>   	trace_i915_context_create(ctx);
> @@ -655,19 +683,29 @@ static int context_idr_cleanup(int id, void *p, void *data)
>   	return 0;
>   }
>   
> +static int vm_idr_cleanup(int id, void *p, void *data)
> +{
> +	i915_ppgtt_put(p);
> +	return 0;
> +}
> +
>   int i915_gem_context_open(struct drm_i915_private *i915,
>   			  struct drm_file *file)
>   {
>   	struct drm_i915_file_private *file_priv = file->driver_priv;
>   	struct i915_gem_context *ctx;
>   
> +	mutex_init(&file_priv->vm_lock);
> +
>   	idr_init(&file_priv->context_idr);
> +	idr_init_base(&file_priv->vm_idr, 1);
>   
>   	mutex_lock(&i915->drm.struct_mutex);
>   	ctx = i915_gem_create_context(i915, file_priv);
>   	mutex_unlock(&i915->drm.struct_mutex);
>   	if (IS_ERR(ctx)) {
>   		idr_destroy(&file_priv->context_idr);
> +		idr_destroy(&file_priv->vm_idr);
>   		return PTR_ERR(ctx);
>   	}
>   
> @@ -684,6 +722,89 @@ void i915_gem_context_close(struct drm_file *file)
>   
>   	idr_for_each(&file_priv->context_idr, context_idr_cleanup, NULL);
>   	idr_destroy(&file_priv->context_idr);
> +
> +	idr_for_each(&file_priv->vm_idr, vm_idr_cleanup, NULL);
> +	idr_destroy(&file_priv->vm_idr);
> +
> +	mutex_destroy(&file_priv->vm_lock);
> +}
> +
> +int i915_gem_vm_create_ioctl(struct drm_device *dev, void *data,
> +			     struct drm_file *file)
> +{
> +	struct drm_i915_private *i915 = to_i915(dev);
> +	struct drm_i915_gem_vm_control *args = data;
> +	struct drm_i915_file_private *file_priv = file->driver_priv;
> +	struct i915_hw_ppgtt *ppgtt;
> +	int err;
> +
> +	if (!HAS_FULL_PPGTT(i915))
> +		return -ENODEV;
> +
> +	if (args->flags)
> +		return -EINVAL;
> +
> +	if (args->extensions)
> +		return -EINVAL;
> +
> +	ppgtt = i915_ppgtt_create(i915, file_priv);
> +	if (IS_ERR(ppgtt))
> +		return PTR_ERR(ppgtt);
> +
> +	err = mutex_lock_interruptible(&file_priv->vm_lock);
> +	if (err)
> +		goto err_put;
> +
> +	err = idr_alloc(&file_priv->vm_idr, ppgtt, 0, 0, GFP_KERNEL);
> +	mutex_unlock(&file_priv->vm_lock);
> +	if (err < 0)
> +		goto err_put;
> +
> +	GEM_BUG_ON(err == 0); /* reserved for default/unassigned ppgtt */
> +	ppgtt->user_handle = err;
> +	args->id = err;
> +	return 0;
> +
> +err_put:
> +	i915_ppgtt_put(ppgtt);
> +	return err;
> +}
> +
> +int i915_gem_vm_destroy_ioctl(struct drm_device *dev, void *data,
> +			      struct drm_file *file)
> +{
> +	struct drm_i915_file_private *file_priv = file->driver_priv;
> +	struct drm_i915_gem_vm_control *args = data;
> +	struct i915_hw_ppgtt *ppgtt;
> +	int err;
> +	u32 id;
> +

Do we want an -ENODEV check here as well?

> +	if (args->flags)
> +		return -EINVAL;
> +
> +	if (args->extensions)
> +		return -EINVAL;
> +
> +	id = args->id;
> +	if (!id)
> +		return -ENOENT;
> +
> +	err = mutex_lock_interruptible(&file_priv->vm_lock);
> +	if (err)
> +		return err;
> +
> +	ppgtt = idr_remove(&file_priv->vm_idr, id);
> +	if (ppgtt) {
> +		GEM_BUG_ON(!ppgtt->user_handle);
> +		ppgtt->user_handle = 0;
> +	}
> +
> +	mutex_unlock(&file_priv->vm_lock);

Nit - move to just after idr_remove for symmetry with create?

> +	if (!ppgtt)
> +		return -ENOENT;
> +
> +	i915_ppgtt_put(ppgtt);
> +	return 0;
>   }
>   
>   static struct i915_request *
> @@ -831,6 +952,120 @@ int i915_gem_switch_to_kernel_context(struct drm_i915_private *i915,
>   	return 0;
>   }
>   
> +static int get_ppgtt(struct i915_gem_context *ctx,
> +		     struct drm_i915_gem_context_param *args)
> +{
> +	struct drm_i915_file_private *file_priv = ctx->file_priv;
> +	struct i915_hw_ppgtt *ppgtt;
> +	int ret;
> +
> +	if (!ctx->ppgtt)
> +		return -ENODEV;
> +
> +	/* XXX rcu acquire? */
> +	ret = mutex_lock_interruptible(&ctx->i915->drm.struct_mutex);
> +	if (ret)
> +		return ret;
> +
> +	ppgtt = i915_ppgtt_get(ctx->ppgtt);
> +	mutex_unlock(&ctx->i915->drm.struct_mutex);
> +
> +	ret = mutex_lock_interruptible(&file_priv->vm_lock);
> +	if (ret)
> +		goto err_put;
> +
> +	if (!ppgtt->user_handle) {
> +		ret = idr_alloc(&file_priv->vm_idr, ppgtt, 0, 0, GFP_KERNEL);
> +		GEM_BUG_ON(!ret);
> +		if (ret < 0)
> +			goto err_unlock;
> +
> +		ppgtt->user_handle = ret;
> +		i915_ppgtt_get(ppgtt);
> +	}
> +
> +	args->size = 0;
> +	args->value = ppgtt->user_handle;
> +
> +	ret = 0;
> +err_unlock:
> +	mutex_unlock(&file_priv->vm_lock);
> +err_put:
> +	i915_ppgtt_put(ppgtt);
> +	return ret;
> +}
> +
> +static void set_ppgtt_barrier(void *data)
> +{
> +	struct i915_hw_ppgtt *old = data;
> +
> +	i915_ppgtt_close(old);
> +	i915_ppgtt_put(old);
> +}
> +
> +static int set_ppgtt(struct i915_gem_context *ctx,
> +		     struct drm_i915_gem_context_param *args)
> +{
> +	struct drm_i915_file_private *file_priv = ctx->file_priv;
> +	struct i915_hw_ppgtt *ppgtt, *old;
> +	int err;
> +
> +	if (args->size)
> +		return -EINVAL;
> +
> +	if (upper_32_bits(args->value))
> +		return -EINVAL;
> +
> +	if (!ctx->ppgtt)
> +		return -ENODEV;
> +
> +	err = mutex_lock_interruptible(&file_priv->vm_lock);
> +	if (err)
> +		return err;
> +
> +	ppgtt = idr_find(&file_priv->vm_idr, args->value);
> +	if (ppgtt) {
> +		GEM_BUG_ON(ppgtt->user_handle != args->value);
> +		i915_ppgtt_get(ppgtt);
> +	}
> +	mutex_unlock(&file_priv->vm_lock);
> +	if (!ppgtt)
> +		return -ENOENT;
> +
> +	err = mutex_lock_interruptible(&ctx->i915->drm.struct_mutex);
> +	if (err)
> +		goto out;
> +
> +	if (ppgtt == ctx->ppgtt)
> +		goto unlock;
> +
> +	/* Teardown the existing obj:vma cache, it will have to be rebuilt. */
> +	lut_close(ctx);
> +
> +	old = __set_ppgtt(ctx, ppgtt);
> +
> +	/*
> +	 * We need to flush any requests using the current ppgtt before
> +	 * we release it as the requests do not hold a reference themselves,
> +	 * only indirectly through the context.
> +	 */
> +	err = context_barrier_task(ctx, -1, set_ppgtt_barrier, old);

s/-1/ALL_ENGINES/ ?

> +	if (err) {
> +		ctx->ppgtt = old;
> +		ctx->desc_template = default_desc_template(ctx->i915, old);
> +
> +		i915_ppgtt_close(ppgtt);
> +		i915_ppgtt_put(ppgtt);
> +	}
> +
> +unlock:
> +	mutex_unlock(&ctx->i915->drm.struct_mutex);
> +
> +out:
> +	i915_ppgtt_put(ppgtt);
> +	return err;
> +}
> +
>   static bool client_is_banned(struct drm_i915_file_private *file_priv)
>   {
>   	return atomic_read(&file_priv->ban_score) >= I915_CLIENT_SCORE_BANNED;
> @@ -1009,6 +1244,9 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
>   	case I915_CONTEXT_PARAM_SSEU:
>   		ret = get_sseu(ctx, args);
>   		break;
> +	case I915_CONTEXT_PARAM_VM:
> +		ret = get_ppgtt(ctx, args);
> +		break;
>   	default:
>   		ret = -EINVAL;
>   		break;
> @@ -1306,9 +1544,6 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>   		return -ENOENT;
>   
>   	switch (args->param) {
> -	case I915_CONTEXT_PARAM_BAN_PERIOD:
> -		ret = -EINVAL;
> -		break;
>   	case I915_CONTEXT_PARAM_NO_ZEROMAP:
>   		if (args->size)
>   			ret = -EINVAL;
> @@ -1364,9 +1599,16 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>   					I915_USER_PRIORITY(priority);
>   		}
>   		break;
> +
>   	case I915_CONTEXT_PARAM_SSEU:
>   		ret = set_sseu(ctx, args);
>   		break;
> +
> +	case I915_CONTEXT_PARAM_VM:
> +		ret = set_ppgtt(ctx, args);
> +		break;
> +
> +	case I915_CONTEXT_PARAM_BAN_PERIOD:
>   	default:
>   		ret = -EINVAL;
>   		break;
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
> index df48013b581e..97cf9d3d07ae 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.h
> +++ b/drivers/gpu/drm/i915/i915_gem_context.h
> @@ -384,6 +384,11 @@ void i915_gem_context_release(struct kref *ctx_ref);
>   struct i915_gem_context *
>   i915_gem_context_create_gvt(struct drm_device *dev);
>   
> +int i915_gem_vm_create_ioctl(struct drm_device *dev, void *data,
> +			     struct drm_file *file);
> +int i915_gem_vm_destroy_ioctl(struct drm_device *dev, void *data,
> +			      struct drm_file *file);
> +
>   int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
>   				  struct drm_file *file);
>   int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 99022738cc89..d3f80dbf13ef 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -2103,10 +2103,21 @@ i915_ppgtt_create(struct drm_i915_private *i915,
>   	return ppgtt;
>   }
>   
> -void i915_ppgtt_close(struct i915_address_space *vm)
> +void i915_ppgtt_open(struct i915_hw_ppgtt *ppgtt)
>   {
> -	GEM_BUG_ON(vm->closed);
> -	vm->closed = true;
> +	GEM_BUG_ON(ppgtt->vm.closed);
> +
> +	ppgtt->open_count++;
> +}
> +
> +void i915_ppgtt_close(struct i915_hw_ppgtt *ppgtt)
> +{
> +	GEM_BUG_ON(!ppgtt->open_count);
> +	if (--ppgtt->open_count)
> +		return;
> +
> +	GEM_BUG_ON(ppgtt->vm.closed);
> +	ppgtt->vm.closed = true;
>   }
>   
>   static void ppgtt_destroy_vma(struct i915_address_space *vm)
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index 03ade71b8d9a..bb750318f52a 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -391,11 +391,15 @@ struct i915_hw_ppgtt {
>   	struct kref ref;
>   
>   	unsigned long pd_dirty_rings;
> +	unsigned int open_count;
> +
>   	union {
>   		struct i915_pml4 pml4;		/* GEN8+ & 48b PPGTT */
>   		struct i915_page_directory_pointer pdp;	/* GEN8+ */
>   		struct i915_page_directory pd;		/* GEN6-7 */
>   	};
> +
> +	u32 user_handle;
>   };
>   
>   struct gen6_hw_ppgtt {
> @@ -606,12 +610,16 @@ int i915_ppgtt_init_hw(struct drm_i915_private *dev_priv);
>   void i915_ppgtt_release(struct kref *kref);
>   struct i915_hw_ppgtt *i915_ppgtt_create(struct drm_i915_private *dev_priv,
>   					struct drm_i915_file_private *fpriv);
> -void i915_ppgtt_close(struct i915_address_space *vm);
> -static inline void i915_ppgtt_get(struct i915_hw_ppgtt *ppgtt)
> +
> +void i915_ppgtt_open(struct i915_hw_ppgtt *ppgtt);
> +void i915_ppgtt_close(struct i915_hw_ppgtt *ppgtt);
> +
> +static inline struct i915_hw_ppgtt *i915_ppgtt_get(struct i915_hw_ppgtt *ppgtt)
>   {
> -	if (ppgtt)
> -		kref_get(&ppgtt->ref);
> +	kref_get(&ppgtt->ref);
> +	return ppgtt;
>   }
> +
>   static inline void i915_ppgtt_put(struct i915_hw_ppgtt *ppgtt)
>   {
>   	if (ppgtt)
> diff --git a/drivers/gpu/drm/i915/selftests/huge_pages.c b/drivers/gpu/drm/i915/selftests/huge_pages.c
> index 4b9ded4ca0f5..c51aa09668cc 100644
> --- a/drivers/gpu/drm/i915/selftests/huge_pages.c
> +++ b/drivers/gpu/drm/i915/selftests/huge_pages.c
> @@ -1734,7 +1734,6 @@ int i915_gem_huge_page_mock_selftests(void)
>   	err = i915_subtests(tests, ppgtt);
>   
>   out_close:
> -	i915_ppgtt_close(&ppgtt->vm);
>   	i915_ppgtt_put(ppgtt);
>   
>   out_unlock:
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
> index 4f7c04247354..9133afc03135 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
> @@ -372,7 +372,8 @@ static int cpu_fill(struct drm_i915_gem_object *obj, u32 value)
>   	return 0;
>   }
>   
> -static int cpu_check(struct drm_i915_gem_object *obj, unsigned int max)
> +static noinline int cpu_check(struct drm_i915_gem_object *obj,
> +			      unsigned int idx, unsigned int max)
>   {
>   	unsigned int n, m, needs_flush;
>   	int err;
> @@ -390,8 +391,10 @@ static int cpu_check(struct drm_i915_gem_object *obj, unsigned int max)
>   
>   		for (m = 0; m < max; m++) {
>   			if (map[m] != m) {
> -				pr_err("Invalid value at page %d, offset %d: found %x expected %x\n",
> -				       n, m, map[m], m);
> +				pr_err("%pS: Invalid value at object %d page %d/%ld, offset %d/%d: found %x expected %x\n",
> +				       __builtin_return_address(0), idx,
> +				       n, real_page_count(obj), m, max,
> +				       map[m], m);
>   				err = -EINVAL;
>   				goto out_unmap;
>   			}
> @@ -399,8 +402,9 @@ static int cpu_check(struct drm_i915_gem_object *obj, unsigned int max)
>   
>   		for (; m < DW_PER_PAGE; m++) {
>   			if (map[m] != STACK_MAGIC) {
> -				pr_err("Invalid value at page %d, offset %d: found %x expected %x\n",
> -				       n, m, map[m], STACK_MAGIC);
> +				pr_err("%pS: Invalid value at object %d page %d, offset %d: found %x expected %x (uninitialised)\n",
> +				       __builtin_return_address(0), idx, n, m,
> +				       map[m], STACK_MAGIC);
>   				err = -EINVAL;
>   				goto out_unmap;
>   			}
> @@ -478,12 +482,8 @@ static unsigned long max_dwords(struct drm_i915_gem_object *obj)
>   static int igt_ctx_exec(void *arg)
>   {
>   	struct drm_i915_private *i915 = arg;
> -	struct drm_i915_gem_object *obj = NULL;
> -	unsigned long ncontexts, ndwords, dw;
> -	struct igt_live_test t;
> -	struct drm_file *file;
> -	IGT_TIMEOUT(end_time);
> -	LIST_HEAD(objects);
> +	struct intel_engine_cs *engine;
> +	enum intel_engine_id id;
>   	int err = -ENODEV;
>   
>   	/*
> @@ -495,38 +495,42 @@ static int igt_ctx_exec(void *arg)
>   	if (!DRIVER_CAPS(i915)->has_logical_contexts)
>   		return 0;
>   
> -	file = mock_file(i915);
> -	if (IS_ERR(file))
> -		return PTR_ERR(file);
> +	for_each_engine(engine, i915, id) {
> +		struct drm_i915_gem_object *obj = NULL;
> +		unsigned long ncontexts, ndwords, dw;
> +		struct igt_live_test t;
> +		struct drm_file *file;
> +		IGT_TIMEOUT(end_time);
> +		LIST_HEAD(objects);
>   
> -	mutex_lock(&i915->drm.struct_mutex);
> +		if (!intel_engine_can_store_dword(engine))
> +			continue;
>   
> -	err = igt_live_test_begin(&t, i915, __func__, "");
> -	if (err)
> -		goto out_unlock;
> +		if (!engine->context_size)
> +			continue; /* No logical context support in HW */
>   
> -	ncontexts = 0;
> -	ndwords = 0;
> -	dw = 0;
> -	while (!time_after(jiffies, end_time)) {
> -		struct intel_engine_cs *engine;
> -		struct i915_gem_context *ctx;
> -		unsigned int id;
> +		file = mock_file(i915);
> +		if (IS_ERR(file))
> +			return PTR_ERR(file);
>   
> -		ctx = i915_gem_create_context(i915, file->driver_priv);
> -		if (IS_ERR(ctx)) {
> -			err = PTR_ERR(ctx);
> +		mutex_lock(&i915->drm.struct_mutex);
> +
> +		err = igt_live_test_begin(&t, i915, __func__, engine->name);
> +		if (err)
>   			goto out_unlock;
> -		}
>   
> -		for_each_engine(engine, i915, id) {
> +		ncontexts = 0;
> +		ndwords = 0;
> +		dw = 0;
> +		while (!time_after(jiffies, end_time)) {
> +			struct i915_gem_context *ctx;
>   			intel_wakeref_t wakeref;
>   
> -			if (!engine->context_size)
> -				continue; /* No logical context support in HW */
> -
> -			if (!intel_engine_can_store_dword(engine))
> -				continue;
> +			ctx = i915_gem_create_context(i915, file->driver_priv);
> +			if (IS_ERR(ctx)) {
> +				err = PTR_ERR(ctx);
> +				goto out_unlock;
> +			}
>   
>   			if (!obj) {
>   				obj = create_test_object(ctx, file, &objects);
> @@ -536,7 +540,6 @@ static int igt_ctx_exec(void *arg)
>   				}
>   			}
>   
> -			err = 0;
>   			with_intel_runtime_pm(i915, wakeref)
>   				err = gpu_fill(obj, ctx, engine, dw);
>   			if (err) {
> @@ -551,32 +554,158 @@ static int igt_ctx_exec(void *arg)
>   				obj = NULL;
>   				dw = 0;
>   			}
> +
>   			ndwords++;
> +			ncontexts++;
>   		}
> -		ncontexts++;
> +
> +		pr_info("Submitted %lu contexts to %s, filling %lu dwords\n",
> +			ncontexts, engine->name, ndwords);
> +
> +		ncontexts = dw = 0;
> +		list_for_each_entry(obj, &objects, st_link) {
> +			unsigned int rem =
> +				min_t(unsigned int, ndwords - dw, max_dwords(obj));
> +
> +			err = cpu_check(obj, ncontexts++, rem);
> +			if (err)
> +				break;
> +
> +			dw += rem;
> +		}
> +
> +out_unlock:
> +		if (igt_live_test_end(&t))
> +			err = -EIO;
> +		mutex_unlock(&i915->drm.struct_mutex);
> +
> +		mock_file_free(i915, file);
> +		if (err)
> +			return err;
>   	}
> -	pr_info("Submitted %lu contexts (across %u engines), filling %lu dwords\n",
> -		ncontexts, RUNTIME_INFO(i915)->num_engines, ndwords);
>   
> -	dw = 0;
> -	list_for_each_entry(obj, &objects, st_link) {
> -		unsigned int rem =
> -			min_t(unsigned int, ndwords - dw, max_dwords(obj));
> +	return 0;
> +}
> +
> +static int igt_shared_ctx_exec(void *arg)
> +{
> +	struct drm_i915_private *i915 = arg;
> +	struct intel_engine_cs *engine;
> +	enum intel_engine_id id;
> +	int err = -ENODEV;
> +
> +	/*
> +	 * Create a few different contexts with the same mm and write
> +	 * through each ctx using the GPU making sure those writes end
> +	 * up in the expected pages of our obj.
> +	 */
> +
> +	for_each_engine(engine, i915, id) {
> +		unsigned long ncontexts, ndwords, dw;
> +		struct drm_i915_gem_object *obj = NULL;
> +		struct i915_gem_context *ctx = NULL;
> +		struct i915_gem_context *parent;
> +		struct igt_live_test t;
> +		struct drm_file *file;
> +		IGT_TIMEOUT(end_time);
> +		LIST_HEAD(objects);
>   
> -		err = cpu_check(obj, rem);
> +		if (!intel_engine_can_store_dword(engine))
> +			continue;
> +
> +		file = mock_file(i915);
> +		if (IS_ERR(file))
> +			return PTR_ERR(file);
> +
> +		mutex_lock(&i915->drm.struct_mutex);
> +
> +		err = igt_live_test_begin(&t, i915, __func__, engine->name);
>   		if (err)
> -			break;
> +			goto out_unlock;
>   
> -		dw += rem;
> -	}
> +		parent = i915_gem_create_context(i915, file->driver_priv);
> +		if (IS_ERR(parent)) {
> +			err = PTR_ERR(parent);
> +			if (err == -ENODEV) /* no logical ctx support */

Check it earlier like igt_ctx_exec?

> +				err = 0;
> +			goto out_unlock;
> +		}
> +
> +		if (!parent->ppgtt) {
> +			err = 0;
> +			goto out_unlock;
> +		}
> +
> +		ncontexts = 0;
> +		ndwords = 0;
> +		dw = 0;
> +		while (!time_after(jiffies, end_time)) {
> +			intel_wakeref_t wakeref;
> +
> +			if (ctx)
> +				__destroy_hw_context(ctx, file->driver_priv);
> +
> +			ctx = i915_gem_create_context(i915, file->driver_priv);
> +			if (IS_ERR(ctx)) {
> +				err = PTR_ERR(ctx);
> +				goto out_unlock;
> +			}
> +
> +			__assign_ppgtt(ctx, parent->ppgtt);
> +
> +			if (!obj) {
> +				obj = create_test_object(parent, file, &objects);
> +				if (IS_ERR(obj)) {
> +					err = PTR_ERR(obj);
> +					goto out_unlock;
> +				}
> +			}
> +
> +			err = 0;
> +			with_intel_runtime_pm(i915, wakeref)
> +				err = gpu_fill(obj, ctx, engine, dw);
> +			if (err) {
> +				pr_err("Failed to fill dword %lu [%lu/%lu] with gpu (%s) in ctx %u [full-ppgtt? %s], err=%d\n",
> +				       ndwords, dw, max_dwords(obj),
> +				       engine->name, ctx->hw_id,
> +				       yesno(!!ctx->ppgtt), err);
> +				goto out_unlock;
> +			}
> +
> +			if (++dw == max_dwords(obj)) {
> +				obj = NULL;
> +				dw = 0;
> +			}
> +
> +			ndwords++;
> +			ncontexts++;
> +		}
> +		pr_info("Submitted %lu contexts to %s, filling %lu dwords\n",
> +			ncontexts, engine->name, ndwords);
> +
> +		ncontexts = dw = 0;
> +		list_for_each_entry(obj, &objects, st_link) {
> +			unsigned int rem =
> +				min_t(unsigned int, ndwords - dw, max_dwords(obj));
> +
> +			err = cpu_check(obj, ncontexts++, rem);
> +			if (err)
> +				break;
> +
> +			dw += rem;
> +		}
>   
>   out_unlock:
> -	if (igt_live_test_end(&t))
> -		err = -EIO;
> -	mutex_unlock(&i915->drm.struct_mutex);
> +		if (igt_live_test_end(&t))
> +			err = -EIO;
> +		mutex_unlock(&i915->drm.struct_mutex);
>   
> -	mock_file_free(i915, file);
> -	return err;
> +		mock_file_free(i915, file);
> +		if (err)
> +			return err;
> +	}
> +
> +	return 0;
>   }

Is my eyeballing of igt_ctx_exec and igt_shared_ctx_exec correct in 
noticing that the tests are basically the same apart from the ppgtt 
sharing bit? Could you consolidate them and use test flags to control 
whether or not to share?
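
Something like (an untested sketch, names made up):

	#define TEST_SHARED BIT(0)

	static int __igt_ctx_exec(struct drm_i915_private *i915,
				  unsigned int flags);

	static int igt_ctx_exec(void *arg)
	{
		return __igt_ctx_exec(arg, 0);
	}

	static int igt_shared_ctx_exec(void *arg)
	{
		return __igt_ctx_exec(arg, TEST_SHARED);
	}

with the flag deciding whether each iteration creates a parent context
and calls __assign_ppgtt() before submitting.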

>   
>   static struct i915_vma *rpcs_query_batch(struct i915_vma *vma)
> @@ -1048,7 +1177,7 @@ static int igt_ctx_readonly(void *arg)
>   	struct drm_i915_gem_object *obj = NULL;
>   	struct i915_gem_context *ctx;
>   	struct i915_hw_ppgtt *ppgtt;
> -	unsigned long ndwords, dw;
> +	unsigned long idx, ndwords, dw;
>   	struct igt_live_test t;
>   	struct drm_file *file;
>   	I915_RND_STATE(prng);
> @@ -1129,6 +1258,7 @@ static int igt_ctx_readonly(void *arg)
>   		ndwords, RUNTIME_INFO(i915)->num_engines);
>   
>   	dw = 0;
> +	idx = 0;
>   	list_for_each_entry(obj, &objects, st_link) {
>   		unsigned int rem =
>   			min_t(unsigned int, ndwords - dw, max_dwords(obj));
> @@ -1138,7 +1268,7 @@ static int igt_ctx_readonly(void *arg)
>   		if (i915_gem_object_is_readonly(obj))
>   			num_writes = 0;
>   
> -		err = cpu_check(obj, num_writes);
> +		err = cpu_check(obj, idx++, num_writes);
>   		if (err)
>   			break;
>   
> @@ -1723,6 +1853,7 @@ int i915_gem_context_live_selftests(struct drm_i915_private *dev_priv)
>   		SUBTEST(igt_ctx_exec),
>   		SUBTEST(igt_ctx_readonly),
>   		SUBTEST(igt_ctx_sseu),
> +		SUBTEST(igt_shared_ctx_exec),
>   		SUBTEST(igt_vm_isolation),
>   	};
>   
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> index 826fd51c331e..57b3d9867070 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> @@ -1020,7 +1020,6 @@ static int exercise_ppgtt(struct drm_i915_private *dev_priv,
>   
>   	err = func(dev_priv, &ppgtt->vm, 0, ppgtt->vm.total, end_time);
>   
> -	i915_ppgtt_close(&ppgtt->vm);
>   	i915_ppgtt_put(ppgtt);
>   out_unlock:
>   	mutex_unlock(&dev_priv->drm.struct_mutex);
> diff --git a/drivers/gpu/drm/i915/selftests/mock_context.c b/drivers/gpu/drm/i915/selftests/mock_context.c
> index 353b37b9f78e..5eddf9fcfe8a 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_context.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_context.c
> @@ -55,13 +55,17 @@ mock_context(struct drm_i915_private *i915,
>   		goto err_handles;
>   
>   	if (name) {
> +		struct i915_hw_ppgtt *ppgtt;
> +
>   		ctx->name = kstrdup(name, GFP_KERNEL);
>   		if (!ctx->name)
>   			goto err_put;
>   
> -		ctx->ppgtt = mock_ppgtt(i915, name);
> -		if (!ctx->ppgtt)
> +		ppgtt = mock_ppgtt(i915, name);
> +		if (!ppgtt)
>   			goto err_put;
> +
> +		__set_ppgtt(ctx, ppgtt);
>   	}
>   
>   	return ctx;
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index f4fa0825722a..9fcfb54a13f2 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -341,6 +341,8 @@ typedef struct _drm_i915_sarea {
>   #define DRM_I915_PERF_ADD_CONFIG	0x37
>   #define DRM_I915_PERF_REMOVE_CONFIG	0x38
>   #define DRM_I915_QUERY			0x39
> +#define DRM_I915_GEM_VM_CREATE		0x3a
> +#define DRM_I915_GEM_VM_DESTROY		0x3b
>   /* Must be kept compact -- no holes */
>   
>   #define DRM_IOCTL_I915_INIT		DRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT, drm_i915_init_t)
> @@ -400,6 +402,8 @@ typedef struct _drm_i915_sarea {
>   #define DRM_IOCTL_I915_PERF_ADD_CONFIG	DRM_IOW(DRM_COMMAND_BASE + DRM_I915_PERF_ADD_CONFIG, struct drm_i915_perf_oa_config)
>   #define DRM_IOCTL_I915_PERF_REMOVE_CONFIG	DRM_IOW(DRM_COMMAND_BASE + DRM_I915_PERF_REMOVE_CONFIG, __u64)
>   #define DRM_IOCTL_I915_QUERY			DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_QUERY, struct drm_i915_query)
> +#define DRM_IOCTL_I915_GEM_VM_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_CREATE, struct drm_i915_gem_vm_control)
> +#define DRM_IOCTL_I915_GEM_VM_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_VM_DESTROY, struct drm_i915_gem_vm_control)
>   
>   /* Allow drivers to submit batchbuffers directly to hardware, relying
>    * on the security mechanisms provided by hardware.
> @@ -1449,6 +1453,26 @@ struct drm_i915_gem_context_destroy {
>   	__u32 pad;
>   };
>   
> +/*
> + * DRM_I915_GEM_VM_CREATE -
> + *
> + * Create a new virtual memory address space (ppGTT) for use within a context
> + * on the same file. Extensions can be provided to configure exactly how the
> + * address space is setup upon creation.
> + *
> + * The id of new VM (bound to the fd) for use with I915_CONTEXT_PARAM_VM is
> + * returned.
> + *
> + * DRM_I915_GEM_VM_DESTROY -
> + *
> + * Destroys a previously created VM id.
> + */
> +struct drm_i915_gem_vm_control {
> +	__u64 extensions;
> +	__u32 flags;
> +	__u32 id;
> +};
> +
>   struct drm_i915_reg_read {
>   	/*
>   	 * Register offset.
> @@ -1538,7 +1562,19 @@ struct drm_i915_gem_context_param {
>    * On creation, all new contexts are marked as recoverable.
>    */
>   #define I915_CONTEXT_PARAM_RECOVERABLE	0x8
> +
> +	/*
> +	 * The id of the associated virtual memory address space (ppGTT) of
> +	 * this context. Can be retrieved and passed to another context
> +	 * (on the same fd) for both to use the same ppGTT and so share
> +	 * address layouts, and avoid reloading the page tables on context
> +	 * switches between themselves.
> +	 *
> +	 * See DRM_I915_GEM_VM_CREATE and DRM_I915_GEM_VM_DESTROY.
> +	 */
> +#define I915_CONTEXT_PARAM_VM		0x9
>   /* Must be kept compact -- no holes and well documented */
> +
>   	__u64 value;
>   };
>   
> 

Regards,

Tvrtko

* Re: [PATCH 37/38] drm/i915/selftests: Check preemption support on each engine
  2019-03-01 14:04 ` [PATCH 37/38] drm/i915/selftests: Check preemption support on each engine Chris Wilson
@ 2019-03-06 11:29   ` Tvrtko Ursulin
  0 siblings, 0 replies; 88+ messages in thread
From: Tvrtko Ursulin @ 2019-03-06 11:29 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 01/03/2019 14:04, Chris Wilson wrote:
> Check that we have set up preemption on the engine before testing, and
> warn if it is not exposed on supported HW.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/selftests/intel_lrc.c | 18 ++++++++++++++++++
>   1 file changed, 18 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
> index b6329280a59f..a7de7a8fc24a 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
> @@ -90,6 +90,9 @@ static int live_preempt(void *arg)
>   	if (!HAS_LOGICAL_RING_PREEMPTION(i915))
>   		return 0;
>   
> +	if (!(i915->caps.scheduler & I915_SCHEDULER_CAP_PREEMPTION))
> +		pr_err("Logical preemption supported, but not exposed\n");
> +
>   	mutex_lock(&i915->drm.struct_mutex);
>   	wakeref = intel_runtime_pm_get(i915);
>   
> @@ -114,6 +117,9 @@ static int live_preempt(void *arg)
>   	for_each_engine(engine, i915, id) {
>   		struct i915_request *rq;
>   
> +		if (!intel_engine_has_preemption(engine))
> +			continue;
> +
>   		rq = igt_spinner_create_request(&spin_lo, ctx_lo, engine,
>   						MI_ARB_CHECK);
>   		if (IS_ERR(rq)) {
> @@ -205,6 +211,9 @@ static int live_late_preempt(void *arg)
>   	for_each_engine(engine, i915, id) {
>   		struct i915_request *rq;
>   
> +		if (!intel_engine_has_preemption(engine))
> +			continue;
> +
>   		rq = igt_spinner_create_request(&spin_lo, ctx_lo, engine,
>   						MI_ARB_CHECK);
>   		if (IS_ERR(rq)) {
> @@ -337,6 +346,9 @@ static int live_suppress_self_preempt(void *arg)
>   		struct i915_request *rq_a, *rq_b;
>   		int depth;
>   
> +		if (!intel_engine_has_preemption(engine))
> +			continue;
> +
>   		engine->execlists.preempt_hang.count = 0;
>   
>   		rq_a = igt_spinner_create_request(&a.spin,
> @@ -483,6 +495,9 @@ static int live_suppress_wait_preempt(void *arg)
>   	for_each_engine(engine, i915, id) {
>   		int depth;
>   
> +		if (!intel_engine_has_preemption(engine))
> +			continue;
> +
>   		if (!engine->emit_init_breadcrumb)
>   			continue;
>   
> @@ -604,6 +619,9 @@ static int live_chain_preempt(void *arg)
>   		};
>   		int count, i;
>   
> +		if (!intel_engine_has_preemption(engine))
> +			continue;
> +
>   		for_each_prime_number_from(count, 1, 32) { /* must fit ring! */
>   			struct i915_request *rq;
>   
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko

* Re: [PATCH 17/38] drm/i915: Create/destroy VM (ppGTT) for use with contexts
  2019-03-06 11:27   ` Tvrtko Ursulin
@ 2019-03-06 11:36     ` Chris Wilson
  0 siblings, 0 replies; 88+ messages in thread
From: Chris Wilson @ 2019-03-06 11:36 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-03-06 11:27:37)
> 
> On 01/03/2019 14:03, Chris Wilson wrote:
> > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> > index 91926a407548..8c35b6019f0d 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> > @@ -124,6 +124,8 @@ static void lut_close(struct i915_gem_context *ctx)
> >               struct i915_vma *vma = rcu_dereference_raw(*slot);
> >   
> >               radix_tree_iter_delete(&ctx->handles_vma, &iter, slot);
> > +
> > +             vma->open_count--;
> 
> Assuming none of the new code in this patch gets called, then this 
> change looks like standalone? What am I missing?

It's only required because of the new code. Previously it was always 1
for ppgtt.
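
With a shared ppgtt it no longer is; a hypothetical sequence
(context/handle names illustrative):

	ctx A: execbuf on handle X -> LUT insert -> vma->open_count == 1
	ctx B (same ppgtt): execbuf on handle X -> open_count == 2
	close ctx A -> lut_close() drops only A's reference

so lut_close() has to decrement rather than assume it holds the last
open.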

> > +int i915_gem_vm_destroy_ioctl(struct drm_device *dev, void *data,
> > +                           struct drm_file *file)
> > +{
> > +     struct drm_i915_file_private *file_priv = file->driver_priv;
> > +     struct drm_i915_gem_vm_control *args = data;
> > +     struct i915_hw_ppgtt *ppgtt;
> > +     int err;
> > +     u32 id;
> > +
> 
> Do we want an -ENODEV check here as well?

I don't think so; in that case you can only get here with a garbage id,
and ENOENT for feeding in an unknown id makes sense.
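
For reference, the intended userspace flow is just create -> use ->
destroy, something like (a hypothetical sketch against the proposed
uAPI, assuming fd is the DRM fd and ctx_id an existing context):

	struct drm_i915_gem_vm_control ctl = {};
	struct drm_i915_gem_context_param arg = {};

	drmIoctl(fd, DRM_IOCTL_I915_GEM_VM_CREATE, &ctl);
	/* ctl.id now names the new ppGTT */

	arg.ctx_id = ctx_id;
	arg.param = I915_CONTEXT_PARAM_VM;
	arg.value = ctl.id;
	drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);

	drmIoctl(fd, DRM_IOCTL_I915_GEM_VM_DESTROY, &ctl);

An id that was never handed out, or was already destroyed, only ever
hits the ENOENT path.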
> 
> > +     if (args->flags)
> > +             return -EINVAL;
> > +
> > +     if (args->extensions)
> > +             return -EINVAL;
> > +
> > +     id = args->id;
> > +     if (!id)
> > +             return -ENOENT;
> > +
> > +     err = mutex_lock_interruptible(&file_priv->vm_lock);
> > +     if (err)
> > +             return err;
> > +
> > +     ppgtt = idr_remove(&file_priv->vm_idr, id);
> > +     if (ppgtt) {
> > +             GEM_BUG_ON(!ppgtt->user_handle);
> > +             ppgtt->user_handle = 0;
> > +     }
> > +
> > +     mutex_unlock(&file_priv->vm_lock);
> 
> Nit - move to just after idr_remove for symmetry with create?

We need the user_handle = 0 to be done under the lock, I think.
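
Otherwise a hypothetical interleaving with get_ppgtt() could report a
stale id:

	vm_destroy                      get_ppgtt
	----------                      ----------
	lock(vm_lock)
	idr_remove(id)
	unlock(vm_lock)
	                                lock(vm_lock)
	                                sees ppgtt->user_handle != 0
	                                reports an id gone from the idr
	                                unlock(vm_lock)
	ppgtt->user_handle = 0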

> > +     /*
> > +      * We need to flush any requests using the current ppgtt before
> > +      * we release it as the requests do not hold a reference themselves,
> > +      * only indirectly through the context.
> > +      */
> > +     err = context_barrier_task(ctx, -1, set_ppgtt_barrier, old);
> 
> s/-1/ALL_ENGINES/ ?

Grr, magic macros. It's not even an intel_engine_mask_t yet :-p

> > +             parent = i915_gem_create_context(i915, file->driver_priv);
> > +             if (IS_ERR(parent)) {
> > +                     err = PTR_ERR(parent);
> > +                     if (err == -ENODEV) /* no logical ctx support */
> 
> Check it earlier like igt_ctx_exec?

Possibly; I did have a reason, but I can't recall what.

> > -     mock_file_free(i915, file);
> > -     return err;
> > +             mock_file_free(i915, file);
> > +             if (err)
> > +                     return err;
> > +     }
> > +
> > +     return 0;
> >   }
> 
> Is my eyeballing igt_ctx_exec and igt_ctx_shared_exec correct in 
> noticing the tests are basically the same apart from ppgtt sharing bit? 
> Could you consolidate and use test flags to control whether or not to share?

That takes effort! Memory says that the current tests shared the guts
and only the setup was custom.
-Chris

end of thread

Thread overview: 88+ messages
2019-03-01 14:03 [PATCH 01/38] drm/i915/execlists: Suppress redundant preemption Chris Wilson
2019-03-01 14:03 ` [PATCH 02/38] drm/i915: Introduce i915_timeline.mutex Chris Wilson
2019-03-01 15:09   ` Tvrtko Ursulin
2019-03-01 14:03 ` [PATCH 03/38] drm/i915: Keep timeline HWSP allocated until idle across the system Chris Wilson
2019-03-01 14:03 ` [PATCH 04/38] drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+ Chris Wilson
2019-03-01 15:11   ` Tvrtko Ursulin
2019-03-01 14:03 ` [PATCH 05/38] drm/i915: Prioritise non-busywait semaphore workloads Chris Wilson
2019-03-01 15:12   ` Tvrtko Ursulin
2019-03-01 14:03 ` [PATCH 06/38] drm/i915/selftests: Check that whitelisted registers are accessible Chris Wilson
2019-03-01 15:18   ` Michał Winiarski
2019-03-01 15:25     ` Chris Wilson
2019-03-01 14:03 ` [PATCH 07/38] drm/i915: Force GPU idle on suspend Chris Wilson
2019-03-01 14:03 ` [PATCH 08/38] drm/i915/selftests: Improve switch-to-kernel-context checking Chris Wilson
2019-03-01 14:03 ` [PATCH 09/38] drm/i915: Do a synchronous switch-to-kernel-context on idling Chris Wilson
2019-03-01 14:03 ` [PATCH 10/38] drm/i915: Store the BIT(engine->id) as the engine's mask Chris Wilson
2019-03-01 15:25   ` Tvrtko Ursulin
2019-03-01 14:03 ` [PATCH 11/38] drm/i915: Refactor common code to load initial power context Chris Wilson
2019-03-01 14:03 ` [PATCH 12/38] drm/i915: Reduce presumption of request ordering for barriers Chris Wilson
2019-03-01 14:03 ` [PATCH 13/38] drm/i915: Remove has-kernel-context Chris Wilson
2019-03-01 14:03 ` [PATCH 14/38] drm/i915: Introduce the i915_user_extension_method Chris Wilson
2019-03-01 15:39   ` Tvrtko Ursulin
2019-03-01 18:57     ` Chris Wilson
2019-03-04  8:54       ` Tvrtko Ursulin
2019-03-04  9:04         ` Chris Wilson
2019-03-04  9:35           ` Tvrtko Ursulin
2019-03-04  9:45             ` Tvrtko Ursulin
2019-03-01 14:03 ` [PATCH 15/38] drm/i915: Track active engines within a context Chris Wilson
2019-03-01 15:46   ` Tvrtko Ursulin
2019-03-01 14:03 ` [PATCH 16/38] drm/i915: Introduce a context barrier callback Chris Wilson
2019-03-01 16:12   ` Tvrtko Ursulin
2019-03-01 19:03     ` Chris Wilson
2019-03-04  8:55       ` Tvrtko Ursulin
2019-03-02 10:01     ` Chris Wilson
2019-03-01 14:03 ` [PATCH 17/38] drm/i915: Create/destroy VM (ppGTT) for use with contexts Chris Wilson
2019-03-06 11:27   ` Tvrtko Ursulin
2019-03-06 11:36     ` Chris Wilson
2019-03-01 14:03 ` [PATCH 18/38] drm/i915: Extend CONTEXT_CREATE to set parameters upon construction Chris Wilson
2019-03-01 16:36   ` Tvrtko Ursulin
2019-03-01 19:10     ` Chris Wilson
2019-03-04  8:57       ` Tvrtko Ursulin
2019-03-01 14:03 ` [PATCH 19/38] drm/i915: Allow contexts to share a single timeline across all engines Chris Wilson
2019-03-05 15:54   ` Tvrtko Ursulin
2019-03-05 16:26     ` Chris Wilson
2019-03-01 14:03 ` [PATCH 20/38] drm/i915: Allow userspace to clone contexts on creation Chris Wilson
2019-03-01 14:03 ` [PATCH 21/38] drm/i915: Fix I915_EXEC_RING_MASK Chris Wilson
2019-03-01 15:29   ` Tvrtko Ursulin
2019-03-01 14:03 ` [PATCH 22/38] drm/i915: Remove last traces of exec-id (GEM_BUSY) Chris Wilson
2019-03-01 14:03 ` [PATCH 23/38] drm/i915: Re-arrange execbuf so context is known before engine Chris Wilson
2019-03-01 15:33   ` Tvrtko Ursulin
2019-03-01 19:11     ` Chris Wilson
2019-03-01 14:03 ` [PATCH 24/38] drm/i915: Allow a context to define its set of engines Chris Wilson
2019-03-01 14:03 ` [PATCH 25/38] drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[] Chris Wilson
2019-03-01 14:03 ` [PATCH 26/38] drm/i915: Pass around the intel_context Chris Wilson
2019-03-05 16:16   ` Tvrtko Ursulin
2019-03-05 16:33     ` Chris Wilson
2019-03-05 19:23       ` Chris Wilson
2019-03-01 14:03 ` [PATCH 27/38] drm/i915: Split struct intel_context definition to its own header Chris Wilson
2019-03-05 16:19   ` Tvrtko Ursulin
2019-03-05 16:35     ` Chris Wilson
2019-03-01 14:03 ` [PATCH 28/38] drm/i915: Store the intel_context_ops in the intel_engine_cs Chris Wilson
2019-03-05 16:27   ` Tvrtko Ursulin
2019-03-05 16:45     ` Chris Wilson
2019-03-05 18:27       ` Chris Wilson
2019-03-01 14:03 ` [PATCH 29/38] drm/i915: Move over to intel_context_lookup() Chris Wilson
2019-03-05 17:01   ` Tvrtko Ursulin
2019-03-05 17:10     ` Chris Wilson
2019-03-01 14:03 ` [PATCH 30/38] drm/i915: Make context pinning part of intel_context_ops Chris Wilson
2019-03-05 17:31   ` Tvrtko Ursulin
2019-03-05 18:00     ` Chris Wilson
2019-03-01 14:03 ` [PATCH 31/38] drm/i915: Track the pinned kernel contexts on each engine Chris Wilson
2019-03-05 18:07   ` Tvrtko Ursulin
2019-03-05 18:10     ` Chris Wilson
2019-03-05 18:17       ` Tvrtko Ursulin
2019-03-05 19:26         ` Chris Wilson
2019-03-01 14:03 ` [PATCH 32/38] drm/i915: Introduce intel_context.pin_mutex for pin management Chris Wilson
2019-03-06  9:45   ` Tvrtko Ursulin
2019-03-06 10:15     ` Chris Wilson
2019-03-01 14:03 ` [PATCH 33/38] drm/i915: Load balancing across a virtual engine Chris Wilson
2019-03-01 14:04 ` [PATCH 34/38] drm/i915: Extend execution fence to support a callback Chris Wilson
2019-03-01 14:04 ` [PATCH 35/38] drm/i915/execlists: Virtual engine bonding Chris Wilson
2019-03-01 14:04 ` [PATCH 36/38] drm/i915: Allow specification of parallel execbuf Chris Wilson
2019-03-01 14:04 ` [PATCH 37/38] drm/i915/selftests: Check preemption support on each engine Chris Wilson
2019-03-06 11:29   ` Tvrtko Ursulin
2019-03-01 14:04 ` [PATCH 38/38] drm/i915/execlists: Skip direct submission if only lite-restore Chris Wilson
2019-03-01 14:15 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/38] drm/i915/execlists: Suppress redundant preemption Patchwork
2019-03-01 14:32 ` ✗ Fi.CI.SPARSE: " Patchwork
2019-03-01 14:36 ` ✓ Fi.CI.BAT: success " Patchwork
2019-03-01 18:03 ` ✗ Fi.CI.IGT: failure " Patchwork
