* The road to load balancing
@ 2019-02-06 13:03 Chris Wilson
  2019-02-06 13:03 ` [PATCH 01/46] drm/i915: Hack and slash, throttle execbuffer hogs Chris Wilson
                   ` (52 more replies)
  0 siblings, 53 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx

The target for the last several months has been the ability to submit a
request and have it run on the first idle engine. Currently userspace has
to either use the ancient BSD round-robin interface, or more intelligently
distribute its workload by querying the busy status of the system and
predicting its own requirements. This series replaces that with a late,
greedy dispatch over a predefined set of engines.
-chris



* [PATCH 01/46] drm/i915: Hack and slash, throttle execbuffer hogs
  2019-02-06 13:03 The road to load balancing Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-06 13:03 ` [PATCH 02/46] drm/i915: Revoke mmaps and prevent access to fence registers across reset Chris Wilson
                   ` (51 subsequent siblings)
  52 siblings, 0 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx

Apply backpressure to hogs that emit requests faster than the GPU can
process them by waiting for their ring to be less than half-full before
proceeding with taking the struct_mutex.

This is a gross hack to apply throttling backpressure; the long-term goal
is to remove the struct_mutex contention so that each client naturally
waits for its own resources, preferably in an asynchronous, nonblocking
fashion (pipelined operations for the win), and never blocks another
client within the driver. (Realtime priority goals would extend this to
ensuring that resource contention favours high-priority clients as well.)

This patch only limits excessive request production; it does not attempt
to throttle clients that block waiting for eviction (of either global GTT
or system memory) or for any other global resource. See above for the
long-term goal.
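
As a back-of-the-envelope illustration of the half-full heuristic, the
masked arithmetic used by __intel_ring_space() (moved into the header by
this patch) can be exercised standalone. This is a sketch only: the
4096-byte ring and the CACHELINE_BYTES value of 64 are assumptions made
for the example, not taken from the patch.

    #include <assert.h>

    #define CACHELINE_BYTES 64 /* assumed for the example */

    /* same expression as __intel_ring_space() */
    static unsigned int ring_space(unsigned int head, unsigned int tail,
                                   unsigned int size)
    {
        return (head - tail - CACHELINE_BYTES) & (size - 1);
    }

    int main(void)
    {
        /* empty ring (head == tail): one cacheline is held in reserve */
        assert(ring_space(0, 0, 4096) == 4096 - CACHELINE_BYTES);

        /*
         * Free space were we to wait upon a request whose postfix sits at
         * 256 while the ring has been filled up to 3840: 448 bytes, still
         * under half the ring, so eb_wait_for_ring() would keep walking
         * the request list looking for a later request.
         */
        assert(ring_space(256, 3840, 4096) == 448);

        return 0;
    }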

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 63 ++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.c    | 13 -----
 drivers/gpu/drm/i915/intel_ringbuffer.h    | 12 +++++
 3 files changed, 75 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 8eedf7cac493..84ef3abc567e 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -753,6 +753,64 @@ static int eb_select_context(struct i915_execbuffer *eb)
 	return 0;
 }
 
+static struct i915_request *__eb_wait_for_ring(struct intel_ring *ring)
+{
+	struct i915_request *rq;
+
+	if (intel_ring_update_space(ring) >= PAGE_SIZE)
+		return NULL;
+
+	/*
+	 * Find a request such that, after waiting upon it, there will be at
+	 * least half the ring available. The hysteresis allows us to compete
+	 * for the shared ring and should mean that we sleep less often prior
+	 * to claiming our resources, but not for so long that the ring
+	 * completely drains before we can submit our next request.
+	 */
+	list_for_each_entry(rq, &ring->request_list, ring_link) {
+		if (__intel_ring_space(rq->postfix,
+				       ring->emit, ring->size) > ring->size / 2)
+			break;
+	}
+	if (&rq->ring_link == &ring->request_list)
+		return NULL; /* weird, we will check again later for real */
+
+	return i915_request_get(rq);
+}
+
+static int eb_wait_for_ring(const struct i915_execbuffer *eb)
+{
+	const struct intel_context *ce;
+	struct i915_request *rq;
+	int ret = 0;
+
+	/*
+	 * Apply a light amount of backpressure to prevent excessive hogs
+	 * from blocking waiting for space whilst holding struct_mutex and
+	 * keeping all of their resources pinned.
+	 */
+
+	ce = to_intel_context(eb->ctx, eb->engine);
+	if (!ce->ring) /* first use, assume empty! */
+		return 0;
+
+	rq = __eb_wait_for_ring(ce->ring);
+	if (rq) {
+		mutex_unlock(&eb->i915->drm.struct_mutex);
+
+		if (i915_request_wait(rq,
+				      I915_WAIT_INTERRUPTIBLE,
+				      MAX_SCHEDULE_TIMEOUT) < 0)
+			ret = -EINTR;
+
+		i915_request_put(rq);
+
+		mutex_lock(&eb->i915->drm.struct_mutex);
+	}
+
+	return ret;
+}
+
 static int eb_lookup_vmas(struct i915_execbuffer *eb)
 {
 	struct radix_tree_root *handles_vma = &eb->ctx->handles_vma;
@@ -2291,6 +2349,10 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	if (err)
 		goto err_rpm;
 
+	err = eb_wait_for_ring(&eb); /* may temporarily drop struct_mutex */
+	if (unlikely(err))
+		goto err_unlock;
+
 	err = eb_relocate(&eb);
 	if (err) {
 		/*
@@ -2435,6 +2497,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 err_vma:
 	if (eb.exec)
 		eb_release_vmas(&eb);
+err_unlock:
 	mutex_unlock(&dev->struct_mutex);
 err_rpm:
 	intel_runtime_pm_put(eb.i915, wakeref);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index b889b27f8aeb..7f841dba87b3 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -49,19 +49,6 @@ static inline u32 intel_hws_seqno_address(struct intel_engine_cs *engine)
 		I915_GEM_HWS_INDEX_ADDR);
 }
 
-static unsigned int __intel_ring_space(unsigned int head,
-				       unsigned int tail,
-				       unsigned int size)
-{
-	/*
-	 * "If the Ring Buffer Head Pointer and the Tail Pointer are on the
-	 * same cacheline, the Head Pointer must not be greater than the Tail
-	 * Pointer."
-	 */
-	GEM_BUG_ON(!is_power_of_2(size));
-	return (head - tail - CACHELINE_BYTES) & (size - 1);
-}
-
 unsigned int intel_ring_update_space(struct intel_ring *ring)
 {
 	unsigned int space;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 4d4ea6963a72..710ffb221775 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -832,6 +832,18 @@ intel_ring_set_tail(struct intel_ring *ring, unsigned int tail)
 	return tail;
 }
 
+static inline unsigned int
+__intel_ring_space(unsigned int head, unsigned int tail, unsigned int size)
+{
+	/*
+	 * "If the Ring Buffer Head Pointer and the Tail Pointer are on the
+	 * same cacheline, the Head Pointer must not be greater than the Tail
+	 * Pointer."
+	 */
+	GEM_BUG_ON(!is_power_of_2(size));
+	return (head - tail - CACHELINE_BYTES) & (size - 1);
+}
+
 void intel_engine_write_global_seqno(struct intel_engine_cs *engine, u32 seqno);
 
 int intel_engine_setup_common(struct intel_engine_cs *engine);
-- 
2.20.1


* [PATCH 02/46] drm/i915: Revoke mmaps and prevent access to fence registers across reset
  2019-02-06 13:03 The road to load balancing Chris Wilson
  2019-02-06 13:03 ` [PATCH 01/46] drm/i915: Hack and slash, throttle execbuffer hogs Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-06 15:56   ` Mika Kuoppala
  2019-02-26 19:53   ` Rodrigo Vivi
  2019-02-06 13:03 ` [PATCH 03/46] drm/i915: Force the GPU reset upon wedging Chris Wilson
                   ` (50 subsequent siblings)
  52 siblings, 2 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx; +Cc: Mika Kuoppala

Previously, we were able to rely on the recursive properties of
struct_mutex to serialise revoking mmaps and reacquiring the FENCE
registers against them being clobbered over a global device reset.
I then proceeded to throw out the baby with the bath water in order to
pursue a struct_mutex-less reset.

Perusing LWN for alternative strategies, I found the dilemma of how to
serialise access to a global resource answered by
https://lwn.net/Articles/202847/ -- Sleepable RCU:

    int readside(void)
    {
        int idx;

        rcu_read_lock();
        if (nomoresrcu) {
            rcu_read_unlock();
            return -EINVAL;
        }
        idx = srcu_read_lock(&ss);
        rcu_read_unlock();
        /* SRCU read-side critical section. */
        srcu_read_unlock(&ss, idx);
        return 0;
    }

    void cleanup(void)
    {
        nomoresrcu = 1;
        synchronize_rcu();
        synchronize_srcu(&ss);
        cleanup_srcu_struct(&ss);
    }

No more worrying about stop_machine, just an uber-complex mutex,
optimised for reads, with the overhead pushed to the rare reset path.

However, we do run the risk of a deadlock as we allocate underneath the
SRCU read lock, and the allocation may require a GPU reset, causing a
dependency cycle via the in-flight requests. We resolve that by declaring
the driver wedged and cancelling all in-flight rendering.
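
In driver terms, the shape this takes in the patch below is roughly the
following (a condensed, non-compilable sketch of the i915_reset_trylock()
/ i915_reset_unlock() helpers and the global reset path added here, with
error handling and retries elided):

    /* reader, e.g. the GTT pagefault handler or fence_update() */
    idx = i915_reset_trylock(i915); /* waits out I915_RESET_BACKOFF, then
                                     * takes reset_backoff_srcu */
    if (idx < 0)
        return idx;
    /* ... touch the fence registers / GTT mmap ... */
    i915_reset_unlock(i915, idx);   /* srcu_read_unlock() */

    /* writer: the global reset */
    set_bit(I915_RESET_BACKOFF, &error->flags);
    synchronize_rcu_expedited();    /* make sure trylock sees the flag */
    revoke_mmaps(i915);             /* in reset_prepare() */
    synchronize_srcu(&error->reset_backoff_srcu); /* flush current readers */
    /* ... reset the GPU, restore the fence registers ... */
    clear_bit(I915_RESET_BACKOFF, &error->flags);
    wake_up_all(&error->reset_queue);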

v2: Use expedited rcu barriers to match our earlier timing
characteristics.
v3: Try to annotate locking contexts for sparse
v4: Reduce selftest lock duration to avoid a reset deadlock with fences
v5: s/srcu/reset_backoff_srcu/

Testcase: igt/gem_mmap_gtt/hang
Fixes: eb8d0f5af4ec ("drm/i915: Remove GPU reset dependence on struct_mutex")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c           |  12 +-
 drivers/gpu/drm/i915/i915_drv.h               |  18 +--
 drivers/gpu/drm/i915/i915_gem.c               |  56 +++------
 drivers/gpu/drm/i915/i915_gem_fence_reg.c     |  31 +----
 drivers/gpu/drm/i915/i915_gpu_error.h         |  12 +-
 drivers/gpu/drm/i915/i915_reset.c             | 109 +++++++++++-------
 drivers/gpu/drm/i915/i915_reset.h             |   4 +
 .../gpu/drm/i915/selftests/intel_hangcheck.c  |   5 +-
 .../gpu/drm/i915/selftests/mock_gem_device.c  |   1 +
 9 files changed, 109 insertions(+), 139 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 0bd890c04fe4..a6fd157b1637 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1281,14 +1281,11 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
 	intel_wakeref_t wakeref;
 	enum intel_engine_id id;
 
+	seq_printf(m, "Reset flags: %lx\n", dev_priv->gpu_error.flags);
 	if (test_bit(I915_WEDGED, &dev_priv->gpu_error.flags))
-		seq_puts(m, "Wedged\n");
+		seq_puts(m, "\tWedged\n");
 	if (test_bit(I915_RESET_BACKOFF, &dev_priv->gpu_error.flags))
-		seq_puts(m, "Reset in progress: struct_mutex backoff\n");
-	if (waitqueue_active(&dev_priv->gpu_error.wait_queue))
-		seq_puts(m, "Waiter holding struct mutex\n");
-	if (waitqueue_active(&dev_priv->gpu_error.reset_queue))
-		seq_puts(m, "struct_mutex blocked for reset\n");
+		seq_puts(m, "\tDevice (global) reset in progress\n");
 
 	if (!i915_modparams.enable_hangcheck) {
 		seq_puts(m, "Hangcheck disabled\n");
@@ -3885,9 +3882,6 @@ i915_wedged_set(void *data, u64 val)
 	 * while it is writing to 'i915_wedged'
 	 */
 
-	if (i915_reset_backoff(&i915->gpu_error))
-		return -EAGAIN;
-
 	i915_handle_error(i915, val, I915_ERROR_CAPTURE,
 			  "Manually set wedged engine mask = %llx", val);
 	return 0;
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index a2293152cb6a..37230ae7fbe6 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2989,7 +2989,12 @@ i915_gem_obj_finish_shmem_access(struct drm_i915_gem_object *obj)
 	i915_gem_object_unpin_pages(obj);
 }
 
-int __must_check i915_mutex_lock_interruptible(struct drm_device *dev);
+static inline int __must_check
+i915_mutex_lock_interruptible(struct drm_device *dev)
+{
+	return mutex_lock_interruptible(&dev->struct_mutex);
+}
+
 int i915_gem_dumb_create(struct drm_file *file_priv,
 			 struct drm_device *dev,
 			 struct drm_mode_create_dumb *args);
@@ -3006,21 +3011,11 @@ int __must_check i915_gem_set_global_seqno(struct drm_device *dev, u32 seqno);
 struct i915_request *
 i915_gem_find_active_request(struct intel_engine_cs *engine);
 
-static inline bool i915_reset_backoff(struct i915_gpu_error *error)
-{
-	return unlikely(test_bit(I915_RESET_BACKOFF, &error->flags));
-}
-
 static inline bool i915_terminally_wedged(struct i915_gpu_error *error)
 {
 	return unlikely(test_bit(I915_WEDGED, &error->flags));
 }
 
-static inline bool i915_reset_backoff_or_wedged(struct i915_gpu_error *error)
-{
-	return i915_reset_backoff(error) | i915_terminally_wedged(error);
-}
-
 static inline u32 i915_reset_count(struct i915_gpu_error *error)
 {
 	return READ_ONCE(error->reset_count);
@@ -3093,7 +3088,6 @@ struct drm_i915_fence_reg *
 i915_reserve_fence(struct drm_i915_private *dev_priv);
 void i915_unreserve_fence(struct drm_i915_fence_reg *fence);
 
-void i915_gem_revoke_fences(struct drm_i915_private *dev_priv);
 void i915_gem_restore_fences(struct drm_i915_private *dev_priv);
 
 void i915_gem_detect_bit_6_swizzle(struct drm_i915_private *dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 05ce9176ac4e..1eb3a5f8654c 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -100,47 +100,6 @@ static void i915_gem_info_remove_obj(struct drm_i915_private *dev_priv,
 	spin_unlock(&dev_priv->mm.object_stat_lock);
 }
 
-static int
-i915_gem_wait_for_error(struct i915_gpu_error *error)
-{
-	int ret;
-
-	might_sleep();
-
-	/*
-	 * Only wait 10 seconds for the gpu reset to complete to avoid hanging
-	 * userspace. If it takes that long something really bad is going on and
-	 * we should simply try to bail out and fail as gracefully as possible.
-	 */
-	ret = wait_event_interruptible_timeout(error->reset_queue,
-					       !i915_reset_backoff(error),
-					       I915_RESET_TIMEOUT);
-	if (ret == 0) {
-		DRM_ERROR("Timed out waiting for the gpu reset to complete\n");
-		return -EIO;
-	} else if (ret < 0) {
-		return ret;
-	} else {
-		return 0;
-	}
-}
-
-int i915_mutex_lock_interruptible(struct drm_device *dev)
-{
-	struct drm_i915_private *dev_priv = to_i915(dev);
-	int ret;
-
-	ret = i915_gem_wait_for_error(&dev_priv->gpu_error);
-	if (ret)
-		return ret;
-
-	ret = mutex_lock_interruptible(&dev->struct_mutex);
-	if (ret)
-		return ret;
-
-	return 0;
-}
-
 static u32 __i915_gem_park(struct drm_i915_private *i915)
 {
 	intel_wakeref_t wakeref;
@@ -1869,6 +1828,7 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
 	intel_wakeref_t wakeref;
 	struct i915_vma *vma;
 	pgoff_t page_offset;
+	int srcu;
 	int ret;
 
 	/* Sanity check that we allow writing into this object */
@@ -1908,7 +1868,6 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
 		goto err_unlock;
 	}
 
-
 	/* Now pin it into the GTT as needed */
 	vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
 				       PIN_MAPPABLE |
@@ -1946,9 +1905,15 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
 	if (ret)
 		goto err_unpin;
 
+	srcu = i915_reset_trylock(dev_priv);
+	if (srcu < 0) {
+		ret = srcu;
+		goto err_unpin;
+	}
+
 	ret = i915_vma_pin_fence(vma);
 	if (ret)
-		goto err_unpin;
+		goto err_reset;
 
 	/* Finally, remap it using the new GTT offset */
 	ret = remap_io_mapping(area,
@@ -1969,6 +1934,8 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
 
 err_fence:
 	i915_vma_unpin_fence(vma);
+err_reset:
+	i915_reset_unlock(dev_priv, srcu);
 err_unpin:
 	__i915_vma_unpin(vma);
 err_unlock:
@@ -5326,6 +5293,7 @@ int i915_gem_init_early(struct drm_i915_private *dev_priv)
 	init_waitqueue_head(&dev_priv->gpu_error.wait_queue);
 	init_waitqueue_head(&dev_priv->gpu_error.reset_queue);
 	mutex_init(&dev_priv->gpu_error.wedge_mutex);
+	init_srcu_struct(&dev_priv->gpu_error.reset_backoff_srcu);
 
 	atomic_set(&dev_priv->mm.bsd_engine_dispatch_index, 0);
 
@@ -5358,6 +5326,8 @@ void i915_gem_cleanup_early(struct drm_i915_private *dev_priv)
 	GEM_BUG_ON(atomic_read(&dev_priv->mm.free_count));
 	WARN_ON(dev_priv->mm.object_count);
 
+	cleanup_srcu_struct(&dev_priv->gpu_error.reset_backoff_srcu);
+
 	kmem_cache_destroy(dev_priv->priorities);
 	kmem_cache_destroy(dev_priv->dependencies);
 	kmem_cache_destroy(dev_priv->requests);
diff --git a/drivers/gpu/drm/i915/i915_gem_fence_reg.c b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
index e037e94792f3..36d548fa3aa2 100644
--- a/drivers/gpu/drm/i915/i915_gem_fence_reg.c
+++ b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
@@ -240,6 +240,10 @@ static int fence_update(struct drm_i915_fence_reg *fence,
 		i915_vma_flush_writes(old);
 	}
 
+	ret = i915_reset_trylock(fence->i915);
+	if (ret < 0)
+		return ret;
+
 	if (fence->vma && fence->vma != vma) {
 		/* Ensure that all userspace CPU access is completed before
 		 * stealing the fence.
@@ -272,6 +276,7 @@ static int fence_update(struct drm_i915_fence_reg *fence,
 		list_move_tail(&fence->link, &fence->i915->mm.fence_list);
 	}
 
+	i915_reset_unlock(fence->i915, ret);
 	return 0;
 }
 
@@ -435,32 +440,6 @@ void i915_unreserve_fence(struct drm_i915_fence_reg *fence)
 	list_add(&fence->link, &fence->i915->mm.fence_list);
 }
 
-/**
- * i915_gem_revoke_fences - revoke fence state
- * @dev_priv: i915 device private
- *
- * Removes all GTT mmappings via the fence registers. This forces any user
- * of the fence to reacquire that fence before continuing with their access.
- * One use is during GPU reset where the fence register is lost and we need to
- * revoke concurrent userspace access via GTT mmaps until the hardware has been
- * reset and the fence registers have been restored.
- */
-void i915_gem_revoke_fences(struct drm_i915_private *dev_priv)
-{
-	int i;
-
-	lockdep_assert_held(&dev_priv->drm.struct_mutex);
-
-	for (i = 0; i < dev_priv->num_fence_regs; i++) {
-		struct drm_i915_fence_reg *fence = &dev_priv->fence_regs[i];
-
-		GEM_BUG_ON(fence->vma && fence->vma->fence != fence);
-
-		if (fence->vma)
-			i915_vma_revoke_mmap(fence->vma);
-	}
-}
-
 /**
  * i915_gem_restore_fences - restore fence state
  * @dev_priv: i915 device private
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
index 53b1f22dd365..d5c58e82508b 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.h
+++ b/drivers/gpu/drm/i915/i915_gpu_error.h
@@ -231,12 +231,10 @@ struct i915_gpu_error {
 	/**
 	 * flags: Control various stages of the GPU reset
 	 *
-	 * #I915_RESET_BACKOFF - When we start a reset, we want to stop any
-	 * other users acquiring the struct_mutex. To do this we set the
-	 * #I915_RESET_BACKOFF bit in the error flags when we detect a reset
-	 * and then check for that bit before acquiring the struct_mutex (in
-	 * i915_mutex_lock_interruptible()?). I915_RESET_BACKOFF serves a
-	 * secondary role in preventing two concurrent global reset attempts.
+	 * #I915_RESET_BACKOFF - When we start a global reset, we need to
+	 * serialise with any other users attempting to do the same, and
+	 * any global resources that may be clobbered by the reset (such as
+	 * FENCE registers).
 	 *
 	 * #I915_RESET_ENGINE[num_engines] - Since the driver doesn't need to
 	 * acquire the struct_mutex to reset an engine, we need an explicit
@@ -272,6 +270,8 @@ struct i915_gpu_error {
 	 */
 	wait_queue_head_t reset_queue;
 
+	struct srcu_struct reset_backoff_srcu;
+
 	struct i915_gpu_restart *restart;
 };
 
diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
index 0e0ddf2e6815..272d00d4b8a3 100644
--- a/drivers/gpu/drm/i915/i915_reset.c
+++ b/drivers/gpu/drm/i915/i915_reset.c
@@ -639,6 +639,31 @@ static void reset_prepare_engine(struct intel_engine_cs *engine)
 	engine->reset.prepare(engine);
 }
 
+static void revoke_mmaps(struct drm_i915_private *i915)
+{
+	int i;
+
+	for (i = 0; i < i915->num_fence_regs; i++) {
+		struct i915_vma *vma = i915->fence_regs[i].vma;
+		struct drm_vma_offset_node *node;
+		u64 vma_offset;
+
+		if (!vma)
+			continue;
+
+		GEM_BUG_ON(vma->fence != &i915->fence_regs[i]);
+		if (!i915_vma_has_userfault(vma))
+			continue;
+
+		node = &vma->obj->base.vma_node;
+		vma_offset = vma->ggtt_view.partial.offset << PAGE_SHIFT;
+		unmap_mapping_range(i915->drm.anon_inode->i_mapping,
+				    drm_vma_node_offset_addr(node) + vma_offset,
+				    vma->size,
+				    1);
+	}
+}
+
 static void reset_prepare(struct drm_i915_private *i915)
 {
 	struct intel_engine_cs *engine;
@@ -648,6 +673,7 @@ static void reset_prepare(struct drm_i915_private *i915)
 		reset_prepare_engine(engine);
 
 	intel_uc_sanitize(i915);
+	revoke_mmaps(i915);
 }
 
 static int gt_reset(struct drm_i915_private *i915, unsigned int stalled_mask)
@@ -911,50 +937,22 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
 	return ret;
 }
 
-struct __i915_reset {
-	struct drm_i915_private *i915;
-	unsigned int stalled_mask;
-};
-
-static int __i915_reset__BKL(void *data)
-{
-	struct __i915_reset *arg = data;
-	int err;
-
-	err = intel_gpu_reset(arg->i915, ALL_ENGINES);
-	if (err)
-		return err;
-
-	return gt_reset(arg->i915, arg->stalled_mask);
-}
-
-#if RESET_UNDER_STOP_MACHINE
-/*
- * XXX An alternative to using stop_machine would be to park only the
- * processes that have a GGTT mmap. By remote parking the threads (SIGSTOP)
- * we should be able to prevent their memmory accesses via the lost fence
- * registers over the course of the reset without the potential recursive
- * of mutexes between the pagefault handler and reset.
- *
- * See igt/gem_mmap_gtt/hang
- */
-#define __do_reset(fn, arg) stop_machine(fn, arg, NULL)
-#else
-#define __do_reset(fn, arg) fn(arg)
-#endif
-
 static int do_reset(struct drm_i915_private *i915, unsigned int stalled_mask)
 {
-	struct __i915_reset arg = { i915, stalled_mask };
 	int err, i;
 
-	err = __do_reset(__i915_reset__BKL, &arg);
+	/* Flush everyone currently using a resource about to be clobbered */
+	synchronize_srcu(&i915->gpu_error.reset_backoff_srcu);
+
+	err = intel_gpu_reset(i915, ALL_ENGINES);
 	for (i = 0; err && i < RESET_MAX_RETRIES; i++) {
-		msleep(100);
-		err = __do_reset(__i915_reset__BKL, &arg);
+		msleep(10 * (i + 1));
+		err = intel_gpu_reset(i915, ALL_ENGINES);
 	}
+	if (err)
+		return err;
 
-	return err;
+	return gt_reset(i915, stalled_mask);
 }
 
 /**
@@ -966,8 +964,6 @@ static int do_reset(struct drm_i915_private *i915, unsigned int stalled_mask)
  * Reset the chip.  Useful if a hang is detected. Marks the device as wedged
  * on failure.
  *
- * Caller must hold the struct_mutex.
- *
  * Procedure is fairly simple:
  *   - reset the chip using the reset reg
  *   - re-init context state
@@ -1274,9 +1270,12 @@ void i915_handle_error(struct drm_i915_private *i915,
 		wait_event(i915->gpu_error.reset_queue,
 			   !test_bit(I915_RESET_BACKOFF,
 				     &i915->gpu_error.flags));
-		goto out;
+		goto out; /* piggy-back on the other reset */
 	}
 
+	/* Make sure i915_reset_trylock() sees the I915_RESET_BACKOFF */
+	synchronize_rcu_expedited();
+
 	/* Prevent any other reset-engine attempt. */
 	for_each_engine(engine, i915, tmp) {
 		while (test_and_set_bit(I915_RESET_ENGINE + engine->id,
@@ -1300,6 +1299,36 @@ void i915_handle_error(struct drm_i915_private *i915,
 	intel_runtime_pm_put(i915, wakeref);
 }
 
+int i915_reset_trylock(struct drm_i915_private *i915)
+{
+	struct i915_gpu_error *error = &i915->gpu_error;
+	int srcu;
+
+	rcu_read_lock();
+	while (test_bit(I915_RESET_BACKOFF, &error->flags)) {
+		rcu_read_unlock();
+
+		if (wait_event_interruptible(error->reset_queue,
+					     !test_bit(I915_RESET_BACKOFF,
+						       &error->flags)))
+			return -EINTR;
+
+		rcu_read_lock();
+	}
+	srcu = srcu_read_lock(&error->reset_backoff_srcu);
+	rcu_read_unlock();
+
+	return srcu;
+}
+
+void i915_reset_unlock(struct drm_i915_private *i915, int tag)
+__releases(&i915->gpu_error.reset_backoff_srcu)
+{
+	struct i915_gpu_error *error = &i915->gpu_error;
+
+	srcu_read_unlock(&error->reset_backoff_srcu, tag);
+}
+
 bool i915_reset_flush(struct drm_i915_private *i915)
 {
 	int err;
diff --git a/drivers/gpu/drm/i915/i915_reset.h b/drivers/gpu/drm/i915/i915_reset.h
index f2d347f319df..893c5d1c2eb8 100644
--- a/drivers/gpu/drm/i915/i915_reset.h
+++ b/drivers/gpu/drm/i915/i915_reset.h
@@ -9,6 +9,7 @@
 
 #include <linux/compiler.h>
 #include <linux/types.h>
+#include <linux/srcu.h>
 
 struct drm_i915_private;
 struct intel_engine_cs;
@@ -32,6 +33,9 @@ int i915_reset_engine(struct intel_engine_cs *engine,
 void i915_reset_request(struct i915_request *rq, bool guilty);
 bool i915_reset_flush(struct drm_i915_private *i915);
 
+int __must_check i915_reset_trylock(struct drm_i915_private *i915);
+void i915_reset_unlock(struct drm_i915_private *i915, int tag);
+
 bool intel_has_gpu_reset(struct drm_i915_private *i915);
 bool intel_has_reset_engine(struct drm_i915_private *i915);
 
diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
index 7b6f3bea9ef8..4886fac12628 100644
--- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
@@ -1039,8 +1039,6 @@ static int __igt_reset_evict_vma(struct drm_i915_private *i915,
 
 	/* Check that we can recover an unbind stuck on a hanging request */
 
-	igt_global_reset_lock(i915);
-
 	mutex_lock(&i915->drm.struct_mutex);
 	err = hang_init(&h, i915);
 	if (err)
@@ -1138,7 +1136,9 @@ static int __igt_reset_evict_vma(struct drm_i915_private *i915,
 	}
 
 out_reset:
+	igt_global_reset_lock(i915);
 	fake_hangcheck(rq->i915, intel_engine_flag(rq->engine));
+	igt_global_reset_unlock(i915);
 
 	if (tsk) {
 		struct igt_wedge_me w;
@@ -1159,7 +1159,6 @@ static int __igt_reset_evict_vma(struct drm_i915_private *i915,
 	hang_fini(&h);
 unlock:
 	mutex_unlock(&i915->drm.struct_mutex);
-	igt_global_reset_unlock(i915);
 
 	if (i915_terminally_wedged(&i915->gpu_error))
 		return -EIO;
diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
index 14ae46fda49f..fc516a2970f4 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
@@ -189,6 +189,7 @@ struct drm_i915_private *mock_gem_device(void)
 
 	init_waitqueue_head(&i915->gpu_error.wait_queue);
 	init_waitqueue_head(&i915->gpu_error.reset_queue);
+	init_srcu_struct(&i915->gpu_error.reset_backoff_srcu);
 	mutex_init(&i915->gpu_error.wedge_mutex);
 
 	i915->wq = alloc_ordered_workqueue("mock", 0);
-- 
2.20.1


* [PATCH 03/46] drm/i915: Force the GPU reset upon wedging
  2019-02-06 13:03 The road to load balancing Chris Wilson
  2019-02-06 13:03 ` [PATCH 01/46] drm/i915: Hack and slash, throttle execbuffer hogs Chris Wilson
  2019-02-06 13:03 ` [PATCH 02/46] drm/i915: Revoke mmaps and prevent access to fence registers across reset Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-06 13:03 ` [PATCH 04/46] drm/i915: Uninterruptibly drain the timelines on unwedging Chris Wilson
                   ` (49 subsequent siblings)
  52 siblings, 0 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx; +Cc: Mika Kuoppala

When declaring the GPU wedged, we do need to hit the GPU with the reset
hammer so that its state matches our presumed state during cleanup. If
the reset fails, it fails, and we may be unhappy but wedged. However, if
we are testing our wedge/unwedge handling, the desync carries over into
the next test and promptly explodes.

References: https://bugs.freedesktop.org/show_bug.cgi?id=106702
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_reset.c | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
index 272d00d4b8a3..14970177ac11 100644
--- a/drivers/gpu/drm/i915/i915_reset.c
+++ b/drivers/gpu/drm/i915/i915_reset.c
@@ -532,9 +532,6 @@ typedef int (*reset_func)(struct drm_i915_private *,
 
 static reset_func intel_get_gpu_reset(struct drm_i915_private *i915)
 {
-	if (!i915_modparams.reset)
-		return NULL;
-
 	if (INTEL_GEN(i915) >= 8)
 		return gen8_reset_engines;
 	else if (INTEL_GEN(i915) >= 6)
@@ -599,6 +596,9 @@ bool intel_has_gpu_reset(struct drm_i915_private *i915)
 	if (USES_GUC(i915))
 		return false;
 
+	if (!i915_modparams.reset)
+		return false;
+
 	return intel_get_gpu_reset(i915);
 }
 
@@ -823,7 +823,7 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
 		reset_prepare_engine(engine);
 
 	/* Even if the GPU reset fails, it should still stop the engines */
-	if (INTEL_GEN(i915) >= 5)
+	if (!INTEL_INFO(i915)->gpu_reset_clobbers_display)
 		intel_gpu_reset(i915, ALL_ENGINES);
 
 	for_each_engine(engine, i915, id) {
-- 
2.20.1


* [PATCH 04/46] drm/i915: Uninterruptibly drain the timelines on unwedging
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (2 preceding siblings ...)
  2019-02-06 13:03 ` [PATCH 03/46] drm/i915: Force the GPU reset upon wedging Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-06 13:03 ` [PATCH 05/46] drm/i915: Wait for old resets before applying debugfs/i915_wedged Chris Wilson
                   ` (48 subsequent siblings)
  52 siblings, 0 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx

On wedging, we mark all executing requests as complete and all pending
requests as completed as soon as they are ready. Before unwedging, though,
we wish to flush those pending requests prior to restoring default
execution, and so we must wait. Do so uninterruptibly, as we do not
propagate the EINTR gracefully back to userspace in this case but would
persist in the permanently wedged state without restarting the syscall.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_reset.c | 28 ++++++++--------------------
 1 file changed, 8 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
index 14970177ac11..1d3bec0ff990 100644
--- a/drivers/gpu/drm/i915/i915_reset.c
+++ b/drivers/gpu/drm/i915/i915_reset.c
@@ -861,7 +861,6 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
 {
 	struct i915_gpu_error *error = &i915->gpu_error;
 	struct i915_timeline *tl;
-	bool ret = false;
 
 	if (!test_bit(I915_WEDGED, &error->flags))
 		return true;
@@ -886,30 +885,20 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
 	mutex_lock(&i915->gt.timelines.mutex);
 	list_for_each_entry(tl, &i915->gt.timelines.active_list, link) {
 		struct i915_request *rq;
-		long timeout;
 
 		rq = i915_active_request_get_unlocked(&tl->last_request);
 		if (!rq)
 			continue;
 
 		/*
-		 * We can't use our normal waiter as we want to
-		 * avoid recursively trying to handle the current
-		 * reset. The basic dma_fence_default_wait() installs
-		 * a callback for dma_fence_signal(), which is
-		 * triggered by our nop handler (indirectly, the
-		 * callback enables the signaler thread which is
-		 * woken by the nop_submit_request() advancing the seqno
-		 * and when the seqno passes the fence, the signaler
-		 * then signals the fence waking us up).
+		 * All internal dependencies (i915_requests) will have
+		 * been flushed by the set-wedge, but we may be stuck waiting
+		 * for external fences. These should all be capped to 10s
+		 * (I915_FENCE_TIMEOUT) so this wait should not be unbounded
+		 * in the worst case.
 		 */
-		timeout = dma_fence_default_wait(&rq->fence, true,
-						 MAX_SCHEDULE_TIMEOUT);
+		dma_fence_default_wait(&rq->fence, false, MAX_SCHEDULE_TIMEOUT);
 		i915_request_put(rq);
-		if (timeout < 0) {
-			mutex_unlock(&i915->gt.timelines.mutex);
-			goto unlock;
-		}
 	}
 	mutex_unlock(&i915->gt.timelines.mutex);
 
@@ -930,11 +919,10 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
 
 	smp_mb__before_atomic(); /* complete takeover before enabling execbuf */
 	clear_bit(I915_WEDGED, &i915->gpu_error.flags);
-	ret = true;
-unlock:
+
 	mutex_unlock(&i915->gpu_error.wedge_mutex);
 
-	return ret;
+	return true;
 }
 
 static int do_reset(struct drm_i915_private *i915, unsigned int stalled_mask)
-- 
2.20.1


* [PATCH 05/46] drm/i915: Wait for old resets before applying debugfs/i915_wedged
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (3 preceding siblings ...)
  2019-02-06 13:03 ` [PATCH 04/46] drm/i915: Uninterruptibly drain the timelines on unwedging Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-06 13:03 ` [PATCH 06/46] drm/i915: Serialise resets with wedging Chris Wilson
                   ` (47 subsequent siblings)
  52 siblings, 0 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx

Since we use debugfs to recover the device after modifying the
i915.reset parameter, we need to be sure that we apply a fresh reset, and
do not piggy-back onto a concurrent one, in order for the parameter to
take effect.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index a6fd157b1637..8a488ffc8b7d 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -3874,13 +3874,9 @@ i915_wedged_set(void *data, u64 val)
 {
 	struct drm_i915_private *i915 = data;
 
-	/*
-	 * There is no safeguard against this debugfs entry colliding
-	 * with the hangcheck calling same i915_handle_error() in
-	 * parallel, causing an explosion. For now we assume that the
-	 * test harness is responsible enough not to inject gpu hangs
-	 * while it is writing to 'i915_wedged'
-	 */
+	/* Flush any previous reset before applying for a new one */
+	wait_event(i915->gpu_error.reset_queue,
+		   !test_bit(I915_RESET_BACKOFF, &i915->gpu_error.flags));
 
 	i915_handle_error(i915, val, I915_ERROR_CAPTURE,
 			  "Manually set wedged engine mask = %llx", val);
-- 
2.20.1


* [PATCH 06/46] drm/i915: Serialise resets with wedging
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (4 preceding siblings ...)
  2019-02-06 13:03 ` [PATCH 05/46] drm/i915: Wait for old resets before applying debugfs/i915_wedged Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-06 13:03 ` [PATCH 07/46] drm/i915: Don't claim an unstarted request was guilty Chris Wilson
                   ` (46 subsequent siblings)
  52 siblings, 0 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx

Prevent a concurrent set-wedge from racing with an ongoing reset (and
vice versa) by taking the same wedge_mutex around both operations.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_reset.c | 68 ++++++++++++++++++-------------
 1 file changed, 40 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
index 1d3bec0ff990..b629f25a81f0 100644
--- a/drivers/gpu/drm/i915/i915_reset.c
+++ b/drivers/gpu/drm/i915/i915_reset.c
@@ -793,17 +793,14 @@ static void nop_submit_request(struct i915_request *request)
 	intel_engine_queue_breadcrumbs(engine);
 }
 
-void i915_gem_set_wedged(struct drm_i915_private *i915)
+static void __i915_gem_set_wedged(struct drm_i915_private *i915)
 {
 	struct i915_gpu_error *error = &i915->gpu_error;
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
 
-	mutex_lock(&error->wedge_mutex);
-	if (test_bit(I915_WEDGED, &error->flags)) {
-		mutex_unlock(&error->wedge_mutex);
+	if (test_bit(I915_WEDGED, &error->flags))
 		return;
-	}
 
 	if (GEM_SHOW_DEBUG() && !intel_engines_are_idle(i915)) {
 		struct drm_printer p = drm_debug_printer(__func__);
@@ -852,12 +849,18 @@ void i915_gem_set_wedged(struct drm_i915_private *i915)
 	set_bit(I915_WEDGED, &error->flags);
 
 	GEM_TRACE("end\n");
-	mutex_unlock(&error->wedge_mutex);
+}
 
-	wake_up_all(&error->reset_queue);
+void i915_gem_set_wedged(struct drm_i915_private *i915)
+{
+	struct i915_gpu_error *error = &i915->gpu_error;
+
+	mutex_lock(&error->wedge_mutex);
+	__i915_gem_set_wedged(i915);
+	mutex_unlock(&error->wedge_mutex);
 }
 
-bool i915_gem_unset_wedged(struct drm_i915_private *i915)
+static bool __i915_gem_unset_wedged(struct drm_i915_private *i915)
 {
 	struct i915_gpu_error *error = &i915->gpu_error;
 	struct i915_timeline *tl;
@@ -868,8 +871,6 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
 	if (!i915->gt.scratch) /* Never full initialised, recovery impossible */
 		return false;
 
-	mutex_lock(&error->wedge_mutex);
-
 	GEM_TRACE("start\n");
 
 	/*
@@ -920,11 +921,21 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
 	smp_mb__before_atomic(); /* complete takeover before enabling execbuf */
 	clear_bit(I915_WEDGED, &i915->gpu_error.flags);
 
-	mutex_unlock(&i915->gpu_error.wedge_mutex);
-
 	return true;
 }
 
+bool i915_gem_unset_wedged(struct drm_i915_private *i915)
+{
+	struct i915_gpu_error *error = &i915->gpu_error;
+	bool result;
+
+	mutex_lock(&error->wedge_mutex);
+	result = __i915_gem_unset_wedged(i915);
+	mutex_unlock(&error->wedge_mutex);
+
+	return result;
+}
+
 static int do_reset(struct drm_i915_private *i915, unsigned int stalled_mask)
 {
 	int err, i;
@@ -974,7 +985,7 @@ void i915_reset(struct drm_i915_private *i915,
 	GEM_BUG_ON(!test_bit(I915_RESET_BACKOFF, &error->flags));
 
 	/* Clear any previous failed attempts at recovery. Time to try again. */
-	if (!i915_gem_unset_wedged(i915))
+	if (!__i915_gem_unset_wedged(i915))
 		return;
 
 	if (reason)
@@ -1036,7 +1047,7 @@ void i915_reset(struct drm_i915_private *i915,
 	 */
 	add_taint(TAINT_WARN, LOCKDEP_STILL_OK);
 error:
-	i915_gem_set_wedged(i915);
+	__i915_gem_set_wedged(i915);
 	goto finish;
 }
 
@@ -1128,7 +1139,9 @@ static void i915_reset_device(struct drm_i915_private *i915,
 	i915_wedge_on_timeout(&w, i915, 5 * HZ) {
 		intel_prepare_reset(i915);
 
+		mutex_lock(&error->wedge_mutex);
 		i915_reset(i915, engine_mask, reason);
+		mutex_unlock(&error->wedge_mutex);
 
 		intel_finish_reset(i915);
 	}
@@ -1196,6 +1209,7 @@ void i915_handle_error(struct drm_i915_private *i915,
 		       unsigned long flags,
 		       const char *fmt, ...)
 {
+	struct i915_gpu_error *error = &i915->gpu_error;
 	struct intel_engine_cs *engine;
 	intel_wakeref_t wakeref;
 	unsigned int tmp;
@@ -1232,20 +1246,19 @@ void i915_handle_error(struct drm_i915_private *i915,
 	 * Try engine reset when available. We fall back to full reset if
 	 * single reset fails.
 	 */
-	if (intel_has_reset_engine(i915) &&
-	    !i915_terminally_wedged(&i915->gpu_error)) {
+	if (intel_has_reset_engine(i915) && !i915_terminally_wedged(error)) {
 		for_each_engine_masked(engine, i915, engine_mask, tmp) {
 			BUILD_BUG_ON(I915_RESET_MODESET >= I915_RESET_ENGINE);
 			if (test_and_set_bit(I915_RESET_ENGINE + engine->id,
-					     &i915->gpu_error.flags))
+					     &error->flags))
 				continue;
 
 			if (i915_reset_engine(engine, msg) == 0)
 				engine_mask &= ~intel_engine_flag(engine);
 
 			clear_bit(I915_RESET_ENGINE + engine->id,
-				  &i915->gpu_error.flags);
-			wake_up_bit(&i915->gpu_error.flags,
+				  &error->flags);
+			wake_up_bit(&error->flags,
 				    I915_RESET_ENGINE + engine->id);
 		}
 	}
@@ -1254,10 +1267,9 @@ void i915_handle_error(struct drm_i915_private *i915,
 		goto out;
 
 	/* Full reset needs the mutex, stop any other user trying to do so. */
-	if (test_and_set_bit(I915_RESET_BACKOFF, &i915->gpu_error.flags)) {
-		wait_event(i915->gpu_error.reset_queue,
-			   !test_bit(I915_RESET_BACKOFF,
-				     &i915->gpu_error.flags));
+	if (test_and_set_bit(I915_RESET_BACKOFF, &error->flags)) {
+		wait_event(error->reset_queue,
+			   !test_bit(I915_RESET_BACKOFF, &error->flags));
 		goto out; /* piggy-back on the other reset */
 	}
 
@@ -1267,8 +1279,8 @@ void i915_handle_error(struct drm_i915_private *i915,
 	/* Prevent any other reset-engine attempt. */
 	for_each_engine(engine, i915, tmp) {
 		while (test_and_set_bit(I915_RESET_ENGINE + engine->id,
-					&i915->gpu_error.flags))
-			wait_on_bit(&i915->gpu_error.flags,
+					&error->flags))
+			wait_on_bit(&error->flags,
 				    I915_RESET_ENGINE + engine->id,
 				    TASK_UNINTERRUPTIBLE);
 	}
@@ -1277,11 +1289,11 @@ void i915_handle_error(struct drm_i915_private *i915,
 
 	for_each_engine(engine, i915, tmp) {
 		clear_bit(I915_RESET_ENGINE + engine->id,
-			  &i915->gpu_error.flags);
+			  &error->flags);
 	}
 
-	clear_bit(I915_RESET_BACKOFF, &i915->gpu_error.flags);
-	wake_up_all(&i915->gpu_error.reset_queue);
+	clear_bit(I915_RESET_BACKOFF, &error->flags);
+	wake_up_all(&error->reset_queue);
 
 out:
 	intel_runtime_pm_put(i915, wakeref);
-- 
2.20.1


* [PATCH 07/46] drm/i915: Don't claim an unstarted request was guilty
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (5 preceding siblings ...)
  2019-02-06 13:03 ` [PATCH 06/46] drm/i915: Serialise resets with wedging Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-06 13:03 ` [PATCH 08/46] drm/i915/execlists: Suppress mere WAIT preemption Chris Wilson
                   ` (45 subsequent siblings)
  52 siblings, 0 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx

If we haven't even begun executing the payload of the stalled request,
then we should not claim that its userspace context was guilty of
submitting a hanging batch.

v2: Check for context corruption before trying to restart.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/intel_lrc.c              | 34 ++++++++++++++++++-
 drivers/gpu/drm/i915/selftests/igt_spinner.c  |  9 ++++-
 .../gpu/drm/i915/selftests/intel_hangcheck.c  |  6 ++++
 3 files changed, 47 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 5e98fd79bd9d..5d5ce91a5dfa 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1878,6 +1878,23 @@ static void execlists_reset_prepare(struct intel_engine_cs *engine)
 	spin_unlock_irqrestore(&engine->timeline.lock, flags);
 }
 
+static bool lrc_regs_ok(const struct i915_request *rq)
+{
+	const struct intel_ring *ring = rq->ring;
+	const u32 *regs = rq->hw_context->lrc_reg_state;
+
+	/* Quick spot check for the common signs of context corruption */
+
+	if (regs[CTX_RING_BUFFER_CONTROL + 1] !=
+	    (RING_CTL_SIZE(ring->size) | RING_VALID))
+		return false;
+
+	if (regs[CTX_RING_BUFFER_START + 1] != i915_ggtt_offset(ring->vma))
+		return false;
+
+	return true;
+}
+
 static void execlists_reset(struct intel_engine_cs *engine, bool stalled)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
@@ -1912,6 +1929,21 @@ static void execlists_reset(struct intel_engine_cs *engine, bool stalled)
 	if (!rq)
 		goto out_unlock;
 
+	/*
+	 * If this request hasn't started yet, e.g. it is waiting on a
+	 * semaphore, we need to avoid skipping the request or else we
+	 * break the signaling chain. However, if the context is corrupt
+	 * the request will not restart and we will be stuck with a wedged
+	 * device. It is quite often the case that if we issue a reset
+	 * while the GPU is loading the context image, that context image
+	 * becomes corrupt.
+	 *
+	 * Otherwise, if we have not started yet, the request should replay
+	 * perfectly and we do not need to flag the result as being erroneous.
+	 */
+	if (!i915_request_started(rq) && lrc_regs_ok(rq))
+		goto out_unlock;
+
 	/*
 	 * If the request was innocent, we leave the request in the ELSP
 	 * and will try to replay it on restarting. The context image may
@@ -1924,7 +1956,7 @@ static void execlists_reset(struct intel_engine_cs *engine, bool stalled)
 	 * image back to the expected values to skip over the guilty request.
 	 */
 	i915_reset_request(rq, stalled);
-	if (!stalled)
+	if (!stalled && lrc_regs_ok(rq))
 		goto out_unlock;
 
 	/*
diff --git a/drivers/gpu/drm/i915/selftests/igt_spinner.c b/drivers/gpu/drm/i915/selftests/igt_spinner.c
index 9ebd9225684e..86354e51bdd3 100644
--- a/drivers/gpu/drm/i915/selftests/igt_spinner.c
+++ b/drivers/gpu/drm/i915/selftests/igt_spinner.c
@@ -142,10 +142,17 @@ igt_spinner_create_request(struct igt_spinner *spin,
 	*batch++ = upper_32_bits(vma->node.start);
 	*batch++ = MI_BATCH_BUFFER_END; /* not reached */
 
-	i915_gem_chipset_flush(spin->i915);
+	if (engine->emit_init_breadcrumb &&
+	    rq->timeline->has_initial_breadcrumb) {
+		err = engine->emit_init_breadcrumb(rq);
+		if (err)
+			goto cancel_rq;
+	}
 
 	err = engine->emit_bb_start(rq, vma->node.start, PAGE_SIZE, 0);
 
+	i915_gem_chipset_flush(spin->i915);
+
 cancel_rq:
 	if (err) {
 		i915_request_skip(rq, err);
diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
index 4886fac12628..36c17bfe05a7 100644
--- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
@@ -246,6 +246,12 @@ hang_create_request(struct hang *h, struct intel_engine_cs *engine)
 	if (INTEL_GEN(vm->i915) <= 5)
 		flags |= I915_DISPATCH_SECURE;
 
+	if (rq->engine->emit_init_breadcrumb) {
+		err = rq->engine->emit_init_breadcrumb(rq);
+		if (err)
+			goto cancel_rq;
+	}
+
 	err = rq->engine->emit_bb_start(rq, vma->node.start, PAGE_SIZE, flags);
 
 cancel_rq:
-- 
2.20.1


* [PATCH 08/46] drm/i915/execlists: Suppress mere WAIT preemption
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (6 preceding siblings ...)
  2019-02-06 13:03 ` [PATCH 07/46] drm/i915: Don't claim an unstarted request was guilty Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-11 11:19   ` Tvrtko Ursulin
  2019-02-19 10:22   ` Matthew Auld
  2019-02-06 13:03 ` [PATCH 09/46] drm/i915/execlists: Suppress redundant preemption Chris Wilson
                   ` (44 subsequent siblings)
  52 siblings, 2 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx

WAIT is occasionally suppressed by virtue of preempted requests being
promoted to NEWCLIENT if they have not already received that boost. Make
this consistent for all WAIT boosts: they are not allowed to preempt
executing contexts and are merely granted the right to be at the front of
the queue for the next execution slot. This is in keeping with the desire
that the WAIT boost be a minor tweak that does not give excessive
promotion to its user nor open ourselves up to trivial abuse.

The problem with the inconsistent WAIT preemption becomes more apparent
as the preemption is propagated across the engines, where one engine may
preempt and the other not, while we may be relying on the exact execution
order being consistent across engines (e.g. using HW semaphores to
coordinate parallel execution).
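
A toy numeric illustration of the clamp (a standalone sketch, not driver
code; the WAIT bit value matches the I915_PRIORITY_WAIT define touched
below, everything else is simplified):

    #include <stdio.h>

    #define I915_PRIORITY_WAIT (1 << 0)
    #define __NO_PREEMPTION    I915_PRIORITY_WAIT

    /* stand-in for __execlists_need_preempt(): strictly greater wins */
    static int need_preempt(int queue_prio, int inflight_prio)
    {
        return queue_prio > inflight_prio;
    }

    int main(void)
    {
        int inflight = 0;                    /* executing, default priority */
        int waiter = 0 | I915_PRIORITY_WAIT; /* queued request given the WAIT boost */

        /* without the clamp, a mere WAIT boost forces a preempt-to-idle cycle */
        printf("%d\n", need_preempt(waiter, inflight));                   /* 1 */

        /* with effective_prio(), the executing request carries the WAIT bit
         * too, so the waiter merely heads the queue for the next slot */
        printf("%d\n", need_preempt(waiter, inflight | __NO_PREEMPTION)); /* 0 */

        return 0;
    }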

v2: Also protect GuC submission from false preemption loops.
v3: Build bug safeguards and better debug messages for st.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_request.c        |  12 ++
 drivers/gpu/drm/i915/i915_scheduler.h      |   2 +
 drivers/gpu/drm/i915/intel_lrc.c           |   9 +-
 drivers/gpu/drm/i915/selftests/intel_lrc.c | 161 +++++++++++++++++++++
 4 files changed, 183 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index c2a5c48c7541..35acef74b93a 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -372,12 +372,24 @@ void __i915_request_submit(struct i915_request *request)
 
 	/* We may be recursing from the signal callback of another i915 fence */
 	spin_lock_nested(&request->lock, SINGLE_DEPTH_NESTING);
+
 	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
 	set_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
+
 	request->global_seqno = seqno;
 	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &request->fence.flags) &&
 	    !i915_request_enable_breadcrumb(request))
 		intel_engine_queue_breadcrumbs(engine);
+
+	/*
+	 * As we do not allow WAIT to preempt inflight requests,
+	 * once we have executed a request, along with triggering
+	 * any execution callbacks, we must preserve its ordering
+	 * within the non-preemptible FIFO.
+	 */
+	BUILD_BUG_ON(__NO_PREEMPTION & ~I915_PRIORITY_MASK); /* only internal */
+	request->sched.attr.priority |= __NO_PREEMPTION;
+
 	spin_unlock(&request->lock);
 
 	engine->emit_fini_breadcrumb(request,
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index dbe9cb7ecd82..54bd6c89817e 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -33,6 +33,8 @@ enum {
 #define I915_PRIORITY_WAIT	((u8)BIT(0))
 #define I915_PRIORITY_NEWCLIENT	((u8)BIT(1))
 
+#define __NO_PREEMPTION (I915_PRIORITY_WAIT)
+
 struct i915_sched_attr {
 	/**
 	 * @priority: execution and service priority
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 5d5ce91a5dfa..afd05e25f911 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -188,6 +188,12 @@ static inline int rq_prio(const struct i915_request *rq)
 	return rq->sched.attr.priority;
 }
 
+static int effective_prio(const struct i915_request *rq)
+{
+	/* Restrict mere WAIT boosts from triggering preemption */
+	return rq_prio(rq) | __NO_PREEMPTION;
+}
+
 static int queue_prio(const struct intel_engine_execlists *execlists)
 {
 	struct i915_priolist *p;
@@ -208,7 +214,7 @@ static int queue_prio(const struct intel_engine_execlists *execlists)
 static inline bool need_preempt(const struct intel_engine_cs *engine,
 				const struct i915_request *rq)
 {
-	const int last_prio = rq_prio(rq);
+	int last_prio;
 
 	if (!intel_engine_has_preemption(engine))
 		return false;
@@ -228,6 +234,7 @@ static inline bool need_preempt(const struct intel_engine_cs *engine,
 	 * preempt. If that hint is stale or we may be trying to preempt
 	 * ourselves, ignore the request.
 	 */
+	last_prio = effective_prio(rq);
 	if (!__execlists_need_preempt(engine->execlists.queue_priority_hint,
 				      last_prio))
 		return false;
diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
index 58144e024751..263afd2f1596 100644
--- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
+++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
@@ -407,6 +407,166 @@ static int live_suppress_self_preempt(void *arg)
 	goto err_client_b;
 }
 
+static int __i915_sw_fence_call
+dummy_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
+{
+	return NOTIFY_DONE;
+}
+
+static struct i915_request *dummy_request(struct intel_engine_cs *engine)
+{
+	struct i915_request *rq;
+
+	rq = kmalloc(sizeof(*rq), GFP_KERNEL | __GFP_ZERO);
+	if (!rq)
+		return NULL;
+
+	INIT_LIST_HEAD(&rq->active_list);
+	rq->engine = engine;
+
+	i915_sched_node_init(&rq->sched);
+
+	/* mark this request as permanently incomplete */
+	rq->fence.seqno = 1;
+	BUILD_BUG_ON(sizeof(rq->fence.seqno) != 8); /* upper 32b == 0 */
+	rq->hwsp_seqno = (u32 *)&rq->fence.seqno + 1;
+	GEM_BUG_ON(i915_request_completed(rq));
+
+	i915_sw_fence_init(&rq->submit, dummy_notify);
+	i915_sw_fence_commit(&rq->submit);
+
+	return rq;
+}
+
+static void dummy_request_free(struct i915_request *dummy)
+{
+	i915_request_mark_complete(dummy);
+	i915_sched_node_fini(dummy->engine->i915, &dummy->sched);
+	kfree(dummy);
+}
+
+static int live_suppress_wait_preempt(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct preempt_client client[4];
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	intel_wakeref_t wakeref;
+	int err = -ENOMEM;
+	int i;
+
+	/*
+	 * Waiters are given a little priority nudge, but not enough
+	 * to actually cause any preemption. Double check that we do
+	 * not needlessly generate preempt-to-idle cycles.
+	 */
+
+	if (!HAS_LOGICAL_RING_PREEMPTION(i915))
+		return 0;
+
+	mutex_lock(&i915->drm.struct_mutex);
+	wakeref = intel_runtime_pm_get(i915);
+
+	if (preempt_client_init(i915, &client[0])) /* ELSP[0] */
+		goto err_unlock;
+	if (preempt_client_init(i915, &client[1])) /* ELSP[1] */
+		goto err_client_0;
+	if (preempt_client_init(i915, &client[2])) /* head of queue */
+		goto err_client_1;
+	if (preempt_client_init(i915, &client[3])) /* bystander */
+		goto err_client_2;
+
+	for_each_engine(engine, i915, id) {
+		int depth;
+
+		if (!engine->emit_init_breadcrumb)
+			continue;
+
+		for (depth = 0; depth < ARRAY_SIZE(client); depth++) {
+			struct i915_request *rq[ARRAY_SIZE(client)];
+			struct i915_request *dummy;
+
+			engine->execlists.preempt_hang.count = 0;
+
+			dummy = dummy_request(engine);
+			if (!dummy)
+				goto err_client_3;
+
+			for (i = 0; i < ARRAY_SIZE(client); i++) {
+				rq[i] = igt_spinner_create_request(&client[i].spin,
+								   client[i].ctx, engine,
+								   MI_NOOP);
+				if (IS_ERR(rq[i])) {
+					err = PTR_ERR(rq[i]);
+					goto err_wedged;
+				}
+
+				/* Disable NEWCLIENT promotion */
+				__i915_active_request_set(&rq[i]->timeline->last_request,
+							  dummy);
+				i915_request_add(rq[i]);
+			}
+
+			dummy_request_free(dummy);
+
+			GEM_BUG_ON(i915_request_completed(rq[0]));
+			if (!igt_wait_for_spinner(&client[0].spin, rq[0])) {
+				pr_err("%s: First client failed to start\n",
+				       engine->name);
+				goto err_wedged;
+			}
+			GEM_BUG_ON(!i915_request_started(rq[0]));
+
+			if (i915_request_wait(rq[depth],
+					      I915_WAIT_LOCKED |
+					      I915_WAIT_PRIORITY,
+					      1) != -ETIME) {
+				pr_err("%s: Waiter depth:%d completed!\n",
+				       engine->name, depth);
+				goto err_wedged;
+			}
+
+			for (i = 0; i < ARRAY_SIZE(client); i++)
+				igt_spinner_end(&client[i].spin);
+
+			if (igt_flush_test(i915, I915_WAIT_LOCKED))
+				goto err_wedged;
+
+			if (engine->execlists.preempt_hang.count) {
+				pr_err("%s: Preemption recorded x%d, depth %d; should have been suppressed!\n",
+				       engine->name,
+				       engine->execlists.preempt_hang.count,
+				       depth);
+				err = -EINVAL;
+				goto err_client_3;
+			}
+		}
+	}
+
+	err = 0;
+err_client_3:
+	preempt_client_fini(&client[3]);
+err_client_2:
+	preempt_client_fini(&client[2]);
+err_client_1:
+	preempt_client_fini(&client[1]);
+err_client_0:
+	preempt_client_fini(&client[0]);
+err_unlock:
+	if (igt_flush_test(i915, I915_WAIT_LOCKED))
+		err = -EIO;
+	intel_runtime_pm_put(i915, wakeref);
+	mutex_unlock(&i915->drm.struct_mutex);
+	return err;
+
+err_wedged:
+	for (i = 0; i < ARRAY_SIZE(client); i++)
+		igt_spinner_end(&client[i].spin);
+	i915_gem_set_wedged(i915);
+	err = -EIO;
+	goto err_client_3;
+}
+
 static int live_chain_preempt(void *arg)
 {
 	struct drm_i915_private *i915 = arg;
@@ -887,6 +1047,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_preempt),
 		SUBTEST(live_late_preempt),
 		SUBTEST(live_suppress_self_preempt),
+		SUBTEST(live_suppress_wait_preempt),
 		SUBTEST(live_chain_preempt),
 		SUBTEST(live_preempt_hang),
 		SUBTEST(live_preempt_smoke),
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH 09/46] drm/i915/execlists: Suppress redundant preemption
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (7 preceding siblings ...)
  2019-02-06 13:03 ` [PATCH 08/46] drm/i915/execlists: Suppress mere WAIT preemption Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-06 13:03 ` [PATCH 10/46] drm/i915: Make request allocation caches global Chris Wilson
                   ` (43 subsequent siblings)
  52 siblings, 0 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx

On unwinding the active request we give it a small (limited to internal
priority levels) boost to prevent it from being gazumped a second time.
However, this means that it can be promoted above the request that
triggered the preemption request, causing a preempt-to-idle cycle for no
change. We can avoid this if we take the boost into account when
checking if the preemption request is valid.
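
As a rough sketch (simplified, standalone C; ACTIVE_BOOST is a made-up
single-bit stand-in for the driver's I915_PRIORITY_NEWCLIENT, not the
real priority layout), the check effectively pretends the running
request has already been unwound, boosted and requeued at the tail of
its new priority level:

	#include <stdbool.h>

	#define ACTIVE_BOOST 1 /* stand-in for I915_PRIORITY_NEWCLIENT */

	static int effective_prio(int prio, bool started)
	{
		/* Pretend the unwind boost was already applied... */
		if (started && !(prio & ACTIVE_BOOST)) {
			prio |= ACTIVE_BOOST;
			prio--; /* ...and that we were requeued behind equal peers */
		}
		return prio;
	}

	/* Only preempt if the queue is strictly more important. */
	static bool need_preempt(int queue_prio, int active_prio, bool started)
	{
		return queue_prio > effective_prio(active_prio, started);
	}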

v2: After preemption the active request will be after the preemptee if
they end up with equal priority.

v3: Tvrtko pointed out that this, the existing logic, makes
I915_PRIORITY_WAIT non-preemptible. Document this interesting quirk!

v4: Prove Tvrtko was right about WAIT being non-preemptible and test it.
v5: Except not all priorities were made equal, and the WAIT-not-preempting
behaviour only holds if we start off as !NEWCLIENT.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 38 ++++++++++++++++++++++++++++----
 1 file changed, 34 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index afd05e25f911..58108aa290d8 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -164,6 +164,8 @@
 #define WA_TAIL_DWORDS 2
 #define WA_TAIL_BYTES (sizeof(u32) * WA_TAIL_DWORDS)
 
+#define ACTIVE_PRIORITY (I915_PRIORITY_NEWCLIENT)
+
 static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
 					    struct intel_engine_cs *engine,
 					    struct intel_context *ce);
@@ -190,8 +192,30 @@ static inline int rq_prio(const struct i915_request *rq)
 
 static int effective_prio(const struct i915_request *rq)
 {
+	int prio = rq_prio(rq);
+
+	/*
+	 * On unwinding the active request, we give it a priority bump
+	 * equivalent to a freshly submitted request. This protects it from
+	 * being gazumped again, but it would be preferable if we didn't
+	 * let it be gazumped in the first place!
+	 *
+	 * See __unwind_incomplete_requests()
+	 */
+	if (~prio & ACTIVE_PRIORITY && __i915_request_has_started(rq)) {
+		/*
+		 * After preemption, we insert the active request at the
+		 * end of the new priority level. This means that we will be
+		 * _lower_ priority than the preemptee all things equal (and
+		 * so the preemption is valid), so adjust our comparison
+		 * accordingly.
+		 */
+		prio |= ACTIVE_PRIORITY;
+		prio--;
+	}
+
 	/* Restrict mere WAIT boosts from triggering preemption */
-	return rq_prio(rq) | __NO_PREEMPTION;
+	return prio | __NO_PREEMPTION;
 }
 
 static int queue_prio(const struct intel_engine_execlists *execlists)
@@ -360,7 +384,7 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
 {
 	struct i915_request *rq, *rn, *active = NULL;
 	struct list_head *uninitialized_var(pl);
-	int prio = I915_PRIORITY_INVALID | I915_PRIORITY_NEWCLIENT;
+	int prio = I915_PRIORITY_INVALID | ACTIVE_PRIORITY;
 
 	lockdep_assert_held(&engine->timeline.lock);
 
@@ -391,9 +415,15 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
 	 * The active request is now effectively the start of a new client
 	 * stream, so give it the equivalent small priority bump to prevent
 	 * it being gazumped a second time by another peer.
+	 *
+	 * One consequence of this preemption boost is that we may jump
+	 * over lesser priorities (such as I915_PRIORITY_WAIT), effectively
+	 * making those priorities non-preemptible. They will be moved forward
+	 * in the priority queue, but they will not gain immediate access to
+	 * the GPU.
 	 */
-	if (!(prio & I915_PRIORITY_NEWCLIENT)) {
-		prio |= I915_PRIORITY_NEWCLIENT;
+	if (~prio & ACTIVE_PRIORITY && __i915_request_has_started(active)) {
+		prio |= ACTIVE_PRIORITY;
 		active->sched.attr.priority = prio;
 		list_move_tail(&active->sched.link,
 			       i915_sched_lookup_priolist(engine, prio));
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH 10/46] drm/i915: Make request allocation caches global
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (8 preceding siblings ...)
  2019-02-06 13:03 ` [PATCH 09/46] drm/i915/execlists: Suppress redundant preemption Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-11 11:43   ` Tvrtko Ursulin
  2019-02-06 13:03 ` [PATCH 11/46] drm/i915: Keep timeline HWSP allocated until idle across the system Chris Wilson
                   ` (42 subsequent siblings)
  52 siblings, 1 reply; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx

As kmem_caches share the same properties (size, allocation/free behaviour)
for all potential devices, we can use global caches. While this
potentially has worse fragmentation behaviour (one can argue that
different devices would have different activity lifetimes, but you can
also argue that activity is temporal across the system) it is the
default behaviour of the system at large to amalgamate matching caches.

The benefit for us is much reduced pointer dancing along the frequent
allocation paths.
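
For example, on the request free path (i915_fence_release() in the diff
below), the cache lookup becomes a direct reference instead of a chase
through rq->i915:

	/* before */
	kmem_cache_free(rq->i915->requests, rq);

	/* after */
	kmem_cache_free(global.slab_requests, rq);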

v2: Defer shrinking until after a global grace period, to futureproof
the slab caches for multiple consumers, similar to the current strategy
of avoiding shrinking too early.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/Makefile                 |   1 +
 drivers/gpu/drm/i915/i915_active.c            |   7 +-
 drivers/gpu/drm/i915/i915_active.h            |   1 +
 drivers/gpu/drm/i915/i915_drv.h               |   3 -
 drivers/gpu/drm/i915/i915_gem.c               |  34 +-----
 drivers/gpu/drm/i915/i915_globals.c           | 105 ++++++++++++++++++
 drivers/gpu/drm/i915/i915_globals.h           |  15 +++
 drivers/gpu/drm/i915/i915_pci.c               |   8 +-
 drivers/gpu/drm/i915/i915_request.c           |  53 +++++++--
 drivers/gpu/drm/i915/i915_request.h           |  10 ++
 drivers/gpu/drm/i915/i915_scheduler.c         |  66 ++++++++---
 drivers/gpu/drm/i915/i915_scheduler.h         |  34 +++++-
 drivers/gpu/drm/i915/intel_guc_submission.c   |   3 +-
 drivers/gpu/drm/i915/intel_lrc.c              |   6 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h       |  17 ---
 drivers/gpu/drm/i915/selftests/intel_lrc.c    |   2 +-
 drivers/gpu/drm/i915/selftests/mock_engine.c  |  48 ++++----
 .../gpu/drm/i915/selftests/mock_gem_device.c  |  26 -----
 drivers/gpu/drm/i915/selftests/mock_request.c |  12 +-
 drivers/gpu/drm/i915/selftests/mock_request.h |   7 --
 20 files changed, 306 insertions(+), 152 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_globals.c
 create mode 100644 drivers/gpu/drm/i915/i915_globals.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 1787e1299b1b..a1d834068765 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -77,6 +77,7 @@ i915-y += \
 	  i915_gem_tiling.o \
 	  i915_gem_userptr.o \
 	  i915_gemfs.o \
+	  i915_globals.o \
 	  i915_query.o \
 	  i915_request.o \
 	  i915_scheduler.o \
diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
index 215b6ff8aa73..9026787ebdf8 100644
--- a/drivers/gpu/drm/i915/i915_active.c
+++ b/drivers/gpu/drm/i915/i915_active.c
@@ -280,7 +280,12 @@ int __init i915_global_active_init(void)
 	return 0;
 }
 
-void __exit i915_global_active_exit(void)
+void i915_global_active_shrink(void)
+{
+	kmem_cache_shrink(global.slab_cache);
+}
+
+void i915_global_active_exit(void)
 {
 	kmem_cache_destroy(global.slab_cache);
 }
diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h
index 12b5c1d287d1..5fbd9102384b 100644
--- a/drivers/gpu/drm/i915/i915_active.h
+++ b/drivers/gpu/drm/i915/i915_active.h
@@ -420,6 +420,7 @@ static inline void i915_active_fini(struct i915_active *ref) { }
 #endif
 
 int i915_global_active_init(void);
+void i915_global_active_shrink(void);
 void i915_global_active_exit(void);
 
 #endif /* _I915_ACTIVE_H_ */
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 37230ae7fbe6..a365b1a2ea9a 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1459,9 +1459,6 @@ struct drm_i915_private {
 	struct kmem_cache *objects;
 	struct kmem_cache *vmas;
 	struct kmem_cache *luts;
-	struct kmem_cache *requests;
-	struct kmem_cache *dependencies;
-	struct kmem_cache *priorities;
 
 	const struct intel_device_info __info; /* Use INTEL_INFO() to access. */
 	struct intel_runtime_info __runtime; /* Use RUNTIME_INFO() to access. */
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 1eb3a5f8654c..d18c4ccff370 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -42,6 +42,7 @@
 #include "i915_drv.h"
 #include "i915_gem_clflush.h"
 #include "i915_gemfs.h"
+#include "i915_globals.h"
 #include "i915_reset.h"
 #include "i915_trace.h"
 #include "i915_vgpu.h"
@@ -187,6 +188,8 @@ void i915_gem_unpark(struct drm_i915_private *i915)
 	if (unlikely(++i915->gt.epoch == 0)) /* keep 0 as invalid */
 		i915->gt.epoch = 1;
 
+	i915_globals_unpark();
+
 	intel_enable_gt_powersave(i915);
 	i915_update_gfx_val(i915);
 	if (INTEL_GEN(i915) >= 6)
@@ -2916,12 +2919,11 @@ static void shrink_caches(struct drm_i915_private *i915)
 	 * filled slabs to prioritise allocating from the mostly full slabs,
 	 * with the aim of reducing fragmentation.
 	 */
-	kmem_cache_shrink(i915->priorities);
-	kmem_cache_shrink(i915->dependencies);
-	kmem_cache_shrink(i915->requests);
 	kmem_cache_shrink(i915->luts);
 	kmem_cache_shrink(i915->vmas);
 	kmem_cache_shrink(i915->objects);
+
+	i915_globals_park();
 }
 
 struct sleep_rcu_work {
@@ -5264,23 +5266,6 @@ int i915_gem_init_early(struct drm_i915_private *dev_priv)
 	if (!dev_priv->luts)
 		goto err_vmas;
 
-	dev_priv->requests = KMEM_CACHE(i915_request,
-					SLAB_HWCACHE_ALIGN |
-					SLAB_RECLAIM_ACCOUNT |
-					SLAB_TYPESAFE_BY_RCU);
-	if (!dev_priv->requests)
-		goto err_luts;
-
-	dev_priv->dependencies = KMEM_CACHE(i915_dependency,
-					    SLAB_HWCACHE_ALIGN |
-					    SLAB_RECLAIM_ACCOUNT);
-	if (!dev_priv->dependencies)
-		goto err_requests;
-
-	dev_priv->priorities = KMEM_CACHE(i915_priolist, SLAB_HWCACHE_ALIGN);
-	if (!dev_priv->priorities)
-		goto err_dependencies;
-
 	INIT_LIST_HEAD(&dev_priv->gt.active_rings);
 	INIT_LIST_HEAD(&dev_priv->gt.closed_vma);
 
@@ -5305,12 +5290,6 @@ int i915_gem_init_early(struct drm_i915_private *dev_priv)
 
 	return 0;
 
-err_dependencies:
-	kmem_cache_destroy(dev_priv->dependencies);
-err_requests:
-	kmem_cache_destroy(dev_priv->requests);
-err_luts:
-	kmem_cache_destroy(dev_priv->luts);
 err_vmas:
 	kmem_cache_destroy(dev_priv->vmas);
 err_objects:
@@ -5328,9 +5307,6 @@ void i915_gem_cleanup_early(struct drm_i915_private *dev_priv)
 
 	cleanup_srcu_struct(&dev_priv->gpu_error.reset_backoff_srcu);
 
-	kmem_cache_destroy(dev_priv->priorities);
-	kmem_cache_destroy(dev_priv->dependencies);
-	kmem_cache_destroy(dev_priv->requests);
 	kmem_cache_destroy(dev_priv->luts);
 	kmem_cache_destroy(dev_priv->vmas);
 	kmem_cache_destroy(dev_priv->objects);
diff --git a/drivers/gpu/drm/i915/i915_globals.c b/drivers/gpu/drm/i915/i915_globals.c
new file mode 100644
index 000000000000..82ee6b1e7227
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_globals.c
@@ -0,0 +1,105 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2019 Intel Corporation
+ */
+
+#include <linux/slab.h>
+#include <linux/workqueue.h>
+
+#include "i915_active.h"
+#include "i915_globals.h"
+#include "i915_request.h"
+#include "i915_scheduler.h"
+
+int __init i915_globals_init(void)
+{
+	int err;
+
+	err = i915_global_active_init();
+	if (err)
+		return err;
+
+	err = i915_global_request_init();
+	if (err)
+		goto err_active;
+
+	err = i915_global_scheduler_init();
+	if (err)
+		goto err_request;
+
+	return 0;
+
+err_request:
+	i915_global_request_exit();
+err_active:
+	i915_global_active_exit();
+	return err;
+}
+
+static void i915_globals_shrink(void)
+{
+	i915_global_active_shrink();
+	i915_global_request_shrink();
+	i915_global_scheduler_shrink();
+}
+
+static atomic_t active;
+static atomic_t epoch;
+struct park_work {
+	struct rcu_work work;
+	int epoch;
+};
+
+static void __i915_globals_park(struct work_struct *work)
+{
+	struct park_work *wrk = container_of(work, typeof(*wrk), work.work);
+
+	/* Confirm nothing woke up in the last grace period */
+	if (wrk->epoch == atomic_read(&epoch))
+		i915_globals_shrink();
+
+	kfree(wrk);
+}
+
+void i915_globals_park(void)
+{
+	struct park_work *wrk;
+
+	/*
+	 * Defer shrinking the global slab caches (and other work) until
+	 * after an RCU grace period has completed with no activity. This
+	 * is to try and reduce the latency impact on the consumers caused
+	 * by us shrinking the caches at the same time as they are trying to
+	 * allocate, with the assumption being that if we idle long enough
+	 * for an RCU grace period to elapse since the last use, it is likely
+	 * to be longer until we need the caches again.
+	 */
+	if (!atomic_dec_and_test(&active))
+		return;
+
+	wrk = kmalloc(sizeof(*wrk), GFP_KERNEL);
+	if (!wrk)
+		return;
+
+	wrk->epoch = atomic_inc_return(&epoch);
+	INIT_RCU_WORK(&wrk->work, __i915_globals_park);
+	queue_rcu_work(system_wq, &wrk->work);
+}
+
+void i915_globals_unpark(void)
+{
+	atomic_inc(&epoch);
+	atomic_inc(&active);
+}
+
+void __exit i915_globals_exit(void)
+{
+	/* Flush any residual park_work */
+	rcu_barrier();
+	flush_scheduled_work();
+
+	i915_global_scheduler_exit();
+	i915_global_request_exit();
+	i915_global_active_exit();
+}
diff --git a/drivers/gpu/drm/i915/i915_globals.h b/drivers/gpu/drm/i915/i915_globals.h
new file mode 100644
index 000000000000..e468f0413a73
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_globals.h
@@ -0,0 +1,15 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2019 Intel Corporation
+ */
+
+#ifndef _I915_GLOBALS_H_
+#define _I915_GLOBALS_H_
+
+int i915_globals_init(void);
+void i915_globals_park(void);
+void i915_globals_unpark(void);
+void i915_globals_exit(void);
+
+#endif /* _I915_GLOBALS_H_ */
diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
index 66f82f3f050f..b73e8d63b1af 100644
--- a/drivers/gpu/drm/i915/i915_pci.c
+++ b/drivers/gpu/drm/i915/i915_pci.c
@@ -28,8 +28,8 @@
 
 #include <drm/drm_drv.h>
 
-#include "i915_active.h"
 #include "i915_drv.h"
+#include "i915_globals.h"
 #include "i915_selftest.h"
 
 #define PLATFORM(x) .platform = (x), .platform_mask = BIT(x)
@@ -801,7 +801,9 @@ static int __init i915_init(void)
 	bool use_kms = true;
 	int err;
 
-	i915_global_active_init();
+	err = i915_globals_init();
+	if (err)
+		return err;
 
 	err = i915_mock_selftests();
 	if (err)
@@ -834,7 +836,7 @@ static void __exit i915_exit(void)
 		return;
 
 	pci_unregister_driver(&i915_pci_driver);
-	i915_global_active_exit();
+	i915_globals_exit();
 }
 
 module_init(i915_init);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 35acef74b93a..174d15c9dd00 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -32,6 +32,11 @@
 #include "i915_active.h"
 #include "i915_reset.h"
 
+static struct i915_global_request {
+	struct kmem_cache *slab_requests;
+	struct kmem_cache *slab_dependencies;
+} global;
+
 static const char *i915_fence_get_driver_name(struct dma_fence *fence)
 {
 	return "i915";
@@ -84,7 +89,7 @@ static void i915_fence_release(struct dma_fence *fence)
 	 */
 	i915_sw_fence_fini(&rq->submit);
 
-	kmem_cache_free(rq->i915->requests, rq);
+	kmem_cache_free(global.slab_requests, rq);
 }
 
 const struct dma_fence_ops i915_fence_ops = {
@@ -296,7 +301,7 @@ static void i915_request_retire(struct i915_request *request)
 
 	unreserve_gt(request->i915);
 
-	i915_sched_node_fini(request->i915, &request->sched);
+	i915_sched_node_fini(&request->sched);
 	i915_request_put(request);
 }
 
@@ -530,7 +535,7 @@ i915_request_alloc_slow(struct intel_context *ce)
 	ring_retire_requests(ring);
 
 out:
-	return kmem_cache_alloc(ce->gem_context->i915->requests, GFP_KERNEL);
+	return kmem_cache_alloc(global.slab_requests, GFP_KERNEL);
 }
 
 static int add_timeline_barrier(struct i915_request *rq)
@@ -617,7 +622,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	 *
 	 * Do not use kmem_cache_zalloc() here!
 	 */
-	rq = kmem_cache_alloc(i915->requests,
+	rq = kmem_cache_alloc(global.slab_requests,
 			      GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
 	if (unlikely(!rq)) {
 		rq = i915_request_alloc_slow(ce);
@@ -705,7 +710,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	GEM_BUG_ON(!list_empty(&rq->sched.signalers_list));
 	GEM_BUG_ON(!list_empty(&rq->sched.waiters_list));
 
-	kmem_cache_free(i915->requests, rq);
+	kmem_cache_free(global.slab_requests, rq);
 err_unreserve:
 	unreserve_gt(i915);
 	intel_context_unpin(ce);
@@ -724,9 +729,7 @@ i915_request_await_request(struct i915_request *to, struct i915_request *from)
 		return 0;
 
 	if (to->engine->schedule) {
-		ret = i915_sched_node_add_dependency(to->i915,
-						     &to->sched,
-						     &from->sched);
+		ret = i915_sched_node_add_dependency(&to->sched, &from->sched);
 		if (ret < 0)
 			return ret;
 	}
@@ -1199,3 +1202,37 @@ void i915_retire_requests(struct drm_i915_private *i915)
 #include "selftests/mock_request.c"
 #include "selftests/i915_request.c"
 #endif
+
+int __init i915_global_request_init(void)
+{
+	global.slab_requests = KMEM_CACHE(i915_request,
+					  SLAB_HWCACHE_ALIGN |
+					  SLAB_RECLAIM_ACCOUNT |
+					  SLAB_TYPESAFE_BY_RCU);
+	if (!global.slab_requests)
+		return -ENOMEM;
+
+	global.slab_dependencies = KMEM_CACHE(i915_dependency,
+					      SLAB_HWCACHE_ALIGN |
+					      SLAB_RECLAIM_ACCOUNT);
+	if (!global.slab_dependencies)
+		goto err_requests;
+
+	return 0;
+
+err_requests:
+	kmem_cache_destroy(global.slab_requests);
+	return -ENOMEM;
+}
+
+void i915_global_request_shrink(void)
+{
+	kmem_cache_shrink(global.slab_dependencies);
+	kmem_cache_shrink(global.slab_requests);
+}
+
+void i915_global_request_exit(void)
+{
+	kmem_cache_destroy(global.slab_dependencies);
+	kmem_cache_destroy(global.slab_requests);
+}
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 40f3e8dcbdd5..071ff1064579 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -29,6 +29,7 @@
 
 #include "i915_gem.h"
 #include "i915_scheduler.h"
+#include "i915_selftest.h"
 #include "i915_sw_fence.h"
 
 #include <uapi/drm/i915_drm.h>
@@ -204,6 +205,11 @@ struct i915_request {
 	struct drm_i915_file_private *file_priv;
 	/** file_priv list entry for this request */
 	struct list_head client_link;
+
+	I915_SELFTEST_DECLARE(struct {
+		struct list_head link;
+		unsigned long delay;
+	} mock;)
 };
 
 #define I915_FENCE_GFP (GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN)
@@ -403,4 +409,8 @@ static inline void i915_request_mark_complete(struct i915_request *rq)
 
 void i915_retire_requests(struct drm_i915_private *i915);
 
+int i915_global_request_init(void);
+void i915_global_request_shrink(void);
+void i915_global_request_exit(void);
+
 #endif /* I915_REQUEST_H */
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index d01683167c77..720cc91b4d10 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -10,6 +10,11 @@
 #include "i915_request.h"
 #include "i915_scheduler.h"
 
+static struct i915_global_scheduler {
+	struct kmem_cache *slab_dependencies;
+	struct kmem_cache *slab_priorities;
+} global;
+
 static DEFINE_SPINLOCK(schedule_lock);
 
 static const struct i915_request *
@@ -32,16 +37,15 @@ void i915_sched_node_init(struct i915_sched_node *node)
 }
 
 static struct i915_dependency *
-i915_dependency_alloc(struct drm_i915_private *i915)
+i915_dependency_alloc(void)
 {
-	return kmem_cache_alloc(i915->dependencies, GFP_KERNEL);
+	return kmem_cache_alloc(global.slab_dependencies, GFP_KERNEL);
 }
 
 static void
-i915_dependency_free(struct drm_i915_private *i915,
-		     struct i915_dependency *dep)
+i915_dependency_free(struct i915_dependency *dep)
 {
-	kmem_cache_free(i915->dependencies, dep);
+	kmem_cache_free(global.slab_dependencies, dep);
 }
 
 bool __i915_sched_node_add_dependency(struct i915_sched_node *node,
@@ -68,25 +72,23 @@ bool __i915_sched_node_add_dependency(struct i915_sched_node *node,
 	return ret;
 }
 
-int i915_sched_node_add_dependency(struct drm_i915_private *i915,
-				   struct i915_sched_node *node,
+int i915_sched_node_add_dependency(struct i915_sched_node *node,
 				   struct i915_sched_node *signal)
 {
 	struct i915_dependency *dep;
 
-	dep = i915_dependency_alloc(i915);
+	dep = i915_dependency_alloc();
 	if (!dep)
 		return -ENOMEM;
 
 	if (!__i915_sched_node_add_dependency(node, signal, dep,
 					      I915_DEPENDENCY_ALLOC))
-		i915_dependency_free(i915, dep);
+		i915_dependency_free(dep);
 
 	return 0;
 }
 
-void i915_sched_node_fini(struct drm_i915_private *i915,
-			  struct i915_sched_node *node)
+void i915_sched_node_fini(struct i915_sched_node *node)
 {
 	struct i915_dependency *dep, *tmp;
 
@@ -106,7 +108,7 @@ void i915_sched_node_fini(struct drm_i915_private *i915,
 
 		list_del(&dep->wait_link);
 		if (dep->flags & I915_DEPENDENCY_ALLOC)
-			i915_dependency_free(i915, dep);
+			i915_dependency_free(dep);
 	}
 
 	/* Remove ourselves from everyone who depends upon us */
@@ -116,7 +118,7 @@ void i915_sched_node_fini(struct drm_i915_private *i915,
 
 		list_del(&dep->signal_link);
 		if (dep->flags & I915_DEPENDENCY_ALLOC)
-			i915_dependency_free(i915, dep);
+			i915_dependency_free(dep);
 	}
 
 	spin_unlock(&schedule_lock);
@@ -193,7 +195,7 @@ i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio)
 	if (prio == I915_PRIORITY_NORMAL) {
 		p = &execlists->default_priolist;
 	} else {
-		p = kmem_cache_alloc(engine->i915->priorities, GFP_ATOMIC);
+		p = kmem_cache_alloc(global.slab_priorities, GFP_ATOMIC);
 		/* Convert an allocation failure to a priority bump */
 		if (unlikely(!p)) {
 			prio = I915_PRIORITY_NORMAL; /* recurses just once */
@@ -408,3 +410,39 @@ void i915_schedule_bump_priority(struct i915_request *rq, unsigned int bump)
 
 	spin_unlock_bh(&schedule_lock);
 }
+
+void __i915_priolist_free(struct i915_priolist *p)
+{
+	kmem_cache_free(global.slab_priorities, p);
+}
+
+int __init i915_global_scheduler_init(void)
+{
+	global.slab_dependencies = KMEM_CACHE(i915_dependency,
+					      SLAB_HWCACHE_ALIGN);
+	if (!global.slab_dependencies)
+		return -ENOMEM;
+
+	global.slab_priorities = KMEM_CACHE(i915_priolist,
+					    SLAB_HWCACHE_ALIGN);
+	if (!global.slab_priorities)
+		goto err_dependencies;
+
+	return 0;
+
+err_dependencies:
+	kmem_cache_destroy(global.slab_dependencies);
+	return -ENOMEM;
+}
+
+void i915_global_scheduler_shrink(void)
+{
+	kmem_cache_shrink(global.slab_dependencies);
+	kmem_cache_shrink(global.slab_priorities);
+}
+
+void i915_global_scheduler_exit(void)
+{
+	kmem_cache_destroy(global.slab_dependencies);
+	kmem_cache_destroy(global.slab_priorities);
+}
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 54bd6c89817e..5196ce07b6c2 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -85,6 +85,23 @@ struct i915_dependency {
 #define I915_DEPENDENCY_ALLOC BIT(0)
 };
 
+struct i915_priolist {
+	struct list_head requests[I915_PRIORITY_COUNT];
+	struct rb_node node;
+	unsigned long used;
+	int priority;
+};
+
+#define priolist_for_each_request(it, plist, idx) \
+	for (idx = 0; idx < ARRAY_SIZE((plist)->requests); idx++) \
+		list_for_each_entry(it, &(plist)->requests[idx], sched.link)
+
+#define priolist_for_each_request_consume(it, n, plist, idx) \
+	for (; (idx = ffs((plist)->used)); (plist)->used &= ~BIT(idx - 1)) \
+		list_for_each_entry_safe(it, n, \
+					 &(plist)->requests[idx - 1], \
+					 sched.link)
+
 void i915_sched_node_init(struct i915_sched_node *node);
 
 bool __i915_sched_node_add_dependency(struct i915_sched_node *node,
@@ -92,12 +109,10 @@ bool __i915_sched_node_add_dependency(struct i915_sched_node *node,
 				      struct i915_dependency *dep,
 				      unsigned long flags);
 
-int i915_sched_node_add_dependency(struct drm_i915_private *i915,
-				   struct i915_sched_node *node,
+int i915_sched_node_add_dependency(struct i915_sched_node *node,
 				   struct i915_sched_node *signal);
 
-void i915_sched_node_fini(struct drm_i915_private *i915,
-			  struct i915_sched_node *node);
+void i915_sched_node_fini(struct i915_sched_node *node);
 
 void i915_schedule(struct i915_request *request,
 		   const struct i915_sched_attr *attr);
@@ -107,4 +122,15 @@ void i915_schedule_bump_priority(struct i915_request *rq, unsigned int bump);
 struct list_head *
 i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio);
 
+void __i915_priolist_free(struct i915_priolist *p);
+static inline void i915_priolist_free(struct i915_priolist *p)
+{
+	if (p->priority != I915_PRIORITY_NORMAL)
+		__i915_priolist_free(p);
+}
+
+int i915_global_scheduler_init(void);
+void i915_global_scheduler_shrink(void);
+void i915_global_scheduler_exit(void);
+
 #endif /* _I915_SCHEDULER_H_ */
diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index 8bc8aa54aa35..4cf94513615d 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -781,8 +781,7 @@ static bool __guc_dequeue(struct intel_engine_cs *engine)
 		}
 
 		rb_erase_cached(&p->node, &execlists->queue);
-		if (p->priority != I915_PRIORITY_NORMAL)
-			kmem_cache_free(engine->i915->priorities, p);
+		i915_priolist_free(p);
 	}
 done:
 	execlists->queue_priority_hint =
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 58108aa290d8..553371e654d7 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -806,8 +806,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		}
 
 		rb_erase_cached(&p->node, &execlists->queue);
-		if (p->priority != I915_PRIORITY_NORMAL)
-			kmem_cache_free(engine->i915->priorities, p);
+		i915_priolist_free(p);
 	}
 
 done:
@@ -966,8 +965,7 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
 		}
 
 		rb_erase_cached(&p->node, &execlists->queue);
-		if (p->priority != I915_PRIORITY_NORMAL)
-			kmem_cache_free(engine->i915->priorities, p);
+		i915_priolist_free(p);
 	}
 
 	intel_write_status_page(engine,
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 710ffb221775..e7d85aaee415 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -185,23 +185,6 @@ enum intel_engine_id {
 #define _VECS(n) (VECS + (n))
 };
 
-struct i915_priolist {
-	struct list_head requests[I915_PRIORITY_COUNT];
-	struct rb_node node;
-	unsigned long used;
-	int priority;
-};
-
-#define priolist_for_each_request(it, plist, idx) \
-	for (idx = 0; idx < ARRAY_SIZE((plist)->requests); idx++) \
-		list_for_each_entry(it, &(plist)->requests[idx], sched.link)
-
-#define priolist_for_each_request_consume(it, n, plist, idx) \
-	for (; (idx = ffs((plist)->used)); (plist)->used &= ~BIT(idx - 1)) \
-		list_for_each_entry_safe(it, n, \
-					 &(plist)->requests[idx - 1], \
-					 sched.link)
-
 struct st_preempt_hang {
 	struct completion completion;
 	unsigned int count;
diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
index 263afd2f1596..1a3af4b4107d 100644
--- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
+++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
@@ -441,7 +441,7 @@ static struct i915_request *dummy_request(struct intel_engine_cs *engine)
 static void dummy_request_free(struct i915_request *dummy)
 {
 	i915_request_mark_complete(dummy);
-	i915_sched_node_fini(dummy->engine->i915, &dummy->sched);
+	i915_sched_node_fini(&dummy->sched);
 	kfree(dummy);
 }
 
diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
index 08f0cab02e0f..0d35af07867b 100644
--- a/drivers/gpu/drm/i915/selftests/mock_engine.c
+++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
@@ -76,28 +76,27 @@ static void mock_ring_free(struct intel_ring *base)
 	kfree(ring);
 }
 
-static struct mock_request *first_request(struct mock_engine *engine)
+static struct i915_request *first_request(struct mock_engine *engine)
 {
 	return list_first_entry_or_null(&engine->hw_queue,
-					struct mock_request,
-					link);
+					struct i915_request,
+					mock.link);
 }
 
-static void advance(struct mock_request *request)
+static void advance(struct i915_request *request)
 {
-	list_del_init(&request->link);
-	intel_engine_write_global_seqno(request->base.engine,
-					request->base.global_seqno);
-	i915_request_mark_complete(&request->base);
-	GEM_BUG_ON(!i915_request_completed(&request->base));
+	list_del_init(&request->mock.link);
+	intel_engine_write_global_seqno(request->engine, request->global_seqno);
+	i915_request_mark_complete(request);
+	GEM_BUG_ON(!i915_request_completed(request));
 
-	intel_engine_queue_breadcrumbs(request->base.engine);
+	intel_engine_queue_breadcrumbs(request->engine);
 }
 
 static void hw_delay_complete(struct timer_list *t)
 {
 	struct mock_engine *engine = from_timer(engine, t, hw_delay);
-	struct mock_request *request;
+	struct i915_request *request;
 	unsigned long flags;
 
 	spin_lock_irqsave(&engine->hw_lock, flags);
@@ -112,8 +111,9 @@ static void hw_delay_complete(struct timer_list *t)
 	 * requeue the timer for the next delayed request.
 	 */
 	while ((request = first_request(engine))) {
-		if (request->delay) {
-			mod_timer(&engine->hw_delay, jiffies + request->delay);
+		if (request->mock.delay) {
+			mod_timer(&engine->hw_delay,
+				  jiffies + request->mock.delay);
 			break;
 		}
 
@@ -171,10 +171,8 @@ mock_context_pin(struct intel_engine_cs *engine,
 
 static int mock_request_alloc(struct i915_request *request)
 {
-	struct mock_request *mock = container_of(request, typeof(*mock), base);
-
-	INIT_LIST_HEAD(&mock->link);
-	mock->delay = 0;
+	INIT_LIST_HEAD(&request->mock.link);
+	request->mock.delay = 0;
 
 	return 0;
 }
@@ -192,7 +190,6 @@ static u32 *mock_emit_breadcrumb(struct i915_request *request, u32 *cs)
 
 static void mock_submit_request(struct i915_request *request)
 {
-	struct mock_request *mock = container_of(request, typeof(*mock), base);
 	struct mock_engine *engine =
 		container_of(request->engine, typeof(*engine), base);
 	unsigned long flags;
@@ -201,12 +198,13 @@ static void mock_submit_request(struct i915_request *request)
 	GEM_BUG_ON(!request->global_seqno);
 
 	spin_lock_irqsave(&engine->hw_lock, flags);
-	list_add_tail(&mock->link, &engine->hw_queue);
-	if (mock->link.prev == &engine->hw_queue) {
-		if (mock->delay)
-			mod_timer(&engine->hw_delay, jiffies + mock->delay);
+	list_add_tail(&request->mock.link, &engine->hw_queue);
+	if (list_is_first(&request->mock.link, &engine->hw_queue)) {
+		if (request->mock.delay)
+			mod_timer(&engine->hw_delay,
+				  jiffies + request->mock.delay);
 		else
-			advance(mock);
+			advance(request);
 	}
 	spin_unlock_irqrestore(&engine->hw_lock, flags);
 }
@@ -266,12 +264,12 @@ void mock_engine_flush(struct intel_engine_cs *engine)
 {
 	struct mock_engine *mock =
 		container_of(engine, typeof(*mock), base);
-	struct mock_request *request, *rn;
+	struct i915_request *request, *rn;
 
 	del_timer_sync(&mock->hw_delay);
 
 	spin_lock_irq(&mock->hw_lock);
-	list_for_each_entry_safe(request, rn, &mock->hw_queue, link)
+	list_for_each_entry_safe(request, rn, &mock->hw_queue, mock.link)
 		advance(request);
 	spin_unlock_irq(&mock->hw_lock);
 }
diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
index fc516a2970f4..5a98caba6d69 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
@@ -79,9 +79,6 @@ static void mock_device_release(struct drm_device *dev)
 
 	destroy_workqueue(i915->wq);
 
-	kmem_cache_destroy(i915->priorities);
-	kmem_cache_destroy(i915->dependencies);
-	kmem_cache_destroy(i915->requests);
 	kmem_cache_destroy(i915->vmas);
 	kmem_cache_destroy(i915->objects);
 
@@ -211,23 +208,6 @@ struct drm_i915_private *mock_gem_device(void)
 	if (!i915->vmas)
 		goto err_objects;
 
-	i915->requests = KMEM_CACHE(mock_request,
-				    SLAB_HWCACHE_ALIGN |
-				    SLAB_RECLAIM_ACCOUNT |
-				    SLAB_TYPESAFE_BY_RCU);
-	if (!i915->requests)
-		goto err_vmas;
-
-	i915->dependencies = KMEM_CACHE(i915_dependency,
-					SLAB_HWCACHE_ALIGN |
-					SLAB_RECLAIM_ACCOUNT);
-	if (!i915->dependencies)
-		goto err_requests;
-
-	i915->priorities = KMEM_CACHE(i915_priolist, SLAB_HWCACHE_ALIGN);
-	if (!i915->priorities)
-		goto err_dependencies;
-
 	i915_timelines_init(i915);
 
 	INIT_LIST_HEAD(&i915->gt.active_rings);
@@ -257,12 +237,6 @@ struct drm_i915_private *mock_gem_device(void)
 err_unlock:
 	mutex_unlock(&i915->drm.struct_mutex);
 	i915_timelines_fini(i915);
-	kmem_cache_destroy(i915->priorities);
-err_dependencies:
-	kmem_cache_destroy(i915->dependencies);
-err_requests:
-	kmem_cache_destroy(i915->requests);
-err_vmas:
 	kmem_cache_destroy(i915->vmas);
 err_objects:
 	kmem_cache_destroy(i915->objects);
diff --git a/drivers/gpu/drm/i915/selftests/mock_request.c b/drivers/gpu/drm/i915/selftests/mock_request.c
index 0dc29e242597..d1a7c9608712 100644
--- a/drivers/gpu/drm/i915/selftests/mock_request.c
+++ b/drivers/gpu/drm/i915/selftests/mock_request.c
@@ -31,29 +31,25 @@ mock_request(struct intel_engine_cs *engine,
 	     unsigned long delay)
 {
 	struct i915_request *request;
-	struct mock_request *mock;
 
 	/* NB the i915->requests slab cache is enlarged to fit mock_request */
 	request = i915_request_alloc(engine, context);
 	if (IS_ERR(request))
 		return NULL;
 
-	mock = container_of(request, typeof(*mock), base);
-	mock->delay = delay;
-
-	return &mock->base;
+	request->mock.delay = delay;
+	return request;
 }
 
 bool mock_cancel_request(struct i915_request *request)
 {
-	struct mock_request *mock = container_of(request, typeof(*mock), base);
 	struct mock_engine *engine =
 		container_of(request->engine, typeof(*engine), base);
 	bool was_queued;
 
 	spin_lock_irq(&engine->hw_lock);
-	was_queued = !list_empty(&mock->link);
-	list_del_init(&mock->link);
+	was_queued = !list_empty(&request->mock.link);
+	list_del_init(&request->mock.link);
 	spin_unlock_irq(&engine->hw_lock);
 
 	if (was_queued)
diff --git a/drivers/gpu/drm/i915/selftests/mock_request.h b/drivers/gpu/drm/i915/selftests/mock_request.h
index 995fb728380c..4acf0211df20 100644
--- a/drivers/gpu/drm/i915/selftests/mock_request.h
+++ b/drivers/gpu/drm/i915/selftests/mock_request.h
@@ -29,13 +29,6 @@
 
 #include "../i915_request.h"
 
-struct mock_request {
-	struct i915_request base;
-
-	struct list_head link;
-	unsigned long delay;
-};
-
 struct i915_request *
 mock_request(struct intel_engine_cs *engine,
 	     struct i915_gem_context *context,
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH 11/46] drm/i915: Keep timeline HWSP allocated until idle across the system
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (9 preceding siblings ...)
  2019-02-06 13:03 ` [PATCH 10/46] drm/i915: Make request allocation caches global Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-06 13:03 ` [PATCH 12/46] drm/i915/execlists: Refactor out can_merge_rq() Chris Wilson
                   ` (41 subsequent siblings)
  52 siblings, 0 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx

In preparation for enabling HW semaphores, we need to keep the in-flight
timeline HWSP alive until its use across the entire system has completed,
as any other timeline active on the GPU may still refer back to the
already retired timeline. We have to delay both recycling available
cachelines and unpinning the old HWSP until the next idle point.

An easy option would be to simply keep all used HWSP until the system as
a whole was idle, i.e. we could release them all at once on parking.
However, on a busy system, we may never see a global idle point,
essentially meaning the resource will be leaked until we are forced to
do a GC pass. We already employ a fine-grained idle detection mechanism
for vma, which we can reuse here so that each cacheline can be freed
immediately after the last request using it is retired.
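
To illustrate the hazard (sketch only, not driver code; see the comment
in __i915_timeline_get_seqno() below): the CPU-side seqno comparison is
wrap-safe, but the HW semaphore poll is a plain unsigned
greater-or-equal on the cacheline contents, so recycling the cacheline
across a wrap could leave a waiter polling forever:

	#include <stdbool.h>
	#include <stdint.h>

	/* CPU side: wrap-safe comparison (mirrors the driver's i915_seqno_passed). */
	static inline bool i915_seqno_passed(uint32_t seq1, uint32_t seq2)
	{
		return (int32_t)(seq1 - seq2) >= 0;
	}

	/*
	 * HW side, conceptually: the semaphore waits until
	 *	READ_ONCE(*hwsp_seqno) >= target
	 * Once the timeline wraps back to small seqno values, a waiter
	 * still watching the old cacheline for, say, target 0xfffffff0
	 * would never be released if we reused that cacheline for the
	 * new seqnos; hence the old cacheline is kept alive until all
	 * of its users have retired.
	 */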

v3: Keep track of the activity of each cacheline.
v4: cacheline_free() on canceling the seqno tracking
v5: Finally with a testcase to exercise wraparound

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_request.c           |  30 +-
 drivers/gpu/drm/i915/i915_timeline.c          | 264 ++++++++++++++++--
 drivers/gpu/drm/i915/i915_timeline.h          |   9 +-
 .../gpu/drm/i915/selftests/i915_timeline.c    | 113 ++++++++
 4 files changed, 377 insertions(+), 39 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 174d15c9dd00..13e0388791b6 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -331,11 +331,6 @@ void i915_request_retire_upto(struct i915_request *rq)
 	} while (tmp != rq);
 }
 
-static u32 timeline_get_seqno(struct i915_timeline *tl)
-{
-	return tl->seqno += 1 + tl->has_initial_breadcrumb;
-}
-
 static void move_to_timeline(struct i915_request *request,
 			     struct i915_timeline *timeline)
 {
@@ -556,8 +551,10 @@ struct i915_request *
 i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 {
 	struct drm_i915_private *i915 = engine->i915;
-	struct i915_request *rq;
 	struct intel_context *ce;
+	struct i915_timeline *tl;
+	struct i915_request *rq;
+	u32 seqno;
 	int ret;
 
 	lockdep_assert_held(&i915->drm.struct_mutex);
@@ -632,24 +629,26 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 		}
 	}
 
-	rq->rcustate = get_state_synchronize_rcu();
-
 	INIT_LIST_HEAD(&rq->active_list);
+
+	tl = ce->ring->timeline;
+	ret = i915_timeline_get_seqno(tl, rq, &seqno);
+	if (ret)
+		goto err_free;
+
 	rq->i915 = i915;
 	rq->engine = engine;
 	rq->gem_context = ctx;
 	rq->hw_context = ce;
 	rq->ring = ce->ring;
-	rq->timeline = ce->ring->timeline;
+	rq->timeline = tl;
 	GEM_BUG_ON(rq->timeline == &engine->timeline);
-	rq->hwsp_seqno = rq->timeline->hwsp_seqno;
+	rq->hwsp_seqno = tl->hwsp_seqno;
+	rq->rcustate = get_state_synchronize_rcu(); /* acts as smp_mb() */
 
 	spin_lock_init(&rq->lock);
-	dma_fence_init(&rq->fence,
-		       &i915_fence_ops,
-		       &rq->lock,
-		       rq->timeline->fence_context,
-		       timeline_get_seqno(rq->timeline));
+	dma_fence_init(&rq->fence, &i915_fence_ops, &rq->lock,
+		       tl->fence_context, seqno);
 
 	/* We bump the ref for the fence chain */
 	i915_sw_fence_init(&i915_request_get(rq)->submit, submit_notify);
@@ -710,6 +709,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	GEM_BUG_ON(!list_empty(&rq->sched.signalers_list));
 	GEM_BUG_ON(!list_empty(&rq->sched.waiters_list));
 
+err_free:
 	kmem_cache_free(global.slab_requests, rq);
 err_unreserve:
 	unreserve_gt(i915);
diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
index b2202d2e58a2..3608e544012f 100644
--- a/drivers/gpu/drm/i915/i915_timeline.c
+++ b/drivers/gpu/drm/i915/i915_timeline.c
@@ -6,19 +6,29 @@
 
 #include "i915_drv.h"
 
-#include "i915_timeline.h"
+#include "i915_active.h"
 #include "i915_syncmap.h"
+#include "i915_timeline.h"
 
 struct i915_timeline_hwsp {
-	struct i915_vma *vma;
+	struct i915_gt_timelines *gt;
 	struct list_head free_link;
+	struct i915_vma *vma;
 	u64 free_bitmap;
 };
 
-static inline struct i915_timeline_hwsp *
-i915_timeline_hwsp(const struct i915_timeline *tl)
+struct i915_timeline_cacheline {
+	struct i915_active active;
+	struct i915_timeline_hwsp *hwsp;
+	void *vaddr;
+	unsigned int cacheline : 6;
+	unsigned int free : 1;
+};
+
+static inline struct drm_i915_private *
+hwsp_to_i915(struct i915_timeline_hwsp *hwsp)
 {
-	return tl->hwsp_ggtt->private;
+	return container_of(hwsp->gt, struct drm_i915_private, gt.timelines);
 }
 
 static struct i915_vma *__hwsp_alloc(struct drm_i915_private *i915)
@@ -71,6 +81,7 @@ hwsp_alloc(struct i915_timeline *timeline, unsigned int *cacheline)
 		vma->private = hwsp;
 		hwsp->vma = vma;
 		hwsp->free_bitmap = ~0ull;
+		hwsp->gt = gt;
 
 		spin_lock(&gt->hwsp_lock);
 		list_add(&hwsp->free_link, &gt->hwsp_free_list);
@@ -88,14 +99,9 @@ hwsp_alloc(struct i915_timeline *timeline, unsigned int *cacheline)
 	return hwsp->vma;
 }
 
-static void hwsp_free(struct i915_timeline *timeline)
+static void __idle_hwsp_free(struct i915_timeline_hwsp *hwsp, int cacheline)
 {
-	struct i915_gt_timelines *gt = &timeline->i915->gt.timelines;
-	struct i915_timeline_hwsp *hwsp;
-
-	hwsp = i915_timeline_hwsp(timeline);
-	if (!hwsp) /* leave global HWSP alone! */
-		return;
+	struct i915_gt_timelines *gt = hwsp->gt;
 
 	spin_lock(&gt->hwsp_lock);
 
@@ -103,7 +109,8 @@ static void hwsp_free(struct i915_timeline *timeline)
 	if (!hwsp->free_bitmap)
 		list_add_tail(&hwsp->free_link, &gt->hwsp_free_list);
 
-	hwsp->free_bitmap |= BIT_ULL(timeline->hwsp_offset / CACHELINE_BYTES);
+	GEM_BUG_ON(cacheline >= BITS_PER_TYPE(hwsp->free_bitmap));
+	hwsp->free_bitmap |= BIT_ULL(cacheline);
 
 	/* And if no one is left using it, give the page back to the system */
 	if (hwsp->free_bitmap == ~0ull) {
@@ -115,6 +122,78 @@ static void hwsp_free(struct i915_timeline *timeline)
 	spin_unlock(&gt->hwsp_lock);
 }
 
+static void __idle_cacheline_free(struct i915_timeline_cacheline *cl)
+{
+	GEM_BUG_ON(!i915_active_is_idle(&cl->active));
+
+	i915_gem_object_unpin_map(cl->hwsp->vma->obj);
+	i915_vma_put(cl->hwsp->vma);
+	__idle_hwsp_free(cl->hwsp, cl->cacheline);
+
+	i915_active_fini(&cl->active);
+	kfree(cl);
+}
+
+static void __cacheline_retire(struct i915_active *active)
+{
+	struct i915_timeline_cacheline *cl =
+		container_of(active, typeof(*cl), active);
+
+	i915_vma_unpin(cl->hwsp->vma);
+	if (cl->free)
+		__idle_cacheline_free(cl);
+}
+
+static struct i915_timeline_cacheline *
+cacheline_alloc(struct i915_timeline_hwsp *hwsp, unsigned int cacheline)
+{
+	struct i915_timeline_cacheline *cl;
+	void *vaddr;
+
+	GEM_BUG_ON(cacheline >= 64);
+
+	cl = kmalloc(sizeof(*cl), GFP_KERNEL);
+	if (!cl)
+		return ERR_PTR(-ENOMEM);
+
+	vaddr = i915_gem_object_pin_map(hwsp->vma->obj, I915_MAP_WB);
+	if (IS_ERR(vaddr)) {
+		kfree(cl);
+		return ERR_CAST(vaddr);
+	}
+
+	i915_vma_get(hwsp->vma);
+	cl->hwsp = hwsp;
+	cl->vaddr = vaddr;
+	cl->cacheline = cacheline;
+	cl->free = false;
+
+	i915_active_init(hwsp_to_i915(hwsp), &cl->active, __cacheline_retire);
+
+	return cl;
+}
+
+static void cacheline_acquire(struct i915_timeline_cacheline *cl)
+{
+	if (cl && i915_active_acquire(&cl->active))
+		__i915_vma_pin(cl->hwsp->vma);
+}
+
+static void cacheline_release(struct i915_timeline_cacheline *cl)
+{
+	if (cl)
+		i915_active_release(&cl->active);
+}
+
+static void cacheline_free(struct i915_timeline_cacheline *cl)
+{
+	GEM_BUG_ON(cl->free);
+	cl->free = true;
+
+	if (i915_active_is_idle(&cl->active))
+		__idle_cacheline_free(cl);
+}
+
 int i915_timeline_init(struct drm_i915_private *i915,
 		       struct i915_timeline *timeline,
 		       const char *name,
@@ -136,29 +215,40 @@ int i915_timeline_init(struct drm_i915_private *i915,
 	timeline->name = name;
 	timeline->pin_count = 0;
 	timeline->has_initial_breadcrumb = !hwsp;
+	timeline->hwsp_cacheline = NULL;
 
-	timeline->hwsp_offset = I915_GEM_HWS_SEQNO_ADDR;
 	if (!hwsp) {
+		struct i915_timeline_cacheline *cl;
 		unsigned int cacheline;
 
 		hwsp = hwsp_alloc(timeline, &cacheline);
 		if (IS_ERR(hwsp))
 			return PTR_ERR(hwsp);
 
+		cl = cacheline_alloc(hwsp->private, cacheline);
+		if (IS_ERR(cl)) {
+			__idle_hwsp_free(hwsp->private, cacheline);
+			return PTR_ERR(cl);
+		}
+
+		timeline->hwsp_cacheline = cl;
 		timeline->hwsp_offset = cacheline * CACHELINE_BYTES;
-	}
-	timeline->hwsp_ggtt = i915_vma_get(hwsp);
 
-	vaddr = i915_gem_object_pin_map(hwsp->obj, I915_MAP_WB);
-	if (IS_ERR(vaddr)) {
-		hwsp_free(timeline);
-		i915_vma_put(hwsp);
-		return PTR_ERR(vaddr);
+		vaddr = cl->vaddr;
+	} else {
+		timeline->hwsp_offset = I915_GEM_HWS_SEQNO_ADDR;
+
+		vaddr = i915_gem_object_pin_map(hwsp->obj, I915_MAP_WB);
+		if (IS_ERR(vaddr))
+			return PTR_ERR(vaddr);
 	}
 
 	timeline->hwsp_seqno =
 		memset(vaddr + timeline->hwsp_offset, 0, CACHELINE_BYTES);
 
+	timeline->hwsp_ggtt = i915_vma_get(hwsp);
+	GEM_BUG_ON(timeline->hwsp_offset >= hwsp->size);
+
 	timeline->fence_context = dma_fence_context_alloc(1);
 
 	spin_lock_init(&timeline->lock);
@@ -239,9 +329,12 @@ void i915_timeline_fini(struct i915_timeline *timeline)
 	GEM_BUG_ON(i915_active_request_isset(&timeline->barrier));
 
 	i915_syncmap_free(&timeline->sync);
-	hwsp_free(timeline);
 
-	i915_gem_object_unpin_map(timeline->hwsp_ggtt->obj);
+	if (timeline->hwsp_cacheline)
+		cacheline_free(timeline->hwsp_cacheline);
+	else
+		i915_gem_object_unpin_map(timeline->hwsp_ggtt->obj);
+
 	i915_vma_put(timeline->hwsp_ggtt);
 }
 
@@ -284,6 +377,7 @@ int i915_timeline_pin(struct i915_timeline *tl)
 		i915_ggtt_offset(tl->hwsp_ggtt) +
 		offset_in_page(tl->hwsp_offset);
 
+	cacheline_acquire(tl->hwsp_cacheline);
 	timeline_add_to_active(tl);
 
 	return 0;
@@ -293,6 +387,129 @@ int i915_timeline_pin(struct i915_timeline *tl)
 	return err;
 }
 
+static u32 timeline_advance(struct i915_timeline *tl)
+{
+	GEM_BUG_ON(!tl->pin_count);
+	GEM_BUG_ON(tl->seqno & tl->has_initial_breadcrumb);
+
+	return tl->seqno += 1 + tl->has_initial_breadcrumb;
+}
+
+static void timeline_rollback(struct i915_timeline *tl)
+{
+	tl->seqno -= 1 + tl->has_initial_breadcrumb;
+}
+
+static noinline int
+__i915_timeline_get_seqno(struct i915_timeline *tl,
+			  struct i915_request *rq,
+			  u32 *seqno)
+{
+	struct i915_timeline_cacheline *cl;
+	struct i915_vma *vma;
+	unsigned int cacheline;
+	int err;
+
+	/*
+	 * If there is an outstanding GPU reference to this cacheline,
+	 * such as it being sampled by a HW semaphore on another timeline,
+	 * we cannot wraparound our seqno value (the HW semaphore does
+	 * a strict greater-than-or-equals compare, not i915_seqno_passed).
+	 * So if the cacheline is still busy, we must detach ourselves
+	 * from it and leave it inflight alongside its users.
+	 *
+	 * However, if nobody is watching and we can guarantee that nobody
+	 * will, we could simply reuse the same cacheline.
+	 *
+	 * if (i915_active_request_is_signaled(&tl->last_request) &&
+	 *     i915_active_is_signaled(&tl->hwsp_cacheline->active))
+	 *	return 0;
+	 *
+	 * That seems unlikely for a busy timeline that needed to wrap in
+	 * the first place, so just replace the cacheline.
+	 */
+
+	vma = hwsp_alloc(tl, &cacheline);
+	if (IS_ERR(vma)) {
+		err = PTR_ERR(vma);
+		goto err_rollback;
+	}
+
+	err = i915_vma_pin(vma, 0, 0, PIN_GLOBAL | PIN_HIGH);
+	if (err) {
+		__idle_hwsp_free(vma->private, cacheline);
+		goto err_rollback;
+	}
+
+	cl = cacheline_alloc(vma->private, cacheline);
+	if (IS_ERR(cl)) {
+		err = PTR_ERR(cl);
+		__idle_hwsp_free(vma->private, cacheline);
+		goto err_unpin;
+	}
+	GEM_BUG_ON(cl->hwsp->vma != vma);
+
+	/*
+	 * Attach the old cacheline to the current request, so that we only
+	 * free it after the current request is retired, which ensures that
+	 * all writes into the cacheline from previous requests are complete.
+	 */
+	err = i915_active_ref(&tl->hwsp_cacheline->active,
+			      tl->fence_context, rq);
+	if (err)
+		goto err_cacheline;
+
+	cacheline_release(tl->hwsp_cacheline); /* ownership now xfered to rq */
+	cacheline_free(tl->hwsp_cacheline);
+
+	i915_vma_unpin(tl->hwsp_ggtt); /* binding kept alive by old cacheline */
+	i915_vma_put(tl->hwsp_ggtt);
+
+	tl->hwsp_ggtt = i915_vma_get(vma);
+
+	tl->hwsp_offset = cacheline * CACHELINE_BYTES;
+	tl->hwsp_seqno =
+		memset(cl->vaddr + tl->hwsp_offset, 0, CACHELINE_BYTES);
+
+	tl->hwsp_offset += i915_ggtt_offset(vma);
+
+	cacheline_acquire(cl);
+	tl->hwsp_cacheline = cl;
+
+	*seqno = timeline_advance(tl);
+	GEM_BUG_ON(i915_seqno_passed(*tl->hwsp_seqno, *seqno));
+	return 0;
+
+err_cacheline:
+	cacheline_free(cl);
+err_unpin:
+	i915_vma_unpin(vma);
+err_rollback:
+	timeline_rollback(tl);
+	return err;
+}
+
+int i915_timeline_get_seqno(struct i915_timeline *tl,
+			    struct i915_request *rq,
+			    u32 *seqno)
+{
+	*seqno = timeline_advance(tl);
+
+	/* Replace the HWSP on wraparound for HW semaphores */
+	if (unlikely(!*seqno && tl->hwsp_cacheline))
+		return __i915_timeline_get_seqno(tl, rq, seqno);
+
+	return 0;
+}
+
+int i915_timeline_read_lock(struct i915_timeline *tl, struct i915_request *rq)
+{
+	GEM_BUG_ON(!tl->pin_count);
+	GEM_BUG_ON(!tl->hwsp_cacheline);
+	return i915_active_ref(&tl->hwsp_cacheline->active,
+			       rq->fence.context, rq);
+}
+
 void i915_timeline_unpin(struct i915_timeline *tl)
 {
 	GEM_BUG_ON(!tl->pin_count);
@@ -300,6 +517,7 @@ void i915_timeline_unpin(struct i915_timeline *tl)
 		return;
 
 	timeline_remove_from_active(tl);
+	cacheline_release(tl->hwsp_cacheline);
 
 	/*
 	 * Since this timeline is idle, all bariers upon which we were waiting
diff --git a/drivers/gpu/drm/i915/i915_timeline.h b/drivers/gpu/drm/i915/i915_timeline.h
index 7bec7d2e45bf..d78ec6fbc000 100644
--- a/drivers/gpu/drm/i915/i915_timeline.h
+++ b/drivers/gpu/drm/i915/i915_timeline.h
@@ -34,7 +34,7 @@
 #include "i915_utils.h"
 
 struct i915_vma;
-struct i915_timeline_hwsp;
+struct i915_timeline_cacheline;
 
 struct i915_timeline {
 	u64 fence_context;
@@ -49,6 +49,8 @@ struct i915_timeline {
 	struct i915_vma *hwsp_ggtt;
 	u32 hwsp_offset;
 
+	struct i915_timeline_cacheline *hwsp_cacheline;
+
 	bool has_initial_breadcrumb;
 
 	/**
@@ -160,6 +162,11 @@ static inline bool i915_timeline_sync_is_later(struct i915_timeline *tl,
 }
 
 int i915_timeline_pin(struct i915_timeline *tl);
+int i915_timeline_get_seqno(struct i915_timeline *tl,
+			    struct i915_request *rq,
+			    u32 *seqno);
+int i915_timeline_read_lock(struct i915_timeline *tl,
+			    struct i915_request *rq);
 void i915_timeline_unpin(struct i915_timeline *tl);
 
 void i915_timelines_init(struct drm_i915_private *i915);
diff --git a/drivers/gpu/drm/i915/selftests/i915_timeline.c b/drivers/gpu/drm/i915/selftests/i915_timeline.c
index 12ea69b1a1e5..844701759ffc 100644
--- a/drivers/gpu/drm/i915/selftests/i915_timeline.c
+++ b/drivers/gpu/drm/i915/selftests/i915_timeline.c
@@ -641,6 +641,118 @@ static int live_hwsp_alternate(void *arg)
 #undef NUM_TIMELINES
 }
 
+static int live_hwsp_wrap(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct intel_engine_cs *engine;
+	struct i915_timeline *tl;
+	enum intel_engine_id id;
+	intel_wakeref_t wakeref;
+	int err = 0;
+
+	/*
+	 * Across a seqno wrap, we need to keep the old cacheline alive for
+	 * foreign GPU references.
+	 */
+
+	mutex_lock(&i915->drm.struct_mutex);
+	wakeref = intel_runtime_pm_get(i915);
+
+	tl = i915_timeline_create(i915, __func__, NULL);
+	if (IS_ERR(tl)) {
+		err = PTR_ERR(tl);
+		goto out_rpm;
+	}
+	if (!tl->has_initial_breadcrumb || !tl->hwsp_cacheline)
+		goto out_free;
+
+	err = i915_timeline_pin(tl);
+	if (err)
+		goto out_free;
+
+	for_each_engine(engine, i915, id) {
+		const u32 *hwsp_seqno[2];
+		struct i915_request *rq;
+		u32 seqno[2];
+
+		if (!intel_engine_can_store_dword(engine))
+			continue;
+
+		rq = i915_request_alloc(engine, i915->kernel_context);
+		if (IS_ERR(rq)) {
+			err = PTR_ERR(rq);
+			goto out;
+		}
+
+		tl->seqno = -4u;
+
+		err = i915_timeline_get_seqno(tl, rq, &seqno[0]);
+		if (err) {
+			i915_request_add(rq);
+			goto out;
+		}
+		pr_debug("seqno[0]:%08x, hwsp_offset:%08x\n",
+			 seqno[0], tl->hwsp_offset);
+
+		err = emit_ggtt_store_dw(rq, tl->hwsp_offset, seqno[0]);
+		if (err) {
+			i915_request_add(rq);
+			goto out;
+		}
+		hwsp_seqno[0] = tl->hwsp_seqno;
+
+		err = i915_timeline_get_seqno(tl, rq, &seqno[1]);
+		if (err) {
+			i915_request_add(rq);
+			goto out;
+		}
+		pr_debug("seqno[1]:%08x, hwsp_offset:%08x\n",
+			 seqno[1], tl->hwsp_offset);
+
+		err = emit_ggtt_store_dw(rq, tl->hwsp_offset, seqno[1]);
+		if (err) {
+			i915_request_add(rq);
+			goto out;
+		}
+		hwsp_seqno[1] = tl->hwsp_seqno;
+
+		/* With wrap should come a new hwsp */
+		GEM_BUG_ON(seqno[1] >= seqno[0]);
+		GEM_BUG_ON(hwsp_seqno[0] == hwsp_seqno[1]);
+
+		i915_request_add(rq);
+
+		if (i915_request_wait(rq, I915_WAIT_LOCKED, HZ / 5) < 0) {
+			pr_err("Wait for timeline writes timed out!\n");
+			err = -EIO;
+			goto out;
+		}
+
+		if (*hwsp_seqno[0] != seqno[0] || *hwsp_seqno[1] != seqno[1]) {
+			pr_err("Bad timeline values: found (%x, %x), expected (%x, %x)\n",
+			       *hwsp_seqno[0], *hwsp_seqno[1],
+			       seqno[0], seqno[1]);
+			err = -EINVAL;
+			goto out;
+		}
+
+		i915_retire_requests(i915); /* recycle HWSP */
+	}
+
+out:
+	if (igt_flush_test(i915, I915_WAIT_LOCKED))
+		err = -EIO;
+
+	i915_timeline_unpin(tl);
+out_free:
+	i915_timeline_put(tl);
+out_rpm:
+	intel_runtime_pm_put(i915, wakeref);
+	mutex_unlock(&i915->drm.struct_mutex);
+
+	return err;
+}
+
 static int live_hwsp_recycle(void *arg)
 {
 	struct drm_i915_private *i915 = arg;
@@ -723,6 +835,7 @@ int i915_timeline_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_hwsp_recycle),
 		SUBTEST(live_hwsp_engine),
 		SUBTEST(live_hwsp_alternate),
+		SUBTEST(live_hwsp_wrap),
 	};
 
 	return i915_subtests(tests, i915);
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH 12/46] drm/i915/execlists: Refactor out can_merge_rq()
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (10 preceding siblings ...)
  2019-02-06 13:03 ` [PATCH 11/46] drm/i915: Keep timeline HWSP allocated until idle across the system Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-06 13:03 ` [PATCH 13/46] drm/i915: Compute the global scheduler caps Chris Wilson
                   ` (40 subsequent siblings)
  52 siblings, 0 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx

In the next patch, we add another user that wants to check whether
requests can be merged into a single HW execution, and in the future we
want to add more conditions under which requests from the same context
cannot be merged. In preparation, extract out can_merge_rq().

v2: Reorder the tests that decide whether we can continue filling the
ELSP, and add bonus comments.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 35 ++++++++++++++++++++++----------
 1 file changed, 24 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 553371e654d7..da5120283263 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -285,12 +285,11 @@ static inline bool need_preempt(const struct intel_engine_cs *engine,
 }
 
 __maybe_unused static inline bool
-assert_priority_queue(const struct intel_engine_execlists *execlists,
-		      const struct i915_request *prev,
+assert_priority_queue(const struct i915_request *prev,
 		      const struct i915_request *next)
 {
-	if (!prev)
-		return true;
+	const struct intel_engine_execlists *execlists =
+		&prev->engine->execlists;
 
 	/*
 	 * Without preemption, the prev may refer to the still active element
@@ -601,6 +600,17 @@ static bool can_merge_ctx(const struct intel_context *prev,
 	return true;
 }
 
+static bool can_merge_rq(const struct i915_request *prev,
+			 const struct i915_request *next)
+{
+	GEM_BUG_ON(!assert_priority_queue(prev, next));
+
+	if (!can_merge_ctx(prev->hw_context, next->hw_context))
+		return false;
+
+	return true;
+}
+
 static void port_assign(struct execlist_port *port, struct i915_request *rq)
 {
 	GEM_BUG_ON(rq == port_request(port));
@@ -753,8 +763,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		int i;
 
 		priolist_for_each_request_consume(rq, rn, p, i) {
-			GEM_BUG_ON(!assert_priority_queue(execlists, last, rq));
-
 			/*
 			 * Can we combine this request with the current port?
 			 * It has to be the same context/ringbuffer and not
@@ -766,8 +774,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 			 * second request, and so we never need to tell the
 			 * hardware about the first.
 			 */
-			if (last &&
-			    !can_merge_ctx(rq->hw_context, last->hw_context)) {
+			if (last && !can_merge_rq(last, rq)) {
 				/*
 				 * If we are on the second port and cannot
 				 * combine this request with the last, then we
@@ -776,6 +783,14 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 				if (port == last_port)
 					goto done;
 
+				/*
+				 * We must not populate both ELSP[] with the
+				 * same LRCA, i.e. we must submit 2 different
+				 * contexts if we submit 2 ELSP.
+				 */
+				if (last->hw_context == rq->hw_context)
+					goto done;
+
 				/*
 				 * If GVT overrides us we only ever submit
 				 * port[0], leaving port[1] empty. Note that we
@@ -787,7 +802,6 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 				    ctx_single_port_submission(rq->hw_context))
 					goto done;
 
-				GEM_BUG_ON(last->hw_context == rq->hw_context);
 
 				if (submit)
 					port_assign(port, last);
@@ -826,8 +840,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	 * request triggering preemption on the next dequeue (or subsequent
 	 * interrupt for secondary ports).
 	 */
-	execlists->queue_priority_hint =
-		port != execlists->port ? rq_prio(last) : INT_MIN;
+	execlists->queue_priority_hint = queue_prio(execlists);
 
 	if (submit) {
 		port_assign(port, last);
-- 
2.20.1


* [PATCH 13/46] drm/i915: Compute the global scheduler caps
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (11 preceding siblings ...)
  2019-02-06 13:03 ` [PATCH 12/46] drm/i915/execlists: Refactor out can_merge_rq() Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-11 12:24   ` Tvrtko Ursulin
  2019-02-06 13:03 ` [PATCH 14/46] drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+ Chris Wilson
                   ` (39 subsequent siblings)
  52 siblings, 1 reply; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx

Do a pass over all the engines upon starting to determine the global
scheduler capability flags (those that are agreed upon by all).
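
Purely as an illustration (not part of this patch), the aggregated mask
is what userspace already retrieves through I915_PARAM_HAS_SCHEDULER; a
minimal sketch of the query, assuming libdrm:

#include <xf86drm.h>
#include <i915_drm.h>

/* Returns the I915_SCHEDULER_CAP_* bitmask, or 0 if unknown. */
unsigned int query_scheduler_caps(int fd)
{
	int value = 0;
	struct drm_i915_getparam gp = {
		.param = I915_PARAM_HAS_SCHEDULER,
		.value = &value,
	};

	if (drmIoctl(fd, DRM_IOCTL_I915_GETPARAM, &gp))
		return 0;

	return value;
}

After this change, a set I915_SCHEDULER_CAP_PREEMPTION bit means that
every engine agreed on preemption support, not just whichever engine
happened to set the caps last.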

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c         |  2 ++
 drivers/gpu/drm/i915/intel_engine_cs.c  | 39 +++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c        |  6 ----
 drivers/gpu/drm/i915/intel_ringbuffer.h |  2 ++
 4 files changed, 43 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d18c4ccff370..04fa184fdff5 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4728,6 +4728,8 @@ static int __i915_gem_restart_engines(void *data)
 		}
 	}
 
+	intel_engines_set_scheduler_caps(i915);
+
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 49fa43ff02ba..02ee86159adc 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -614,6 +614,45 @@ int intel_engine_setup_common(struct intel_engine_cs *engine)
 	return err;
 }
 
+void intel_engines_set_scheduler_caps(struct drm_i915_private *i915)
+{
+	static const struct {
+		u8 engine;
+		u8 sched;
+	} map[] = {
+#define MAP(x, y) { ilog2(I915_ENGINE_HAS_##x), ilog2(I915_SCHEDULER_CAP_##y) }
+		MAP(PREEMPTION, PREEMPTION),
+#undef MAP
+	};
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	u32 enabled, disabled;
+
+	enabled = 0;
+	disabled = 0;
+	for_each_engine(engine, i915, id) { /* all engines must agree! */
+		int i;
+
+		if (engine->schedule)
+			enabled |= (I915_SCHEDULER_CAP_ENABLED |
+				    I915_SCHEDULER_CAP_PRIORITY);
+		else
+			disabled |= (I915_SCHEDULER_CAP_ENABLED |
+				     I915_SCHEDULER_CAP_PRIORITY);
+
+		for (i = 0; i < ARRAY_SIZE(map); i++) {
+			if (engine->flags & BIT(map[i].engine))
+				enabled |= BIT(map[i].sched);
+			else
+				disabled |= BIT(map[i].sched);
+		}
+	}
+
+	i915->caps.scheduler = enabled & ~disabled;
+	if (!(i915->caps.scheduler & I915_SCHEDULER_CAP_ENABLED))
+		i915->caps.scheduler = 0;
+}
+
 static void __intel_context_unpin(struct i915_gem_context *ctx,
 				  struct intel_engine_cs *engine)
 {
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index da5120283263..59891cca35c1 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2341,12 +2341,6 @@ void intel_execlists_set_default_submission(struct intel_engine_cs *engine)
 	engine->flags |= I915_ENGINE_SUPPORTS_STATS;
 	if (engine->i915->preempt_context)
 		engine->flags |= I915_ENGINE_HAS_PREEMPTION;
-
-	engine->i915->caps.scheduler =
-		I915_SCHEDULER_CAP_ENABLED |
-		I915_SCHEDULER_CAP_PRIORITY;
-	if (intel_engine_has_preemption(engine))
-		engine->i915->caps.scheduler |= I915_SCHEDULER_CAP_PREEMPTION;
 }
 
 static void
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index e7d85aaee415..19faa19f2529 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -574,6 +574,8 @@ intel_engine_has_preemption(const struct intel_engine_cs *engine)
 	return engine->flags & I915_ENGINE_HAS_PREEMPTION;
 }
 
+void intel_engines_set_scheduler_caps(struct drm_i915_private *i915);
+
 static inline bool __execlists_need_preempt(int prio, int last)
 {
 	/*
-- 
2.20.1


* [PATCH 14/46] drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (12 preceding siblings ...)
  2019-02-06 13:03 ` [PATCH 13/46] drm/i915: Compute the global scheduler caps Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-06 13:03 ` [PATCH 15/46] drm/i915: Prioritise non-busywait semaphore workloads Chris Wilson
                   ` (38 subsequent siblings)
  52 siblings, 0 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx

Having introduced per-context seqno, we now have a means to identify
progress across the system without fear of the rollback that befell the
global_seqno. That is, we can program an MI_SEMAPHORE_WAIT operation in
advance of submission, safe in the knowledge that our target seqno and
address are stable.

However, since we are telling the GPU to busy-spin on the target address
until it matches the signaling seqno, we only want to do so when we are
sure that the busy-spin will complete quickly. To achieve this, we only
submit the request to HW once the signaler is itself executing (modulo
preemption causing us to wait longer), and we only do so for requests of
default priority and above (so that idle priority tasks never themselves
hog the GPU waiting for others).

As might be reasonably expected, HW semaphores excel in inter-engine
synchronisation microbenchmarks (where the reduced latency / increased
throughput more than offset the power cost of spinning on a second ring)
and show a significant improvement for single clients that utilise multiple
engines (typically media players), without regressing multiple clients
that can saturate the system.
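
For readers unfamiliar with the command, a rough CPU-side analogy of
what MI_SEMAPHORE_WAIT | MI_SEMAPHORE_POLL | MI_SEMAPHORE_SAD_GTE_SDD
asks the waiting engine to do (an illustrative sketch only, not code
added by this patch):

#include <stdint.h>

static void semaphore_poll_analogy(const volatile uint32_t *hwsp_slot,
				   uint32_t target)
{
	/*
	 * The comparison is a plain unsigned >= against the dword stored
	 * in the signaler's HWSP slot. That is why seqno wraparound
	 * matters: a post-wrap (numerically smaller) value would never
	 * release waiters armed with a large pre-wrap target, hence the
	 * timeline swaps to a fresh HWSP on wrap (see the hwsp_offset
	 * comment in the diff below).
	 */
	while (*hwsp_slot < target)
		; /* the GPU busy-spins in its own ring, not on a CPU */
}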

v3: Drop the older NEQ branch, now we pin the signaler's HWSP anyway.
v4: Tell the world and include it as part of scheduler caps.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.c           |   2 +-
 drivers/gpu/drm/i915/i915_request.c       | 136 +++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_request.h       |   1 +
 drivers/gpu/drm/i915/i915_sw_fence.c      |   4 +-
 drivers/gpu/drm/i915/i915_sw_fence.h      |   3 +
 drivers/gpu/drm/i915/intel_engine_cs.c    |   1 +
 drivers/gpu/drm/i915/intel_gpu_commands.h |   5 +
 drivers/gpu/drm/i915/intel_lrc.c          |   1 +
 drivers/gpu/drm/i915/intel_ringbuffer.h   |   7 ++
 include/uapi/drm/i915_drm.h               |   1 +
 10 files changed, 156 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 7de90701f6f1..f5a3558e00fd 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -349,7 +349,7 @@ static int i915_getparam_ioctl(struct drm_device *dev, void *data,
 		value = min_t(int, INTEL_PPGTT(dev_priv), I915_GEM_PPGTT_FULL);
 		break;
 	case I915_PARAM_HAS_SEMAPHORES:
-		value = 0;
+		value = !!(dev_priv->caps.scheduler & I915_SCHEDULER_CAP_SEMAPHORES);
 		break;
 	case I915_PARAM_HAS_SECURE_BATCHES:
 		value = capable(CAP_SYS_ADMIN);
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 13e0388791b6..510b3c444413 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -22,8 +22,9 @@
  *
  */
 
-#include <linux/prefetch.h>
 #include <linux/dma-fence-array.h>
+#include <linux/irq_work.h>
+#include <linux/prefetch.h>
 #include <linux/sched.h>
 #include <linux/sched/clock.h>
 #include <linux/sched/signal.h>
@@ -32,9 +33,16 @@
 #include "i915_active.h"
 #include "i915_reset.h"
 
+struct execute_cb {
+	struct list_head link;
+	struct irq_work work;
+	struct i915_sw_fence *fence;
+};
+
 static struct i915_global_request {
 	struct kmem_cache *slab_requests;
 	struct kmem_cache *slab_dependencies;
+	struct kmem_cache *slab_execute_cbs;
 } global;
 
 static const char *i915_fence_get_driver_name(struct dma_fence *fence)
@@ -331,6 +339,69 @@ void i915_request_retire_upto(struct i915_request *rq)
 	} while (tmp != rq);
 }
 
+static void irq_execute_cb(struct irq_work *wrk)
+{
+	struct execute_cb *cb = container_of(wrk, typeof(*cb), work);
+
+	i915_sw_fence_complete(cb->fence);
+	kmem_cache_free(global.slab_execute_cbs, cb);
+}
+
+static void __notify_execute_cb(struct i915_request *rq)
+{
+	struct execute_cb *cb;
+
+	lockdep_assert_held(&rq->lock);
+
+	if (list_empty(&rq->execute_cb))
+		return;
+
+	list_for_each_entry(cb, &rq->execute_cb, link)
+		irq_work_queue(&cb->work);
+
+	/*
+	 * XXX Rollback on __i915_request_unsubmit()
+	 *
+	 * In the future, perhaps when we have an active time-slicing scheduler,
+	 * it will be interesting to unsubmit parallel execution and remove
+	 * busywaits from the GPU until their master is restarted. This is
+	 * quite hairy, we have to carefully rollback the fence and do a
+	 * preempt-to-idle cycle on the target engine, all the while the
+	 * master execute_cb may refire.
+	 */
+	INIT_LIST_HEAD(&rq->execute_cb);
+}
+
+static int
+i915_request_await_execution(struct i915_request *rq,
+			     struct i915_request *signal,
+			     gfp_t gfp)
+{
+	struct execute_cb *cb;
+
+	if (i915_request_is_active(signal))
+		return 0;
+
+	cb = kmem_cache_alloc(global.slab_execute_cbs, gfp);
+	if (!cb)
+		return -ENOMEM;
+
+	cb->fence = &rq->submit;
+	i915_sw_fence_await(cb->fence);
+	init_irq_work(&cb->work, irq_execute_cb);
+
+	spin_lock_irq(&signal->lock);
+	if (i915_request_is_active(signal)) {
+		i915_sw_fence_complete(cb->fence);
+		kmem_cache_free(global.slab_execute_cbs, cb);
+	} else {
+		list_add_tail(&cb->link, &signal->execute_cb);
+	}
+	spin_unlock_irq(&signal->lock);
+
+	return 0;
+}
+
 static void move_to_timeline(struct i915_request *request,
 			     struct i915_timeline *timeline)
 {
@@ -389,6 +460,7 @@ void __i915_request_submit(struct i915_request *request)
 	 */
 	BUILD_BUG_ON(__NO_PREEMPTION & ~I915_PRIORITY_MASK); /* only internal */
 	request->sched.attr.priority |= __NO_PREEMPTION;
+	__notify_execute_cb(request);
 
 	spin_unlock(&request->lock);
 
@@ -630,6 +702,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	}
 
 	INIT_LIST_HEAD(&rq->active_list);
+	INIT_LIST_HEAD(&rq->execute_cb);
 
 	tl = ce->ring->timeline;
 	ret = i915_timeline_get_seqno(tl, rq, &seqno);
@@ -717,6 +790,51 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	return ERR_PTR(ret);
 }
 
+static int
+emit_semaphore_wait(struct i915_request *to,
+		    struct i915_request *from,
+		    gfp_t gfp)
+{
+	u32 *cs;
+	int err;
+
+	GEM_BUG_ON(!from->timeline->has_initial_breadcrumb);
+	GEM_BUG_ON(INTEL_GEN(to->i915) < 8);
+
+	/* We need to pin the signaler's HWSP until we are finished reading. */
+	err = i915_timeline_read_lock(from->timeline, to);
+	if (err)
+		return err;
+
+	/* Only submit our spinner after the signaler is running! */
+	err = i915_request_await_execution(to, from, gfp);
+	if (err)
+		return err;
+
+	cs = intel_ring_begin(to, 4);
+	if (IS_ERR(cs))
+		return PTR_ERR(cs);
+
+	/*
+	 * Using greater-than-or-equal here means we have to worry
+	 * about seqno wraparound. To side step that issue, we swap
+	 * the timeline HWSP upon wrapping, so that everyone listening
+	 * for the old (pre-wrap) values do not see the much smaller
+	 * (post-wrap) values than they were expecting (and so wait
+	 * forever).
+	 */
+	*cs++ = MI_SEMAPHORE_WAIT |
+		MI_SEMAPHORE_GLOBAL_GTT |
+		MI_SEMAPHORE_POLL |
+		MI_SEMAPHORE_SAD_GTE_SDD;
+	*cs++ = from->fence.seqno;
+	*cs++ = from->timeline->hwsp_offset;
+	*cs++ = 0;
+
+	intel_ring_advance(to, cs);
+	return 0;
+}
+
 static int
 i915_request_await_request(struct i915_request *to, struct i915_request *from)
 {
@@ -738,6 +856,9 @@ i915_request_await_request(struct i915_request *to, struct i915_request *from)
 		ret = i915_sw_fence_await_sw_fence_gfp(&to->submit,
 						       &from->submit,
 						       I915_FENCE_GFP);
+	} else if (intel_engine_has_semaphores(to->engine) &&
+		   to->gem_context->sched.priority >= I915_PRIORITY_NORMAL) {
+		ret = emit_semaphore_wait(to, from, I915_FENCE_GFP);
 	} else {
 		ret = i915_sw_fence_await_dma_fence(&to->submit,
 						    &from->fence, 0,
@@ -1212,14 +1333,23 @@ int __init i915_global_request_init(void)
 	if (!global.slab_requests)
 		return -ENOMEM;
 
+	global.slab_execute_cbs = KMEM_CACHE(execute_cb,
+					     SLAB_HWCACHE_ALIGN |
+					     SLAB_RECLAIM_ACCOUNT |
+					     SLAB_TYPESAFE_BY_RCU);
+	if (!global.slab_execute_cbs)
+		goto err_requests;
+
 	global.slab_dependencies = KMEM_CACHE(i915_dependency,
 					      SLAB_HWCACHE_ALIGN |
 					      SLAB_RECLAIM_ACCOUNT);
 	if (!global.slab_dependencies)
-		goto err_requests;
+		goto err_execute_cbs;
 
 	return 0;
 
+err_execute_cbs:
+	kmem_cache_destroy(global.slab_execute_cbs);
 err_requests:
 	kmem_cache_destroy(global.slab_requests);
 	return -ENOMEM;
@@ -1228,11 +1358,13 @@ int __init i915_global_request_init(void)
 void i915_global_request_shrink(void)
 {
 	kmem_cache_shrink(global.slab_dependencies);
+	kmem_cache_shrink(global.slab_execute_cbs);
 	kmem_cache_shrink(global.slab_requests);
 }
 
 void i915_global_request_exit(void)
 {
 	kmem_cache_destroy(global.slab_dependencies);
+	kmem_cache_destroy(global.slab_execute_cbs);
 	kmem_cache_destroy(global.slab_requests);
 }
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 071ff1064579..df52776b26cf 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -128,6 +128,7 @@ struct i915_request {
 	 */
 	struct i915_sw_fence submit;
 	wait_queue_entry_t submitq;
+	struct list_head execute_cb;
 
 	/*
 	 * A list of everyone we wait upon, and everyone who waits upon us.
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c
index 7c58b049ecb5..8d1400d378d7 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence.c
@@ -192,7 +192,7 @@ static void __i915_sw_fence_complete(struct i915_sw_fence *fence,
 	__i915_sw_fence_notify(fence, FENCE_FREE);
 }
 
-static void i915_sw_fence_complete(struct i915_sw_fence *fence)
+void i915_sw_fence_complete(struct i915_sw_fence *fence)
 {
 	debug_fence_assert(fence);
 
@@ -202,7 +202,7 @@ static void i915_sw_fence_complete(struct i915_sw_fence *fence)
 	__i915_sw_fence_complete(fence, NULL);
 }
 
-static void i915_sw_fence_await(struct i915_sw_fence *fence)
+void i915_sw_fence_await(struct i915_sw_fence *fence)
 {
 	debug_fence_assert(fence);
 	WARN_ON(atomic_inc_return(&fence->pending) <= 1);
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.h b/drivers/gpu/drm/i915/i915_sw_fence.h
index 0e055ea0179f..6dec9e1d1102 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.h
+++ b/drivers/gpu/drm/i915/i915_sw_fence.h
@@ -79,6 +79,9 @@ int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
 				    unsigned long timeout,
 				    gfp_t gfp);
 
+void i915_sw_fence_await(struct i915_sw_fence *fence);
+void i915_sw_fence_complete(struct i915_sw_fence *fence);
+
 static inline bool i915_sw_fence_signaled(const struct i915_sw_fence *fence)
 {
 	return atomic_read(&fence->pending) <= 0;
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 02ee86159adc..57cfc4c551c9 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -622,6 +622,7 @@ void intel_engines_set_scheduler_caps(struct drm_i915_private *i915)
 	} map[] = {
 #define MAP(x, y) { ilog2(I915_ENGINE_HAS_##x), ilog2(I915_SCHEDULER_CAP_##y) }
 		MAP(PREEMPTION, PREEMPTION),
+		MAP(SEMAPHORES, SEMAPHORES),
 #undef MAP
 	};
 	struct intel_engine_cs *engine;
diff --git a/drivers/gpu/drm/i915/intel_gpu_commands.h b/drivers/gpu/drm/i915/intel_gpu_commands.h
index b96a31bc1080..0efaadd3bc32 100644
--- a/drivers/gpu/drm/i915/intel_gpu_commands.h
+++ b/drivers/gpu/drm/i915/intel_gpu_commands.h
@@ -106,7 +106,12 @@
 #define   MI_SEMAPHORE_TARGET(engine)	((engine)<<15)
 #define MI_SEMAPHORE_WAIT	MI_INSTR(0x1c, 2) /* GEN8+ */
 #define   MI_SEMAPHORE_POLL		(1<<15)
+#define   MI_SEMAPHORE_SAD_GT_SDD	(0<<12)
 #define   MI_SEMAPHORE_SAD_GTE_SDD	(1<<12)
+#define   MI_SEMAPHORE_SAD_LT_SDD	(2<<12)
+#define   MI_SEMAPHORE_SAD_LTE_SDD	(3<<12)
+#define   MI_SEMAPHORE_SAD_EQ_SDD	(4<<12)
+#define   MI_SEMAPHORE_SAD_NEQ_SDD	(5<<12)
 #define MI_STORE_DWORD_IMM	MI_INSTR(0x20, 1)
 #define MI_STORE_DWORD_IMM_GEN4	MI_INSTR(0x20, 2)
 #define   MI_MEM_VIRTUAL	(1 << 22) /* 945,g33,965 */
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 59891cca35c1..69a0696d7d7e 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2338,6 +2338,7 @@ void intel_execlists_set_default_submission(struct intel_engine_cs *engine)
 	engine->park = NULL;
 	engine->unpark = NULL;
 
+	engine->flags |= I915_ENGINE_HAS_SEMAPHORES;
 	engine->flags |= I915_ENGINE_SUPPORTS_STATS;
 	if (engine->i915->preempt_context)
 		engine->flags |= I915_ENGINE_HAS_PREEMPTION;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 19faa19f2529..493a72ed01af 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -497,6 +497,7 @@ struct intel_engine_cs {
 #define I915_ENGINE_NEEDS_CMD_PARSER BIT(0)
 #define I915_ENGINE_SUPPORTS_STATS   BIT(1)
 #define I915_ENGINE_HAS_PREEMPTION   BIT(2)
+#define I915_ENGINE_HAS_SEMAPHORES   BIT(3)
 	unsigned int flags;
 
 	/*
@@ -574,6 +575,12 @@ intel_engine_has_preemption(const struct intel_engine_cs *engine)
 	return engine->flags & I915_ENGINE_HAS_PREEMPTION;
 }
 
+static inline bool
+intel_engine_has_semaphores(const struct intel_engine_cs *engine)
+{
+	return engine->flags & I915_ENGINE_HAS_SEMAPHORES;
+}
+
 void intel_engines_set_scheduler_caps(struct drm_i915_private *i915);
 
 static inline bool __execlists_need_preempt(int prio, int last)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 397810fa2d33..05f79e1a35d1 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -476,6 +476,7 @@ typedef struct drm_i915_irq_wait {
 #define   I915_SCHEDULER_CAP_ENABLED	(1ul << 0)
 #define   I915_SCHEDULER_CAP_PRIORITY	(1ul << 1)
 #define   I915_SCHEDULER_CAP_PREEMPTION	(1ul << 2)
+#define   I915_SCHEDULER_CAP_SEMAPHORES	(1ul << 3)
 
 #define I915_PARAM_HUC_STATUS		 42
 
-- 
2.20.1


* [PATCH 15/46] drm/i915: Prioritise non-busywait semaphore workloads
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (13 preceding siblings ...)
  2019-02-06 13:03 ` [PATCH 14/46] drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+ Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-06 13:03 ` [PATCH 16/46] drm/i915: Show support for accurate sw PMU busyness tracking Chris Wilson
                   ` (37 subsequent siblings)
  52 siblings, 0 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx

We don't want to busywait on the GPU if we have other work to do. If we
give non-busywaiting workloads higher (initial) priority than workloads
that require a busywait, we will prioritise work that is ready to run
immediately. We then also have to be careful that we don't give earlier
semaphores an accidental boost just because later work doesn't wait on
other rings; hence we keep a history of semaphore usage along the
dependency chain.
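
To make the ordering concrete, a small sketch of how the effective
priority is composed after this patch (an illustration derived from the
definitions in the diff below, not new driver code):

/*
 * The user priority occupies the bits above I915_USER_PRIORITY_SHIFT;
 * the internal hint bits below it break ties between requests at the
 * same user priority. NOSEMAPHORE is the most significant hint, so
 * work that does not need to busywait sorts ahead of work that does.
 */
static inline int effective_priority(int user_prio, unsigned int hints)
{
	/* hints: I915_PRIORITY_{WAIT,NEWCLIENT,NOSEMAPHORE} as applicable */
	return (user_prio << I915_USER_PRIORITY_SHIFT) | hints;
}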

Testcase: igt/gem_exec_schedule/semaphore
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_request.c   | 16 ++++++++++++++++
 drivers/gpu/drm/i915/i915_scheduler.c | 10 ++++++++++
 drivers/gpu/drm/i915/i915_scheduler.h |  9 ++++++---
 drivers/gpu/drm/i915/intel_lrc.c      |  2 +-
 4 files changed, 33 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 510b3c444413..678da705e222 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -832,6 +832,7 @@ emit_semaphore_wait(struct i915_request *to,
 	*cs++ = 0;
 
 	intel_ring_advance(to, cs);
+	to->sched.semaphore |= I915_SCHED_HAS_SEMAPHORE;
 	return 0;
 }
 
@@ -1102,6 +1103,21 @@ void i915_request_add(struct i915_request *request)
 	if (engine->schedule) {
 		struct i915_sched_attr attr = request->gem_context->sched;
 
+		/*
+		 * Boost actual workloads past semaphores!
+		 *
+		 * With semaphores we spin on one engine waiting for another,
+		 * simply to reduce the latency of starting our work when
+		 * the signaler completes. However, if there is any other
+		 * work that we could be doing on this engine instead, that
+		 * is better utilisation and will reduce the overall duration
+		 * of the current work. To avoid PI boosting a semaphore
+		 * far in the distance past over useful work, we keep a history
+		 * of any semaphore use along our dependency chain.
+		 */
+		if (!request->sched.semaphore)
+			attr.priority |= I915_PRIORITY_NOSEMAPHORE;
+
 		/*
 		 * Boost priorities to new clients (new request flows).
 		 *
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index 720cc91b4d10..d579e21c0cf4 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -28,12 +28,18 @@ static inline bool node_signaled(const struct i915_sched_node *node)
 	return i915_request_completed(node_to_request(node));
 }
 
+static inline bool node_started(const struct i915_sched_node *node)
+{
+	return i915_request_started(node_to_request(node));
+}
+
 void i915_sched_node_init(struct i915_sched_node *node)
 {
 	INIT_LIST_HEAD(&node->signalers_list);
 	INIT_LIST_HEAD(&node->waiters_list);
 	INIT_LIST_HEAD(&node->link);
 	node->attr.priority = I915_PRIORITY_INVALID;
+	node->semaphore = 0;
 }
 
 static struct i915_dependency *
@@ -64,6 +70,10 @@ bool __i915_sched_node_add_dependency(struct i915_sched_node *node,
 		dep->signaler = signal;
 		dep->flags = flags;
 
+		/* Keep track of whether anyone on this chain has a semaphore */
+		if (signal->semaphore && !node_started(signal))
+			node->semaphore |= signal->semaphore << 1;
+
 		ret = true;
 	}
 
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 5196ce07b6c2..24c2c027fd2c 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -24,14 +24,15 @@ enum {
 	I915_PRIORITY_INVALID = INT_MIN
 };
 
-#define I915_USER_PRIORITY_SHIFT 2
+#define I915_USER_PRIORITY_SHIFT 3
 #define I915_USER_PRIORITY(x) ((x) << I915_USER_PRIORITY_SHIFT)
 
 #define I915_PRIORITY_COUNT BIT(I915_USER_PRIORITY_SHIFT)
 #define I915_PRIORITY_MASK (I915_PRIORITY_COUNT - 1)
 
-#define I915_PRIORITY_WAIT	((u8)BIT(0))
-#define I915_PRIORITY_NEWCLIENT	((u8)BIT(1))
+#define I915_PRIORITY_WAIT		((u8)BIT(0))
+#define I915_PRIORITY_NEWCLIENT		((u8)BIT(1))
+#define I915_PRIORITY_NOSEMAPHORE	((u8)BIT(2))
 
 #define __NO_PREEMPTION (I915_PRIORITY_WAIT)
 
@@ -74,6 +75,8 @@ struct i915_sched_node {
 	struct list_head waiters_list; /* those after us, they depend upon us */
 	struct list_head link;
 	struct i915_sched_attr attr;
+	unsigned long semaphore;
+#define I915_SCHED_HAS_SEMAPHORE	BIT(0)
 };
 
 struct i915_dependency {
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 69a0696d7d7e..d105187070d4 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -164,7 +164,7 @@
 #define WA_TAIL_DWORDS 2
 #define WA_TAIL_BYTES (sizeof(u32) * WA_TAIL_DWORDS)
 
-#define ACTIVE_PRIORITY (I915_PRIORITY_NEWCLIENT)
+#define ACTIVE_PRIORITY (I915_PRIORITY_NEWCLIENT | I915_PRIORITY_NOSEMAPHORE)
 
 static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
 					    struct intel_engine_cs *engine,
-- 
2.20.1


* [PATCH 16/46] drm/i915: Show support for accurate sw PMU busyness tracking
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (14 preceding siblings ...)
  2019-02-06 13:03 ` [PATCH 15/46] drm/i915: Prioritise non-busywait semaphore workloads Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-06 13:03 ` [PATCH 17/46] drm/i915: Apply rps waitboosting for dma_fence_wait_timeout() Chris Wilson
                   ` (36 subsequent siblings)
  52 siblings, 0 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx

Expose whether or not we support accurate software PMU busyness
tracking in our scheduler capabilities, so userspace can query it at
runtime.
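
A small usage sketch (illustration only; query_scheduler_caps() is the
hypothetical libdrm-based helper sketched under patch 13): a monitoring
tool can use the new bit to tell whether the i915 PMU engine-busy
counters are tracked exactly or merely approximated by sampling.

#include <stdio.h>
#include <i915_drm.h>

unsigned int query_scheduler_caps(int fd);	/* hypothetical helper */

static void report_pmu_accuracy(int fd)
{
	if (query_scheduler_caps(fd) & I915_SCHEDULER_CAP_PMU)
		printf("engine busy%% is tracked exactly in software\n");
	else
		printf("engine busy%% is approximated by timer sampling\n");
}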

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 include/uapi/drm/i915_drm.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 05f79e1a35d1..b641c55420b6 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -477,6 +477,7 @@ typedef struct drm_i915_irq_wait {
 #define   I915_SCHEDULER_CAP_PRIORITY	(1ul << 1)
 #define   I915_SCHEDULER_CAP_PREEMPTION	(1ul << 2)
 #define   I915_SCHEDULER_CAP_SEMAPHORES	(1ul << 3)
+#define   I915_SCHEDULER_CAP_PMU	(1ul << 4)
 
 #define I915_PARAM_HUC_STATUS		 42
 
-- 
2.20.1


* [PATCH 17/46] drm/i915: Apply rps waitboosting for dma_fence_wait_timeout()
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (15 preceding siblings ...)
  2019-02-06 13:03 ` [PATCH 16/46] drm/i915: Show support for accurate sw PMU busyness tracking Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-11 18:06   ` Tvrtko Ursulin
  2019-02-06 13:03 ` [PATCH 18/46] drm/i915: Replace global_seqno with a hangcheck heartbeat seqno Chris Wilson
                   ` (35 subsequent siblings)
  52 siblings, 1 reply; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx; +Cc: Eero Tamminen

As time goes by, usage of generic ioctls such as drm_syncobj and
sync_file is on the increase, bypassing i915-specific ioctls like
GEM_WAIT. Currently, we only apply waitboosting to our driver ioctls, as
we track the file/client and account the waitboosting to them. However,
since commit 7b92c1bd0540 ("drm/i915: Avoid keeping waitboost active for
signaling threads"), we have not been applying the client ratelimiting
to waitboosts, so that information has only been used for debug
tracking.

Push the application of waitboosting down to the common
i915_request_wait, and apply it to all foreign fence waits as well.
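
As an aside, a "foreign" wait here simply means any other kernel user
waiting on an i915 fence through the generic dma-fence API; a minimal
sketch (illustration only) of such a consumer, which now funnels into
i915_fence_wait() and hence picks up I915_WAIT_PRIORITY:

static long wait_on_imported_fence(struct dma_fence *fence)
{
	/* intr=true, no timeout: boosted just like our own GEM_WAIT */
	return dma_fence_wait_timeout(fence, true, MAX_SCHEDULE_TIMEOUT);
}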

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Eero Tamminen <eero.t.tamminen@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c  | 19 +-----
 drivers/gpu/drm/i915/i915_drv.h      |  7 +--
 drivers/gpu/drm/i915/i915_gem.c      | 86 ++++++----------------------
 drivers/gpu/drm/i915/i915_request.c  | 21 ++++++-
 drivers/gpu/drm/i915/intel_display.c |  2 +-
 drivers/gpu/drm/i915/intel_drv.h     |  2 +-
 drivers/gpu/drm/i915/intel_pm.c      |  5 +-
 7 files changed, 44 insertions(+), 98 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 8a488ffc8b7d..af53a2d07f6b 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2020,11 +2020,9 @@ static const char *rps_power_to_str(unsigned int power)
 static int i915_rps_boost_info(struct seq_file *m, void *data)
 {
 	struct drm_i915_private *dev_priv = node_to_i915(m->private);
-	struct drm_device *dev = &dev_priv->drm;
 	struct intel_rps *rps = &dev_priv->gt_pm.rps;
 	u32 act_freq = rps->cur_freq;
 	intel_wakeref_t wakeref;
-	struct drm_file *file;
 
 	with_intel_runtime_pm_if_in_use(dev_priv, wakeref) {
 		if (IS_VALLEYVIEW(dev_priv) || IS_CHERRYVIEW(dev_priv)) {
@@ -2058,22 +2056,7 @@ static int i915_rps_boost_info(struct seq_file *m, void *data)
 		   intel_gpu_freq(dev_priv, rps->efficient_freq),
 		   intel_gpu_freq(dev_priv, rps->boost_freq));
 
-	mutex_lock(&dev->filelist_mutex);
-	list_for_each_entry_reverse(file, &dev->filelist, lhead) {
-		struct drm_i915_file_private *file_priv = file->driver_priv;
-		struct task_struct *task;
-
-		rcu_read_lock();
-		task = pid_task(file->pid, PIDTYPE_PID);
-		seq_printf(m, "%s [%d]: %d boosts\n",
-			   task ? task->comm : "<unknown>",
-			   task ? task->pid : -1,
-			   atomic_read(&file_priv->rps_client.boosts));
-		rcu_read_unlock();
-	}
-	seq_printf(m, "Kernel (anonymous) boosts: %d\n",
-		   atomic_read(&rps->boosts));
-	mutex_unlock(&dev->filelist_mutex);
+	seq_printf(m, "Wait boosts: %d\n", atomic_read(&rps->boosts));
 
 	if (INTEL_GEN(dev_priv) >= 6 &&
 	    rps->enabled &&
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index a365b1a2ea9a..4d697b1002af 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -217,10 +217,6 @@ struct drm_i915_file_private {
 	} mm;
 	struct idr context_idr;
 
-	struct intel_rps_client {
-		atomic_t boosts;
-	} rps_client;
-
 	unsigned int bsd_engine;
 
 /*
@@ -3041,8 +3037,7 @@ void i915_gem_resume(struct drm_i915_private *dev_priv);
 vm_fault_t i915_gem_fault(struct vm_fault *vmf);
 int i915_gem_object_wait(struct drm_i915_gem_object *obj,
 			 unsigned int flags,
-			 long timeout,
-			 struct intel_rps_client *rps);
+			 long timeout);
 int i915_gem_object_wait_priority(struct drm_i915_gem_object *obj,
 				  unsigned int flags,
 				  const struct i915_sched_attr *attr);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 04fa184fdff5..78b9aa57932d 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -421,8 +421,7 @@ int i915_gem_object_unbind(struct drm_i915_gem_object *obj)
 static long
 i915_gem_object_wait_fence(struct dma_fence *fence,
 			   unsigned int flags,
-			   long timeout,
-			   struct intel_rps_client *rps_client)
+			   long timeout)
 {
 	struct i915_request *rq;
 
@@ -440,27 +439,6 @@ i915_gem_object_wait_fence(struct dma_fence *fence,
 	if (i915_request_completed(rq))
 		goto out;
 
-	/*
-	 * This client is about to stall waiting for the GPU. In many cases
-	 * this is undesirable and limits the throughput of the system, as
-	 * many clients cannot continue processing user input/output whilst
-	 * blocked. RPS autotuning may take tens of milliseconds to respond
-	 * to the GPU load and thus incurs additional latency for the client.
-	 * We can circumvent that by promoting the GPU frequency to maximum
-	 * before we wait. This makes the GPU throttle up much more quickly
-	 * (good for benchmarks and user experience, e.g. window animations),
-	 * but at a cost of spending more power processing the workload
-	 * (bad for battery). Not all clients even want their results
-	 * immediately and for them we should just let the GPU select its own
-	 * frequency to maximise efficiency. To prevent a single client from
-	 * forcing the clocks too high for the whole system, we only allow
-	 * each client to waitboost once in a busy period.
-	 */
-	if (rps_client && !i915_request_started(rq)) {
-		if (INTEL_GEN(rq->i915) >= 6)
-			gen6_rps_boost(rq, rps_client);
-	}
-
 	timeout = i915_request_wait(rq, flags, timeout);
 
 out:
@@ -473,8 +451,7 @@ i915_gem_object_wait_fence(struct dma_fence *fence,
 static long
 i915_gem_object_wait_reservation(struct reservation_object *resv,
 				 unsigned int flags,
-				 long timeout,
-				 struct intel_rps_client *rps_client)
+				 long timeout)
 {
 	unsigned int seq = __read_seqcount_begin(&resv->seq);
 	struct dma_fence *excl;
@@ -492,8 +469,7 @@ i915_gem_object_wait_reservation(struct reservation_object *resv,
 
 		for (i = 0; i < count; i++) {
 			timeout = i915_gem_object_wait_fence(shared[i],
-							     flags, timeout,
-							     rps_client);
+							     flags, timeout);
 			if (timeout < 0)
 				break;
 
@@ -519,8 +495,7 @@ i915_gem_object_wait_reservation(struct reservation_object *resv,
 	}
 
 	if (excl && timeout >= 0)
-		timeout = i915_gem_object_wait_fence(excl, flags, timeout,
-						     rps_client);
+		timeout = i915_gem_object_wait_fence(excl, flags, timeout);
 
 	dma_fence_put(excl);
 
@@ -614,30 +589,19 @@ i915_gem_object_wait_priority(struct drm_i915_gem_object *obj,
  * @obj: i915 gem object
  * @flags: how to wait (under a lock, for all rendering or just for writes etc)
  * @timeout: how long to wait
- * @rps_client: client (user process) to charge for any waitboosting
  */
 int
 i915_gem_object_wait(struct drm_i915_gem_object *obj,
 		     unsigned int flags,
-		     long timeout,
-		     struct intel_rps_client *rps_client)
+		     long timeout)
 {
 	might_sleep();
 	GEM_BUG_ON(timeout < 0);
 
-	timeout = i915_gem_object_wait_reservation(obj->resv,
-						   flags, timeout,
-						   rps_client);
+	timeout = i915_gem_object_wait_reservation(obj->resv, flags, timeout);
 	return timeout < 0 ? timeout : 0;
 }
 
-static struct intel_rps_client *to_rps_client(struct drm_file *file)
-{
-	struct drm_i915_file_private *fpriv = file->driver_priv;
-
-	return &fpriv->rps_client;
-}
-
 static int
 i915_gem_phys_pwrite(struct drm_i915_gem_object *obj,
 		     struct drm_i915_gem_pwrite *args,
@@ -843,8 +807,7 @@ int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
 	ret = i915_gem_object_wait(obj,
 				   I915_WAIT_INTERRUPTIBLE |
 				   I915_WAIT_LOCKED,
-				   MAX_SCHEDULE_TIMEOUT,
-				   NULL);
+				   MAX_SCHEDULE_TIMEOUT);
 	if (ret)
 		return ret;
 
@@ -896,8 +859,7 @@ int i915_gem_obj_prepare_shmem_write(struct drm_i915_gem_object *obj,
 				   I915_WAIT_INTERRUPTIBLE |
 				   I915_WAIT_LOCKED |
 				   I915_WAIT_ALL,
-				   MAX_SCHEDULE_TIMEOUT,
-				   NULL);
+				   MAX_SCHEDULE_TIMEOUT);
 	if (ret)
 		return ret;
 
@@ -1159,8 +1121,7 @@ i915_gem_pread_ioctl(struct drm_device *dev, void *data,
 
 	ret = i915_gem_object_wait(obj,
 				   I915_WAIT_INTERRUPTIBLE,
-				   MAX_SCHEDULE_TIMEOUT,
-				   to_rps_client(file));
+				   MAX_SCHEDULE_TIMEOUT);
 	if (ret)
 		goto out;
 
@@ -1459,8 +1420,7 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
 	ret = i915_gem_object_wait(obj,
 				   I915_WAIT_INTERRUPTIBLE |
 				   I915_WAIT_ALL,
-				   MAX_SCHEDULE_TIMEOUT,
-				   to_rps_client(file));
+				   MAX_SCHEDULE_TIMEOUT);
 	if (ret)
 		goto err;
 
@@ -1558,8 +1518,7 @@ i915_gem_set_domain_ioctl(struct drm_device *dev, void *data,
 				   I915_WAIT_INTERRUPTIBLE |
 				   I915_WAIT_PRIORITY |
 				   (write_domain ? I915_WAIT_ALL : 0),
-				   MAX_SCHEDULE_TIMEOUT,
-				   to_rps_client(file));
+				   MAX_SCHEDULE_TIMEOUT);
 	if (err)
 		goto out;
 
@@ -1850,8 +1809,7 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
 	 */
 	ret = i915_gem_object_wait(obj,
 				   I915_WAIT_INTERRUPTIBLE,
-				   MAX_SCHEDULE_TIMEOUT,
-				   NULL);
+				   MAX_SCHEDULE_TIMEOUT);
 	if (ret)
 		goto err;
 
@@ -3181,8 +3139,7 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 				   I915_WAIT_INTERRUPTIBLE |
 				   I915_WAIT_PRIORITY |
 				   I915_WAIT_ALL,
-				   to_wait_timeout(args->timeout_ns),
-				   to_rps_client(file));
+				   to_wait_timeout(args->timeout_ns));
 
 	if (args->timeout_ns > 0) {
 		args->timeout_ns -= ktime_to_ns(ktime_sub(ktime_get(), start));
@@ -3251,7 +3208,7 @@ wait_for_timelines(struct drm_i915_private *i915,
 		 * stalls, so allow the gpu to boost to maximum clocks.
 		 */
 		if (flags & I915_WAIT_FOR_IDLE_BOOST)
-			gen6_rps_boost(rq, NULL);
+			gen6_rps_boost(rq);
 
 		timeout = i915_request_wait(rq, flags, timeout);
 		i915_request_put(rq);
@@ -3346,8 +3303,7 @@ i915_gem_object_set_to_wc_domain(struct drm_i915_gem_object *obj, bool write)
 				   I915_WAIT_INTERRUPTIBLE |
 				   I915_WAIT_LOCKED |
 				   (write ? I915_WAIT_ALL : 0),
-				   MAX_SCHEDULE_TIMEOUT,
-				   NULL);
+				   MAX_SCHEDULE_TIMEOUT);
 	if (ret)
 		return ret;
 
@@ -3409,8 +3365,7 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
 				   I915_WAIT_INTERRUPTIBLE |
 				   I915_WAIT_LOCKED |
 				   (write ? I915_WAIT_ALL : 0),
-				   MAX_SCHEDULE_TIMEOUT,
-				   NULL);
+				   MAX_SCHEDULE_TIMEOUT);
 	if (ret)
 		return ret;
 
@@ -3525,8 +3480,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
 					   I915_WAIT_INTERRUPTIBLE |
 					   I915_WAIT_LOCKED |
 					   I915_WAIT_ALL,
-					   MAX_SCHEDULE_TIMEOUT,
-					   NULL);
+					   MAX_SCHEDULE_TIMEOUT);
 		if (ret)
 			return ret;
 
@@ -3664,8 +3618,7 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
 
 	ret = i915_gem_object_wait(obj,
 				   I915_WAIT_INTERRUPTIBLE,
-				   MAX_SCHEDULE_TIMEOUT,
-				   to_rps_client(file));
+				   MAX_SCHEDULE_TIMEOUT);
 	if (ret)
 		goto out;
 
@@ -3791,8 +3744,7 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
 				   I915_WAIT_INTERRUPTIBLE |
 				   I915_WAIT_LOCKED |
 				   (write ? I915_WAIT_ALL : 0),
-				   MAX_SCHEDULE_TIMEOUT,
-				   NULL);
+				   MAX_SCHEDULE_TIMEOUT);
 	if (ret)
 		return ret;
 
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 678da705e222..eed66d3606d9 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -81,7 +81,9 @@ static signed long i915_fence_wait(struct dma_fence *fence,
 				   bool interruptible,
 				   signed long timeout)
 {
-	return i915_request_wait(to_request(fence), interruptible, timeout);
+	return i915_request_wait(to_request(fence),
+				 interruptible | I915_WAIT_PRIORITY,
+				 timeout);
 }
 
 static void i915_fence_release(struct dma_fence *fence)
@@ -1288,8 +1290,23 @@ long i915_request_wait(struct i915_request *rq,
 	if (__i915_spin_request(rq, state, 5))
 		goto out;
 
-	if (flags & I915_WAIT_PRIORITY)
+	/*
+	 * This client is about to stall waiting for the GPU. In many cases
+	 * this is undesirable and limits the throughput of the system, as
+	 * many clients cannot continue processing user input/output whilst
+	 * blocked. RPS autotuning may take tens of milliseconds to respond
+	 * to the GPU load and thus incurs additional latency for the client.
+	 * We can circumvent that by promoting the GPU frequency to maximum
+	 * before we sleep. This makes the GPU throttle up much more quickly
+	 * (good for benchmarks and user experience, e.g. window animations),
+	 * but at a cost of spending more power processing the workload
+	 * (bad for battery).
+	 */
+	if (flags & I915_WAIT_PRIORITY) {
+		if (!i915_request_started(rq) && INTEL_GEN(rq->i915) >= 6)
+			gen6_rps_boost(rq);
 		i915_schedule_bump_priority(rq, I915_PRIORITY_WAIT);
+	}
 
 	wait.tsk = current;
 	if (dma_fence_add_callback(&rq->fence, &wait.cb, request_wait_wake))
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 4d5ec929f987..f8657e53fe68 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -13396,7 +13396,7 @@ static int do_rps_boost(struct wait_queue_entry *_wait,
 	 * vblank without our intervention, so leave RPS alone.
 	 */
 	if (!i915_request_started(rq))
-		gen6_rps_boost(rq, NULL);
+		gen6_rps_boost(rq);
 	i915_request_put(rq);
 
 	drm_crtc_vblank_put(wait->crtc);
diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
index 90ba5436370e..47b4f88da7eb 100644
--- a/drivers/gpu/drm/i915/intel_drv.h
+++ b/drivers/gpu/drm/i915/intel_drv.h
@@ -2249,7 +2249,7 @@ void intel_suspend_gt_powersave(struct drm_i915_private *dev_priv);
 void gen6_rps_busy(struct drm_i915_private *dev_priv);
 void gen6_rps_reset_ei(struct drm_i915_private *dev_priv);
 void gen6_rps_idle(struct drm_i915_private *dev_priv);
-void gen6_rps_boost(struct i915_request *rq, struct intel_rps_client *rps);
+void gen6_rps_boost(struct i915_request *rq);
 void g4x_wm_get_hw_state(struct drm_i915_private *dev_priv);
 void vlv_wm_get_hw_state(struct drm_i915_private *dev_priv);
 void ilk_wm_get_hw_state(struct drm_i915_private *dev_priv);
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 737005bf6816..58514a17f134 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -6693,8 +6693,7 @@ void gen6_rps_idle(struct drm_i915_private *dev_priv)
 	mutex_unlock(&dev_priv->pcu_lock);
 }
 
-void gen6_rps_boost(struct i915_request *rq,
-		    struct intel_rps_client *rps_client)
+void gen6_rps_boost(struct i915_request *rq)
 {
 	struct intel_rps *rps = &rq->i915->gt_pm.rps;
 	unsigned long flags;
@@ -6723,7 +6722,7 @@ void gen6_rps_boost(struct i915_request *rq,
 	if (READ_ONCE(rps->cur_freq) < rps->boost_freq)
 		schedule_work(&rps->work);
 
-	atomic_inc(rps_client ? &rps_client->boosts : &rps->boosts);
+	atomic_inc(&rps->boosts);
 }
 
 int intel_set_rps(struct drm_i915_private *dev_priv, u8 val)
-- 
2.20.1


* [PATCH 18/46] drm/i915: Replace global_seqno with a hangcheck heartbeat seqno
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (16 preceding siblings ...)
  2019-02-06 13:03 ` [PATCH 17/46] drm/i915: Apply rps waitboosting for dma_fence_wait_timeout() Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-11 12:40   ` Tvrtko Ursulin
  2019-02-06 13:03 ` [PATCH 19/46] drm/i915/pmu: Always sample an active ringbuffer Chris Wilson
                   ` (34 subsequent siblings)
  52 siblings, 1 reply; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx

To determine whether an engine has become 'stuck', we simply check
whether or not it is still on the same seqno for several seconds. To
keep this simple mechanism intact over the loss of a global seqno, we
can add a new global heartbeat seqno instead. As we cannot know the
sequence in which requests will then be completed, we use a primitive
random number generator instead (with a cycle long enough to not matter
over an interval of a few thousand requests between hangcheck samples).
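
For reference, and as an assumption about a helper this patch merely
consumes, next_pseudo_random32() is a simple linear congruential step;
a self-contained sketch of the property hangcheck relies on:

#include <stdint.h>

/*
 * Assumed to match the kernel's next_pseudo_random32() (see
 * include/linux/random.h): a full-period LCG, so the sequence does not
 * repeat a value within 2^32 steps and, in particular, two consecutive
 * heartbeat writes are never equal - which is all hangcheck needs to
 * tell "made progress" apart from "stuck".
 */
static inline uint32_t heartbeat_next(uint32_t seed)
{
	return seed * 1664525u + 1013904223u;	/* Numerical Recipes LCG */
}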

The alternative to using a dedicated seqno on every request is to issue
a heartbeat request and query its progress through the system. Sadly,
this requires us to reduce the reliance on struct_mutex so that we can
issue requests without requiring that BKL.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c     |  7 ++---
 drivers/gpu/drm/i915/intel_engine_cs.c  |  5 ++--
 drivers/gpu/drm/i915/intel_hangcheck.c  |  6 ++---
 drivers/gpu/drm/i915/intel_lrc.c        | 15 +++++++++++
 drivers/gpu/drm/i915/intel_ringbuffer.c | 36 +++++++++++++++++++++++--
 drivers/gpu/drm/i915/intel_ringbuffer.h | 19 ++++++++++++-
 6 files changed, 77 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index af53a2d07f6b..846bd0de3cfa 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1295,7 +1295,7 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
 	with_intel_runtime_pm(dev_priv, wakeref) {
 		for_each_engine(engine, dev_priv, id) {
 			acthd[id] = intel_engine_get_active_head(engine);
-			seqno[id] = intel_engine_get_seqno(engine);
+			seqno[id] = intel_engine_get_hangcheck_seqno(engine);
 		}
 
 		intel_engine_get_instdone(dev_priv->engine[RCS], &instdone);
@@ -1315,8 +1315,9 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
 	for_each_engine(engine, dev_priv, id) {
 		seq_printf(m, "%s:\n", engine->name);
 		seq_printf(m, "\tseqno = %x [current %x, last %x], %dms ago\n",
-			   engine->hangcheck.seqno, seqno[id],
-			   intel_engine_last_submit(engine),
+			   engine->hangcheck.last_seqno,
+			   seqno[id],
+			   engine->hangcheck.next_seqno,
 			   jiffies_to_msecs(jiffies -
 					    engine->hangcheck.action_timestamp));
 
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 57cfc4c551c9..e1e54b7448b4 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1538,10 +1538,11 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 	if (i915_terminally_wedged(&engine->i915->gpu_error))
 		drm_printf(m, "*** WEDGED ***\n");
 
-	drm_printf(m, "\tcurrent seqno %x, last %x, hangcheck %x [%d ms]\n",
+	drm_printf(m, "\tcurrent seqno %x, last %x, hangcheck %x/%x [%d ms]\n",
 		   intel_engine_get_seqno(engine),
 		   intel_engine_last_submit(engine),
-		   engine->hangcheck.seqno,
+		   engine->hangcheck.last_seqno,
+		   engine->hangcheck.next_seqno,
 		   jiffies_to_msecs(jiffies - engine->hangcheck.action_timestamp));
 	drm_printf(m, "\tReset count: %d (global %d)\n",
 		   i915_reset_engine_count(error, engine),
diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
index a219c796e56d..e04b2560369e 100644
--- a/drivers/gpu/drm/i915/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/intel_hangcheck.c
@@ -133,21 +133,21 @@ static void hangcheck_load_sample(struct intel_engine_cs *engine,
 				  struct hangcheck *hc)
 {
 	hc->acthd = intel_engine_get_active_head(engine);
-	hc->seqno = intel_engine_get_seqno(engine);
+	hc->seqno = intel_engine_get_hangcheck_seqno(engine);
 }
 
 static void hangcheck_store_sample(struct intel_engine_cs *engine,
 				   const struct hangcheck *hc)
 {
 	engine->hangcheck.acthd = hc->acthd;
-	engine->hangcheck.seqno = hc->seqno;
+	engine->hangcheck.last_seqno = hc->seqno;
 }
 
 static enum intel_engine_hangcheck_action
 hangcheck_get_action(struct intel_engine_cs *engine,
 		     const struct hangcheck *hc)
 {
-	if (engine->hangcheck.seqno != hc->seqno)
+	if (engine->hangcheck.last_seqno != hc->seqno)
 		return ENGINE_ACTIVE_SEQNO;
 
 	if (intel_engine_is_idle(engine))
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index d105187070d4..342d3a91be03 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -180,6 +180,12 @@ static inline u32 intel_hws_seqno_address(struct intel_engine_cs *engine)
 		I915_GEM_HWS_INDEX_ADDR);
 }
 
+static inline u32 intel_hws_hangcheck_address(struct intel_engine_cs *engine)
+{
+	return (i915_ggtt_offset(engine->status_page.vma) +
+		I915_GEM_HWS_HANGCHECK_ADDR);
+}
+
 static inline struct i915_priolist *to_priolist(struct rb_node *rb)
 {
 	return rb_entry(rb, struct i915_priolist, node);
@@ -2235,6 +2241,10 @@ static u32 *gen8_emit_fini_breadcrumb(struct i915_request *request, u32 *cs)
 				  request->fence.seqno,
 				  request->timeline->hwsp_offset);
 
+	cs = gen8_emit_ggtt_write(cs,
+				  intel_engine_next_hangcheck_seqno(request->engine),
+				  intel_hws_hangcheck_address(request->engine));
+
 	cs = gen8_emit_ggtt_write(cs,
 				  request->global_seqno,
 				  intel_hws_seqno_address(request->engine));
@@ -2259,6 +2269,11 @@ static u32 *gen8_emit_fini_breadcrumb_rcs(struct i915_request *request, u32 *cs)
 				      PIPE_CONTROL_FLUSH_ENABLE |
 				      PIPE_CONTROL_CS_STALL);
 
+	cs = gen8_emit_ggtt_write_rcs(cs,
+				      intel_engine_next_hangcheck_seqno(request->engine),
+				      intel_hws_hangcheck_address(request->engine),
+				      PIPE_CONTROL_CS_STALL);
+
 	cs = gen8_emit_ggtt_write_rcs(cs,
 				      request->global_seqno,
 				      intel_hws_seqno_address(request->engine),
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 7f841dba87b3..870184bbd169 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -43,6 +43,12 @@
  */
 #define LEGACY_REQUEST_SIZE 200
 
+static inline u32 hws_hangcheck_address(struct intel_engine_cs *engine)
+{
+	return (i915_ggtt_offset(engine->status_page.vma) +
+		I915_GEM_HWS_HANGCHECK_ADDR);
+}
+
 static inline u32 intel_hws_seqno_address(struct intel_engine_cs *engine)
 {
 	return (i915_ggtt_offset(engine->status_page.vma) +
@@ -316,6 +322,11 @@ static u32 *gen6_rcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
 	*cs++ = rq->timeline->hwsp_offset | PIPE_CONTROL_GLOBAL_GTT;
 	*cs++ = rq->fence.seqno;
 
+	*cs++ = GFX_OP_PIPE_CONTROL(4);
+	*cs++ = PIPE_CONTROL_QW_WRITE;
+	*cs++ = hws_hangcheck_address(rq->engine) | PIPE_CONTROL_GLOBAL_GTT;
+	*cs++ = intel_engine_next_hangcheck_seqno(rq->engine);
+
 	*cs++ = GFX_OP_PIPE_CONTROL(4);
 	*cs++ = PIPE_CONTROL_QW_WRITE | PIPE_CONTROL_CS_STALL;
 	*cs++ = intel_hws_seqno_address(rq->engine) | PIPE_CONTROL_GLOBAL_GTT;
@@ -422,6 +433,11 @@ static u32 *gen7_rcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
 	*cs++ = rq->timeline->hwsp_offset;
 	*cs++ = rq->fence.seqno;
 
+	*cs++ = GFX_OP_PIPE_CONTROL(4);
+	*cs++ = PIPE_CONTROL_QW_WRITE | PIPE_CONTROL_GLOBAL_GTT_IVB;
+	*cs++ = hws_hangcheck_address(rq->engine);
+	*cs++ = intel_engine_next_hangcheck_seqno(rq->engine);
+
 	*cs++ = GFX_OP_PIPE_CONTROL(4);
 	*cs++ = (PIPE_CONTROL_QW_WRITE |
 		 PIPE_CONTROL_GLOBAL_GTT_IVB |
@@ -447,12 +463,15 @@ static u32 *gen6_xcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
 	*cs++ = I915_GEM_HWS_SEQNO_ADDR | MI_FLUSH_DW_USE_GTT;
 	*cs++ = rq->fence.seqno;
 
+	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX;
+	*cs++ = I915_GEM_HWS_HANGCHECK_ADDR | MI_FLUSH_DW_USE_GTT;
+	*cs++ = intel_engine_next_hangcheck_seqno(rq->engine);
+
 	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX;
 	*cs++ = I915_GEM_HWS_INDEX_ADDR | MI_FLUSH_DW_USE_GTT;
 	*cs++ = rq->global_seqno;
 
 	*cs++ = MI_USER_INTERRUPT;
-	*cs++ = MI_NOOP;
 
 	rq->tail = intel_ring_offset(rq, cs);
 	assert_ring_tail_valid(rq->ring, rq->tail);
@@ -472,6 +491,10 @@ static u32 *gen7_xcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
 	*cs++ = I915_GEM_HWS_SEQNO_ADDR | MI_FLUSH_DW_USE_GTT;
 	*cs++ = rq->fence.seqno;
 
+	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX;
+	*cs++ = I915_GEM_HWS_HANGCHECK_ADDR | MI_FLUSH_DW_USE_GTT;
+	*cs++ = intel_engine_next_hangcheck_seqno(rq->engine);
+
 	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX;
 	*cs++ = I915_GEM_HWS_INDEX_ADDR | MI_FLUSH_DW_USE_GTT;
 	*cs++ = rq->global_seqno;
@@ -487,6 +510,7 @@ static u32 *gen7_xcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
 	*cs++ = 0;
 
 	*cs++ = MI_USER_INTERRUPT;
+	*cs++ = MI_NOOP;
 
 	rq->tail = intel_ring_offset(rq, cs);
 	assert_ring_tail_valid(rq->ring, rq->tail);
@@ -930,11 +954,16 @@ static u32 *i9xx_emit_breadcrumb(struct i915_request *rq, u32 *cs)
 	*cs++ = I915_GEM_HWS_SEQNO_ADDR;
 	*cs++ = rq->fence.seqno;
 
+	*cs++ = MI_STORE_DWORD_INDEX;
+	*cs++ = I915_GEM_HWS_HANGCHECK_ADDR;
+	*cs++ = intel_engine_next_hangcheck_seqno(rq->engine);
+
 	*cs++ = MI_STORE_DWORD_INDEX;
 	*cs++ = I915_GEM_HWS_INDEX_ADDR;
 	*cs++ = rq->global_seqno;
 
 	*cs++ = MI_USER_INTERRUPT;
+	*cs++ = MI_NOOP;
 
 	rq->tail = intel_ring_offset(rq, cs);
 	assert_ring_tail_valid(rq->ring, rq->tail);
@@ -956,6 +985,10 @@ static u32 *gen5_emit_breadcrumb(struct i915_request *rq, u32 *cs)
 	*cs++ = I915_GEM_HWS_SEQNO_ADDR;
 	*cs++ = rq->fence.seqno;
 
+	*cs++ = MI_STORE_DWORD_INDEX;
+	*cs++ = I915_GEM_HWS_HANGCHECK_ADDR;
+	*cs++ = intel_engine_next_hangcheck_seqno(rq->engine);
+
 	BUILD_BUG_ON(GEN5_WA_STORES < 1);
 	for (i = 0; i < GEN5_WA_STORES; i++) {
 		*cs++ = MI_STORE_DWORD_INDEX;
@@ -964,7 +997,6 @@ static u32 *gen5_emit_breadcrumb(struct i915_request *rq, u32 *cs)
 	}
 
 	*cs++ = MI_USER_INTERRUPT;
-	*cs++ = MI_NOOP;
 
 	rq->tail = intel_ring_offset(rq, cs);
 	assert_ring_tail_valid(rq->ring, rq->tail);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 493a72ed01af..b30c37ac55a3 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -6,6 +6,7 @@
 
 #include <linux/hashtable.h>
 #include <linux/irq_work.h>
+#include <linux/random.h>
 #include <linux/seqlock.h>
 
 #include "i915_gem_batch_pool.h"
@@ -119,7 +120,8 @@ struct intel_instdone {
 
 struct intel_engine_hangcheck {
 	u64 acthd;
-	u32 seqno;
+	u32 last_seqno;
+	u32 next_seqno;
 	unsigned long action_timestamp;
 	struct intel_instdone instdone;
 };
@@ -718,6 +720,8 @@ intel_write_status_page(struct intel_engine_cs *engine, int reg, u32 value)
 #define I915_GEM_HWS_INDEX_ADDR		(I915_GEM_HWS_INDEX * sizeof(u32))
 #define I915_GEM_HWS_PREEMPT		0x32
 #define I915_GEM_HWS_PREEMPT_ADDR	(I915_GEM_HWS_PREEMPT * sizeof(u32))
+#define I915_GEM_HWS_HANGCHECK		0x34
+#define I915_GEM_HWS_HANGCHECK_ADDR	(I915_GEM_HWS_HANGCHECK * sizeof(u32))
 #define I915_GEM_HWS_SEQNO		0x40
 #define I915_GEM_HWS_SEQNO_ADDR		(I915_GEM_HWS_SEQNO * sizeof(u32))
 #define I915_GEM_HWS_SCRATCH		0x80
@@ -1078,4 +1082,17 @@ static inline bool inject_preempt_hang(struct intel_engine_execlists *execlists)
 
 #endif
 
+static inline u32
+intel_engine_next_hangcheck_seqno(struct intel_engine_cs *engine)
+{
+	return engine->hangcheck.next_seqno =
+		next_pseudo_random32(engine->hangcheck.next_seqno);
+}
+
+static inline u32
+intel_engine_get_hangcheck_seqno(struct intel_engine_cs *engine)
+{
+	return intel_read_status_page(engine, I915_GEM_HWS_HANGCHECK);
+}
+
 #endif /* _INTEL_RINGBUFFER_H_ */
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH 19/46] drm/i915/pmu: Always sample an active ringbuffer
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (17 preceding siblings ...)
  2019-02-06 13:03 ` [PATCH 18/46] drm/i915: Replace global_seqno with a hangcheck heartbeat seqno Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-11 18:18   ` Tvrtko Ursulin
  2019-02-06 13:03 ` [PATCH 20/46] drm/i915: Remove access to global seqno in the HWSP Chris Wilson
                   ` (33 subsequent siblings)
  52 siblings, 1 reply; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx

As we no longer have a precise indication of requests queued to an
engine, make no presumptions and just sample the ring registers to see
if the engine is busy.

v2: Report busy while the ring is idling on a semaphore/event.
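
Roughly, the per-engine decision reduces to the following; this is a
standalone toy model for illustration rather than the code in the patch,
and the bit masks here are made up:

#include <stdbool.h>
#include <stdint.h>

/* Illustrative bit masks only; the real definitions live in i915_reg.h. */
#define RING_WAIT		(1u << 11)
#define RING_WAIT_SEMAPHORE	(1u << 10)
#define MODE_IDLE		(1u << 9)

struct engine_sample { bool wait, sema, busy; };

static struct engine_sample classify(uint32_t ring_ctl, uint32_t mi_mode)
{
	struct engine_sample s = { false, false, false };

	/* All zeroes or all ones means the engine is outside its powerwell. */
	if (ring_ctl == 0 || ring_ctl == ~0u)
		return s;

	s.wait = ring_ctl & RING_WAIT;
	s.sema = ring_ctl & RING_WAIT_SEMAPHORE;

	/*
	 * A ring idling on a semaphore or event is still busy on behalf of
	 * its user; only a truly idle command streamer counts as not busy.
	 */
	s.busy = s.wait || s.sema || !(mi_mode & MODE_IDLE);
	return s;
}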

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_pmu.c | 55 +++++++++++++--------------------
 1 file changed, 21 insertions(+), 34 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
index 13d70b90dd0f..157cbfa155d9 100644
--- a/drivers/gpu/drm/i915/i915_pmu.c
+++ b/drivers/gpu/drm/i915/i915_pmu.c
@@ -101,7 +101,7 @@ static bool pmu_needs_timer(struct drm_i915_private *i915, bool gpu_active)
 	 *
 	 * Use RCS as proxy for all engines.
 	 */
-	else if (intel_engine_supports_stats(i915->engine[RCS]))
+	else if (i915->caps.scheduler & I915_SCHEDULER_CAP_PMU)
 		enable &= ~BIT(I915_SAMPLE_BUSY);
 
 	/*
@@ -148,14 +148,6 @@ void i915_pmu_gt_unparked(struct drm_i915_private *i915)
 	spin_unlock_irq(&i915->pmu.lock);
 }
 
-static bool grab_forcewake(struct drm_i915_private *i915, bool fw)
-{
-	if (!fw)
-		intel_uncore_forcewake_get(i915, FORCEWAKE_ALL);
-
-	return true;
-}
-
 static void
 add_sample(struct i915_pmu_sample *sample, u32 val)
 {
@@ -168,7 +160,6 @@ engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
 	intel_wakeref_t wakeref;
-	bool fw = false;
 
 	if ((dev_priv->pmu.enable & ENGINE_SAMPLE_MASK) == 0)
 		return;
@@ -181,36 +172,32 @@ engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
 		return;
 
 	for_each_engine(engine, dev_priv, id) {
-		u32 current_seqno = intel_engine_get_seqno(engine);
-		u32 last_seqno = intel_engine_last_submit(engine);
+		typeof(engine->pmu) *pmu = &engine->pmu;
+		bool busy;
 		u32 val;
 
-		val = !i915_seqno_passed(current_seqno, last_seqno);
-
-		if (val)
-			add_sample(&engine->pmu.sample[I915_SAMPLE_BUSY],
-				   period_ns);
-
-		if (val && (engine->pmu.enable &
-		    (BIT(I915_SAMPLE_WAIT) | BIT(I915_SAMPLE_SEMA)))) {
-			fw = grab_forcewake(dev_priv, fw);
-
-			val = I915_READ_FW(RING_CTL(engine->mmio_base));
-		} else {
-			val = 0;
-		}
+		val = I915_READ_FW(RING_CTL(engine->mmio_base));
+		if (val == 0 || val == ~0u) /* outside of powerwell */
+			continue;
 
 		if (val & RING_WAIT)
-			add_sample(&engine->pmu.sample[I915_SAMPLE_WAIT],
-				   period_ns);
-
+			add_sample(&pmu->sample[I915_SAMPLE_WAIT], period_ns);
 		if (val & RING_WAIT_SEMAPHORE)
-			add_sample(&engine->pmu.sample[I915_SAMPLE_SEMA],
-				   period_ns);
-	}
+			add_sample(&pmu->sample[I915_SAMPLE_SEMA], period_ns);
 
-	if (fw)
-		intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
+		/*
+		 * MI_MODE reports IDLE if the ring is waiting, but we regard
+		 * this as being busy instead, as the engine is busy with the
+		 * user request.
+		 */
+		busy = val & (RING_WAIT_SEMAPHORE | RING_WAIT);
+		if (!busy) {
+			val = I915_READ_FW(RING_MI_MODE(engine->mmio_base));
+			busy = !(val & MODE_IDLE);
+		}
+		if (busy)
+			add_sample(&pmu->sample[I915_SAMPLE_BUSY], period_ns);
+	}
 
 	intel_runtime_pm_put(dev_priv, wakeref);
 }
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH 20/46] drm/i915: Remove access to global seqno in the HWSP
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (18 preceding siblings ...)
  2019-02-06 13:03 ` [PATCH 19/46] drm/i915/pmu: Always sample an active ringbuffer Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-11 18:22   ` Tvrtko Ursulin
  2019-02-06 13:03 ` [PATCH 21/46] drm/i915: Remove i915_request.global_seqno Chris Wilson
                   ` (32 subsequent siblings)
  52 siblings, 1 reply; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx

Stop accessing the HWSP to read the global seqno, and stop tracking the
mirror in the engine's execution timeline -- it is unused.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gpu_error.c         |  4 --
 drivers/gpu/drm/i915/i915_gpu_error.h         |  3 --
 drivers/gpu/drm/i915/i915_request.c           | 27 +++++--------
 drivers/gpu/drm/i915/i915_reset.c             |  1 -
 drivers/gpu/drm/i915/intel_engine_cs.c        | 14 +------
 drivers/gpu/drm/i915/intel_lrc.c              | 21 +++-------
 drivers/gpu/drm/i915/intel_ringbuffer.c       |  7 +---
 drivers/gpu/drm/i915/intel_ringbuffer.h       | 40 -------------------
 drivers/gpu/drm/i915/selftests/i915_request.c |  3 +-
 drivers/gpu/drm/i915/selftests/mock_engine.c  |  2 -
 10 files changed, 19 insertions(+), 103 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 9a65341fec09..a674c78ca1f8 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -533,8 +533,6 @@ static void error_print_engine(struct drm_i915_error_state_buf *m,
 				   ee->vm_info.pp_dir_base);
 		}
 	}
-	err_printf(m, "  seqno: 0x%08x\n", ee->seqno);
-	err_printf(m, "  last_seqno: 0x%08x\n", ee->last_seqno);
 	err_printf(m, "  ring->head: 0x%08x\n", ee->cpu_ring_head);
 	err_printf(m, "  ring->tail: 0x%08x\n", ee->cpu_ring_tail);
 	err_printf(m, "  hangcheck timestamp: %dms (%lu%s)\n",
@@ -1227,8 +1225,6 @@ static void error_record_engine_registers(struct i915_gpu_state *error,
 
 	ee->instpm = I915_READ(RING_INSTPM(engine->mmio_base));
 	ee->acthd = intel_engine_get_active_head(engine);
-	ee->seqno = intel_engine_get_seqno(engine);
-	ee->last_seqno = intel_engine_last_submit(engine);
 	ee->start = I915_READ_START(engine);
 	ee->head = I915_READ_HEAD(engine);
 	ee->tail = I915_READ_TAIL(engine);
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
index d5c58e82508b..4dbbd0f02edb 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.h
+++ b/drivers/gpu/drm/i915/i915_gpu_error.h
@@ -94,8 +94,6 @@ struct i915_gpu_state {
 		u32 cpu_ring_head;
 		u32 cpu_ring_tail;
 
-		u32 last_seqno;
-
 		/* Register state */
 		u32 start;
 		u32 tail;
@@ -108,7 +106,6 @@ struct i915_gpu_state {
 		u32 bbstate;
 		u32 instpm;
 		u32 instps;
-		u32 seqno;
 		u64 bbaddr;
 		u64 acthd;
 		u32 fault_reg;
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index eed66d3606d9..85cf5cfbc7ed 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -192,12 +192,11 @@ static void free_capture_list(struct i915_request *request)
 static void __retire_engine_request(struct intel_engine_cs *engine,
 				    struct i915_request *rq)
 {
-	GEM_TRACE("%s(%s) fence %llx:%lld, global=%d, current %d:%d\n",
+	GEM_TRACE("%s(%s) fence %llx:%lld, global=%d, current %d\n",
 		  __func__, engine->name,
 		  rq->fence.context, rq->fence.seqno,
 		  rq->global_seqno,
-		  hwsp_seqno(rq),
-		  intel_engine_get_seqno(engine));
+		  hwsp_seqno(rq));
 
 	GEM_BUG_ON(!i915_request_completed(rq));
 
@@ -256,12 +255,11 @@ static void i915_request_retire(struct i915_request *request)
 {
 	struct i915_active_request *active, *next;
 
-	GEM_TRACE("%s fence %llx:%lld, global=%d, current %d:%d\n",
+	GEM_TRACE("%s fence %llx:%lld, global=%d, current %d\n",
 		  request->engine->name,
 		  request->fence.context, request->fence.seqno,
 		  request->global_seqno,
-		  hwsp_seqno(request),
-		  intel_engine_get_seqno(request->engine));
+		  hwsp_seqno(request));
 
 	lockdep_assert_held(&request->i915->drm.struct_mutex);
 	GEM_BUG_ON(!i915_sw_fence_signaled(&request->submit));
@@ -320,12 +318,11 @@ void i915_request_retire_upto(struct i915_request *rq)
 	struct intel_ring *ring = rq->ring;
 	struct i915_request *tmp;
 
-	GEM_TRACE("%s fence %llx:%lld, global=%d, current %d:%d\n",
+	GEM_TRACE("%s fence %llx:%lld, global=%d, current %d\n",
 		  rq->engine->name,
 		  rq->fence.context, rq->fence.seqno,
 		  rq->global_seqno,
-		  hwsp_seqno(rq),
-		  intel_engine_get_seqno(rq->engine));
+		  hwsp_seqno(rq));
 
 	lockdep_assert_held(&rq->i915->drm.struct_mutex);
 	GEM_BUG_ON(!i915_request_completed(rq));
@@ -427,12 +424,11 @@ void __i915_request_submit(struct i915_request *request)
 	struct intel_engine_cs *engine = request->engine;
 	u32 seqno;
 
-	GEM_TRACE("%s fence %llx:%lld -> global=%d, current %d:%d\n",
+	GEM_TRACE("%s fence %llx:%lld -> global=%d, current %d\n",
 		  engine->name,
 		  request->fence.context, request->fence.seqno,
 		  engine->timeline.seqno + 1,
-		  hwsp_seqno(request),
-		  intel_engine_get_seqno(engine));
+		  hwsp_seqno(request));
 
 	GEM_BUG_ON(!irqs_disabled());
 	lockdep_assert_held(&engine->timeline.lock);
@@ -441,7 +437,6 @@ void __i915_request_submit(struct i915_request *request)
 
 	seqno = next_global_seqno(&engine->timeline);
 	GEM_BUG_ON(!seqno);
-	GEM_BUG_ON(intel_engine_signaled(engine, seqno));
 
 	/* We may be recursing from the signal callback of another i915 fence */
 	spin_lock_nested(&request->lock, SINGLE_DEPTH_NESTING);
@@ -492,12 +487,11 @@ void __i915_request_unsubmit(struct i915_request *request)
 {
 	struct intel_engine_cs *engine = request->engine;
 
-	GEM_TRACE("%s fence %llx:%lld <- global=%d, current %d:%d\n",
+	GEM_TRACE("%s fence %llx:%lld <- global=%d, current %d\n",
 		  engine->name,
 		  request->fence.context, request->fence.seqno,
 		  request->global_seqno,
-		  hwsp_seqno(request),
-		  intel_engine_get_seqno(engine));
+		  hwsp_seqno(request));
 
 	GEM_BUG_ON(!irqs_disabled());
 	lockdep_assert_held(&engine->timeline.lock);
@@ -508,7 +502,6 @@ void __i915_request_unsubmit(struct i915_request *request)
 	 */
 	GEM_BUG_ON(!request->global_seqno);
 	GEM_BUG_ON(request->global_seqno != engine->timeline.seqno);
-	GEM_BUG_ON(intel_engine_has_completed(engine, request->global_seqno));
 	engine->timeline.seqno--;
 
 	/* We may be recursing from the signal callback of another i915 fence */
diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
index b629f25a81f0..7051c0a43941 100644
--- a/drivers/gpu/drm/i915/i915_reset.c
+++ b/drivers/gpu/drm/i915/i915_reset.c
@@ -787,7 +787,6 @@ static void nop_submit_request(struct i915_request *request)
 	spin_lock_irqsave(&engine->timeline.lock, flags);
 	__i915_request_submit(request);
 	i915_request_mark_complete(request);
-	intel_engine_write_global_seqno(engine, request->global_seqno);
 	spin_unlock_irqrestore(&engine->timeline.lock, flags);
 
 	intel_engine_queue_breadcrumbs(engine);
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index e1e54b7448b4..ea370ed094a5 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -455,12 +455,6 @@ int intel_engines_init(struct drm_i915_private *dev_priv)
 	return err;
 }
 
-void intel_engine_write_global_seqno(struct intel_engine_cs *engine, u32 seqno)
-{
-	intel_write_status_page(engine, I915_GEM_HWS_INDEX, seqno);
-	GEM_BUG_ON(intel_engine_get_seqno(engine) != seqno);
-}
-
 static void intel_engine_init_batch_pool(struct intel_engine_cs *engine)
 {
 	i915_gem_batch_pool_init(&engine->batch_pool, engine);
@@ -1053,10 +1047,6 @@ bool intel_engine_is_idle(struct intel_engine_cs *engine)
 	if (i915_terminally_wedged(&dev_priv->gpu_error))
 		return true;
 
-	/* Any inflight/incomplete requests? */
-	if (!intel_engine_signaled(engine, intel_engine_last_submit(engine)))
-		return false;
-
 	/* Waiting to drain ELSP? */
 	if (READ_ONCE(engine->execlists.active)) {
 		struct tasklet_struct *t = &engine->execlists.tasklet;
@@ -1538,9 +1528,7 @@ void intel_engine_dump(struct intel_engine_cs *engine,
 	if (i915_terminally_wedged(&engine->i915->gpu_error))
 		drm_printf(m, "*** WEDGED ***\n");
 
-	drm_printf(m, "\tcurrent seqno %x, last %x, hangcheck %x/%x [%d ms]\n",
-		   intel_engine_get_seqno(engine),
-		   intel_engine_last_submit(engine),
+	drm_printf(m, "\tHangcheck %x:%x [%d ms]\n",
 		   engine->hangcheck.last_seqno,
 		   engine->hangcheck.next_seqno,
 		   jiffies_to_msecs(jiffies - engine->hangcheck.action_timestamp));
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 342d3a91be03..2f2c27e6ae6d 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -565,13 +565,12 @@ static void execlists_submit_ports(struct intel_engine_cs *engine)
 			desc = execlists_update_context(rq);
 			GEM_DEBUG_EXEC(port[n].context_id = upper_32_bits(desc));
 
-			GEM_TRACE("%s in[%d]:  ctx=%d.%d, global=%d (fence %llx:%lld) (current %d:%d), prio=%d\n",
+			GEM_TRACE("%s in[%d]:  ctx=%d.%d, global=%d (fence %llx:%lld) (current %d), prio=%d\n",
 				  engine->name, n,
 				  port[n].context_id, count,
 				  rq->global_seqno,
 				  rq->fence.context, rq->fence.seqno,
 				  hwsp_seqno(rq),
-				  intel_engine_get_seqno(engine),
 				  rq_prio(rq));
 		} else {
 			GEM_BUG_ON(!n);
@@ -876,13 +875,12 @@ execlists_cancel_port_requests(struct intel_engine_execlists * const execlists)
 	while (num_ports-- && port_isset(port)) {
 		struct i915_request *rq = port_request(port);
 
-		GEM_TRACE("%s:port%u global=%d (fence %llx:%lld), (current %d:%d)\n",
+		GEM_TRACE("%s:port%u global=%d (fence %llx:%lld), (current %d)\n",
 			  rq->engine->name,
 			  (unsigned int)(port - execlists->port),
 			  rq->global_seqno,
 			  rq->fence.context, rq->fence.seqno,
-			  hwsp_seqno(rq),
-			  intel_engine_get_seqno(rq->engine));
+			  hwsp_seqno(rq));
 
 		GEM_BUG_ON(!execlists->active);
 		execlists_context_schedule_out(rq,
@@ -938,8 +936,7 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
 	struct rb_node *rb;
 	unsigned long flags;
 
-	GEM_TRACE("%s current %d\n",
-		  engine->name, intel_engine_get_seqno(engine));
+	GEM_TRACE("%s\n", engine->name);
 
 	/*
 	 * Before we call engine->cancel_requests(), we should have exclusive
@@ -987,10 +984,6 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
 		i915_priolist_free(p);
 	}
 
-	intel_write_status_page(engine,
-				I915_GEM_HWS_INDEX,
-				intel_engine_last_submit(engine));
-
 	/* Remaining _unready_ requests will be nop'ed when submitted */
 
 	execlists->queue_priority_hint = INT_MIN;
@@ -1106,14 +1099,13 @@ static void process_csb(struct intel_engine_cs *engine)
 						EXECLISTS_ACTIVE_USER));
 
 		rq = port_unpack(port, &count);
-		GEM_TRACE("%s out[0]: ctx=%d.%d, global=%d (fence %llx:%lld) (current %d:%d), prio=%d\n",
+		GEM_TRACE("%s out[0]: ctx=%d.%d, global=%d (fence %llx:%lld) (current %d), prio=%d\n",
 			  engine->name,
 			  port->context_id, count,
 			  rq ? rq->global_seqno : 0,
 			  rq ? rq->fence.context : 0,
 			  rq ? rq->fence.seqno : 0,
 			  rq ? hwsp_seqno(rq) : 0,
-			  intel_engine_get_seqno(engine),
 			  rq ? rq_prio(rq) : 0);
 
 		/* Check the context/desc id for this event matches */
@@ -1975,10 +1967,9 @@ static void execlists_reset(struct intel_engine_cs *engine, bool stalled)
 	/* Following the reset, we need to reload the CSB read/write pointers */
 	reset_csb_pointers(&engine->execlists);
 
-	GEM_TRACE("%s seqno=%d, current=%d, stalled? %s\n",
+	GEM_TRACE("%s seqno=%d, stalled? %s\n",
 		  engine->name,
 		  rq ? rq->global_seqno : 0,
-		  intel_engine_get_seqno(engine),
 		  yesno(stalled));
 	if (!rq)
 		goto out_unlock;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 870184bbd169..2d59e2990448 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -782,10 +782,9 @@ static void reset_ring(struct intel_engine_cs *engine, bool stalled)
 		}
 	}
 
-	GEM_TRACE("%s seqno=%d, current=%d, stalled? %s\n",
+	GEM_TRACE("%s seqno=%d, stalled? %s\n",
 		  engine->name,
 		  rq ? rq->global_seqno : 0,
-		  intel_engine_get_seqno(engine),
 		  yesno(stalled));
 	/*
 	 * The guilty request will get skipped on a hung engine.
@@ -924,10 +923,6 @@ static void cancel_requests(struct intel_engine_cs *engine)
 		i915_request_mark_complete(request);
 	}
 
-	intel_write_status_page(engine,
-				I915_GEM_HWS_INDEX,
-				intel_engine_last_submit(engine));
-
 	/* Remaining _unready_ requests will be nop'ed when submitted */
 
 	spin_unlock_irqrestore(&engine->timeline.lock, flags);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index b30c37ac55a3..26bae7772208 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -840,8 +840,6 @@ __intel_ring_space(unsigned int head, unsigned int tail, unsigned int size)
 	return (head - tail - CACHELINE_BYTES) & (size - 1);
 }
 
-void intel_engine_write_global_seqno(struct intel_engine_cs *engine, u32 seqno);
-
 int intel_engine_setup_common(struct intel_engine_cs *engine);
 int intel_engine_init_common(struct intel_engine_cs *engine);
 void intel_engine_cleanup_common(struct intel_engine_cs *engine);
@@ -859,44 +857,6 @@ void intel_engine_set_hwsp_writemask(struct intel_engine_cs *engine, u32 mask);
 u64 intel_engine_get_active_head(const struct intel_engine_cs *engine);
 u64 intel_engine_get_last_batch_head(const struct intel_engine_cs *engine);
 
-static inline u32 intel_engine_last_submit(struct intel_engine_cs *engine)
-{
-	/*
-	 * We are only peeking at the tail of the submit queue (and not the
-	 * queue itself) in order to gain a hint as to the current active
-	 * state of the engine. Callers are not expected to be taking
-	 * engine->timeline->lock, nor are they expected to be concerned
-	 * wtih serialising this hint with anything, so document it as
-	 * a hint and nothing more.
-	 */
-	return READ_ONCE(engine->timeline.seqno);
-}
-
-static inline u32 intel_engine_get_seqno(struct intel_engine_cs *engine)
-{
-	return intel_read_status_page(engine, I915_GEM_HWS_INDEX);
-}
-
-static inline bool intel_engine_signaled(struct intel_engine_cs *engine,
-					 u32 seqno)
-{
-	return i915_seqno_passed(intel_engine_get_seqno(engine), seqno);
-}
-
-static inline bool intel_engine_has_completed(struct intel_engine_cs *engine,
-					      u32 seqno)
-{
-	GEM_BUG_ON(!seqno);
-	return intel_engine_signaled(engine, seqno);
-}
-
-static inline bool intel_engine_has_started(struct intel_engine_cs *engine,
-					    u32 seqno)
-{
-	GEM_BUG_ON(!seqno);
-	return intel_engine_signaled(engine, seqno - 1);
-}
-
 void intel_engine_get_instdone(struct intel_engine_cs *engine,
 			       struct intel_instdone *instdone);
 
diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
index 6733dc5b6b4c..074d393f4a02 100644
--- a/drivers/gpu/drm/i915/selftests/i915_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_request.c
@@ -226,8 +226,7 @@ static int igt_request_rewind(void *arg)
 	mutex_unlock(&i915->drm.struct_mutex);
 
 	if (i915_request_wait(vip, 0, HZ) == -ETIME) {
-		pr_err("timed out waiting for high priority request, vip.seqno=%d, current seqno=%d\n",
-		       vip->global_seqno, intel_engine_get_seqno(i915->engine[RCS]));
+		pr_err("timed out waiting for high priority request\n");
 		goto err;
 	}
 
diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
index 0d35af07867b..f055da01ced9 100644
--- a/drivers/gpu/drm/i915/selftests/mock_engine.c
+++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
@@ -86,7 +86,6 @@ static struct i915_request *first_request(struct mock_engine *engine)
 static void advance(struct i915_request *request)
 {
 	list_del_init(&request->mock.link);
-	intel_engine_write_global_seqno(request->engine, request->global_seqno);
 	i915_request_mark_complete(request);
 	GEM_BUG_ON(!i915_request_completed(request));
 
@@ -276,7 +275,6 @@ void mock_engine_flush(struct intel_engine_cs *engine)
 
 void mock_engine_reset(struct intel_engine_cs *engine)
 {
-	intel_engine_write_global_seqno(engine, 0);
 }
 
 void mock_engine_free(struct intel_engine_cs *engine)
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH 21/46] drm/i915: Remove i915_request.global_seqno
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (19 preceding siblings ...)
  2019-02-06 13:03 ` [PATCH 20/46] drm/i915: Remove access to global seqno in the HWSP Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-11 18:44   ` Tvrtko Ursulin
  2019-02-06 13:03 ` [PATCH 22/46] drm/i915: Force GPU idle on suspend Chris Wilson
                   ` (31 subsequent siblings)
  52 siblings, 1 reply; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx

Having weaned the interrupt handling off its reliance on a single global
execution queue, we no longer need to emit a global_seqno.
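
For context, completion is now judged purely against the per-timeline
breadcrumb: a request is complete once the value written into its
timeline's HWSP slot has passed its own fence.seqno. A standalone toy
model of that check (names and layout invented for illustration, not
the driver code):

#include <stdbool.h>
#include <stdint.h>

struct toy_request {
	uint32_t seqno;			/* mirrors rq->fence.seqno */
	const uint32_t *hwsp_seqno;	/* this timeline's status-page slot */
};

/* Wrap-safe "a is at or after b", the usual seqno comparison. */
static bool seqno_passed(uint32_t a, uint32_t b)
{
	return (int32_t)(a - b) >= 0;
}

static bool toy_request_completed(const struct toy_request *rq)
{
	return seqno_passed(*rq->hwsp_seqno, rq->seqno);
}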

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gpu_error.c         | 35 ++-----------
 drivers/gpu/drm/i915/i915_gpu_error.h         |  2 -
 drivers/gpu/drm/i915/i915_request.c           | 31 ++----------
 drivers/gpu/drm/i915/i915_request.h           | 32 ------------
 drivers/gpu/drm/i915/i915_trace.h             | 25 +++-------
 drivers/gpu/drm/i915/intel_engine_cs.c        |  5 +-
 drivers/gpu/drm/i915/intel_guc_submission.c   |  2 +-
 drivers/gpu/drm/i915/intel_lrc.c              | 34 ++-----------
 drivers/gpu/drm/i915/intel_ringbuffer.c       | 50 +++----------------
 drivers/gpu/drm/i915/intel_ringbuffer.h       |  2 -
 .../gpu/drm/i915/selftests/intel_hangcheck.c  |  5 +-
 drivers/gpu/drm/i915/selftests/mock_engine.c  |  1 -
 12 files changed, 31 insertions(+), 193 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index a674c78ca1f8..8792ad12373d 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -380,19 +380,16 @@ static void print_error_buffers(struct drm_i915_error_state_buf *m,
 	err_printf(m, "%s [%d]:\n", name, count);
 
 	while (count--) {
-		err_printf(m, "    %08x_%08x %8u %02x %02x %02x",
+		err_printf(m, "    %08x_%08x %8u %02x %02x",
 			   upper_32_bits(err->gtt_offset),
 			   lower_32_bits(err->gtt_offset),
 			   err->size,
 			   err->read_domains,
-			   err->write_domain,
-			   err->wseqno);
+			   err->write_domain);
 		err_puts(m, tiling_flag(err->tiling));
 		err_puts(m, dirty_flag(err->dirty));
 		err_puts(m, purgeable_flag(err->purgeable));
 		err_puts(m, err->userptr ? " userptr" : "");
-		err_puts(m, err->engine != -1 ? " " : "");
-		err_puts(m, engine_name(m->i915, err->engine));
 		err_puts(m, i915_cache_level_str(m->i915, err->cache_level));
 
 		if (err->name)
@@ -1059,27 +1056,6 @@ i915_error_object_create(struct drm_i915_private *i915,
 	return dst;
 }
 
-/* The error capture is special as tries to run underneath the normal
- * locking rules - so we use the raw version of the i915_active_request lookup.
- */
-static inline u32
-__active_get_seqno(struct i915_active_request *active)
-{
-	struct i915_request *request;
-
-	request = __i915_active_request_peek(active);
-	return request ? request->global_seqno : 0;
-}
-
-static inline int
-__active_get_engine_id(struct i915_active_request *active)
-{
-	struct i915_request *request;
-
-	request = __i915_active_request_peek(active);
-	return request ? request->engine->id : -1;
-}
-
 static void capture_bo(struct drm_i915_error_buffer *err,
 		       struct i915_vma *vma)
 {
@@ -1088,9 +1064,6 @@ static void capture_bo(struct drm_i915_error_buffer *err,
 	err->size = obj->base.size;
 	err->name = obj->base.name;
 
-	err->wseqno = __active_get_seqno(&obj->frontbuffer_write);
-	err->engine = __active_get_engine_id(&obj->frontbuffer_write);
-
 	err->gtt_offset = vma->node.start;
 	err->read_domains = obj->read_domains;
 	err->write_domain = obj->write_domain;
@@ -1295,10 +1268,10 @@ static void record_request(struct i915_request *request,
 	struct i915_gem_context *ctx = request->gem_context;
 
 	erq->flags = request->fence.flags;
-	erq->context = ctx->hw_id;
+	erq->context = request->fence.context;
+	erq->seqno = request->fence.seqno;
 	erq->sched_attr = request->sched.attr;
 	erq->ban_score = atomic_read(&ctx->ban_score);
-	erq->seqno = request->global_seqno;
 	erq->jiffies = request->emitted_jiffies;
 	erq->start = i915_ggtt_offset(request->ring->vma);
 	erq->head = request->head;
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
index 4dbbd0f02edb..34fec5f00ef2 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.h
+++ b/drivers/gpu/drm/i915/i915_gpu_error.h
@@ -167,7 +167,6 @@ struct i915_gpu_state {
 	struct drm_i915_error_buffer {
 		u32 size;
 		u32 name;
-		u32 wseqno;
 		u64 gtt_offset;
 		u32 read_domains;
 		u32 write_domain;
@@ -176,7 +175,6 @@ struct i915_gpu_state {
 		u32 dirty:1;
 		u32 purgeable:1;
 		u32 userptr:1;
-		s32 engine:4;
 		u32 cache_level:3;
 	} *active_bo[I915_NUM_ENGINES], *pinned_bo;
 	u32 active_bo_count[I915_NUM_ENGINES], pinned_bo_count;
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 85cf5cfbc7ed..8321f2d8a301 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -192,10 +192,9 @@ static void free_capture_list(struct i915_request *request)
 static void __retire_engine_request(struct intel_engine_cs *engine,
 				    struct i915_request *rq)
 {
-	GEM_TRACE("%s(%s) fence %llx:%lld, global=%d, current %d\n",
+	GEM_TRACE("%s(%s) fence %llx:%lld, current %d\n",
 		  __func__, engine->name,
 		  rq->fence.context, rq->fence.seqno,
-		  rq->global_seqno,
 		  hwsp_seqno(rq));
 
 	GEM_BUG_ON(!i915_request_completed(rq));
@@ -255,10 +254,9 @@ static void i915_request_retire(struct i915_request *request)
 {
 	struct i915_active_request *active, *next;
 
-	GEM_TRACE("%s fence %llx:%lld, global=%d, current %d\n",
+	GEM_TRACE("%s fence %llx:%lld, current %d\n",
 		  request->engine->name,
 		  request->fence.context, request->fence.seqno,
-		  request->global_seqno,
 		  hwsp_seqno(request));
 
 	lockdep_assert_held(&request->i915->drm.struct_mutex);
@@ -318,10 +316,9 @@ void i915_request_retire_upto(struct i915_request *rq)
 	struct intel_ring *ring = rq->ring;
 	struct i915_request *tmp;
 
-	GEM_TRACE("%s fence %llx:%lld, global=%d, current %d\n",
+	GEM_TRACE("%s fence %llx:%lld, current %d\n",
 		  rq->engine->name,
 		  rq->fence.context, rq->fence.seqno,
-		  rq->global_seqno,
 		  hwsp_seqno(rq));
 
 	lockdep_assert_held(&rq->i915->drm.struct_mutex);
@@ -412,17 +409,9 @@ static void move_to_timeline(struct i915_request *request,
 	spin_unlock(&request->timeline->lock);
 }
 
-static u32 next_global_seqno(struct i915_timeline *tl)
-{
-	if (!++tl->seqno)
-		++tl->seqno;
-	return tl->seqno;
-}
-
 void __i915_request_submit(struct i915_request *request)
 {
 	struct intel_engine_cs *engine = request->engine;
-	u32 seqno;
 
 	GEM_TRACE("%s fence %llx:%lld -> global=%d, current %d\n",
 		  engine->name,
@@ -433,18 +422,12 @@ void __i915_request_submit(struct i915_request *request)
 	GEM_BUG_ON(!irqs_disabled());
 	lockdep_assert_held(&engine->timeline.lock);
 
-	GEM_BUG_ON(request->global_seqno);
-
-	seqno = next_global_seqno(&engine->timeline);
-	GEM_BUG_ON(!seqno);
-
 	/* We may be recursing from the signal callback of another i915 fence */
 	spin_lock_nested(&request->lock, SINGLE_DEPTH_NESTING);
 
 	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
 	set_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
 
-	request->global_seqno = seqno;
 	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &request->fence.flags) &&
 	    !i915_request_enable_breadcrumb(request))
 		intel_engine_queue_breadcrumbs(engine);
@@ -487,10 +470,9 @@ void __i915_request_unsubmit(struct i915_request *request)
 {
 	struct intel_engine_cs *engine = request->engine;
 
-	GEM_TRACE("%s fence %llx:%lld <- global=%d, current %d\n",
+	GEM_TRACE("%s fence %llx:%lld <- current %d\n",
 		  engine->name,
 		  request->fence.context, request->fence.seqno,
-		  request->global_seqno,
 		  hwsp_seqno(request));
 
 	GEM_BUG_ON(!irqs_disabled());
@@ -500,13 +482,9 @@ void __i915_request_unsubmit(struct i915_request *request)
 	 * Only unwind in reverse order, required so that the per-context list
 	 * is kept in seqno/ring order.
 	 */
-	GEM_BUG_ON(!request->global_seqno);
-	GEM_BUG_ON(request->global_seqno != engine->timeline.seqno);
-	engine->timeline.seqno--;
 
 	/* We may be recursing from the signal callback of another i915 fence */
 	spin_lock_nested(&request->lock, SINGLE_DEPTH_NESTING);
-	request->global_seqno = 0;
 	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &request->fence.flags))
 		i915_request_cancel_breadcrumb(request);
 	GEM_BUG_ON(!test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
@@ -724,7 +702,6 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	i915_sched_node_init(&rq->sched);
 
 	/* No zalloc, must clear what we need by hand */
-	rq->global_seqno = 0;
 	rq->file_priv = NULL;
 	rq->batch = NULL;
 	rq->capture_list = NULL;
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index df52776b26cf..5a32167ee892 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -149,14 +149,6 @@ struct i915_request {
 	 */
 	const u32 *hwsp_seqno;
 
-	/**
-	 * GEM sequence number associated with this request on the
-	 * global execution timeline. It is zero when the request is not
-	 * on the HW queue (i.e. not on the engine timeline list).
-	 * Its value is guarded by the timeline spinlock.
-	 */
-	u32 global_seqno;
-
 	/** Position in the ring of the start of the request */
 	u32 head;
 
@@ -254,30 +246,6 @@ i915_request_put(struct i915_request *rq)
 	dma_fence_put(&rq->fence);
 }
 
-/**
- * i915_request_global_seqno - report the current global seqno
- * @request - the request
- *
- * A request is assigned a global seqno only when it is on the hardware
- * execution queue. The global seqno can be used to maintain a list of
- * requests on the same engine in retirement order, for example for
- * constructing a priority queue for waiting. Prior to its execution, or
- * if it is subsequently removed in the event of preemption, its global
- * seqno is zero. As both insertion and removal from the execution queue
- * may operate in IRQ context, it is not guarded by the usual struct_mutex
- * BKL. Instead those relying on the global seqno must be prepared for its
- * value to change between reads. Only when the request is complete can
- * the global seqno be stable (due to the memory barriers on submitting
- * the commands to the hardware to write the breadcrumb, if the HWS shows
- * that it has passed the global seqno and the global seqno is unchanged
- * after the read, it is indeed complete).
- */
-static inline u32
-i915_request_global_seqno(const struct i915_request *request)
-{
-	return READ_ONCE(request->global_seqno);
-}
-
 int i915_request_await_object(struct i915_request *to,
 			      struct drm_i915_gem_object *obj,
 			      bool write);
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index eab313c3163c..d1d0d9e5f384 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -627,7 +627,6 @@ DECLARE_EVENT_CLASS(i915_request,
 			     __field(u16, class)
 			     __field(u16, instance)
 			     __field(u32, seqno)
-			     __field(u32, global)
 			     ),
 
 	    TP_fast_assign(
@@ -637,13 +636,11 @@ DECLARE_EVENT_CLASS(i915_request,
 			   __entry->instance = rq->engine->instance;
 			   __entry->ctx = rq->fence.context;
 			   __entry->seqno = rq->fence.seqno;
-			   __entry->global = rq->global_seqno;
 			   ),
 
-	    TP_printk("dev=%u, engine=%u:%u, hw_id=%u, ctx=%llu, seqno=%u, global=%u",
+	    TP_printk("dev=%u, engine=%u:%u, hw_id=%u, ctx=%llu, seqno=%u",
 		      __entry->dev, __entry->class, __entry->instance,
-		      __entry->hw_id, __entry->ctx, __entry->seqno,
-		      __entry->global)
+		      __entry->hw_id, __entry->ctx, __entry->seqno)
 );
 
 DEFINE_EVENT(i915_request, i915_request_add,
@@ -673,7 +670,6 @@ TRACE_EVENT(i915_request_in,
 			     __field(u16, class)
 			     __field(u16, instance)
 			     __field(u32, seqno)
-			     __field(u32, global_seqno)
 			     __field(u32, port)
 			     __field(u32, prio)
 			    ),
@@ -685,15 +681,14 @@ TRACE_EVENT(i915_request_in,
 			   __entry->instance = rq->engine->instance;
 			   __entry->ctx = rq->fence.context;
 			   __entry->seqno = rq->fence.seqno;
-			   __entry->global_seqno = rq->global_seqno;
 			   __entry->prio = rq->sched.attr.priority;
 			   __entry->port = port;
 			   ),
 
-	    TP_printk("dev=%u, engine=%u:%u, hw_id=%u, ctx=%llu, seqno=%u, prio=%u, global=%u, port=%u",
+	    TP_printk("dev=%u, engine=%u:%u, hw_id=%u, ctx=%llu, seqno=%u, prio=%u, port=%u",
 		      __entry->dev, __entry->class, __entry->instance,
 		      __entry->hw_id, __entry->ctx, __entry->seqno,
-		      __entry->prio, __entry->global_seqno, __entry->port)
+		      __entry->prio, __entry->port)
 );
 
 TRACE_EVENT(i915_request_out,
@@ -707,7 +702,6 @@ TRACE_EVENT(i915_request_out,
 			     __field(u16, class)
 			     __field(u16, instance)
 			     __field(u32, seqno)
-			     __field(u32, global_seqno)
 			     __field(u32, completed)
 			    ),
 
@@ -718,14 +712,13 @@ TRACE_EVENT(i915_request_out,
 			   __entry->instance = rq->engine->instance;
 			   __entry->ctx = rq->fence.context;
 			   __entry->seqno = rq->fence.seqno;
-			   __entry->global_seqno = rq->global_seqno;
 			   __entry->completed = i915_request_completed(rq);
 			   ),
 
-		    TP_printk("dev=%u, engine=%u:%u, hw_id=%u, ctx=%llu, seqno=%u, global=%u, completed?=%u",
+		    TP_printk("dev=%u, engine=%u:%u, hw_id=%u, ctx=%llu, seqno=%u, completed?=%u",
 			      __entry->dev, __entry->class, __entry->instance,
 			      __entry->hw_id, __entry->ctx, __entry->seqno,
-			      __entry->global_seqno, __entry->completed)
+			      __entry->completed)
 );
 
 #else
@@ -768,7 +761,6 @@ TRACE_EVENT(i915_request_wait_begin,
 			     __field(u16, class)
 			     __field(u16, instance)
 			     __field(u32, seqno)
-			     __field(u32, global)
 			     __field(unsigned int, flags)
 			     ),
 
@@ -785,14 +777,13 @@ TRACE_EVENT(i915_request_wait_begin,
 			   __entry->instance = rq->engine->instance;
 			   __entry->ctx = rq->fence.context;
 			   __entry->seqno = rq->fence.seqno;
-			   __entry->global = rq->global_seqno;
 			   __entry->flags = flags;
 			   ),
 
-	    TP_printk("dev=%u, engine=%u:%u, hw_id=%u, ctx=%llu, seqno=%u, global=%u, blocking=%u, flags=0x%x",
+	    TP_printk("dev=%u, engine=%u:%u, hw_id=%u, ctx=%llu, seqno=%u, blocking=%u, flags=0x%x",
 		      __entry->dev, __entry->class, __entry->instance,
 		      __entry->hw_id, __entry->ctx, __entry->seqno,
-		      __entry->global, !!(__entry->flags & I915_WAIT_LOCKED),
+		      !!(__entry->flags & I915_WAIT_LOCKED),
 		      __entry->flags)
 );
 
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index ea370ed094a5..ce7c19f2ae49 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1313,15 +1313,14 @@ static void print_request(struct drm_printer *m,
 
 	x = print_sched_attr(rq->i915, &rq->sched.attr, buf, x, sizeof(buf));
 
-	drm_printf(m, "%s%x%s%s [%llx:%llx]%s @ %dms: %s\n",
+	drm_printf(m, "%s %llx:%llx%s%s %s @ %dms: %s\n",
 		   prefix,
-		   rq->global_seqno,
+		   rq->fence.context, rq->fence.seqno,
 		   i915_request_completed(rq) ? "!" :
 		   i915_request_started(rq) ? "*" :
 		   "",
 		   test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
 			    &rq->fence.flags) ?  "+" : "",
-		   rq->fence.context, rq->fence.seqno,
 		   buf,
 		   jiffies_to_msecs(jiffies - rq->emitted_jiffies),
 		   name);
diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index 4cf94513615d..4366db7978a8 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -535,7 +535,7 @@ static void guc_add_request(struct intel_guc *guc, struct i915_request *rq)
 	spin_lock(&client->wq_lock);
 
 	guc_wq_item_append(client, engine->guc_id, ctx_desc,
-			   ring_tail, rq->global_seqno);
+			   ring_tail, rq->fence.seqno);
 	guc_ring_doorbell(client);
 
 	client->submissions[engine->id] += 1;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 2f2c27e6ae6d..2424eb2b1fc6 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -174,12 +174,6 @@ static void execlists_init_reg_state(u32 *reg_state,
 				     struct intel_engine_cs *engine,
 				     struct intel_ring *ring);
 
-static inline u32 intel_hws_seqno_address(struct intel_engine_cs *engine)
-{
-	return (i915_ggtt_offset(engine->status_page.vma) +
-		I915_GEM_HWS_INDEX_ADDR);
-}
-
 static inline u32 intel_hws_hangcheck_address(struct intel_engine_cs *engine)
 {
 	return (i915_ggtt_offset(engine->status_page.vma) +
@@ -565,10 +559,9 @@ static void execlists_submit_ports(struct intel_engine_cs *engine)
 			desc = execlists_update_context(rq);
 			GEM_DEBUG_EXEC(port[n].context_id = upper_32_bits(desc));
 
-			GEM_TRACE("%s in[%d]:  ctx=%d.%d, global=%d (fence %llx:%lld) (current %d), prio=%d\n",
+			GEM_TRACE("%s in[%d]:  ctx=%d.%d, fence %llx:%lld (current %d), prio=%d\n",
 				  engine->name, n,
 				  port[n].context_id, count,
-				  rq->global_seqno,
 				  rq->fence.context, rq->fence.seqno,
 				  hwsp_seqno(rq),
 				  rq_prio(rq));
@@ -875,10 +868,9 @@ execlists_cancel_port_requests(struct intel_engine_execlists * const execlists)
 	while (num_ports-- && port_isset(port)) {
 		struct i915_request *rq = port_request(port);
 
-		GEM_TRACE("%s:port%u global=%d (fence %llx:%lld), (current %d)\n",
+		GEM_TRACE("%s:port%u fence %llx:%lld, (current %d)\n",
 			  rq->engine->name,
 			  (unsigned int)(port - execlists->port),
-			  rq->global_seqno,
 			  rq->fence.context, rq->fence.seqno,
 			  hwsp_seqno(rq));
 
@@ -960,8 +952,6 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
 
 	/* Mark all executing requests as skipped. */
 	list_for_each_entry(rq, &engine->timeline.requests, link) {
-		GEM_BUG_ON(!rq->global_seqno);
-
 		if (!i915_request_signaled(rq))
 			dma_fence_set_error(&rq->fence, -EIO);
 
@@ -1099,10 +1089,9 @@ static void process_csb(struct intel_engine_cs *engine)
 						EXECLISTS_ACTIVE_USER));
 
 		rq = port_unpack(port, &count);
-		GEM_TRACE("%s out[0]: ctx=%d.%d, global=%d (fence %llx:%lld) (current %d), prio=%d\n",
+		GEM_TRACE("%s out[0]: ctx=%d.%d, fence %llx:%lld (current %d), prio=%d\n",
 			  engine->name,
 			  port->context_id, count,
-			  rq ? rq->global_seqno : 0,
 			  rq ? rq->fence.context : 0,
 			  rq ? rq->fence.seqno : 0,
 			  rq ? hwsp_seqno(rq) : 0,
@@ -1967,10 +1956,7 @@ static void execlists_reset(struct intel_engine_cs *engine, bool stalled)
 	/* Following the reset, we need to reload the CSB read/write pointers */
 	reset_csb_pointers(&engine->execlists);
 
-	GEM_TRACE("%s seqno=%d, stalled? %s\n",
-		  engine->name,
-		  rq ? rq->global_seqno : 0,
-		  yesno(stalled));
+	GEM_TRACE("%s stalled? %s\n", engine->name, yesno(stalled));
 	if (!rq)
 		goto out_unlock;
 
@@ -2225,9 +2211,6 @@ static u32 *gen8_emit_wa_tail(struct i915_request *request, u32 *cs)
 
 static u32 *gen8_emit_fini_breadcrumb(struct i915_request *request, u32 *cs)
 {
-	/* w/a: bit 5 needs to be zero for MI_FLUSH_DW address. */
-	BUILD_BUG_ON(I915_GEM_HWS_INDEX_ADDR & (1 << 5));
-
 	cs = gen8_emit_ggtt_write(cs,
 				  request->fence.seqno,
 				  request->timeline->hwsp_offset);
@@ -2236,10 +2219,6 @@ static u32 *gen8_emit_fini_breadcrumb(struct i915_request *request, u32 *cs)
 				  intel_engine_next_hangcheck_seqno(request->engine),
 				  intel_hws_hangcheck_address(request->engine));
 
-	cs = gen8_emit_ggtt_write(cs,
-				  request->global_seqno,
-				  intel_hws_seqno_address(request->engine));
-
 	*cs++ = MI_USER_INTERRUPT;
 	*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
 
@@ -2265,11 +2244,6 @@ static u32 *gen8_emit_fini_breadcrumb_rcs(struct i915_request *request, u32 *cs)
 				      intel_hws_hangcheck_address(request->engine),
 				      PIPE_CONTROL_CS_STALL);
 
-	cs = gen8_emit_ggtt_write_rcs(cs,
-				      request->global_seqno,
-				      intel_hws_seqno_address(request->engine),
-				      PIPE_CONTROL_CS_STALL);
-
 	*cs++ = MI_USER_INTERRUPT;
 	*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 2d59e2990448..1b96b0960adc 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -49,12 +49,6 @@ static inline u32 hws_hangcheck_address(struct intel_engine_cs *engine)
 		I915_GEM_HWS_HANGCHECK_ADDR);
 }
 
-static inline u32 intel_hws_seqno_address(struct intel_engine_cs *engine)
-{
-	return (i915_ggtt_offset(engine->status_page.vma) +
-		I915_GEM_HWS_INDEX_ADDR);
-}
-
 unsigned int intel_ring_update_space(struct intel_ring *ring)
 {
 	unsigned int space;
@@ -327,11 +321,6 @@ static u32 *gen6_rcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
 	*cs++ = hws_hangcheck_address(rq->engine) | PIPE_CONTROL_GLOBAL_GTT;
 	*cs++ = intel_engine_next_hangcheck_seqno(rq->engine);
 
-	*cs++ = GFX_OP_PIPE_CONTROL(4);
-	*cs++ = PIPE_CONTROL_QW_WRITE | PIPE_CONTROL_CS_STALL;
-	*cs++ = intel_hws_seqno_address(rq->engine) | PIPE_CONTROL_GLOBAL_GTT;
-	*cs++ = rq->global_seqno;
-
 	*cs++ = MI_USER_INTERRUPT;
 	*cs++ = MI_NOOP;
 
@@ -438,13 +427,6 @@ static u32 *gen7_rcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
 	*cs++ = hws_hangcheck_address(rq->engine);
 	*cs++ = intel_engine_next_hangcheck_seqno(rq->engine);
 
-	*cs++ = GFX_OP_PIPE_CONTROL(4);
-	*cs++ = (PIPE_CONTROL_QW_WRITE |
-		 PIPE_CONTROL_GLOBAL_GTT_IVB |
-		 PIPE_CONTROL_CS_STALL);
-	*cs++ = intel_hws_seqno_address(rq->engine);
-	*cs++ = rq->global_seqno;
-
 	*cs++ = MI_USER_INTERRUPT;
 	*cs++ = MI_NOOP;
 
@@ -467,11 +449,8 @@ static u32 *gen6_xcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
 	*cs++ = I915_GEM_HWS_HANGCHECK_ADDR | MI_FLUSH_DW_USE_GTT;
 	*cs++ = intel_engine_next_hangcheck_seqno(rq->engine);
 
-	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX;
-	*cs++ = I915_GEM_HWS_INDEX_ADDR | MI_FLUSH_DW_USE_GTT;
-	*cs++ = rq->global_seqno;
-
 	*cs++ = MI_USER_INTERRUPT;
+	*cs++ = MI_NOOP;
 
 	rq->tail = intel_ring_offset(rq, cs);
 	assert_ring_tail_valid(rq->ring, rq->tail);
@@ -495,10 +474,6 @@ static u32 *gen7_xcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
 	*cs++ = I915_GEM_HWS_HANGCHECK_ADDR | MI_FLUSH_DW_USE_GTT;
 	*cs++ = intel_engine_next_hangcheck_seqno(rq->engine);
 
-	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX;
-	*cs++ = I915_GEM_HWS_INDEX_ADDR | MI_FLUSH_DW_USE_GTT;
-	*cs++ = rq->global_seqno;
-
 	for (i = 0; i < GEN7_XCS_WA; i++) {
 		*cs++ = MI_STORE_DWORD_INDEX;
 		*cs++ = I915_GEM_HWS_SEQNO_ADDR;
@@ -510,7 +485,6 @@ static u32 *gen7_xcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
 	*cs++ = 0;
 
 	*cs++ = MI_USER_INTERRUPT;
-	*cs++ = MI_NOOP;
 
 	rq->tail = intel_ring_offset(rq, cs);
 	assert_ring_tail_valid(rq->ring, rq->tail);
@@ -782,10 +756,8 @@ static void reset_ring(struct intel_engine_cs *engine, bool stalled)
 		}
 	}
 
-	GEM_TRACE("%s seqno=%d, stalled? %s\n",
-		  engine->name,
-		  rq ? rq->global_seqno : 0,
-		  yesno(stalled));
+	GEM_TRACE("%s stalled? %s\n", engine->name, yesno(stalled));
+
 	/*
 	 * The guilty request will get skipped on a hung engine.
 	 *
@@ -915,8 +887,6 @@ static void cancel_requests(struct intel_engine_cs *engine)
 
 	/* Mark all submitted requests as skipped. */
 	list_for_each_entry(request, &engine->timeline.requests, link) {
-		GEM_BUG_ON(!request->global_seqno);
-
 		if (!i915_request_signaled(request))
 			dma_fence_set_error(&request->fence, -EIO);
 
@@ -953,12 +923,7 @@ static u32 *i9xx_emit_breadcrumb(struct i915_request *rq, u32 *cs)
 	*cs++ = I915_GEM_HWS_HANGCHECK_ADDR;
 	*cs++ = intel_engine_next_hangcheck_seqno(rq->engine);
 
-	*cs++ = MI_STORE_DWORD_INDEX;
-	*cs++ = I915_GEM_HWS_INDEX_ADDR;
-	*cs++ = rq->global_seqno;
-
 	*cs++ = MI_USER_INTERRUPT;
-	*cs++ = MI_NOOP;
 
 	rq->tail = intel_ring_offset(rq, cs);
 	assert_ring_tail_valid(rq->ring, rq->tail);
@@ -976,10 +941,6 @@ static u32 *gen5_emit_breadcrumb(struct i915_request *rq, u32 *cs)
 
 	*cs++ = MI_FLUSH;
 
-	*cs++ = MI_STORE_DWORD_INDEX;
-	*cs++ = I915_GEM_HWS_SEQNO_ADDR;
-	*cs++ = rq->fence.seqno;
-
 	*cs++ = MI_STORE_DWORD_INDEX;
 	*cs++ = I915_GEM_HWS_HANGCHECK_ADDR;
 	*cs++ = intel_engine_next_hangcheck_seqno(rq->engine);
@@ -987,11 +948,12 @@ static u32 *gen5_emit_breadcrumb(struct i915_request *rq, u32 *cs)
 	BUILD_BUG_ON(GEN5_WA_STORES < 1);
 	for (i = 0; i < GEN5_WA_STORES; i++) {
 		*cs++ = MI_STORE_DWORD_INDEX;
-		*cs++ = I915_GEM_HWS_INDEX_ADDR;
-		*cs++ = rq->global_seqno;
+		*cs++ = I915_GEM_HWS_SEQNO_ADDR;
+		*cs++ = rq->fence.seqno;
 	}
 
 	*cs++ = MI_USER_INTERRUPT;
+	*cs++ = MI_NOOP;
 
 	rq->tail = intel_ring_offset(rq, cs);
 	assert_ring_tail_valid(rq->ring, rq->tail);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 26bae7772208..39a9ee7b61e2 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -716,8 +716,6 @@ intel_write_status_page(struct intel_engine_cs *engine, int reg, u32 value)
  *
  * The area from dword 0x30 to 0x3ff is available for driver usage.
  */
-#define I915_GEM_HWS_INDEX		0x30
-#define I915_GEM_HWS_INDEX_ADDR		(I915_GEM_HWS_INDEX * sizeof(u32))
 #define I915_GEM_HWS_PREEMPT		0x32
 #define I915_GEM_HWS_PREEMPT_ADDR	(I915_GEM_HWS_PREEMPT * sizeof(u32))
 #define I915_GEM_HWS_HANGCHECK		0x34
diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
index 36c17bfe05a7..4aa57d0d1b92 100644
--- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
@@ -571,11 +571,10 @@ static int active_request_put(struct i915_request *rq)
 		return 0;
 
 	if (i915_request_wait(rq, 0, 5 * HZ) < 0) {
-		GEM_TRACE("%s timed out waiting for completion of fence %llx:%lld, seqno %d.\n",
+		GEM_TRACE("%s timed out waiting for completion of fence %llx:%lld\n",
 			  rq->engine->name,
 			  rq->fence.context,
-			  rq->fence.seqno,
-			  i915_request_global_seqno(rq));
+			  rq->fence.seqno);
 		GEM_TRACE_DUMP();
 
 		i915_gem_set_wedged(rq->i915);
diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
index f055da01ced9..ec1ae948954c 100644
--- a/drivers/gpu/drm/i915/selftests/mock_engine.c
+++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
@@ -194,7 +194,6 @@ static void mock_submit_request(struct i915_request *request)
 	unsigned long flags;
 
 	i915_request_submit(request);
-	GEM_BUG_ON(!request->global_seqno);
 
 	spin_lock_irqsave(&engine->hw_lock, flags);
 	list_add_tail(&request->mock.link, &engine->hw_queue);
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH 22/46] drm/i915: Force GPU idle on suspend
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (20 preceding siblings ...)
  2019-02-06 13:03 ` [PATCH 21/46] drm/i915: Remove i915_request.global_seqno Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-06 13:03 ` [PATCH 23/46] drm/i915/selftests: Improve switch-to-kernel-context checking Chris Wilson
                   ` (30 subsequent siblings)
  52 siblings, 0 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx

To facilitate the next patch, which allows preemptible kernels not to
incur the wrath of hangcheck, we need to ensure that we can still
suspend and shut down. That is, we will no longer be able to rely on
hangcheck to terminate a blocking kernel and must instead do so
ourselves. The advantage is that we can apply more pressure!

As we now perform a GPU reset to clean up any residual kernels, we leave
the GPU in an unknown state and in particular cannot talk to the GuC
before we reinitialise it following resume. For example, we no longer
need to tell the GuC to suspend itself, as it has already been reset.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 20 +++++---------------
 1 file changed, 5 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 78b9aa57932d..89b2d3ac26ce 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3244,13 +3244,6 @@ int i915_gem_wait_for_idle(struct drm_i915_private *i915,
 
 		lockdep_assert_held(&i915->drm.struct_mutex);
 
-		if (GEM_SHOW_DEBUG() && !timeout) {
-			/* Presume that timeout was non-zero to begin with! */
-			dev_warn(&i915->drm.pdev->dev,
-				 "Missed idle-completion interrupt!\n");
-			GEM_TRACE_DUMP();
-		}
-
 		err = wait_for_engines(i915);
 		if (err)
 			return err;
@@ -4497,11 +4490,12 @@ int i915_gem_suspend(struct drm_i915_private *i915)
 					     I915_WAIT_INTERRUPTIBLE |
 					     I915_WAIT_LOCKED |
 					     I915_WAIT_FOR_IDLE_BOOST,
-					     MAX_SCHEDULE_TIMEOUT);
-		if (ret && ret != -EIO)
+					     HZ / 5);
+		if (ret == -EINTR)
 			goto err_unlock;
 
-		assert_kernel_context_is_current(i915);
+		/* Forcibly cancel outstanding work and leave the gpu quiet. */
+		i915_gem_set_wedged(i915);
 	}
 	i915_retire_requests(i915); /* ensure we flush after wedging */
 
@@ -4516,15 +4510,11 @@ int i915_gem_suspend(struct drm_i915_private *i915)
 	 */
 	drain_delayed_work(&i915->gt.idle_work);
 
-	intel_uc_suspend(i915);
-
 	/*
 	 * Assert that we successfully flushed all the work and
 	 * reset the GPU back to its idle, low power state.
 	 */
-	WARN_ON(i915->gt.awake);
-	if (WARN_ON(!intel_engines_are_idle(i915)))
-		i915_gem_set_wedged(i915); /* no hope, discard everything */
+	GEM_BUG_ON(i915->gt.awake);
 
 	intel_runtime_pm_put(i915, wakeref);
 	return 0;
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH 23/46] drm/i915/selftests: Improve switch-to-kernel-context checking
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (21 preceding siblings ...)
  2019-02-06 13:03 ` [PATCH 22/46] drm/i915: Force GPU idle on suspend Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-06 13:03 ` [PATCH 24/46] drm/i915: Do a synchronous switch-to-kernel-context on idling Chris Wilson
                   ` (29 subsequent siblings)
  52 siblings, 0 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx

We can reduce the switch-to-kernel-context selftest to operate as a loop
and so trivially test another state transition (that of idle->busy).

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 .../gpu/drm/i915/selftests/i915_gem_context.c | 80 ++++++++-----------
 1 file changed, 35 insertions(+), 45 deletions(-)

diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
index d00d0bb07784..f8931086eb70 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
@@ -1493,63 +1493,55 @@ static int __igt_switch_to_kernel_context(struct drm_i915_private *i915,
 {
 	struct intel_engine_cs *engine;
 	unsigned int tmp;
-	int err;
+	int pass;
 
 	GEM_TRACE("Testing %s\n", __engine_name(i915, engines));
-	for_each_engine_masked(engine, i915, engines, tmp) {
-		struct i915_request *rq;
+	for (pass = 0; pass < 4; pass++) { /* Once busy; once idle; repeat */
+		bool from_idle = pass & 1;
+		int err;
 
-		rq = i915_request_alloc(engine, ctx);
-		if (IS_ERR(rq))
-			return PTR_ERR(rq);
+		if (!from_idle) {
+			for_each_engine_masked(engine, i915, engines, tmp) {
+				struct i915_request *rq;
 
-		i915_request_add(rq);
-	}
-
-	err = i915_gem_switch_to_kernel_context(i915);
-	if (err)
-		return err;
+				rq = i915_request_alloc(engine, ctx);
+				if (IS_ERR(rq))
+					return PTR_ERR(rq);
 
-	for_each_engine_masked(engine, i915, engines, tmp) {
-		if (!engine_has_kernel_context_barrier(engine)) {
-			pr_err("kernel context not last on engine %s!\n",
-			       engine->name);
-			return -EINVAL;
+				i915_request_add(rq);
+			}
 		}
-	}
 
-	err = i915_gem_wait_for_idle(i915,
-				     I915_WAIT_LOCKED,
-				     MAX_SCHEDULE_TIMEOUT);
-	if (err)
-		return err;
+		err = i915_gem_switch_to_kernel_context(i915);
+		if (err)
+			return err;
 
-	GEM_BUG_ON(i915->gt.active_requests);
-	for_each_engine_masked(engine, i915, engines, tmp) {
-		if (engine->last_retired_context->gem_context != i915->kernel_context) {
-			pr_err("engine %s not idling in kernel context!\n",
-			       engine->name);
+		if (!from_idle) {
+			err = i915_gem_wait_for_idle(i915,
+						     I915_WAIT_LOCKED,
+						     MAX_SCHEDULE_TIMEOUT);
+			if (err)
+				return err;
+		}
+
+		if (i915->gt.active_requests) {
+			pr_err("%d active requests remain after switching to kernel context, pass %d (%s) on %s engine%s\n",
+			       i915->gt.active_requests,
+			       pass, from_idle ? "idle" : "busy",
+			       __engine_name(i915, engines),
+			       is_power_of_2(engines) ? "" : "s");
 			return -EINVAL;
 		}
-	}
 
-	err = i915_gem_switch_to_kernel_context(i915);
-	if (err)
-		return err;
+		/* XXX Bonus points for proving we are the kernel context! */
 
-	if (i915->gt.active_requests) {
-		pr_err("switch-to-kernel-context emitted %d requests even though it should already be idling in the kernel context\n",
-		       i915->gt.active_requests);
-		return -EINVAL;
+		mutex_unlock(&i915->drm.struct_mutex);
+		drain_delayed_work(&i915->gt.idle_work);
+		mutex_lock(&i915->drm.struct_mutex);
 	}
 
-	for_each_engine_masked(engine, i915, engines, tmp) {
-		if (!intel_engine_has_kernel_context(engine)) {
-			pr_err("kernel context not last on engine %s!\n",
-			       engine->name);
-			return -EINVAL;
-		}
-	}
+	if (igt_flush_test(i915, I915_WAIT_LOCKED))
+		return -EIO;
 
 	return 0;
 }
@@ -1593,8 +1585,6 @@ static int igt_switch_to_kernel_context(void *arg)
 
 out_unlock:
 	GEM_TRACE_DUMP_ON(err);
-	if (igt_flush_test(i915, I915_WAIT_LOCKED))
-		err = -EIO;
 
 	intel_runtime_pm_put(i915, wakeref);
 	mutex_unlock(&i915->drm.struct_mutex);
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH 24/46] drm/i915: Do a synchronous switch-to-kernel-context on idling
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (22 preceding siblings ...)
  2019-02-06 13:03 ` [PATCH 23/46] drm/i915/selftests: Improve switch-to-kernel-context checking Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-21 19:48   ` Daniele Ceraolo Spurio
  2019-02-06 13:03 ` [PATCH 25/46] drm/i915: Store the BIT(engine->id) as the engine's mask Chris Wilson
                   ` (28 subsequent siblings)
  52 siblings, 1 reply; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx

When the system idles, we switch to the kernel context as a defensive
measure (no users are harmed if the kernel context is lost). Currently,
we issue a switch to kernel context and then come back later to see if
the kernel context is still current and the system is idle. However,
if we are no longer privy to the runqueue ordering, then we have to
relax our assumptions about the logical state of the GPU: the only
way to ensure that the kernel context is currently loaded is to issue
a request to run after all others and wait for it to complete, all while
preventing anyone else from issuing their own requests.
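
For reference, the synchronous flow this introduces boils down to the
following (simplified sketch with error handling and wedging elided; the
real code is switch_to_kernel_context_sync() in the diff below):

	i915_gem_switch_to_kernel_context(i915);
	i915_gem_wait_for_idle(i915,
			       I915_WAIT_LOCKED | I915_WAIT_FOR_IDLE_BOOST,
			       HZ / 10);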

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.c           |  14 +--
 drivers/gpu/drm/i915/i915_drv.h           |   2 +-
 drivers/gpu/drm/i915/i915_gem.c           | 143 ++++++++--------------
 drivers/gpu/drm/i915/i915_gem_context.c   |   4 +
 drivers/gpu/drm/i915/selftests/i915_gem.c |   9 +-
 5 files changed, 65 insertions(+), 107 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index f5a3558e00fd..36da8ab1e7ce 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -712,8 +712,7 @@ static int i915_load_modeset_init(struct drm_device *dev)
 	return 0;
 
 cleanup_gem:
-	if (i915_gem_suspend(dev_priv))
-		DRM_ERROR("failed to idle hardware; continuing to unload!\n");
+	i915_gem_suspend(dev_priv);
 	i915_gem_fini(dev_priv);
 cleanup_modeset:
 	intel_modeset_cleanup(dev);
@@ -1784,8 +1783,7 @@ void i915_driver_unload(struct drm_device *dev)
 	/* Flush any external code that still may be under the RCU lock */
 	synchronize_rcu();
 
-	if (i915_gem_suspend(dev_priv))
-		DRM_ERROR("failed to idle hardware; continuing to unload!\n");
+	i915_gem_suspend(dev_priv);
 
 	drm_atomic_helper_shutdown(dev);
 
@@ -1893,7 +1891,6 @@ static bool suspend_to_idle(struct drm_i915_private *dev_priv)
 static int i915_drm_prepare(struct drm_device *dev)
 {
 	struct drm_i915_private *i915 = to_i915(dev);
-	int err;
 
 	/*
 	 * NB intel_display_suspend() may issue new requests after we've
@@ -1901,12 +1898,9 @@ static int i915_drm_prepare(struct drm_device *dev)
 	 * split out that work and pull it forward so that after point,
 	 * the GPU is not woken again.
 	 */
-	err = i915_gem_suspend(i915);
-	if (err)
-		dev_err(&i915->drm.pdev->dev,
-			"GEM idle failed, suspend/resume might fail\n");
+	i915_gem_suspend(i915);
 
-	return err;
+	return 0;
 }
 
 static int i915_drm_suspend(struct drm_device *dev)
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 4d697b1002af..8a72dad9471f 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3031,7 +3031,7 @@ void i915_gem_fini(struct drm_i915_private *dev_priv);
 void i915_gem_cleanup_engines(struct drm_i915_private *dev_priv);
 int i915_gem_wait_for_idle(struct drm_i915_private *dev_priv,
 			   unsigned int flags, long timeout);
-int __must_check i915_gem_suspend(struct drm_i915_private *dev_priv);
+void i915_gem_suspend(struct drm_i915_private *dev_priv);
 void i915_gem_suspend_late(struct drm_i915_private *dev_priv);
 void i915_gem_resume(struct drm_i915_private *dev_priv);
 vm_fault_t i915_gem_fault(struct vm_fault *vmf);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 89b2d3ac26ce..43bc26d5807a 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2930,13 +2930,6 @@ static void __sleep_rcu(struct rcu_head *rcu)
 	}
 }
 
-static inline bool
-new_requests_since_last_retire(const struct drm_i915_private *i915)
-{
-	return (READ_ONCE(i915->gt.active_requests) ||
-		work_pending(&i915->gt.idle_work.work));
-}
-
 static void assert_kernel_context_is_current(struct drm_i915_private *i915)
 {
 	struct intel_engine_cs *engine;
@@ -2945,7 +2938,8 @@ static void assert_kernel_context_is_current(struct drm_i915_private *i915)
 	if (i915_terminally_wedged(&i915->gpu_error))
 		return;
 
-	GEM_BUG_ON(i915->gt.active_requests);
+	i915_retire_requests(i915);
+
 	for_each_engine(engine, i915, id) {
 		GEM_BUG_ON(__i915_active_request_peek(&engine->timeline.last_request));
 		GEM_BUG_ON(engine->last_retired_context !=
@@ -2953,78 +2947,76 @@ static void assert_kernel_context_is_current(struct drm_i915_private *i915)
 	}
 }
 
+static bool switch_to_kernel_context_sync(struct drm_i915_private *i915)
+{
+	if (i915_gem_switch_to_kernel_context(i915))
+		return false;
+
+	if (i915_gem_wait_for_idle(i915,
+				   I915_WAIT_LOCKED |
+				   I915_WAIT_FOR_IDLE_BOOST,
+				   HZ / 10))
+		return false;
+
+	assert_kernel_context_is_current(i915);
+	return true;
+}
+
 static void
 i915_gem_idle_work_handler(struct work_struct *work)
 {
-	struct drm_i915_private *dev_priv =
-		container_of(work, typeof(*dev_priv), gt.idle_work.work);
+	struct drm_i915_private *i915 =
+		container_of(work, typeof(*i915), gt.idle_work.work);
+	typeof(i915->gt) *gt = &i915->gt;
 	unsigned int epoch = I915_EPOCH_INVALID;
 	bool rearm_hangcheck;
 
-	if (!READ_ONCE(dev_priv->gt.awake))
+	if (!READ_ONCE(gt->awake))
 		return;
 
-	if (READ_ONCE(dev_priv->gt.active_requests))
+	if (READ_ONCE(gt->active_requests))
 		return;
 
-	/*
-	 * Flush out the last user context, leaving only the pinned
-	 * kernel context resident. When we are idling on the kernel_context,
-	 * no more new requests (with a context switch) are emitted and we
-	 * can finally rest. A consequence is that the idle work handler is
-	 * always called at least twice before idling (and if the system is
-	 * idle that implies a round trip through the retire worker).
-	 */
-	mutex_lock(&dev_priv->drm.struct_mutex);
-	i915_gem_switch_to_kernel_context(dev_priv);
-	mutex_unlock(&dev_priv->drm.struct_mutex);
-
-	GEM_TRACE("active_requests=%d (after switch-to-kernel-context)\n",
-		  READ_ONCE(dev_priv->gt.active_requests));
-
-	/*
-	 * Wait for last execlists context complete, but bail out in case a
-	 * new request is submitted. As we don't trust the hardware, we
-	 * continue on if the wait times out. This is necessary to allow
-	 * the machine to suspend even if the hardware dies, and we will
-	 * try to recover in resume (after depriving the hardware of power,
-	 * it may be in a better mmod).
-	 */
-	__wait_for(if (new_requests_since_last_retire(dev_priv)) return,
-		   intel_engines_are_idle(dev_priv),
-		   I915_IDLE_ENGINES_TIMEOUT * 1000,
-		   10, 500);
-
 	rearm_hangcheck =
-		cancel_delayed_work_sync(&dev_priv->gpu_error.hangcheck_work);
+		cancel_delayed_work_sync(&i915->gpu_error.hangcheck_work);
 
-	if (!mutex_trylock(&dev_priv->drm.struct_mutex)) {
+	if (!mutex_trylock(&i915->drm.struct_mutex)) {
 		/* Currently busy, come back later */
-		mod_delayed_work(dev_priv->wq,
-				 &dev_priv->gt.idle_work,
+		mod_delayed_work(i915->wq,
+				 &gt->idle_work,
 				 msecs_to_jiffies(50));
 		goto out_rearm;
 	}
 
 	/*
-	 * New request retired after this work handler started, extend active
-	 * period until next instance of the work.
+	 * Flush out the last user context, leaving only the pinned
+	 * kernel context resident. Should anything unfortunate happen
+	 * while we are idle (such as the GPU being power cycled), no users
+	 * will be harmed.
 	 */
-	if (new_requests_since_last_retire(dev_priv))
-		goto out_unlock;
-
-	epoch = __i915_gem_park(dev_priv);
+	if (!gt->active_requests && !work_pending(&gt->idle_work.work)) {
+		++gt->active_requests; /* don't requeue idle */
+
+		if (!switch_to_kernel_context_sync(i915)) {
+			dev_err(i915->drm.dev,
+				"Failed to idle engines, declaring wedged!\n");
+			GEM_TRACE_DUMP();
+			i915_gem_set_wedged(i915);
+		}
+		i915_retire_requests(i915);
 
-	assert_kernel_context_is_current(dev_priv);
+		if (!--gt->active_requests) {
+			epoch = __i915_gem_park(i915);
+			rearm_hangcheck = false;
+		}
+	}
 
-	rearm_hangcheck = false;
-out_unlock:
-	mutex_unlock(&dev_priv->drm.struct_mutex);
+	mutex_unlock(&i915->drm.struct_mutex);
 
 out_rearm:
 	if (rearm_hangcheck) {
-		GEM_BUG_ON(!dev_priv->gt.awake);
-		i915_queue_hangcheck(dev_priv);
+		GEM_BUG_ON(!gt->awake);
+		i915_queue_hangcheck(i915);
 	}
 
 	/*
@@ -3035,11 +3027,11 @@ i915_gem_idle_work_handler(struct work_struct *work)
 	 * period, and then queue a task (that will run last on the wq) to
 	 * shrink and re-optimize the caches.
 	 */
-	if (same_epoch(dev_priv, epoch)) {
+	if (same_epoch(i915, epoch)) {
 		struct sleep_rcu_work *s = kmalloc(sizeof(*s), GFP_KERNEL);
 		if (s) {
 			init_rcu_head(&s->rcu);
-			s->i915 = dev_priv;
+			s->i915 = i915;
 			s->epoch = epoch;
 			call_rcu(&s->rcu, __sleep_rcu);
 		}
@@ -3249,7 +3241,6 @@ int i915_gem_wait_for_idle(struct drm_i915_private *i915,
 			return err;
 
 		i915_retire_requests(i915);
-		GEM_BUG_ON(i915->gt.active_requests);
 	}
 
 	return 0;
@@ -4458,10 +4449,9 @@ void i915_gem_sanitize(struct drm_i915_private *i915)
 	mutex_unlock(&i915->drm.struct_mutex);
 }
 
-int i915_gem_suspend(struct drm_i915_private *i915)
+void i915_gem_suspend(struct drm_i915_private *i915)
 {
 	intel_wakeref_t wakeref;
-	int ret;
 
 	GEM_TRACE("\n");
 
@@ -4481,19 +4471,7 @@ int i915_gem_suspend(struct drm_i915_private *i915)
 	 * state. Fortunately, the kernel_context is disposable and we do
 	 * not rely on its state.
 	 */
-	if (!i915_terminally_wedged(&i915->gpu_error)) {
-		ret = i915_gem_switch_to_kernel_context(i915);
-		if (ret)
-			goto err_unlock;
-
-		ret = i915_gem_wait_for_idle(i915,
-					     I915_WAIT_INTERRUPTIBLE |
-					     I915_WAIT_LOCKED |
-					     I915_WAIT_FOR_IDLE_BOOST,
-					     HZ / 5);
-		if (ret == -EINTR)
-			goto err_unlock;
-
+	if (!switch_to_kernel_context_sync(i915)) {
 		/* Forcibly cancel outstanding work and leave the gpu quiet. */
 		i915_gem_set_wedged(i915);
 	}
@@ -4517,12 +4495,6 @@ int i915_gem_suspend(struct drm_i915_private *i915)
 	GEM_BUG_ON(i915->gt.awake);
 
 	intel_runtime_pm_put(i915, wakeref);
-	return 0;
-
-err_unlock:
-	mutex_unlock(&i915->drm.struct_mutex);
-	intel_runtime_pm_put(i915, wakeref);
-	return ret;
 }
 
 void i915_gem_suspend_late(struct drm_i915_private *i915)
@@ -4788,18 +4760,11 @@ static int __intel_engines_record_defaults(struct drm_i915_private *i915)
 			goto err_active;
 	}
 
-	err = i915_gem_switch_to_kernel_context(i915);
-	if (err)
-		goto err_active;
-
-	if (i915_gem_wait_for_idle(i915, I915_WAIT_LOCKED, HZ / 5)) {
-		i915_gem_set_wedged(i915);
+	if (!switch_to_kernel_context_sync(i915)) {
 		err = -EIO; /* Caller will declare us wedged */
 		goto err_active;
 	}
 
-	assert_kernel_context_is_current(i915);
-
 	/*
 	 * Immediately park the GPU so that we enable powersaving and
 	 * treat it as idle. The next time we issue a request, we will
@@ -5043,7 +5008,7 @@ int i915_gem_init(struct drm_i915_private *dev_priv)
 err_init_hw:
 	mutex_unlock(&dev_priv->drm.struct_mutex);
 
-	WARN_ON(i915_gem_suspend(dev_priv));
+	i915_gem_suspend(dev_priv);
 	i915_gem_suspend_late(dev_priv);
 
 	i915_gem_drain_workqueue(dev_priv);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 280813a4bf82..1bdb067845f2 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -745,6 +745,10 @@ int i915_gem_switch_to_kernel_context(struct drm_i915_private *i915)
 	lockdep_assert_held(&i915->drm.struct_mutex);
 	GEM_BUG_ON(!i915->kernel_context);
 
+	/* Inoperable, so presume the GPU is safely pointing into the void! */
+	if (i915_terminally_wedged(&i915->gpu_error))
+		return 0;
+
 	i915_retire_requests(i915);
 
 	for_each_engine(engine, i915, id) {
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem.c b/drivers/gpu/drm/i915/selftests/i915_gem.c
index e77b7ed449ae..50bb7bbd26d3 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem.c
@@ -84,14 +84,9 @@ static void simulate_hibernate(struct drm_i915_private *i915)
 
 static int pm_prepare(struct drm_i915_private *i915)
 {
-	int err = 0;
-
-	if (i915_gem_suspend(i915)) {
-		pr_err("i915_gem_suspend failed\n");
-		err = -EINVAL;
-	}
+	i915_gem_suspend(i915);
 
-	return err;
+	return 0;
 }
 
 static void pm_suspend(struct drm_i915_private *i915)
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH 25/46] drm/i915: Store the BIT(engine->id) as the engine's mask
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (23 preceding siblings ...)
  2019-02-06 13:03 ` [PATCH 24/46] drm/i915: Do a synchronous switch-to-kernel-context on idling Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-11 18:51   ` Tvrtko Ursulin
  2019-02-06 13:03 ` [PATCH 26/46] drm/i915: Refactor common code to load initial power context Chris Wilson
                   ` (27 subsequent siblings)
  52 siblings, 1 reply; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx

In the next patch, we are introducing a broad virtual engine to encompass
multiple physical engines, losing the 1:1 nature of BIT(engine->id). To
reflect the broader set of engines implied by the virtual instance, let's
store the full bitmask.
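
As an illustration (an assumption about the shape of things to come; the
virtual engine itself arrives in a later patch), a virtual instance that
spans two physical video engines would simply carry both bits, and
existing users of engine->mask keep working unmodified:

	/* hypothetical virtual engine covering both video decode engines */
	ve->base.mask = BIT(VCS) | BIT(VCS2);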

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_reset.c                | 4 ++--
 drivers/gpu/drm/i915/intel_engine_cs.c           | 3 +++
 drivers/gpu/drm/i915/intel_hangcheck.c           | 8 ++++----
 drivers/gpu/drm/i915/intel_ringbuffer.c          | 4 ++--
 drivers/gpu/drm/i915/intel_ringbuffer.h          | 7 +------
 drivers/gpu/drm/i915/selftests/intel_hangcheck.c | 2 +-
 drivers/gpu/drm/i915/selftests/mock_engine.c     | 1 +
 7 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
index 7051c0a43941..78c9689629a0 100644
--- a/drivers/gpu/drm/i915/i915_reset.c
+++ b/drivers/gpu/drm/i915/i915_reset.c
@@ -1053,7 +1053,7 @@ void i915_reset(struct drm_i915_private *i915,
 static inline int intel_gt_reset_engine(struct drm_i915_private *i915,
 					struct intel_engine_cs *engine)
 {
-	return intel_gpu_reset(i915, intel_engine_flag(engine));
+	return intel_gpu_reset(i915, engine->mask);
 }
 
 /**
@@ -1253,7 +1253,7 @@ void i915_handle_error(struct drm_i915_private *i915,
 				continue;
 
 			if (i915_reset_engine(engine, msg) == 0)
-				engine_mask &= ~intel_engine_flag(engine);
+				engine_mask &= ~engine->mask;
 
 			clear_bit(I915_RESET_ENGINE + engine->id,
 				  &error->flags);
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index ce7c19f2ae49..45e38877ab17 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -313,7 +313,10 @@ intel_engine_setup(struct drm_i915_private *dev_priv,
 	if (!engine)
 		return -ENOMEM;
 
+	BUILD_BUG_ON(BITS_PER_TYPE(engine->mask) < I915_NUM_ENGINES);
+
 	engine->id = id;
+	engine->mask = BIT(id);
 	engine->i915 = dev_priv;
 	__sprint_engine_name(engine->name, info);
 	engine->hw_id = engine->guc_id = info->hw_id;
diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
index e04b2560369e..58b6ff8453dc 100644
--- a/drivers/gpu/drm/i915/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/intel_hangcheck.c
@@ -120,7 +120,7 @@ engine_stuck(struct intel_engine_cs *engine, u64 acthd)
 	 */
 	tmp = I915_READ_CTL(engine);
 	if (tmp & RING_WAIT) {
-		i915_handle_error(dev_priv, BIT(engine->id), 0,
+		i915_handle_error(dev_priv, engine->mask, 0,
 				  "stuck wait on %s", engine->name);
 		I915_WRITE_CTL(engine, tmp);
 		return ENGINE_WAIT_KICK;
@@ -282,13 +282,13 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 		hangcheck_store_sample(engine, &hc);
 
 		if (hc.stalled) {
-			hung |= intel_engine_flag(engine);
+			hung |= engine->mask;
 			if (hc.action != ENGINE_DEAD)
-				stuck |= intel_engine_flag(engine);
+				stuck |= engine->mask;
 		}
 
 		if (hc.wedged)
-			wedged |= intel_engine_flag(engine);
+			wedged |= engine->mask;
 	}
 
 	if (GEM_SHOW_DEBUG() && (hung | stuck)) {
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 1b96b0960adc..91c49f644898 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1859,8 +1859,8 @@ static int switch_context(struct i915_request *rq)
 				goto err;
 		} while (--loops);
 
-		if (intel_engine_flag(engine) & ppgtt->pd_dirty_rings) {
-			unwind_mm = intel_engine_flag(engine);
+		if (ppgtt->pd_dirty_rings & engine->mask) {
+			unwind_mm = engine->mask;
 			ppgtt->pd_dirty_rings &= ~unwind_mm;
 			hw_flags = MI_FORCE_RESTORE;
 		}
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 39a9ee7b61e2..7777d46784f9 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -334,6 +334,7 @@ struct intel_engine_cs {
 	enum intel_engine_id id;
 	unsigned int hw_id;
 	unsigned int guc_id;
+	unsigned long mask;
 
 	u8 uabi_id;
 	u8 uabi_class;
@@ -668,12 +669,6 @@ execlists_port_complete(struct intel_engine_execlists * const execlists,
 	return port;
 }
 
-static inline unsigned int
-intel_engine_flag(const struct intel_engine_cs *engine)
-{
-	return BIT(engine->id);
-}
-
 static inline u32
 intel_read_status_page(const struct intel_engine_cs *engine, int reg)
 {
diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
index 4aa57d0d1b92..50a7f57a00a4 100644
--- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
@@ -1142,7 +1142,7 @@ static int __igt_reset_evict_vma(struct drm_i915_private *i915,
 
 out_reset:
 	igt_global_reset_lock(i915);
-	fake_hangcheck(rq->i915, intel_engine_flag(rq->engine));
+	fake_hangcheck(rq->i915, rq->engine->mask);
 	igt_global_reset_unlock(i915);
 
 	if (tsk) {
diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
index ec1ae948954c..c2c954f64226 100644
--- a/drivers/gpu/drm/i915/selftests/mock_engine.c
+++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
@@ -223,6 +223,7 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
 	engine->base.i915 = i915;
 	snprintf(engine->base.name, sizeof(engine->base.name), "%s", name);
 	engine->base.id = id;
+	engine->base.mask = BIT(id);
 	engine->base.status_page.addr = (void *)(engine + 1);
 
 	engine->base.context_pin = mock_context_pin;
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH 26/46] drm/i915: Refactor common code to load initial power context
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (24 preceding siblings ...)
  2019-02-06 13:03 ` [PATCH 25/46] drm/i915: Store the BIT(engine->id) as the engine's mask Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-06 13:03 ` [PATCH 27/46] drm/i915: Reduce presumption of request ordering for barriers Chris Wilson
                   ` (26 subsequent siblings)
  52 siblings, 0 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx

We load a context (the kernel context) on both module load and resume in
order to initialise some logical state onto the GPU. We can use the same
routine for both operations, which will become more useful as we
refactor rc6/rps enabling.
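
Both callers then reduce to the same call (a sketch; the helper is
load_power_context() in the diff below, and failure falls back to the
existing wedge-on-error handling):

	if (!load_power_context(i915))
		goto err_wedged;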

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c | 48 ++++++++++++++++-----------------
 1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 43bc26d5807a..690a111f3c58 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2962,6 +2962,22 @@ static bool switch_to_kernel_context_sync(struct drm_i915_private *i915)
 	return true;
 }
 
+static bool load_power_context(struct drm_i915_private *i915)
+{
+	if (!switch_to_kernel_context_sync(i915))
+		return false;
+
+	/*
+	 * Immediately park the GPU so that we enable powersaving and
+	 * treat it as idle. The next time we issue a request, we will
+	 * unpark and start using the engine->pinned_default_state, otherwise
+	 * it is in limbo and an early reset may fail.
+	 */
+	__i915_gem_park(i915);
+
+	return true;
+}
+
 static void
 i915_gem_idle_work_handler(struct work_struct *work)
 {
@@ -4562,7 +4578,7 @@ void i915_gem_resume(struct drm_i915_private *i915)
 	intel_uc_resume(i915);
 
 	/* Always reload a context for powersaving. */
-	if (i915_gem_switch_to_kernel_context(i915))
+	if (!load_power_context(i915))
 		goto err_wedged;
 
 out_unlock:
@@ -4727,7 +4743,7 @@ static int __intel_engines_record_defaults(struct drm_i915_private *i915)
 	struct i915_gem_context *ctx;
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
-	int err;
+	int err = 0;
 
 	/*
 	 * As we reset the gpu during very early sanitisation, the current
@@ -4760,19 +4776,12 @@ static int __intel_engines_record_defaults(struct drm_i915_private *i915)
 			goto err_active;
 	}
 
-	if (!switch_to_kernel_context_sync(i915)) {
-		err = -EIO; /* Caller will declare us wedged */
+	/* Flush the default context image to memory, and enable powersaving. */
+	if (!load_power_context(i915)) {
+		err = -EIO;
 		goto err_active;
 	}
 
-	/*
-	 * Immediately park the GPU so that we enable powersaving and
-	 * treat it as idle. The next time we issue a request, we will
-	 * unpark and start using the engine->pinned_default_state, otherwise
-	 * it is in limbo and an early reset may fail.
-	 */
-	__i915_gem_park(i915);
-
 	for_each_engine(engine, i915, id) {
 		struct i915_vma *state;
 		void *vaddr;
@@ -4838,19 +4847,10 @@ static int __intel_engines_record_defaults(struct drm_i915_private *i915)
 err_active:
 	/*
 	 * If we have to abandon now, we expect the engines to be idle
-	 * and ready to be torn-down. First try to flush any remaining
-	 * request, ensure we are pointing at the kernel context and
-	 * then remove it.
+	 * and ready to be torn-down. The quickest way we can accomplish
+	 * this is by declaring ourselves wedged.
 	 */
-	if (WARN_ON(i915_gem_switch_to_kernel_context(i915)))
-		goto out_ctx;
-
-	if (WARN_ON(i915_gem_wait_for_idle(i915,
-					   I915_WAIT_LOCKED,
-					   MAX_SCHEDULE_TIMEOUT)))
-		goto out_ctx;
-
-	i915_gem_contexts_lost(i915);
+	i915_gem_set_wedged(i915);
 	goto out_ctx;
 }
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH 27/46] drm/i915: Reduce presumption of request ordering for barriers
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (25 preceding siblings ...)
  2019-02-06 13:03 ` [PATCH 26/46] drm/i915: Refactor common code to load initial power context Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-06 13:03 ` [PATCH 28/46] drm/i915: Remove has-kernel-context Chris Wilson
                   ` (25 subsequent siblings)
  52 siblings, 0 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx

---
 drivers/gpu/drm/i915/i915_drv.h               |  1 +
 drivers/gpu/drm/i915/i915_gem.c               | 13 ++--
 drivers/gpu/drm/i915/i915_gem_context.c       | 66 +------------------
 drivers/gpu/drm/i915/i915_gem_context.h       |  3 +-
 drivers/gpu/drm/i915/i915_gem_evict.c         |  2 +-
 drivers/gpu/drm/i915/i915_request.c           |  1 +
 drivers/gpu/drm/i915/intel_engine_cs.c        |  5 ++
 .../gpu/drm/i915/selftests/i915_gem_context.c |  3 +-
 .../gpu/drm/i915/selftests/igt_flush_test.c   |  2 +-
 .../gpu/drm/i915/selftests/mock_gem_device.c  |  4 ++
 10 files changed, 28 insertions(+), 72 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 8a72dad9471f..e554691304dc 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1980,6 +1980,7 @@ struct drm_i915_private {
 
 		struct list_head active_rings;
 		struct list_head closed_vma;
+		unsigned long active_engines;
 		u32 active_requests;
 
 		/**
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 690a111f3c58..cb3232a16394 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2947,9 +2947,10 @@ static void assert_kernel_context_is_current(struct drm_i915_private *i915)
 	}
 }
 
-static bool switch_to_kernel_context_sync(struct drm_i915_private *i915)
+static bool switch_to_kernel_context_sync(struct drm_i915_private *i915,
+					  unsigned long mask)
 {
-	if (i915_gem_switch_to_kernel_context(i915))
+	if (i915_gem_switch_to_kernel_context(i915, mask))
 		return false;
 
 	if (i915_gem_wait_for_idle(i915,
@@ -2964,7 +2965,8 @@ static bool switch_to_kernel_context_sync(struct drm_i915_private *i915)
 
 static bool load_power_context(struct drm_i915_private *i915)
 {
-	if (!switch_to_kernel_context_sync(i915))
+	/* Force loading the kernel context on all engines */
+	if (!switch_to_kernel_context_sync(i915, -1))
 		return false;
 
 	/*
@@ -3013,7 +3015,8 @@ i915_gem_idle_work_handler(struct work_struct *work)
 	if (!gt->active_requests && !work_pending(&gt->idle_work.work)) {
 		++gt->active_requests; /* don't requeue idle */
 
-		if (!switch_to_kernel_context_sync(i915)) {
+		if (!switch_to_kernel_context_sync(i915,
+						   i915->gt.active_engines)) {
 			dev_err(i915->drm.dev,
 				"Failed to idle engines, declaring wedged!\n");
 			GEM_TRACE_DUMP();
@@ -4487,7 +4490,7 @@ void i915_gem_suspend(struct drm_i915_private *i915)
 	 * state. Fortunately, the kernel_context is disposable and we do
 	 * not rely on its state.
 	 */
-	if (!switch_to_kernel_context_sync(i915)) {
+	if (!switch_to_kernel_context_sync(i915, i915->gt.active_engines)) {
 		/* Forcibly cancel outstanding work and leave the gpu quiet. */
 		i915_gem_set_wedged(i915);
 	}
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 1bdb067845f2..582a7015e6a4 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -682,63 +682,10 @@ last_request_on_engine(struct i915_timeline *timeline,
 	return NULL;
 }
 
-static bool engine_has_kernel_context_barrier(struct intel_engine_cs *engine)
-{
-	struct drm_i915_private *i915 = engine->i915;
-	const struct intel_context * const ce =
-		to_intel_context(i915->kernel_context, engine);
-	struct i915_timeline *barrier = ce->ring->timeline;
-	struct intel_ring *ring;
-	bool any_active = false;
-
-	lockdep_assert_held(&i915->drm.struct_mutex);
-	list_for_each_entry(ring, &i915->gt.active_rings, active_link) {
-		struct i915_request *rq;
-
-		rq = last_request_on_engine(ring->timeline, engine);
-		if (!rq)
-			continue;
-
-		any_active = true;
-
-		if (rq->hw_context == ce)
-			continue;
-
-		/*
-		 * Was this request submitted after the previous
-		 * switch-to-kernel-context?
-		 */
-		if (!i915_timeline_sync_is_later(barrier, &rq->fence)) {
-			GEM_TRACE("%s needs barrier for %llx:%lld\n",
-				  ring->timeline->name,
-				  rq->fence.context,
-				  rq->fence.seqno);
-			return false;
-		}
-
-		GEM_TRACE("%s has barrier after %llx:%lld\n",
-			  ring->timeline->name,
-			  rq->fence.context,
-			  rq->fence.seqno);
-	}
-
-	/*
-	 * If any other timeline was still active and behind the last barrier,
-	 * then our last switch-to-kernel-context must still be queued and
-	 * will run last (leaving the engine in the kernel context when it
-	 * eventually idles).
-	 */
-	if (any_active)
-		return true;
-
-	/* The engine is idle; check that it is idling in the kernel context. */
-	return engine->last_retired_context == ce;
-}
-
-int i915_gem_switch_to_kernel_context(struct drm_i915_private *i915)
+int i915_gem_switch_to_kernel_context(struct drm_i915_private *i915,
+				      unsigned long mask)
 {
 	struct intel_engine_cs *engine;
-	enum intel_engine_id id;
 
 	GEM_TRACE("awake?=%s\n", yesno(i915->gt.awake));
 
@@ -749,17 +696,11 @@ int i915_gem_switch_to_kernel_context(struct drm_i915_private *i915)
 	if (i915_terminally_wedged(&i915->gpu_error))
 		return 0;
 
-	i915_retire_requests(i915);
-
-	for_each_engine(engine, i915, id) {
+	for_each_engine_masked(engine, i915, mask, mask) {
 		struct intel_ring *ring;
 		struct i915_request *rq;
 
 		GEM_BUG_ON(!to_intel_context(i915->kernel_context, engine));
-		if (engine_has_kernel_context_barrier(engine))
-			continue;
-
-		GEM_TRACE("emit barrier on %s\n", engine->name);
 
 		rq = i915_request_alloc(engine, i915->kernel_context);
 		if (IS_ERR(rq))
@@ -783,7 +724,6 @@ int i915_gem_switch_to_kernel_context(struct drm_i915_private *i915)
 			i915_sw_fence_await_sw_fence_gfp(&rq->submit,
 							 &prev->submit,
 							 I915_FENCE_GFP);
-			i915_timeline_sync_set(rq->timeline, &prev->fence);
 		}
 
 		i915_request_add(rq);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index ca150a764c24..651f2e4badb6 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -354,7 +354,8 @@ int i915_gem_context_open(struct drm_i915_private *i915,
 void i915_gem_context_close(struct drm_file *file);
 
 int i915_switch_context(struct i915_request *rq);
-int i915_gem_switch_to_kernel_context(struct drm_i915_private *dev_priv);
+int i915_gem_switch_to_kernel_context(struct drm_i915_private *i915,
+				      unsigned long engine_mask);
 
 void i915_gem_context_release(struct kref *ctx_ref);
 struct i915_gem_context *
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index 68d74c50ac39..7d8e90dfca84 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -62,7 +62,7 @@ static int ggtt_flush(struct drm_i915_private *i915)
 	 * the hopes that we can then remove contexts and the like only
 	 * bound by their active reference.
 	 */
-	err = i915_gem_switch_to_kernel_context(i915);
+	err = i915_gem_switch_to_kernel_context(i915, i915->gt.active_engines);
 	if (err)
 		return err;
 
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 8321f2d8a301..3aaf9d32768d 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1057,6 +1057,7 @@ void i915_request_add(struct i915_request *request)
 		GEM_TRACE("marking %s as active\n", ring->timeline->name);
 		list_add(&ring->active_link, &request->i915->gt.active_rings);
 	}
+	request->i915->gt.active_engines |= request->engine->mask;
 	request->emitted_jiffies = jiffies;
 
 	/*
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 45e38877ab17..0f2f8a10149d 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1114,6 +1114,9 @@ bool intel_engine_has_kernel_context(const struct intel_engine_cs *engine)
 
 	lockdep_assert_held(&engine->i915->drm.struct_mutex);
 
+	if (!engine->context_size)
+		return true;
+
 	/*
 	 * Check the last context seen by the engine. If active, it will be
 	 * the last request that remains in the timeline. When idle, it is
@@ -1213,6 +1216,8 @@ void intel_engines_park(struct drm_i915_private *i915)
 		i915_gem_batch_pool_fini(&engine->batch_pool);
 		engine->execlists.no_priolist = false;
 	}
+
+	i915->gt.active_engines = 0;
 }
 
 /**
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
index f8931086eb70..08a4dd697e25 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
@@ -1512,7 +1512,8 @@ static int __igt_switch_to_kernel_context(struct drm_i915_private *i915,
 			}
 		}
 
-		err = i915_gem_switch_to_kernel_context(i915);
+		err = i915_gem_switch_to_kernel_context(i915,
+							i915->gt.active_engines);
 		if (err)
 			return err;
 
diff --git a/drivers/gpu/drm/i915/selftests/igt_flush_test.c b/drivers/gpu/drm/i915/selftests/igt_flush_test.c
index af66e3d4e23a..e6a3395459f1 100644
--- a/drivers/gpu/drm/i915/selftests/igt_flush_test.c
+++ b/drivers/gpu/drm/i915/selftests/igt_flush_test.c
@@ -14,7 +14,7 @@ int igt_flush_test(struct drm_i915_private *i915, unsigned int flags)
 	cond_resched();
 
 	if (flags & I915_WAIT_LOCKED &&
-	    i915_gem_switch_to_kernel_context(i915)) {
+	    i915_gem_switch_to_kernel_context(i915, i915->gt.active_engines)) {
 		pr_err("Failed to switch back to kernel context; declaring wedged\n");
 		i915_gem_set_wedged(i915);
 	}
diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
index 5a98caba6d69..d00679e21415 100644
--- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
+++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
@@ -112,6 +112,10 @@ static void mock_retire_work_handler(struct work_struct *work)
 
 static void mock_idle_work_handler(struct work_struct *work)
 {
+	struct drm_i915_private *i915 =
+		container_of(work, typeof(*i915), gt.idle_work.work);
+
+	i915->gt.active_engines = 0;
 }
 
 static int pm_domain_resume(struct device *dev)
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH 28/46] drm/i915: Remove has-kernel-context
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (26 preceding siblings ...)
  2019-02-06 13:03 ` [PATCH 27/46] drm/i915: Reduce presumption of request ordering for barriers Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-06 13:03 ` [PATCH 29/46] drm/i915: Introduce the i915_user_extension_method Chris Wilson
                   ` (24 subsequent siblings)
  52 siblings, 0 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx

We can no longer assume execution ordering, and in particular we cannot
assume which context will execute last. One side-effect of this is that
we cannot determine if the kernel-context is resident on the GPU, so
remove the routines that claimed to do so.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_active.h      | 13 -----------
 drivers/gpu/drm/i915/i915_gem.c         | 18 --------------
 drivers/gpu/drm/i915/i915_gem_evict.c   | 16 +++----------
 drivers/gpu/drm/i915/intel_engine_cs.c  | 31 -------------------------
 drivers/gpu/drm/i915/intel_ringbuffer.h |  1 -
 5 files changed, 3 insertions(+), 76 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h
index 5fbd9102384b..a049ccd478c6 100644
--- a/drivers/gpu/drm/i915/i915_active.h
+++ b/drivers/gpu/drm/i915/i915_active.h
@@ -108,19 +108,6 @@ i915_active_request_set_retire_fn(struct i915_active_request *active,
 	active->retire = fn ?: i915_active_retire_noop;
 }
 
-static inline struct i915_request *
-__i915_active_request_peek(const struct i915_active_request *active)
-{
-	/*
-	 * Inside the error capture (running with the driver in an unknown
-	 * state), we want to bend the rules slightly (a lot).
-	 *
-	 * Work is in progress to make it safer, in the meantime this keeps
-	 * the known issue from spamming the logs.
-	 */
-	return rcu_dereference_protected(active->request, 1);
-}
-
 /**
  * i915_active_request_raw - return the active request
  * @active - the active tracker
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index cb3232a16394..81e950bac246 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2930,23 +2930,6 @@ static void __sleep_rcu(struct rcu_head *rcu)
 	}
 }
 
-static void assert_kernel_context_is_current(struct drm_i915_private *i915)
-{
-	struct intel_engine_cs *engine;
-	enum intel_engine_id id;
-
-	if (i915_terminally_wedged(&i915->gpu_error))
-		return;
-
-	i915_retire_requests(i915);
-
-	for_each_engine(engine, i915, id) {
-		GEM_BUG_ON(__i915_active_request_peek(&engine->timeline.last_request));
-		GEM_BUG_ON(engine->last_retired_context !=
-			   to_intel_context(i915->kernel_context, engine));
-	}
-}
-
 static bool switch_to_kernel_context_sync(struct drm_i915_private *i915,
 					  unsigned long mask)
 {
@@ -2959,7 +2942,6 @@ static bool switch_to_kernel_context_sync(struct drm_i915_private *i915,
 				   HZ / 10))
 		return false;
 
-	assert_kernel_context_is_current(i915);
 	return true;
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gem_evict.c b/drivers/gpu/drm/i915/i915_gem_evict.c
index 7d8e90dfca84..060f5903544a 100644
--- a/drivers/gpu/drm/i915/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/i915_gem_evict.c
@@ -38,25 +38,15 @@ I915_SELFTEST_DECLARE(static struct igt_evict_ctl {
 
 static bool ggtt_is_idle(struct drm_i915_private *i915)
 {
-       struct intel_engine_cs *engine;
-       enum intel_engine_id id;
-
-       if (i915->gt.active_requests)
-	       return false;
-
-       for_each_engine(engine, i915, id) {
-	       if (!intel_engine_has_kernel_context(engine))
-		       return false;
-       }
-
-       return true;
+	return !i915->gt.active_requests;
 }
 
 static int ggtt_flush(struct drm_i915_private *i915)
 {
 	int err;
 
-	/* Not everything in the GGTT is tracked via vma (otherwise we
+	/*
+	 * Not everything in the GGTT is tracked via vma (otherwise we
 	 * could evict as required with minimal stalling) so we are forced
 	 * to idle the GPU and explicitly retire outstanding requests in
 	 * the hopes that we can then remove contexts and the like only
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 0f2f8a10149d..94de0ba1d92b 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -1098,37 +1098,6 @@ bool intel_engines_are_idle(struct drm_i915_private *dev_priv)
 	return true;
 }
 
-/**
- * intel_engine_has_kernel_context:
- * @engine: the engine
- *
- * Returns true if the last context to be executed on this engine, or has been
- * executed if the engine is already idle, is the kernel context
- * (#i915.kernel_context).
- */
-bool intel_engine_has_kernel_context(const struct intel_engine_cs *engine)
-{
-	const struct intel_context *kernel_context =
-		to_intel_context(engine->i915->kernel_context, engine);
-	struct i915_request *rq;
-
-	lockdep_assert_held(&engine->i915->drm.struct_mutex);
-
-	if (!engine->context_size)
-		return true;
-
-	/*
-	 * Check the last context seen by the engine. If active, it will be
-	 * the last request that remains in the timeline. When idle, it is
-	 * the last executed context as tracked by retirement.
-	 */
-	rq = __i915_active_request_peek(&engine->timeline.last_request);
-	if (rq)
-		return rq->hw_context == kernel_context;
-	else
-		return engine->last_retired_context == kernel_context;
-}
-
 void intel_engines_reset_default_submission(struct drm_i915_private *i915)
 {
 	struct intel_engine_cs *engine;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 7777d46784f9..c0027e058b1f 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -936,7 +936,6 @@ void intel_engines_sanitize(struct drm_i915_private *i915, bool force);
 bool intel_engine_is_idle(struct intel_engine_cs *engine);
 bool intel_engines_are_idle(struct drm_i915_private *dev_priv);
 
-bool intel_engine_has_kernel_context(const struct intel_engine_cs *engine);
 void intel_engine_lost_context(struct intel_engine_cs *engine);
 
 void intel_engines_park(struct drm_i915_private *i915);
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH 29/46] drm/i915: Introduce the i915_user_extension_method
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (27 preceding siblings ...)
  2019-02-06 13:03 ` [PATCH 28/46] drm/i915: Remove has-kernel-context Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-11 19:00   ` Tvrtko Ursulin
  2019-02-06 13:03 ` [PATCH 30/46] drm/i915: Track active engines within a context Chris Wilson
                   ` (23 subsequent siblings)
  52 siblings, 1 reply; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx

An idea for extending uABI inspired by Vulkan's extension chains.
Instead of expanding the data struct for each ioctl every time we need
to add a new feature, define an extension chain instead. As we add
optional interfaces to control the ioctl, we define a new extension
struct that can be linked into the ioctl data only when required by the
user. The key advantage being able to ignore large control structs for
optional interfaces/extensions, while being able to process them in a
consistent manner.

In comparison to other extensible ioctls, the key difference is the
use of a linked chain of extension structs vs an array of tagged
pointers. For example,

struct drm_amdgpu_cs_chunk {
        __u32           chunk_id;
        __u32           length_dw;
        __u64           chunk_data;
};

struct drm_amdgpu_cs_in {
        __u32           ctx_id;
        __u32           bo_list_handle;
        __u32           num_chunks;
        __u32           _pad;
        __u64           chunks;
};

allows userspace to pass in an array of pointers to extension structs, but
must therefore keep constructing that array alongside the command stream.
In dynamic situations like that, a linked list is preferred and does not
suffer from extra cache line misses, as the extension structs themselves
must still be loaded separately from the chunks array.
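
By contrast, the chained layout lets each extension embed the base struct
and point at the next, so userspace never needs to assemble a side array
(illustrative sketch only; the extension names and ioctl parameter below
are made up):

struct hypothetical_ext_a {
        struct i915_user_extension base; /* name + next_extension */
        __u32 value;
        __u32 pad;
};

struct hypothetical_ext_b {
        struct i915_user_extension base;
        __u64 flags;
};

ext_a.base.name = HYPOTHETICAL_EXT_A;
ext_a.base.next_extension = (uintptr_t)&ext_b;
ext_b.base.name = HYPOTHETICAL_EXT_B;
ext_b.base.next_extension = 0; /* end of chain */
params.extensions = (uintptr_t)&ext_a;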

v2: Apply the tail call optimisation directly to nip the worry of stack
overflow in the bud.
v3: Defend against recursion.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/Makefile               |  1 +
 drivers/gpu/drm/i915/i915_user_extensions.c | 43 +++++++++++++++++++++
 drivers/gpu/drm/i915/i915_user_extensions.h | 20 ++++++++++
 drivers/gpu/drm/i915/i915_utils.h           |  7 ++++
 include/uapi/drm/i915_drm.h                 | 20 ++++++++++
 5 files changed, 91 insertions(+)
 create mode 100644 drivers/gpu/drm/i915/i915_user_extensions.c
 create mode 100644 drivers/gpu/drm/i915/i915_user_extensions.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index a1d834068765..89105b1aaf12 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -46,6 +46,7 @@ i915-y := i915_drv.o \
 	  i915_sw_fence.o \
 	  i915_syncmap.o \
 	  i915_sysfs.o \
+	  i915_user_extensions.o \
 	  intel_csr.o \
 	  intel_device_info.o \
 	  intel_pm.o \
diff --git a/drivers/gpu/drm/i915/i915_user_extensions.c b/drivers/gpu/drm/i915/i915_user_extensions.c
new file mode 100644
index 000000000000..879b4094b2d7
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_user_extensions.c
@@ -0,0 +1,43 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2018 Intel Corporation
+ */
+
+#include <linux/sched/signal.h>
+#include <linux/uaccess.h>
+#include <uapi/drm/i915_drm.h>
+
+#include "i915_user_extensions.h"
+
+int i915_user_extensions(struct i915_user_extension __user *ext,
+			 const i915_user_extension_fn *tbl,
+			 unsigned long count,
+			 void *data)
+{
+	unsigned int stackdepth = 512;
+
+	while (ext) {
+		int err;
+		u64 x;
+
+		if (!stackdepth--) /* recursion vs useful flexibility */
+			return -EINVAL;
+
+		if (get_user(x, &ext->name))
+			return -EFAULT;
+
+		err = -EINVAL;
+		if (x < count && tbl[x])
+			err = tbl[x](ext, data);
+		if (err)
+			return err;
+
+		if (get_user(x, &ext->next_extension))
+			return -EFAULT;
+
+		ext = u64_to_user_ptr(x);
+	}
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/i915/i915_user_extensions.h b/drivers/gpu/drm/i915/i915_user_extensions.h
new file mode 100644
index 000000000000..313a510b068a
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_user_extensions.h
@@ -0,0 +1,20 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2018 Intel Corporation
+ */
+
+#ifndef I915_USER_EXTENSIONS_H
+#define I915_USER_EXTENSIONS_H
+
+struct i915_user_extension;
+
+typedef int (*i915_user_extension_fn)(struct i915_user_extension __user *ext,
+				      void *data);
+
+int i915_user_extensions(struct i915_user_extension __user *ext,
+			 const i915_user_extension_fn *tbl,
+			 unsigned long count,
+			 void *data);
+
+#endif /* I915_USER_EXTENSIONS_H */
diff --git a/drivers/gpu/drm/i915/i915_utils.h b/drivers/gpu/drm/i915/i915_utils.h
index 9726df37c4c4..fcc751aa1ea8 100644
--- a/drivers/gpu/drm/i915/i915_utils.h
+++ b/drivers/gpu/drm/i915/i915_utils.h
@@ -105,6 +105,13 @@
 	__T;								\
 })
 
+#define container_of_user(ptr, type, member) ({				\
+	void __user *__mptr = (void __user *)(ptr);			\
+	BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) &&	\
+			 !__same_type(*(ptr), void),			\
+			 "pointer type mismatch in container_of()");	\
+	((type __user *)(__mptr - offsetof(type, member))); })
+
 static inline u64 ptr_to_u64(const void *ptr)
 {
 	return (uintptr_t)ptr;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index b641c55420b6..be2fcdf3ba90 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -62,6 +62,26 @@ extern "C" {
 #define I915_ERROR_UEVENT		"ERROR"
 #define I915_RESET_UEVENT		"RESET"
 
+/*
+ * i915_user_extension: Base class for defining a chain of extensions
+ *
+ * Many interfaces need to grow over time. In most cases we can simply
+ * extend the struct and have userspace pass in more data. Another option,
+ * as demonstrated by Vulkan's approach to providing extensions for forward
+ * and backward compatibility, is to use a list of optional structs to
+ * provide those extra details.
+ *
+ * The key advantage to using an extension chain is that it allows us to
+ * redefine the interface more easily than an ever growing struct of
+ * increasing complexity, and for large parts of that interface to be
+ * entirely optional. The downside is more pointer chasing; chasing across
+ * the __user boundary with pointers encapsulated inside u64.
+ */
+struct i915_user_extension {
+	__u64 next_extension;
+	__u64 name;
+};
+
 /*
  * MOCS indexes used for GPU surfaces, defining the cacheability of the
  * surface data and the coherency for this data wrt. CPU vs. GPU accesses.
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH 30/46] drm/i915: Track active engines within a context
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (28 preceding siblings ...)
  2019-02-06 13:03 ` [PATCH 29/46] drm/i915: Introduce the i915_user_extension_method Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-11 19:11   ` Tvrtko Ursulin
  2019-02-06 13:03 ` [PATCH 31/46] drm/i915: Introduce a context barrier callback Chris Wilson
                   ` (22 subsequent siblings)
  52 siblings, 1 reply; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx

For use in the next patch: if we track which engines a context has been
used on, we can reduce the work required to flush that context's state
off the HW to just those engines.
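
The intended use is roughly the following (a sketch with a hypothetical
emit_barrier() helper; the real consumer is context_barrier_task() in the
next patch):

	list_for_each_entry(ce, &ctx->active_engines, active_link)
		/* only the engines this context has actually run on */
		emit_barrier(ce->engine, ctx);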

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_context.c       | 3 +++
 drivers/gpu/drm/i915/i915_gem_context.h       | 4 ++++
 drivers/gpu/drm/i915/intel_lrc.c              | 2 ++
 drivers/gpu/drm/i915/intel_ringbuffer.c       | 2 ++
 drivers/gpu/drm/i915/selftests/mock_context.c | 1 +
 drivers/gpu/drm/i915/selftests/mock_engine.c  | 2 ++
 6 files changed, 14 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 582a7015e6a4..91037ca96be1 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -210,6 +210,7 @@ static void i915_gem_context_free(struct i915_gem_context *ctx)
 
 	lockdep_assert_held(&ctx->i915->drm.struct_mutex);
 	GEM_BUG_ON(!i915_gem_context_is_closed(ctx));
+	GEM_BUG_ON(!list_empty(&ctx->active_engines));
 
 	release_hw_id(ctx);
 	i915_ppgtt_put(ctx->ppgtt);
@@ -337,6 +338,7 @@ intel_context_init(struct intel_context *ce,
 		   struct intel_engine_cs *engine)
 {
 	ce->gem_context = ctx;
+	ce->engine = engine;
 
 	INIT_LIST_HEAD(&ce->signal_link);
 	INIT_LIST_HEAD(&ce->signals);
@@ -364,6 +366,7 @@ __create_hw_context(struct drm_i915_private *dev_priv,
 	list_add_tail(&ctx->link, &dev_priv->contexts.list);
 	ctx->i915 = dev_priv;
 	ctx->sched.priority = I915_USER_PRIORITY(I915_PRIORITY_NORMAL);
+	INIT_LIST_HEAD(&ctx->active_engines);
 
 	for (n = 0; n < ARRAY_SIZE(ctx->__engine); n++)
 		intel_context_init(&ctx->__engine[n], ctx, dev_priv->engine[n]);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index 651f2e4badb6..ab89c7501408 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -161,6 +161,8 @@ struct i915_gem_context {
 	atomic_t hw_id_pin_count;
 	struct list_head hw_id_link;
 
+	struct list_head active_engines;
+
 	/**
 	 * @user_handle: userspace identifier
 	 *
@@ -174,7 +176,9 @@ struct i915_gem_context {
 	/** engine: per-engine logical HW state */
 	struct intel_context {
 		struct i915_gem_context *gem_context;
+		struct intel_engine_cs *engine;
 		struct intel_engine_cs *active;
+		struct list_head active_link;
 		struct list_head signal_link;
 		struct list_head signals;
 		struct i915_vma *state;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 2424eb2b1fc6..b3555b1b0e07 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1276,6 +1276,7 @@ static void execlists_context_unpin(struct intel_context *ce)
 	i915_gem_object_unpin_map(ce->state->obj);
 	i915_vma_unpin(ce->state);
 
+	list_del(&ce->active_link);
 	i915_gem_context_put(ce->gem_context);
 }
 
@@ -1361,6 +1362,7 @@ __execlists_context_pin(struct intel_engine_cs *engine,
 
 	ce->state->obj->pin_global++;
 	i915_gem_context_get(ctx);
+	list_add(&ce->active_link, &ctx->active_engines);
 	return ce;
 
 unpin_ring:
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 91c49f644898..4557f715663d 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1430,6 +1430,7 @@ static void intel_ring_context_unpin(struct intel_context *ce)
 	__context_unpin_ppgtt(ce->gem_context);
 	__context_unpin(ce);
 
+	list_del(&ce->active_link);
 	i915_gem_context_put(ce->gem_context);
 }
 
@@ -1530,6 +1531,7 @@ __ring_context_pin(struct intel_engine_cs *engine,
 		goto err_unpin;
 
 	i915_gem_context_get(ctx);
+	list_add(&ce->active_link, &ctx->active_engines);
 
 	/* One ringbuffer to rule them all */
 	GEM_BUG_ON(!engine->buffer);
diff --git a/drivers/gpu/drm/i915/selftests/mock_context.c b/drivers/gpu/drm/i915/selftests/mock_context.c
index b646cdcdd602..1b5073a362eb 100644
--- a/drivers/gpu/drm/i915/selftests/mock_context.c
+++ b/drivers/gpu/drm/i915/selftests/mock_context.c
@@ -44,6 +44,7 @@ mock_context(struct drm_i915_private *i915,
 	INIT_RADIX_TREE(&ctx->handles_vma, GFP_KERNEL);
 	INIT_LIST_HEAD(&ctx->handles_list);
 	INIT_LIST_HEAD(&ctx->hw_id_link);
+	INIT_LIST_HEAD(&ctx->active_engines);
 
 	for (n = 0; n < ARRAY_SIZE(ctx->__engine); n++)
 		intel_context_init(&ctx->__engine[n], ctx, i915->engine[n]);
diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
index c2c954f64226..b8c6769571c4 100644
--- a/drivers/gpu/drm/i915/selftests/mock_engine.c
+++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
@@ -125,6 +125,7 @@ static void hw_delay_complete(struct timer_list *t)
 static void mock_context_unpin(struct intel_context *ce)
 {
 	mock_timeline_unpin(ce->ring->timeline);
+	list_del(&ce->active_link);
 	i915_gem_context_put(ce->gem_context);
 }
 
@@ -161,6 +162,7 @@ mock_context_pin(struct intel_engine_cs *engine,
 
 	ce->ops = &mock_context_ops;
 	i915_gem_context_get(ctx);
+	list_add(&ce->active_link, &ctx->active_engines);
 	return ce;
 
 err:
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH 31/46] drm/i915: Introduce a context barrier callback
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (29 preceding siblings ...)
  2019-02-06 13:03 ` [PATCH 30/46] drm/i915: Track active engines within a context Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-06 13:03 ` [PATCH 32/46] drm/i915: Create/destroy VM (ppGTT) for use with contexts Chris Wilson
                   ` (21 subsequent siblings)
  52 siblings, 0 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx

In the next patch, we will want to update live state within a context.
As this state may be in use by the GPU and we haven't been explicitly
tracking its activity, we instead attach the cleanup to a request sent
down the context carrying its new state; on retiring that request we
clean up the old state, as we then know it is no longer live.
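
In rough outline (hypothetical caller with made-up swap_state() and
free_old_state() helpers; the mechanism itself is context_barrier_task()
below):

	/* install new_state, then free the old state only after every
	 * engine that may still be using it has retired the barrier */
	old = swap_state(ctx, new_state);
	err = context_barrier_task(ctx, -1, free_old_state, old);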

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_context.c       |  74 +++++++++++++
 .../gpu/drm/i915/selftests/i915_gem_context.c | 103 ++++++++++++++++++
 2 files changed, 177 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 91037ca96be1..c3f41f501276 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -685,6 +685,80 @@ last_request_on_engine(struct i915_timeline *timeline,
 	return NULL;
 }
 
+struct context_barrier_task {
+	struct i915_active base;
+	void (*task)(void *data);
+	void *data;
+};
+
+static void cb_retire(struct i915_active *base)
+{
+	struct context_barrier_task *cb = container_of(base, typeof(*cb), base);
+
+	if (cb->task)
+		cb->task(cb->data);
+
+	i915_active_fini(&cb->base);
+	kfree(cb);
+}
+
+I915_SELFTEST_DECLARE(static unsigned long context_barrier_inject_fault);
+static int context_barrier_task(struct i915_gem_context *ctx,
+				unsigned long engines,
+				void (*task)(void *data),
+				void *data)
+{
+	struct drm_i915_private *i915 = ctx->i915;
+	struct context_barrier_task *cb;
+	struct intel_context *ce;
+	intel_wakeref_t wakeref;
+	int err = 0;
+
+	lockdep_assert_held(&i915->drm.struct_mutex);
+	GEM_BUG_ON(!task);
+
+	cb = kmalloc(sizeof(*cb), GFP_KERNEL);
+	if (!cb)
+		return -ENOMEM;
+
+	i915_active_init(i915, &cb->base, cb_retire);
+	i915_active_acquire(&cb->base);
+
+	wakeref = intel_runtime_pm_get(i915);
+	list_for_each_entry(ce, &ctx->active_engines, active_link) {
+		struct intel_engine_cs *engine = ce->engine;
+		struct i915_request *rq;
+
+		if (!(ce->engine->mask & engines))
+			continue;
+
+		if (I915_SELFTEST_ONLY(context_barrier_inject_fault &
+				       engine->mask)) {
+			err = -ENXIO;
+			break;
+		}
+
+		rq = i915_request_alloc(engine, ctx);
+		if (IS_ERR(rq)) {
+			err = PTR_ERR(rq);
+			break;
+		}
+
+		err = i915_active_ref(&cb->base, rq->fence.context, rq);
+		i915_request_add(rq);
+		if (err)
+			break;
+	}
+	intel_runtime_pm_put(i915, wakeref);
+
+	cb->task = err ? NULL : task; /* caller needs to unwind instead */
+	cb->data = data;
+
+	i915_active_release(&cb->base);
+
+	return err;
+}
+
 int i915_gem_switch_to_kernel_context(struct drm_i915_private *i915,
 				      unsigned long mask)
 {
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
index 08a4dd697e25..4b6df1c55345 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
@@ -1594,10 +1594,113 @@ static int igt_switch_to_kernel_context(void *arg)
 	return err;
 }
 
+static void mock_barrier_task(void *data)
+{
+	unsigned int *counter = data;
+
+	++*counter;
+}
+
+static int mock_context_barrier(void *arg)
+{
+#undef pr_fmt
+#define pr_fmt(x) "context_barrier_task():" # x
+	struct drm_i915_private *i915 = arg;
+	struct i915_gem_context *ctx;
+	struct i915_request *rq;
+	intel_wakeref_t wakeref;
+	unsigned int counter;
+	int err;
+
+	/*
+	 * The context barrier provides us with a callback after it emits
+	 * a request; useful for retiring old state after loading new.
+	 */
+
+	mutex_lock(&i915->drm.struct_mutex);
+
+	ctx = mock_context(i915, "mock");
+	if (IS_ERR(ctx)) {
+		err = PTR_ERR(ctx);
+		goto unlock;
+	}
+
+	counter = 0;
+	err = context_barrier_task(ctx, 0, mock_barrier_task, &counter);
+	if (err) {
+		pr_err("Failed at line %d, err=%d\n", __LINE__, err);
+		goto out;
+	}
+	if (counter == 0) {
+		pr_err("Did not retire immediately with 0 engines\n");
+		err = -EINVAL;
+		goto out;
+	}
+
+	counter = 0;
+	err = context_barrier_task(ctx, -1, mock_barrier_task, &counter);
+	if (err) {
+		pr_err("Failed at line %d, err=%d\n", __LINE__, err);
+		goto out;
+	}
+	if (counter == 0) {
+		pr_err("Did not retire immediately for all inactive engines\n");
+		err = -EINVAL;
+		goto out;
+	}
+
+	rq = ERR_PTR(-ENODEV);
+	with_intel_runtime_pm(i915, wakeref)
+		rq = i915_request_alloc(i915->engine[RCS], ctx);
+	if (IS_ERR(rq)) {
+		pr_err("Request allocation failed!\n");
+		goto out;
+	}
+	i915_request_add(rq);
+	GEM_BUG_ON(list_empty(&ctx->active_engines));
+
+	counter = 0;
+	context_barrier_inject_fault = BIT(RCS);
+	err = context_barrier_task(ctx, -1, mock_barrier_task, &counter);
+	context_barrier_inject_fault = 0;
+	if (err == -ENXIO)
+		err = 0;
+	else
+		pr_err("Did not hit fault injection!\n");
+	if (counter != 0) {
+		pr_err("Invoked callback on error!\n");
+		err = -EIO;
+	}
+	if (err)
+		goto out;
+
+	counter = 0;
+	err = context_barrier_task(ctx, -1, mock_barrier_task, &counter);
+	if (err) {
+		pr_err("Failed at line %d, err=%d\n", __LINE__, err);
+		goto out;
+	}
+	mock_device_flush(i915);
+	if (counter == 0) {
+		pr_err("Did not retire on each active engine\n");
+		err = -EINVAL;
+		goto out;
+	}
+
+out:
+	mock_context_close(ctx);
+unlock:
+	mutex_unlock(&i915->drm.struct_mutex);
+	return err;
+#undef pr_fmt
+#define pr_fmt(x) x
+}
+
 int i915_gem_context_mock_selftests(void)
 {
 	static const struct i915_subtest tests[] = {
 		SUBTEST(igt_switch_to_kernel_context),
+		SUBTEST(mock_context_barrier),
 	};
 	struct drm_i915_private *i915;
 	int err;
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH 32/46] drm/i915: Create/destroy VM (ppGTT) for use with contexts
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (30 preceding siblings ...)
  2019-02-06 13:03 ` [PATCH 31/46] drm/i915: Introduce a context barrier callback Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-12 11:18   ` Tvrtko Ursulin
  2019-02-06 13:03 ` [PATCH 33/46] drm/i915: Extend CONTEXT_CREATE to set parameters upon construction Chris Wilson
                   ` (20 subsequent siblings)
  52 siblings, 1 reply; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx

In preparation for making the ppGTT binding for a context explicit (to
facilitate reusing the same ppGTT between different contexts), allow
the user to create and destroy named ppGTTs.
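
As a rough illustration (not part of the patch), userspace could use
the new interface along these lines, assuming libdrm's drmIoctl()
wrapper and the uapi additions from this series:

#include <xf86drm.h>
#include <drm/i915_drm.h>

/* Share one ppGTT between two already-created contexts. */
static int share_vm(int fd, __u32 ctx_a, __u32 ctx_b)
{
	struct drm_i915_gem_vm_control vm = {};
	struct drm_i915_gem_context_param arg = {};
	int err;

	/* Create a new ppGTT; the kernel returns its id in vm.id. */
	err = drmIoctl(fd, DRM_IOCTL_I915_GEM_VM_CREATE, &vm);
	if (err)
		return err;

	/* Point both contexts at the same address space. */
	arg.param = I915_CONTEXT_PARAM_VM;
	arg.value = vm.id;

	arg.ctx_id = ctx_a;
	err = drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
	if (err)
		return err;

	arg.ctx_id = ctx_b;
	return drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
}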

v2: Replace global barrier for swapping over the ppgtt and tlbs with a
local context barrier (Tvrtko)
v3: serialise with struct_mutex; it's lazy but required dammit

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.c               |   2 +
 drivers/gpu/drm/i915/i915_drv.h               |   3 +
 drivers/gpu/drm/i915/i915_gem_context.c       | 254 +++++++++++++++++-
 drivers/gpu/drm/i915/i915_gem_context.h       |   5 +
 drivers/gpu/drm/i915/i915_gem_gtt.c           |  17 +-
 drivers/gpu/drm/i915/i915_gem_gtt.h           |  16 +-
 drivers/gpu/drm/i915/selftests/huge_pages.c   |   1 -
 .../gpu/drm/i915/selftests/i915_gem_context.c | 239 ++++++++++++----
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |   1 -
 drivers/gpu/drm/i915/selftests/mock_context.c |   8 +-
 include/uapi/drm/i915_drm.h                   |  35 +++
 11 files changed, 510 insertions(+), 71 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 36da8ab1e7ce..487e78094e93 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -3005,6 +3005,8 @@ static const struct drm_ioctl_desc i915_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(I915_PERF_ADD_CONFIG, i915_perf_add_config_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_PERF_REMOVE_CONFIG, i915_perf_remove_config_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_QUERY, i915_query_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(I915_GEM_VM_CREATE, i915_gem_vm_create_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(I915_GEM_VM_DESTROY, i915_gem_vm_destroy_ioctl, DRM_RENDER_ALLOW),
 };
 
 static struct drm_driver driver = {
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index e554691304dc..523de3644570 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -217,6 +217,9 @@ struct drm_i915_file_private {
 	} mm;
 	struct idr context_idr;
 
+	struct mutex vm_lock;
+	struct idr vm_idr;
+
 	unsigned int bsd_engine;
 
 /*
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index c3f41f501276..dd49b1ef3ff2 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -110,6 +110,8 @@ static void lut_close(struct i915_gem_context *ctx)
 		struct i915_vma *vma = rcu_dereference_raw(*slot);
 
 		radix_tree_iter_delete(&ctx->handles_vma, &iter, slot);
+
+		vma->open_count--;
 		__i915_gem_object_release_unless_active(vma->obj);
 	}
 	rcu_read_unlock();
@@ -293,7 +295,7 @@ static void context_close(struct i915_gem_context *ctx)
 	 */
 	lut_close(ctx);
 	if (ctx->ppgtt)
-		i915_ppgtt_close(&ctx->ppgtt->vm);
+		i915_ppgtt_close(ctx->ppgtt);
 
 	ctx->file_priv = ERR_PTR(-EBADF);
 	i915_gem_context_put(ctx);
@@ -425,6 +427,32 @@ static void __destroy_hw_context(struct i915_gem_context *ctx,
 	context_close(ctx);
 }
 
+static struct i915_hw_ppgtt *
+__set_ppgtt(struct i915_gem_context *ctx, struct i915_hw_ppgtt *ppgtt)
+{
+	struct i915_hw_ppgtt *old = ctx->ppgtt;
+
+	i915_ppgtt_open(ppgtt);
+	ctx->ppgtt = i915_ppgtt_get(ppgtt);
+
+	ctx->desc_template = default_desc_template(ctx->i915, ppgtt);
+
+	return old;
+}
+
+static void __assign_ppgtt(struct i915_gem_context *ctx,
+			   struct i915_hw_ppgtt *ppgtt)
+{
+	if (ppgtt == ctx->ppgtt)
+		return;
+
+	ppgtt = __set_ppgtt(ctx, ppgtt);
+	if (ppgtt) {
+		i915_ppgtt_close(ppgtt);
+		i915_ppgtt_put(ppgtt);
+	}
+}
+
 static struct i915_gem_context *
 i915_gem_create_context(struct drm_i915_private *dev_priv,
 			struct drm_i915_file_private *file_priv)
@@ -451,8 +479,8 @@ i915_gem_create_context(struct drm_i915_private *dev_priv,
 			return ERR_CAST(ppgtt);
 		}
 
-		ctx->ppgtt = ppgtt;
-		ctx->desc_template = default_desc_template(dev_priv, ppgtt);
+		__assign_ppgtt(ctx, ppgtt);
+		i915_ppgtt_put(ppgtt);
 	}
 
 	trace_i915_context_create(ctx);
@@ -633,19 +661,29 @@ static int context_idr_cleanup(int id, void *p, void *data)
 	return 0;
 }
 
+static int vm_idr_cleanup(int id, void *p, void *data)
+{
+	i915_ppgtt_put(p);
+	return 0;
+}
+
 int i915_gem_context_open(struct drm_i915_private *i915,
 			  struct drm_file *file)
 {
 	struct drm_i915_file_private *file_priv = file->driver_priv;
 	struct i915_gem_context *ctx;
 
+	mutex_init(&file_priv->vm_lock);
+
 	idr_init(&file_priv->context_idr);
+	idr_init_base(&file_priv->vm_idr, 1);
 
 	mutex_lock(&i915->drm.struct_mutex);
 	ctx = i915_gem_create_context(i915, file_priv);
 	mutex_unlock(&i915->drm.struct_mutex);
 	if (IS_ERR(ctx)) {
 		idr_destroy(&file_priv->context_idr);
+		idr_destroy(&file_priv->vm_idr);
 		return PTR_ERR(ctx);
 	}
 
@@ -662,6 +700,89 @@ void i915_gem_context_close(struct drm_file *file)
 
 	idr_for_each(&file_priv->context_idr, context_idr_cleanup, NULL);
 	idr_destroy(&file_priv->context_idr);
+
+	idr_for_each(&file_priv->vm_idr, vm_idr_cleanup, NULL);
+	idr_destroy(&file_priv->vm_idr);
+
+	mutex_destroy(&file_priv->vm_lock);
+}
+
+int i915_gem_vm_create_ioctl(struct drm_device *dev, void *data,
+			     struct drm_file *file)
+{
+	struct drm_i915_private *i915 = to_i915(dev);
+	struct drm_i915_gem_vm_control *args = data;
+	struct drm_i915_file_private *file_priv = file->driver_priv;
+	struct i915_hw_ppgtt *ppgtt;
+	int err;
+
+	if (!HAS_FULL_PPGTT(i915))
+		return -ENODEV;
+
+	if (args->flags)
+		return -EINVAL;
+
+	if (args->extensions)
+		return -EINVAL;
+
+	ppgtt = i915_ppgtt_create(i915, file_priv);
+	if (IS_ERR(ppgtt))
+		return PTR_ERR(ppgtt);
+
+	err = mutex_lock_interruptible(&file_priv->vm_lock);
+	if (err)
+		goto err_put;
+
+	err = idr_alloc(&file_priv->vm_idr, ppgtt, 0, 0, GFP_KERNEL);
+	mutex_unlock(&file_priv->vm_lock);
+	if (err < 0)
+		goto err_put;
+
+	GEM_BUG_ON(err == 0); /* reserved for default/unassigned ppgtt */
+	ppgtt->user_handle = err;
+	args->id = err;
+	return 0;
+
+err_put:
+	i915_ppgtt_put(ppgtt);
+	return err;
+}
+
+int i915_gem_vm_destroy_ioctl(struct drm_device *dev, void *data,
+			      struct drm_file *file)
+{
+	struct drm_i915_file_private *file_priv = file->driver_priv;
+	struct drm_i915_gem_vm_control *args = data;
+	struct i915_hw_ppgtt *ppgtt;
+	int err;
+	u32 id;
+
+	if (args->flags)
+		return -EINVAL;
+
+	if (args->extensions)
+		return -EINVAL;
+
+	id = args->id;
+	if (!id)
+		return -ENOENT;
+
+	err = mutex_lock_interruptible(&file_priv->vm_lock);
+	if (err)
+		return err;
+
+	ppgtt = idr_remove(&file_priv->vm_idr, id);
+	if (ppgtt) {
+		GEM_BUG_ON(!ppgtt->user_handle);
+		ppgtt->user_handle = 0;
+	}
+
+	mutex_unlock(&file_priv->vm_lock);
+	if (!ppgtt)
+		return -ENOENT;
+
+	i915_ppgtt_put(ppgtt);
+	return 0;
 }
 
 static struct i915_request *
@@ -809,6 +930,120 @@ int i915_gem_switch_to_kernel_context(struct drm_i915_private *i915,
 	return 0;
 }
 
+static int get_ppgtt(struct i915_gem_context *ctx,
+		     struct drm_i915_gem_context_param *args)
+{
+	struct drm_i915_file_private *file_priv = ctx->file_priv;
+	struct i915_hw_ppgtt *ppgtt;
+	int ret;
+
+	if (!ctx->ppgtt)
+		return -ENODEV;
+
+	/* XXX rcu acquire? */
+	ret = mutex_lock_interruptible(&ctx->i915->drm.struct_mutex);
+	if (ret)
+		return ret;
+
+	ppgtt = i915_ppgtt_get(ctx->ppgtt);
+	mutex_unlock(&ctx->i915->drm.struct_mutex);
+
+	ret = mutex_lock_interruptible(&file_priv->vm_lock);
+	if (ret)
+		goto err_put;
+
+	if (!ppgtt->user_handle) {
+		ret = idr_alloc(&file_priv->vm_idr, ppgtt, 0, 0, GFP_KERNEL);
+		GEM_BUG_ON(!ret);
+		if (ret < 0)
+			goto err_unlock;
+
+		ppgtt->user_handle = ret;
+		i915_ppgtt_get(ppgtt);
+	}
+
+	args->size = 0;
+	args->value = ppgtt->user_handle;
+
+	ret = 0;
+err_unlock:
+	mutex_unlock(&file_priv->vm_lock);
+err_put:
+	i915_ppgtt_put(ppgtt);
+	return ret;
+}
+
+static void set_ppgtt_barrier(void *data)
+{
+	struct i915_hw_ppgtt *old = data;
+
+	i915_ppgtt_close(old);
+	i915_ppgtt_put(old);
+}
+
+static int set_ppgtt(struct i915_gem_context *ctx,
+		     struct drm_i915_gem_context_param *args)
+{
+	struct drm_i915_file_private *file_priv = ctx->file_priv;
+	struct i915_hw_ppgtt *ppgtt, *old;
+	int err;
+
+	if (args->size)
+		return -EINVAL;
+
+	if (upper_32_bits(args->value))
+		return -EINVAL;
+
+	if (!ctx->ppgtt)
+		return -ENODEV;
+
+	err = mutex_lock_interruptible(&file_priv->vm_lock);
+	if (err)
+		return err;
+
+	ppgtt = idr_find(&file_priv->vm_idr, args->value);
+	if (ppgtt) {
+		GEM_BUG_ON(ppgtt->user_handle != args->value);
+		i915_ppgtt_get(ppgtt);
+	}
+	mutex_unlock(&file_priv->vm_lock);
+	if (!ppgtt)
+		return -ENOENT;
+
+	err = mutex_lock_interruptible(&ctx->i915->drm.struct_mutex);
+	if (err)
+		goto out;
+
+	if (ppgtt == ctx->ppgtt)
+		goto unlock;
+
+	/* Teardown the existing obj:vma cache, it will have to be rebuilt. */
+	lut_close(ctx);
+
+	old = __set_ppgtt(ctx, ppgtt);
+
+	/*
+	 * We need to flush any requests using the current ppgtt before
+	 * we release it as the requests do not hold a reference themselves,
+	 * only indirectly through the context.
+	 */
+	err = context_barrier_task(ctx, -1, set_ppgtt_barrier, old);
+	if (err) {
+		ctx->ppgtt = old;
+		ctx->desc_template = default_desc_template(ctx->i915, old);
+
+		i915_ppgtt_close(ppgtt);
+		i915_ppgtt_put(ppgtt);
+	}
+
+unlock:
+	mutex_unlock(&ctx->i915->drm.struct_mutex);
+
+out:
+	i915_ppgtt_put(ppgtt);
+	return err;
+}
+
 static bool client_is_banned(struct drm_i915_file_private *file_priv)
 {
 	return atomic_read(&file_priv->ban_score) >= I915_CLIENT_SCORE_BANNED;
@@ -979,6 +1214,9 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 	case I915_CONTEXT_PARAM_SSEU:
 		ret = get_sseu(ctx, args);
 		break;
+	case I915_CONTEXT_PARAM_VM:
+		ret = get_ppgtt(ctx, args);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
@@ -1276,9 +1514,6 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 		return -ENOENT;
 
 	switch (args->param) {
-	case I915_CONTEXT_PARAM_BAN_PERIOD:
-		ret = -EINVAL;
-		break;
 	case I915_CONTEXT_PARAM_NO_ZEROMAP:
 		if (args->size)
 			ret = -EINVAL;
@@ -1325,9 +1560,16 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 					I915_USER_PRIORITY(priority);
 		}
 		break;
+
 	case I915_CONTEXT_PARAM_SSEU:
 		ret = set_sseu(ctx, args);
 		break;
+
+	case I915_CONTEXT_PARAM_VM:
+		ret = set_ppgtt(ctx, args);
+		break;
+
+	case I915_CONTEXT_PARAM_BAN_PERIOD:
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index ab89c7501408..c5a6cb10dbda 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -365,6 +365,11 @@ void i915_gem_context_release(struct kref *ctx_ref);
 struct i915_gem_context *
 i915_gem_context_create_gvt(struct drm_device *dev);
 
+int i915_gem_vm_create_ioctl(struct drm_device *dev, void *data,
+			     struct drm_file *file);
+int i915_gem_vm_destroy_ioctl(struct drm_device *dev, void *data,
+			      struct drm_file *file);
+
 int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
 				  struct drm_file *file);
 int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index d646d37eec2f..ccf10306b1f5 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2103,10 +2103,21 @@ i915_ppgtt_create(struct drm_i915_private *i915,
 	return ppgtt;
 }
 
-void i915_ppgtt_close(struct i915_address_space *vm)
+void i915_ppgtt_open(struct i915_hw_ppgtt *ppgtt)
 {
-	GEM_BUG_ON(vm->closed);
-	vm->closed = true;
+	GEM_BUG_ON(ppgtt->vm.closed);
+
+	ppgtt->open_count++;
+}
+
+void i915_ppgtt_close(struct i915_hw_ppgtt *ppgtt)
+{
+	GEM_BUG_ON(!ppgtt->open_count);
+	if (--ppgtt->open_count)
+		return;
+
+	GEM_BUG_ON(ppgtt->vm.closed);
+	ppgtt->vm.closed = true;
 }
 
 static void ppgtt_destroy_vma(struct i915_address_space *vm)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 03ade71b8d9a..bb750318f52a 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -391,11 +391,15 @@ struct i915_hw_ppgtt {
 	struct kref ref;
 
 	unsigned long pd_dirty_rings;
+	unsigned int open_count;
+
 	union {
 		struct i915_pml4 pml4;		/* GEN8+ & 48b PPGTT */
 		struct i915_page_directory_pointer pdp;	/* GEN8+ */
 		struct i915_page_directory pd;		/* GEN6-7 */
 	};
+
+	u32 user_handle;
 };
 
 struct gen6_hw_ppgtt {
@@ -606,12 +610,16 @@ int i915_ppgtt_init_hw(struct drm_i915_private *dev_priv);
 void i915_ppgtt_release(struct kref *kref);
 struct i915_hw_ppgtt *i915_ppgtt_create(struct drm_i915_private *dev_priv,
 					struct drm_i915_file_private *fpriv);
-void i915_ppgtt_close(struct i915_address_space *vm);
-static inline void i915_ppgtt_get(struct i915_hw_ppgtt *ppgtt)
+
+void i915_ppgtt_open(struct i915_hw_ppgtt *ppgtt);
+void i915_ppgtt_close(struct i915_hw_ppgtt *ppgtt);
+
+static inline struct i915_hw_ppgtt *i915_ppgtt_get(struct i915_hw_ppgtt *ppgtt)
 {
-	if (ppgtt)
-		kref_get(&ppgtt->ref);
+	kref_get(&ppgtt->ref);
+	return ppgtt;
 }
+
 static inline void i915_ppgtt_put(struct i915_hw_ppgtt *ppgtt)
 {
 	if (ppgtt)
diff --git a/drivers/gpu/drm/i915/selftests/huge_pages.c b/drivers/gpu/drm/i915/selftests/huge_pages.c
index a9a2fa35876f..a7ee8e97bcee 100644
--- a/drivers/gpu/drm/i915/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/selftests/huge_pages.c
@@ -1734,7 +1734,6 @@ int i915_gem_huge_page_mock_selftests(void)
 	err = i915_subtests(tests, ppgtt);
 
 out_close:
-	i915_ppgtt_close(&ppgtt->vm);
 	i915_ppgtt_put(ppgtt);
 
 out_unlock:
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
index 4b6df1c55345..a76a4f6f67e4 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
@@ -372,7 +372,8 @@ static int cpu_fill(struct drm_i915_gem_object *obj, u32 value)
 	return 0;
 }
 
-static int cpu_check(struct drm_i915_gem_object *obj, unsigned int max)
+static noinline int cpu_check(struct drm_i915_gem_object *obj,
+			      unsigned int idx, unsigned int max)
 {
 	unsigned int n, m, needs_flush;
 	int err;
@@ -390,8 +391,10 @@ static int cpu_check(struct drm_i915_gem_object *obj, unsigned int max)
 
 		for (m = 0; m < max; m++) {
 			if (map[m] != m) {
-				pr_err("Invalid value at page %d, offset %d: found %x expected %x\n",
-				       n, m, map[m], m);
+				pr_err("%pS: Invalid value at object %d page %d/%ld, offset %d/%d: found %x expected %x\n",
+				       __builtin_return_address(0), idx,
+				       n, real_page_count(obj), m, max,
+				       map[m], m);
 				err = -EINVAL;
 				goto out_unmap;
 			}
@@ -399,8 +402,9 @@ static int cpu_check(struct drm_i915_gem_object *obj, unsigned int max)
 
 		for (; m < DW_PER_PAGE; m++) {
 			if (map[m] != STACK_MAGIC) {
-				pr_err("Invalid value at page %d, offset %d: found %x expected %x\n",
-				       n, m, map[m], STACK_MAGIC);
+				pr_err("%pS: Invalid value at object %d page %d, offset %d: found %x expected %x (uninitialised)\n",
+				       __builtin_return_address(0), idx, n, m,
+				       map[m], STACK_MAGIC);
 				err = -EINVAL;
 				goto out_unmap;
 			}
@@ -478,12 +482,8 @@ static unsigned long max_dwords(struct drm_i915_gem_object *obj)
 static int igt_ctx_exec(void *arg)
 {
 	struct drm_i915_private *i915 = arg;
-	struct drm_i915_gem_object *obj = NULL;
-	unsigned long ncontexts, ndwords, dw;
-	struct igt_live_test t;
-	struct drm_file *file;
-	IGT_TIMEOUT(end_time);
-	LIST_HEAD(objects);
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
 	int err = -ENODEV;
 
 	/*
@@ -495,38 +495,42 @@ static int igt_ctx_exec(void *arg)
 	if (!DRIVER_CAPS(i915)->has_logical_contexts)
 		return 0;
 
-	file = mock_file(i915);
-	if (IS_ERR(file))
-		return PTR_ERR(file);
+	for_each_engine(engine, i915, id) {
+		struct drm_i915_gem_object *obj = NULL;
+		unsigned long ncontexts, ndwords, dw;
+		struct igt_live_test t;
+		struct drm_file *file;
+		IGT_TIMEOUT(end_time);
+		LIST_HEAD(objects);
 
-	mutex_lock(&i915->drm.struct_mutex);
+		if (!intel_engine_can_store_dword(engine))
+			continue;
 
-	err = igt_live_test_begin(&t, i915, __func__, "");
-	if (err)
-		goto out_unlock;
+		if (!engine->context_size)
+			continue; /* No logical context support in HW */
 
-	ncontexts = 0;
-	ndwords = 0;
-	dw = 0;
-	while (!time_after(jiffies, end_time)) {
-		struct intel_engine_cs *engine;
-		struct i915_gem_context *ctx;
-		unsigned int id;
+		file = mock_file(i915);
+		if (IS_ERR(file))
+			return PTR_ERR(file);
 
-		ctx = i915_gem_create_context(i915, file->driver_priv);
-		if (IS_ERR(ctx)) {
-			err = PTR_ERR(ctx);
+		mutex_lock(&i915->drm.struct_mutex);
+
+		err = igt_live_test_begin(&t, i915, __func__, engine->name);
+		if (err)
 			goto out_unlock;
-		}
 
-		for_each_engine(engine, i915, id) {
+		ncontexts = 0;
+		ndwords = 0;
+		dw = 0;
+		while (!time_after(jiffies, end_time)) {
+			struct i915_gem_context *ctx;
 			intel_wakeref_t wakeref;
 
-			if (!engine->context_size)
-				continue; /* No logical context support in HW */
-
-			if (!intel_engine_can_store_dword(engine))
-				continue;
+			ctx = i915_gem_create_context(i915, file->driver_priv);
+			if (IS_ERR(ctx)) {
+				err = PTR_ERR(ctx);
+				goto out_unlock;
+			}
 
 			if (!obj) {
 				obj = create_test_object(ctx, file, &objects);
@@ -536,7 +540,6 @@ static int igt_ctx_exec(void *arg)
 				}
 			}
 
-			err = 0;
 			with_intel_runtime_pm(i915, wakeref)
 				err = gpu_fill(obj, ctx, engine, dw);
 			if (err) {
@@ -551,32 +554,158 @@ static int igt_ctx_exec(void *arg)
 				obj = NULL;
 				dw = 0;
 			}
+
 			ndwords++;
+			ncontexts++;
 		}
-		ncontexts++;
+
+		pr_info("Submitted %lu contexts to %s, filling %lu dwords\n",
+			ncontexts, engine->name, ndwords);
+
+		ncontexts = dw = 0;
+		list_for_each_entry(obj, &objects, st_link) {
+			unsigned int rem =
+				min_t(unsigned int, ndwords - dw, max_dwords(obj));
+
+			err = cpu_check(obj, ncontexts++, rem);
+			if (err)
+				break;
+
+			dw += rem;
+		}
+
+out_unlock:
+		if (igt_live_test_end(&t))
+			err = -EIO;
+		mutex_unlock(&i915->drm.struct_mutex);
+
+		mock_file_free(i915, file);
+		if (err)
+			return err;
 	}
-	pr_info("Submitted %lu contexts (across %u engines), filling %lu dwords\n",
-		ncontexts, RUNTIME_INFO(i915)->num_rings, ndwords);
 
-	dw = 0;
-	list_for_each_entry(obj, &objects, st_link) {
-		unsigned int rem =
-			min_t(unsigned int, ndwords - dw, max_dwords(obj));
+	return 0;
+}
+
+static int igt_shared_ctx_exec(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	int err = -ENODEV;
+
+	/*
+	 * Create a few different contexts with the same mm and write
+	 * through each ctx using the GPU making sure those writes end
+	 * up in the expected pages of our obj.
+	 */
+
+	for_each_engine(engine, i915, id) {
+		unsigned long ncontexts, ndwords, dw;
+		struct drm_i915_gem_object *obj = NULL;
+		struct i915_gem_context *ctx = NULL;
+		struct i915_gem_context *parent;
+		struct igt_live_test t;
+		struct drm_file *file;
+		IGT_TIMEOUT(end_time);
+		LIST_HEAD(objects);
 
-		err = cpu_check(obj, rem);
+		if (!intel_engine_can_store_dword(engine))
+			continue;
+
+		file = mock_file(i915);
+		if (IS_ERR(file))
+			return PTR_ERR(file);
+
+		mutex_lock(&i915->drm.struct_mutex);
+
+		err = igt_live_test_begin(&t, i915, __func__, engine->name);
 		if (err)
-			break;
+			goto out_unlock;
 
-		dw += rem;
-	}
+		parent = i915_gem_create_context(i915, file->driver_priv);
+		if (IS_ERR(parent)) {
+			err = PTR_ERR(parent);
+			if (err == -ENODEV) /* no logical ctx support */
+				err = 0;
+			goto out_unlock;
+		}
+
+		if (!parent->ppgtt) {
+			err = 0;
+			goto out_unlock;
+		}
+
+		ncontexts = 0;
+		ndwords = 0;
+		dw = 0;
+		while (!time_after(jiffies, end_time)) {
+			intel_wakeref_t wakeref;
+
+			if (ctx)
+				__destroy_hw_context(ctx, file->driver_priv);
+
+			ctx = i915_gem_create_context(i915, file->driver_priv);
+			if (IS_ERR(ctx)) {
+				err = PTR_ERR(ctx);
+				goto out_unlock;
+			}
+
+			__assign_ppgtt(ctx, parent->ppgtt);
+
+			if (!obj) {
+				obj = create_test_object(parent, file, &objects);
+				if (IS_ERR(obj)) {
+					err = PTR_ERR(obj);
+					goto out_unlock;
+				}
+			}
+
+			err = 0;
+			with_intel_runtime_pm(i915, wakeref)
+				err = gpu_fill(obj, ctx, engine, dw);
+			if (err) {
+				pr_err("Failed to fill dword %lu [%lu/%lu] with gpu (%s) in ctx %u [full-ppgtt? %s], err=%d\n",
+				       ndwords, dw, max_dwords(obj),
+				       engine->name, ctx->hw_id,
+				       yesno(!!ctx->ppgtt), err);
+				goto out_unlock;
+			}
+
+			if (++dw == max_dwords(obj)) {
+				obj = NULL;
+				dw = 0;
+			}
+
+			ndwords++;
+			ncontexts++;
+		}
+		pr_info("Submitted %lu contexts to %s, filling %lu dwords\n",
+			ncontexts, engine->name, ndwords);
+
+		ncontexts = dw = 0;
+		list_for_each_entry(obj, &objects, st_link) {
+			unsigned int rem =
+				min_t(unsigned int, ndwords - dw, max_dwords(obj));
+
+			err = cpu_check(obj, ncontexts++, rem);
+			if (err)
+				break;
+
+			dw += rem;
+		}
 
 out_unlock:
-	if (igt_live_test_end(&t))
-		err = -EIO;
-	mutex_unlock(&i915->drm.struct_mutex);
+		if (igt_live_test_end(&t))
+			err = -EIO;
+		mutex_unlock(&i915->drm.struct_mutex);
 
-	mock_file_free(i915, file);
-	return err;
+		mock_file_free(i915, file);
+		if (err)
+			return err;
+	}
+
+	return 0;
 }
 
 static struct i915_vma *rpcs_query_batch(struct i915_vma *vma)
@@ -1048,7 +1177,7 @@ static int igt_ctx_readonly(void *arg)
 	struct drm_i915_gem_object *obj = NULL;
 	struct i915_gem_context *ctx;
 	struct i915_hw_ppgtt *ppgtt;
-	unsigned long ndwords, dw;
+	unsigned long idx, ndwords, dw;
 	struct igt_live_test t;
 	struct drm_file *file;
 	I915_RND_STATE(prng);
@@ -1129,6 +1258,7 @@ static int igt_ctx_readonly(void *arg)
 		ndwords, RUNTIME_INFO(i915)->num_rings);
 
 	dw = 0;
+	idx = 0;
 	list_for_each_entry(obj, &objects, st_link) {
 		unsigned int rem =
 			min_t(unsigned int, ndwords - dw, max_dwords(obj));
@@ -1138,7 +1268,7 @@ static int igt_ctx_readonly(void *arg)
 		if (i915_gem_object_is_readonly(obj))
 			num_writes = 0;
 
-		err = cpu_check(obj, num_writes);
+		err = cpu_check(obj, idx++, num_writes);
 		if (err)
 			break;
 
@@ -1723,6 +1853,7 @@ int i915_gem_context_live_selftests(struct drm_i915_private *dev_priv)
 		SUBTEST(igt_ctx_exec),
 		SUBTEST(igt_ctx_readonly),
 		SUBTEST(igt_ctx_sseu),
+		SUBTEST(igt_shared_ctx_exec),
 		SUBTEST(igt_vm_isolation),
 	};
 
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index 3850ef4a5ec8..08a8f3d20854 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -1020,7 +1020,6 @@ static int exercise_ppgtt(struct drm_i915_private *dev_priv,
 
 	err = func(dev_priv, &ppgtt->vm, 0, ppgtt->vm.total, end_time);
 
-	i915_ppgtt_close(&ppgtt->vm);
 	i915_ppgtt_put(ppgtt);
 out_unlock:
 	mutex_unlock(&dev_priv->drm.struct_mutex);
diff --git a/drivers/gpu/drm/i915/selftests/mock_context.c b/drivers/gpu/drm/i915/selftests/mock_context.c
index 1b5073a362eb..2189b606ca41 100644
--- a/drivers/gpu/drm/i915/selftests/mock_context.c
+++ b/drivers/gpu/drm/i915/selftests/mock_context.c
@@ -54,13 +54,17 @@ mock_context(struct drm_i915_private *i915,
 		goto err_handles;
 
 	if (name) {
+		struct i915_hw_ppgtt *ppgtt;
+
 		ctx->name = kstrdup(name, GFP_KERNEL);
 		if (!ctx->name)
 			goto err_put;
 
-		ctx->ppgtt = mock_ppgtt(i915, name);
-		if (!ctx->ppgtt)
+		ppgtt = mock_ppgtt(i915, name);
+		if (!ppgtt)
 			goto err_put;
+
+		__set_ppgtt(ctx, ppgtt);
 	}
 
 	return ctx;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index be2fcdf3ba90..2cd79639d6b5 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -339,6 +339,8 @@ typedef struct _drm_i915_sarea {
 #define DRM_I915_PERF_ADD_CONFIG	0x37
 #define DRM_I915_PERF_REMOVE_CONFIG	0x38
 #define DRM_I915_QUERY			0x39
+#define DRM_I915_GEM_VM_CREATE		0x3a
+#define DRM_I915_GEM_VM_DESTROY		0x3b
 
 #define DRM_IOCTL_I915_INIT		DRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT, drm_i915_init_t)
 #define DRM_IOCTL_I915_FLUSH		DRM_IO ( DRM_COMMAND_BASE + DRM_I915_FLUSH)
@@ -397,6 +399,8 @@ typedef struct _drm_i915_sarea {
 #define DRM_IOCTL_I915_PERF_ADD_CONFIG	DRM_IOW(DRM_COMMAND_BASE + DRM_I915_PERF_ADD_CONFIG, struct drm_i915_perf_oa_config)
 #define DRM_IOCTL_I915_PERF_REMOVE_CONFIG	DRM_IOW(DRM_COMMAND_BASE + DRM_I915_PERF_REMOVE_CONFIG, __u64)
 #define DRM_IOCTL_I915_QUERY			DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_QUERY, struct drm_i915_query)
+#define DRM_IOCTL_I915_GEM_VM_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_CREATE, struct drm_i915_gem_vm_control)
+#define DRM_IOCTL_I915_GEM_VM_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_VM_DESTROY, struct drm_i915_gem_vm_control)
 
 /* Allow drivers to submit batchbuffers directly to hardware, relying
  * on the security mechanisms provided by hardware.
@@ -1444,6 +1448,26 @@ struct drm_i915_gem_context_destroy {
 	__u32 pad;
 };
 
+/*
+ * DRM_I915_GEM_VM_CREATE -
+ *
+ * Create a new virtual memory address space (ppGTT) for use within a context
+ * on the same file. Extensions can be provided to configure exactly how the
+ * address space is setup upon creation.
+ *
+ * The id of new VM (bound to the fd) for use with I915_CONTEXT_PARAM_VM is
+ * returned.
+ *
+ * DRM_I915_GEM_VM_DESTROY -
+ *
+ * Destroys a previously created VM id.
+ */
+struct drm_i915_gem_vm_control {
+	__u64 extensions;
+	__u32 flags;
+	__u32 id;
+};
+
 struct drm_i915_reg_read {
 	/*
 	 * Register offset.
@@ -1513,6 +1537,17 @@ struct drm_i915_gem_context_param {
 	 * drm_i915_gem_context_param_sseu.
 	 */
 #define I915_CONTEXT_PARAM_SSEU		0x7
+
+	/*
+	 * The id of the associated virtual memory address space (ppGTT) of
+	 * this context. Can be retrieved and passed to another context
+	 * (on the same fd) for both to use the same ppGTT and so share
+	 * address layouts, and avoid reloading the page tables on context
+	 * switches between themselves.
+	 *
+	 * See DRM_I915_GEM_VM_CREATE and DRM_I915_GEM_VM_DESTROY.
+	 */
+#define I915_CONTEXT_PARAM_VM		0x8
 	__u64 value;
 };
 
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH 33/46] drm/i915: Extend CONTEXT_CREATE to set parameters upon construction
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (31 preceding siblings ...)
  2019-02-06 13:03 ` [PATCH 32/46] drm/i915: Create/destroy VM (ppGTT) for use with contexts Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-12 13:43   ` Tvrtko Ursulin
  2019-02-06 13:03 ` [PATCH 34/46] drm/i915: Allow contexts to share a single timeline across all engines Chris Wilson
                   ` (19 subsequent siblings)
  52 siblings, 1 reply; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx

It can be useful to have a single ioctl to create a context with all
of its initial parameters, instead of a series of create + setparam +
setparam ioctls. This extension to CONTEXT_CREATE allows any of those
parameters to be passed in as a linked list of extensions to be applied
to the newly constructed context.
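
For illustration only (assuming libdrm's drmIoctl() and the uapi from
this patch), chaining a SETPARAM extension onto context creation could
look roughly like:

#include <stdint.h>
#include <xf86drm.h>
#include <drm/i915_drm.h>

/* Create a context and raise its priority in a single ioctl. */
static int create_prio_ctx(int fd, __u32 *ctx_id)
{
	struct drm_i915_gem_context_create_ext_setparam p = {
		.base = { .name = I915_CONTEXT_CREATE_EXT_SETPARAM },
		.setparam = {
			.param = I915_CONTEXT_PARAM_PRIORITY,
			.value = 512,	/* example priority */
		},
	};
	struct drm_i915_gem_context_create_ext create = {
		.extensions = (uintptr_t)&p,
	};
	int err;

	err = drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT, &create);
	if (err)
		return err;

	*ctx_id = create.ctx_id;
	return 0;
}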

v2: Make a local copy of user setparam (Tvrtko)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.c         |   2 +-
 drivers/gpu/drm/i915/i915_gem_context.c | 428 +++++++++++++-----------
 include/uapi/drm/i915_drm.h             | 163 ++++-----
 3 files changed, 325 insertions(+), 268 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 487e78094e93..fc11460f8327 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -2994,7 +2994,7 @@ static const struct drm_ioctl_desc i915_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(I915_SET_SPRITE_COLORKEY, intel_sprite_set_colorkey_ioctl, DRM_MASTER),
 	DRM_IOCTL_DEF_DRV(I915_GET_SPRITE_COLORKEY, drm_noop, DRM_MASTER),
 	DRM_IOCTL_DEF_DRV(I915_GEM_WAIT, i915_gem_wait_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
-	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_CREATE, i915_gem_context_create_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_CREATE_EXT, i915_gem_context_create_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_DESTROY, i915_gem_context_destroy_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_REG_READ, i915_reg_read_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GET_RESET_STATS, i915_gem_context_reset_stats_ioctl, DRM_RENDER_ALLOW),
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index dd49b1ef3ff2..609ef59f4d95 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -89,6 +89,7 @@
 #include <drm/i915_drm.h>
 #include "i915_drv.h"
 #include "i915_trace.h"
+#include "i915_user_extensions.h"
 #include "intel_lrc_reg.h"
 #include "intel_workarounds.h"
 
@@ -1044,188 +1045,6 @@ static int set_ppgtt(struct i915_gem_context *ctx,
 	return err;
 }
 
-static bool client_is_banned(struct drm_i915_file_private *file_priv)
-{
-	return atomic_read(&file_priv->ban_score) >= I915_CLIENT_SCORE_BANNED;
-}
-
-int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
-				  struct drm_file *file)
-{
-	struct drm_i915_private *dev_priv = to_i915(dev);
-	struct drm_i915_gem_context_create *args = data;
-	struct drm_i915_file_private *file_priv = file->driver_priv;
-	struct i915_gem_context *ctx;
-	int ret;
-
-	if (!DRIVER_CAPS(dev_priv)->has_logical_contexts)
-		return -ENODEV;
-
-	if (args->pad != 0)
-		return -EINVAL;
-
-	if (client_is_banned(file_priv)) {
-		DRM_DEBUG("client %s[%d] banned from creating ctx\n",
-			  current->comm,
-			  pid_nr(get_task_pid(current, PIDTYPE_PID)));
-
-		return -EIO;
-	}
-
-	ret = i915_mutex_lock_interruptible(dev);
-	if (ret)
-		return ret;
-
-	ctx = i915_gem_create_context(dev_priv, file_priv);
-	mutex_unlock(&dev->struct_mutex);
-	if (IS_ERR(ctx))
-		return PTR_ERR(ctx);
-
-	GEM_BUG_ON(i915_gem_context_is_kernel(ctx));
-
-	args->ctx_id = ctx->user_handle;
-	DRM_DEBUG("HW context %d created\n", args->ctx_id);
-
-	return 0;
-}
-
-int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
-				   struct drm_file *file)
-{
-	struct drm_i915_gem_context_destroy *args = data;
-	struct drm_i915_file_private *file_priv = file->driver_priv;
-	struct i915_gem_context *ctx;
-	int ret;
-
-	if (args->pad != 0)
-		return -EINVAL;
-
-	if (args->ctx_id == DEFAULT_CONTEXT_HANDLE)
-		return -ENOENT;
-
-	ctx = i915_gem_context_lookup(file_priv, args->ctx_id);
-	if (!ctx)
-		return -ENOENT;
-
-	ret = mutex_lock_interruptible(&dev->struct_mutex);
-	if (ret)
-		goto out;
-
-	__destroy_hw_context(ctx, file_priv);
-	mutex_unlock(&dev->struct_mutex);
-
-out:
-	i915_gem_context_put(ctx);
-	return 0;
-}
-
-static int get_sseu(struct i915_gem_context *ctx,
-		    struct drm_i915_gem_context_param *args)
-{
-	struct drm_i915_gem_context_param_sseu user_sseu;
-	struct intel_engine_cs *engine;
-	struct intel_context *ce;
-	int ret;
-
-	if (args->size == 0)
-		goto out;
-	else if (args->size < sizeof(user_sseu))
-		return -EINVAL;
-
-	if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value),
-			   sizeof(user_sseu)))
-		return -EFAULT;
-
-	if (user_sseu.flags || user_sseu.rsvd)
-		return -EINVAL;
-
-	engine = intel_engine_lookup_user(ctx->i915,
-					  user_sseu.engine_class,
-					  user_sseu.engine_instance);
-	if (!engine)
-		return -EINVAL;
-
-	/* Only use for mutex here is to serialize get_param and set_param. */
-	ret = mutex_lock_interruptible(&ctx->i915->drm.struct_mutex);
-	if (ret)
-		return ret;
-
-	ce = to_intel_context(ctx, engine);
-
-	user_sseu.slice_mask = ce->sseu.slice_mask;
-	user_sseu.subslice_mask = ce->sseu.subslice_mask;
-	user_sseu.min_eus_per_subslice = ce->sseu.min_eus_per_subslice;
-	user_sseu.max_eus_per_subslice = ce->sseu.max_eus_per_subslice;
-
-	mutex_unlock(&ctx->i915->drm.struct_mutex);
-
-	if (copy_to_user(u64_to_user_ptr(args->value), &user_sseu,
-			 sizeof(user_sseu)))
-		return -EFAULT;
-
-out:
-	args->size = sizeof(user_sseu);
-
-	return 0;
-}
-
-int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
-				    struct drm_file *file)
-{
-	struct drm_i915_file_private *file_priv = file->driver_priv;
-	struct drm_i915_gem_context_param *args = data;
-	struct i915_gem_context *ctx;
-	int ret = 0;
-
-	ctx = i915_gem_context_lookup(file_priv, args->ctx_id);
-	if (!ctx)
-		return -ENOENT;
-
-	switch (args->param) {
-	case I915_CONTEXT_PARAM_BAN_PERIOD:
-		ret = -EINVAL;
-		break;
-	case I915_CONTEXT_PARAM_NO_ZEROMAP:
-		args->size = 0;
-		args->value = test_bit(UCONTEXT_NO_ZEROMAP, &ctx->user_flags);
-		break;
-	case I915_CONTEXT_PARAM_GTT_SIZE:
-		args->size = 0;
-
-		if (ctx->ppgtt)
-			args->value = ctx->ppgtt->vm.total;
-		else if (to_i915(dev)->mm.aliasing_ppgtt)
-			args->value = to_i915(dev)->mm.aliasing_ppgtt->vm.total;
-		else
-			args->value = to_i915(dev)->ggtt.vm.total;
-		break;
-	case I915_CONTEXT_PARAM_NO_ERROR_CAPTURE:
-		args->size = 0;
-		args->value = i915_gem_context_no_error_capture(ctx);
-		break;
-	case I915_CONTEXT_PARAM_BANNABLE:
-		args->size = 0;
-		args->value = i915_gem_context_is_bannable(ctx);
-		break;
-	case I915_CONTEXT_PARAM_PRIORITY:
-		args->size = 0;
-		args->value = ctx->sched.priority >> I915_USER_PRIORITY_SHIFT;
-		break;
-	case I915_CONTEXT_PARAM_SSEU:
-		ret = get_sseu(ctx, args);
-		break;
-	case I915_CONTEXT_PARAM_VM:
-		ret = get_ppgtt(ctx, args);
-		break;
-	default:
-		ret = -EINVAL;
-		break;
-	}
-
-	i915_gem_context_put(ctx);
-	return ret;
-}
-
 static int gen8_emit_rpcs_config(struct i915_request *rq,
 				 struct intel_context *ce,
 				 struct intel_sseu sseu)
@@ -1501,18 +1320,11 @@ static int set_sseu(struct i915_gem_context *ctx,
 	return 0;
 }
 
-int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
-				    struct drm_file *file)
+static int ctx_setparam(struct i915_gem_context *ctx,
+			struct drm_i915_gem_context_param *args)
 {
-	struct drm_i915_file_private *file_priv = file->driver_priv;
-	struct drm_i915_gem_context_param *args = data;
-	struct i915_gem_context *ctx;
 	int ret = 0;
 
-	ctx = i915_gem_context_lookup(file_priv, args->ctx_id);
-	if (!ctx)
-		return -ENOENT;
-
 	switch (args->param) {
 	case I915_CONTEXT_PARAM_NO_ZEROMAP:
 		if (args->size)
@@ -1522,6 +1334,7 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 		else
 			clear_bit(UCONTEXT_NO_ZEROMAP, &ctx->user_flags);
 		break;
+
 	case I915_CONTEXT_PARAM_NO_ERROR_CAPTURE:
 		if (args->size)
 			ret = -EINVAL;
@@ -1530,6 +1343,7 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 		else
 			i915_gem_context_clear_no_error_capture(ctx);
 		break;
+
 	case I915_CONTEXT_PARAM_BANNABLE:
 		if (args->size)
 			ret = -EINVAL;
@@ -1547,7 +1361,7 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 
 			if (args->size)
 				ret = -EINVAL;
-			else if (!(to_i915(dev)->caps.scheduler & I915_SCHEDULER_CAP_PRIORITY))
+			else if (!(ctx->i915->caps.scheduler & I915_SCHEDULER_CAP_PRIORITY))
 				ret = -ENODEV;
 			else if (priority > I915_CONTEXT_MAX_USER_PRIORITY ||
 				 priority < I915_CONTEXT_MIN_USER_PRIORITY)
@@ -1575,6 +1389,236 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 		break;
 	}
 
+	return ret;
+}
+
+static int create_setparam(struct i915_user_extension __user *ext, void *data)
+{
+	struct drm_i915_gem_context_create_ext_setparam local;
+
+	if (copy_from_user(&local, ext, sizeof(local)))
+		return -EFAULT;
+
+	if (local.setparam.ctx_id)
+		return -EINVAL;
+
+	return ctx_setparam(data, &local.setparam);
+}
+
+static const i915_user_extension_fn create_extensions[] = {
+	[I915_CONTEXT_CREATE_EXT_SETPARAM] = create_setparam,
+};
+
+static bool client_is_banned(struct drm_i915_file_private *file_priv)
+{
+	return atomic_read(&file_priv->ban_score) >= I915_CLIENT_SCORE_BANNED;
+}
+
+int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
+				  struct drm_file *file)
+{
+	struct drm_i915_private *dev_priv = to_i915(dev);
+	struct drm_i915_gem_context_create_ext *args = data;
+	struct drm_i915_file_private *file_priv = file->driver_priv;
+	struct i915_gem_context *ctx;
+	int ret;
+
+	if (!DRIVER_CAPS(dev_priv)->has_logical_contexts)
+		return -ENODEV;
+
+	if (args->flags)
+		return -EINVAL;
+
+	if (client_is_banned(file_priv)) {
+		DRM_DEBUG("client %s[%d] banned from creating ctx\n",
+			  current->comm,
+			  pid_nr(get_task_pid(current, PIDTYPE_PID)));
+
+		return -EIO;
+	}
+
+	ret = i915_mutex_lock_interruptible(dev);
+	if (ret)
+		return ret;
+
+	ctx = i915_gem_create_context(dev_priv, file_priv);
+	mutex_unlock(&dev->struct_mutex);
+	if (IS_ERR(ctx))
+		return PTR_ERR(ctx);
+
+	GEM_BUG_ON(i915_gem_context_is_kernel(ctx));
+
+	ret = i915_user_extensions(u64_to_user_ptr(args->extensions),
+				   create_extensions,
+				   ARRAY_SIZE(create_extensions),
+				   ctx);
+	if (ret) {
+		idr_remove(&file_priv->context_idr, ctx->user_handle);
+		context_close(ctx);
+		return ret;
+	}
+
+	args->ctx_id = ctx->user_handle;
+	DRM_DEBUG("HW context %d created\n", args->ctx_id);
+
+	return 0;
+}
+
+int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
+				   struct drm_file *file)
+{
+	struct drm_i915_gem_context_destroy *args = data;
+	struct drm_i915_file_private *file_priv = file->driver_priv;
+	struct i915_gem_context *ctx;
+	int ret;
+
+	if (args->pad != 0)
+		return -EINVAL;
+
+	if (args->ctx_id == DEFAULT_CONTEXT_HANDLE)
+		return -ENOENT;
+
+	ctx = i915_gem_context_lookup(file_priv, args->ctx_id);
+	if (!ctx)
+		return -ENOENT;
+
+	ret = mutex_lock_interruptible(&dev->struct_mutex);
+	if (ret)
+		goto out;
+
+	__destroy_hw_context(ctx, file_priv);
+	mutex_unlock(&dev->struct_mutex);
+
+out:
+	i915_gem_context_put(ctx);
+	return 0;
+}
+
+static int get_sseu(struct i915_gem_context *ctx,
+		    struct drm_i915_gem_context_param *args)
+{
+	struct drm_i915_gem_context_param_sseu user_sseu;
+	struct intel_engine_cs *engine;
+	struct intel_context *ce;
+	int ret;
+
+	if (args->size == 0)
+		goto out;
+	else if (args->size < sizeof(user_sseu))
+		return -EINVAL;
+
+	if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value),
+			   sizeof(user_sseu)))
+		return -EFAULT;
+
+	if (user_sseu.flags || user_sseu.rsvd)
+		return -EINVAL;
+
+	engine = intel_engine_lookup_user(ctx->i915,
+					  user_sseu.engine_class,
+					  user_sseu.engine_instance);
+	if (!engine)
+		return -EINVAL;
+
+	/* Only use for mutex here is to serialize get_param and set_param. */
+	ret = mutex_lock_interruptible(&ctx->i915->drm.struct_mutex);
+	if (ret)
+		return ret;
+
+	ce = to_intel_context(ctx, engine);
+
+	user_sseu.slice_mask = ce->sseu.slice_mask;
+	user_sseu.subslice_mask = ce->sseu.subslice_mask;
+	user_sseu.min_eus_per_subslice = ce->sseu.min_eus_per_subslice;
+	user_sseu.max_eus_per_subslice = ce->sseu.max_eus_per_subslice;
+
+	mutex_unlock(&ctx->i915->drm.struct_mutex);
+
+	if (copy_to_user(u64_to_user_ptr(args->value), &user_sseu,
+			 sizeof(user_sseu)))
+		return -EFAULT;
+
+out:
+	args->size = sizeof(user_sseu);
+
+	return 0;
+}
+
+int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
+				    struct drm_file *file)
+{
+	struct drm_i915_file_private *file_priv = file->driver_priv;
+	struct drm_i915_gem_context_param *args = data;
+	struct i915_gem_context *ctx;
+	int ret = 0;
+
+	ctx = i915_gem_context_lookup(file_priv, args->ctx_id);
+	if (!ctx)
+		return -ENOENT;
+
+	switch (args->param) {
+	case I915_CONTEXT_PARAM_NO_ZEROMAP:
+		args->size = 0;
+		args->value = test_bit(UCONTEXT_NO_ZEROMAP, &ctx->user_flags);
+		break;
+
+	case I915_CONTEXT_PARAM_GTT_SIZE:
+		args->size = 0;
+		if (ctx->ppgtt)
+			args->value = ctx->ppgtt->vm.total;
+		else if (to_i915(dev)->mm.aliasing_ppgtt)
+			args->value = to_i915(dev)->mm.aliasing_ppgtt->vm.total;
+		else
+			args->value = to_i915(dev)->ggtt.vm.total;
+		break;
+
+	case I915_CONTEXT_PARAM_NO_ERROR_CAPTURE:
+		args->size = 0;
+		args->value = i915_gem_context_no_error_capture(ctx);
+		break;
+
+	case I915_CONTEXT_PARAM_BANNABLE:
+		args->size = 0;
+		args->value = i915_gem_context_is_bannable(ctx);
+		break;
+
+	case I915_CONTEXT_PARAM_PRIORITY:
+		args->size = 0;
+		args->value = ctx->sched.priority >> I915_USER_PRIORITY_SHIFT;
+		break;
+
+	case I915_CONTEXT_PARAM_SSEU:
+		ret = get_sseu(ctx, args);
+		break;
+
+	case I915_CONTEXT_PARAM_VM:
+		ret = get_ppgtt(ctx, args);
+		break;
+
+	case I915_CONTEXT_PARAM_BAN_PERIOD:
+	default:
+		ret = -EINVAL;
+		break;
+	}
+
+	i915_gem_context_put(ctx);
+	return ret;
+}
+
+int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
+				    struct drm_file *file)
+{
+	struct drm_i915_file_private *file_priv = file->driver_priv;
+	struct drm_i915_gem_context_param *args = data;
+	struct i915_gem_context *ctx;
+	int ret;
+
+	ctx = i915_gem_context_lookup(file_priv, args->ctx_id);
+	if (!ctx)
+		return -ENOENT;
+
+	ret = ctx_setparam(ctx, args);
+
 	i915_gem_context_put(ctx);
 	return ret;
 }
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 2cd79639d6b5..c9ba7e408117 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -389,6 +389,7 @@ typedef struct _drm_i915_sarea {
 #define DRM_IOCTL_I915_GET_SPRITE_COLORKEY DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GET_SPRITE_COLORKEY, struct drm_intel_sprite_colorkey)
 #define DRM_IOCTL_I915_GEM_WAIT		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_WAIT, struct drm_i915_gem_wait)
 #define DRM_IOCTL_I915_GEM_CONTEXT_CREATE	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE, struct drm_i915_gem_context_create)
+#define DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE, struct drm_i915_gem_context_create_ext)
 #define DRM_IOCTL_I915_GEM_CONTEXT_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_DESTROY, struct drm_i915_gem_context_destroy)
 #define DRM_IOCTL_I915_REG_READ			DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_REG_READ, struct drm_i915_reg_read)
 #define DRM_IOCTL_I915_GET_RESET_STATS		DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GET_RESET_STATS, struct drm_i915_reset_stats)
@@ -1438,85 +1439,14 @@ struct drm_i915_gem_wait {
 };
 
 struct drm_i915_gem_context_create {
-	/*  output: id of new context*/
-	__u32 ctx_id;
-	__u32 pad;
-};
-
-struct drm_i915_gem_context_destroy {
-	__u32 ctx_id;
+	__u32 ctx_id; /* output: id of new context*/
 	__u32 pad;
 };
 
-/*
- * DRM_I915_GEM_VM_CREATE -
- *
- * Create a new virtual memory address space (ppGTT) for use within a context
- * on the same file. Extensions can be provided to configure exactly how the
- * address space is setup upon creation.
- *
- * The id of new VM (bound to the fd) for use with I915_CONTEXT_PARAM_VM is
- * returned.
- *
- * DRM_I915_GEM_VM_DESTROY -
- *
- * Destroys a previously created VM id.
- */
-struct drm_i915_gem_vm_control {
-	__u64 extensions;
+struct drm_i915_gem_context_create_ext {
+	__u32 ctx_id; /* output: id of new context*/
 	__u32 flags;
-	__u32 id;
-};
-
-struct drm_i915_reg_read {
-	/*
-	 * Register offset.
-	 * For 64bit wide registers where the upper 32bits don't immediately
-	 * follow the lower 32bits, the offset of the lower 32bits must
-	 * be specified
-	 */
-	__u64 offset;
-#define I915_REG_READ_8B_WA (1ul << 0)
-
-	__u64 val; /* Return value */
-};
-/* Known registers:
- *
- * Render engine timestamp - 0x2358 + 64bit - gen7+
- * - Note this register returns an invalid value if using the default
- *   single instruction 8byte read, in order to workaround that pass
- *   flag I915_REG_READ_8B_WA in offset field.
- *
- */
-
-struct drm_i915_reset_stats {
-	__u32 ctx_id;
-	__u32 flags;
-
-	/* All resets since boot/module reload, for all contexts */
-	__u32 reset_count;
-
-	/* Number of batches lost when active in GPU, for this context */
-	__u32 batch_active;
-
-	/* Number of batches lost pending for execution, for this context */
-	__u32 batch_pending;
-
-	__u32 pad;
-};
-
-struct drm_i915_gem_userptr {
-	__u64 user_ptr;
-	__u64 user_size;
-	__u32 flags;
-#define I915_USERPTR_READ_ONLY 0x1
-#define I915_USERPTR_UNSYNCHRONIZED 0x80000000
-	/**
-	 * Returned handle for the object.
-	 *
-	 * Object handles are nonzero.
-	 */
-	__u32 handle;
+	__u64 extensions;
 };
 
 struct drm_i915_gem_context_param {
@@ -1610,6 +1540,89 @@ struct drm_i915_gem_context_param_sseu {
 	__u32 rsvd;
 };
 
+struct drm_i915_gem_context_create_ext_setparam {
+#define I915_CONTEXT_CREATE_EXT_SETPARAM 0
+	struct i915_user_extension base;
+	struct drm_i915_gem_context_param setparam;
+};
+
+struct drm_i915_gem_context_destroy {
+	__u32 ctx_id;
+	__u32 pad;
+};
+
+/*
+ * DRM_I915_GEM_VM_CREATE -
+ *
+ * Create a new virtual memory address space (ppGTT) for use within a context
+ * on the same file. Extensions can be provided to configure exactly how the
+ * address space is setup upon creation.
+ *
+ * The id of new VM (bound to the fd) for use with I915_CONTEXT_PARAM_VM is
+ * returned.
+ *
+ * DRM_I915_GEM_VM_DESTROY -
+ *
+ * Destroys a previously created VM id.
+ */
+struct drm_i915_gem_vm_control {
+	__u64 extensions;
+	__u32 flags;
+	__u32 id;
+};
+
+struct drm_i915_reg_read {
+	/*
+	 * Register offset.
+	 * For 64bit wide registers where the upper 32bits don't immediately
+	 * follow the lower 32bits, the offset of the lower 32bits must
+	 * be specified
+	 */
+	__u64 offset;
+#define I915_REG_READ_8B_WA (1ul << 0)
+
+	__u64 val; /* Return value */
+};
+
+/* Known registers:
+ *
+ * Render engine timestamp - 0x2358 + 64bit - gen7+
+ * - Note this register returns an invalid value if using the default
+ *   single instruction 8byte read, in order to workaround that pass
+ *   flag I915_REG_READ_8B_WA in offset field.
+ *
+ */
+
+struct drm_i915_reset_stats {
+	__u32 ctx_id;
+	__u32 flags;
+
+	/* All resets since boot/module reload, for all contexts */
+	__u32 reset_count;
+
+	/* Number of batches lost when active in GPU, for this context */
+	__u32 batch_active;
+
+	/* Number of batches lost pending for execution, for this context */
+	__u32 batch_pending;
+
+	__u32 pad;
+};
+
+struct drm_i915_gem_userptr {
+	__u64 user_ptr;
+	__u64 user_size;
+	__u32 flags;
+#define I915_USERPTR_READ_ONLY 0x1
+#define I915_USERPTR_UNSYNCHRONIZED 0x80000000
+	/**
+	 * Returned handle for the object.
+	 *
+	 * Object handles are nonzero.
+	 */
+	__u32 handle;
+};
+
 enum drm_i915_oa_format {
 	I915_OA_FORMAT_A13 = 1,	    /* HSW only */
 	I915_OA_FORMAT_A29,	    /* HSW only */
-- 
2.20.1

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH 34/46] drm/i915: Allow contexts to share a single timeline across all engines
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (32 preceding siblings ...)
  2019-02-06 13:03 ` [PATCH 33/46] drm/i915: Extend CONTEXT_CREATE to set parameters upon construction Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-06 13:03 ` [PATCH 35/46] drm/i915: Fix I915_EXEC_RING_MASK Chris Wilson
                   ` (18 subsequent siblings)
  52 siblings, 0 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx

Previously, our view has always been to run the engines independently
within a context. (Multiple engines predate contexts and timelines, so
they always operated independently and that behaviour persisted into
contexts.) However, at the user level the context often represents a
single timeline (e.g. GL contexts), and userspace must ensure that the
individual engines are serialised to present that ordering to the
client (or forget about this detail entirely and hope no one notices -
a fair ploy if the client can only directly control one engine
themselves ;)

In the next patch, we will want to construct a set of engines that
operate as one, with a single timeline interwoven between them, to
present a single virtual engine to the user. (They submit to the
virtual engine, and we then decide which physical engine to execute on
based on load.)

To that end, we want to be able to create contexts which have a single
timeline (fence context) shared between all engines, rather than multiple
timelines.
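
As a rough sketch (reusing the CONTEXT_CREATE_EXT ioctl from the
previous patch, and assuming the flag name added to the uapi by this
patch), requesting such a context could look like:

/* Ask for one timeline shared by all engines of the new context. */
static int create_single_timeline_ctx(int fd, __u32 *ctx_id)
{
	struct drm_i915_gem_context_create_ext create = {
		.flags = I915_GEM_CONTEXT_SINGLE_TIMELINE,
	};
	int err;

	err = drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT, &create);
	if (err)
		return err;

	/*
	 * Requests submitted to any engine of this context now retire in
	 * submission order, without explicit inter-engine fences.
	 */
	*ctx_id = create.ctx_id;
	return 0;
}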

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_context.c       | 35 +++++++++++++----
 drivers/gpu/drm/i915/i915_gem_context.h       |  3 ++
 drivers/gpu/drm/i915/i915_request.c           | 10 ++++-
 drivers/gpu/drm/i915/i915_request.h           |  5 ++-
 drivers/gpu/drm/i915/i915_sw_fence.c          | 39 ++++++++++++++++---
 drivers/gpu/drm/i915/i915_sw_fence.h          | 13 ++++++-
 drivers/gpu/drm/i915/intel_lrc.c              |  5 ++-
 .../gpu/drm/i915/selftests/i915_gem_context.c | 19 +++++----
 drivers/gpu/drm/i915/selftests/mock_context.c |  2 +-
 include/uapi/drm/i915_drm.h                   |  1 +
 10 files changed, 105 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 609ef59f4d95..2e2de0532c08 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -225,6 +225,9 @@ static void i915_gem_context_free(struct i915_gem_context *ctx)
 			ce->ops->destroy(ce);
 	}
 
+	if (ctx->timeline)
+		i915_timeline_put(ctx->timeline);
+
 	kfree(ctx->name);
 	put_pid(ctx->pid);
 
@@ -456,12 +459,17 @@ static void __assign_ppgtt(struct i915_gem_context *ctx,
 
 static struct i915_gem_context *
 i915_gem_create_context(struct drm_i915_private *dev_priv,
-			struct drm_i915_file_private *file_priv)
+			struct drm_i915_file_private *file_priv,
+			unsigned int flags)
 {
 	struct i915_gem_context *ctx;
 
 	lockdep_assert_held(&dev_priv->drm.struct_mutex);
 
+	if (flags & I915_GEM_CONTEXT_SINGLE_TIMELINE &&
+	    !HAS_EXECLISTS(dev_priv))
+		return ERR_PTR(-EINVAL);
+
 	/* Reap the most stale context */
 	contexts_free_first(dev_priv);
 
@@ -484,6 +492,18 @@ i915_gem_create_context(struct drm_i915_private *dev_priv,
 		i915_ppgtt_put(ppgtt);
 	}
 
+	if (flags & I915_GEM_CONTEXT_SINGLE_TIMELINE) {
+		struct i915_timeline *timeline;
+
+		timeline = i915_timeline_create(dev_priv, ctx->name, NULL);
+		if (IS_ERR(timeline)) {
+			__destroy_hw_context(ctx, file_priv);
+			return ERR_CAST(timeline);
+		}
+
+		ctx->timeline = timeline;
+	}
+
 	trace_i915_context_create(ctx);
 
 	return ctx;
@@ -512,7 +532,7 @@ i915_gem_context_create_gvt(struct drm_device *dev)
 	if (ret)
 		return ERR_PTR(ret);
 
-	ctx = i915_gem_create_context(to_i915(dev), NULL);
+	ctx = i915_gem_create_context(to_i915(dev), NULL, 0);
 	if (IS_ERR(ctx))
 		goto out;
 
@@ -548,7 +568,7 @@ i915_gem_context_create_kernel(struct drm_i915_private *i915, int prio)
 	struct i915_gem_context *ctx;
 	int err;
 
-	ctx = i915_gem_create_context(i915, NULL);
+	ctx = i915_gem_create_context(i915, NULL, 0);
 	if (IS_ERR(ctx))
 		return ctx;
 
@@ -680,7 +700,7 @@ int i915_gem_context_open(struct drm_i915_private *i915,
 	idr_init_base(&file_priv->vm_idr, 1);
 
 	mutex_lock(&i915->drm.struct_mutex);
-	ctx = i915_gem_create_context(i915, file_priv);
+	ctx = i915_gem_create_context(i915, file_priv, 0);
 	mutex_unlock(&i915->drm.struct_mutex);
 	if (IS_ERR(ctx)) {
 		idr_destroy(&file_priv->context_idr);
@@ -796,7 +816,7 @@ last_request_on_engine(struct i915_timeline *timeline,
 
 	rq = i915_active_request_raw(&timeline->last_request,
 				     &engine->i915->drm.struct_mutex);
-	if (rq && rq->engine == engine) {
+	if (rq && rq->engine->mask & engine->mask) {
 		GEM_TRACE("last request for %s on engine %s: %llx:%llu\n",
 			  timeline->name, engine->name,
 			  rq->fence.context, rq->fence.seqno);
@@ -1426,7 +1446,8 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
 	if (!DRIVER_CAPS(dev_priv)->has_logical_contexts)
 		return -ENODEV;
 
-	if (args->flags)
+	if (args->flags &
+	    ~(I915_GEM_CONTEXT_SINGLE_TIMELINE))
 		return -EINVAL;
 
 	if (client_is_banned(file_priv)) {
@@ -1441,7 +1462,7 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
 	if (ret)
 		return ret;
 
-	ctx = i915_gem_create_context(dev_priv, file_priv);
+	ctx = i915_gem_create_context(dev_priv, file_priv, args->flags);
 	mutex_unlock(&dev->struct_mutex);
 	if (IS_ERR(ctx))
 		return PTR_ERR(ctx);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index c5a6cb10dbda..3bd1faabbc3f 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -42,6 +42,7 @@ struct drm_i915_private;
 struct drm_i915_file_private;
 struct i915_hw_ppgtt;
 struct i915_request;
+struct i915_timeline;
 struct i915_vma;
 struct intel_ring;
 
@@ -77,6 +78,8 @@ struct i915_gem_context {
 	/** file_priv: owning file descriptor */
 	struct drm_i915_file_private *file_priv;
 
+	struct i915_timeline *timeline;
+
 	/**
 	 * @ppgtt: unique address space (GTT)
 	 *
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 3aaf9d32768d..dc79d57ffb84 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -1036,8 +1036,14 @@ void i915_request_add(struct i915_request *request)
 	prev = i915_active_request_raw(&timeline->last_request,
 				       &request->i915->drm.struct_mutex);
 	if (prev && !i915_request_completed(prev)) {
-		i915_sw_fence_await_sw_fence(&request->submit, &prev->submit,
-					     &request->submitq);
+		if (is_power_of_2(prev->engine->mask | engine->mask))
+			i915_sw_fence_await_sw_fence(&request->submit,
+						     &prev->submit,
+						     &request->submitq);
+		else
+			__i915_sw_fence_await_dma_fence(&request->submit,
+							&prev->fence,
+							&request->dmaq);
 		if (engine->schedule)
 			__i915_sched_node_add_dependency(&request->sched,
 							 &prev->sched,
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 5a32167ee892..35153fa52c8c 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -127,7 +127,10 @@ struct i915_request {
 	 * It is used by the driver to then queue the request for execution.
 	 */
 	struct i915_sw_fence submit;
-	wait_queue_entry_t submitq;
+	union {
+		wait_queue_entry_t submitq;
+		struct i915_sw_dma_fence_cb dmaq;
+	};
 	struct list_head execute_cb;
 
 	/*
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c
index 8d1400d378d7..5387aafd3424 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence.c
@@ -359,11 +359,6 @@ int i915_sw_fence_await_sw_fence_gfp(struct i915_sw_fence *fence,
 	return __i915_sw_fence_await_sw_fence(fence, signaler, NULL, gfp);
 }
 
-struct i915_sw_dma_fence_cb {
-	struct dma_fence_cb base;
-	struct i915_sw_fence *fence;
-};
-
 struct i915_sw_dma_fence_cb_timer {
 	struct i915_sw_dma_fence_cb base;
 	struct dma_fence *dma;
@@ -480,6 +475,40 @@ int i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
 	return ret;
 }
 
+static void __dma_i915_sw_fence_wake(struct dma_fence *dma,
+				     struct dma_fence_cb *data)
+{
+	struct i915_sw_dma_fence_cb *cb = container_of(data, typeof(*cb), base);
+
+	i915_sw_fence_complete(cb->fence);
+}
+
+int __i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
+				    struct dma_fence *dma,
+				    struct i915_sw_dma_fence_cb *cb)
+{
+	int ret;
+
+	debug_fence_assert(fence);
+
+	if (dma_fence_is_signaled(dma))
+		return 0;
+
+	cb->fence = fence;
+	i915_sw_fence_await(fence);
+
+	ret = dma_fence_add_callback(dma, &cb->base, __dma_i915_sw_fence_wake);
+	if (ret == 0) {
+		ret = 1;
+	} else {
+		i915_sw_fence_complete(fence);
+		if (ret == -ENOENT) /* fence already signaled */
+			ret = 0;
+	}
+
+	return ret;
+}
+
 int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
 				    struct reservation_object *resv,
 				    const struct dma_fence_ops *exclude,
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.h b/drivers/gpu/drm/i915/i915_sw_fence.h
index 6dec9e1d1102..9cb5c3b307a6 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.h
+++ b/drivers/gpu/drm/i915/i915_sw_fence.h
@@ -9,14 +9,13 @@
 #ifndef _I915_SW_FENCE_H_
 #define _I915_SW_FENCE_H_
 
+#include <linux/dma-fence.h>
 #include <linux/gfp.h>
 #include <linux/kref.h>
 #include <linux/notifier.h> /* for NOTIFY_DONE */
 #include <linux/wait.h>
 
 struct completion;
-struct dma_fence;
-struct dma_fence_ops;
 struct reservation_object;
 
 struct i915_sw_fence {
@@ -68,10 +67,20 @@ int i915_sw_fence_await_sw_fence(struct i915_sw_fence *fence,
 int i915_sw_fence_await_sw_fence_gfp(struct i915_sw_fence *fence,
 				     struct i915_sw_fence *after,
 				     gfp_t gfp);
+
+struct i915_sw_dma_fence_cb {
+	struct dma_fence_cb base;
+	struct i915_sw_fence *fence;
+};
+
+int __i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
+				    struct dma_fence *dma,
+				    struct i915_sw_dma_fence_cb *cb);
 int i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
 				  struct dma_fence *dma,
 				  unsigned long timeout,
 				  gfp_t gfp);
+
 int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
 				    struct reservation_object *resv,
 				    const struct dma_fence_ops *exclude,
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index b3555b1b0e07..d378ceae813f 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2843,7 +2843,10 @@ static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
 		goto error_deref_obj;
 	}
 
-	timeline = i915_timeline_create(ctx->i915, ctx->name, NULL);
+	if (ctx->timeline)
+		timeline = i915_timeline_get(ctx->timeline);
+	else
+		timeline = i915_timeline_create(ctx->i915, ctx->name, NULL);
 	if (IS_ERR(timeline)) {
 		ret = PTR_ERR(timeline);
 		goto error_deref_obj;
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
index a76a4f6f67e4..ccc40059c31d 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
@@ -76,7 +76,7 @@ static int live_nop_switch(void *arg)
 	}
 
 	for (n = 0; n < nctx; n++) {
-		ctx[n] = i915_gem_create_context(i915, file->driver_priv);
+		ctx[n] = i915_gem_create_context(i915, file->driver_priv, 0);
 		if (IS_ERR(ctx[n])) {
 			err = PTR_ERR(ctx[n]);
 			goto out_unlock;
@@ -526,7 +526,8 @@ static int igt_ctx_exec(void *arg)
 			struct i915_gem_context *ctx;
 			intel_wakeref_t wakeref;
 
-			ctx = i915_gem_create_context(i915, file->driver_priv);
+			ctx = i915_gem_create_context(i915,
+						      file->driver_priv, 0);
 			if (IS_ERR(ctx)) {
 				err = PTR_ERR(ctx);
 				goto out_unlock;
@@ -623,7 +624,8 @@ static int igt_shared_ctx_exec(void *arg)
 		if (err)
 			goto out_unlock;
 
-		parent = i915_gem_create_context(i915, file->driver_priv);
+		parent = i915_gem_create_context(i915,
+						 file->driver_priv, 0);
 		if (IS_ERR(parent)) {
 			err = PTR_ERR(parent);
 			if (err == -ENODEV) /* no logical ctx support */
@@ -645,7 +647,8 @@ static int igt_shared_ctx_exec(void *arg)
 			if (ctx)
 				__destroy_hw_context(ctx, file->driver_priv);
 
-			ctx = i915_gem_create_context(i915, file->driver_priv);
+			ctx = i915_gem_create_context(i915,
+						      file->driver_priv, 0);
 			if (IS_ERR(ctx)) {
 				err = PTR_ERR(ctx);
 				goto out_unlock;
@@ -1092,7 +1095,7 @@ __igt_ctx_sseu(struct drm_i915_private *i915,
 
 	mutex_lock(&i915->drm.struct_mutex);
 
-	ctx = i915_gem_create_context(i915, file->driver_priv);
+	ctx = i915_gem_create_context(i915, file->driver_priv, 0);
 	if (IS_ERR(ctx)) {
 		ret = PTR_ERR(ctx);
 		goto out_unlock;
@@ -1201,7 +1204,7 @@ static int igt_ctx_readonly(void *arg)
 	if (err)
 		goto out_unlock;
 
-	ctx = i915_gem_create_context(i915, file->driver_priv);
+	ctx = i915_gem_create_context(i915, file->driver_priv, 0);
 	if (IS_ERR(ctx)) {
 		err = PTR_ERR(ctx);
 		goto out_unlock;
@@ -1527,13 +1530,13 @@ static int igt_vm_isolation(void *arg)
 	if (err)
 		goto out_unlock;
 
-	ctx_a = i915_gem_create_context(i915, file->driver_priv);
+	ctx_a = i915_gem_create_context(i915, file->driver_priv, 0);
 	if (IS_ERR(ctx_a)) {
 		err = PTR_ERR(ctx_a);
 		goto out_unlock;
 	}
 
-	ctx_b = i915_gem_create_context(i915, file->driver_priv);
+	ctx_b = i915_gem_create_context(i915, file->driver_priv, 0);
 	if (IS_ERR(ctx_b)) {
 		err = PTR_ERR(ctx_b);
 		goto out_unlock;
diff --git a/drivers/gpu/drm/i915/selftests/mock_context.c b/drivers/gpu/drm/i915/selftests/mock_context.c
index 2189b606ca41..8137ff6f01b2 100644
--- a/drivers/gpu/drm/i915/selftests/mock_context.c
+++ b/drivers/gpu/drm/i915/selftests/mock_context.c
@@ -94,7 +94,7 @@ live_context(struct drm_i915_private *i915, struct drm_file *file)
 {
 	lockdep_assert_held(&i915->drm.struct_mutex);
 
-	return i915_gem_create_context(i915, file->driver_priv);
+	return i915_gem_create_context(i915, file->driver_priv, 0);
 }
 
 struct i915_gem_context *
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index c9ba7e408117..0c1f97fa2101 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1446,6 +1446,7 @@ struct drm_i915_gem_context_create {
 struct drm_i915_gem_context_create_ext {
 	__u32 ctx_id; /* output: id of new context*/
 	__u32 flags;
+#define I915_GEM_CONTEXT_SINGLE_TIMELINE	0x1
 	__u64 extensions;
 };
 
-- 
2.20.1


* [PATCH 35/46] drm/i915: Fix I915_EXEC_RING_MASK
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (33 preceding siblings ...)
  2019-02-06 13:03 ` [PATCH 34/46] drm/i915: Allow contexts to share a single timeline across all engines Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-06 13:03 ` [PATCH 36/46] drm/i915: Remove last traces of exec-id (GEM_BUSY) Chris Wilson
                   ` (17 subsequent siblings)
  52 siblings, 0 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx; +Cc: tvrtko.ursulin, Chris Wilson, stable

This was supposed to be a mask of all known rings, but execbuffer uses
it to filter out invalid rings, so the too-small mask silently maps
high, unused values onto valid rings instead of rejecting them. Rather
than a mask of all known rings, we need the mask of all possible rings.
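
To illustrate (a simplified userspace-visible sketch, not the driver
code; the helper name is made up for this example), the filtering done
on the execbuf ring selector looks roughly like:

  #include <drm/i915_drm.h>

  /*
   * Sketch of execbuf's ring selection. With the old 3-bit mask a
   * selector such as 9 is truncated to 9 & 7 = 1 (I915_EXEC_RENDER)
   * and sails past the range check; with the 6-bit mask it stays 9
   * and is correctly rejected.
   */
  static int select_ring(unsigned int flags)
  {
      unsigned int user_ring_id = flags & I915_EXEC_RING_MASK;

      if (user_ring_id > I915_EXEC_VEBOX) /* last legacy selector */
          return -1; /* -EINVAL in the driver */

      return user_ring_id;
  }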

Fixes: 549f7365820a ("drm/i915: Enable SandyBridge blitter ring")
Fixes: de1add360522 ("drm/i915: Decouple execbuf uAPI from internal implementation")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: <stable@vger.kernel.org> # v4.6+
---
 include/uapi/drm/i915_drm.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 0c1f97fa2101..79f7299783a8 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -999,7 +999,7 @@ struct drm_i915_gem_execbuffer2 {
 	 * struct drm_i915_gem_exec_fence *fences.
 	 */
 	__u64 cliprects_ptr;
-#define I915_EXEC_RING_MASK              (7<<0)
+#define I915_EXEC_RING_MASK              (0x3f)
 #define I915_EXEC_DEFAULT                (0<<0)
 #define I915_EXEC_RENDER                 (1<<0)
 #define I915_EXEC_BSD                    (2<<0)
-- 
2.20.1



* [PATCH 36/46] drm/i915: Remove last traces of exec-id (GEM_BUSY)
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (34 preceding siblings ...)
  2019-02-06 13:03 ` [PATCH 35/46] drm/i915: Fix I915_EXEC_RING_MASK Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-06 13:03 ` [PATCH 37/46] drm/i915: Re-arrange execbuf so context is known before engine Chris Wilson
                   ` (16 subsequent siblings)
  52 siblings, 0 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx

As we allow per-context engines, the legacy concept of I915_EXEC_RING
no longer applies universally. We are still exposing the unrelated
exec-id in GEM_BUSY, so transition this ioctl (once more slightly
changing its ABI, but no one cares) over to only reporting the
uabi-class (not the instance, as we cannot foreseeably fit those into
the small bitmask).

The only user of the extended ring information from GEM_BUSY is ddx/sna,
which tries to use the non-rcs busyness information to guide which
engine to use for subsequent operations on foreign bo. All that matters
for it is the decision between rcs and !rcs, so it is unaffected by the
change in the higher bits.
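
For illustration only (a userspace sketch, not part of this patch; the
function name is made up), decoding the new report would look something
like:

  #include <stdio.h>
  #include <drm/i915_drm.h>

  /*
   * The low word now carries the uabi-class of the last writer,
   * offset by one so that zero still means "no outstanding write";
   * the high word is a bitmask of uabi-classes with outstanding reads.
   */
  static void decode_busy(const struct drm_i915_gem_busy *args)
  {
      unsigned int write_field = args->busy & 0xffff;
      unsigned int read_classes = args->busy >> 16;

      if (write_field)
          printf("last write on engine class %u\n", write_field - 1);
      if (read_classes & (1u << I915_ENGINE_CLASS_RENDER))
          printf("being read by a render engine\n");
  }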

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c         | 32 +++++++++++++------------
 drivers/gpu/drm/i915/intel_engine_cs.c  | 10 --------
 drivers/gpu/drm/i915/intel_ringbuffer.h |  1 -
 include/uapi/drm/i915_drm.h             | 32 +++++++++++++------------
 4 files changed, 34 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 81e950bac246..d50588d54d0b 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3886,20 +3886,17 @@ i915_gem_object_ggtt_pin(struct drm_i915_gem_object *obj,
 
 static __always_inline unsigned int __busy_read_flag(unsigned int id)
 {
-	/* Note that we could alias engines in the execbuf API, but
-	 * that would be very unwise as it prevents userspace from
-	 * fine control over engine selection. Ahem.
-	 *
-	 * This should be something like EXEC_MAX_ENGINE instead of
-	 * I915_NUM_ENGINES.
-	 */
-	BUILD_BUG_ON(I915_NUM_ENGINES > 16);
+	if (id == I915_ENGINE_CLASS_INVALID)
+		return 0xffff0000;
+
+	GEM_BUG_ON(id >= 16);
 	return 0x10000 << id;
 }
 
 static __always_inline unsigned int __busy_write_id(unsigned int id)
 {
-	/* The uABI guarantees an active writer is also amongst the read
+	/*
+	 * The uABI guarantees an active writer is also amongst the read
 	 * engines. This would be true if we accessed the activity tracking
 	 * under the lock, but as we perform the lookup of the object and
 	 * its activity locklessly we can not guarantee that the last_write
@@ -3907,16 +3904,20 @@ static __always_inline unsigned int __busy_write_id(unsigned int id)
 	 * last_read - hence we always set both read and write busy for
 	 * last_write.
 	 */
-	return id | __busy_read_flag(id);
+	if (id == I915_ENGINE_CLASS_INVALID)
+		return 0xffffffff;
+
+	return (id + 1) | __busy_read_flag(id);
 }
 
 static __always_inline unsigned int
 __busy_set_if_active(const struct dma_fence *fence,
 		     unsigned int (*flag)(unsigned int id))
 {
-	struct i915_request *rq;
+	const struct i915_request *rq;
 
-	/* We have to check the current hw status of the fence as the uABI
+	/*
+	 * We have to check the current hw status of the fence as the uABI
 	 * guarantees forward progress. We could rely on the idle worker
 	 * to eventually flush us, but to minimise latency just ask the
 	 * hardware.
@@ -3927,11 +3928,11 @@ __busy_set_if_active(const struct dma_fence *fence,
 		return 0;
 
 	/* opencode to_request() in order to avoid const warnings */
-	rq = container_of(fence, struct i915_request, fence);
+	rq = container_of(fence, const struct i915_request, fence);
 	if (i915_request_completed(rq))
 		return 0;
 
-	return flag(rq->engine->uabi_id);
+	return flag(rq->engine->uabi_class);
 }
 
 static __always_inline unsigned int
@@ -3965,7 +3966,8 @@ i915_gem_busy_ioctl(struct drm_device *dev, void *data,
 	if (!obj)
 		goto out;
 
-	/* A discrepancy here is that we do not report the status of
+	/*
+	 * A discrepancy here is that we do not report the status of
 	 * non-i915 fences, i.e. even though we may report the object as idle,
 	 * a call to set-domain may still stall waiting for foreign rendering.
 	 * This also means that wait-ioctl may report an object as busy,
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 94de0ba1d92b..0372aaa9756c 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -84,7 +84,6 @@ static const struct engine_class_info intel_engine_classes[] = {
 #define MAX_MMIO_BASES 3
 struct engine_info {
 	unsigned int hw_id;
-	unsigned int uabi_id;
 	u8 class;
 	u8 instance;
 	/* mmio bases table *must* be sorted in reverse gen order */
@@ -97,7 +96,6 @@ struct engine_info {
 static const struct engine_info intel_engines[] = {
 	[RCS] = {
 		.hw_id = RCS_HW,
-		.uabi_id = I915_EXEC_RENDER,
 		.class = RENDER_CLASS,
 		.instance = 0,
 		.mmio_bases = {
@@ -106,7 +104,6 @@ static const struct engine_info intel_engines[] = {
 	},
 	[BCS] = {
 		.hw_id = BCS_HW,
-		.uabi_id = I915_EXEC_BLT,
 		.class = COPY_ENGINE_CLASS,
 		.instance = 0,
 		.mmio_bases = {
@@ -115,7 +112,6 @@ static const struct engine_info intel_engines[] = {
 	},
 	[VCS] = {
 		.hw_id = VCS_HW,
-		.uabi_id = I915_EXEC_BSD,
 		.class = VIDEO_DECODE_CLASS,
 		.instance = 0,
 		.mmio_bases = {
@@ -126,7 +122,6 @@ static const struct engine_info intel_engines[] = {
 	},
 	[VCS2] = {
 		.hw_id = VCS2_HW,
-		.uabi_id = I915_EXEC_BSD,
 		.class = VIDEO_DECODE_CLASS,
 		.instance = 1,
 		.mmio_bases = {
@@ -136,7 +131,6 @@ static const struct engine_info intel_engines[] = {
 	},
 	[VCS3] = {
 		.hw_id = VCS3_HW,
-		.uabi_id = I915_EXEC_BSD,
 		.class = VIDEO_DECODE_CLASS,
 		.instance = 2,
 		.mmio_bases = {
@@ -145,7 +139,6 @@ static const struct engine_info intel_engines[] = {
 	},
 	[VCS4] = {
 		.hw_id = VCS4_HW,
-		.uabi_id = I915_EXEC_BSD,
 		.class = VIDEO_DECODE_CLASS,
 		.instance = 3,
 		.mmio_bases = {
@@ -154,7 +147,6 @@ static const struct engine_info intel_engines[] = {
 	},
 	[VECS] = {
 		.hw_id = VECS_HW,
-		.uabi_id = I915_EXEC_VEBOX,
 		.class = VIDEO_ENHANCEMENT_CLASS,
 		.instance = 0,
 		.mmio_bases = {
@@ -164,7 +156,6 @@ static const struct engine_info intel_engines[] = {
 	},
 	[VECS2] = {
 		.hw_id = VECS2_HW,
-		.uabi_id = I915_EXEC_VEBOX,
 		.class = VIDEO_ENHANCEMENT_CLASS,
 		.instance = 1,
 		.mmio_bases = {
@@ -324,7 +315,6 @@ intel_engine_setup(struct drm_i915_private *dev_priv,
 	engine->class = info->class;
 	engine->instance = info->instance;
 
-	engine->uabi_id = info->uabi_id;
 	engine->uabi_class = intel_engine_classes[info->class].uabi_class;
 
 	engine->context_size = __intel_engine_context_size(dev_priv,
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index c0027e058b1f..eb956479f48c 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -336,7 +336,6 @@ struct intel_engine_cs {
 	unsigned int guc_id;
 	unsigned long mask;
 
-	u8 uabi_id;
 	u8 uabi_class;
 
 	u8 class;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 79f7299783a8..0c5566b2d244 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1147,32 +1147,34 @@ struct drm_i915_gem_busy {
 	 * as busy may become idle before the ioctl is completed.
 	 *
 	 * Furthermore, if the object is busy, which engine is busy is only
-	 * provided as a guide. There are race conditions which prevent the
-	 * report of which engines are busy from being always accurate.
-	 * However, the converse is not true. If the object is idle, the
-	 * result of the ioctl, that all engines are idle, is accurate.
+	 * provided as a guide and only indirectly by reporting its class
+	 * (there may be more than one engine in each class). There are race
+	 * conditions which prevent the report of which engines are busy from
+	 * being always accurate.  However, the converse is not true. If the
+	 * object is idle, the result of the ioctl, that all engines are idle,
+	 * is accurate.
 	 *
 	 * The returned dword is split into two fields to indicate both
-	 * the engines on which the object is being read, and the
-	 * engine on which it is currently being written (if any).
+	 * the engine classes on which the object is being read, and the
+	 * engine class on which it is currently being written (if any).
 	 *
 	 * The low word (bits 0:15) indicate if the object is being written
 	 * to by any engine (there can only be one, as the GEM implicit
 	 * synchronisation rules force writes to be serialised). Only the
-	 * engine for the last write is reported.
+	 * engine class (offset by 1, I915_ENGINE_CLASS_RENDER is reported as
+	 * 1 not 0 etc) for the last write is reported.
 	 *
-	 * The high word (bits 16:31) are a bitmask of which engines are
-	 * currently reading from the object. Multiple engines may be
+	 * The high word (bits 16:31) are a bitmask of which engine classes
+	 * are currently reading from the object. Multiple engines may be
 	 * reading from the object simultaneously.
 	 *
-	 * The value of each engine is the same as specified in the
-	 * EXECBUFFER2 ioctl, i.e. I915_EXEC_RENDER, I915_EXEC_BSD etc.
-	 * Note I915_EXEC_DEFAULT is a symbolic value and is mapped to
-	 * the I915_EXEC_RENDER engine for execution, and so it is never
+	 * The value of each engine class is the same as specified in the
+	 * I915_CONTEXT_SET_ENGINES parameter and via perf, i.e.
+	 * I915_ENGINE_CLASS_RENDER, I915_ENGINE_CLASS_COPY, etc.
 	 * reported as active itself. Some hardware may have parallel
 	 * execution engines, e.g. multiple media engines, which are
-	 * mapped to the same identifier in the EXECBUFFER2 ioctl and
-	 * so are not separately reported for busyness.
+	 * mapped to the same class identifier and so are not separately
+	 * reported for busyness.
 	 *
 	 * Caveat emptor:
 	 * Only the boolean result of this query is reliable; that is whether
-- 
2.20.1


* [PATCH 37/46] drm/i915: Re-arrange execbuf so context is known before engine
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (35 preceding siblings ...)
  2019-02-06 13:03 ` [PATCH 36/46] drm/i915: Remove last traces of exec-id (GEM_BUSY) Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-06 13:03 ` [PATCH 38/46] drm/i915: Allow a context to define its set of engines Chris Wilson
                   ` (15 subsequent siblings)
  52 siblings, 0 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Needed for a following patch.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 11 +++++++----
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 84ef3abc567e..859625474f58 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -2308,10 +2308,6 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	if (args->flags & I915_EXEC_IS_PINNED)
 		eb.batch_flags |= I915_DISPATCH_PINNED;
 
-	eb.engine = eb_select_engine(eb.i915, file, args);
-	if (!eb.engine)
-		return -EINVAL;
-
 	if (args->flags & I915_EXEC_FENCE_IN) {
 		in_fence = sync_file_get_fence(lower_32_bits(args->rsvd2));
 		if (!in_fence)
@@ -2336,6 +2332,12 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	if (unlikely(err))
 		goto err_destroy;
 
+	eb.engine = eb_select_engine(eb.i915, file, args);
+	if (!eb.engine) {
+		err = -EINVAL;
+		goto err_engine;
+	}
+
 	/*
 	 * Take a local wakeref for preparing to dispatch the execbuf as
 	 * we expect to access the hardware fairly frequently in the
@@ -2501,6 +2503,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	mutex_unlock(&dev->struct_mutex);
 err_rpm:
 	intel_runtime_pm_put(eb.i915, wakeref);
+err_engine:
 	i915_gem_context_put(eb.ctx);
 err_destroy:
 	eb_destroy(&eb);
-- 
2.20.1


* [PATCH 38/46] drm/i915: Allow a context to define its set of engines
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (36 preceding siblings ...)
  2019-02-06 13:03 ` [PATCH 37/46] drm/i915: Re-arrange execbuf so context is known before engine Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-25 10:41   ` Tvrtko Ursulin
  2019-02-06 13:03 ` [PATCH 39/46] drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[] Chris Wilson
                   ` (14 subsequent siblings)
  52 siblings, 1 reply; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx

Over the last few years, we have debated how to extend the user API to
support an increase in the number of engines, which may be sparse and
even heterogeneous within a class (not all video decoders are created
equal). We settled on using (class, instance) tuples to identify a
specific engine, with an API for the user to construct a map of engines
to capabilities. Into this picture, we then add a challenge of virtual
engines; one user engine that maps behind the scenes to any number of
physical engines. To keep it general, we want the user to have full
control over that mapping. To that end, we allow the user to constrain a
context to define the set of engines that it can access, order fully
controlled by the user via (class, instance). With such precise control
in context setup, we can continue to use the existing execbuf uABI of
specifying a single index; only now it doesn't automagically map onto
the engines, it uses the user defined engine map from the context.

The I915_EXEC_DEFAULT slot is left empty, and invalid for use by
execbuf. Its use will be revealed in the next patch.
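
As a rough userspace sketch (illustrative only; error handling omitted,
and the fd, ctx_id, helper name and local mirror struct are assumptions
for this example), populating a context's engine map might look like:

  #include <stdint.h>
  #include <string.h>
  #include <sys/ioctl.h>
  #include <drm/i915_drm.h>

  /* Leave slot 0 (I915_EXEC_DEFAULT) empty; real engines at index 1..3 */
  static void set_engine_map(int fd, uint32_t ctx_id)
  {
      /* mirrors struct i915_context_param_engines with 4 entries */
      struct {
          uint64_t extensions;
          struct {
              uint16_t engine_class;
              uint16_t engine_instance;
          } engines[4];
      } map;
      struct drm_i915_gem_context_param param;

      memset(&map, 0, sizeof(map));
      map.engines[0].engine_class = (uint16_t)I915_ENGINE_CLASS_INVALID;
      map.engines[0].engine_instance = (uint16_t)I915_ENGINE_CLASS_INVALID_NONE;
      map.engines[1].engine_class = I915_ENGINE_CLASS_RENDER;
      map.engines[1].engine_instance = 0;
      map.engines[2].engine_class = I915_ENGINE_CLASS_VIDEO;
      map.engines[2].engine_instance = 0;
      map.engines[3].engine_class = I915_ENGINE_CLASS_VIDEO;
      map.engines[3].engine_instance = 1;

      memset(&param, 0, sizeof(param));
      param.ctx_id = ctx_id;
      param.param = I915_CONTEXT_PARAM_ENGINES;
      param.size = sizeof(map);
      param.value = (uintptr_t)&map;

      ioctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &param);

      /*
       * Execbuf now selects by index into this map: ring selector 1
       * runs on rcs0, 2 on vcs0, 3 on vcs1; selector 0 is invalid.
       */
  }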

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_context.c    | 184 ++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_gem_context.h    |   4 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  22 ++-
 include/uapi/drm/i915_drm.h                |  32 ++++
 4 files changed, 230 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 2e2de0532c08..ad8052235f37 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -95,6 +95,21 @@
 
 #define ALL_L3_SLICES(dev) (1 << NUM_L3_SLICES(dev)) - 1
 
+static struct intel_engine_cs *
+lookup_user_engine(struct i915_gem_context *ctx,
+		   unsigned long flags, u16 class, u16 instance)
+#define LOOKUP_USER_INDEX BIT(0)
+{
+	if (flags & LOOKUP_USER_INDEX) {
+		if (instance >= ctx->nengine)
+			return NULL;
+
+		return ctx->engines[instance];
+	}
+
+	return intel_engine_lookup_user(ctx->i915, class, instance);
+}
+
 static void lut_close(struct i915_gem_context *ctx)
 {
 	struct i915_lut_handle *lut, *ln;
@@ -218,6 +233,8 @@ static void i915_gem_context_free(struct i915_gem_context *ctx)
 	release_hw_id(ctx);
 	i915_ppgtt_put(ctx->ppgtt);
 
+	kfree(ctx->engines);
+
 	for (n = 0; n < ARRAY_SIZE(ctx->__engine); n++) {
 		struct intel_context *ce = &ctx->__engine[n];
 
@@ -1317,9 +1334,9 @@ static int set_sseu(struct i915_gem_context *ctx,
 	if (user_sseu.flags || user_sseu.rsvd)
 		return -EINVAL;
 
-	engine = intel_engine_lookup_user(i915,
-					  user_sseu.engine_class,
-					  user_sseu.engine_instance);
+	engine = lookup_user_engine(ctx, 0,
+				    user_sseu.engine_class,
+				    user_sseu.engine_instance);
 	if (!engine)
 		return -EINVAL;
 
@@ -1337,9 +1354,156 @@ static int set_sseu(struct i915_gem_context *ctx,
 
 	args->size = sizeof(user_sseu);
 
+	return 0;
+};
+
+struct set_engines {
+	struct i915_gem_context *ctx;
+	struct intel_engine_cs **engines;
+	unsigned int nengine;
+};
+
+static const i915_user_extension_fn set_engines__extensions[] = {
+};
+
+static int
+set_engines(struct i915_gem_context *ctx,
+	    const struct drm_i915_gem_context_param *args)
+{
+	struct i915_context_param_engines __user *user;
+	struct set_engines set = { .ctx = ctx };
+	u64 size, extensions;
+	unsigned int n;
+	int err;
+
+	user = u64_to_user_ptr(args->value);
+	size = args->size;
+	if (!size)
+		goto out;
+
+	BUILD_BUG_ON(!IS_ALIGNED(sizeof(*user), sizeof(*user->class_instance)));
+	if (size < sizeof(*user) || size % sizeof(*user->class_instance))
+		return -EINVAL;
+
+	set.nengine = (size - sizeof(*user)) / sizeof(*user->class_instance);
+	if (set.nengine == 0 || set.nengine > I915_EXEC_RING_MASK)
+		return -EINVAL;
+
+	set.engines = kmalloc_array(set.nengine,
+				    sizeof(*set.engines),
+				    GFP_KERNEL);
+	if (!set.engines)
+		return -ENOMEM;
+
+	for (n = 0; n < set.nengine; n++) {
+		u16 class, inst;
+
+		if (get_user(class, &user->class_instance[n].engine_class) ||
+		    get_user(inst, &user->class_instance[n].engine_instance)) {
+			kfree(set.engines);
+			return -EFAULT;
+		}
+
+		if (class == (u16)I915_ENGINE_CLASS_INVALID &&
+		    inst == (u16)I915_ENGINE_CLASS_INVALID_NONE) {
+			set.engines[n] = NULL;
+			continue;
+		}
+
+		set.engines[n] = lookup_user_engine(ctx, 0, class, inst);
+		if (!set.engines[n]) {
+			kfree(set.engines);
+			return -ENOENT;
+		}
+	}
+
+	err = -EFAULT;
+	if (!get_user(extensions, &user->extensions))
+		err = i915_user_extensions(u64_to_user_ptr(extensions),
+					   set_engines__extensions,
+					   ARRAY_SIZE(set_engines__extensions),
+					   &set);
+	if (err) {
+		kfree(set.engines);
+		return err;
+	}
+
+out:
+	mutex_lock(&ctx->i915->drm.struct_mutex);
+	kfree(ctx->engines);
+	ctx->engines = set.engines;
+	ctx->nengine = set.nengine;
+	mutex_unlock(&ctx->i915->drm.struct_mutex);
+
 	return 0;
 }
 
+static int
+get_engines(struct i915_gem_context *ctx,
+	    struct drm_i915_gem_context_param *args)
+{
+	struct i915_context_param_engines *local;
+	unsigned int n, count, size;
+	int err;
+
+restart:
+	count = READ_ONCE(ctx->nengine);
+	if (count > (INT_MAX - sizeof(*local)) / sizeof(*local->class_instance))
+		return -ENOMEM; /* unrepresentable! */
+
+	size = sizeof(*local) + count * sizeof(*local->class_instance);
+	if (!args->size) {
+		args->size = size;
+		return 0;
+	}
+	if (args->size < size)
+		return -EINVAL;
+
+	local = kmalloc(size, GFP_KERNEL);
+	if (!local)
+		return -ENOMEM;
+
+	if (mutex_lock_interruptible(&ctx->i915->drm.struct_mutex)) {
+		err = -EINTR;
+		goto err;
+	}
+
+	if (READ_ONCE(ctx->nengine) != count) {
+		mutex_unlock(&ctx->i915->drm.struct_mutex);
+		kfree(local);
+		goto restart;
+	}
+
+	local->extensions = 0;
+	for (n = 0; n < count; n++) {
+		if (ctx->engines[n]) {
+			local->class_instance[n].engine_class =
+				ctx->engines[n]->uabi_class;
+			local->class_instance[n].engine_instance =
+				ctx->engines[n]->instance;
+		} else {
+			local->class_instance[n].engine_class =
+				I915_ENGINE_CLASS_INVALID;
+			local->class_instance[n].engine_instance =
+				I915_ENGINE_CLASS_INVALID_NONE;
+		}
+	}
+
+	mutex_unlock(&ctx->i915->drm.struct_mutex);
+
+	if (copy_to_user(u64_to_user_ptr(args->value), local, size)) {
+		err = -EFAULT;
+		goto err;
+	}
+
+	args->size = size;
+	return 0;
+
+err:
+	kfree(local);
+	return err;
+}
+
 static int ctx_setparam(struct i915_gem_context *ctx,
 			struct drm_i915_gem_context_param *args)
 {
@@ -1403,6 +1567,10 @@ static int ctx_setparam(struct i915_gem_context *ctx,
 		ret = set_ppgtt(ctx, args);
 		break;
 
+	case I915_CONTEXT_PARAM_ENGINES:
+		ret = set_engines(ctx, args);
+		break;
+
 	case I915_CONTEXT_PARAM_BAN_PERIOD:
 	default:
 		ret = -EINVAL;
@@ -1535,9 +1703,9 @@ static int get_sseu(struct i915_gem_context *ctx,
 	if (user_sseu.flags || user_sseu.rsvd)
 		return -EINVAL;
 
-	engine = intel_engine_lookup_user(ctx->i915,
-					  user_sseu.engine_class,
-					  user_sseu.engine_instance);
+	engine = lookup_user_engine(ctx, 0,
+				    user_sseu.engine_class,
+				    user_sseu.engine_instance);
 	if (!engine)
 		return -EINVAL;
 
@@ -1616,6 +1784,10 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 		ret = get_ppgtt(ctx, args);
 		break;
 
+	case I915_CONTEXT_PARAM_ENGINES:
+		ret = get_engines(ctx, args);
+		break;
+
 	case I915_CONTEXT_PARAM_BAN_PERIOD:
 	default:
 		ret = -EINVAL;
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index 3bd1faabbc3f..775de1af1b10 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -78,6 +78,8 @@ struct i915_gem_context {
 	/** file_priv: owning file descriptor */
 	struct drm_i915_file_private *file_priv;
 
+	struct intel_engine_cs **engines;
+
 	struct i915_timeline *timeline;
 
 	/**
@@ -146,6 +148,8 @@ struct i915_gem_context {
 #define CONTEXT_CLOSED			1
 #define CONTEXT_FORCE_SINGLE_SUBMISSION	2
 
+	unsigned int nengine;
+
 	/**
 	 * @hw_id: - unique identifier for the context
 	 *
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 859625474f58..5052b49f8dcd 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -2086,13 +2086,23 @@ static const enum intel_engine_id user_ring_map[I915_USER_RINGS + 1] = {
 };
 
 static struct intel_engine_cs *
-eb_select_engine(struct drm_i915_private *dev_priv,
+eb_select_engine(struct i915_execbuffer *eb,
 		 struct drm_file *file,
 		 struct drm_i915_gem_execbuffer2 *args)
 {
 	unsigned int user_ring_id = args->flags & I915_EXEC_RING_MASK;
 	struct intel_engine_cs *engine;
 
+	if (eb->ctx->engines) {
+		if (user_ring_id >= eb->ctx->nengine) {
+			DRM_DEBUG("execbuf with unknown ring: %u\n",
+				  user_ring_id);
+			return NULL;
+		}
+
+		return eb->ctx->engines[user_ring_id];
+	}
+
 	if (user_ring_id > I915_USER_RINGS) {
 		DRM_DEBUG("execbuf with unknown ring: %u\n", user_ring_id);
 		return NULL;
@@ -2105,11 +2115,11 @@ eb_select_engine(struct drm_i915_private *dev_priv,
 		return NULL;
 	}
 
-	if (user_ring_id == I915_EXEC_BSD && HAS_BSD2(dev_priv)) {
+	if (user_ring_id == I915_EXEC_BSD && HAS_BSD2(eb->i915)) {
 		unsigned int bsd_idx = args->flags & I915_EXEC_BSD_MASK;
 
 		if (bsd_idx == I915_EXEC_BSD_DEFAULT) {
-			bsd_idx = gen8_dispatch_bsd_engine(dev_priv, file);
+			bsd_idx = gen8_dispatch_bsd_engine(eb->i915, file);
 		} else if (bsd_idx >= I915_EXEC_BSD_RING1 &&
 			   bsd_idx <= I915_EXEC_BSD_RING2) {
 			bsd_idx >>= I915_EXEC_BSD_SHIFT;
@@ -2120,9 +2130,9 @@ eb_select_engine(struct drm_i915_private *dev_priv,
 			return NULL;
 		}
 
-		engine = dev_priv->engine[_VCS(bsd_idx)];
+		engine = eb->i915->engine[_VCS(bsd_idx)];
 	} else {
-		engine = dev_priv->engine[user_ring_map[user_ring_id]];
+		engine = eb->i915->engine[user_ring_map[user_ring_id]];
 	}
 
 	if (!engine) {
@@ -2332,7 +2342,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	if (unlikely(err))
 		goto err_destroy;
 
-	eb.engine = eb_select_engine(eb.i915, file, args);
+	eb.engine = eb_select_engine(&eb, file, args);
 	if (!eb.engine) {
 		err = -EINVAL;
 		goto err_engine;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 0c5566b2d244..eb5799fe3868 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -122,6 +122,8 @@ enum drm_i915_gem_engine_class {
 	I915_ENGINE_CLASS_INVALID	= -1
 };
 
+#define I915_ENGINE_CLASS_INVALID_NONE -1
+
 /**
  * DOC: perf_events exposed by i915 through /sys/bus/event_sources/drivers/i915
  *
@@ -1481,6 +1483,27 @@ struct drm_i915_gem_context_param {
 	 * See DRM_I915_GEM_VM_CREATE and DRM_I915_GEM_VM_DESTROY.
 	 */
 #define I915_CONTEXT_PARAM_VM		0x8
+
+/*
+ * I915_CONTEXT_PARAM_ENGINES:
+ *
+ * Bind this context to operate on this subset of available engines. Henceforth,
+ * the I915_EXEC_RING selector for DRM_IOCTL_I915_GEM_EXECBUFFER2 operates as
+ * an index into this array of engines; I915_EXEC_DEFAULT selecting engine[0]
+ * and upwards. Slots 0...N are filled in using the specified (class, instance).
+ * Use
+ *	engine_class: I915_ENGINE_CLASS_INVALID,
+ *	engine_instance: I915_ENGINE_CLASS_INVALID_NONE
+ * to specify a gap in the array that can be filled in later, e.g. by a
+ * virtual engine used for load balancing.
+ *
+ * Setting the number of engines bound to the context to 0, by passing a zero
+ * sized argument, will revert back to default settings.
+ *
+ * See struct i915_context_param_engines.
+ */
+#define I915_CONTEXT_PARAM_ENGINES	0x9
+
 	__u64 value;
 };
 
@@ -1543,6 +1566,15 @@ struct drm_i915_gem_context_param_sseu {
 	__u32 rsvd;
 };
 
+struct i915_context_param_engines {
+	__u64 extensions; /* linked chain of extension blocks, 0 terminates */
+
+	struct {
+		__u16 engine_class; /* see enum drm_i915_gem_engine_class */
+		__u16 engine_instance;
+	} class_instance[0];
+};
+
 struct drm_i915_gem_context_create_ext_setparam {
 #define I915_CONTEXT_CREATE_EXT_SETPARAM 0
 	struct i915_user_extension base;
-- 
2.20.1


* [PATCH 39/46] drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[]
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (37 preceding siblings ...)
  2019-02-06 13:03 ` [PATCH 38/46] drm/i915: Allow a context to define its set of engines Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-06 13:03 ` [PATCH 40/46] drm/i915: Pass around the intel_context Chris Wilson
                   ` (13 subsequent siblings)
  52 siblings, 0 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx

Allow the user to specify a local engine index (as opposed to a global
class:instance) that they can use to refer to a preset engine inside the
ctx->engine[] array defined by an earlier I915_CONTEXT_PARAM_ENGINES.
This will be useful for setting SSEU parameters on virtual engines that
are local to the context and do not have a valid global class:instance
lookup.
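
A userspace sketch of the intended use (illustrative only; the mask
values are placeholders and the fd, ctx_id and helper name are
assumptions for this example):

  #include <stdint.h>
  #include <string.h>
  #include <sys/ioctl.h>
  #include <drm/i915_drm.h>

  /*
   * Configure SSEU for the engine at index 1 of the context's engine
   * map (set earlier via I915_CONTEXT_PARAM_ENGINES), rather than
   * looking it up by global class:instance.
   */
  static void set_sseu_by_index(int fd, uint32_t ctx_id)
  {
      struct drm_i915_gem_context_param_sseu sseu;
      struct drm_i915_gem_context_param param;

      memset(&sseu, 0, sizeof(sseu));
      sseu.flags = I915_CONTEXT_SSEU_FLAG_ENGINE_INDEX;
      sseu.engine_instance = 1;  /* index into ctx->engine[] */
      sseu.slice_mask = 0x1;     /* placeholder values */
      sseu.subslice_mask = 0x1;
      sseu.min_eus_per_subslice = 1;
      sseu.max_eus_per_subslice = 1;

      memset(&param, 0, sizeof(param));
      param.ctx_id = ctx_id;
      param.param = I915_CONTEXT_PARAM_SSEU;
      param.size = sizeof(sseu);
      param.value = (uintptr_t)&sseu;

      ioctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &param);
  }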

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 24 ++++++++++++++++++++----
 include/uapi/drm/i915_drm.h             |  3 ++-
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index ad8052235f37..20580463175e 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -1319,6 +1319,7 @@ static int set_sseu(struct i915_gem_context *ctx,
 	struct drm_i915_gem_context_param_sseu user_sseu;
 	struct intel_engine_cs *engine;
 	struct intel_sseu sseu;
+	unsigned long lookup;
 	int ret;
 
 	if (args->size < sizeof(user_sseu))
@@ -1331,10 +1332,17 @@ static int set_sseu(struct i915_gem_context *ctx,
 			   sizeof(user_sseu)))
 		return -EFAULT;
 
-	if (user_sseu.flags || user_sseu.rsvd)
+	if (user_sseu.rsvd)
 		return -EINVAL;
 
-	engine = lookup_user_engine(ctx, 0,
+	if (user_sseu.flags & ~(I915_CONTEXT_SSEU_FLAG_ENGINE_INDEX))
+		return -EINVAL;
+
+	lookup = 0;
+	if (user_sseu.flags & I915_CONTEXT_SSEU_FLAG_ENGINE_INDEX)
+		lookup |= LOOKUP_USER_INDEX;
+
+	engine = lookup_user_engine(ctx, lookup,
 				    user_sseu.engine_class,
 				    user_sseu.engine_instance);
 	if (!engine)
@@ -1689,6 +1697,7 @@ static int get_sseu(struct i915_gem_context *ctx,
 	struct drm_i915_gem_context_param_sseu user_sseu;
 	struct intel_engine_cs *engine;
 	struct intel_context *ce;
+	unsigned long lookup;
 	int ret;
 
 	if (args->size == 0)
@@ -1700,10 +1709,17 @@ static int get_sseu(struct i915_gem_context *ctx,
 			   sizeof(user_sseu)))
 		return -EFAULT;
 
-	if (user_sseu.flags || user_sseu.rsvd)
+	if (user_sseu.rsvd)
 		return -EINVAL;
 
-	engine = lookup_user_engine(ctx, 0,
+	if (user_sseu.flags & ~(I915_CONTEXT_SSEU_FLAG_ENGINE_INDEX))
+		return -EINVAL;
+
+	lookup = 0;
+	if (user_sseu.flags & I915_CONTEXT_SSEU_FLAG_ENGINE_INDEX)
+		lookup |= LOOKUP_USER_INDEX;
+
+	engine = lookup_user_engine(ctx, lookup,
 				    user_sseu.engine_class,
 				    user_sseu.engine_instance);
 	if (!engine)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index eb5799fe3868..642af28ea6d3 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1536,9 +1536,10 @@ struct drm_i915_gem_context_param_sseu {
 	__u16 engine_instance;
 
 	/*
-	 * Unused for now. Must be cleared to zero.
+	 * Unknown flags must be cleared to zero.
 	 */
 	__u32 flags;
+#define I915_CONTEXT_SSEU_FLAG_ENGINE_INDEX (1u << 0)
 
 	/*
 	 * Mask of slices to enable for the context. Valid values are a subset
-- 
2.20.1


* [PATCH 40/46] drm/i915: Pass around the intel_context
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (38 preceding siblings ...)
  2019-02-06 13:03 ` [PATCH 39/46] drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[] Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-06 13:03 ` [PATCH 41/46] drm/i915: Split struct intel_context definition to its own header Chris Wilson
                   ` (12 subsequent siblings)
  52 siblings, 0 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx

Rather than passing the gem_context and engine and looking up the
instance of the intel_context to use, pass around the intel_context
directly. This is useful for the next few patches, where the
intel_context is no longer a direct lookup.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h  |  2 +-
 drivers/gpu/drm/i915/i915_perf.c | 32 ++++++++++++++--------------
 drivers/gpu/drm/i915/intel_lrc.c | 36 +++++++++++++++++---------------
 3 files changed, 36 insertions(+), 34 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 523de3644570..6be5dba889b5 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3119,7 +3119,7 @@ int i915_perf_add_config_ioctl(struct drm_device *dev, void *data,
 int i915_perf_remove_config_ioctl(struct drm_device *dev, void *data,
 				  struct drm_file *file);
 void i915_oa_init_reg_state(struct intel_engine_cs *engine,
-			    struct i915_gem_context *ctx,
+			    struct intel_context *ce,
 			    u32 *reg_state);
 
 /* i915_gem_evict.c */
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 9ebf99f3d8d3..f969a0512465 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1629,13 +1629,14 @@ static void hsw_disable_metric_set(struct drm_i915_private *dev_priv)
  * It's fine to put out-of-date values into these per-context registers
  * in the case that the OA unit has been disabled.
  */
-static void gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
-					   u32 *reg_state,
-					   const struct i915_oa_config *oa_config)
+static void
+gen8_update_reg_state_unlocked(struct intel_context *ce,
+			       u32 *reg_state,
+			       const struct i915_oa_config *oa_config)
 {
-	struct drm_i915_private *dev_priv = ctx->i915;
-	u32 ctx_oactxctrl = dev_priv->perf.oa.ctx_oactxctrl_offset;
-	u32 ctx_flexeu0 = dev_priv->perf.oa.ctx_flexeu0_offset;
+	struct drm_i915_private *i915 = ce->gem_context->i915;
+	u32 ctx_oactxctrl = i915->perf.oa.ctx_oactxctrl_offset;
+	u32 ctx_flexeu0 = i915->perf.oa.ctx_flexeu0_offset;
 	/* The MMIO offsets for Flex EU registers aren't contiguous */
 	i915_reg_t flex_regs[] = {
 		EU_PERF_CNTL0,
@@ -1649,8 +1650,8 @@ static void gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
 	int i;
 
 	CTX_REG(reg_state, ctx_oactxctrl, GEN8_OACTXCONTROL,
-		(dev_priv->perf.oa.period_exponent << GEN8_OA_TIMER_PERIOD_SHIFT) |
-		(dev_priv->perf.oa.periodic ? GEN8_OA_TIMER_ENABLE : 0) |
+		(i915->perf.oa.period_exponent << GEN8_OA_TIMER_PERIOD_SHIFT) |
+		(i915->perf.oa.periodic ? GEN8_OA_TIMER_ENABLE : 0) |
 		GEN8_OA_COUNTER_RESUME);
 
 	for (i = 0; i < ARRAY_SIZE(flex_regs); i++) {
@@ -1678,10 +1679,9 @@ static void gen8_update_reg_state_unlocked(struct i915_gem_context *ctx,
 		CTX_REG(reg_state, state_offset, flex_regs[i], value);
 	}
 
-	CTX_REG(reg_state, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
-		gen8_make_rpcs(dev_priv,
-			       &to_intel_context(ctx,
-						 dev_priv->engine[RCS])->sseu));
+	CTX_REG(reg_state,
+		CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE,
+		gen8_make_rpcs(i915, &ce->sseu));
 }
 
 /*
@@ -1754,7 +1754,7 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
 		ce->state->obj->mm.dirty = true;
 		regs += LRC_STATE_PN * PAGE_SIZE / sizeof(*regs);
 
-		gen8_update_reg_state_unlocked(ctx, regs, oa_config);
+		gen8_update_reg_state_unlocked(ce, regs, oa_config);
 
 		i915_gem_object_unpin_map(ce->state->obj);
 	}
@@ -2138,8 +2138,8 @@ static int i915_oa_stream_init(struct i915_perf_stream *stream,
 }
 
 void i915_oa_init_reg_state(struct intel_engine_cs *engine,
-			    struct i915_gem_context *ctx,
-			    u32 *reg_state)
+			    struct intel_context *ce,
+			    u32 *regs)
 {
 	struct i915_perf_stream *stream;
 
@@ -2148,7 +2148,7 @@ void i915_oa_init_reg_state(struct intel_engine_cs *engine,
 
 	stream = engine->i915->perf.oa.exclusive_stream;
 	if (stream)
-		gen8_update_reg_state_unlocked(ctx, reg_state, stream->oa_config);
+		gen8_update_reg_state_unlocked(ce, regs, stream->oa_config);
 }
 
 /**
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index d378ceae813f..caec509543b5 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -170,7 +170,7 @@ static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
 					    struct intel_engine_cs *engine,
 					    struct intel_context *ce);
 static void execlists_init_reg_state(u32 *reg_state,
-				     struct i915_gem_context *ctx,
+				     struct intel_context *ce,
 				     struct intel_engine_cs *engine,
 				     struct intel_ring *ring);
 
@@ -1314,9 +1314,10 @@ __execlists_update_reg_state(struct intel_engine_cs *engine,
 	regs[CTX_RING_TAIL + 1] = ring->tail;
 
 	/* RPCS */
-	if (engine->class == RENDER_CLASS)
-		regs[CTX_R_PWR_CLK_STATE + 1] = gen8_make_rpcs(engine->i915,
-							       &ce->sseu);
+	if (engine->class == RENDER_CLASS) {
+		regs[CTX_R_PWR_CLK_STATE + 1] =
+			gen8_make_rpcs(engine->i915, &ce->sseu);
+	}
 }
 
 static struct intel_context *
@@ -2011,7 +2012,7 @@ static void execlists_reset(struct intel_engine_cs *engine, bool stalled)
 	rq->ring->head = intel_ring_wrap(rq->ring, rq->postfix);
 	intel_ring_update_space(rq->ring);
 
-	execlists_init_reg_state(regs, rq->gem_context, engine, rq->ring);
+	execlists_init_reg_state(regs, rq->hw_context, engine, rq->ring);
 	__execlists_update_reg_state(engine, rq->hw_context);
 
 out_unlock:
@@ -2649,7 +2650,7 @@ static u32 intel_lr_indirect_ctx_offset(struct intel_engine_cs *engine)
 }
 
 static void execlists_init_reg_state(u32 *regs,
-				     struct i915_gem_context *ctx,
+				     struct intel_context *ce,
 				     struct intel_engine_cs *engine,
 				     struct intel_ring *ring)
 {
@@ -2725,24 +2726,24 @@ static void execlists_init_reg_state(u32 *regs,
 	CTX_REG(regs, CTX_PDP0_UDW, GEN8_RING_PDP_UDW(engine, 0), 0);
 	CTX_REG(regs, CTX_PDP0_LDW, GEN8_RING_PDP_LDW(engine, 0), 0);
 
-	if (i915_vm_is_48bit(&ctx->ppgtt->vm)) {
+	if (i915_vm_is_48bit(&ce->gem_context->ppgtt->vm)) {
 		/* 64b PPGTT (48bit canonical)
 		 * PDP0_DESCRIPTOR contains the base address to PML4 and
 		 * other PDP Descriptors are ignored.
 		 */
-		ASSIGN_CTX_PML4(ctx->ppgtt, regs);
+		ASSIGN_CTX_PML4(ce->gem_context->ppgtt, regs);
 	} else {
-		ASSIGN_CTX_PDP(ctx->ppgtt, regs, 3);
-		ASSIGN_CTX_PDP(ctx->ppgtt, regs, 2);
-		ASSIGN_CTX_PDP(ctx->ppgtt, regs, 1);
-		ASSIGN_CTX_PDP(ctx->ppgtt, regs, 0);
+		ASSIGN_CTX_PDP(ce->gem_context->ppgtt, regs, 3);
+		ASSIGN_CTX_PDP(ce->gem_context->ppgtt, regs, 2);
+		ASSIGN_CTX_PDP(ce->gem_context->ppgtt, regs, 1);
+		ASSIGN_CTX_PDP(ce->gem_context->ppgtt, regs, 0);
 	}
 
 	if (rcs) {
 		regs[CTX_LRI_HEADER_2] = MI_LOAD_REGISTER_IMM(1);
 		CTX_REG(regs, CTX_R_PWR_CLK_STATE, GEN8_R_PWR_CLK_STATE, 0);
 
-		i915_oa_init_reg_state(engine, ctx, regs);
+		i915_oa_init_reg_state(engine, ce, regs);
 	}
 
 	regs[CTX_END] = MI_BATCH_BUFFER_END;
@@ -2751,7 +2752,7 @@ static void execlists_init_reg_state(u32 *regs,
 }
 
 static int
-populate_lr_context(struct i915_gem_context *ctx,
+populate_lr_context(struct intel_context *ce,
 		    struct drm_i915_gem_object *ctx_obj,
 		    struct intel_engine_cs *engine,
 		    struct intel_ring *ring)
@@ -2797,11 +2798,12 @@ populate_lr_context(struct i915_gem_context *ctx,
 	/* The second page of the context object contains some fields which must
 	 * be set up prior to the first execution. */
 	regs = vaddr + LRC_STATE_PN * PAGE_SIZE;
-	execlists_init_reg_state(regs, ctx, engine, ring);
+	execlists_init_reg_state(regs, ce, engine, ring);
 	if (!engine->default_state)
 		regs[CTX_CONTEXT_CONTROL + 1] |=
 			_MASKED_BIT_ENABLE(CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT);
-	if (ctx == ctx->i915->preempt_context && INTEL_GEN(engine->i915) < 11)
+	if (ce->gem_context == engine->i915->preempt_context &&
+	    INTEL_GEN(engine->i915) < 11)
 		regs[CTX_CONTEXT_CONTROL + 1] |=
 			_MASKED_BIT_ENABLE(CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT |
 					   CTX_CTRL_ENGINE_CTX_SAVE_INHIBIT);
@@ -2859,7 +2861,7 @@ static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
 		goto error_deref_obj;
 	}
 
-	ret = populate_lr_context(ctx, ctx_obj, engine, ring);
+	ret = populate_lr_context(ce, ctx_obj, engine, ring);
 	if (ret) {
 		DRM_DEBUG_DRIVER("Failed to populate LRC: %d\n", ret);
 		goto error_ring_free;
-- 
2.20.1


* [PATCH 41/46] drm/i915: Split struct intel_context definition to its own header
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (39 preceding siblings ...)
  2019-02-06 13:03 ` [PATCH 40/46] drm/i915: Pass around the intel_context Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-06 13:03 ` [PATCH 42/46] drm/i915: Move over to intel_context_lookup() Chris Wilson
                   ` (11 subsequent siblings)
  52 siblings, 0 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx

This complex struct pulling in half the driver deserves its own
isolation in preparation for intel_context becoming an outright
complicated class of its own.

Splitting this beast into its own header also requires splitting
several of its dependent types and their dependencies into their own
headers as well.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_context.h       | 242 +-------
 drivers/gpu/drm/i915/i915_gem_context_types.h | 174 ++++++
 drivers/gpu/drm/i915/i915_timeline.h          |  68 +--
 drivers/gpu/drm/i915/i915_timeline_types.h    |  79 +++
 drivers/gpu/drm/i915/intel_context.h          |  47 ++
 drivers/gpu/drm/i915/intel_context_types.h    |  60 ++
 drivers/gpu/drm/i915/intel_engine_types.h     | 520 ++++++++++++++++++
 drivers/gpu/drm/i915/intel_guc.h              |   1 +
 drivers/gpu/drm/i915/intel_ringbuffer.h       | 501 +----------------
 drivers/gpu/drm/i915/intel_workarounds.h      |  13 +-
 .../gpu/drm/i915/intel_workarounds_types.h    |  25 +
 11 files changed, 912 insertions(+), 818 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_gem_context_types.h
 create mode 100644 drivers/gpu/drm/i915/i915_timeline_types.h
 create mode 100644 drivers/gpu/drm/i915/intel_context.h
 create mode 100644 drivers/gpu/drm/i915/intel_context_types.h
 create mode 100644 drivers/gpu/drm/i915/intel_engine_types.h
 create mode 100644 drivers/gpu/drm/i915/intel_workarounds_types.h

diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index 775de1af1b10..b391aa1f55ae 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -25,221 +25,16 @@
 #ifndef __I915_GEM_CONTEXT_H__
 #define __I915_GEM_CONTEXT_H__
 
-#include <linux/bitops.h>
-#include <linux/list.h>
-#include <linux/radix-tree.h>
+#include "i915_gem_context_types.h"
 
 #include "i915_gem.h"
 #include "i915_scheduler.h"
+#include "intel_context.h"
 #include "intel_device_info.h"
 
-struct pid;
-
 struct drm_device;
 struct drm_file;
 
-struct drm_i915_private;
-struct drm_i915_file_private;
-struct i915_hw_ppgtt;
-struct i915_request;
-struct i915_timeline;
-struct i915_vma;
-struct intel_ring;
-
-#define DEFAULT_CONTEXT_HANDLE 0
-
-struct intel_context;
-
-struct intel_context_ops {
-	void (*unpin)(struct intel_context *ce);
-	void (*destroy)(struct intel_context *ce);
-};
-
-/*
- * Powergating configuration for a particular (context,engine).
- */
-struct intel_sseu {
-	u8 slice_mask;
-	u8 subslice_mask;
-	u8 min_eus_per_subslice;
-	u8 max_eus_per_subslice;
-};
-
-/**
- * struct i915_gem_context - client state
- *
- * The struct i915_gem_context represents the combined view of the driver and
- * logical hardware state for a particular client.
- */
-struct i915_gem_context {
-	/** i915: i915 device backpointer */
-	struct drm_i915_private *i915;
-
-	/** file_priv: owning file descriptor */
-	struct drm_i915_file_private *file_priv;
-
-	struct intel_engine_cs **engines;
-
-	struct i915_timeline *timeline;
-
-	/**
-	 * @ppgtt: unique address space (GTT)
-	 *
-	 * In full-ppgtt mode, each context has its own address space ensuring
-	 * complete seperation of one client from all others.
-	 *
-	 * In other modes, this is a NULL pointer with the expectation that
-	 * the caller uses the shared global GTT.
-	 */
-	struct i915_hw_ppgtt *ppgtt;
-
-	/**
-	 * @pid: process id of creator
-	 *
-	 * Note that who created the context may not be the principle user,
-	 * as the context may be shared across a local socket. However,
-	 * that should only affect the default context, all contexts created
-	 * explicitly by the client are expected to be isolated.
-	 */
-	struct pid *pid;
-
-	/**
-	 * @name: arbitrary name
-	 *
-	 * A name is constructed for the context from the creator's process
-	 * name, pid and user handle in order to uniquely identify the
-	 * context in messages.
-	 */
-	const char *name;
-
-	/** link: place with &drm_i915_private.context_list */
-	struct list_head link;
-	struct llist_node free_link;
-
-	/**
-	 * @ref: reference count
-	 *
-	 * A reference to a context is held by both the client who created it
-	 * and on each request submitted to the hardware using the request
-	 * (to ensure the hardware has access to the state until it has
-	 * finished all pending writes). See i915_gem_context_get() and
-	 * i915_gem_context_put() for access.
-	 */
-	struct kref ref;
-
-	/**
-	 * @rcu: rcu_head for deferred freeing.
-	 */
-	struct rcu_head rcu;
-
-	/**
-	 * @user_flags: small set of booleans controlled by the user
-	 */
-	unsigned long user_flags;
-#define UCONTEXT_NO_ZEROMAP		0
-#define UCONTEXT_NO_ERROR_CAPTURE	1
-#define UCONTEXT_BANNABLE		2
-
-	/**
-	 * @flags: small set of booleans
-	 */
-	unsigned long flags;
-#define CONTEXT_BANNED			0
-#define CONTEXT_CLOSED			1
-#define CONTEXT_FORCE_SINGLE_SUBMISSION	2
-
-	unsigned int nengine;
-
-	/**
-	 * @hw_id: - unique identifier for the context
-	 *
-	 * The hardware needs to uniquely identify the context for a few
-	 * functions like fault reporting, PASID, scheduling. The
-	 * &drm_i915_private.context_hw_ida is used to assign a unqiue
-	 * id for the lifetime of the context.
-	 *
-	 * @hw_id_pin_count: - number of times this context had been pinned
-	 * for use (should be, at most, once per engine).
-	 *
-	 * @hw_id_link: - all contexts with an assigned id are tracked
-	 * for possible repossession.
-	 */
-	unsigned int hw_id;
-	atomic_t hw_id_pin_count;
-	struct list_head hw_id_link;
-
-	struct list_head active_engines;
-
-	/**
-	 * @user_handle: userspace identifier
-	 *
-	 * A unique per-file identifier is generated from
-	 * &drm_i915_file_private.contexts.
-	 */
-	u32 user_handle;
-
-	struct i915_sched_attr sched;
-
-	/** engine: per-engine logical HW state */
-	struct intel_context {
-		struct i915_gem_context *gem_context;
-		struct intel_engine_cs *engine;
-		struct intel_engine_cs *active;
-		struct list_head active_link;
-		struct list_head signal_link;
-		struct list_head signals;
-		struct i915_vma *state;
-		struct intel_ring *ring;
-		u32 *lrc_reg_state;
-		u64 lrc_desc;
-		int pin_count;
-
-		/**
-		 * active_tracker: Active tracker for the external rq activity
-		 * on this intel_context object.
-		 */
-		struct i915_active_request active_tracker;
-
-		const struct intel_context_ops *ops;
-
-		/** sseu: Control eu/slice partitioning */
-		struct intel_sseu sseu;
-	} __engine[I915_NUM_ENGINES];
-
-	/** ring_size: size for allocating the per-engine ring buffer */
-	u32 ring_size;
-	/** desc_template: invariant fields for the HW context descriptor */
-	u32 desc_template;
-
-	/** guilty_count: How many times this context has caused a GPU hang. */
-	atomic_t guilty_count;
-	/**
-	 * @active_count: How many times this context was active during a GPU
-	 * hang, but did not cause it.
-	 */
-	atomic_t active_count;
-
-#define CONTEXT_SCORE_GUILTY		10
-#define CONTEXT_SCORE_BAN_THRESHOLD	40
-	/** ban_score: Accumulated score of all hangs caused by this context. */
-	atomic_t ban_score;
-
-	/** remap_slice: Bitmask of cache lines that need remapping */
-	u8 remap_slice;
-
-	/** handles_vma: rbtree to look up our context specific obj/vma for
-	 * the user handle. (user handles are per fd, but the binding is
-	 * per vm, which may be one per context or shared with the global GTT)
-	 */
-	struct radix_tree_root handles_vma;
-
-	/** handles_list: reverse list of all the rbtree entries in use for
-	 * this context, which allows us to free all the allocations on
-	 * context close.
-	 */
-	struct list_head handles_list;
-};
-
 static inline bool i915_gem_context_is_closed(const struct i915_gem_context *ctx)
 {
 	return test_bit(CONTEXT_CLOSED, &ctx->flags);
@@ -326,35 +121,6 @@ static inline bool i915_gem_context_is_kernel(struct i915_gem_context *ctx)
 	return !ctx->file_priv;
 }
 
-static inline struct intel_context *
-to_intel_context(struct i915_gem_context *ctx,
-		 const struct intel_engine_cs *engine)
-{
-	return &ctx->__engine[engine->id];
-}
-
-static inline struct intel_context *
-intel_context_pin(struct i915_gem_context *ctx, struct intel_engine_cs *engine)
-{
-	return engine->context_pin(engine, ctx);
-}
-
-static inline void __intel_context_pin(struct intel_context *ce)
-{
-	GEM_BUG_ON(!ce->pin_count);
-	ce->pin_count++;
-}
-
-static inline void intel_context_unpin(struct intel_context *ce)
-{
-	GEM_BUG_ON(!ce->pin_count);
-	if (--ce->pin_count)
-		return;
-
-	GEM_BUG_ON(!ce->ops);
-	ce->ops->unpin(ce);
-}
-
 /* i915_gem_context.c */
 int __must_check i915_gem_contexts_init(struct drm_i915_private *dev_priv);
 void i915_gem_contexts_lost(struct drm_i915_private *dev_priv);
@@ -403,8 +169,4 @@ static inline void i915_gem_context_put(struct i915_gem_context *ctx)
 	kref_put(&ctx->ref, i915_gem_context_release);
 }
 
-void intel_context_init(struct intel_context *ce,
-			struct i915_gem_context *ctx,
-			struct intel_engine_cs *engine);
-
 #endif /* !__I915_GEM_CONTEXT_H__ */
diff --git a/drivers/gpu/drm/i915/i915_gem_context_types.h b/drivers/gpu/drm/i915/i915_gem_context_types.h
new file mode 100644
index 000000000000..b69309b46098
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_gem_context_types.h
@@ -0,0 +1,174 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2019 Intel Corporation
+ */
+
+#ifndef __I915_GEM_CONTEXT_TYPES_H__
+#define __I915_GEM_CONTEXT_TYPES_H__
+
+#include "i915_gem.h" /* I915_NUM_ENGINES */
+#include "intel_context_types.h"
+
+struct pid;
+
+struct drm_i915_private;
+struct drm_i915_file_private;
+struct i915_hw_ppgtt;
+struct i915_timeline;
+struct intel_ring;
+
+/**
+ * struct i915_gem_context - client state
+ *
+ * The struct i915_gem_context represents the combined view of the driver and
+ * logical hardware state for a particular client.
+ */
+struct i915_gem_context {
+	/** i915: i915 device backpointer */
+	struct drm_i915_private *i915;
+
+	/** file_priv: owning file descriptor */
+	struct drm_i915_file_private *file_priv;
+
+	struct intel_engine_cs **engines;
+
+	struct i915_timeline *timeline;
+
+	/**
+	 * @ppgtt: unique address space (GTT)
+	 *
+	 * In full-ppgtt mode, each context has its own address space ensuring
+	 * complete seperation of one client from all others.
+	 *
+	 * In other modes, this is a NULL pointer with the expectation that
+	 * the caller uses the shared global GTT.
+	 */
+	struct i915_hw_ppgtt *ppgtt;
+
+	/**
+	 * @pid: process id of creator
+	 *
+	 * Note that who created the context may not be the principle user,
+	 * as the context may be shared across a local socket. However,
+	 * that should only affect the default context, all contexts created
+	 * explicitly by the client are expected to be isolated.
+	 */
+	struct pid *pid;
+
+	/**
+	 * @name: arbitrary name
+	 *
+	 * A name is constructed for the context from the creator's process
+	 * name, pid and user handle in order to uniquely identify the
+	 * context in messages.
+	 */
+	const char *name;
+
+	/** link: place with &drm_i915_private.context_list */
+	struct list_head link;
+	struct llist_node free_link;
+
+	/**
+	 * @ref: reference count
+	 *
+	 * A reference to a context is held by both the client who created it
+	 * and on each request submitted to the hardware using the request
+	 * (to ensure the hardware has access to the state until it has
+	 * finished all pending writes). See i915_gem_context_get() and
+	 * i915_gem_context_put() for access.
+	 */
+	struct kref ref;
+
+	/**
+	 * @rcu: rcu_head for deferred freeing.
+	 */
+	struct rcu_head rcu;
+
+	/**
+	 * @user_flags: small set of booleans controlled by the user
+	 */
+	unsigned long user_flags;
+#define UCONTEXT_NO_ZEROMAP		0
+#define UCONTEXT_NO_ERROR_CAPTURE	1
+#define UCONTEXT_BANNABLE		2
+
+	/**
+	 * @flags: small set of booleans
+	 */
+	unsigned long flags;
+#define CONTEXT_BANNED			0
+#define CONTEXT_CLOSED			1
+#define CONTEXT_FORCE_SINGLE_SUBMISSION	2
+
+	unsigned int nengine;
+
+	/**
+	 * @hw_id: - unique identifier for the context
+	 *
+	 * The hardware needs to uniquely identify the context for a few
+	 * functions like fault reporting, PASID, scheduling. The
+	 * &drm_i915_private.context_hw_ida is used to assign a unqiue
+	 * id for the lifetime of the context.
+	 *
+	 * @hw_id_pin_count: - number of times this context had been pinned
+	 * for use (should be, at most, once per engine).
+	 *
+	 * @hw_id_link: - all contexts with an assigned id are tracked
+	 * for possible repossession.
+	 */
+	unsigned int hw_id;
+	atomic_t hw_id_pin_count;
+	struct list_head hw_id_link;
+
+	struct list_head active_engines;
+
+	/**
+	 * @user_handle: userspace identifier
+	 *
+	 * A unique per-file identifier is generated from
+	 * &drm_i915_file_private.contexts.
+	 */
+	u32 user_handle;
+#define DEFAULT_CONTEXT_HANDLE 0
+
+	struct i915_sched_attr sched;
+
+	/** engine: per-engine logical HW state */
+	struct intel_context __engine[I915_NUM_ENGINES];
+
+	/** ring_size: size for allocating the per-engine ring buffer */
+	u32 ring_size;
+	/** desc_template: invariant fields for the HW context descriptor */
+	u32 desc_template;
+
+	/** guilty_count: How many times this context has caused a GPU hang. */
+	atomic_t guilty_count;
+	/**
+	 * @active_count: How many times this context was active during a GPU
+	 * hang, but did not cause it.
+	 */
+	atomic_t active_count;
+
+#define CONTEXT_SCORE_GUILTY		10
+#define CONTEXT_SCORE_BAN_THRESHOLD	40
+	/** ban_score: Accumulated score of all hangs caused by this context. */
+	atomic_t ban_score;
+
+	/** remap_slice: Bitmask of cache lines that need remapping */
+	u8 remap_slice;
+
+	/** handles_vma: rbtree to look up our context specific obj/vma for
+	 * the user handle. (user handles are per fd, but the binding is
+	 * per vm, which may be one per context or shared with the global GTT)
+	 */
+	struct radix_tree_root handles_vma;
+
+	/** handles_list: reverse list of all the rbtree entries in use for
+	 * this context, which allows us to free all the allocations on
+	 * context close.
+	 */
+	struct list_head handles_list;
+};
+
+#endif /* __I915_GEM_CONTEXT_TYPES_H__ */
diff --git a/drivers/gpu/drm/i915/i915_timeline.h b/drivers/gpu/drm/i915/i915_timeline.h
index d78ec6fbc000..3b865e2b5abf 100644
--- a/drivers/gpu/drm/i915/i915_timeline.h
+++ b/drivers/gpu/drm/i915/i915_timeline.h
@@ -25,74 +25,10 @@
 #ifndef I915_TIMELINE_H
 #define I915_TIMELINE_H
 
-#include <linux/list.h>
-#include <linux/kref.h>
+#include <linux/lockdep.h>
 
-#include "i915_active.h"
-#include "i915_request.h"
 #include "i915_syncmap.h"
-#include "i915_utils.h"
-
-struct i915_vma;
-struct i915_timeline_cacheline;
-
-struct i915_timeline {
-	u64 fence_context;
-	u32 seqno;
-
-	spinlock_t lock;
-#define TIMELINE_CLIENT 0 /* default subclass */
-#define TIMELINE_ENGINE 1
-
-	unsigned int pin_count;
-	const u32 *hwsp_seqno;
-	struct i915_vma *hwsp_ggtt;
-	u32 hwsp_offset;
-
-	struct i915_timeline_cacheline *hwsp_cacheline;
-
-	bool has_initial_breadcrumb;
-
-	/**
-	 * List of breadcrumbs associated with GPU requests currently
-	 * outstanding.
-	 */
-	struct list_head requests;
-
-	/* Contains an RCU guarded pointer to the last request. No reference is
-	 * held to the request, users must carefully acquire a reference to
-	 * the request using i915_active_request_get_request_rcu(), or hold the
-	 * struct_mutex.
-	 */
-	struct i915_active_request last_request;
-
-	/**
-	 * We track the most recent seqno that we wait on in every context so
-	 * that we only have to emit a new await and dependency on a more
-	 * recent sync point. As the contexts may be executed out-of-order, we
-	 * have to track each individually and can not rely on an absolute
-	 * global_seqno. When we know that all tracked fences are completed
-	 * (i.e. when the driver is idle), we know that the syncmap is
-	 * redundant and we can discard it without loss of generality.
-	 */
-	struct i915_syncmap *sync;
-
-	/**
-	 * Barrier provides the ability to serialize ordering between different
-	 * timelines.
-	 *
-	 * Users can call i915_timeline_set_barrier which will make all
-	 * subsequent submissions to this timeline be executed only after the
-	 * barrier has been completed.
-	 */
-	struct i915_active_request barrier;
-
-	struct list_head link;
-	const char *name;
-	struct drm_i915_private *i915;
-
-	struct kref kref;
-};
+#include "i915_timeline_types.h"
 
 int i915_timeline_init(struct drm_i915_private *i915,
 		       struct i915_timeline *tl,
diff --git a/drivers/gpu/drm/i915/i915_timeline_types.h b/drivers/gpu/drm/i915/i915_timeline_types.h
new file mode 100644
index 000000000000..398905e8888f
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_timeline_types.h
@@ -0,0 +1,79 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2016 Intel Corporation
+ */
+
+#ifndef __I915_TIMELINE_TYPES_H__
+#define __I915_TIMELINE_TYPES_H__
+
+#include <linux/list.h>
+#include <linux/kref.h>
+#include <linux/types.h>
+
+#include "i915_active.h"
+
+struct drm_i915_private;
+struct i915_vma;
+struct i915_timeline_cacheline;
+struct i915_syncmap;
+
+struct i915_timeline {
+	u64 fence_context;
+	u32 seqno;
+
+	spinlock_t lock;
+#define TIMELINE_CLIENT 0 /* default subclass */
+#define TIMELINE_ENGINE 1
+
+	unsigned int pin_count;
+	const u32 *hwsp_seqno;
+	struct i915_vma *hwsp_ggtt;
+	u32 hwsp_offset;
+
+	struct i915_timeline_cacheline *hwsp_cacheline;
+
+	bool has_initial_breadcrumb;
+
+	/**
+	 * List of breadcrumbs associated with GPU requests currently
+	 * outstanding.
+	 */
+	struct list_head requests;
+
+	/* Contains an RCU guarded pointer to the last request. No reference is
+	 * held to the request, users must carefully acquire a reference to
+	 * the request using i915_active_request_get_request_rcu(), or hold the
+	 * struct_mutex.
+	 */
+	struct i915_active_request last_request;
+
+	/**
+	 * We track the most recent seqno that we wait on in every context so
+	 * that we only have to emit a new await and dependency on a more
+	 * recent sync point. As the contexts may be executed out-of-order, we
+	 * have to track each individually and can not rely on an absolute
+	 * global_seqno. When we know that all tracked fences are completed
+	 * (i.e. when the driver is idle), we know that the syncmap is
+	 * redundant and we can discard it without loss of generality.
+	 */
+	struct i915_syncmap *sync;
+
+	/**
+	 * Barrier provides the ability to serialize ordering between different
+	 * timelines.
+	 *
+	 * Users can call i915_timeline_set_barrier which will make all
+	 * subsequent submissions to this timeline be executed only after the
+	 * barrier has been completed.
+	 */
+	struct i915_active_request barrier;
+
+	struct list_head link;
+	const char *name;
+	struct drm_i915_private *i915;
+
+	struct kref kref;
+};
+
+#endif /* __I915_TIMELINE_TYPES_H__ */
diff --git a/drivers/gpu/drm/i915/intel_context.h b/drivers/gpu/drm/i915/intel_context.h
new file mode 100644
index 000000000000..dd947692bb0b
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_context.h
@@ -0,0 +1,47 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2019 Intel Corporation
+ */
+
+#ifndef __INTEL_CONTEXT_H__
+#define __INTEL_CONTEXT_H__
+
+#include "i915_gem_context_types.h"
+#include "intel_context_types.h"
+#include "intel_engine_types.h"
+
+void intel_context_init(struct intel_context *ce,
+			struct i915_gem_context *ctx,
+			struct intel_engine_cs *engine);
+
+static inline struct intel_context *
+to_intel_context(struct i915_gem_context *ctx,
+		 const struct intel_engine_cs *engine)
+{
+	return &ctx->__engine[engine->id];
+}
+
+static inline struct intel_context *
+intel_context_pin(struct i915_gem_context *ctx, struct intel_engine_cs *engine)
+{
+	return engine->context_pin(engine, ctx);
+}
+
+static inline void __intel_context_pin(struct intel_context *ce)
+{
+	GEM_BUG_ON(!ce->pin_count);
+	ce->pin_count++;
+}
+
+static inline void intel_context_unpin(struct intel_context *ce)
+{
+	GEM_BUG_ON(!ce->pin_count);
+	if (--ce->pin_count)
+		return;
+
+	GEM_BUG_ON(!ce->ops);
+	ce->ops->unpin(ce);
+}
+
+#endif /* __INTEL_CONTEXT_H__ */
diff --git a/drivers/gpu/drm/i915/intel_context_types.h b/drivers/gpu/drm/i915/intel_context_types.h
new file mode 100644
index 000000000000..16e1306e9595
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_context_types.h
@@ -0,0 +1,60 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2019 Intel Corporation
+ */
+
+#ifndef __INTEL_CONTEXT_TYPES__
+#define __INTEL_CONTEXT_TYPES__
+
+#include <linux/list.h>
+#include <linux/types.h>
+
+#include "i915_active_types.h"
+
+struct i915_gem_context;
+struct i915_vma;
+struct intel_context;
+struct intel_ring;
+
+struct intel_context_ops {
+	void (*unpin)(struct intel_context *ce);
+	void (*destroy)(struct intel_context *ce);
+};
+
+/*
+ * Powergating configuration for a particular (context,engine).
+ */
+struct intel_sseu {
+	u8 slice_mask;
+	u8 subslice_mask;
+	u8 min_eus_per_subslice;
+	u8 max_eus_per_subslice;
+};
+
+struct intel_context {
+	struct i915_gem_context *gem_context;
+	struct intel_engine_cs *engine;
+	struct intel_engine_cs *active;
+	struct list_head active_link;
+	struct list_head signal_link;
+	struct list_head signals;
+	struct i915_vma *state;
+	struct intel_ring *ring;
+	u32 *lrc_reg_state;
+	u64 lrc_desc;
+	int pin_count;
+
+	/**
+	 * active_tracker: Active tracker for the external rq activity
+	 * on this intel_context object.
+	 */
+	struct i915_active_request active_tracker;
+
+	const struct intel_context_ops *ops;
+
+	/** sseu: Control eu/slice partitioning */
+	struct intel_sseu sseu;
+};
+
+#endif /* __INTEL_CONTEXT_TYPES__ */
diff --git a/drivers/gpu/drm/i915/intel_engine_types.h b/drivers/gpu/drm/i915/intel_engine_types.h
new file mode 100644
index 000000000000..88a13435e474
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_engine_types.h
@@ -0,0 +1,520 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2019 Intel Corporation
+ */
+
+#ifndef __INTEL_ENGINE_TYPES__
+#define __INTEL_ENGINE_TYPES__
+
+#include <linux/hashtable.h>
+#include <linux/irq_work.h>
+#include <linux/list.h>
+#include <linux/types.h>
+
+#include "i915_timeline_types.h"
+#include "intel_workarounds_types.h"
+
+#include "i915_gem_batch_pool.h"
+#include "i915_pmu.h"
+
+#define I915_MAX_SLICES	3
+#define I915_MAX_SUBSLICES 8
+
+#define I915_CMD_HASH_ORDER 9
+
+struct drm_i915_reg_table;
+struct i915_gem_context;
+struct i915_request;
+struct i915_sched_attr;
+
+struct intel_hw_status_page {
+	struct i915_vma *vma;
+	u32 *addr;
+};
+
+struct intel_instdone {
+	u32 instdone;
+	/* The following exist only in the RCS engine */
+	u32 slice_common;
+	u32 sampler[I915_MAX_SLICES][I915_MAX_SUBSLICES];
+	u32 row[I915_MAX_SLICES][I915_MAX_SUBSLICES];
+};
+
+struct intel_engine_hangcheck {
+	u64 acthd;
+	u32 last_seqno;
+	u32 next_seqno;
+	unsigned long action_timestamp;
+	struct intel_instdone instdone;
+};
+
+struct intel_ring {
+	struct i915_vma *vma;
+	void *vaddr;
+
+	struct i915_timeline *timeline;
+	struct list_head request_list;
+	struct list_head active_link;
+
+	u32 head;
+	u32 tail;
+	u32 emit;
+
+	u32 space;
+	u32 size;
+	u32 effective_size;
+};
+
+/*
+ * we use a single page to load ctx workarounds so all of these
+ * values are referred in terms of dwords
+ *
+ * struct i915_wa_ctx_bb:
+ *  offset: specifies batch starting position, also helpful in case
+ *    if we want to have multiple batches at different offsets based on
+ *    some criteria. It is not a requirement at the moment but provides
+ *    an option for future use.
+ *  size: size of the batch in DWORDS
+ */
+struct i915_ctx_workarounds {
+	struct i915_wa_ctx_bb {
+		u32 offset;
+		u32 size;
+	} indirect_ctx, per_ctx;
+	struct i915_vma *vma;
+};
+
+#define I915_MAX_VCS	4
+#define I915_MAX_VECS	2
+
+/*
+ * Engine IDs definitions.
+ * Keep instances of the same type engine together.
+ */
+enum intel_engine_id {
+	RCS = 0,
+	BCS,
+	VCS,
+	VCS2,
+	VCS3,
+	VCS4,
+#define _VCS(n) (VCS + (n))
+	VECS,
+	VECS2
+#define _VECS(n) (VECS + (n))
+};
+
+struct st_preempt_hang {
+	struct completion completion;
+	unsigned int count;
+	bool inject_hang;
+};
+
+/**
+ * struct intel_engine_execlists - execlist submission queue and port state
+ *
+ * The struct intel_engine_execlists represents the combined logical state of
+ * driver and the hardware state for execlist mode of submission.
+ */
+struct intel_engine_execlists {
+	/**
+	 * @tasklet: softirq tasklet for bottom handler
+	 */
+	struct tasklet_struct tasklet;
+
+	/**
+	 * @default_priolist: priority list for I915_PRIORITY_NORMAL
+	 */
+	struct i915_priolist default_priolist;
+
+	/**
+	 * @no_priolist: priority lists disabled
+	 */
+	bool no_priolist;
+
+	/**
+	 * @submit_reg: gen-specific execlist submission register
+	 * set to the ExecList Submission Port (elsp) register pre-Gen11 and to
+	 * the ExecList Submission Queue Contents register array for Gen11+
+	 */
+	u32 __iomem *submit_reg;
+
+	/**
+	 * @ctrl_reg: the enhanced execlists control register, used to load the
+	 * submit queue on the HW and to request preemptions to idle
+	 */
+	u32 __iomem *ctrl_reg;
+
+	/**
+	 * @port: execlist port states
+	 *
+	 * For each hardware ELSP (ExecList Submission Port) we keep
+	 * track of the last request and the number of times we submitted
+	 * that port to hw. We then count the number of times the hw reports
+	 * a context completion or preemption. As only one context can
+	 * be active on hw, we limit resubmission of context to port[0]. This
+	 * is called Lite Restore, of the context.
+	 */
+	struct execlist_port {
+		/**
+		 * @request_count: combined request and submission count
+		 */
+		struct i915_request *request_count;
+#define EXECLIST_COUNT_BITS 2
+#define port_request(p) ptr_mask_bits((p)->request_count, EXECLIST_COUNT_BITS)
+#define port_count(p) ptr_unmask_bits((p)->request_count, EXECLIST_COUNT_BITS)
+#define port_pack(rq, count) ptr_pack_bits(rq, count, EXECLIST_COUNT_BITS)
+#define port_unpack(p, count) ptr_unpack_bits((p)->request_count, count, EXECLIST_COUNT_BITS)
+#define port_set(p, packed) ((p)->request_count = (packed))
+#define port_isset(p) ((p)->request_count)
+#define port_index(p, execlists) ((p) - (execlists)->port)
+
+		/**
+		 * @context_id: context ID for port
+		 */
+		GEM_DEBUG_DECL(u32 context_id);
+
+#define EXECLIST_MAX_PORTS 2
+	} port[EXECLIST_MAX_PORTS];
+
+	/**
+	 * @active: is the HW active? We consider the HW as active after
+	 * submitting any context for execution and until we have seen the
+	 * last context completion event. After that, we do not expect any
+	 * more events until we submit, and so can park the HW.
+	 *
+	 * As we have a small number of different sources from which we feed
+	 * the HW, we track the state of each inside a single bitfield.
+	 */
+	unsigned int active;
+#define EXECLISTS_ACTIVE_USER 0
+#define EXECLISTS_ACTIVE_PREEMPT 1
+#define EXECLISTS_ACTIVE_HWACK 2
+
+	/**
+	 * @port_mask: number of execlist ports - 1
+	 */
+	unsigned int port_mask;
+
+	/**
+	 * @queue_priority_hint: Highest pending priority.
+	 *
+	 * When we add requests into the queue, or adjust the priority of
+	 * executing requests, we compute the maximum priority of those
+	 * pending requests. We can then use this value to determine if
+	 * we need to preempt the executing requests to service the queue.
+	 * However, since the we may have recorded the priority of an inflight
+	 * request we wanted to preempt but since completed, at the time of
+	 * dequeuing the priority hint may no longer may match the highest
+	 * available request priority.
+	 */
+	int queue_priority_hint;
+
+	/**
+	 * @queue: queue of requests, in priority lists
+	 */
+	struct rb_root_cached queue;
+
+	/**
+	 * @csb_write: control register for Context Switch buffer
+	 *
+	 * Note this register may be either mmio or HWSP shadow.
+	 */
+	u32 *csb_write;
+
+	/**
+	 * @csb_status: status array for Context Switch buffer
+	 *
+	 * Note these register may be either mmio or HWSP shadow.
+	 */
+	u32 *csb_status;
+
+	/**
+	 * @preempt_complete_status: expected CSB upon completing preemption
+	 */
+	u32 preempt_complete_status;
+
+	/**
+	 * @csb_head: context status buffer head
+	 */
+	u8 csb_head;
+
+	I915_SELFTEST_DECLARE(struct st_preempt_hang preempt_hang;)
+};
+
+#define INTEL_ENGINE_CS_MAX_NAME 8
+
+struct intel_engine_cs {
+	struct drm_i915_private *i915;
+	char name[INTEL_ENGINE_CS_MAX_NAME];
+
+	enum intel_engine_id id;
+	unsigned int hw_id;
+	unsigned int guc_id;
+	unsigned long mask;
+
+	u8 uabi_class;
+
+	u8 class;
+	u8 instance;
+	u32 context_size;
+	u32 mmio_base;
+
+	struct intel_ring *buffer;
+
+	struct i915_timeline timeline;
+
+	struct drm_i915_gem_object *default_state;
+	void *pinned_default_state;
+
+	/* Rather than have every client wait upon all user interrupts,
+	 * with the herd waking after every interrupt and each doing the
+	 * heavyweight seqno dance, we delegate the task (of being the
+	 * bottom-half of the user interrupt) to the first client. After
+	 * every interrupt, we wake up one client, who does the heavyweight
+	 * coherent seqno read and either goes back to sleep (if incomplete),
+	 * or wakes up all the completed clients in parallel, before then
+	 * transferring the bottom-half status to the next client in the queue.
+	 *
+	 * Compared to walking the entire list of waiters in a single dedicated
+	 * bottom-half, we reduce the latency of the first waiter by avoiding
+	 * a context switch, but incur additional coherent seqno reads when
+	 * following the chain of request breadcrumbs. Since it is most likely
+	 * that we have a single client waiting on each seqno, then reducing
+	 * the overhead of waking that client is much preferred.
+	 */
+	struct intel_breadcrumbs {
+		spinlock_t irq_lock;
+		struct list_head signalers;
+
+		struct irq_work irq_work; /* for use from inside irq_lock */
+
+		unsigned int irq_enabled;
+
+		bool irq_armed;
+	} breadcrumbs;
+
+	struct {
+		/**
+		 * @enable: Bitmask of enable sample events on this engine.
+		 *
+		 * Bits correspond to sample event types, for instance
+		 * I915_SAMPLE_QUEUED is bit 0 etc.
+		 */
+		u32 enable;
+		/**
+		 * @enable_count: Reference count for the enabled samplers.
+		 *
+		 * Index number corresponds to @enum drm_i915_pmu_engine_sample.
+		 */
+		unsigned int enable_count[I915_ENGINE_SAMPLE_COUNT];
+		/**
+		 * @sample: Counter values for sampling events.
+		 *
+		 * Our internal timer stores the current counters in this field.
+		 *
+		 * Index number corresponds to @enum drm_i915_pmu_engine_sample.
+		 */
+		struct i915_pmu_sample sample[I915_ENGINE_SAMPLE_COUNT];
+	} pmu;
+
+	/*
+	 * A pool of objects to use as shadow copies of client batch buffers
+	 * when the command parser is enabled. Prevents the client from
+	 * modifying the batch contents after software parsing.
+	 */
+	struct i915_gem_batch_pool batch_pool;
+
+	struct intel_hw_status_page status_page;
+	struct i915_ctx_workarounds wa_ctx;
+	struct i915_wa_list ctx_wa_list;
+	struct i915_wa_list wa_list;
+	struct i915_wa_list whitelist;
+
+	u32             irq_keep_mask; /* always keep these interrupts */
+	u32		irq_enable_mask; /* bitmask to enable ring interrupt */
+	void		(*irq_enable)(struct intel_engine_cs *engine);
+	void		(*irq_disable)(struct intel_engine_cs *engine);
+
+	int		(*init_hw)(struct intel_engine_cs *engine);
+
+	struct {
+		void (*prepare)(struct intel_engine_cs *engine);
+		void (*reset)(struct intel_engine_cs *engine, bool stalled);
+		void (*finish)(struct intel_engine_cs *engine);
+	} reset;
+
+	void		(*park)(struct intel_engine_cs *engine);
+	void		(*unpark)(struct intel_engine_cs *engine);
+
+	void		(*set_default_submission)(struct intel_engine_cs *engine);
+
+	struct intel_context *(*context_pin)(struct intel_engine_cs *engine,
+					     struct i915_gem_context *ctx);
+
+	int		(*request_alloc)(struct i915_request *rq);
+	int		(*init_context)(struct i915_request *rq);
+
+	int		(*emit_flush)(struct i915_request *request, u32 mode);
+#define EMIT_INVALIDATE	BIT(0)
+#define EMIT_FLUSH	BIT(1)
+#define EMIT_BARRIER	(EMIT_INVALIDATE | EMIT_FLUSH)
+	int		(*emit_bb_start)(struct i915_request *rq,
+					 u64 offset, u32 length,
+					 unsigned int dispatch_flags);
+#define I915_DISPATCH_SECURE BIT(0)
+#define I915_DISPATCH_PINNED BIT(1)
+	int		 (*emit_init_breadcrumb)(struct i915_request *rq);
+	u32		*(*emit_fini_breadcrumb)(struct i915_request *rq,
+						 u32 *cs);
+	unsigned int	emit_fini_breadcrumb_dw;
+
+	/* Pass the request to the hardware queue (e.g. directly into
+	 * the legacy ringbuffer or to the end of an execlist).
+	 *
+	 * This is called from an atomic context with irqs disabled; must
+	 * be irq safe.
+	 */
+	void		(*submit_request)(struct i915_request *rq);
+
+	/*
+	 * Call when the priority on a request has changed and it and its
+	 * dependencies may need rescheduling. Note the request itself may
+	 * not be ready to run!
+	 */
+	void		(*schedule)(struct i915_request *request,
+				    const struct i915_sched_attr *attr);
+
+	/*
+	 * Cancel all requests on the hardware, or queued for execution.
+	 * This should only cancel the ready requests that have been
+	 * submitted to the engine (via the engine->submit_request callback).
+	 * This is called when marking the device as wedged.
+	 */
+	void		(*cancel_requests)(struct intel_engine_cs *engine);
+
+	void		(*cleanup)(struct intel_engine_cs *engine);
+
+	struct intel_engine_execlists execlists;
+
+	/* Contexts are pinned whilst they are active on the GPU. The last
+	 * context executed remains active whilst the GPU is idle - the
+	 * switch away and write to the context object only occurs on the
+	 * next execution.  Contexts are only unpinned on retirement of the
+	 * following request ensuring that we can always write to the object
+	 * on the context switch even after idling. Across suspend, we switch
+	 * to the kernel context and trash it as the save may not happen
+	 * before the hardware is powered down.
+	 */
+	struct intel_context *last_retired_context;
+
+	/* status_notifier: list of callbacks for context-switch changes */
+	struct atomic_notifier_head context_status_notifier;
+
+	struct intel_engine_hangcheck hangcheck;
+
+#define I915_ENGINE_NEEDS_CMD_PARSER BIT(0)
+#define I915_ENGINE_SUPPORTS_STATS   BIT(1)
+#define I915_ENGINE_HAS_PREEMPTION   BIT(2)
+#define I915_ENGINE_HAS_SEMAPHORES   BIT(3)
+	unsigned int flags;
+
+	/*
+	 * Table of commands the command parser needs to know about
+	 * for this engine.
+	 */
+	DECLARE_HASHTABLE(cmd_hash, I915_CMD_HASH_ORDER);
+
+	/*
+	 * Table of registers allowed in commands that read/write registers.
+	 */
+	const struct drm_i915_reg_table *reg_tables;
+	int reg_table_count;
+
+	/*
+	 * Returns the bitmask for the length field of the specified command.
+	 * Return 0 for an unrecognized/invalid command.
+	 *
+	 * If the command parser finds an entry for a command in the engine's
+	 * cmd_tables, it gets the command's length based on the table entry.
+	 * If not, it calls this function to determine the per-engine length
+	 * field encoding for the command (i.e. different opcode ranges use
+	 * certain bits to encode the command length in the header).
+	 */
+	u32 (*get_cmd_length_mask)(u32 cmd_header);
+
+	struct {
+		/**
+		 * @lock: Lock protecting the below fields.
+		 */
+		seqlock_t lock;
+		/**
+		 * @enabled: Reference count indicating number of listeners.
+		 */
+		unsigned int enabled;
+		/**
+		 * @active: Number of contexts currently scheduled in.
+		 */
+		unsigned int active;
+		/**
+		 * @enabled_at: Timestamp when busy stats were enabled.
+		 */
+		ktime_t enabled_at;
+		/**
+		 * @start: Timestamp of the last idle to active transition.
+		 *
+		 * Idle is defined as active == 0, active is active > 0.
+		 */
+		ktime_t start;
+		/**
+		 * @total: Total time this engine was busy.
+		 *
+		 * Accumulated time not counting the most recent block in cases
+		 * where engine is currently busy (active > 0).
+		 */
+		ktime_t total;
+	} stats;
+};
+
+static inline bool
+intel_engine_needs_cmd_parser(const struct intel_engine_cs *engine)
+{
+	return engine->flags & I915_ENGINE_NEEDS_CMD_PARSER;
+}
+
+static inline bool
+intel_engine_supports_stats(const struct intel_engine_cs *engine)
+{
+	return engine->flags & I915_ENGINE_SUPPORTS_STATS;
+}
+
+static inline bool
+intel_engine_has_preemption(const struct intel_engine_cs *engine)
+{
+	return engine->flags & I915_ENGINE_HAS_PREEMPTION;
+}
+
+static inline bool
+intel_engine_has_semaphores(const struct intel_engine_cs *engine)
+{
+	return engine->flags & I915_ENGINE_HAS_SEMAPHORES;
+}
+
+#define instdone_slice_mask(dev_priv__) \
+	(IS_GEN(dev_priv__, 7) ? \
+	 1 : RUNTIME_INFO(dev_priv__)->sseu.slice_mask)
+
+#define instdone_subslice_mask(dev_priv__) \
+	(IS_GEN(dev_priv__, 7) ? \
+	 1 : RUNTIME_INFO(dev_priv__)->sseu.subslice_mask[0])
+
+#define for_each_instdone_slice_subslice(dev_priv__, slice__, subslice__) \
+	for ((slice__) = 0, (subslice__) = 0; \
+	     (slice__) < I915_MAX_SLICES; \
+	     (subslice__) = ((subslice__) + 1) < I915_MAX_SUBSLICES ? (subslice__) + 1 : 0, \
+	       (slice__) += ((subslice__) == 0)) \
+		for_each_if((BIT(slice__) & instdone_slice_mask(dev_priv__)) && \
+			    (BIT(subslice__) & instdone_subslice_mask(dev_priv__)))
+
+#endif /* __INTEL_ENGINE_TYPES_H__ */
diff --git a/drivers/gpu/drm/i915/intel_guc.h b/drivers/gpu/drm/i915/intel_guc.h
index 744220296653..77ec1bd4df5a 100644
--- a/drivers/gpu/drm/i915/intel_guc.h
+++ b/drivers/gpu/drm/i915/intel_guc.h
@@ -32,6 +32,7 @@
 #include "intel_guc_log.h"
 #include "intel_guc_reg.h"
 #include "intel_uc_fw.h"
+#include "i915_utils.h"
 #include "i915_vma.h"
 
 struct guc_preempt_work {
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index eb956479f48c..5f6d43c90675 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -16,13 +16,11 @@
 #include "i915_request.h"
 #include "i915_selftest.h"
 #include "i915_timeline.h"
+#include "intel_engine_types.h"
 #include "intel_gpu_commands.h"
 #include "intel_workarounds.h"
 
 struct drm_printer;
-struct i915_sched_attr;
-
-#define I915_CMD_HASH_ORDER 9
 
 /* Early gen2 devices have a cacheline of just 32 bytes, using 64 is overkill,
  * but keeps the logic simple. Indeed, the whole purpose of this macro is just
@@ -32,11 +30,6 @@ struct i915_sched_attr;
 #define CACHELINE_BYTES 64
 #define CACHELINE_DWORDS (CACHELINE_BYTES / sizeof(u32))
 
-struct intel_hw_status_page {
-	struct i915_vma *vma;
-	u32 *addr;
-};
-
 #define I915_READ_TAIL(engine) I915_READ(RING_TAIL((engine)->mmio_base))
 #define I915_WRITE_TAIL(engine, val) I915_WRITE(RING_TAIL((engine)->mmio_base), val)
 
@@ -91,498 +84,6 @@ hangcheck_action_to_str(const enum intel_engine_hangcheck_action a)
 	return "unknown";
 }
 
-#define I915_MAX_SLICES	3
-#define I915_MAX_SUBSLICES 8
-
-#define instdone_slice_mask(dev_priv__) \
-	(IS_GEN(dev_priv__, 7) ? \
-	 1 : RUNTIME_INFO(dev_priv__)->sseu.slice_mask)
-
-#define instdone_subslice_mask(dev_priv__) \
-	(IS_GEN(dev_priv__, 7) ? \
-	 1 : RUNTIME_INFO(dev_priv__)->sseu.subslice_mask[0])
-
-#define for_each_instdone_slice_subslice(dev_priv__, slice__, subslice__) \
-	for ((slice__) = 0, (subslice__) = 0; \
-	     (slice__) < I915_MAX_SLICES; \
-	     (subslice__) = ((subslice__) + 1) < I915_MAX_SUBSLICES ? (subslice__) + 1 : 0, \
-	       (slice__) += ((subslice__) == 0)) \
-		for_each_if((BIT(slice__) & instdone_slice_mask(dev_priv__)) && \
-			    (BIT(subslice__) & instdone_subslice_mask(dev_priv__)))
-
-struct intel_instdone {
-	u32 instdone;
-	/* The following exist only in the RCS engine */
-	u32 slice_common;
-	u32 sampler[I915_MAX_SLICES][I915_MAX_SUBSLICES];
-	u32 row[I915_MAX_SLICES][I915_MAX_SUBSLICES];
-};
-
-struct intel_engine_hangcheck {
-	u64 acthd;
-	u32 last_seqno;
-	u32 next_seqno;
-	unsigned long action_timestamp;
-	struct intel_instdone instdone;
-};
-
-struct intel_ring {
-	struct i915_vma *vma;
-	void *vaddr;
-
-	struct i915_timeline *timeline;
-	struct list_head request_list;
-	struct list_head active_link;
-
-	u32 head;
-	u32 tail;
-	u32 emit;
-
-	u32 space;
-	u32 size;
-	u32 effective_size;
-};
-
-struct i915_gem_context;
-struct drm_i915_reg_table;
-
-/*
- * we use a single page to load ctx workarounds so all of these
- * values are referred in terms of dwords
- *
- * struct i915_wa_ctx_bb:
- *  offset: specifies batch starting position, also helpful in case
- *    if we want to have multiple batches at different offsets based on
- *    some criteria. It is not a requirement at the moment but provides
- *    an option for future use.
- *  size: size of the batch in DWORDS
- */
-struct i915_ctx_workarounds {
-	struct i915_wa_ctx_bb {
-		u32 offset;
-		u32 size;
-	} indirect_ctx, per_ctx;
-	struct i915_vma *vma;
-};
-
-struct i915_request;
-
-#define I915_MAX_VCS	4
-#define I915_MAX_VECS	2
-
-/*
- * Engine IDs definitions.
- * Keep instances of the same type engine together.
- */
-enum intel_engine_id {
-	RCS = 0,
-	BCS,
-	VCS,
-	VCS2,
-	VCS3,
-	VCS4,
-#define _VCS(n) (VCS + (n))
-	VECS,
-	VECS2
-#define _VECS(n) (VECS + (n))
-};
-
-struct st_preempt_hang {
-	struct completion completion;
-	unsigned int count;
-	bool inject_hang;
-};
-
-/**
- * struct intel_engine_execlists - execlist submission queue and port state
- *
- * The struct intel_engine_execlists represents the combined logical state of
- * driver and the hardware state for execlist mode of submission.
- */
-struct intel_engine_execlists {
-	/**
-	 * @tasklet: softirq tasklet for bottom handler
-	 */
-	struct tasklet_struct tasklet;
-
-	/**
-	 * @default_priolist: priority list for I915_PRIORITY_NORMAL
-	 */
-	struct i915_priolist default_priolist;
-
-	/**
-	 * @no_priolist: priority lists disabled
-	 */
-	bool no_priolist;
-
-	/**
-	 * @submit_reg: gen-specific execlist submission register
-	 * set to the ExecList Submission Port (elsp) register pre-Gen11 and to
-	 * the ExecList Submission Queue Contents register array for Gen11+
-	 */
-	u32 __iomem *submit_reg;
-
-	/**
-	 * @ctrl_reg: the enhanced execlists control register, used to load the
-	 * submit queue on the HW and to request preemptions to idle
-	 */
-	u32 __iomem *ctrl_reg;
-
-	/**
-	 * @port: execlist port states
-	 *
-	 * For each hardware ELSP (ExecList Submission Port) we keep
-	 * track of the last request and the number of times we submitted
-	 * that port to hw. We then count the number of times the hw reports
-	 * a context completion or preemption. As only one context can
-	 * be active on hw, we limit resubmission of context to port[0]. This
-	 * is called Lite Restore, of the context.
-	 */
-	struct execlist_port {
-		/**
-		 * @request_count: combined request and submission count
-		 */
-		struct i915_request *request_count;
-#define EXECLIST_COUNT_BITS 2
-#define port_request(p) ptr_mask_bits((p)->request_count, EXECLIST_COUNT_BITS)
-#define port_count(p) ptr_unmask_bits((p)->request_count, EXECLIST_COUNT_BITS)
-#define port_pack(rq, count) ptr_pack_bits(rq, count, EXECLIST_COUNT_BITS)
-#define port_unpack(p, count) ptr_unpack_bits((p)->request_count, count, EXECLIST_COUNT_BITS)
-#define port_set(p, packed) ((p)->request_count = (packed))
-#define port_isset(p) ((p)->request_count)
-#define port_index(p, execlists) ((p) - (execlists)->port)
-
-		/**
-		 * @context_id: context ID for port
-		 */
-		GEM_DEBUG_DECL(u32 context_id);
-
-#define EXECLIST_MAX_PORTS 2
-	} port[EXECLIST_MAX_PORTS];
-
-	/**
-	 * @active: is the HW active? We consider the HW as active after
-	 * submitting any context for execution and until we have seen the
-	 * last context completion event. After that, we do not expect any
-	 * more events until we submit, and so can park the HW.
-	 *
-	 * As we have a small number of different sources from which we feed
-	 * the HW, we track the state of each inside a single bitfield.
-	 */
-	unsigned int active;
-#define EXECLISTS_ACTIVE_USER 0
-#define EXECLISTS_ACTIVE_PREEMPT 1
-#define EXECLISTS_ACTIVE_HWACK 2
-
-	/**
-	 * @port_mask: number of execlist ports - 1
-	 */
-	unsigned int port_mask;
-
-	/**
-	 * @queue_priority_hint: Highest pending priority.
-	 *
-	 * When we add requests into the queue, or adjust the priority of
-	 * executing requests, we compute the maximum priority of those
-	 * pending requests. We can then use this value to determine if
-	 * we need to preempt the executing requests to service the queue.
-	 * However, since the we may have recorded the priority of an inflight
-	 * request we wanted to preempt but since completed, at the time of
-	 * dequeuing the priority hint may no longer may match the highest
-	 * available request priority.
-	 */
-	int queue_priority_hint;
-
-	/**
-	 * @queue: queue of requests, in priority lists
-	 */
-	struct rb_root_cached queue;
-
-	/**
-	 * @csb_write: control register for Context Switch buffer
-	 *
-	 * Note this register may be either mmio or HWSP shadow.
-	 */
-	u32 *csb_write;
-
-	/**
-	 * @csb_status: status array for Context Switch buffer
-	 *
-	 * Note these register may be either mmio or HWSP shadow.
-	 */
-	u32 *csb_status;
-
-	/**
-	 * @preempt_complete_status: expected CSB upon completing preemption
-	 */
-	u32 preempt_complete_status;
-
-	/**
-	 * @csb_head: context status buffer head
-	 */
-	u8 csb_head;
-
-	I915_SELFTEST_DECLARE(struct st_preempt_hang preempt_hang;)
-};
-
-#define INTEL_ENGINE_CS_MAX_NAME 8
-
-struct intel_engine_cs {
-	struct drm_i915_private *i915;
-	char name[INTEL_ENGINE_CS_MAX_NAME];
-
-	enum intel_engine_id id;
-	unsigned int hw_id;
-	unsigned int guc_id;
-	unsigned long mask;
-
-	u8 uabi_class;
-
-	u8 class;
-	u8 instance;
-	u32 context_size;
-	u32 mmio_base;
-
-	struct intel_ring *buffer;
-
-	struct i915_timeline timeline;
-
-	struct drm_i915_gem_object *default_state;
-	void *pinned_default_state;
-
-	/* Rather than have every client wait upon all user interrupts,
-	 * with the herd waking after every interrupt and each doing the
-	 * heavyweight seqno dance, we delegate the task (of being the
-	 * bottom-half of the user interrupt) to the first client. After
-	 * every interrupt, we wake up one client, who does the heavyweight
-	 * coherent seqno read and either goes back to sleep (if incomplete),
-	 * or wakes up all the completed clients in parallel, before then
-	 * transferring the bottom-half status to the next client in the queue.
-	 *
-	 * Compared to walking the entire list of waiters in a single dedicated
-	 * bottom-half, we reduce the latency of the first waiter by avoiding
-	 * a context switch, but incur additional coherent seqno reads when
-	 * following the chain of request breadcrumbs. Since it is most likely
-	 * that we have a single client waiting on each seqno, then reducing
-	 * the overhead of waking that client is much preferred.
-	 */
-	struct intel_breadcrumbs {
-		spinlock_t irq_lock;
-		struct list_head signalers;
-
-		struct irq_work irq_work; /* for use from inside irq_lock */
-
-		unsigned int irq_enabled;
-
-		bool irq_armed;
-	} breadcrumbs;
-
-	struct {
-		/**
-		 * @enable: Bitmask of enable sample events on this engine.
-		 *
-		 * Bits correspond to sample event types, for instance
-		 * I915_SAMPLE_QUEUED is bit 0 etc.
-		 */
-		u32 enable;
-		/**
-		 * @enable_count: Reference count for the enabled samplers.
-		 *
-		 * Index number corresponds to @enum drm_i915_pmu_engine_sample.
-		 */
-		unsigned int enable_count[I915_ENGINE_SAMPLE_COUNT];
-		/**
-		 * @sample: Counter values for sampling events.
-		 *
-		 * Our internal timer stores the current counters in this field.
-		 *
-		 * Index number corresponds to @enum drm_i915_pmu_engine_sample.
-		 */
-		struct i915_pmu_sample sample[I915_ENGINE_SAMPLE_COUNT];
-	} pmu;
-
-	/*
-	 * A pool of objects to use as shadow copies of client batch buffers
-	 * when the command parser is enabled. Prevents the client from
-	 * modifying the batch contents after software parsing.
-	 */
-	struct i915_gem_batch_pool batch_pool;
-
-	struct intel_hw_status_page status_page;
-	struct i915_ctx_workarounds wa_ctx;
-	struct i915_wa_list ctx_wa_list;
-	struct i915_wa_list wa_list;
-	struct i915_wa_list whitelist;
-
-	u32             irq_keep_mask; /* always keep these interrupts */
-	u32		irq_enable_mask; /* bitmask to enable ring interrupt */
-	void		(*irq_enable)(struct intel_engine_cs *engine);
-	void		(*irq_disable)(struct intel_engine_cs *engine);
-
-	int		(*init_hw)(struct intel_engine_cs *engine);
-
-	struct {
-		void (*prepare)(struct intel_engine_cs *engine);
-		void (*reset)(struct intel_engine_cs *engine, bool stalled);
-		void (*finish)(struct intel_engine_cs *engine);
-	} reset;
-
-	void		(*park)(struct intel_engine_cs *engine);
-	void		(*unpark)(struct intel_engine_cs *engine);
-
-	void		(*set_default_submission)(struct intel_engine_cs *engine);
-
-	struct intel_context *(*context_pin)(struct intel_engine_cs *engine,
-					     struct i915_gem_context *ctx);
-
-	int		(*request_alloc)(struct i915_request *rq);
-	int		(*init_context)(struct i915_request *rq);
-
-	int		(*emit_flush)(struct i915_request *request, u32 mode);
-#define EMIT_INVALIDATE	BIT(0)
-#define EMIT_FLUSH	BIT(1)
-#define EMIT_BARRIER	(EMIT_INVALIDATE | EMIT_FLUSH)
-	int		(*emit_bb_start)(struct i915_request *rq,
-					 u64 offset, u32 length,
-					 unsigned int dispatch_flags);
-#define I915_DISPATCH_SECURE BIT(0)
-#define I915_DISPATCH_PINNED BIT(1)
-	int		 (*emit_init_breadcrumb)(struct i915_request *rq);
-	u32		*(*emit_fini_breadcrumb)(struct i915_request *rq,
-						 u32 *cs);
-	unsigned int	emit_fini_breadcrumb_dw;
-
-	/* Pass the request to the hardware queue (e.g. directly into
-	 * the legacy ringbuffer or to the end of an execlist).
-	 *
-	 * This is called from an atomic context with irqs disabled; must
-	 * be irq safe.
-	 */
-	void		(*submit_request)(struct i915_request *rq);
-
-	/*
-	 * Call when the priority on a request has changed and it and its
-	 * dependencies may need rescheduling. Note the request itself may
-	 * not be ready to run!
-	 */
-	void		(*schedule)(struct i915_request *request,
-				    const struct i915_sched_attr *attr);
-
-	/*
-	 * Cancel all requests on the hardware, or queued for execution.
-	 * This should only cancel the ready requests that have been
-	 * submitted to the engine (via the engine->submit_request callback).
-	 * This is called when marking the device as wedged.
-	 */
-	void		(*cancel_requests)(struct intel_engine_cs *engine);
-
-	void		(*cleanup)(struct intel_engine_cs *engine);
-
-	struct intel_engine_execlists execlists;
-
-	/* Contexts are pinned whilst they are active on the GPU. The last
-	 * context executed remains active whilst the GPU is idle - the
-	 * switch away and write to the context object only occurs on the
-	 * next execution.  Contexts are only unpinned on retirement of the
-	 * following request ensuring that we can always write to the object
-	 * on the context switch even after idling. Across suspend, we switch
-	 * to the kernel context and trash it as the save may not happen
-	 * before the hardware is powered down.
-	 */
-	struct intel_context *last_retired_context;
-
-	/* status_notifier: list of callbacks for context-switch changes */
-	struct atomic_notifier_head context_status_notifier;
-
-	struct intel_engine_hangcheck hangcheck;
-
-#define I915_ENGINE_NEEDS_CMD_PARSER BIT(0)
-#define I915_ENGINE_SUPPORTS_STATS   BIT(1)
-#define I915_ENGINE_HAS_PREEMPTION   BIT(2)
-#define I915_ENGINE_HAS_SEMAPHORES   BIT(3)
-	unsigned int flags;
-
-	/*
-	 * Table of commands the command parser needs to know about
-	 * for this engine.
-	 */
-	DECLARE_HASHTABLE(cmd_hash, I915_CMD_HASH_ORDER);
-
-	/*
-	 * Table of registers allowed in commands that read/write registers.
-	 */
-	const struct drm_i915_reg_table *reg_tables;
-	int reg_table_count;
-
-	/*
-	 * Returns the bitmask for the length field of the specified command.
-	 * Return 0 for an unrecognized/invalid command.
-	 *
-	 * If the command parser finds an entry for a command in the engine's
-	 * cmd_tables, it gets the command's length based on the table entry.
-	 * If not, it calls this function to determine the per-engine length
-	 * field encoding for the command (i.e. different opcode ranges use
-	 * certain bits to encode the command length in the header).
-	 */
-	u32 (*get_cmd_length_mask)(u32 cmd_header);
-
-	struct {
-		/**
-		 * @lock: Lock protecting the below fields.
-		 */
-		seqlock_t lock;
-		/**
-		 * @enabled: Reference count indicating number of listeners.
-		 */
-		unsigned int enabled;
-		/**
-		 * @active: Number of contexts currently scheduled in.
-		 */
-		unsigned int active;
-		/**
-		 * @enabled_at: Timestamp when busy stats were enabled.
-		 */
-		ktime_t enabled_at;
-		/**
-		 * @start: Timestamp of the last idle to active transition.
-		 *
-		 * Idle is defined as active == 0, active is active > 0.
-		 */
-		ktime_t start;
-		/**
-		 * @total: Total time this engine was busy.
-		 *
-		 * Accumulated time not counting the most recent block in cases
-		 * where engine is currently busy (active > 0).
-		 */
-		ktime_t total;
-	} stats;
-};
-
-static inline bool
-intel_engine_needs_cmd_parser(const struct intel_engine_cs *engine)
-{
-	return engine->flags & I915_ENGINE_NEEDS_CMD_PARSER;
-}
-
-static inline bool
-intel_engine_supports_stats(const struct intel_engine_cs *engine)
-{
-	return engine->flags & I915_ENGINE_SUPPORTS_STATS;
-}
-
-static inline bool
-intel_engine_has_preemption(const struct intel_engine_cs *engine)
-{
-	return engine->flags & I915_ENGINE_HAS_PREEMPTION;
-}
-
-static inline bool
-intel_engine_has_semaphores(const struct intel_engine_cs *engine)
-{
-	return engine->flags & I915_ENGINE_HAS_SEMAPHORES;
-}
-
 void intel_engines_set_scheduler_caps(struct drm_i915_private *i915);
 
 static inline bool __execlists_need_preempt(int prio, int last)
diff --git a/drivers/gpu/drm/i915/intel_workarounds.h b/drivers/gpu/drm/i915/intel_workarounds.h
index 7c734714b05e..a1bf51c611a9 100644
--- a/drivers/gpu/drm/i915/intel_workarounds.h
+++ b/drivers/gpu/drm/i915/intel_workarounds.h
@@ -9,18 +9,7 @@
 
 #include <linux/slab.h>
 
-struct i915_wa {
-	i915_reg_t	  reg;
-	u32		  mask;
-	u32		  val;
-};
-
-struct i915_wa_list {
-	const char	*name;
-	struct i915_wa	*list;
-	unsigned int	count;
-	unsigned int	wa_count;
-};
+#include "intel_workarounds_types.h"
 
 static inline void intel_wa_list_free(struct i915_wa_list *wal)
 {
diff --git a/drivers/gpu/drm/i915/intel_workarounds_types.h b/drivers/gpu/drm/i915/intel_workarounds_types.h
new file mode 100644
index 000000000000..032a0dc49275
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_workarounds_types.h
@@ -0,0 +1,25 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2014-2018 Intel Corporation
+ */
+
+#ifndef __INTEL_WORKAROUNDS_TYPES_H__
+#define __INTEL_WORKAROUNDS_TYPES_H__
+
+#include "i915_reg.h"
+
+struct i915_wa {
+	i915_reg_t	  reg;
+	u32		  mask;
+	u32		  val;
+};
+
+struct i915_wa_list {
+	const char	*name;
+	struct i915_wa	*list;
+	unsigned int	count;
+	unsigned int	wa_count;
+};
+
+#endif /* __INTEL_WORKAROUNDS_TYPES_H__ */
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH 42/46] drm/i915: Move over to intel_context_lookup()
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (40 preceding siblings ...)
  2019-02-06 13:03 ` [PATCH 41/46] drm/i915: Split struct intel_context definition to its own header Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-06 14:27   ` [PATCH] " Chris Wilson
  2019-02-06 13:03 ` [PATCH 43/46] drm/i915: Load balancing across a virtual engine Chris Wilson
                   ` (10 subsequent siblings)
  52 siblings, 1 reply; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx

In preparation for an ever-growing number of engines, and hence an
ever-increasing static array of HW contexts within the GEM context, move
the array over to an rbtree, allocated upon first use.

Unfortunately, this imposes an rbtree lookup at a few frequent callsites,
but we should be able to mitigate that by moving over to using the HW
context as our primary type, so that the lookup is only incurred on the
boundary with the user GEM context and engines.
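
As a rough sketch (not necessarily the final shape of
intel_context_lookup(); the 'node' member and the helper name below are
assumed for illustration, and the code relies on <linux/rbtree.h> plus
the structs introduced in this patch), the per-engine lookup is a plain
rbtree walk keyed on the engine pointer:

	static struct intel_context *
	sketch_context_lookup(struct i915_gem_context *ctx,
			      const struct intel_engine_cs *engine)
	{
		struct rb_node *p = ctx->hw_contexts.rb_node;

		/* Walk the per-context rbtree, ordered by engine pointer. */
		while (p) {
			struct intel_context *ce =
				rb_entry(p, struct intel_context, node);

			if (ce->engine == engine)
				return ce;

			if ((unsigned long)engine < (unsigned long)ce->engine)
				p = p->rb_left;
			else
				p = p->rb_right;
		}

		return NULL; /* no HW context allocated for this engine yet */
	}

Callers that only inspect state (debugfs, perf, the execbuf ring wait)
tolerate a NULL return, while paths that are about to use the context go
through an allocating variant (intel_context_instance() in the diff
below) that returns an ERR_PTR on failure.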

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/Makefile                 |   1 +
 drivers/gpu/drm/i915/gvt/mmio_context.c       |   3 +-
 drivers/gpu/drm/i915/i915_debugfs.c           |  13 +-
 drivers/gpu/drm/i915/i915_gem.c               |   9 +-
 drivers/gpu/drm/i915/i915_gem_context.c       |  57 ++------
 drivers/gpu/drm/i915/i915_gem_context_types.h |   8 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c    |   4 +-
 drivers/gpu/drm/i915/i915_perf.c              |   4 +-
 drivers/gpu/drm/i915/intel_context.c          | 137 ++++++++++++++++++
 drivers/gpu/drm/i915/intel_context.h          |  37 ++++-
 drivers/gpu/drm/i915/intel_context_types.h    |   2 +
 drivers/gpu/drm/i915/intel_engine_cs.c        |   2 +-
 drivers/gpu/drm/i915/intel_engine_types.h     |   5 +
 drivers/gpu/drm/i915/intel_guc_ads.c          |   4 +-
 drivers/gpu/drm/i915/intel_guc_submission.c   |   4 +-
 drivers/gpu/drm/i915/intel_lrc.c              |  35 +++--
 drivers/gpu/drm/i915/intel_ringbuffer.c       |  23 ++-
 drivers/gpu/drm/i915/selftests/mock_context.c |   7 +-
 drivers/gpu/drm/i915/selftests/mock_engine.c  |   6 +-
 19 files changed, 268 insertions(+), 93 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/intel_context.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 89105b1aaf12..d7292b349c0d 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -86,6 +86,7 @@ i915-y += \
 	  i915_trace_points.o \
 	  i915_vma.o \
 	  intel_breadcrumbs.o \
+	  intel_context.o \
 	  intel_engine_cs.o \
 	  intel_hangcheck.o \
 	  intel_lrc.o \
diff --git a/drivers/gpu/drm/i915/gvt/mmio_context.c b/drivers/gpu/drm/i915/gvt/mmio_context.c
index 7d84cfb9051a..442a74805129 100644
--- a/drivers/gpu/drm/i915/gvt/mmio_context.c
+++ b/drivers/gpu/drm/i915/gvt/mmio_context.c
@@ -492,7 +492,8 @@ static void switch_mmio(struct intel_vgpu *pre,
 			 * itself.
 			 */
 			if (mmio->in_context &&
-			    !is_inhibit_context(&s->shadow_ctx->__engine[ring_id]))
+			    !is_inhibit_context(intel_context_lookup(s->shadow_ctx,
+								     dev_priv->engine[ring_id])))
 				continue;
 
 			if (mmio->mask)
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 846bd0de3cfa..9ed7ffef54ad 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -392,7 +392,11 @@ static void print_context_stats(struct seq_file *m,
 		enum intel_engine_id id;
 
 		for_each_engine(engine, i915, id) {
-			struct intel_context *ce = to_intel_context(ctx, engine);
+			struct intel_context *ce;
+
+			ce = intel_context_lookup(ctx, engine);
+			if (!ce)
+				continue;
 
 			if (ce->state)
 				per_file_stats(0, ce->state->obj, &kstats);
@@ -1913,8 +1917,11 @@ static int i915_context_status(struct seq_file *m, void *unused)
 		seq_putc(m, '\n');
 
 		for_each_engine(engine, dev_priv, id) {
-			struct intel_context *ce =
-				to_intel_context(ctx, engine);
+			struct intel_context *ce;
+
+			ce = intel_context_lookup(ctx, engine);
+			if (!ce)
+				continue;
 
 			seq_printf(m, "%s: ", engine->name);
 			if (ce->state)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d50588d54d0b..68726d81efef 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4770,15 +4770,20 @@ static int __intel_engines_record_defaults(struct drm_i915_private *i915)
 	}
 
 	for_each_engine(engine, i915, id) {
+		struct intel_context *ce;
 		struct i915_vma *state;
 		void *vaddr;
 
-		GEM_BUG_ON(to_intel_context(ctx, engine)->pin_count);
+		ce = intel_context_lookup(ctx, engine);
+		if (!ce)
+			continue;
 
-		state = to_intel_context(ctx, engine)->state;
+		state = ce->state;
 		if (!state)
 			continue;
 
+		GEM_BUG_ON(ce->pin_count);
+
 		/*
 		 * As we will hold a reference to the logical state, it will
 		 * not be torn down with the context, and importantly the
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 20580463175e..e75fad339ab8 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -224,7 +224,7 @@ static void release_hw_id(struct i915_gem_context *ctx)
 
 static void i915_gem_context_free(struct i915_gem_context *ctx)
 {
-	unsigned int n;
+	struct intel_context *it, *n;
 
 	lockdep_assert_held(&ctx->i915->drm.struct_mutex);
 	GEM_BUG_ON(!i915_gem_context_is_closed(ctx));
@@ -235,11 +235,10 @@ static void i915_gem_context_free(struct i915_gem_context *ctx)
 
 	kfree(ctx->engines);
 
-	for (n = 0; n < ARRAY_SIZE(ctx->__engine); n++) {
-		struct intel_context *ce = &ctx->__engine[n];
-
-		if (ce->ops)
-			ce->ops->destroy(ce);
+	rbtree_postorder_for_each_entry_safe(it, n, &ctx->hw_contexts, node) {
+		if (it->ops && it->ops->destroy)
+			it->ops->destroy(it);
+		kfree(it);
 	}
 
 	if (ctx->timeline)
@@ -346,39 +345,11 @@ static u32 default_desc_template(const struct drm_i915_private *i915,
 	return desc;
 }
 
-static void intel_context_retire(struct i915_active_request *active,
-				 struct i915_request *rq)
-{
-	struct intel_context *ce =
-		container_of(active, typeof(*ce), active_tracker);
-
-	intel_context_unpin(ce);
-}
-
-void
-intel_context_init(struct intel_context *ce,
-		   struct i915_gem_context *ctx,
-		   struct intel_engine_cs *engine)
-{
-	ce->gem_context = ctx;
-	ce->engine = engine;
-
-	INIT_LIST_HEAD(&ce->signal_link);
-	INIT_LIST_HEAD(&ce->signals);
-
-	/* Use the whole device by default */
-	ce->sseu = intel_device_default_sseu(ctx->i915);
-
-	i915_active_request_init(&ce->active_tracker,
-				 NULL, intel_context_retire);
-}
-
 static struct i915_gem_context *
 __create_hw_context(struct drm_i915_private *dev_priv,
 		    struct drm_i915_file_private *file_priv)
 {
 	struct i915_gem_context *ctx;
-	unsigned int n;
 	int ret;
 
 	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
@@ -391,8 +362,8 @@ __create_hw_context(struct drm_i915_private *dev_priv,
 	ctx->sched.priority = I915_USER_PRIORITY(I915_PRIORITY_NORMAL);
 	INIT_LIST_HEAD(&ctx->active_engines);
 
-	for (n = 0; n < ARRAY_SIZE(ctx->__engine); n++)
-		intel_context_init(&ctx->__engine[n], ctx, dev_priv->engine[n]);
+	ctx->hw_contexts = RB_ROOT;
+	spin_lock_init(&ctx->hw_contexts_lock);
 
 	INIT_RADIX_TREE(&ctx->handles_vma, GFP_KERNEL);
 	INIT_LIST_HEAD(&ctx->handles_list);
@@ -936,8 +907,6 @@ int i915_gem_switch_to_kernel_context(struct drm_i915_private *i915,
 		struct intel_ring *ring;
 		struct i915_request *rq;
 
-		GEM_BUG_ON(!to_intel_context(i915->kernel_context, engine));
-
 		rq = i915_request_alloc(engine, i915->kernel_context);
 		if (IS_ERR(rq))
 			return PTR_ERR(rq);
@@ -1172,9 +1141,13 @@ __i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
 				    struct intel_engine_cs *engine,
 				    struct intel_sseu sseu)
 {
-	struct intel_context *ce = to_intel_context(ctx, engine);
+	struct intel_context *ce;
 	int ret = 0;
 
+	ce = intel_context_instance(ctx, engine);
+	if (IS_ERR(ce))
+		return PTR_ERR(ce);
+
 	GEM_BUG_ON(INTEL_GEN(ctx->i915) < 8);
 	GEM_BUG_ON(engine->id != RCS);
 
@@ -1725,13 +1698,15 @@ static int get_sseu(struct i915_gem_context *ctx,
 	if (!engine)
 		return -EINVAL;
 
+	ce = intel_context_instance(ctx, engine);
+	if (IS_ERR(ce))
+		return PTR_ERR(ce);
+
 	/* Only use for mutex here is to serialize get_param and set_param. */
 	ret = mutex_lock_interruptible(&ctx->i915->drm.struct_mutex);
 	if (ret)
 		return ret;
 
-	ce = to_intel_context(ctx, engine);
-
 	user_sseu.slice_mask = ce->sseu.slice_mask;
 	user_sseu.subslice_mask = ce->sseu.subslice_mask;
 	user_sseu.min_eus_per_subslice = ce->sseu.min_eus_per_subslice;
diff --git a/drivers/gpu/drm/i915/i915_gem_context_types.h b/drivers/gpu/drm/i915/i915_gem_context_types.h
index b69309b46098..865bbcd72ad4 100644
--- a/drivers/gpu/drm/i915/i915_gem_context_types.h
+++ b/drivers/gpu/drm/i915/i915_gem_context_types.h
@@ -7,7 +7,8 @@
 #ifndef __I915_GEM_CONTEXT_TYPES_H__
 #define __I915_GEM_CONTEXT_TYPES_H__
 
-#include "i915_gem.h" /* I915_NUM_ENGINES */
+#include <linux/rbtree.h>
+
 #include "intel_context_types.h"
 
 struct pid;
@@ -134,8 +135,9 @@ struct i915_gem_context {
 
 	struct i915_sched_attr sched;
 
-	/** engine: per-engine logical HW state */
-	struct intel_context __engine[I915_NUM_ENGINES];
+	/** hw_contexts: per-engine logical HW state */
+	struct rb_root hw_contexts;
+	spinlock_t hw_contexts_lock;
 
 	/** ring_size: size for allocating the per-engine ring buffer */
 	u32 ring_size;
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 5052b49f8dcd..cb57178b8fe3 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -790,8 +790,8 @@ static int eb_wait_for_ring(const struct i915_execbuffer *eb)
 	 * keeping all of their resources pinned.
 	 */
 
-	ce = to_intel_context(eb->ctx, eb->engine);
-	if (!ce->ring) /* first use, assume empty! */
+	ce = intel_context_lookup(eb->ctx, eb->engine);
+	if (!ce || !ce->ring) /* first use, assume empty! */
 		return 0;
 
 	rq = __eb_wait_for_ring(ce->ring);
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index f969a0512465..ecca231ca83a 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1740,11 +1740,11 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
 
 	/* Update all contexts now that we've stalled the submission. */
 	list_for_each_entry(ctx, &dev_priv->contexts.list, link) {
-		struct intel_context *ce = to_intel_context(ctx, engine);
+		struct intel_context *ce = intel_context_lookup(ctx, engine);
 		u32 *regs;
 
 		/* OA settings will be set upon first use */
-		if (!ce->state)
+		if (!ce || !ce->state)
 			continue;
 
 		regs = i915_gem_object_pin_map(ce->state->obj, map_type);
diff --git a/drivers/gpu/drm/i915/intel_context.c b/drivers/gpu/drm/i915/intel_context.c
new file mode 100644
index 000000000000..e3e8bc84fdf0
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_context.c
@@ -0,0 +1,137 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2019 Intel Corporation
+ */
+
+#include "i915_drv.h"
+#include "i915_gem_context.h"
+#include "intel_context.h"
+#include "intel_ringbuffer.h"
+
+struct intel_context *
+intel_context_lookup(struct i915_gem_context *ctx,
+		     struct intel_engine_cs *engine)
+{
+	struct intel_context *ce = NULL;
+	struct rb_node *p;
+
+	spin_lock(&ctx->hw_contexts_lock);
+	p = ctx->hw_contexts.rb_node;
+	while (p) {
+		struct intel_context *this =
+			rb_entry(p, struct intel_context, node);
+
+		if (this->engine == engine) {
+			ce = this;
+			break;
+		}
+
+		if (this->engine < engine)
+			p = p->rb_right;
+		else
+			p = p->rb_left;
+	}
+	spin_unlock(&ctx->hw_contexts_lock);
+
+	return ce;
+}
+
+struct intel_context *
+__intel_context_insert(struct i915_gem_context *ctx,
+		       struct intel_engine_cs *engine,
+		       struct intel_context *ce)
+{
+	struct rb_node **p, *parent;
+	int err = 0;
+
+	spin_lock(&ctx->hw_contexts_lock);
+
+	parent = NULL;
+	p = &ctx->hw_contexts.rb_node;
+	while (*p) {
+		struct intel_context *this;
+
+		parent = *p;
+		this = rb_entry(parent, struct intel_context, node);
+
+		if (this->engine == engine) {
+			err = -EEXIST;
+			ce = this;
+			break;
+		}
+
+		if (this->engine < engine)
+			p = &parent->rb_right;
+		else
+			p = &parent->rb_left;
+	}
+	if (!err) {
+		rb_link_node(&ce->node, parent, p);
+		rb_insert_color(&ce->node, &ctx->hw_contexts);
+	}
+
+	spin_unlock(&ctx->hw_contexts_lock);
+
+	return ce;
+}
+
+void __intel_context_remove(struct intel_context *ce)
+{
+	struct i915_gem_context *ctx = ce->gem_context;
+
+	spin_lock(&ctx->hw_contexts_lock);
+	rb_erase(&ce->node, &ctx->hw_contexts);
+	spin_unlock(&ctx->hw_contexts_lock);
+}
+
+struct intel_context *
+intel_context_instance(struct i915_gem_context *ctx,
+		       struct intel_engine_cs *engine)
+{
+	struct intel_context *ce, *pos;
+
+	ce = intel_context_lookup(ctx, engine);
+	if (likely(ce))
+		return ce;
+
+	ce = kzalloc(sizeof(*ce), GFP_KERNEL);
+	if (!ce)
+		return ERR_PTR(-ENOMEM);
+
+	intel_context_init(ce, ctx, engine);
+
+	pos = __intel_context_insert(ctx, engine, ce);
+	if (unlikely(pos != ce)) /* Beaten! Use their HW context instead */
+		kfree(ce);
+
+	GEM_BUG_ON(intel_context_lookup(ctx, engine) != pos);
+	return pos;
+}
+
+static void intel_context_retire(struct i915_active_request *active,
+				 struct i915_request *rq)
+{
+	struct intel_context *ce =
+		container_of(active, typeof(*ce), active_tracker);
+
+	intel_context_unpin(ce);
+}
+
+void
+intel_context_init(struct intel_context *ce,
+		   struct i915_gem_context *ctx,
+		   struct intel_engine_cs *engine)
+{
+	ce->gem_context = ctx;
+	ce->engine = engine;
+
+	INIT_LIST_HEAD(&ce->signal_link);
+	INIT_LIST_HEAD(&ce->signals);
+
+	/* Use the whole device by default */
+	ce->sseu = intel_device_default_sseu(ctx->i915);
+
+	i915_active_request_init(&ce->active_tracker,
+				 NULL, intel_context_retire);
+}
diff --git a/drivers/gpu/drm/i915/intel_context.h b/drivers/gpu/drm/i915/intel_context.h
index dd947692bb0b..c3fffd9b8ae4 100644
--- a/drivers/gpu/drm/i915/intel_context.h
+++ b/drivers/gpu/drm/i915/intel_context.h
@@ -7,7 +7,6 @@
 #ifndef __INTEL_CONTEXT_H__
 #define __INTEL_CONTEXT_H__
 
-#include "i915_gem_context_types.h"
 #include "intel_context_types.h"
 #include "intel_engine_types.h"
 
@@ -15,12 +14,36 @@ void intel_context_init(struct intel_context *ce,
 			struct i915_gem_context *ctx,
 			struct intel_engine_cs *engine);
 
-static inline struct intel_context *
-to_intel_context(struct i915_gem_context *ctx,
-		 const struct intel_engine_cs *engine)
-{
-	return &ctx->__engine[engine->id];
-}
+/**
+ * intel_context_lookup - Find the matching HW context for this (ctx, engine)
+ * @ctx - the parent GEM context
+ * @engine - the target HW engine
+ *
+ * May return NULL if the HW context hasn't been instantiated (i.e. unused).
+ */
+struct intel_context *
+intel_context_lookup(struct i915_gem_context *ctx,
+		     struct intel_engine_cs *engine);
+
+/**
+ * intel_context_instance - Lookup or allocate the HW context for (ctx, engine)
+ * @ctx - the parent GEM context
+ * @engine - the target HW engine
+ *
+ * Returns the existing HW context for this pair of (GEM context, engine), or
+ * allocates and initialises a fresh context. Once allocated, the HW context
+ * remains resident until the GEM context is destroyed.
+ */
+struct intel_context *
+intel_context_instance(struct i915_gem_context *ctx,
+		       struct intel_engine_cs *engine);
+
+struct intel_context *
+__intel_context_insert(struct i915_gem_context *ctx,
+		       struct intel_engine_cs *engine,
+		       struct intel_context *ce);
+void
+__intel_context_remove(struct intel_context *ce);
 
 static inline struct intel_context *
 intel_context_pin(struct i915_gem_context *ctx, struct intel_engine_cs *engine)
diff --git a/drivers/gpu/drm/i915/intel_context_types.h b/drivers/gpu/drm/i915/intel_context_types.h
index 16e1306e9595..857f5c335324 100644
--- a/drivers/gpu/drm/i915/intel_context_types.h
+++ b/drivers/gpu/drm/i915/intel_context_types.h
@@ -8,6 +8,7 @@
 #define __INTEL_CONTEXT_TYPES__
 
 #include <linux/list.h>
+#include <linux/rbtree.h>
 #include <linux/types.h>
 
 #include "i915_active_types.h"
@@ -52,6 +53,7 @@ struct intel_context {
 	struct i915_active_request active_tracker;
 
 	const struct intel_context_ops *ops;
+	struct rb_node node;
 
 	/** sseu: Control eu/slice partitioning */
 	struct intel_sseu sseu;
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 0372aaa9756c..a5b2d50208ef 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -644,7 +644,7 @@ void intel_engines_set_scheduler_caps(struct drm_i915_private *i915)
 static void __intel_context_unpin(struct i915_gem_context *ctx,
 				  struct intel_engine_cs *engine)
 {
-	intel_context_unpin(to_intel_context(ctx, engine));
+	intel_context_unpin(intel_context_lookup(ctx, engine));
 }
 
 struct measure_breadcrumb {
diff --git a/drivers/gpu/drm/i915/intel_engine_types.h b/drivers/gpu/drm/i915/intel_engine_types.h
index 88a13435e474..8f0aedb2c2d8 100644
--- a/drivers/gpu/drm/i915/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/intel_engine_types.h
@@ -230,6 +230,11 @@ struct intel_engine_execlists {
 	 */
 	u32 *csb_status;
 
+	/**
+	 * @preempt_context: the HW context for injecting preempt-to-idle
+	 */
+	struct intel_context *preempt_context;
+
 	/**
 	 * @preempt_complete_status: expected CSB upon completing preemption
 	 */
diff --git a/drivers/gpu/drm/i915/intel_guc_ads.c b/drivers/gpu/drm/i915/intel_guc_ads.c
index f0db62887f50..da220561ac41 100644
--- a/drivers/gpu/drm/i915/intel_guc_ads.c
+++ b/drivers/gpu/drm/i915/intel_guc_ads.c
@@ -121,8 +121,8 @@ int intel_guc_ads_create(struct intel_guc *guc)
 	 * to find it. Note that we have to skip our header (1 page),
 	 * because our GuC shared data is there.
 	 */
-	kernel_ctx_vma = to_intel_context(dev_priv->kernel_context,
-					  dev_priv->engine[RCS])->state;
+	kernel_ctx_vma = intel_context_lookup(dev_priv->kernel_context,
+					      dev_priv->engine[RCS])->state;
 	blob->ads.golden_context_lrca =
 		intel_guc_ggtt_offset(guc, kernel_ctx_vma) + skipped_offset;
 
diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index 4366db7978a8..fea07e51f109 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -382,7 +382,7 @@ static void guc_stage_desc_init(struct intel_guc_client *client)
 	desc->db_id = client->doorbell_id;
 
 	for_each_engine_masked(engine, dev_priv, client->engines, tmp) {
-		struct intel_context *ce = to_intel_context(ctx, engine);
+		struct intel_context *ce = intel_context_lookup(ctx, engine);
 		u32 guc_engine_id = engine->guc_id;
 		struct guc_execlist_context *lrc = &desc->lrc[guc_engine_id];
 
@@ -567,7 +567,7 @@ static void inject_preempt_context(struct work_struct *work)
 					     preempt_work[engine->id]);
 	struct intel_guc_client *client = guc->preempt_client;
 	struct guc_stage_desc *stage_desc = __get_stage_desc(client);
-	struct intel_context *ce = to_intel_context(client->owner, engine);
+	struct intel_context *ce = intel_context_lookup(client->owner, engine);
 	u32 data[7];
 
 	if (!ce->ring->emit) { /* recreate upon load/resume */
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index caec509543b5..48797a1c0964 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -622,8 +622,7 @@ static void port_assign(struct execlist_port *port, struct i915_request *rq)
 static void inject_preempt_context(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists *execlists = &engine->execlists;
-	struct intel_context *ce =
-		to_intel_context(engine->i915->preempt_context, engine);
+	struct intel_context *ce = execlists->preempt_context;
 	unsigned int n;
 
 	GEM_BUG_ON(execlists->preempt_complete_status !=
@@ -1231,19 +1230,22 @@ static void execlists_submit_request(struct i915_request *request)
 	spin_unlock_irqrestore(&engine->timeline.lock, flags);
 }
 
-static void execlists_context_destroy(struct intel_context *ce)
+static void __execlists_context_fini(struct intel_context *ce)
 {
-	GEM_BUG_ON(ce->pin_count);
-
-	if (!ce->state)
-		return;
-
 	intel_ring_free(ce->ring);
 
 	GEM_BUG_ON(i915_gem_object_is_active(ce->state->obj));
 	i915_gem_object_put(ce->state->obj);
 }
 
+static void execlists_context_destroy(struct intel_context *ce)
+{
+	GEM_BUG_ON(ce->pin_count);
+
+	if (ce->state)
+		__execlists_context_fini(ce);
+}
+
 static void execlists_context_unpin(struct intel_context *ce)
 {
 	struct intel_engine_cs *engine;
@@ -1386,7 +1388,11 @@ static struct intel_context *
 execlists_context_pin(struct intel_engine_cs *engine,
 		      struct i915_gem_context *ctx)
 {
-	struct intel_context *ce = to_intel_context(ctx, engine);
+	struct intel_context *ce;
+
+	ce = intel_context_instance(ctx, engine);
+	if (IS_ERR(ce))
+		return ce;
 
 	lockdep_assert_held(&ctx->i915->drm.struct_mutex);
 	GEM_BUG_ON(!ctx->ppgtt);
@@ -2323,7 +2329,7 @@ void intel_execlists_set_default_submission(struct intel_engine_cs *engine)
 
 	engine->flags |= I915_ENGINE_HAS_SEMAPHORES;
 	engine->flags |= I915_ENGINE_SUPPORTS_STATS;
-	if (engine->i915->preempt_context)
+	if (engine->execlists.preempt_context)
 		engine->flags |= I915_ENGINE_HAS_PREEMPTION;
 }
 
@@ -2427,8 +2433,9 @@ static int logical_ring_init(struct intel_engine_cs *engine)
 	execlists->preempt_complete_status = ~0u;
 	if (i915->preempt_context) {
 		struct intel_context *ce =
-			to_intel_context(i915->preempt_context, engine);
+			intel_context_lookup(i915->preempt_context, engine);
 
+		execlists->preempt_context = ce;
 		execlists->preempt_complete_status =
 			upper_32_bits(ce->lrc_desc);
 	}
@@ -2802,7 +2809,7 @@ populate_lr_context(struct intel_context *ce,
 	if (!engine->default_state)
 		regs[CTX_CONTEXT_CONTROL + 1] |=
 			_MASKED_BIT_ENABLE(CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT);
-	if (ce->gem_context == engine->i915->preempt_context &&
+	if (ce == engine->execlists.preempt_context &&
 	    INTEL_GEN(engine->i915) < 11)
 		regs[CTX_CONTEXT_CONTROL + 1] |=
 			_MASKED_BIT_ENABLE(CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT |
@@ -2899,9 +2906,9 @@ void intel_lr_context_resume(struct drm_i915_private *i915)
 	list_for_each_entry(ctx, &i915->contexts.list, link) {
 		for_each_engine(engine, i915, id) {
 			struct intel_context *ce =
-				to_intel_context(ctx, engine);
+				intel_context_lookup(ctx, engine);
 
-			if (!ce->state)
+			if (!ce || !ce->state)
 				continue;
 
 			intel_ring_reset(ce->ring, 0);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 4557f715663d..9a5b420273a8 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1348,15 +1348,18 @@ intel_ring_free(struct intel_ring *ring)
 	kfree(ring);
 }
 
+static void __intel_ring_context_fini(struct intel_context *ce)
+{
+	GEM_BUG_ON(i915_gem_object_is_active(ce->state->obj));
+	i915_gem_object_put(ce->state->obj);
+}
+
 static void intel_ring_context_destroy(struct intel_context *ce)
 {
 	GEM_BUG_ON(ce->pin_count);
 
-	if (!ce->state)
-		return;
-
-	GEM_BUG_ON(i915_gem_object_is_active(ce->state->obj));
-	i915_gem_object_put(ce->state->obj);
+	if (ce->state)
+		__intel_ring_context_fini(ce);
 }
 
 static int __context_pin_ppgtt(struct i915_gem_context *ctx)
@@ -1555,7 +1558,11 @@ static struct intel_context *
 intel_ring_context_pin(struct intel_engine_cs *engine,
 		       struct i915_gem_context *ctx)
 {
-	struct intel_context *ce = to_intel_context(ctx, engine);
+	struct intel_context *ce;
+
+	ce = intel_context_instance(ctx, engine);
+	if (IS_ERR(ce))
+		return ce;
 
 	lockdep_assert_held(&ctx->i915->drm.struct_mutex);
 
@@ -1754,8 +1761,8 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
 		 * placeholder we use to flush other contexts.
 		 */
 		*cs++ = MI_SET_CONTEXT;
-		*cs++ = i915_ggtt_offset(to_intel_context(i915->kernel_context,
-							  engine)->state) |
+		*cs++ = i915_ggtt_offset(intel_context_lookup(i915->kernel_context,
+							      engine)->state) |
 			MI_MM_SPACE_GTT |
 			MI_RESTORE_INHIBIT;
 	}
diff --git a/drivers/gpu/drm/i915/selftests/mock_context.c b/drivers/gpu/drm/i915/selftests/mock_context.c
index 8137ff6f01b2..58d805757052 100644
--- a/drivers/gpu/drm/i915/selftests/mock_context.c
+++ b/drivers/gpu/drm/i915/selftests/mock_context.c
@@ -30,7 +30,6 @@ mock_context(struct drm_i915_private *i915,
 	     const char *name)
 {
 	struct i915_gem_context *ctx;
-	unsigned int n;
 	int ret;
 
 	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
@@ -41,14 +40,14 @@ mock_context(struct drm_i915_private *i915,
 	INIT_LIST_HEAD(&ctx->link);
 	ctx->i915 = i915;
 
+	ctx->hw_contexts = RB_ROOT;
+	spin_lock_init(&ctx->hw_contexts_lock);
+
 	INIT_RADIX_TREE(&ctx->handles_vma, GFP_KERNEL);
 	INIT_LIST_HEAD(&ctx->handles_list);
 	INIT_LIST_HEAD(&ctx->hw_id_link);
 	INIT_LIST_HEAD(&ctx->active_engines);
 
-	for (n = 0; n < ARRAY_SIZE(ctx->__engine); n++)
-		intel_context_init(&ctx->__engine[n], ctx, i915->engine[n]);
-
 	ret = i915_gem_context_pin_hw_id(ctx);
 	if (ret < 0)
 		goto err_handles;
diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
index b8c6769571c4..8f72d26c58fe 100644
--- a/drivers/gpu/drm/i915/selftests/mock_engine.c
+++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
@@ -146,9 +146,13 @@ static struct intel_context *
 mock_context_pin(struct intel_engine_cs *engine,
 		 struct i915_gem_context *ctx)
 {
-	struct intel_context *ce = to_intel_context(ctx, engine);
+	struct intel_context *ce;
 	int err = -ENOMEM;
 
+	ce = intel_context_instance(ctx, engine);
+	if (IS_ERR(ce))
+		return ce;
+
 	if (ce->pin_count++)
 		return ce;
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH 43/46] drm/i915: Load balancing across a virtual engine
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (41 preceding siblings ...)
  2019-02-06 13:03 ` [PATCH 42/46] drm/i915: Move over to intel_context_lookup() Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-06 13:03 ` [PATCH 44/46] drm/i915: Extend execution fence to support a callback Chris Wilson
                   ` (9 subsequent siblings)
  52 siblings, 0 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx

Having allowed the user to define the set of engines they wish to use, we go
one step further and allow them to bind those engines into a single virtual
instance. Submitting a batch to the virtual engine will then forward it to
any one of the set so as to best distribute the load. The virtual engine has
a single timeline across all engines (it operates as a single queue), so it
cannot concurrently run batches across multiple engines by itself; it is left
to the user to submit multiple concurrent batches to multiple queues.
Multiple users will be load-balanced across the system.
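
For illustration, here is a minimal user-space sketch of opting in to the
balancer. Only struct i915_context_engines_load_balance and
I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE come from this patch;
setup_load_balance() and the way the extension chain is ultimately handed to
the kernel are assumptions made for the example, not part of the series as
quoted:

#include <string.h>
#include <drm/i915_drm.h>	/* with this patch applied */

/* Balance engines[1..nengine-1] into the virtual slot 0 of the engine map. */
static void setup_load_balance(struct i915_context_engines_load_balance *lb,
			       unsigned int nengine)
{
	memset(lb, 0, sizeof(*lb));	/* flags and mbz[] must be zero */
	lb->base.name = I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE;
	lb->engines_mask = (1ull << nengine) - 2; /* bits 1..nengine-1 */
}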

The mechanism used for load balancing in this patch is a late greedy
balancer. When a request is ready for execution, it is added to each
engine's queue, and when an engine is ready for its next request it
claims it from the virtual engine. The first engine to do so wins, i.e.
the request is executed at the earliest opportunity (idle moment) in the
system.
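
As a rough stand-alone model of that claim-on-idle idea (plain user-space C
for illustration only; the driver below instead threads the request through
per-engine rb-trees under the engine timeline locks):

#include <stdatomic.h>
#include <stdio.h>

struct request { int id; };

/* Single-slot "queue" visible to every sibling, like ve->request below. */
struct virtual_engine {
	_Atomic(struct request *) pending;
};

/* Called by whichever physical engine goes idle; the first caller wins. */
static struct request *claim(struct virtual_engine *ve)
{
	return atomic_exchange(&ve->pending, NULL);
}

int main(void)
{
	struct request rq = { .id = 1 };
	struct virtual_engine ve = { .pending = &rq };
	struct request *e0 = claim(&ve); /* idles first, takes the request */
	struct request *e1 = claim(&ve); /* finds nothing left */

	printf("engine0 claimed %d, engine1 found %s\n",
	       e0 ? e0->id : 0, e1 ? "a request" : "nothing");
	return 0;
}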

As not all HW is created equal, the user is still able to skip the
virtual engine and execute the batch on a specific engine, all within the
same queue. It will then be executed in order on the correct engine,
with execution on other virtual engines being moved away due to the load
detection.

A couple of areas for potential improvement left!

- The virtual engine always takes priority over equal-priority tasks.
This is mostly broken up by applying FQ_CODEL rules for prioritising new
clients, and hopefully the virtual and real engines are then not congested (i.e.
all work is via virtual engines, or all work is to the real engine).

- We require the breadcrumb irq around every virtual engine request. For
normal engines, we eliminate the need for the slow round trip via
interrupt by using the submit fence and queueing in order. For virtual
engines, we have to allow any job to transfer to a new ring, and cannot
coalesce the submissions, so we require the completion fence instead,
forcing the persistent use of interrupts.

- We only drip-feed single requests through each virtual engine and onto
the physical engines, even if there is enough work to fill all ELSP,
leaving small stalls with an idle CS event at the end of every request.
Could we be greedy and fill both slots? Being lazy is virtuous for load
distribution on less-than-full workloads though.

Other areas of improvement are more general, such as reducing lock
contention, reducing dispatch overhead, looking at direct submission
rather than bouncing around tasklets etc.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.h            |   5 +
 drivers/gpu/drm/i915/i915_gem_context.c    |  90 +++-
 drivers/gpu/drm/i915/i915_scheduler.c      |  20 +-
 drivers/gpu/drm/i915/i915_timeline_types.h |   1 +
 drivers/gpu/drm/i915/intel_engine_types.h  |   8 +
 drivers/gpu/drm/i915/intel_lrc.c           | 502 ++++++++++++++++++++-
 drivers/gpu/drm/i915/intel_lrc.h           |   6 +
 drivers/gpu/drm/i915/selftests/intel_lrc.c | 165 +++++++
 include/uapi/drm/i915_drm.h                |  27 ++
 9 files changed, 809 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.h b/drivers/gpu/drm/i915/i915_gem.h
index b0e4b976880c..9905fcdd33c8 100644
--- a/drivers/gpu/drm/i915/i915_gem.h
+++ b/drivers/gpu/drm/i915/i915_gem.h
@@ -89,4 +89,9 @@ static inline bool __tasklet_is_enabled(const struct tasklet_struct *t)
 	return !atomic_read(&t->count);
 }
 
+static inline bool __tasklet_is_scheduled(struct tasklet_struct *t)
+{
+	return test_bit(TASKLET_STATE_SCHED, &t->state);
+}
+
 #endif /* __I915_GEM_H__ */
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index e75fad339ab8..2b474e72f9e2 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -91,6 +91,7 @@
 #include "i915_trace.h"
 #include "i915_user_extensions.h"
 #include "intel_lrc_reg.h"
+#include "intel_lrc.h"
 #include "intel_workarounds.h"
 
 #define ALL_L3_SLICES(dev) (1 << NUM_L3_SLICES(dev)) - 1
@@ -233,7 +234,10 @@ static void i915_gem_context_free(struct i915_gem_context *ctx)
 	release_hw_id(ctx);
 	i915_ppgtt_put(ctx->ppgtt);
 
-	kfree(ctx->engines);
+	if (ctx->engines) {
+		intel_virtual_engine_destroy(ctx->engines[0]);
+		kfree(ctx->engines);
+	}
 
 	rbtree_postorder_for_each_entry_safe(it, n, &ctx->hw_contexts, node) {
 		if (it->ops && it->ops->destroy)
@@ -1338,13 +1342,91 @@ static int set_sseu(struct i915_gem_context *ctx,
 	return 0;
 };
 
+static int check_user_mbz64(u64 __user *user)
+{
+	u64 mbz;
+
+	if (get_user(mbz, user))
+		return -EFAULT;
+
+	return mbz ? -EINVAL : 0;
+}
+
 struct set_engines {
 	struct i915_gem_context *ctx;
 	struct intel_engine_cs **engines;
 	unsigned int nengine;
 };
 
+static int
+set_engines__load_balance(struct i915_user_extension __user *base, void *data)
+{
+	struct i915_context_engines_load_balance __user *ext =
+		container_of_user(base, typeof(*ext), base);
+	const struct set_engines *set = data;
+	struct intel_engine_cs *ve;
+	unsigned int n;
+	u64 all_engines;
+	u64 mask;
+	int err;
+
+	/* We always use I915_EXEC_DEFAULT[0] for the load-balancer */
+	if (set->engines[0])
+		return -EEXIST;
+
+	if (!HAS_EXECLISTS(set->ctx->i915))
+		return -ENODEV;
+
+	if (USES_GUC_SUBMISSION(set->ctx->i915))
+		return -ENODEV; /* not implemented yet */
+
+	err = check_user_mbz64(&ext->flags);
+	if (err)
+		return err;
+
+	for (n = 0; n < ARRAY_SIZE(ext->mbz); n++) {
+		err = check_user_mbz64(&ext->mbz[n]);
+		if (err)
+			return err;
+	}
+
+	if (get_user(mask, &ext->engines_mask))
+		return -EFAULT;
+
+	all_engines = GENMASK_ULL(set->nengine - 1, 1);
+	mask &= all_engines;
+
+	if (!mask) {
+		return -EINVAL;
+	} else if (is_power_of_2(mask)) {
+		ve = set->engines[__ffs64(mask)];
+	} else if (mask == all_engines) {
+		ve = intel_execlists_create_virtual(set->ctx,
+						    set->engines + 1,
+						    set->nengine - 1);
+	} else {
+		struct intel_engine_cs *stack[64];
+		int bit;
+
+		n = 0;
+		for_each_set_bit(bit, (unsigned long *)&mask, set->nengine)
+			stack[n++] = set->engines[bit];
+
+		ve = intel_execlists_create_virtual(set->ctx, stack, n);
+	}
+	if (IS_ERR(ve))
+		return PTR_ERR(ve);
+
+	if (cmpxchg(&set->engines[0], NULL, ve)) {
+		intel_virtual_engine_destroy(ve);
+		return -EEXIST;
+	}
+
+	return 0;
+}
+
 static const i915_user_extension_fn set_engines__extensions[] = {
+	[I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE] = set_engines__load_balance,
 };
 
 static int
@@ -1405,13 +1487,17 @@ set_engines(struct i915_gem_context *ctx,
 					   ARRAY_SIZE(set_engines__extensions),
 					   &set);
 	if (err) {
+		intel_virtual_engine_destroy(set.engines[0]);
 		kfree(set.engines);
 		return err;
 	}
 
 out:
 	mutex_lock(&ctx->i915->drm.struct_mutex);
-	kfree(ctx->engines);
+	if (ctx->engines) {
+		intel_virtual_engine_destroy(ctx->engines[0]);
+		kfree(ctx->engines);
+	}
 	ctx->engines = set.engines;
 	ctx->nengine = set.nengine;
 	mutex_unlock(&ctx->i915->drm.struct_mutex);
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index d579e21c0cf4..f34680543b08 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -236,18 +236,27 @@ i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio)
 }
 
 static struct intel_engine_cs *
-sched_lock_engine(struct i915_sched_node *node, struct intel_engine_cs *locked)
+sched_lock_engine(const struct i915_sched_node *node,
+		  struct intel_engine_cs *locked)
 {
-	struct intel_engine_cs *engine = node_to_request(node)->engine;
+	const struct i915_request *rq = node_to_request(node);
+	struct intel_engine_cs *engine;
 
 	GEM_BUG_ON(!locked);
 
-	if (engine != locked) {
+	/*
+	 * Virtual engines complicate acquiring the engine timeline lock,
+	 * as their rq->engine pointer is not stable until under that
+	 * engine lock. The simple ploy we use is to take the lock then
+	 * check that the rq still belongs to the newly locked engine.
+	 */
+	while (locked != (engine = READ_ONCE(rq->engine))) {
 		spin_unlock(&locked->timeline.lock);
 		spin_lock(&engine->timeline.lock);
+		locked = engine;
 	}
 
-	return engine;
+	return locked;
 }
 
 static bool inflight(const struct i915_request *rq,
@@ -357,8 +366,11 @@ static void __i915_schedule(struct i915_request *rq,
 		if (prio <= node->attr.priority || node_signaled(node))
 			continue;
 
+		GEM_BUG_ON(node_to_request(node)->engine != engine);
+
 		node->attr.priority = prio;
 		if (!list_empty(&node->link)) {
+			GEM_BUG_ON(intel_engine_is_virtual(engine));
 			if (last != engine) {
 				pl = i915_sched_lookup_priolist(engine, prio);
 				last = engine;
diff --git a/drivers/gpu/drm/i915/i915_timeline_types.h b/drivers/gpu/drm/i915/i915_timeline_types.h
index 398905e8888f..b02d4702dd5b 100644
--- a/drivers/gpu/drm/i915/i915_timeline_types.h
+++ b/drivers/gpu/drm/i915/i915_timeline_types.h
@@ -25,6 +25,7 @@ struct i915_timeline {
 	spinlock_t lock;
 #define TIMELINE_CLIENT 0 /* default subclass */
 #define TIMELINE_ENGINE 1
+#define TIMELINE_VIRTUAL 2
 
 	unsigned int pin_count;
 	const u32 *hwsp_seqno;
diff --git a/drivers/gpu/drm/i915/intel_engine_types.h b/drivers/gpu/drm/i915/intel_engine_types.h
index 8f0aedb2c2d8..1cf2740ca60e 100644
--- a/drivers/gpu/drm/i915/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/intel_engine_types.h
@@ -215,6 +215,7 @@ struct intel_engine_execlists {
 	 * @queue: queue of requests, in priority lists
 	 */
 	struct rb_root_cached queue;
+	struct rb_root_cached virtual;
 
 	/**
 	 * @csb_write: control register for Context Switch buffer
@@ -423,6 +424,7 @@ struct intel_engine_cs {
 #define I915_ENGINE_SUPPORTS_STATS   BIT(1)
 #define I915_ENGINE_HAS_PREEMPTION   BIT(2)
 #define I915_ENGINE_HAS_SEMAPHORES   BIT(3)
+#define I915_ENGINE_IS_VIRTUAL       BIT(4)
 	unsigned int flags;
 
 	/*
@@ -506,6 +508,12 @@ intel_engine_has_semaphores(const struct intel_engine_cs *engine)
 	return engine->flags & I915_ENGINE_HAS_SEMAPHORES;
 }
 
+static inline bool
+intel_engine_is_virtual(const struct intel_engine_cs *engine)
+{
+	return engine->flags & I915_ENGINE_IS_VIRTUAL;
+}
+
 #define instdone_slice_mask(dev_priv__) \
 	(IS_GEN(dev_priv__, 7) ? \
 	 1 : RUNTIME_INFO(dev_priv__)->sseu.slice_mask)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 48797a1c0964..0b25fffdb8bc 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -166,6 +166,27 @@
 
 #define ACTIVE_PRIORITY (I915_PRIORITY_NEWCLIENT | I915_PRIORITY_NOSEMAPHORE)
 
+struct virtual_engine {
+	struct intel_engine_cs base;
+
+	struct intel_context context;
+	struct kref kref;
+
+	struct i915_request *request;
+	struct ve_node {
+		struct rb_node rb;
+		int prio;
+	} nodes[I915_NUM_ENGINES];
+
+	unsigned int count;
+	struct intel_engine_cs *siblings[0];
+};
+
+static struct virtual_engine *to_virtual_engine(struct intel_engine_cs *engine)
+{
+	return container_of(engine, struct virtual_engine, base);
+}
+
 static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
 					    struct intel_engine_cs *engine,
 					    struct intel_context *ce);
@@ -236,7 +257,8 @@ static int queue_prio(const struct intel_engine_execlists *execlists)
 }
 
 static inline bool need_preempt(const struct intel_engine_cs *engine,
-				const struct i915_request *rq)
+				const struct i915_request *rq,
+				struct rb_node *rb)
 {
 	int last_prio;
 
@@ -271,6 +293,22 @@ static inline bool need_preempt(const struct intel_engine_cs *engine,
 	    rq_prio(list_next_entry(rq, link)) > last_prio)
 		return true;
 
+	if (rb) { /* XXX virtual precedence */
+		struct virtual_engine *ve =
+			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
+		bool preempt = false;
+
+		if (engine == ve->siblings[0]) { /* only preempt one sibling */
+			spin_lock(&ve->base.timeline.lock);
+			if (ve->request)
+				preempt = rq_prio(ve->request) > last_prio;
+			spin_unlock(&ve->base.timeline.lock);
+		}
+
+		if (preempt)
+			return preempt;
+	}
+
 	/*
 	 * If the inflight context did not trigger the preemption, then maybe
 	 * it was the set of queued requests? Pick the highest priority in
@@ -390,6 +428,8 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
 	list_for_each_entry_safe_reverse(rq, rn,
 					 &engine->timeline.requests,
 					 link) {
+		struct intel_engine_cs *owner;
+
 		if (i915_request_completed(rq))
 			break;
 
@@ -398,14 +438,22 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
 
 		GEM_BUG_ON(rq->hw_context->active);
 
-		GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
-		if (rq_prio(rq) != prio) {
-			prio = rq_prio(rq);
-			pl = i915_sched_lookup_priolist(engine, prio);
-		}
-		GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
+		owner = rq->hw_context->engine;
+		if (likely(owner == engine)) {
+			GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
+			if (rq_prio(rq) != prio) {
+				prio = rq_prio(rq);
+				pl = i915_sched_lookup_priolist(engine, prio);
+			}
+			GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
 
-		list_add(&rq->sched.link, pl);
+			list_add(&rq->sched.link, pl);
+		} else {
+			rq->engine = owner;
+			if (__i915_request_has_started(rq))
+				rq->sched.attr.priority |= ACTIVE_PRIORITY;
+			owner->submit_request(rq);
+		}
 
 		active = rq;
 	}
@@ -661,6 +709,50 @@ static void complete_preempt_context(struct intel_engine_execlists *execlists)
 						  execlists));
 }
 
+static void virtual_update_register_offsets(u32 *regs,
+					    struct intel_engine_cs *engine)
+{
+	u32 base = engine->mmio_base;
+
+	regs[CTX_CONTEXT_CONTROL] =
+		i915_mmio_reg_offset(RING_CONTEXT_CONTROL(engine));
+	regs[CTX_RING_HEAD] = i915_mmio_reg_offset(RING_HEAD(base));
+	regs[CTX_RING_TAIL] = i915_mmio_reg_offset(RING_TAIL(base));
+	regs[CTX_RING_BUFFER_START] = i915_mmio_reg_offset(RING_START(base));
+	regs[CTX_RING_BUFFER_CONTROL] = i915_mmio_reg_offset(RING_CTL(base));
+
+	regs[CTX_BB_HEAD_U] = i915_mmio_reg_offset(RING_BBADDR_UDW(base));
+	regs[CTX_BB_HEAD_L] = i915_mmio_reg_offset(RING_BBADDR(base));
+	regs[CTX_BB_STATE] = i915_mmio_reg_offset(RING_BBSTATE(base));
+	regs[CTX_SECOND_BB_HEAD_U] =
+		i915_mmio_reg_offset(RING_SBBADDR_UDW(base));
+	regs[CTX_SECOND_BB_HEAD_L] = i915_mmio_reg_offset(RING_SBBADDR(base));
+	regs[CTX_SECOND_BB_STATE] = i915_mmio_reg_offset(RING_SBBSTATE(base));
+
+	regs[CTX_CTX_TIMESTAMP] =
+		i915_mmio_reg_offset(RING_CTX_TIMESTAMP(base));
+	regs[CTX_PDP3_UDW] = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(engine, 3));
+	regs[CTX_PDP3_LDW] = i915_mmio_reg_offset(GEN8_RING_PDP_LDW(engine, 3));
+	regs[CTX_PDP2_UDW] = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(engine, 2));
+	regs[CTX_PDP2_LDW] = i915_mmio_reg_offset(GEN8_RING_PDP_LDW(engine, 2));
+	regs[CTX_PDP1_UDW] = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(engine, 1));
+	regs[CTX_PDP1_LDW] = i915_mmio_reg_offset(GEN8_RING_PDP_LDW(engine, 1));
+	regs[CTX_PDP0_UDW] = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(engine, 0));
+	regs[CTX_PDP0_LDW] = i915_mmio_reg_offset(GEN8_RING_PDP_LDW(engine, 0));
+
+	if (engine->class == RENDER_CLASS) {
+		regs[CTX_RCS_INDIRECT_CTX] =
+			i915_mmio_reg_offset(RING_INDIRECT_CTX(base));
+		regs[CTX_RCS_INDIRECT_CTX_OFFSET] =
+			i915_mmio_reg_offset(RING_INDIRECT_CTX_OFFSET(base));
+		regs[CTX_BB_PER_CTX_PTR] =
+			i915_mmio_reg_offset(RING_BB_PER_CTX_PTR(base));
+
+		regs[CTX_R_PWR_CLK_STATE] =
+			i915_mmio_reg_offset(GEN8_R_PWR_CLK_STATE);
+	}
+}
+
 static void execlists_dequeue(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
@@ -693,6 +785,28 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	 * and context switches) submission.
 	 */
 
+	for (rb = rb_first_cached(&execlists->virtual); rb; ) {
+		struct virtual_engine *ve =
+			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
+		struct i915_request *rq = READ_ONCE(ve->request);
+		struct intel_engine_cs *active;
+
+		if (!rq) {
+			rb_erase_cached(rb, &execlists->virtual);
+			RB_CLEAR_NODE(rb);
+			rb = rb_first_cached(&execlists->virtual);
+			continue;
+		}
+
+		active = READ_ONCE(ve->context.active);
+		if (active && active != engine) {
+			rb = rb_next(rb);
+			continue;
+		}
+
+		break;
+	}
+
 	if (last) {
 		/*
 		 * Don't resubmit or switch until all outstanding
@@ -714,7 +828,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		if (!execlists_is_active(execlists, EXECLISTS_ACTIVE_HWACK))
 			return;
 
-		if (need_preempt(engine, last)) {
+		if (need_preempt(engine, last, rb)) {
 			inject_preempt_context(engine);
 			return;
 		}
@@ -754,6 +868,72 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		last->tail = last->wa_tail;
 	}
 
+	while (rb) { /* XXX virtual is always taking precedence */
+		struct virtual_engine *ve =
+			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
+		struct i915_request *rq;
+
+		spin_lock(&ve->base.timeline.lock);
+
+		rq = ve->request;
+		if (unlikely(!rq)) { /* lost the race to a sibling */
+			spin_unlock(&ve->base.timeline.lock);
+			rb_erase_cached(rb, &execlists->virtual);
+			RB_CLEAR_NODE(rb);
+			rb = rb_first_cached(&execlists->virtual);
+			continue;
+		}
+
+		if (rq_prio(rq) >= queue_prio(execlists)) {
+			if (last && !can_merge_rq(last, rq)) {
+				spin_unlock(&ve->base.timeline.lock);
+				return; /* leave this rq for another engine */
+			}
+
+			GEM_BUG_ON(rq->engine != &ve->base);
+			ve->request = NULL;
+			ve->base.execlists.queue_priority_hint = INT_MIN;
+			rb_erase_cached(rb, &execlists->virtual);
+			RB_CLEAR_NODE(rb);
+
+			GEM_BUG_ON(rq->hw_context != &ve->context);
+			rq->engine = engine;
+
+			if (engine != ve->siblings[0]) {
+				u32 *regs = ve->context.lrc_reg_state;
+				unsigned int n;
+
+				GEM_BUG_ON(READ_ONCE(ve->context.active));
+				virtual_update_register_offsets(regs, engine);
+
+				/*
+				 * Move the bound engine to the top of the list
+				 * for future execution. We then kick this
+				 * tasklet first before checking others, so that
+				 * we preferentially reuse this set of bound
+				 * registers.
+				 */
+				for (n = 1; n < ve->count; n++) {
+					if (ve->siblings[n] == engine) {
+						swap(ve->siblings[n],
+						     ve->siblings[0]);
+						break;
+					}
+				}
+
+				GEM_BUG_ON(ve->siblings[0] != engine);
+			}
+
+			__i915_request_submit(rq);
+			trace_i915_request_in(rq, port_index(port, execlists));
+			submit = true;
+			last = rq;
+		}
+
+		spin_unlock(&ve->base.timeline.lock);
+		break;
+	}
+
 	while ((rb = rb_first_cached(&execlists->queue))) {
 		struct i915_priolist *p = to_priolist(rb);
 		struct i915_request *rq, *rn;
@@ -2919,6 +3099,287 @@ void intel_lr_context_resume(struct drm_i915_private *i915)
 	}
 }
 
+static void virtual_engine_free(struct kref *kref)
+{
+	struct virtual_engine *ve = container_of(kref, typeof(*ve), kref);
+	unsigned int n;
+
+	GEM_BUG_ON(ve->request);
+	GEM_BUG_ON(ve->context.active);
+
+	for (n = 0; n < ve->count; n++) {
+		struct intel_engine_cs *sibling = ve->siblings[n];
+		struct rb_node *node = &ve->nodes[sibling->id].rb;
+
+		if (RB_EMPTY_NODE(node))
+			continue;
+
+		spin_lock_irq(&sibling->timeline.lock);
+
+		if (!RB_EMPTY_NODE(node))
+			rb_erase_cached(node, &sibling->execlists.virtual);
+
+		spin_unlock_irq(&sibling->timeline.lock);
+	}
+	GEM_BUG_ON(__tasklet_is_scheduled(&ve->base.execlists.tasklet));
+
+	if (ve->context.state)
+		__execlists_context_fini(&ve->context);
+
+	i915_timeline_fini(&ve->base.timeline);
+	kfree(ve);
+}
+
+static void virtual_context_unpin(struct intel_context *ce)
+{
+	struct virtual_engine *ve = container_of(ce, typeof(*ve), context);
+
+	execlists_context_unpin(ce);
+
+	kref_put(&ve->kref, virtual_engine_free);
+}
+
+static const struct intel_context_ops virtual_context_ops = {
+	.unpin = virtual_context_unpin,
+};
+
+static void virtual_initial_engine_hint(struct virtual_engine *ve)
+{
+	int swp;
+
+	/*
+	 * Pick a random sibling on starting to help spread the load around.
+	 *
+	 * New contexts are typically created with exactly the same order
+	 * of siblings, and often started in batches. Due to the way we iterate
+	 * the array of sibling when submitting requests, sibling[0] is
+	 * the array of siblings when submitting requests, sibling[0] is
+	 * randomised across the system, we also help spread the load by the
+	 * first engine we inspect being different each time.
+	 *
+	 * NB This does not force us to execute on this engine, it will just
+	 * typically be the first we inspect for submission.
+	 */
+	swp = prandom_u32_max(ve->count);
+	if (!swp)
+		return;
+
+	swap(ve->siblings[swp], ve->siblings[0]);
+	virtual_update_register_offsets(ve->context.lrc_reg_state,
+					ve->siblings[0]);
+}
+
+static struct intel_context *
+virtual_context_pin(struct intel_engine_cs *engine,
+		    struct i915_gem_context *ctx)
+{
+	struct virtual_engine *ve = to_virtual_engine(engine);
+	struct intel_context *ce = &ve->context;
+
+	lockdep_assert_held(&ctx->i915->drm.struct_mutex);
+
+	if (likely(ce->pin_count++))
+		return ce;
+	GEM_BUG_ON(!ce->pin_count); /* no overflow please! */
+
+	kref_get(&ve->kref);
+	ce->ops = &virtual_context_ops;
+
+	/* Note: we must use a real engine class for setting up reg state */
+	ce = __execlists_context_pin(ve->siblings[0], ctx, ce);
+	if (!IS_ERR(ce))
+		virtual_initial_engine_hint(ve);
+
+	return ce;
+}
+
+static void virtual_submission_tasklet(unsigned long data)
+{
+	struct virtual_engine * const ve = (struct virtual_engine *)data;
+	unsigned int n;
+	int prio;
+
+	prio = READ_ONCE(ve->base.execlists.queue_priority_hint);
+	if (prio == INT_MIN)
+		return;
+
+	local_irq_disable();
+	for (n = 0; READ_ONCE(ve->request) && n < ve->count; n++) {
+		struct intel_engine_cs *sibling = ve->siblings[n];
+		struct ve_node * const node = &ve->nodes[sibling->id];
+		struct rb_node **parent, *rb;
+		bool first;
+
+		spin_lock(&sibling->timeline.lock);
+
+		if (!RB_EMPTY_NODE(&node->rb)) {
+			first = rb_first_cached(&sibling->execlists.virtual) == &node->rb;
+			if (prio == node->prio || (prio > node->prio && first))
+				goto submit_engine;
+
+			rb_erase_cached(&node->rb, &sibling->execlists.virtual);
+		}
+
+		rb = NULL;
+		first = true;
+		parent = &sibling->execlists.virtual.rb_root.rb_node;
+		while (*parent) {
+			struct ve_node *other;
+
+			rb = *parent;
+			other = rb_entry(rb, typeof(*other), rb);
+			if (prio > other->prio) {
+				parent = &rb->rb_left;
+			} else {
+				parent = &rb->rb_right;
+				first = false;
+			}
+		}
+
+		rb_link_node(&node->rb, rb, parent);
+		rb_insert_color_cached(&node->rb,
+				       &sibling->execlists.virtual,
+				       first);
+
+submit_engine:
+		GEM_BUG_ON(RB_EMPTY_NODE(&node->rb));
+		node->prio = prio;
+		if (first && prio > sibling->execlists.queue_priority_hint) {
+			sibling->execlists.queue_priority_hint = prio;
+			tasklet_hi_schedule(&sibling->execlists.tasklet);
+		}
+
+		spin_unlock(&sibling->timeline.lock);
+	}
+	local_irq_enable();
+}
+
+static void virtual_submit_request(struct i915_request *request)
+{
+	struct virtual_engine *ve = to_virtual_engine(request->engine);
+
+	GEM_BUG_ON(ve->base.submit_request != virtual_submit_request);
+
+	GEM_BUG_ON(ve->request);
+	ve->base.execlists.queue_priority_hint = rq_prio(request);
+	WRITE_ONCE(ve->request, request);
+
+	tasklet_schedule(&ve->base.execlists.tasklet);
+}
+
+struct intel_engine_cs *
+intel_execlists_create_virtual(struct i915_gem_context *ctx,
+			       struct intel_engine_cs **siblings,
+			       unsigned int count)
+{
+	struct virtual_engine *ve;
+	unsigned int n;
+	int err;
+
+	if (!count)
+		return ERR_PTR(-EINVAL);
+
+	ve = kzalloc(struct_size(ve, siblings, count), GFP_KERNEL);
+	if (!ve)
+		return ERR_PTR(-ENOMEM);
+
+	kref_init(&ve->kref);
+	ve->base.i915 = ctx->i915;
+	ve->base.id = -1;
+	ve->base.class = OTHER_CLASS;
+	ve->base.uabi_class = I915_ENGINE_CLASS_INVALID;
+	ve->base.instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
+	ve->base.flags = I915_ENGINE_IS_VIRTUAL;
+
+	snprintf(ve->base.name, sizeof(ve->base.name), "virtual");
+
+	err = i915_timeline_init(ctx->i915,
+				 &ve->base.timeline,
+				 ve->base.name,
+				 NULL);
+	if (err)
+		goto err_put;
+	i915_timeline_set_subclass(&ve->base.timeline, TIMELINE_VIRTUAL);
+
+	intel_context_init(&ve->context, ctx, &ve->base);
+
+	ve->base.context_pin = virtual_context_pin;
+	ve->base.request_alloc = execlists_request_alloc;
+
+	ve->base.schedule = i915_schedule;
+	ve->base.submit_request = virtual_submit_request;
+
+	ve->base.execlists.queue_priority_hint = INT_MIN;
+	tasklet_init(&ve->base.execlists.tasklet,
+		     virtual_submission_tasklet,
+		     (unsigned long)ve);
+
+	for (n = 0; n < count; n++) {
+		struct intel_engine_cs *sibling = siblings[n];
+
+		GEM_BUG_ON(!is_power_of_2(sibling->mask));
+		if (sibling->mask & ve->base.mask)
+			continue;
+
+		if (sibling->execlists.tasklet.func != execlists_submission_tasklet) {
+			err = -ENODEV;
+			goto err_put;
+		}
+
+		GEM_BUG_ON(RB_EMPTY_NODE(&ve->nodes[sibling->id].rb));
+		RB_CLEAR_NODE(&ve->nodes[sibling->id].rb);
+
+		ve->siblings[ve->count++] = sibling;
+		ve->base.mask |= sibling->mask;
+
+		if (ve->base.class != OTHER_CLASS) {
+			if (ve->base.class != sibling->class) {
+				err = -EINVAL;
+				goto err_put;
+			}
+			continue;
+		}
+
+		ve->base.class = sibling->class;
+		snprintf(ve->base.name, sizeof(ve->base.name),
+			 "v%dx%d", ve->base.class, count);
+		ve->base.context_size = sibling->context_size;
+
+		ve->base.emit_bb_start = sibling->emit_bb_start;
+		ve->base.emit_flush = sibling->emit_flush;
+		ve->base.emit_init_breadcrumb = sibling->emit_init_breadcrumb;
+		ve->base.emit_fini_breadcrumb = sibling->emit_fini_breadcrumb;
+		ve->base.emit_fini_breadcrumb_dw =
+			sibling->emit_fini_breadcrumb_dw;
+	}
+
+	/* gracefully replace a degenerate virtual engine */
+	if (is_power_of_2(ve->base.mask)) {
+		struct intel_engine_cs *actual = ve->siblings[0];
+		virtual_engine_free(&ve->kref);
+		return actual;
+	}
+
+	__intel_context_insert(ctx, &ve->base, &ve->context);
+	return &ve->base;
+
+err_put:
+	virtual_engine_free(&ve->kref);
+	return ERR_PTR(err);
+}
+
+void intel_virtual_engine_destroy(struct intel_engine_cs *engine)
+{
+	struct virtual_engine *ve = to_virtual_engine(engine);
+
+	if (!engine || !intel_engine_is_virtual(engine))
+		return;
+
+	__intel_context_remove(&ve->context);
+
+	kref_put(&ve->kref, virtual_engine_free);
+}
+
 void intel_execlists_show_requests(struct intel_engine_cs *engine,
 				   struct drm_printer *m,
 				   void (*show_request)(struct drm_printer *m,
@@ -2976,6 +3437,29 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine,
 		show_request(m, last, "\t\tQ ");
 	}
 
+	last = NULL;
+	count = 0;
+	for (rb = rb_first_cached(&execlists->virtual); rb; rb = rb_next(rb)) {
+		struct virtual_engine *ve =
+			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
+		struct i915_request *rq = READ_ONCE(ve->request);
+
+		if (rq) {
+			if (count++ < max - 1)
+				show_request(m, rq, "\t\tV ");
+			else
+				last = rq;
+		}
+	}
+	if (last) {
+		if (count > max) {
+			drm_printf(m,
+				   "\t\t...skipping %d virtual requests...\n",
+				   count - max);
+		}
+		show_request(m, last, "\t\tV ");
+	}
+
 	spin_unlock_irqrestore(&engine->timeline.lock, flags);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index f1aec8a6986f..c6f441137e3f 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -112,6 +112,12 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine,
 							const char *prefix),
 				   unsigned int max);
 
+struct intel_engine_cs *
+intel_execlists_create_virtual(struct i915_gem_context *ctx,
+			       struct intel_engine_cs **siblings,
+			       unsigned int count);
+void intel_virtual_engine_destroy(struct intel_engine_cs *engine);
+
 u32 gen8_make_rpcs(struct drm_i915_private *i915, struct intel_sseu *ctx_sseu);
 
 #endif /* _INTEL_LRC_H_ */
diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
index 1a3af4b4107d..e85184e084e4 100644
--- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
+++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
@@ -10,6 +10,7 @@
 
 #include "../i915_selftest.h"
 #include "igt_flush_test.h"
+#include "igt_live_test.h"
 #include "igt_spinner.h"
 #include "i915_random.h"
 
@@ -1040,6 +1041,169 @@ static int live_preempt_smoke(void *arg)
 	return err;
 }
 
+static int nop_virtual_engine(struct drm_i915_private *i915,
+			      struct intel_engine_cs **siblings,
+			      unsigned int nsibling,
+			      unsigned int nctx,
+			      unsigned int flags)
+#define CHAIN BIT(0)
+{
+	IGT_TIMEOUT(end_time);
+	struct i915_request *request[16];
+	struct i915_gem_context *ctx[16];
+	struct intel_engine_cs *ve[16];
+	unsigned long n, prime, nc;
+	struct igt_live_test t;
+	ktime_t times[2] = {};
+	int err;
+
+	GEM_BUG_ON(!nctx || nctx > ARRAY_SIZE(ctx));
+
+	for (n = 0; n < nctx; n++) {
+		ctx[n] = kernel_context(i915);
+		if (!ctx[n])
+			return -ENOMEM;
+
+		ve[n] = intel_execlists_create_virtual(ctx[n],
+						       siblings, nsibling);
+		if (IS_ERR(ve[n]))
+			return PTR_ERR(ve[n]);
+	}
+
+	err = igt_live_test_begin(&t, i915, __func__, ve[0]->name);
+	if (err)
+		goto out;
+
+	for_each_prime_number_from(prime, 1, 8192) {
+		times[1] = ktime_get_raw();
+
+		if (flags & CHAIN) {
+			for (nc = 0; nc < nctx; nc++) {
+				for (n = 0; n < prime; n++) {
+					request[nc] =
+						i915_request_alloc(ve[nc], ctx[nc]);
+					if (IS_ERR(request[nc])) {
+						err = PTR_ERR(request[nc]);
+						goto out;
+					}
+
+					i915_request_add(request[nc]);
+				}
+			}
+		} else {
+			for (n = 0; n < prime; n++) {
+				for (nc = 0; nc < nctx; nc++) {
+					request[nc] =
+						i915_request_alloc(ve[nc], ctx[nc]);
+					if (IS_ERR(request[nc])) {
+						err = PTR_ERR(request[nc]);
+						goto out;
+					}
+
+					i915_request_add(request[nc]);
+				}
+			}
+		}
+
+		for (nc = 0; nc < nctx; nc++) {
+			if (i915_request_wait(request[nc],
+					      I915_WAIT_LOCKED,
+					      HZ / 10) < 0) {
+				pr_err("%s(%s): wait for %llx:%lld timed out\n",
+				       __func__, ve[0]->name,
+				       request[nc]->fence.context,
+				       request[nc]->fence.seqno);
+
+				GEM_TRACE("%s(%s) failed at request %llx:%lld\n",
+					  __func__, ve[0]->name,
+					  request[nc]->fence.context,
+					  request[nc]->fence.seqno);
+				GEM_TRACE_DUMP();
+				i915_gem_set_wedged(i915);
+				break;
+			}
+		}
+
+		times[1] = ktime_sub(ktime_get_raw(), times[1]);
+		if (prime == 1)
+			times[0] = times[1];
+
+		if (__igt_timeout(end_time, NULL))
+			break;
+	}
+
+	err = igt_live_test_end(&t);
+	if (err)
+		goto out;
+
+	pr_info("Requestx%d latencies on %s: 1 = %lluns, %lu = %lluns\n",
+		nctx, ve[0]->name, ktime_to_ns(times[0]),
+		prime, div64_u64(ktime_to_ns(times[1]), prime));
+
+out:
+	if (igt_flush_test(i915, I915_WAIT_LOCKED))
+		err = -EIO;
+
+	for (nc = 0; nc < nctx; nc++) {
+		intel_virtual_engine_destroy(ve[nc]);
+		kernel_context_close(ctx[nc]);
+	}
+	return err;
+}
+
+static int live_virtual_engine(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct intel_engine_cs *siblings[MAX_ENGINE_INSTANCE + 1];
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	unsigned int class, inst;
+	int err = -ENODEV;
+
+	if (USES_GUC_SUBMISSION(i915))
+		return 0;
+
+	mutex_lock(&i915->drm.struct_mutex);
+
+	for_each_engine(engine, i915, id) {
+		err = nop_virtual_engine(i915, &engine, 1, 1, 0);
+		if (err) {
+			pr_err("Failed to wrap engine %s: err=%d\n",
+			       engine->name, err);
+			goto out_unlock;
+		}
+	}
+
+	for (class = 0; class <= MAX_ENGINE_CLASS; class++) {
+		int nsibling, n;
+
+		nsibling = 0;
+		for (inst = 0; inst <= MAX_ENGINE_INSTANCE; inst++) {
+			if (!i915->engine_class[class][inst])
+				break;
+
+			siblings[nsibling++] = i915->engine_class[class][inst];
+		}
+		if (nsibling < 2)
+			continue;
+
+		for (n = 1; n <= nsibling + 1; n++) {
+			err = nop_virtual_engine(i915, siblings, nsibling,
+						 n, 0);
+			if (err)
+				goto out_unlock;
+		}
+
+		err = nop_virtual_engine(i915, siblings, nsibling, n, CHAIN);
+		if (err)
+			goto out_unlock;
+	}
+
+out_unlock:
+	mutex_unlock(&i915->drm.struct_mutex);
+	return err;
+}
+
 int intel_execlists_live_selftests(struct drm_i915_private *i915)
 {
 	static const struct i915_subtest tests[] = {
@@ -1051,6 +1215,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_chain_preempt),
 		SUBTEST(live_preempt_hang),
 		SUBTEST(live_preempt_smoke),
+		SUBTEST(live_virtual_engine),
 	};
 
 	if (!HAS_EXECLISTS(i915))
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 642af28ea6d3..47d2f56b6d90 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -123,6 +123,7 @@ enum drm_i915_gem_engine_class {
 };
 
 #define I915_ENGINE_CLASS_INVALID_NONE -1
+#define I915_ENGINE_CLASS_INVALID_VIRTUAL 0
 
 /**
  * DOC: perf_events exposed by i915 through /sys/bus/event_sources/drivers/i915
@@ -1567,8 +1568,34 @@ struct drm_i915_gem_context_param_sseu {
 	__u32 rsvd;
 };
 
+/*
+ * i915_context_engines_load_balance:
+ *
+ * Enable load balancing across this set of engines.
+ *
+ * Into the I915_EXEC_DEFAULT slot [0], a virtual engine is created that,
+ * when used, will proxy the execbuffer request onto one of the set of engines
+ * in such a way as to distribute the load evenly across the set.
+ *
+ * The set of engines must be compatible (e.g. the same HW class) as they
+ * will share the same logical GPU context and ring.
+ *
+ * To intermix rendering with the virtual engine and direct rendering onto
+ * the backing engines (bypassing the load balancing proxy), the context must
+ * be defined to use a single timeline for all engines.
+ */
+struct i915_context_engines_load_balance {
+	struct i915_user_extension base;
+
+	__u64 flags; /* all undefined flags must be zero */
+	__u64 engines_mask; /* selection mask of engines[] */
+
+	__u64 mbz[4]; /* reserved for future use; must be zero */
+};
+
 struct i915_context_param_engines {
 	__u64 extensions; /* linked chain of extension blocks, 0 terminates */
+#define I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE 0
 
 	struct {
 		__u16 engine_class; /* see enum drm_i915_gem_engine_class */
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH 44/46] drm/i915: Extend execution fence to support a callback
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (42 preceding siblings ...)
  2019-02-06 13:03 ` [PATCH 43/46] drm/i915: Load balancing across a virtual engine Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-06 13:03 ` [PATCH 45/46] drm/i915/execlists: Virtual engine bonding Chris Wilson
                   ` (8 subsequent siblings)
  52 siblings, 0 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx

In the next patch, we will want to configure the slave request
depending on which physical engine the master request is executed on.
For this, we introduce a callback from the execute fence to convey this
information.
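
As a hedged sketch of how a caller might wire this up (bond_execute() and
await_master() are illustrative placeholders; only the
i915_request_await_execution() signature added below is from this patch):

/* Runs once the master has actually been submitted to a physical engine;
 * 'signal' is the master request's fence. */
static void bond_execute(struct i915_request *rq, struct dma_fence *signal)
{
	struct i915_request *master = to_request(signal);

	/* e.g. pick rq's engine based on master->engine here */
	pr_debug("master executing on %s\n", master->engine->name);
}

static int await_master(struct i915_request *slave,
			struct i915_request *master)
{
	return i915_request_await_execution(slave, &master->fence,
					    bond_execute);
}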

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_request.c | 84 +++++++++++++++++++++++++++--
 drivers/gpu/drm/i915/i915_request.h |  4 ++
 2 files changed, 83 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index dc79d57ffb84..ace4fd763cac 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -37,6 +37,8 @@ struct execute_cb {
 	struct list_head link;
 	struct irq_work work;
 	struct i915_sw_fence *fence;
+	void (*hook)(struct i915_request *rq, struct dma_fence *signal);
+	struct i915_request *signal;
 };
 
 static struct i915_global_request {
@@ -343,6 +345,17 @@ static void irq_execute_cb(struct irq_work *wrk)
 	kmem_cache_free(global.slab_execute_cbs, cb);
 }
 
+static void irq_execute_cb_hook(struct irq_work *wrk)
+{
+	struct execute_cb *cb = container_of(wrk, typeof(*cb), work);
+
+	cb->hook(container_of(cb->fence, struct i915_request, submit),
+		 &cb->signal->fence);
+	i915_request_put(cb->signal);
+
+	irq_execute_cb(wrk);
+}
+
 static void __notify_execute_cb(struct i915_request *rq)
 {
 	struct execute_cb *cb;
@@ -369,14 +382,19 @@ static void __notify_execute_cb(struct i915_request *rq)
 }
 
 static int
-i915_request_await_execution(struct i915_request *rq,
-			     struct i915_request *signal,
-			     gfp_t gfp)
+__i915_request_await_execution(struct i915_request *rq,
+			       struct i915_request *signal,
+			       void (*hook)(struct i915_request *rq,
+					    struct dma_fence *signal),
+			       gfp_t gfp)
 {
 	struct execute_cb *cb;
 
-	if (i915_request_is_active(signal))
+	if (i915_request_is_active(signal)) {
+		if (hook)
+			hook(rq, &signal->fence);
 		return 0;
+	}
 
 	cb = kmem_cache_alloc(global.slab_execute_cbs, gfp);
 	if (!cb)
@@ -386,8 +404,18 @@ i915_request_await_execution(struct i915_request *rq,
 	i915_sw_fence_await(cb->fence);
 	init_irq_work(&cb->work, irq_execute_cb);
 
+	if (hook) {
+		cb->hook = hook;
+		cb->signal = i915_request_get(signal);
+		cb->work.func = irq_execute_cb_hook;
+	}
+
 	spin_lock_irq(&signal->lock);
 	if (i915_request_is_active(signal)) {
+		if (hook) {
+			hook(rq, &signal->fence);
+			i915_request_put(signal);
+		}
 		i915_sw_fence_complete(cb->fence);
 		kmem_cache_free(global.slab_execute_cbs, cb);
 	} else {
@@ -779,7 +807,7 @@ emit_semaphore_wait(struct i915_request *to,
 		return err;
 
 	/* Only submit our spinner after the signaler is running! */
-	err = i915_request_await_execution(to, from, gfp);
+	err = __i915_request_await_execution(to, from, NULL, gfp);
 	if (err)
 		return err;
 
@@ -899,6 +927,52 @@ i915_request_await_dma_fence(struct i915_request *rq, struct dma_fence *fence)
 	return 0;
 }
 
+int
+i915_request_await_execution(struct i915_request *rq,
+			     struct dma_fence *fence,
+			     void (*hook)(struct i915_request *rq,
+					  struct dma_fence *signal))
+{
+	struct dma_fence **child = &fence;
+	unsigned int nchild = 1;
+	int ret;
+
+	if (dma_fence_is_array(fence)) {
+		struct dma_fence_array *array = to_dma_fence_array(fence);
+
+		/* XXX Error for signal-on-any fence arrays */
+
+		child = array->fences;
+		nchild = array->num_fences;
+		GEM_BUG_ON(!nchild);
+	}
+
+	do {
+		fence = *child++;
+		if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
+			continue;
+
+		/*
+		 * We don't squash repeated fence dependencies here as we
+		 * want to run our callback in all cases.
+		 */
+
+		if (dma_fence_is_i915(fence))
+			ret = __i915_request_await_execution(rq,
+							     to_request(fence),
+							     hook,
+							     I915_FENCE_GFP);
+		else
+			ret = i915_sw_fence_await_dma_fence(&rq->submit, fence,
+							    I915_FENCE_TIMEOUT,
+							    GFP_KERNEL);
+		if (ret < 0)
+			return ret;
+	} while (--nchild);
+
+	return 0;
+}
+
 /**
  * i915_request_await_object - set this request to (async) wait upon a bo
  * @to: request we are wishing to use
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 35153fa52c8c..b045948da12d 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -254,6 +254,10 @@ int i915_request_await_object(struct i915_request *to,
 			      bool write);
 int i915_request_await_dma_fence(struct i915_request *rq,
 				 struct dma_fence *fence);
+int i915_request_await_execution(struct i915_request *rq,
+				 struct dma_fence *fence,
+				 void (*hook)(struct i915_request *rq,
+					      struct dma_fence *signal));
 
 void i915_request_add(struct i915_request *rq);
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH 45/46] drm/i915/execlists: Virtual engine bonding
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (43 preceding siblings ...)
  2019-02-06 13:03 ` [PATCH 44/46] drm/i915: Extend execution fence to support a callback Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-06 13:03 ` [PATCH 46/46] drm/i915: Allow specification of parallel execbuf Chris Wilson
                   ` (7 subsequent siblings)
  52 siblings, 0 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx

Some users require that when a master batch is executed on one particular
engine, a companion batch is run simultaneously on a specific slave
engine. For this purpose, we introduce virtual engine bonding, allowing
maps of master:slaves to be constructed to constrain which physical
engines a virtual engine may select given a fence on a master engine.

For the moment, we continue to ignore the issue of preemption deferring
the master request for later. Ideally, we would then also like to remove
the slave and run something else rather than have it stall the pipeline.
With load balancing, we should be able to move other workload around the
stalled slave, but there is a similar stall on the master pipeline while
it waits for the slave to be executed. At the cost of more latency for
the bonded request, it may be interesting to launch both on their engines
in lockstep. (Bubbles abound.)

Opens: Also what about bonding an engine as its own master? It doesn't
break anything internally, so allow the silliness.
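
As a rough userspace sketch of the uapi added below (the variable name
and surrounding setup are illustrative; chaining the extension into
I915_CONTEXT_PARAM_ENGINES uses the i915_user_extension mechanism from
earlier in the series), a bond that restricts the balanced set in
engines[0] to siblings[0] whenever the master batch runs on vcs0 could
look like:

  struct i915_context_engines_bond bond = {
          .base = { .name = I915_CONTEXT_ENGINES_EXT_BOND },
          .master_class = I915_ENGINE_CLASS_VIDEO,
          .master_instance = 0,
          .sibling_mask = 0x1, /* bit n selects siblings[n] of engines[0] */
  };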

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_context.c    |  45 ++++++
 drivers/gpu/drm/i915/i915_request.c        |   1 +
 drivers/gpu/drm/i915/i915_request.h        |   1 +
 drivers/gpu/drm/i915/intel_engine_types.h  |   7 +
 drivers/gpu/drm/i915/intel_lrc.c           |  97 ++++++++++++
 drivers/gpu/drm/i915/intel_lrc.h           |   3 +
 drivers/gpu/drm/i915/selftests/intel_lrc.c | 167 +++++++++++++++++++++
 include/uapi/drm/i915_drm.h                |  18 +++
 8 files changed, 339 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 2b474e72f9e2..f4279c88cd66 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -1342,6 +1342,16 @@ static int set_sseu(struct i915_gem_context *ctx,
 	return 0;
 };
 
+static int check_user_mbz32(u32 __user *user)
+{
+	u32 mbz;
+
+	if (get_user(mbz, user))
+		return -EFAULT;
+
+	return mbz ? -EINVAL : 0;
+}
+
 static int check_user_mbz64(u64 __user *user)
 {
 	u64 mbz;
@@ -1425,8 +1435,43 @@ set_engines__load_balance(struct i915_user_extension __user *base, void *data)
 	return 0;
 }
 
+static int
+set_engines__bond(struct i915_user_extension __user *base, void *data)
+{
+	struct i915_context_engines_bond __user *ext =
+		container_of_user(base, typeof(*ext), base);
+	const struct set_engines *set = data;
+	struct intel_engine_cs *master;
+	u32 class, instance, siblings;
+	int err;
+
+	if (!set->engines[0])
+		return -EINVAL;
+
+	err = check_user_mbz32(&ext->flags);
+	if (err)
+		return err;
+
+	if (get_user(class, &ext->master_class))
+		return -EFAULT;
+
+	if (get_user(instance, &ext->master_instance))
+		return -EFAULT;
+
+	master = intel_engine_lookup_user(set->ctx->i915, class, instance);
+	if (!master)
+		return -EINVAL;
+
+	if (get_user(siblings, &ext->sibling_mask))
+		return -EFAULT;
+
+	return intel_virtual_engine_attach_bond(set->engines[0],
+						master, siblings);
+}
+
 static const i915_user_extension_fn set_engines__extensions[] = {
 	[I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE] = set_engines__load_balance,
+	[I915_CONTEXT_ENGINES_EXT_BOND] = set_engines__bond,
 };
 
 static int
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index ace4fd763cac..225640d0ed31 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -734,6 +734,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	rq->batch = NULL;
 	rq->capture_list = NULL;
 	rq->waitboost = false;
+	rq->execution_mask = ~0u;
 
 	/*
 	 * Reserve space in the ring buffer for all the commands required to
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index b045948da12d..b9aae799fbd6 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -144,6 +144,7 @@ struct i915_request {
 	 */
 	struct i915_sched_node sched;
 	struct i915_dependency dep;
+	unsigned int execution_mask;
 
 	/*
 	 * A convenience pointer to the current breadcrumb value stored in
diff --git a/drivers/gpu/drm/i915/intel_engine_types.h b/drivers/gpu/drm/i915/intel_engine_types.h
index 1cf2740ca60e..ab1e9e927d7d 100644
--- a/drivers/gpu/drm/i915/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/intel_engine_types.h
@@ -384,6 +384,13 @@ struct intel_engine_cs {
 	 */
 	void		(*submit_request)(struct i915_request *rq);
 
+	/*
+	 * Called on signaling of a SUBMIT_FENCE, passing along the signaling
+	 * request down to the bonded pairs.
+	 */
+	void            (*bond_execute)(struct i915_request *rq,
+					struct dma_fence *signal);
+
 	/*
 	 * Call when the priority on a request has changed and it and its
 	 * dependencies may need rescheduling. Note the request itself may
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 0b25fffdb8bc..ba3605001cf0 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -178,6 +178,12 @@ struct virtual_engine {
 		int prio;
 	} nodes[I915_NUM_ENGINES];
 
+	struct ve_bond {
+		struct intel_engine_cs *master;
+		unsigned int sibling_mask;
+	} *bonds;
+	unsigned int nbond;
+
 	unsigned int count;
 	struct intel_engine_cs *siblings[0];
 };
@@ -3196,6 +3202,7 @@ virtual_context_pin(struct intel_engine_cs *engine,
 static void virtual_submission_tasklet(unsigned long data)
 {
 	struct virtual_engine * const ve = (struct virtual_engine *)data;
+	unsigned int mask;
 	unsigned int n;
 	int prio;
 
@@ -3204,12 +3211,30 @@ static void virtual_submission_tasklet(unsigned long data)
 		return;
 
 	local_irq_disable();
+
+	mask = 0;
+	spin_lock(&ve->base.timeline.lock);
+	if (ve->request)
+		mask = ve->request->execution_mask;
+	spin_unlock(&ve->base.timeline.lock);
+
 	for (n = 0; READ_ONCE(ve->request) && n < ve->count; n++) {
 		struct intel_engine_cs *sibling = ve->siblings[n];
 		struct ve_node * const node = &ve->nodes[sibling->id];
 		struct rb_node **parent, *rb;
 		bool first;
 
+		if (unlikely(!(mask & sibling->mask))) {
+			if (!RB_EMPTY_NODE(&node->rb)) {
+				spin_lock(&sibling->timeline.lock);
+				rb_erase_cached(&node->rb,
+						&sibling->execlists.virtual);
+				RB_CLEAR_NODE(&node->rb);
+				spin_unlock(&sibling->timeline.lock);
+			}
+			continue;
+		}
+
 		spin_lock(&sibling->timeline.lock);
 
 		if (!RB_EMPTY_NODE(&node->rb)) {
@@ -3267,6 +3292,30 @@ static void virtual_submit_request(struct i915_request *request)
 	tasklet_schedule(&ve->base.execlists.tasklet);
 }
 
+static struct ve_bond *
+virtual_find_bond(struct virtual_engine *ve, struct intel_engine_cs *master)
+{
+	int i;
+
+	for (i = 0; i < ve->nbond; i++) {
+		if (ve->bonds[i].master == master)
+			return &ve->bonds[i];
+	}
+
+	return NULL;
+}
+
+static void
+virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal)
+{
+	struct virtual_engine *ve = to_virtual_engine(rq->engine);
+	struct ve_bond *bond;
+
+	bond = virtual_find_bond(ve, to_request(signal)->engine);
+	if (bond) /* XXX serialise with rq->lock? */
+		rq->execution_mask &= bond->sibling_mask;
+}
+
 struct intel_engine_cs *
 intel_execlists_create_virtual(struct i915_gem_context *ctx,
 			       struct intel_engine_cs **siblings,
@@ -3308,6 +3357,7 @@ intel_execlists_create_virtual(struct i915_gem_context *ctx,
 
 	ve->base.schedule = i915_schedule;
 	ve->base.submit_request = virtual_submit_request;
+	ve->base.bond_execute = virtual_bond_execute;
 
 	ve->base.execlists.queue_priority_hint = INT_MIN;
 	tasklet_init(&ve->base.execlists.tasklet,
@@ -3368,6 +3418,53 @@ intel_execlists_create_virtual(struct i915_gem_context *ctx,
 	return ERR_PTR(err);
 }
 
+static unsigned long
+virtual_execution_mask(struct virtual_engine *ve, unsigned long mask)
+{
+	unsigned long emask = 0;
+	int bit;
+
+	for_each_set_bit(bit, &mask, ve->count)
+		emask |= ve->siblings[bit]->mask;
+
+	return emask;
+}
+
+int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine,
+				     struct intel_engine_cs *master,
+				     unsigned long mask)
+{
+	struct virtual_engine *ve = to_virtual_engine(engine);
+	struct ve_bond *bond;
+
+	if (mask >> ve->count)
+		return -EINVAL;
+
+	mask = virtual_execution_mask(ve, mask);
+	if (!mask)
+		return -EINVAL;
+
+	bond = virtual_find_bond(ve, master);
+	if (bond) {
+		bond->sibling_mask |= mask;
+		return 0;
+	}
+
+	bond = krealloc(ve->bonds,
+			sizeof(*bond) * (ve->nbond + 1),
+			GFP_KERNEL);
+	if (!bond)
+		return -ENOMEM;
+
+	bond[ve->nbond].master = master;
+	bond[ve->nbond].sibling_mask = mask;
+
+	ve->bonds = bond;
+	ve->nbond++;
+
+	return 0;
+}
+
 void intel_virtual_engine_destroy(struct intel_engine_cs *engine)
 {
 	struct virtual_engine *ve = to_virtual_engine(engine);
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index c6f441137e3f..8622dbfd8c5c 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -116,6 +116,9 @@ struct intel_engine_cs *
 intel_execlists_create_virtual(struct i915_gem_context *ctx,
 			       struct intel_engine_cs **siblings,
 			       unsigned int count);
+int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine,
+				     struct intel_engine_cs *master,
+				     unsigned long siblings);
 void intel_virtual_engine_destroy(struct intel_engine_cs *engine);
 
 u32 gen8_make_rpcs(struct drm_i915_private *i915, struct intel_sseu *ctx_sseu);
diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
index e85184e084e4..f5b69d6b4585 100644
--- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
+++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
@@ -13,6 +13,7 @@
 #include "igt_live_test.h"
 #include "igt_spinner.h"
 #include "i915_random.h"
+#include "lib_sw_fence.h"
 
 #include "mock_context.h"
 
@@ -1204,6 +1205,171 @@ static int live_virtual_engine(void *arg)
 	return err;
 }
 
+static int bond_virtual_engine(struct drm_i915_private *i915,
+			       unsigned int class,
+			       struct intel_engine_cs **siblings,
+			       unsigned int nsibling,
+			       unsigned int flags)
+#define BOND_SCHEDULE BIT(0)
+{
+	struct intel_engine_cs *master;
+	struct i915_gem_context *ctx;
+	struct i915_request *rq[16];
+	enum intel_engine_id id;
+	unsigned long n;
+	int err;
+
+	GEM_BUG_ON(nsibling >= ARRAY_SIZE(rq) - 1);
+
+	ctx = kernel_context(i915);
+	if (!ctx)
+		return -ENOMEM;
+
+	err = 0;
+	rq[0] = ERR_PTR(-ENOMEM);
+	for_each_engine(master, i915, id) {
+		struct i915_sw_fence fence;
+
+		if (master->class == class)
+			continue;
+
+		rq[0] = i915_request_alloc(master, ctx);
+		if (IS_ERR(rq[0])) {
+			err = PTR_ERR(rq[0]);
+			goto out;
+		}
+
+		if (flags & BOND_SCHEDULE)
+			onstack_fence_init(&fence);
+
+		i915_request_get(rq[0]);
+		i915_request_add(rq[0]);
+
+		for (n = 0; n < nsibling; n++) {
+			struct intel_engine_cs *engine;
+
+			engine = intel_execlists_create_virtual(ctx,
+								siblings,
+								nsibling);
+			if (IS_ERR(engine)) {
+				err = PTR_ERR(engine);
+				goto out;
+			}
+
+			err = intel_virtual_engine_attach_bond(engine,
+							       master,
+							       BIT(n));
+			if (err) {
+				intel_virtual_engine_destroy(engine);
+				goto out;
+			}
+
+			rq[n + 1] = i915_request_alloc(engine, ctx);
+			if (IS_ERR(rq[n + 1])) {
+				err = PTR_ERR(rq[n + 1]);
+				intel_virtual_engine_destroy(engine);
+				goto out;
+			}
+			i915_request_get(rq[n + 1]);
+
+			err = i915_request_await_execution(rq[n + 1],
+							   &rq[0]->fence,
+							   engine->bond_execute);
+			i915_request_add(rq[n + 1]);
+			intel_virtual_engine_destroy(engine);
+			if (err < 0)
+				goto out;
+		}
+		rq[n + 1] = ERR_PTR(-EINVAL);
+
+		if (flags & BOND_SCHEDULE)
+			onstack_fence_fini(&fence);
+
+		for (n = 0; n < nsibling; n++) {
+			if (i915_request_wait(rq[n + 1],
+					      I915_WAIT_LOCKED,
+					      MAX_SCHEDULE_TIMEOUT) < 0) {
+				err = -EIO;
+				goto out;
+			}
+
+			if (rq[n + 1]->engine != siblings[n]) {
+				pr_err("Bonded request did not execute on target engine: expected %s, used %s; master was %s\n",
+				       siblings[n]->name,
+				       rq[n + 1]->engine->name,
+				       rq[0]->engine->name);
+				err = -EINVAL;
+				goto out;
+			}
+		}
+
+		for (n = 0; !IS_ERR(rq[n]); n++)
+			i915_request_put(rq[n]);
+		rq[0] = ERR_PTR(-ENOMEM);
+	}
+
+out:
+	for (n = 0; !IS_ERR(rq[n]); n++)
+		i915_request_put(rq[n]);
+	if (igt_flush_test(i915, I915_WAIT_LOCKED))
+		err = -EIO;
+
+	kernel_context_close(ctx);
+	return err;
+}
+
+static int live_virtual_bond(void *arg)
+{
+	static const struct phase {
+		const char *name;
+		unsigned int flags;
+	} phases[] = {
+		{ "", 0 },
+		{ "schedule", BOND_SCHEDULE },
+		{ },
+	};
+	struct drm_i915_private *i915 = arg;
+	struct intel_engine_cs *siblings[MAX_ENGINE_INSTANCE + 1];
+	unsigned int class, inst;
+	int err = 0;
+
+	if (USES_GUC_SUBMISSION(i915))
+		return 0;
+
+	mutex_lock(&i915->drm.struct_mutex);
+
+	for (class = 0; class <= MAX_ENGINE_CLASS; class++) {
+		const struct phase *p;
+		int nsibling;
+
+		nsibling = 0;
+		for (inst = 0; inst <= MAX_ENGINE_INSTANCE; inst++) {
+			if (!i915->engine_class[class][inst])
+				break;
+
+			GEM_BUG_ON(nsibling == ARRAY_SIZE(siblings));
+			siblings[nsibling++] = i915->engine_class[class][inst];
+		}
+		if (nsibling < 2)
+			continue;
+
+		for (p = phases; p->name; p++) {
+			err = bond_virtual_engine(i915,
+						  class, siblings, nsibling,
+						  p->flags);
+			if (err) {
+				pr_err("%s(%s): failed class=%d, nsibling=%d, err=%d\n",
+				       __func__, p->name, class, nsibling, err);
+				goto out_unlock;
+			}
+		}
+	}
+
+out_unlock:
+	mutex_unlock(&i915->drm.struct_mutex);
+	return err;
+}
+
 int intel_execlists_live_selftests(struct drm_i915_private *i915)
 {
 	static const struct i915_subtest tests[] = {
@@ -1216,6 +1382,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_preempt_hang),
 		SUBTEST(live_preempt_smoke),
 		SUBTEST(live_virtual_engine),
+		SUBTEST(live_virtual_bond),
 	};
 
 	if (!HAS_EXECLISTS(i915))
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 47d2f56b6d90..4ef06a9e6791 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1502,6 +1502,10 @@ struct drm_i915_gem_context_param {
  * sized argument, will revert back to default settings.
  *
  * See struct i915_context_param_engines.
+ *
+ * Extensions:
+ *   i915_context_engines_load_balance (I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE)
+ *   i915_context_engines_bond (I915_CONTEXT_ENGINES_EXT_BOND)
  */
 #define I915_CONTEXT_PARAM_ENGINES	0x9
 
@@ -1593,9 +1597,23 @@ struct i915_context_engines_load_balance {
 	__u64 mbz[4]; /* reserved for future use; must be zero */
 };
 
+/*
+ * i915_context_engines_bond:
+ *
+ */
+struct i915_context_engines_bond {
+	struct i915_user_extension base;
+
+	__u16 master_class;
+	__u16 master_instance;
+	__u32 flags; /* all undefined flags must be zero */
+	__u64 sibling_mask;
+};
+
 struct i915_context_param_engines {
 	__u64 extensions; /* linked chain of extension blocks, 0 terminates */
 #define I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE 0
+#define I915_CONTEXT_ENGINES_EXT_BOND 1
 
 	struct {
 		__u16 engine_class; /* see enum drm_i915_gem_engine_class */
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* [PATCH 46/46] drm/i915: Allow specification of parallel execbuf
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (44 preceding siblings ...)
  2019-02-06 13:03 ` [PATCH 45/46] drm/i915/execlists: Virtual engine bonding Chris Wilson
@ 2019-02-06 13:03 ` Chris Wilson
  2019-02-06 13:52 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/46] drm/i915: Hack and slash, throttle execbuffer hogs Patchwork
                   ` (6 subsequent siblings)
  52 siblings, 0 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 13:03 UTC (permalink / raw)
  To: intel-gfx

There is a desire to split a task onto two engines and have them run at
the same time, e.g. scanline interleaving to spread the workload evenly.
Through the use of the out-fence from the first execbuf, we can
coordinate the secondary execbuf to only become ready simultaneously
with the first, so that, all else being idle, the secondary execbufs are
executed in parallel with the first. The key difference between the new
EXEC_FENCE_SUBMIT and the existing EXEC_FENCE_IN is that the in-fence
waits for the completion of the first request (so that all of its
rendering results are visible to the second execbuf, the more common
userspace fence requirement).

Since we only have a single input fence slot, userspace cannot mix an
in-fence and a submit-fence. It has to use one or the other! This is not
such a harsh requirement, since by virtue of the submit-fence, the
secondary execbuf inherits all of the dependencies from the first
request, and for the application the dependencies should be common
between the primary and secondary execbuf.
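
A rough sketch of the userspace flow (the helper name is made up, error
handling is minimal, and both execbuf structs are assumed to be fully
populated beforehand):

  /* Submit eb2 so that it becomes ready alongside eb1, not after it. */
  static int submit_parallel(int fd,
                             struct drm_i915_gem_execbuffer2 *eb1,
                             struct drm_i915_gem_execbuffer2 *eb2)
  {
          int out, err;

          eb1->flags |= I915_EXEC_FENCE_OUT;
          err = drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2_WR, eb1);
          if (err)
                  return err;
          out = eb1->rsvd2 >> 32; /* sync_file fd from FENCE_OUT */

          eb2->flags |= I915_EXEC_FENCE_SUBMIT;
          eb2->rsvd2 = out; /* lower_32_bits(rsvd2) carries the fd */
          err = drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2_WR, eb2);

          close(out);
          return err;
  }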

Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Testcase: igt/gem_exec_fence/parallel
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c            |  1 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 25 +++++++++++++++++++++-
 include/uapi/drm/i915_drm.h                | 18 +++++++++++++++-
 3 files changed, 42 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index fc11460f8327..4939c72ef283 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -419,6 +419,7 @@ static int i915_getparam_ioctl(struct drm_device *dev, void *data,
 	case I915_PARAM_HAS_EXEC_CAPTURE:
 	case I915_PARAM_HAS_EXEC_BATCH_FIRST:
 	case I915_PARAM_HAS_EXEC_FENCE_ARRAY:
+	case I915_PARAM_HAS_EXEC_SUBMIT_FENCE:
 		/* For the time being all of these are always true;
 		 * if some supported hardware does not have one of these
 		 * features this value needs to be provided from
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index cb57178b8fe3..4c72db24582b 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -2281,6 +2281,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 {
 	struct i915_execbuffer eb;
 	struct dma_fence *in_fence = NULL;
+	struct dma_fence *exec_fence = NULL;
 	struct sync_file *out_fence = NULL;
 	intel_wakeref_t wakeref;
 	int out_fence_fd = -1;
@@ -2324,11 +2325,24 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 			return -EINVAL;
 	}
 
+	if (args->flags & I915_EXEC_FENCE_SUBMIT) {
+		if (in_fence) {
+			err = -EINVAL;
+			goto err_in_fence;
+		}
+
+		exec_fence = sync_file_get_fence(lower_32_bits(args->rsvd2));
+		if (!exec_fence) {
+			err = -EINVAL;
+			goto err_in_fence;
+		}
+	}
+
 	if (args->flags & I915_EXEC_FENCE_OUT) {
 		out_fence_fd = get_unused_fd_flags(O_CLOEXEC);
 		if (out_fence_fd < 0) {
 			err = out_fence_fd;
-			goto err_in_fence;
+			goto err_exec_fence;
 		}
 	}
 
@@ -2460,6 +2474,13 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 			goto err_request;
 	}
 
+	if (exec_fence) {
+		err = i915_request_await_execution(eb.request, exec_fence,
+						   eb.engine->bond_execute);
+		if (err < 0)
+			goto err_request;
+	}
+
 	if (fences) {
 		err = await_fence_array(&eb, fences);
 		if (err)
@@ -2520,6 +2541,8 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 err_out_fence:
 	if (out_fence_fd != -1)
 		put_unused_fd(out_fence_fd);
+err_exec_fence:
+	dma_fence_put(exec_fence);
 err_in_fence:
 	dma_fence_put(in_fence);
 	return err;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 4ef06a9e6791..0efc3894a9b4 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -589,6 +589,13 @@ typedef struct drm_i915_irq_wait {
  */
 #define I915_PARAM_MMAP_GTT_COHERENT	52
 
+/*
+ * Query whether DRM_I915_GEM_EXECBUFFER2 supports coordination of parallel
+ * execution through use of explicit fence support.
+ * See I915_EXEC_FENCE_OUT and I915_EXEC_FENCE_SUBMIT.
+ */
+#define I915_PARAM_HAS_EXEC_SUBMIT_FENCE 53
+
 typedef struct drm_i915_getparam {
 	__s32 param;
 	/*
@@ -1108,7 +1115,16 @@ struct drm_i915_gem_execbuffer2 {
  */
 #define I915_EXEC_FENCE_ARRAY   (1<<19)
 
-#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_FENCE_ARRAY<<1))
+/*
+ * Setting I915_EXEC_FENCE_SUBMIT implies that lower_32_bits(rsvd2) represent
+ * a sync_file fd to wait upon (in a nonblocking manner) prior to executing
+ * the batch.
+ *
+ * Returns -EINVAL if the sync_file fd cannot be found.
+ */
+#define I915_EXEC_FENCE_SUBMIT		(1<<20)
+
+#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_FENCE_SUBMIT<<1))
 
 #define I915_EXEC_CONTEXT_ID_MASK	(0xffffffff)
 #define i915_execbuffer2_set_context_id(eb2, context) \
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/46] drm/i915: Hack and slash, throttle execbuffer hogs
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (45 preceding siblings ...)
  2019-02-06 13:03 ` [PATCH 46/46] drm/i915: Allow specification of parallel execbuf Chris Wilson
@ 2019-02-06 13:52 ` Patchwork
  2019-02-06 14:09 ` ✗ Fi.CI.BAT: failure " Patchwork
                   ` (5 subsequent siblings)
  52 siblings, 0 replies; 97+ messages in thread
From: Patchwork @ 2019-02-06 13:52 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/46] drm/i915: Hack and slash, throttle execbuffer hogs
URL   : https://patchwork.freedesktop.org/series/56281/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
ac96dd19c008 drm/i915: Hack and slash, throttle execbuffer hogs
3de9e2816fae drm/i915: Revoke mmaps and prevent access to fence registers across reset
3475d201b55a drm/i915: Force the GPU reset upon wedging
29cb7893fdc1 drm/i915: Uninterruptibly drain the timelines on unwedging
5b1661925cd0 drm/i915: Wait for old resets before applying debugfs/i915_wedged
43b5dd7698b9 drm/i915: Serialise resets with wedging
122d0f96c087 drm/i915: Don't claim an unstarted request was guilty
b4cb5ec9e45c drm/i915/execlists: Suppress mere WAIT preemption
226a6af19adb drm/i915/execlists: Suppress redundant preemption
09caab95cfb8 drm/i915: Make request allocation caches global
-:162: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#162: 
new file mode 100644

-:167: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#167: FILE: drivers/gpu/drm/i915/i915_globals.c:1:
+/*

-:278: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#278: FILE: drivers/gpu/drm/i915/i915_globals.h:1:
+/*

-:619: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'plist' - possible side-effects?
#619: FILE: drivers/gpu/drm/i915/i915_scheduler.h:95:
+#define priolist_for_each_request(it, plist, idx) \
+	for (idx = 0; idx < ARRAY_SIZE((plist)->requests); idx++) \
+		list_for_each_entry(it, &(plist)->requests[idx], sched.link)

-:619: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'idx' - possible side-effects?
#619: FILE: drivers/gpu/drm/i915/i915_scheduler.h:95:
+#define priolist_for_each_request(it, plist, idx) \
+	for (idx = 0; idx < ARRAY_SIZE((plist)->requests); idx++) \
+		list_for_each_entry(it, &(plist)->requests[idx], sched.link)

-:623: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'plist' - possible side-effects?
#623: FILE: drivers/gpu/drm/i915/i915_scheduler.h:99:
+#define priolist_for_each_request_consume(it, n, plist, idx) \
+	for (; (idx = ffs((plist)->used)); (plist)->used &= ~BIT(idx - 1)) \
+		list_for_each_entry_safe(it, n, \
+					 &(plist)->requests[idx - 1], \
+					 sched.link)

-:623: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'idx' - possible side-effects?
#623: FILE: drivers/gpu/drm/i915/i915_scheduler.h:99:
+#define priolist_for_each_request_consume(it, n, plist, idx) \
+	for (; (idx = ffs((plist)->used)); (plist)->used &= ~BIT(idx - 1)) \
+		list_for_each_entry_safe(it, n, \
+					 &(plist)->requests[idx - 1], \
+					 sched.link)

total: 0 errors, 3 warnings, 4 checks, 803 lines checked
6c4f49593cf9 drm/i915: Keep timeline HWSP allocated until idle across the system
8c07b4664e9f drm/i915/execlists: Refactor out can_merge_rq()
7ec5601b4bc4 drm/i915: Compute the global scheduler caps
35a5590cea9c drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+
-:333: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#333: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:109:
+#define   MI_SEMAPHORE_SAD_GT_SDD	(0<<12)
                                  	  ^

-:335: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#335: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:111:
+#define   MI_SEMAPHORE_SAD_LT_SDD	(2<<12)
                                  	  ^

-:336: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#336: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:112:
+#define   MI_SEMAPHORE_SAD_LTE_SDD	(3<<12)
                                   	  ^

-:337: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#337: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:113:
+#define   MI_SEMAPHORE_SAD_EQ_SDD	(4<<12)
                                  	  ^

-:338: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#338: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:114:
+#define   MI_SEMAPHORE_SAD_NEQ_SDD	(5<<12)
                                   	  ^

total: 0 errors, 0 warnings, 5 checks, 298 lines checked
f3a98831e52f drm/i915: Prioritise non-busywait semaphore workloads
1c1693efd3a6 drm/i915: Show support for accurate sw PMU busyness tracking
0367443cbf81 drm/i915: Apply rps waitboosting for dma_fence_wait_timeout()
e975d7398521 drm/i915: Replace global_seqno with a hangcheck heartbeat seqno
9fa0d24d624d drm/i915/pmu: Always sample an active ringbuffer
23cd029b714c drm/i915: Remove access to global seqno in the HWSP
091cecde28d1 drm/i915: Remove i915_request.global_seqno
d043df80f2f1 drm/i915: Force GPU idle on suspend
c188a67c1645 drm/i915/selftests: Improve switch-to-kernel-context checking
dc4d86529875 drm/i915: Do a synchronous switch-to-kernel-context on idling
9c2e9cb521f5 drm/i915: Store the BIT(engine->id) as the engine's mask
248ff0cf1731 drm/i915: Refactor common code to load initial power context
0b6fd6b5f721 drm/i915: Reduce presumption of request ordering for barriers
-:8: WARNING:COMMIT_MESSAGE: Missing commit description - Add an appropriate one

-:265: ERROR:MISSING_SIGN_OFF: Missing Signed-off-by: line(s)

total: 1 errors, 1 warnings, 0 checks, 203 lines checked
fe05994e0207 drm/i915: Remove has-kernel-context
8483858d6292 drm/i915: Introduce the i915_user_extension_method
-:58: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#58: 
new file mode 100644

-:63: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#63: FILE: drivers/gpu/drm/i915/i915_user_extensions.c:1:
+/*

-:112: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#112: FILE: drivers/gpu/drm/i915/i915_user_extensions.h:1:
+/*

-:140: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'ptr' - possible side-effects?
#140: FILE: drivers/gpu/drm/i915/i915_utils.h:108:
+#define container_of_user(ptr, type, member) ({				\
+	void __user *__mptr = (void __user *)(ptr);			\
+	BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) &&	\
+			 !__same_type(*(ptr), void),			\
+			 "pointer type mismatch in container_of()");	\
+	((type __user *)(__mptr - offsetof(type, member))); })

-:140: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'member' - possible side-effects?
#140: FILE: drivers/gpu/drm/i915/i915_utils.h:108:
+#define container_of_user(ptr, type, member) ({				\
+	void __user *__mptr = (void __user *)(ptr);			\
+	BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) &&	\
+			 !__same_type(*(ptr), void),			\
+			 "pointer type mismatch in container_of()");	\
+	((type __user *)(__mptr - offsetof(type, member))); })

-:140: CHECK:MACRO_ARG_PRECEDENCE: Macro argument 'member' may be better as '(member)' to avoid precedence issues
#140: FILE: drivers/gpu/drm/i915/i915_utils.h:108:
+#define container_of_user(ptr, type, member) ({				\
+	void __user *__mptr = (void __user *)(ptr);			\
+	BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) &&	\
+			 !__same_type(*(ptr), void),			\
+			 "pointer type mismatch in container_of()");	\
+	((type __user *)(__mptr - offsetof(type, member))); })

total: 0 errors, 3 warnings, 3 checks, 109 lines checked
3d21741d868c drm/i915: Track active engines within a context
52b672c4e222 drm/i915: Introduce a context barrier callback
34e46f094c05 drm/i915: Create/destroy VM (ppGTT) for use with contexts
-:37: CHECK:UNCOMMENTED_DEFINITION: struct mutex definition without comment
#37: FILE: drivers/gpu/drm/i915/i915_drv.h:220:
+	struct mutex vm_lock;

-:551: WARNING:LINE_SPACING: Missing a blank line after declarations
#551: FILE: drivers/gpu/drm/i915/selftests/i915_gem_context.c:503:
+		struct drm_file *file;
+		IGT_TIMEOUT(end_time);

-:627: CHECK:MULTIPLE_ASSIGNMENTS: multiple assignments should be avoided
#627: FILE: drivers/gpu/drm/i915/selftests/i915_gem_context.c:565:
+		ncontexts = dw = 0;

-:678: WARNING:LINE_SPACING: Missing a blank line after declarations
#678: FILE: drivers/gpu/drm/i915/selftests/i915_gem_context.c:610:
+		struct drm_file *file;
+		IGT_TIMEOUT(end_time);

-:758: CHECK:MULTIPLE_ASSIGNMENTS: multiple assignments should be avoided
#758: FILE: drivers/gpu/drm/i915/selftests/i915_gem_context.c:686:
+		ncontexts = dw = 0;

-:876: WARNING:LONG_LINE: line over 100 characters
#876: FILE: include/uapi/drm/i915_drm.h:402:
+#define DRM_IOCTL_I915_GEM_VM_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_CREATE, struct drm_i915_gem_vm_control)

-:877: WARNING:LONG_LINE: line over 100 characters
#877: FILE: include/uapi/drm/i915_drm.h:403:
+#define DRM_IOCTL_I915_GEM_VM_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_VM_DESTROY, struct drm_i915_gem_vm_control)

-:877: WARNING:SPACING: space prohibited between function name and open parenthesis '('
#877: FILE: include/uapi/drm/i915_drm.h:403:
+#define DRM_IOCTL_I915_GEM_VM_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_VM_DESTROY, struct drm_i915_gem_vm_control)

-:877: ERROR:COMPLEX_MACRO: Macros with complex values should be enclosed in parentheses
#877: FILE: include/uapi/drm/i915_drm.h:403:
+#define DRM_IOCTL_I915_GEM_VM_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_VM_DESTROY, struct drm_i915_gem_vm_control)

total: 1 errors, 5 warnings, 3 checks, 832 lines checked
1ee0726d9413 drm/i915: Extend CONTEXT_CREATE to set parameters upon construction
-:26: WARNING:LONG_LINE: line over 100 characters
#26: FILE: drivers/gpu/drm/i915/i915_drv.c:2997:
+	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_CREATE_EXT, i915_gem_context_create_ioctl, DRM_RENDER_ALLOW),

-:522: WARNING:LONG_LINE: line over 100 characters
#522: FILE: include/uapi/drm/i915_drm.h:392:
+#define DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE, struct drm_i915_gem_context_create_ext)

-:522: WARNING:SPACING: space prohibited between function name and open parenthesis '('
#522: FILE: include/uapi/drm/i915_drm.h:392:
+#define DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE, struct drm_i915_gem_context_create_ext)

-:522: ERROR:COMPLEX_MACRO: Macros with complex values should be enclosed in parentheses
#522: FILE: include/uapi/drm/i915_drm.h:392:
+#define DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE, struct drm_i915_gem_context_create_ext)

total: 1 errors, 3 warnings, 0 checks, 666 lines checked
fb3ec32c9322 drm/i915: Allow contexts to share a single timeline across all engines
47c2db7f9d3b drm/i915: Fix I915_EXEC_RING_MASK
b29f6ab04471 drm/i915: Remove last traces of exec-id (GEM_BUSY)
765bcde8c3fc drm/i915: Re-arrange execbuf so context is known before engine
02a4d497267c drm/i915: Allow a context to define its set of engines
7f00e42a0d67 drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[]
25327eaad5f2 drm/i915: Pass around the intel_context
a5521a6cf918 drm/i915: Split struct intel_context definition to its own header
-:291: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#291: 
new file mode 100644

-:296: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#296: FILE: drivers/gpu/drm/i915/i915_gem_context_types.h:1:
+/*

-:557: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#557: FILE: drivers/gpu/drm/i915/i915_timeline_types.h:1:
+/*

-:581: CHECK:UNCOMMENTED_DEFINITION: spinlock_t definition without comment
#581: FILE: drivers/gpu/drm/i915/i915_timeline_types.h:25:
+	spinlock_t lock;

-:642: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#642: FILE: drivers/gpu/drm/i915/intel_context.h:1:
+/*

-:695: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#695: FILE: drivers/gpu/drm/i915/intel_context_types.h:1:
+/*

-:761: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#761: FILE: drivers/gpu/drm/i915/intel_engine_types.h:1:
+/*

-:1048: CHECK:UNCOMMENTED_DEFINITION: spinlock_t definition without comment
#1048: FILE: drivers/gpu/drm/i915/intel_engine_types.h:288:
+		spinlock_t irq_lock;

-:1264: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'dev_priv__' - possible side-effects?
#1264: FILE: drivers/gpu/drm/i915/intel_engine_types.h:504:
+#define instdone_slice_mask(dev_priv__) \
+	(IS_GEN(dev_priv__, 7) ? \
+	 1 : RUNTIME_INFO(dev_priv__)->sseu.slice_mask)

-:1268: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'dev_priv__' - possible side-effects?
#1268: FILE: drivers/gpu/drm/i915/intel_engine_types.h:508:
+#define instdone_subslice_mask(dev_priv__) \
+	(IS_GEN(dev_priv__, 7) ? \
+	 1 : RUNTIME_INFO(dev_priv__)->sseu.subslice_mask[0])

-:1272: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'dev_priv__' - possible side-effects?
#1272: FILE: drivers/gpu/drm/i915/intel_engine_types.h:512:
+#define for_each_instdone_slice_subslice(dev_priv__, slice__, subslice__) \
+	for ((slice__) = 0, (subslice__) = 0; \
+	     (slice__) < I915_MAX_SLICES; \
+	     (subslice__) = ((subslice__) + 1) < I915_MAX_SUBSLICES ? (subslice__) + 1 : 0, \
+	       (slice__) += ((subslice__) == 0)) \
+		for_each_if((BIT(slice__) & instdone_slice_mask(dev_priv__)) && \
+			    (BIT(subslice__) & instdone_subslice_mask(dev_priv__)))

-:1272: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'slice__' - possible side-effects?
#1272: FILE: drivers/gpu/drm/i915/intel_engine_types.h:512:
+#define for_each_instdone_slice_subslice(dev_priv__, slice__, subslice__) \
+	for ((slice__) = 0, (subslice__) = 0; \
+	     (slice__) < I915_MAX_SLICES; \
+	     (subslice__) = ((subslice__) + 1) < I915_MAX_SUBSLICES ? (subslice__) + 1 : 0, \
+	       (slice__) += ((subslice__) == 0)) \
+		for_each_if((BIT(slice__) & instdone_slice_mask(dev_priv__)) && \
+			    (BIT(subslice__) & instdone_subslice_mask(dev_priv__)))

-:1272: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'subslice__' - possible side-effects?
#1272: FILE: drivers/gpu/drm/i915/intel_engine_types.h:512:
+#define for_each_instdone_slice_subslice(dev_priv__, slice__, subslice__) \
+	for ((slice__) = 0, (subslice__) = 0; \
+	     (slice__) < I915_MAX_SLICES; \
+	     (subslice__) = ((subslice__) + 1) < I915_MAX_SUBSLICES ? (subslice__) + 1 : 0, \
+	       (slice__) += ((subslice__) == 0)) \
+		for_each_if((BIT(slice__) & instdone_slice_mask(dev_priv__)) && \
+			    (BIT(subslice__) & instdone_subslice_mask(dev_priv__)))

-:1853: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#1853: FILE: drivers/gpu/drm/i915/intel_workarounds_types.h:1:
+/*

total: 0 errors, 7 warnings, 7 checks, 1796 lines checked
966d3a47e291 drm/i915: Move over to intel_context_lookup()
-:245: CHECK:UNCOMMENTED_DEFINITION: spinlock_t definition without comment
#245: FILE: drivers/gpu/drm/i915/i915_gem_context_types.h:140:
+	spinlock_t hw_contexts_lock;

-:283: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#283: 
new file mode 100644

-:288: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#288: FILE: drivers/gpu/drm/i915/intel_context.c:1:
+/*

total: 0 errors, 2 warnings, 1 checks, 631 lines checked
383cf687e00d drm/i915: Load balancing across a virtual engine
-:825: WARNING:LINE_SPACING: Missing a blank line after declarations
#825: FILE: drivers/gpu/drm/i915/intel_lrc.c:3359:
+		struct intel_engine_cs *actual = ve->siblings[0];
+		virtual_engine_free(&ve->kref);

total: 0 errors, 1 warnings, 0 checks, 1016 lines checked
439c65f993ed drm/i915: Extend execution fence to support a callback
b97f756efdaf drm/i915/execlists: Virtual engine bonding
a6bd3149359a drm/i915: Allow specification of parallel execbuf
-:132: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#132: FILE: include/uapi/drm/i915_drm.h:1125:
+#define I915_EXEC_FENCE_SUBMIT		(1<<20)
                               		  ^

-:134: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#134: FILE: include/uapi/drm/i915_drm.h:1127:
+#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_FENCE_SUBMIT<<1))
                                                            ^

total: 0 errors, 0 warnings, 2 checks, 90 lines checked

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* ✗ Fi.CI.BAT: failure for series starting with [01/46] drm/i915: Hack and slash, throttle execbuffer hogs
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (46 preceding siblings ...)
  2019-02-06 13:52 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/46] drm/i915: Hack and slash, throttle execbuffer hogs Patchwork
@ 2019-02-06 14:09 ` Patchwork
  2019-02-06 14:11 ` ✗ Fi.CI.SPARSE: warning " Patchwork
                   ` (4 subsequent siblings)
  52 siblings, 0 replies; 97+ messages in thread
From: Patchwork @ 2019-02-06 14:09 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/46] drm/i915: Hack and slash, throttle execbuffer hogs
URL   : https://patchwork.freedesktop.org/series/56281/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_5551 -> Patchwork_12154
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_12154 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_12154, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/56281/revisions/1/mbox/

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_12154:

### IGT changes ###

#### Possible regressions ####

  * igt@i915_selftest@live_execlists:
    - fi-cfl-8700k:       PASS -> DMESG-FAIL
    - fi-kbl-7567u:       PASS -> DMESG-FAIL
    - fi-skl-guc:         PASS -> DMESG-FAIL
    - fi-glk-j4005:       PASS -> DMESG-FAIL
    - fi-cfl-guc:         PASS -> DMESG-FAIL
    - fi-skl-iommu:       PASS -> DMESG-FAIL
    - fi-skl-gvtdvm:      PASS -> DMESG-FAIL
    - fi-bxt-j4205:       PASS -> DMESG-FAIL
    - fi-skl-6700hq:      PASS -> DMESG-FAIL
    - fi-kbl-7500u:       PASS -> DMESG-FAIL
    - fi-kbl-guc:         PASS -> DMESG-FAIL
    - fi-kbl-8809g:       PASS -> DMESG-FAIL
    - fi-kbl-x1275:       PASS -> DMESG-FAIL
    - fi-skl-6600u:       PASS -> DMESG-FAIL
    - fi-skl-6700k2:      PASS -> DMESG-FAIL
    - fi-skl-6260u:       PASS -> DMESG-FAIL
    - fi-kbl-7560u:       PASS -> DMESG-FAIL
    - fi-skl-6770hq:      PASS -> DMESG-FAIL
    - fi-kbl-r:           PASS -> DMESG-FAIL
    - fi-cfl-8109u:       PASS -> DMESG-FAIL

  * igt@i915_selftest@live_requests:
    - fi-apl-guc:         PASS -> DMESG-FAIL +1

  
#### Suppressed ####

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * igt@i915_selftest@live_execlists:
    - {fi-icl-u2}:        PASS -> DMESG-FAIL
    - {fi-whl-u}:         PASS -> DMESG-FAIL

  * {igt@runner@aborted}:
    - fi-bsw-kefka:       NOTRUN -> FAIL

  
Known issues
------------

  Here are the changes found in Patchwork_12154 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@kms_busy@basic-flip-a:
    - fi-gdg-551:         PASS -> FAIL [fdo#103182]

  * igt@kms_pipe_crc_basic@suspend-read-crc-pipe-b:
    - fi-blb-e6850:       NOTRUN -> INCOMPLETE [fdo#107718]

  
#### Possible fixes ####

  * igt@gem_exec_suspend@basic-s3:
    - fi-blb-e6850:       INCOMPLETE [fdo#107718] -> PASS

  * igt@kms_chamelium@hdmi-hpd-fast:
    - fi-kbl-7500u:       FAIL [fdo#109485] -> PASS

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#103182]: https://bugs.freedesktop.org/show_bug.cgi?id=103182
  [fdo#107718]: https://bugs.freedesktop.org/show_bug.cgi?id=107718
  [fdo#108654]: https://bugs.freedesktop.org/show_bug.cgi?id=108654
  [fdo#108756]: https://bugs.freedesktop.org/show_bug.cgi?id=108756
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109278]: https://bugs.freedesktop.org/show_bug.cgi?id=109278
  [fdo#109485]: https://bugs.freedesktop.org/show_bug.cgi?id=109485


Participating hosts (48 -> 45)
------------------------------

  Additional (1): fi-ivb-3770 
  Missing    (4): fi-icl-y fi-ilk-m540 fi-byt-squawks fi-bsw-cyan 


Build changes
-------------

    * Linux: CI_DRM_5551 -> Patchwork_12154

  CI_DRM_5551: 417d0e0cd0275705aed001d938e646879ee5afe9 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4812: 592b854fead32c2b0dac7198edfb9a6bffd66932 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_12154: a6bd3149359a8337a76676e2df7027bf8917c0b2 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

a6bd3149359a drm/i915: Allow specification of parallel execbuf
b97f756efdaf drm/i915/execlists: Virtual engine bonding
439c65f993ed drm/i915: Extend execution fence to support a callback
383cf687e00d drm/i915: Load balancing across a virtual engine
966d3a47e291 drm/i915: Move over to intel_context_lookup()
a5521a6cf918 drm/i915: Split struct intel_context definition to its own header
25327eaad5f2 drm/i915: Pass around the intel_context
7f00e42a0d67 drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[]
02a4d497267c drm/i915: Allow a context to define its set of engines
765bcde8c3fc drm/i915: Re-arrange execbuf so context is known before engine
b29f6ab04471 drm/i915: Remove last traces of exec-id (GEM_BUSY)
47c2db7f9d3b drm/i915: Fix I915_EXEC_RING_MASK
fb3ec32c9322 drm/i915: Allow contexts to share a single timeline across all engines
1ee0726d9413 drm/i915: Extend CONTEXT_CREATE to set parameters upon construction
34e46f094c05 drm/i915: Create/destroy VM (ppGTT) for use with contexts
52b672c4e222 drm/i915: Introduce a context barrier callback
3d21741d868c drm/i915: Track active engines within a context
8483858d6292 drm/i915: Introduce the i915_user_extension_method
fe05994e0207 drm/i915: Remove has-kernel-context
0b6fd6b5f721 drm/i915: Reduce presumption of request ordering for barriers
248ff0cf1731 drm/i915: Refactor common code to load initial power context
9c2e9cb521f5 drm/i915: Store the BIT(engine->id) as the engine's mask
dc4d86529875 drm/i915: Do a synchronous switch-to-kernel-context on idling
c188a67c1645 drm/i915/selftests: Improve switch-to-kernel-context checking
d043df80f2f1 drm/i915: Force GPU idle on suspend
091cecde28d1 drm/i915: Remove i915_request.global_seqno
23cd029b714c drm/i915: Remove access to global seqno in the HWSP
9fa0d24d624d drm/i915/pmu: Always sample an active ringbuffer
e975d7398521 drm/i915: Replace global_seqno with a hangcheck heartbeat seqno
0367443cbf81 drm/i915: Apply rps waitboosting for dma_fence_wait_timeout()
1c1693efd3a6 drm/i915: Show support for accurate sw PMU busyness tracking
f3a98831e52f drm/i915: Prioritise non-busywait semaphore workloads
35a5590cea9c drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+
7ec5601b4bc4 drm/i915: Compute the global scheduler caps
8c07b4664e9f drm/i915/execlists: Refactor out can_merge_rq()
6c4f49593cf9 drm/i915: Keep timeline HWSP allocated until idle across the system
09caab95cfb8 drm/i915: Make request allocation caches global
226a6af19adb drm/i915/execlists: Suppress redundant preemption
b4cb5ec9e45c drm/i915/execlists: Suppress mere WAIT preemption
122d0f96c087 drm/i915: Don't claim an unstarted request was guilty
43b5dd7698b9 drm/i915: Serialise resets with wedging
5b1661925cd0 drm/i915: Wait for old resets before applying debugfs/i915_wedged
29cb7893fdc1 drm/i915: Uninterruptibly drain the timelines on unwedging
3475d201b55a drm/i915: Force the GPU reset upon wedging
3de9e2816fae drm/i915: Revoke mmaps and prevent access to fence registers across reset
ac96dd19c008 drm/i915: Hack and slash, throttle execbuffer hogs

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12154/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* ✗ Fi.CI.SPARSE: warning for series starting with [01/46] drm/i915: Hack and slash, throttle execbuffer hogs
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (47 preceding siblings ...)
  2019-02-06 14:09 ` ✗ Fi.CI.BAT: failure " Patchwork
@ 2019-02-06 14:11 ` Patchwork
  2019-02-06 14:37 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/46] drm/i915: Hack and slash, throttle execbuffer hogs (rev2) Patchwork
                   ` (3 subsequent siblings)
  52 siblings, 0 replies; 97+ messages in thread
From: Patchwork @ 2019-02-06 14:11 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/46] drm/i915: Hack and slash, throttle execbuffer hogs
URL   : https://patchwork.freedesktop.org/series/56281/
State : warning

== Summary ==

$ dim sparse origin/drm-tip
Sparse version: v0.5.2
Commit: drm/i915: Hack and slash, throttle execbuffer hogs
Okay!

Commit: drm/i915: Revoke mmaps and prevent access to fence registers across reset
-drivers/gpu/drm/i915/i915_gem.c:986:39: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/i915_gem.c:986:39: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/i915_gem.c:986:39: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/i915_gem.c:986:39: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/i915_reset.c:1302:5: warning: context imbalance in 'i915_reset_trylock' - different lock contexts for basic block
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3565:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3559:16: warning: expression using sizeof(void)

Commit: drm/i915: Force the GPU reset upon wedging
Okay!

Commit: drm/i915: Uninterruptibly drain the timelines on unwedging
Okay!

Commit: drm/i915: Wait for old resets before applying debugfs/i915_wedged
Okay!

Commit: drm/i915: Serialise resets with wedging
Okay!

Commit: drm/i915: Don't claim an unstarted request was guilty
Okay!

Commit: drm/i915/execlists: Suppress mere WAIT preemption
Okay!

Commit: drm/i915/execlists: Suppress redundant preemption
Okay!

Commit: drm/i915: Make request allocation caches global
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3559:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3556:16: warning: expression using sizeof(void)

Commit: drm/i915: Keep timeline HWSP allocated until idle across the system
Okay!

Commit: drm/i915/execlists: Refactor out can_merge_rq()
Okay!

Commit: drm/i915: Compute the global scheduler caps
Okay!

Commit: drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+
-O:drivers/gpu/drm/i915/i915_drv.c:349:25: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/i915_drv.c:349:25: warning: expression using sizeof(void)

Commit: drm/i915: Prioritise non-busywait semaphore workloads
Okay!

Commit: drm/i915: Show support for accurate sw PMU busyness tracking
Okay!

Commit: drm/i915: Apply rps waitboosting for dma_fence_wait_timeout()
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3556:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3551:16: warning: expression using sizeof(void)

Commit: drm/i915: Replace global_seqno with a hangcheck heartbeat seqno
Okay!

Commit: drm/i915/pmu: Always sample an active ringbuffer
Okay!

Commit: drm/i915: Remove access to global seqno in the HWSP
Okay!

Commit: drm/i915: Remove i915_request.global_seqno
Okay!

Commit: drm/i915: Force GPU idle on suspend
Okay!

Commit: drm/i915/selftests: Improve switch-to-kernel-context checking
Okay!

Commit: drm/i915: Do a synchronous switch-to-kernel-context on idling
Okay!

Commit: drm/i915: Store the BIT(engine->id) as the engine's mask
Okay!

Commit: drm/i915: Refactor common code to load initial power context
Okay!

Commit: drm/i915: Reduce presumption of request ordering for barriers
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3551:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3552:16: warning: expression using sizeof(void)

Commit: drm/i915: Remove has-kernel-context
Okay!

Commit: drm/i915: Introduce the i915_user_extension_method
Okay!

Commit: drm/i915: Track active engines within a context
Okay!

Commit: drm/i915: Introduce a context barrier callback
Okay!

Commit: drm/i915: Create/destroy VM (ppGTT) for use with contexts
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3552:16: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/selftests/i915_gem_context.c:1134:25: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/selftests/i915_gem_context.c:1134:25: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3555:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_gem_context.c:1264:25: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_gem_context.c:1264:25: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/selftests/i915_gem_context.c:564:25: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/selftests/i915_gem_context.c:564:25: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_gem_context.c:568:33: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_gem_context.c:568:33: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_gem_context.c:689:33: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_gem_context.c:689:33: warning: expression using sizeof(void)

Commit: drm/i915: Extend CONTEXT_CREATE to set parameters upon construction
Okay!

Commit: drm/i915: Allow contexts to share a single timeline across all engines
Okay!

Commit: drm/i915: Fix I915_EXEC_RING_MASK
Okay!

Commit: drm/i915: Remove last traces of exec-id (GEM_BUSY)
Okay!

Commit: drm/i915: Re-arrange execbuf so context is known before engine
Okay!

Commit: drm/i915: Allow a context to define its set of engines
+./include/linux/slab.h:664:13: error: not a function <noident>

Commit: drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[]
Okay!

Commit: drm/i915: Pass around the intel_context
Okay!

Commit: drm/i915: Split struct intel_context definition to its own header
Okay!

Commit: drm/i915: Move over to intel_context_lookup()
+./include/uapi/linux/perf_event.h:147:56: warning: cast truncates bits from constant value (8000000000000000 becomes 0)

Commit: drm/i915: Load balancing across a virtual engine
+./include/linux/overflow.h:285:13: error: incorrect type in conditional
+./include/linux/overflow.h:285:13: error: undefined identifier '__builtin_mul_overflow'
+./include/linux/overflow.h:285:13:    got void
+./include/linux/overflow.h:285:13: warning: call with no type!
+./include/linux/overflow.h:287:13: error: incorrect type in conditional
+./include/linux/overflow.h:287:13: error: undefined identifier '__builtin_add_overflow'
+./include/linux/overflow.h:287:13:    got void
+./include/linux/overflow.h:287:13: warning: call with no type!

Commit: drm/i915: Extend execution fence to support a callback
Okay!

Commit: drm/i915/execlists: Virtual engine bonding
Okay!

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* [PATCH] drm/i915: Move over to intel_context_lookup()
  2019-02-06 13:03 ` [PATCH 42/46] drm/i915: Move over to intel_context_lookup() Chris Wilson
@ 2019-02-06 14:27   ` Chris Wilson
  0 siblings, 0 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 14:27 UTC (permalink / raw)
  To: intel-gfx

In preparation for an ever-growing number of engines, and so an
ever-increasing static array of HW contexts within the GEM context,
move the array over to an rbtree, allocated upon first use.

Unfortunately, this imposes an rbtree lookup at a few frequent
callsites, but we should be able to mitigate those by moving over to
using the HW context as our primary type, so that the lookup is only
incurred on the boundary between the user GEM context and the engines.
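
In rough sketch form, the two caller patterns this introduces (as seen
in the hunks below, with error handling as per each call site) are:

	/* Read-only paths skip engines this context has never used */
	ce = intel_context_lookup(ctx, engine);
	if (!ce || !ce->state)
		continue;

	/* Pinning paths allocate the HW context upon first use */
	ce = intel_context_instance(ctx, engine);
	if (IS_ERR(ce))
		return PTR_ERR(ce);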

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
Revert to using engine->i915->preempt_context to avoid using
engine->execlists.preempt_context before it is set.
-Chris
---
 drivers/gpu/drm/i915/Makefile                 |   1 +
 drivers/gpu/drm/i915/gvt/mmio_context.c       |   3 +-
 drivers/gpu/drm/i915/i915_debugfs.c           |  13 +-
 drivers/gpu/drm/i915/i915_gem.c               |   9 +-
 drivers/gpu/drm/i915/i915_gem_context.c       |  57 ++------
 drivers/gpu/drm/i915/i915_gem_context_types.h |   8 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c    |   4 +-
 drivers/gpu/drm/i915/i915_perf.c              |   4 +-
 drivers/gpu/drm/i915/intel_context.c          | 137 ++++++++++++++++++
 drivers/gpu/drm/i915/intel_context.h          |  37 ++++-
 drivers/gpu/drm/i915/intel_context_types.h    |   2 +
 drivers/gpu/drm/i915/intel_engine_cs.c        |   2 +-
 drivers/gpu/drm/i915/intel_engine_types.h     |   5 +
 drivers/gpu/drm/i915/intel_guc_ads.c          |   4 +-
 drivers/gpu/drm/i915/intel_guc_submission.c   |   4 +-
 drivers/gpu/drm/i915/intel_lrc.c              |  31 ++--
 drivers/gpu/drm/i915/intel_ringbuffer.c       |  23 ++-
 drivers/gpu/drm/i915/selftests/mock_context.c |   7 +-
 drivers/gpu/drm/i915/selftests/mock_engine.c  |   6 +-
 19 files changed, 266 insertions(+), 91 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/intel_context.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 89105b1aaf12..d7292b349c0d 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -86,6 +86,7 @@ i915-y += \
 	  i915_trace_points.o \
 	  i915_vma.o \
 	  intel_breadcrumbs.o \
+	  intel_context.o \
 	  intel_engine_cs.o \
 	  intel_hangcheck.o \
 	  intel_lrc.o \
diff --git a/drivers/gpu/drm/i915/gvt/mmio_context.c b/drivers/gpu/drm/i915/gvt/mmio_context.c
index 7d84cfb9051a..442a74805129 100644
--- a/drivers/gpu/drm/i915/gvt/mmio_context.c
+++ b/drivers/gpu/drm/i915/gvt/mmio_context.c
@@ -492,7 +492,8 @@ static void switch_mmio(struct intel_vgpu *pre,
 			 * itself.
 			 */
 			if (mmio->in_context &&
-			    !is_inhibit_context(&s->shadow_ctx->__engine[ring_id]))
+			    !is_inhibit_context(intel_context_lookup(s->shadow_ctx,
+								     dev_priv->engine[ring_id])))
 				continue;
 
 			if (mmio->mask)
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 846bd0de3cfa..9ed7ffef54ad 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -392,7 +392,11 @@ static void print_context_stats(struct seq_file *m,
 		enum intel_engine_id id;
 
 		for_each_engine(engine, i915, id) {
-			struct intel_context *ce = to_intel_context(ctx, engine);
+			struct intel_context *ce;
+
+			ce = intel_context_lookup(ctx, engine);
+			if (!ce)
+				continue;
 
 			if (ce->state)
 				per_file_stats(0, ce->state->obj, &kstats);
@@ -1913,8 +1917,11 @@ static int i915_context_status(struct seq_file *m, void *unused)
 		seq_putc(m, '\n');
 
 		for_each_engine(engine, dev_priv, id) {
-			struct intel_context *ce =
-				to_intel_context(ctx, engine);
+			struct intel_context *ce;
+
+			ce = intel_context_lookup(ctx, engine);
+			if (!ce)
+				continue;
 
 			seq_printf(m, "%s: ", engine->name);
 			if (ce->state)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d50588d54d0b..68726d81efef 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4770,15 +4770,20 @@ static int __intel_engines_record_defaults(struct drm_i915_private *i915)
 	}
 
 	for_each_engine(engine, i915, id) {
+		struct intel_context *ce;
 		struct i915_vma *state;
 		void *vaddr;
 
-		GEM_BUG_ON(to_intel_context(ctx, engine)->pin_count);
+		ce = intel_context_lookup(ctx, engine);
+		if (!ce)
+			continue;
 
-		state = to_intel_context(ctx, engine)->state;
+		state = ce->state;
 		if (!state)
 			continue;
 
+		GEM_BUG_ON(ce->pin_count);
+
 		/*
 		 * As we will hold a reference to the logical state, it will
 		 * not be torn down with the context, and importantly the
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 20580463175e..e75fad339ab8 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -224,7 +224,7 @@ static void release_hw_id(struct i915_gem_context *ctx)
 
 static void i915_gem_context_free(struct i915_gem_context *ctx)
 {
-	unsigned int n;
+	struct intel_context *it, *n;
 
 	lockdep_assert_held(&ctx->i915->drm.struct_mutex);
 	GEM_BUG_ON(!i915_gem_context_is_closed(ctx));
@@ -235,11 +235,10 @@ static void i915_gem_context_free(struct i915_gem_context *ctx)
 
 	kfree(ctx->engines);
 
-	for (n = 0; n < ARRAY_SIZE(ctx->__engine); n++) {
-		struct intel_context *ce = &ctx->__engine[n];
-
-		if (ce->ops)
-			ce->ops->destroy(ce);
+	rbtree_postorder_for_each_entry_safe(it, n, &ctx->hw_contexts, node) {
+		if (it->ops && it->ops->destroy)
+			it->ops->destroy(it);
+		kfree(it);
 	}
 
 	if (ctx->timeline)
@@ -346,39 +345,11 @@ static u32 default_desc_template(const struct drm_i915_private *i915,
 	return desc;
 }
 
-static void intel_context_retire(struct i915_active_request *active,
-				 struct i915_request *rq)
-{
-	struct intel_context *ce =
-		container_of(active, typeof(*ce), active_tracker);
-
-	intel_context_unpin(ce);
-}
-
-void
-intel_context_init(struct intel_context *ce,
-		   struct i915_gem_context *ctx,
-		   struct intel_engine_cs *engine)
-{
-	ce->gem_context = ctx;
-	ce->engine = engine;
-
-	INIT_LIST_HEAD(&ce->signal_link);
-	INIT_LIST_HEAD(&ce->signals);
-
-	/* Use the whole device by default */
-	ce->sseu = intel_device_default_sseu(ctx->i915);
-
-	i915_active_request_init(&ce->active_tracker,
-				 NULL, intel_context_retire);
-}
-
 static struct i915_gem_context *
 __create_hw_context(struct drm_i915_private *dev_priv,
 		    struct drm_i915_file_private *file_priv)
 {
 	struct i915_gem_context *ctx;
-	unsigned int n;
 	int ret;
 
 	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
@@ -391,8 +362,8 @@ __create_hw_context(struct drm_i915_private *dev_priv,
 	ctx->sched.priority = I915_USER_PRIORITY(I915_PRIORITY_NORMAL);
 	INIT_LIST_HEAD(&ctx->active_engines);
 
-	for (n = 0; n < ARRAY_SIZE(ctx->__engine); n++)
-		intel_context_init(&ctx->__engine[n], ctx, dev_priv->engine[n]);
+	ctx->hw_contexts = RB_ROOT;
+	spin_lock_init(&ctx->hw_contexts_lock);
 
 	INIT_RADIX_TREE(&ctx->handles_vma, GFP_KERNEL);
 	INIT_LIST_HEAD(&ctx->handles_list);
@@ -936,8 +907,6 @@ int i915_gem_switch_to_kernel_context(struct drm_i915_private *i915,
 		struct intel_ring *ring;
 		struct i915_request *rq;
 
-		GEM_BUG_ON(!to_intel_context(i915->kernel_context, engine));
-
 		rq = i915_request_alloc(engine, i915->kernel_context);
 		if (IS_ERR(rq))
 			return PTR_ERR(rq);
@@ -1172,9 +1141,13 @@ __i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
 				    struct intel_engine_cs *engine,
 				    struct intel_sseu sseu)
 {
-	struct intel_context *ce = to_intel_context(ctx, engine);
+	struct intel_context *ce;
 	int ret = 0;
 
+	ce = intel_context_instance(ctx, engine);
+	if (IS_ERR(ce))
+		return PTR_ERR(ce);
+
 	GEM_BUG_ON(INTEL_GEN(ctx->i915) < 8);
 	GEM_BUG_ON(engine->id != RCS);
 
@@ -1725,13 +1698,15 @@ static int get_sseu(struct i915_gem_context *ctx,
 	if (!engine)
 		return -EINVAL;
 
+	ce = intel_context_instance(ctx, engine);
+	if (IS_ERR(ce))
+		return PTR_ERR(ce);
+
 	/* Only use for mutex here is to serialize get_param and set_param. */
 	ret = mutex_lock_interruptible(&ctx->i915->drm.struct_mutex);
 	if (ret)
 		return ret;
 
-	ce = to_intel_context(ctx, engine);
-
 	user_sseu.slice_mask = ce->sseu.slice_mask;
 	user_sseu.subslice_mask = ce->sseu.subslice_mask;
 	user_sseu.min_eus_per_subslice = ce->sseu.min_eus_per_subslice;
diff --git a/drivers/gpu/drm/i915/i915_gem_context_types.h b/drivers/gpu/drm/i915/i915_gem_context_types.h
index b69309b46098..865bbcd72ad4 100644
--- a/drivers/gpu/drm/i915/i915_gem_context_types.h
+++ b/drivers/gpu/drm/i915/i915_gem_context_types.h
@@ -7,7 +7,8 @@
 #ifndef __I915_GEM_CONTEXT_TYPES_H__
 #define __I915_GEM_CONTEXT_TYPES_H__
 
-#include "i915_gem.h" /* I915_NUM_ENGINES */
+#include <linux/rbtree.h>
+
 #include "intel_context_types.h"
 
 struct pid;
@@ -134,8 +135,9 @@ struct i915_gem_context {
 
 	struct i915_sched_attr sched;
 
-	/** engine: per-engine logical HW state */
-	struct intel_context __engine[I915_NUM_ENGINES];
+	/** hw_contexts: per-engine logical HW state */
+	struct rb_root hw_contexts;
+	spinlock_t hw_contexts_lock;
 
 	/** ring_size: size for allocating the per-engine ring buffer */
 	u32 ring_size;
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 5052b49f8dcd..cb57178b8fe3 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -790,8 +790,8 @@ static int eb_wait_for_ring(const struct i915_execbuffer *eb)
 	 * keeping all of their resources pinned.
 	 */
 
-	ce = to_intel_context(eb->ctx, eb->engine);
-	if (!ce->ring) /* first use, assume empty! */
+	ce = intel_context_lookup(eb->ctx, eb->engine);
+	if (!ce || !ce->ring) /* first use, assume empty! */
 		return 0;
 
 	rq = __eb_wait_for_ring(ce->ring);
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index f969a0512465..ecca231ca83a 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1740,11 +1740,11 @@ static int gen8_configure_all_contexts(struct drm_i915_private *dev_priv,
 
 	/* Update all contexts now that we've stalled the submission. */
 	list_for_each_entry(ctx, &dev_priv->contexts.list, link) {
-		struct intel_context *ce = to_intel_context(ctx, engine);
+		struct intel_context *ce = intel_context_lookup(ctx, engine);
 		u32 *regs;
 
 		/* OA settings will be set upon first use */
-		if (!ce->state)
+		if (!ce || !ce->state)
 			continue;
 
 		regs = i915_gem_object_pin_map(ce->state->obj, map_type);
diff --git a/drivers/gpu/drm/i915/intel_context.c b/drivers/gpu/drm/i915/intel_context.c
new file mode 100644
index 000000000000..e3e8bc84fdf0
--- /dev/null
+++ b/drivers/gpu/drm/i915/intel_context.c
@@ -0,0 +1,137 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2019 Intel Corporation
+ */
+
+#include "i915_drv.h"
+#include "i915_gem_context.h"
+#include "intel_context.h"
+#include "intel_ringbuffer.h"
+
+struct intel_context *
+intel_context_lookup(struct i915_gem_context *ctx,
+		     struct intel_engine_cs *engine)
+{
+	struct intel_context *ce = NULL;
+	struct rb_node *p;
+
+	spin_lock(&ctx->hw_contexts_lock);
+	p = ctx->hw_contexts.rb_node;
+	while (p) {
+		struct intel_context *this =
+			rb_entry(p, struct intel_context, node);
+
+		if (this->engine == engine) {
+			ce = this;
+			break;
+		}
+
+		if (this->engine < engine)
+			p = p->rb_right;
+		else
+			p = p->rb_left;
+	}
+	spin_unlock(&ctx->hw_contexts_lock);
+
+	return ce;
+}
+
+struct intel_context *
+__intel_context_insert(struct i915_gem_context *ctx,
+		       struct intel_engine_cs *engine,
+		       struct intel_context *ce)
+{
+	struct rb_node **p, *parent;
+	int err = 0;
+
+	spin_lock(&ctx->hw_contexts_lock);
+
+	parent = NULL;
+	p = &ctx->hw_contexts.rb_node;
+	while (*p) {
+		struct intel_context *this;
+
+		parent = *p;
+		this = rb_entry(parent, struct intel_context, node);
+
+		if (this->engine == engine) {
+			err = -EEXIST;
+			ce = this;
+			break;
+		}
+
+		if (this->engine < engine)
+			p = &parent->rb_right;
+		else
+			p = &parent->rb_left;
+	}
+	if (!err) {
+		rb_link_node(&ce->node, parent, p);
+		rb_insert_color(&ce->node, &ctx->hw_contexts);
+	}
+
+	spin_unlock(&ctx->hw_contexts_lock);
+
+	return ce;
+}
+
+void __intel_context_remove(struct intel_context *ce)
+{
+	struct i915_gem_context *ctx = ce->gem_context;
+
+	spin_lock(&ctx->hw_contexts_lock);
+	rb_erase(&ce->node, &ctx->hw_contexts);
+	spin_unlock(&ctx->hw_contexts_lock);
+}
+
+struct intel_context *
+intel_context_instance(struct i915_gem_context *ctx,
+		       struct intel_engine_cs *engine)
+{
+	struct intel_context *ce, *pos;
+
+	ce = intel_context_lookup(ctx, engine);
+	if (likely(ce))
+		return ce;
+
+	ce = kzalloc(sizeof(*ce), GFP_KERNEL);
+	if (!ce)
+		return ERR_PTR(-ENOMEM);
+
+	intel_context_init(ce, ctx, engine);
+
+	pos = __intel_context_insert(ctx, engine, ce);
+	if (unlikely(pos != ce)) /* Beaten! Use their HW context instead */
+		kfree(ce);
+
+	GEM_BUG_ON(intel_context_lookup(ctx, engine) != pos);
+	return pos;
+}
+
+static void intel_context_retire(struct i915_active_request *active,
+				 struct i915_request *rq)
+{
+	struct intel_context *ce =
+		container_of(active, typeof(*ce), active_tracker);
+
+	intel_context_unpin(ce);
+}
+
+void
+intel_context_init(struct intel_context *ce,
+		   struct i915_gem_context *ctx,
+		   struct intel_engine_cs *engine)
+{
+	ce->gem_context = ctx;
+	ce->engine = engine;
+
+	INIT_LIST_HEAD(&ce->signal_link);
+	INIT_LIST_HEAD(&ce->signals);
+
+	/* Use the whole device by default */
+	ce->sseu = intel_device_default_sseu(ctx->i915);
+
+	i915_active_request_init(&ce->active_tracker,
+				 NULL, intel_context_retire);
+}
diff --git a/drivers/gpu/drm/i915/intel_context.h b/drivers/gpu/drm/i915/intel_context.h
index dd947692bb0b..c3fffd9b8ae4 100644
--- a/drivers/gpu/drm/i915/intel_context.h
+++ b/drivers/gpu/drm/i915/intel_context.h
@@ -7,7 +7,6 @@
 #ifndef __INTEL_CONTEXT_H__
 #define __INTEL_CONTEXT_H__
 
-#include "i915_gem_context_types.h"
 #include "intel_context_types.h"
 #include "intel_engine_types.h"
 
@@ -15,12 +14,36 @@ void intel_context_init(struct intel_context *ce,
 			struct i915_gem_context *ctx,
 			struct intel_engine_cs *engine);
 
-static inline struct intel_context *
-to_intel_context(struct i915_gem_context *ctx,
-		 const struct intel_engine_cs *engine)
-{
-	return &ctx->__engine[engine->id];
-}
+/**
+ * intel_context_lookup - Find the matching HW context for this (ctx, engine)
+ * @ctx - the parent GEM context
+ * @engine - the target HW engine
+ *
+ * May return NULL if the HW context hasn't been instantiated (i.e. unused).
+ */
+struct intel_context *
+intel_context_lookup(struct i915_gem_context *ctx,
+		     struct intel_engine_cs *engine);
+
+/**
+ * intel_context_instance - Lookup or allocate the HW context for (ctx, engine)
+ * @ctx - the parent GEM context
+ * @engine - the target HW engine
+ *
+ * Returns the existing HW context for this pair of (GEM context, engine), or
+ * allocates and initialises a fresh context. Once allocated, the HW context
+ * remains resident until the GEM context is destroyed.
+ */
+struct intel_context *
+intel_context_instance(struct i915_gem_context *ctx,
+		       struct intel_engine_cs *engine);
+
+struct intel_context *
+__intel_context_insert(struct i915_gem_context *ctx,
+		       struct intel_engine_cs *engine,
+		       struct intel_context *ce);
+void
+__intel_context_remove(struct intel_context *ce);
 
 static inline struct intel_context *
 intel_context_pin(struct i915_gem_context *ctx, struct intel_engine_cs *engine)
diff --git a/drivers/gpu/drm/i915/intel_context_types.h b/drivers/gpu/drm/i915/intel_context_types.h
index 16e1306e9595..857f5c335324 100644
--- a/drivers/gpu/drm/i915/intel_context_types.h
+++ b/drivers/gpu/drm/i915/intel_context_types.h
@@ -8,6 +8,7 @@
 #define __INTEL_CONTEXT_TYPES__
 
 #include <linux/list.h>
+#include <linux/rbtree.h>
 #include <linux/types.h>
 
 #include "i915_active_types.h"
@@ -52,6 +53,7 @@ struct intel_context {
 	struct i915_active_request active_tracker;
 
 	const struct intel_context_ops *ops;
+	struct rb_node node;
 
 	/** sseu: Control eu/slice partitioning */
 	struct intel_sseu sseu;
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 0372aaa9756c..a5b2d50208ef 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -644,7 +644,7 @@ void intel_engines_set_scheduler_caps(struct drm_i915_private *i915)
 static void __intel_context_unpin(struct i915_gem_context *ctx,
 				  struct intel_engine_cs *engine)
 {
-	intel_context_unpin(to_intel_context(ctx, engine));
+	intel_context_unpin(intel_context_lookup(ctx, engine));
 }
 
 struct measure_breadcrumb {
diff --git a/drivers/gpu/drm/i915/intel_engine_types.h b/drivers/gpu/drm/i915/intel_engine_types.h
index 88a13435e474..8f0aedb2c2d8 100644
--- a/drivers/gpu/drm/i915/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/intel_engine_types.h
@@ -230,6 +230,11 @@ struct intel_engine_execlists {
 	 */
 	u32 *csb_status;
 
+	/**
+	 * @preempt_context: the HW context for injecting preempt-to-idle
+	 */
+	struct intel_context *preempt_context;
+
 	/**
 	 * @preempt_complete_status: expected CSB upon completing preemption
 	 */
diff --git a/drivers/gpu/drm/i915/intel_guc_ads.c b/drivers/gpu/drm/i915/intel_guc_ads.c
index f0db62887f50..da220561ac41 100644
--- a/drivers/gpu/drm/i915/intel_guc_ads.c
+++ b/drivers/gpu/drm/i915/intel_guc_ads.c
@@ -121,8 +121,8 @@ int intel_guc_ads_create(struct intel_guc *guc)
 	 * to find it. Note that we have to skip our header (1 page),
 	 * because our GuC shared data is there.
 	 */
-	kernel_ctx_vma = to_intel_context(dev_priv->kernel_context,
-					  dev_priv->engine[RCS])->state;
+	kernel_ctx_vma = intel_context_lookup(dev_priv->kernel_context,
+					      dev_priv->engine[RCS])->state;
 	blob->ads.golden_context_lrca =
 		intel_guc_ggtt_offset(guc, kernel_ctx_vma) + skipped_offset;
 
diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index 4366db7978a8..fea07e51f109 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -382,7 +382,7 @@ static void guc_stage_desc_init(struct intel_guc_client *client)
 	desc->db_id = client->doorbell_id;
 
 	for_each_engine_masked(engine, dev_priv, client->engines, tmp) {
-		struct intel_context *ce = to_intel_context(ctx, engine);
+		struct intel_context *ce = intel_context_lookup(ctx, engine);
 		u32 guc_engine_id = engine->guc_id;
 		struct guc_execlist_context *lrc = &desc->lrc[guc_engine_id];
 
@@ -567,7 +567,7 @@ static void inject_preempt_context(struct work_struct *work)
 					     preempt_work[engine->id]);
 	struct intel_guc_client *client = guc->preempt_client;
 	struct guc_stage_desc *stage_desc = __get_stage_desc(client);
-	struct intel_context *ce = to_intel_context(client->owner, engine);
+	struct intel_context *ce = intel_context_lookup(client->owner, engine);
 	u32 data[7];
 
 	if (!ce->ring->emit) { /* recreate upon load/resume */
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index caec509543b5..effc71af2236 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -622,8 +622,7 @@ static void port_assign(struct execlist_port *port, struct i915_request *rq)
 static void inject_preempt_context(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists *execlists = &engine->execlists;
-	struct intel_context *ce =
-		to_intel_context(engine->i915->preempt_context, engine);
+	struct intel_context *ce = execlists->preempt_context;
 	unsigned int n;
 
 	GEM_BUG_ON(execlists->preempt_complete_status !=
@@ -1231,19 +1230,22 @@ static void execlists_submit_request(struct i915_request *request)
 	spin_unlock_irqrestore(&engine->timeline.lock, flags);
 }
 
-static void execlists_context_destroy(struct intel_context *ce)
+static void __execlists_context_fini(struct intel_context *ce)
 {
-	GEM_BUG_ON(ce->pin_count);
-
-	if (!ce->state)
-		return;
-
 	intel_ring_free(ce->ring);
 
 	GEM_BUG_ON(i915_gem_object_is_active(ce->state->obj));
 	i915_gem_object_put(ce->state->obj);
 }
 
+static void execlists_context_destroy(struct intel_context *ce)
+{
+	GEM_BUG_ON(ce->pin_count);
+
+	if (ce->state)
+		__execlists_context_fini(ce);
+}
+
 static void execlists_context_unpin(struct intel_context *ce)
 {
 	struct intel_engine_cs *engine;
@@ -1386,7 +1388,11 @@ static struct intel_context *
 execlists_context_pin(struct intel_engine_cs *engine,
 		      struct i915_gem_context *ctx)
 {
-	struct intel_context *ce = to_intel_context(ctx, engine);
+	struct intel_context *ce;
+
+	ce = intel_context_instance(ctx, engine);
+	if (IS_ERR(ce))
+		return ce;
 
 	lockdep_assert_held(&ctx->i915->drm.struct_mutex);
 	GEM_BUG_ON(!ctx->ppgtt);
@@ -2427,8 +2433,9 @@ static int logical_ring_init(struct intel_engine_cs *engine)
 	execlists->preempt_complete_status = ~0u;
 	if (i915->preempt_context) {
 		struct intel_context *ce =
-			to_intel_context(i915->preempt_context, engine);
+			intel_context_lookup(i915->preempt_context, engine);
 
+		execlists->preempt_context = ce;
 		execlists->preempt_complete_status =
 			upper_32_bits(ce->lrc_desc);
 	}
@@ -2899,9 +2906,9 @@ void intel_lr_context_resume(struct drm_i915_private *i915)
 	list_for_each_entry(ctx, &i915->contexts.list, link) {
 		for_each_engine(engine, i915, id) {
 			struct intel_context *ce =
-				to_intel_context(ctx, engine);
+				intel_context_lookup(ctx, engine);
 
-			if (!ce->state)
+			if (!ce || !ce->state)
 				continue;
 
 			intel_ring_reset(ce->ring, 0);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 4557f715663d..9a5b420273a8 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1348,15 +1348,18 @@ intel_ring_free(struct intel_ring *ring)
 	kfree(ring);
 }
 
+static void __intel_ring_context_fini(struct intel_context *ce)
+{
+	GEM_BUG_ON(i915_gem_object_is_active(ce->state->obj));
+	i915_gem_object_put(ce->state->obj);
+}
+
 static void intel_ring_context_destroy(struct intel_context *ce)
 {
 	GEM_BUG_ON(ce->pin_count);
 
-	if (!ce->state)
-		return;
-
-	GEM_BUG_ON(i915_gem_object_is_active(ce->state->obj));
-	i915_gem_object_put(ce->state->obj);
+	if (ce->state)
+		__intel_ring_context_fini(ce);
 }
 
 static int __context_pin_ppgtt(struct i915_gem_context *ctx)
@@ -1555,7 +1558,11 @@ static struct intel_context *
 intel_ring_context_pin(struct intel_engine_cs *engine,
 		       struct i915_gem_context *ctx)
 {
-	struct intel_context *ce = to_intel_context(ctx, engine);
+	struct intel_context *ce;
+
+	ce = intel_context_instance(ctx, engine);
+	if (IS_ERR(ce))
+		return ce;
 
 	lockdep_assert_held(&ctx->i915->drm.struct_mutex);
 
@@ -1754,8 +1761,8 @@ static inline int mi_set_context(struct i915_request *rq, u32 flags)
 		 * placeholder we use to flush other contexts.
 		 */
 		*cs++ = MI_SET_CONTEXT;
-		*cs++ = i915_ggtt_offset(to_intel_context(i915->kernel_context,
-							  engine)->state) |
+		*cs++ = i915_ggtt_offset(intel_context_lookup(i915->kernel_context,
+							      engine)->state) |
 			MI_MM_SPACE_GTT |
 			MI_RESTORE_INHIBIT;
 	}
diff --git a/drivers/gpu/drm/i915/selftests/mock_context.c b/drivers/gpu/drm/i915/selftests/mock_context.c
index 8137ff6f01b2..58d805757052 100644
--- a/drivers/gpu/drm/i915/selftests/mock_context.c
+++ b/drivers/gpu/drm/i915/selftests/mock_context.c
@@ -30,7 +30,6 @@ mock_context(struct drm_i915_private *i915,
 	     const char *name)
 {
 	struct i915_gem_context *ctx;
-	unsigned int n;
 	int ret;
 
 	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
@@ -41,14 +40,14 @@ mock_context(struct drm_i915_private *i915,
 	INIT_LIST_HEAD(&ctx->link);
 	ctx->i915 = i915;
 
+	ctx->hw_contexts = RB_ROOT;
+	spin_lock_init(&ctx->hw_contexts_lock);
+
 	INIT_RADIX_TREE(&ctx->handles_vma, GFP_KERNEL);
 	INIT_LIST_HEAD(&ctx->handles_list);
 	INIT_LIST_HEAD(&ctx->hw_id_link);
 	INIT_LIST_HEAD(&ctx->active_engines);
 
-	for (n = 0; n < ARRAY_SIZE(ctx->__engine); n++)
-		intel_context_init(&ctx->__engine[n], ctx, i915->engine[n]);
-
 	ret = i915_gem_context_pin_hw_id(ctx);
 	if (ret < 0)
 		goto err_handles;
diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
index b8c6769571c4..8f72d26c58fe 100644
--- a/drivers/gpu/drm/i915/selftests/mock_engine.c
+++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
@@ -146,9 +146,13 @@ static struct intel_context *
 mock_context_pin(struct intel_engine_cs *engine,
 		 struct i915_gem_context *ctx)
 {
-	struct intel_context *ce = to_intel_context(ctx, engine);
+	struct intel_context *ce;
 	int err = -ENOMEM;
 
+	ce = intel_context_instance(ctx, engine);
+	if (IS_ERR(ce))
+		return ce;
+
 	if (ce->pin_count++)
 		return ce;
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 97+ messages in thread

* ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/46] drm/i915: Hack and slash, throttle execbuffer hogs (rev2)
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (48 preceding siblings ...)
  2019-02-06 14:11 ` ✗ Fi.CI.SPARSE: warning " Patchwork
@ 2019-02-06 14:37 ` Patchwork
  2019-02-06 14:55 ` ✗ Fi.CI.SPARSE: " Patchwork
                   ` (2 subsequent siblings)
  52 siblings, 0 replies; 97+ messages in thread
From: Patchwork @ 2019-02-06 14:37 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/46] drm/i915: Hack and slash, throttle execbuffer hogs (rev2)
URL   : https://patchwork.freedesktop.org/series/56281/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
d557d3a829de drm/i915: Hack and slash, throttle execbuffer hogs
b4820cad7002 drm/i915: Revoke mmaps and prevent access to fence registers across reset
a5676a243ca9 drm/i915: Force the GPU reset upon wedging
0ebb8e590ea9 drm/i915: Uninterruptibly drain the timelines on unwedging
d12c0a13c719 drm/i915: Wait for old resets before applying debugfs/i915_wedged
52ba77ed3d1d drm/i915: Serialise resets with wedging
b3e799f976f3 drm/i915: Don't claim an unstarted request was guilty
eb622f2f3ea8 drm/i915/execlists: Suppress mere WAIT preemption
72bd8142da29 drm/i915/execlists: Suppress redundant preemption
21b926286743 drm/i915: Make request allocation caches global
-:162: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#162: 
new file mode 100644

-:167: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#167: FILE: drivers/gpu/drm/i915/i915_globals.c:1:
+/*

-:278: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#278: FILE: drivers/gpu/drm/i915/i915_globals.h:1:
+/*

-:619: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'plist' - possible side-effects?
#619: FILE: drivers/gpu/drm/i915/i915_scheduler.h:95:
+#define priolist_for_each_request(it, plist, idx) \
+	for (idx = 0; idx < ARRAY_SIZE((plist)->requests); idx++) \
+		list_for_each_entry(it, &(plist)->requests[idx], sched.link)

-:619: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'idx' - possible side-effects?
#619: FILE: drivers/gpu/drm/i915/i915_scheduler.h:95:
+#define priolist_for_each_request(it, plist, idx) \
+	for (idx = 0; idx < ARRAY_SIZE((plist)->requests); idx++) \
+		list_for_each_entry(it, &(plist)->requests[idx], sched.link)

-:623: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'plist' - possible side-effects?
#623: FILE: drivers/gpu/drm/i915/i915_scheduler.h:99:
+#define priolist_for_each_request_consume(it, n, plist, idx) \
+	for (; (idx = ffs((plist)->used)); (plist)->used &= ~BIT(idx - 1)) \
+		list_for_each_entry_safe(it, n, \
+					 &(plist)->requests[idx - 1], \
+					 sched.link)

-:623: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'idx' - possible side-effects?
#623: FILE: drivers/gpu/drm/i915/i915_scheduler.h:99:
+#define priolist_for_each_request_consume(it, n, plist, idx) \
+	for (; (idx = ffs((plist)->used)); (plist)->used &= ~BIT(idx - 1)) \
+		list_for_each_entry_safe(it, n, \
+					 &(plist)->requests[idx - 1], \
+					 sched.link)

total: 0 errors, 3 warnings, 4 checks, 803 lines checked
080b49acfcea drm/i915: Keep timeline HWSP allocated until idle across the system
919f7b05e698 drm/i915/execlists: Refactor out can_merge_rq()
26f6a47bdaba drm/i915: Compute the global scheduler caps
217daeef0475 drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+
-:333: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#333: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:109:
+#define   MI_SEMAPHORE_SAD_GT_SDD	(0<<12)
                                  	  ^

-:335: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#335: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:111:
+#define   MI_SEMAPHORE_SAD_LT_SDD	(2<<12)
                                  	  ^

-:336: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#336: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:112:
+#define   MI_SEMAPHORE_SAD_LTE_SDD	(3<<12)
                                   	  ^

-:337: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#337: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:113:
+#define   MI_SEMAPHORE_SAD_EQ_SDD	(4<<12)
                                  	  ^

-:338: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#338: FILE: drivers/gpu/drm/i915/intel_gpu_commands.h:114:
+#define   MI_SEMAPHORE_SAD_NEQ_SDD	(5<<12)
                                   	  ^

total: 0 errors, 0 warnings, 5 checks, 298 lines checked
13a17cc9d0e8 drm/i915: Prioritise non-busywait semaphore workloads
1efc4bfb4abc drm/i915: Show support for accurate sw PMU busyness tracking
9d9bf5e37162 drm/i915: Apply rps waitboosting for dma_fence_wait_timeout()
ca1dd369a8fa drm/i915: Replace global_seqno with a hangcheck heartbeat seqno
ef1b7ea6c12a drm/i915/pmu: Always sample an active ringbuffer
77aea933c6b5 drm/i915: Remove access to global seqno in the HWSP
8f175c3d19cb drm/i915: Remove i915_request.global_seqno
9a0deb68148b drm/i915: Force GPU idle on suspend
ed75f37144ba drm/i915/selftests: Improve switch-to-kernel-context checking
905940663467 drm/i915: Do a synchronous switch-to-kernel-context on idling
2aae0d45d5e1 drm/i915: Store the BIT(engine->id) as the engine's mask
366712549703 drm/i915: Refactor common code to load initial power context
9472b88e9510 drm/i915: Reduce presumption of request ordering for barriers
-:8: WARNING:COMMIT_MESSAGE: Missing commit description - Add an appropriate one

-:265: ERROR:MISSING_SIGN_OFF: Missing Signed-off-by: line(s)

total: 1 errors, 1 warnings, 0 checks, 203 lines checked
f0e040b2b916 drm/i915: Remove has-kernel-context
438122eecc4a drm/i915: Introduce the i915_user_extension_method
-:58: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#58: 
new file mode 100644

-:63: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#63: FILE: drivers/gpu/drm/i915/i915_user_extensions.c:1:
+/*

-:112: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#112: FILE: drivers/gpu/drm/i915/i915_user_extensions.h:1:
+/*

-:140: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'ptr' - possible side-effects?
#140: FILE: drivers/gpu/drm/i915/i915_utils.h:108:
+#define container_of_user(ptr, type, member) ({				\
+	void __user *__mptr = (void __user *)(ptr);			\
+	BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) &&	\
+			 !__same_type(*(ptr), void),			\
+			 "pointer type mismatch in container_of()");	\
+	((type __user *)(__mptr - offsetof(type, member))); })

-:140: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'member' - possible side-effects?
#140: FILE: drivers/gpu/drm/i915/i915_utils.h:108:
+#define container_of_user(ptr, type, member) ({				\
+	void __user *__mptr = (void __user *)(ptr);			\
+	BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) &&	\
+			 !__same_type(*(ptr), void),			\
+			 "pointer type mismatch in container_of()");	\
+	((type __user *)(__mptr - offsetof(type, member))); })

-:140: CHECK:MACRO_ARG_PRECEDENCE: Macro argument 'member' may be better as '(member)' to avoid precedence issues
#140: FILE: drivers/gpu/drm/i915/i915_utils.h:108:
+#define container_of_user(ptr, type, member) ({				\
+	void __user *__mptr = (void __user *)(ptr);			\
+	BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) &&	\
+			 !__same_type(*(ptr), void),			\
+			 "pointer type mismatch in container_of()");	\
+	((type __user *)(__mptr - offsetof(type, member))); })

total: 0 errors, 3 warnings, 3 checks, 109 lines checked
28c66fc3a304 drm/i915: Track active engines within a context
3bc89baea5ad drm/i915: Introduce a context barrier callback
1e28b2eb9aed drm/i915: Create/destroy VM (ppGTT) for use with contexts
-:37: CHECK:UNCOMMENTED_DEFINITION: struct mutex definition without comment
#37: FILE: drivers/gpu/drm/i915/i915_drv.h:220:
+	struct mutex vm_lock;

-:551: WARNING:LINE_SPACING: Missing a blank line after declarations
#551: FILE: drivers/gpu/drm/i915/selftests/i915_gem_context.c:503:
+		struct drm_file *file;
+		IGT_TIMEOUT(end_time);

-:627: CHECK:MULTIPLE_ASSIGNMENTS: multiple assignments should be avoided
#627: FILE: drivers/gpu/drm/i915/selftests/i915_gem_context.c:565:
+		ncontexts = dw = 0;

-:678: WARNING:LINE_SPACING: Missing a blank line after declarations
#678: FILE: drivers/gpu/drm/i915/selftests/i915_gem_context.c:610:
+		struct drm_file *file;
+		IGT_TIMEOUT(end_time);

-:758: CHECK:MULTIPLE_ASSIGNMENTS: multiple assignments should be avoided
#758: FILE: drivers/gpu/drm/i915/selftests/i915_gem_context.c:686:
+		ncontexts = dw = 0;

-:876: WARNING:LONG_LINE: line over 100 characters
#876: FILE: include/uapi/drm/i915_drm.h:402:
+#define DRM_IOCTL_I915_GEM_VM_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_CREATE, struct drm_i915_gem_vm_control)

-:877: WARNING:LONG_LINE: line over 100 characters
#877: FILE: include/uapi/drm/i915_drm.h:403:
+#define DRM_IOCTL_I915_GEM_VM_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_VM_DESTROY, struct drm_i915_gem_vm_control)

-:877: WARNING:SPACING: space prohibited between function name and open parenthesis '('
#877: FILE: include/uapi/drm/i915_drm.h:403:
+#define DRM_IOCTL_I915_GEM_VM_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_VM_DESTROY, struct drm_i915_gem_vm_control)

-:877: ERROR:COMPLEX_MACRO: Macros with complex values should be enclosed in parentheses
#877: FILE: include/uapi/drm/i915_drm.h:403:
+#define DRM_IOCTL_I915_GEM_VM_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_VM_DESTROY, struct drm_i915_gem_vm_control)

total: 1 errors, 5 warnings, 3 checks, 832 lines checked
b464aad73304 drm/i915: Extend CONTEXT_CREATE to set parameters upon construction
-:26: WARNING:LONG_LINE: line over 100 characters
#26: FILE: drivers/gpu/drm/i915/i915_drv.c:2997:
+	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_CREATE_EXT, i915_gem_context_create_ioctl, DRM_RENDER_ALLOW),

-:522: WARNING:LONG_LINE: line over 100 characters
#522: FILE: include/uapi/drm/i915_drm.h:392:
+#define DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE, struct drm_i915_gem_context_create_ext)

-:522: WARNING:SPACING: space prohibited between function name and open parenthesis '('
#522: FILE: include/uapi/drm/i915_drm.h:392:
+#define DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE, struct drm_i915_gem_context_create_ext)

-:522: ERROR:COMPLEX_MACRO: Macros with complex values should be enclosed in parentheses
#522: FILE: include/uapi/drm/i915_drm.h:392:
+#define DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE, struct drm_i915_gem_context_create_ext)

total: 1 errors, 3 warnings, 0 checks, 666 lines checked
7e0736603235 drm/i915: Allow contexts to share a single timeline across all engines
4a8db894f67f drm/i915: Fix I915_EXEC_RING_MASK
3b615f9cb0b6 drm/i915: Remove last traces of exec-id (GEM_BUSY)
cb4fc91f67fa drm/i915: Re-arrange execbuf so context is known before engine
3adceade6ab7 drm/i915: Allow a context to define its set of engines
5a1b17ecbcbd drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[]
8ac639a97b91 drm/i915: Pass around the intel_context
d56d1a629f1f drm/i915: Split struct intel_context definition to its own header
-:291: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#291: 
new file mode 100644

-:296: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#296: FILE: drivers/gpu/drm/i915/i915_gem_context_types.h:1:
+/*

-:557: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#557: FILE: drivers/gpu/drm/i915/i915_timeline_types.h:1:
+/*

-:581: CHECK:UNCOMMENTED_DEFINITION: spinlock_t definition without comment
#581: FILE: drivers/gpu/drm/i915/i915_timeline_types.h:25:
+	spinlock_t lock;

-:642: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#642: FILE: drivers/gpu/drm/i915/intel_context.h:1:
+/*

-:695: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#695: FILE: drivers/gpu/drm/i915/intel_context_types.h:1:
+/*

-:761: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#761: FILE: drivers/gpu/drm/i915/intel_engine_types.h:1:
+/*

-:1048: CHECK:UNCOMMENTED_DEFINITION: spinlock_t definition without comment
#1048: FILE: drivers/gpu/drm/i915/intel_engine_types.h:288:
+		spinlock_t irq_lock;

-:1264: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'dev_priv__' - possible side-effects?
#1264: FILE: drivers/gpu/drm/i915/intel_engine_types.h:504:
+#define instdone_slice_mask(dev_priv__) \
+	(IS_GEN(dev_priv__, 7) ? \
+	 1 : RUNTIME_INFO(dev_priv__)->sseu.slice_mask)

-:1268: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'dev_priv__' - possible side-effects?
#1268: FILE: drivers/gpu/drm/i915/intel_engine_types.h:508:
+#define instdone_subslice_mask(dev_priv__) \
+	(IS_GEN(dev_priv__, 7) ? \
+	 1 : RUNTIME_INFO(dev_priv__)->sseu.subslice_mask[0])

-:1272: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'dev_priv__' - possible side-effects?
#1272: FILE: drivers/gpu/drm/i915/intel_engine_types.h:512:
+#define for_each_instdone_slice_subslice(dev_priv__, slice__, subslice__) \
+	for ((slice__) = 0, (subslice__) = 0; \
+	     (slice__) < I915_MAX_SLICES; \
+	     (subslice__) = ((subslice__) + 1) < I915_MAX_SUBSLICES ? (subslice__) + 1 : 0, \
+	       (slice__) += ((subslice__) == 0)) \
+		for_each_if((BIT(slice__) & instdone_slice_mask(dev_priv__)) && \
+			    (BIT(subslice__) & instdone_subslice_mask(dev_priv__)))

-:1272: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'slice__' - possible side-effects?
#1272: FILE: drivers/gpu/drm/i915/intel_engine_types.h:512:
+#define for_each_instdone_slice_subslice(dev_priv__, slice__, subslice__) \
+	for ((slice__) = 0, (subslice__) = 0; \
+	     (slice__) < I915_MAX_SLICES; \
+	     (subslice__) = ((subslice__) + 1) < I915_MAX_SUBSLICES ? (subslice__) + 1 : 0, \
+	       (slice__) += ((subslice__) == 0)) \
+		for_each_if((BIT(slice__) & instdone_slice_mask(dev_priv__)) && \
+			    (BIT(subslice__) & instdone_subslice_mask(dev_priv__)))

-:1272: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'subslice__' - possible side-effects?
#1272: FILE: drivers/gpu/drm/i915/intel_engine_types.h:512:
+#define for_each_instdone_slice_subslice(dev_priv__, slice__, subslice__) \
+	for ((slice__) = 0, (subslice__) = 0; \
+	     (slice__) < I915_MAX_SLICES; \
+	     (subslice__) = ((subslice__) + 1) < I915_MAX_SUBSLICES ? (subslice__) + 1 : 0, \
+	       (slice__) += ((subslice__) == 0)) \
+		for_each_if((BIT(slice__) & instdone_slice_mask(dev_priv__)) && \
+			    (BIT(subslice__) & instdone_subslice_mask(dev_priv__)))

-:1853: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#1853: FILE: drivers/gpu/drm/i915/intel_workarounds_types.h:1:
+/*

total: 0 errors, 7 warnings, 7 checks, 1796 lines checked
6fe3b4c65bbf drm/i915: Move over to intel_context_lookup()
-:245: CHECK:UNCOMMENTED_DEFINITION: spinlock_t definition without comment
#245: FILE: drivers/gpu/drm/i915/i915_gem_context_types.h:140:
+	spinlock_t hw_contexts_lock;

-:283: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#283: 
new file mode 100644

-:288: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#288: FILE: drivers/gpu/drm/i915/intel_context.c:1:
+/*

total: 0 errors, 2 warnings, 1 checks, 615 lines checked
520f60c930cd drm/i915: Load balancing across a virtual engine
-:825: WARNING:LINE_SPACING: Missing a blank line after declarations
#825: FILE: drivers/gpu/drm/i915/intel_lrc.c:3359:
+		struct intel_engine_cs *actual = ve->siblings[0];
+		virtual_engine_free(&ve->kref);

total: 0 errors, 1 warnings, 0 checks, 1016 lines checked
96abf66cf3a3 drm/i915: Extend execution fence to support a callback
501ddeed6a11 drm/i915/execlists: Virtual engine bonding
953cd4c6741f drm/i915: Allow specification of parallel execbuf
-:132: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#132: FILE: include/uapi/drm/i915_drm.h:1125:
+#define I915_EXEC_FENCE_SUBMIT		(1<<20)
                               		  ^

-:134: CHECK:SPACING: spaces preferred around that '<<' (ctx:VxV)
#134: FILE: include/uapi/drm/i915_drm.h:1127:
+#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_FENCE_SUBMIT<<1))
                                                            ^

total: 0 errors, 0 warnings, 2 checks, 90 lines checked

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* ✗ Fi.CI.SPARSE: warning for series starting with [01/46] drm/i915: Hack and slash, throttle execbuffer hogs (rev2)
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (49 preceding siblings ...)
  2019-02-06 14:37 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/46] drm/i915: Hack and slash, throttle execbuffer hogs (rev2) Patchwork
@ 2019-02-06 14:55 ` Patchwork
  2019-02-06 14:56 ` ✓ Fi.CI.BAT: success " Patchwork
  2019-02-06 16:18 ` ✗ Fi.CI.IGT: failure " Patchwork
  52 siblings, 0 replies; 97+ messages in thread
From: Patchwork @ 2019-02-06 14:55 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/46] drm/i915: Hack and slash, throttle execbuffer hogs (rev2)
URL   : https://patchwork.freedesktop.org/series/56281/
State : warning

== Summary ==

$ dim sparse origin/drm-tip
Sparse version: v0.5.2
Commit: drm/i915: Hack and slash, throttle execbuffer hogs
Okay!

Commit: drm/i915: Revoke mmaps and prevent access to fence registers across reset
-drivers/gpu/drm/i915/i915_gem.c:986:39: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/i915_gem.c:986:39: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/i915_gem.c:986:39: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/i915_gem.c:986:39: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/i915_reset.c:1302:5: warning: context imbalance in 'i915_reset_trylock' - different lock contexts for basic block
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3565:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3559:16: warning: expression using sizeof(void)

Commit: drm/i915: Force the GPU reset upon wedging
Okay!

Commit: drm/i915: Uninterruptibly drain the timelines on unwedging
Okay!

Commit: drm/i915: Wait for old resets before applying debugfs/i915_wedged
Okay!

Commit: drm/i915: Serialise resets with wedging
Okay!

Commit: drm/i915: Don't claim an unstarted request was guilty
Okay!

Commit: drm/i915/execlists: Suppress mere WAIT preemption
Okay!

Commit: drm/i915/execlists: Suppress redundant preemption
Okay!

Commit: drm/i915: Make request allocation caches global
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3559:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3556:16: warning: expression using sizeof(void)

Commit: drm/i915: Keep timeline HWSP allocated until idle across the system
Okay!

Commit: drm/i915/execlists: Refactor out can_merge_rq()
Okay!

Commit: drm/i915: Compute the global scheduler caps
Okay!

Commit: drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+
-O:drivers/gpu/drm/i915/i915_drv.c:349:25: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/i915_drv.c:349:25: warning: expression using sizeof(void)

Commit: drm/i915: Prioritise non-busywait semaphore workloads
Okay!

Commit: drm/i915: Show support for accurate sw PMU busyness tracking
Okay!

Commit: drm/i915: Apply rps waitboosting for dma_fence_wait_timeout()
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3556:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3551:16: warning: expression using sizeof(void)

Commit: drm/i915: Replace global_seqno with a hangcheck heartbeat seqno
Okay!

Commit: drm/i915/pmu: Always sample an active ringbuffer
Okay!

Commit: drm/i915: Remove access to global seqno in the HWSP
Okay!

Commit: drm/i915: Remove i915_request.global_seqno
Okay!

Commit: drm/i915: Force GPU idle on suspend
Okay!

Commit: drm/i915/selftests: Improve switch-to-kernel-context checking
Okay!

Commit: drm/i915: Do a synchronous switch-to-kernel-context on idling
Okay!

Commit: drm/i915: Store the BIT(engine->id) as the engine's mask
Okay!

Commit: drm/i915: Refactor common code to load initial power context
Okay!

Commit: drm/i915: Reduce presumption of request ordering for barriers
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3551:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3552:16: warning: expression using sizeof(void)

Commit: drm/i915: Remove has-kernel-context
Okay!

Commit: drm/i915: Introduce the i915_user_extension_method
Okay!

Commit: drm/i915: Track active engines within a context
Okay!

Commit: drm/i915: Introduce a context barrier callback
Okay!

Commit: drm/i915: Create/destroy VM (ppGTT) for use with contexts
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3552:16: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/selftests/i915_gem_context.c:1134:25: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/selftests/i915_gem_context.c:1134:25: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3555:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_gem_context.c:1264:25: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_gem_context.c:1264:25: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/selftests/i915_gem_context.c:564:25: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/selftests/i915_gem_context.c:564:25: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_gem_context.c:568:33: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_gem_context.c:568:33: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_gem_context.c:689:33: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_gem_context.c:689:33: warning: expression using sizeof(void)

Commit: drm/i915: Extend CONTEXT_CREATE to set parameters upon construction
Okay!

Commit: drm/i915: Allow contexts to share a single timeline across all engines
Okay!

Commit: drm/i915: Fix I915_EXEC_RING_MASK
Okay!

Commit: drm/i915: Remove last traces of exec-id (GEM_BUSY)
Okay!

Commit: drm/i915: Re-arrange execbuf so context is known before engine
Okay!

Commit: drm/i915: Allow a context to define its set of engines
+./include/linux/slab.h:664:13: error: not a function <noident>

Commit: drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[]
Okay!

Commit: drm/i915: Pass around the intel_context
Okay!

Commit: drm/i915: Split struct intel_context definition to its own header
Okay!

Commit: drm/i915: Move over to intel_context_lookup()
+./include/uapi/linux/perf_event.h:147:56: warning: cast truncates bits from constant value (8000000000000000 becomes 0)

Commit: drm/i915: Load balancing across a virtual engine
+./include/linux/overflow.h:285:13: error: incorrect type in conditional
+./include/linux/overflow.h:285:13: error: undefined identifier '__builtin_mul_overflow'
+./include/linux/overflow.h:285:13:    got void
+./include/linux/overflow.h:285:13: warning: call with no type!
+./include/linux/overflow.h:287:13: error: incorrect type in conditional
+./include/linux/overflow.h:287:13: error: undefined identifier '__builtin_add_overflow'
+./include/linux/overflow.h:287:13:    got void
+./include/linux/overflow.h:287:13: warning: call with no type!

Commit: drm/i915: Extend execution fence to support a callback
Okay!

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* ✓ Fi.CI.BAT: success for series starting with [01/46] drm/i915: Hack and slash, throttle execbuffer hogs (rev2)
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (50 preceding siblings ...)
  2019-02-06 14:55 ` ✗ Fi.CI.SPARSE: " Patchwork
@ 2019-02-06 14:56 ` Patchwork
  2019-02-06 16:18 ` ✗ Fi.CI.IGT: failure " Patchwork
  52 siblings, 0 replies; 97+ messages in thread
From: Patchwork @ 2019-02-06 14:56 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/46] drm/i915: Hack and slash, throttle execbuffer hogs (rev2)
URL   : https://patchwork.freedesktop.org/series/56281/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_5551 -> Patchwork_12155
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/56281/revisions/2/mbox/

Known issues
------------

  Here are the changes found in Patchwork_12155 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@i915_selftest@live_execlists:
    - fi-apl-guc:         PASS -> INCOMPLETE [fdo#103927]

  * igt@kms_busy@basic-flip-a:
    - fi-gdg-551:         PASS -> FAIL [fdo#103182] +1

  * igt@kms_flip@basic-flip-vs-modeset:
    - fi-skl-6700hq:      PASS -> DMESG-WARN [fdo#105998]

  * igt@kms_frontbuffer_tracking@basic:
    - fi-byt-clapper:     PASS -> FAIL [fdo#103167]

  * igt@kms_pipe_crc_basic@read-crc-pipe-b-frame-sequence:
    - fi-byt-clapper:     PASS -> FAIL [fdo#103191] / [fdo#107362] +2

  
#### Possible fixes ####

  * igt@gem_exec_suspend@basic-s3:
    - fi-blb-e6850:       INCOMPLETE [fdo#107718] -> PASS

  * igt@kms_chamelium@hdmi-hpd-fast:
    - fi-kbl-7500u:       FAIL [fdo#109485] -> PASS

  * igt@pm_rpm@basic-pci-d3-state:
    - fi-bsw-kefka:       {SKIP} [fdo#109271] -> PASS

  * igt@pm_rpm@basic-rte:
    - fi-bsw-kefka:       FAIL [fdo#108800] -> PASS

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#103167]: https://bugs.freedesktop.org/show_bug.cgi?id=103167
  [fdo#103182]: https://bugs.freedesktop.org/show_bug.cgi?id=103182
  [fdo#103191]: https://bugs.freedesktop.org/show_bug.cgi?id=103191
  [fdo#103927]: https://bugs.freedesktop.org/show_bug.cgi?id=103927
  [fdo#105998]: https://bugs.freedesktop.org/show_bug.cgi?id=105998
  [fdo#107362]: https://bugs.freedesktop.org/show_bug.cgi?id=107362
  [fdo#107718]: https://bugs.freedesktop.org/show_bug.cgi?id=107718
  [fdo#108622]: https://bugs.freedesktop.org/show_bug.cgi?id=108622
  [fdo#108800]: https://bugs.freedesktop.org/show_bug.cgi?id=108800
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109278]: https://bugs.freedesktop.org/show_bug.cgi?id=109278
  [fdo#109485]: https://bugs.freedesktop.org/show_bug.cgi?id=109485


Participating hosts (48 -> 45)
------------------------------

  Additional (1): fi-ivb-3770 
  Missing    (4): fi-icl-y fi-ilk-m540 fi-byt-squawks fi-bsw-cyan 


Build changes
-------------

    * Linux: CI_DRM_5551 -> Patchwork_12155

  CI_DRM_5551: 417d0e0cd0275705aed001d938e646879ee5afe9 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4812: 592b854fead32c2b0dac7198edfb9a6bffd66932 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_12155: 953cd4c6741fb00f1a09652ee676d06d50c72412 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

953cd4c6741f drm/i915: Allow specification of parallel execbuf
501ddeed6a11 drm/i915/execlists: Virtual engine bonding
96abf66cf3a3 drm/i915: Extend execution fence to support a callback
520f60c930cd drm/i915: Load balancing across a virtual engine
6fe3b4c65bbf drm/i915: Move over to intel_context_lookup()
d56d1a629f1f drm/i915: Split struct intel_context definition to its own header
8ac639a97b91 drm/i915: Pass around the intel_context
5a1b17ecbcbd drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[]
3adceade6ab7 drm/i915: Allow a context to define its set of engines
cb4fc91f67fa drm/i915: Re-arrange execbuf so context is known before engine
3b615f9cb0b6 drm/i915: Remove last traces of exec-id (GEM_BUSY)
4a8db894f67f drm/i915: Fix I915_EXEC_RING_MASK
7e0736603235 drm/i915: Allow contexts to share a single timeline across all engines
b464aad73304 drm/i915: Extend CONTEXT_CREATE to set parameters upon construction
1e28b2eb9aed drm/i915: Create/destroy VM (ppGTT) for use with contexts
3bc89baea5ad drm/i915: Introduce a context barrier callback
28c66fc3a304 drm/i915: Track active engines within a context
438122eecc4a drm/i915: Introduce the i915_user_extension_method
f0e040b2b916 drm/i915: Remove has-kernel-context
9472b88e9510 drm/i915: Reduce presumption of request ordering for barriers
366712549703 drm/i915: Refactor common code to load initial power context
2aae0d45d5e1 drm/i915: Store the BIT(engine->id) as the engine's mask
905940663467 drm/i915: Do a synchronous switch-to-kernel-context on idling
ed75f37144ba drm/i915/selftests: Improve switch-to-kernel-context checking
9a0deb68148b drm/i915: Force GPU idle on suspend
8f175c3d19cb drm/i915: Remove i915_request.global_seqno
77aea933c6b5 drm/i915: Remove access to global seqno in the HWSP
ef1b7ea6c12a drm/i915/pmu: Always sample an active ringbuffer
ca1dd369a8fa drm/i915: Replace global_seqno with a hangcheck heartbeat seqno
9d9bf5e37162 drm/i915: Apply rps waitboosting for dma_fence_wait_timeout()
1efc4bfb4abc drm/i915: Show support for accurate sw PMU busyness tracking
13a17cc9d0e8 drm/i915: Prioritise non-busywait semaphore workloads
217daeef0475 drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+
26f6a47bdaba drm/i915: Compute the global scheduler caps
919f7b05e698 drm/i915/execlists: Refactor out can_merge_rq()
080b49acfcea drm/i915: Keep timeline HWSP allocated until idle across the system
21b926286743 drm/i915: Make request allocation caches global
72bd8142da29 drm/i915/execlists: Suppress redundant preemption
eb622f2f3ea8 drm/i915/execlists: Suppress mere WAIT preemption
b3e799f976f3 drm/i915: Don't claim an unstarted request was guilty
52ba77ed3d1d drm/i915: Serialise resets with wedging
d12c0a13c719 drm/i915: Wait for old resets before applying debugfs/i915_wedged
0ebb8e590ea9 drm/i915: Uninterruptibly drain the timelines on unwedging
a5676a243ca9 drm/i915: Force the GPU reset upon wedging
b4820cad7002 drm/i915: Revoke mmaps and prevent access to fence registers across reset
d557d3a829de drm/i915: Hack and slash, throttle execbuffer hogs

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12155/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 02/46] drm/i915: Revoke mmaps and prevent access to fence registers across reset
  2019-02-06 13:03 ` [PATCH 02/46] drm/i915: Revoke mmaps and prevent access to fence registers across reset Chris Wilson
@ 2019-02-06 15:56   ` Mika Kuoppala
  2019-02-06 16:08     ` Chris Wilson
  2019-02-26 19:53   ` Rodrigo Vivi
  1 sibling, 1 reply; 97+ messages in thread
From: Mika Kuoppala @ 2019-02-06 15:56 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx

Chris Wilson <chris@chris-wilson.co.uk> writes:

> Previously, we were able to rely on the recursive properties of
> struct_mutex to allow us to serialise revoking mmaps and reacquiring the
> FENCE registers with them being clobbered over a global device reset.
> I then proceeded to throw out the baby with the bath water in order to
> pursue a struct_mutex-less reset.
>
> Perusing LWN for alternative strategies, the dilemma on how to serialise
> access to a global resource on one side was answered by
> https://lwn.net/Articles/202847/ -- Sleepable RCU:
>
>     1  int readside(void) {
>     2      int idx;
>     3      rcu_read_lock();
>     4	   if (nomoresrcu) {
>     5          rcu_read_unlock();
>     6	       return -EINVAL;
>     7      }
>     8	   idx = srcu_read_lock(&ss);
>     9	   rcu_read_unlock();
>     10	   /* SRCU read-side critical section. */
>     11	   srcu_read_unlock(&ss, idx);
>     12	   return 0;
>     13 }
>     14
>     15 void cleanup(void)
>     16 {
>     17     nomoresrcu = 1;
>     18     synchronize_rcu();
>     19     synchronize_srcu(&ss);
>     20     cleanup_srcu_struct(&ss);
>     21 }
>
> No more worrying about stop_machine, just an uber-complex mutex,
> optimised for reads, with the overhead pushed to the rare reset path.
>
> However, we do run the risk of a deadlock as we allocate underneath the
> SRCU read lock, and the allocation may require a GPU reset, causing a
> dependency cycle via the in-flight requests. We resolve that by declaring
> the driver wedged and cancelling all in-flight rendering.
>
> v2: Use expedited rcu barriers to match our earlier timing
> characteristics.
> v3: Try to annotate locking contexts for sparse
> v4: Reduce selftest lock duration to avoid a reset deadlock with fences
> v5: s/srcu/reset_backoff_srcu/
>
> Testcase: igt/gem_mmap_gtt/hang
> Fixes: eb8d0f5af4ec ("drm/i915: Remove GPU reset dependence on struct_mutex")
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c           |  12 +-
>  drivers/gpu/drm/i915/i915_drv.h               |  18 +--
>  drivers/gpu/drm/i915/i915_gem.c               |  56 +++------
>  drivers/gpu/drm/i915/i915_gem_fence_reg.c     |  31 +----
>  drivers/gpu/drm/i915/i915_gpu_error.h         |  12 +-
>  drivers/gpu/drm/i915/i915_reset.c             | 109 +++++++++++-------
>  drivers/gpu/drm/i915/i915_reset.h             |   4 +
>  .../gpu/drm/i915/selftests/intel_hangcheck.c  |   5 +-
>  .../gpu/drm/i915/selftests/mock_gem_device.c  |   1 +
>  9 files changed, 109 insertions(+), 139 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 0bd890c04fe4..a6fd157b1637 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -1281,14 +1281,11 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
>  	intel_wakeref_t wakeref;
>  	enum intel_engine_id id;
>  
> +	seq_printf(m, "Reset flags: %lx\n", dev_priv->gpu_error.flags);
>  	if (test_bit(I915_WEDGED, &dev_priv->gpu_error.flags))
> -		seq_puts(m, "Wedged\n");
> +		seq_puts(m, "\tWedged\n");
>  	if (test_bit(I915_RESET_BACKOFF, &dev_priv->gpu_error.flags))
> -		seq_puts(m, "Reset in progress: struct_mutex backoff\n");
> -	if (waitqueue_active(&dev_priv->gpu_error.wait_queue))
> -		seq_puts(m, "Waiter holding struct mutex\n");
> -	if (waitqueue_active(&dev_priv->gpu_error.reset_queue))
> -		seq_puts(m, "struct_mutex blocked for reset\n");
> +		seq_puts(m, "\tDevice (global) reset in progress\n");
>  
>  	if (!i915_modparams.enable_hangcheck) {
>  		seq_puts(m, "Hangcheck disabled\n");
> @@ -3885,9 +3882,6 @@ i915_wedged_set(void *data, u64 val)
>  	 * while it is writing to 'i915_wedged'
>  	 */
>  
> -	if (i915_reset_backoff(&i915->gpu_error))
> -		return -EAGAIN;
> -
>  	i915_handle_error(i915, val, I915_ERROR_CAPTURE,
>  			  "Manually set wedged engine mask = %llx", val);
>  	return 0;
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index a2293152cb6a..37230ae7fbe6 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2989,7 +2989,12 @@ i915_gem_obj_finish_shmem_access(struct drm_i915_gem_object *obj)
>  	i915_gem_object_unpin_pages(obj);
>  }
>  
> -int __must_check i915_mutex_lock_interruptible(struct drm_device *dev);
> +static inline int __must_check
> +i915_mutex_lock_interruptible(struct drm_device *dev)
> +{
> +	return mutex_lock_interruptible(&dev->struct_mutex);
> +}
> +
>  int i915_gem_dumb_create(struct drm_file *file_priv,
>  			 struct drm_device *dev,
>  			 struct drm_mode_create_dumb *args);
> @@ -3006,21 +3011,11 @@ int __must_check i915_gem_set_global_seqno(struct drm_device *dev, u32 seqno);
>  struct i915_request *
>  i915_gem_find_active_request(struct intel_engine_cs *engine);
>  
> -static inline bool i915_reset_backoff(struct i915_gpu_error *error)
> -{
> -	return unlikely(test_bit(I915_RESET_BACKOFF, &error->flags));
> -}
> -
>  static inline bool i915_terminally_wedged(struct i915_gpu_error *error)
>  {
>  	return unlikely(test_bit(I915_WEDGED, &error->flags));
>  }
>  
> -static inline bool i915_reset_backoff_or_wedged(struct i915_gpu_error *error)
> -{
> -	return i915_reset_backoff(error) | i915_terminally_wedged(error);
> -}
> -
>  static inline u32 i915_reset_count(struct i915_gpu_error *error)
>  {
>  	return READ_ONCE(error->reset_count);
> @@ -3093,7 +3088,6 @@ struct drm_i915_fence_reg *
>  i915_reserve_fence(struct drm_i915_private *dev_priv);
>  void i915_unreserve_fence(struct drm_i915_fence_reg *fence);
>  
> -void i915_gem_revoke_fences(struct drm_i915_private *dev_priv);
>  void i915_gem_restore_fences(struct drm_i915_private *dev_priv);
>  
>  void i915_gem_detect_bit_6_swizzle(struct drm_i915_private *dev_priv);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 05ce9176ac4e..1eb3a5f8654c 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -100,47 +100,6 @@ static void i915_gem_info_remove_obj(struct drm_i915_private *dev_priv,
>  	spin_unlock(&dev_priv->mm.object_stat_lock);
>  }
>  
> -static int
> -i915_gem_wait_for_error(struct i915_gpu_error *error)
> -{
> -	int ret;
> -
> -	might_sleep();
> -
> -	/*
> -	 * Only wait 10 seconds for the gpu reset to complete to avoid hanging
> -	 * userspace. If it takes that long something really bad is going on and
> -	 * we should simply try to bail out and fail as gracefully as possible.
> -	 */
> -	ret = wait_event_interruptible_timeout(error->reset_queue,
> -					       !i915_reset_backoff(error),
> -					       I915_RESET_TIMEOUT);
> -	if (ret == 0) {
> -		DRM_ERROR("Timed out waiting for the gpu reset to complete\n");
> -		return -EIO;
> -	} else if (ret < 0) {
> -		return ret;
> -	} else {
> -		return 0;
> -	}
> -}
> -
> -int i915_mutex_lock_interruptible(struct drm_device *dev)
> -{
> -	struct drm_i915_private *dev_priv = to_i915(dev);
> -	int ret;
> -
> -	ret = i915_gem_wait_for_error(&dev_priv->gpu_error);
> -	if (ret)
> -		return ret;
> -
> -	ret = mutex_lock_interruptible(&dev->struct_mutex);
> -	if (ret)
> -		return ret;
> -
> -	return 0;
> -}
> -
>  static u32 __i915_gem_park(struct drm_i915_private *i915)
>  {
>  	intel_wakeref_t wakeref;
> @@ -1869,6 +1828,7 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
>  	intel_wakeref_t wakeref;
>  	struct i915_vma *vma;
>  	pgoff_t page_offset;
> +	int srcu;
>  	int ret;
>  
>  	/* Sanity check that we allow writing into this object */
> @@ -1908,7 +1868,6 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
>  		goto err_unlock;
>  	}
>  
> -
>  	/* Now pin it into the GTT as needed */
>  	vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
>  				       PIN_MAPPABLE |
> @@ -1946,9 +1905,15 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
>  	if (ret)
>  		goto err_unpin;
>  
> +	srcu = i915_reset_trylock(dev_priv);
> +	if (srcu < 0) {
> +		ret = srcu;
> +		goto err_unpin;
> +	}
> +
>  	ret = i915_vma_pin_fence(vma);
>  	if (ret)
> -		goto err_unpin;
> +		goto err_reset;
>  
>  	/* Finally, remap it using the new GTT offset */
>  	ret = remap_io_mapping(area,
> @@ -1969,6 +1934,8 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
>  
>  err_fence:
>  	i915_vma_unpin_fence(vma);
> +err_reset:
> +	i915_reset_unlock(dev_priv, srcu);
>  err_unpin:
>  	__i915_vma_unpin(vma);
>  err_unlock:
> @@ -5326,6 +5293,7 @@ int i915_gem_init_early(struct drm_i915_private *dev_priv)
>  	init_waitqueue_head(&dev_priv->gpu_error.wait_queue);
>  	init_waitqueue_head(&dev_priv->gpu_error.reset_queue);
>  	mutex_init(&dev_priv->gpu_error.wedge_mutex);
> +	init_srcu_struct(&dev_priv->gpu_error.reset_backoff_srcu);
>  
>  	atomic_set(&dev_priv->mm.bsd_engine_dispatch_index, 0);
>  
> @@ -5358,6 +5326,8 @@ void i915_gem_cleanup_early(struct drm_i915_private *dev_priv)
>  	GEM_BUG_ON(atomic_read(&dev_priv->mm.free_count));
>  	WARN_ON(dev_priv->mm.object_count);
>

Should there be a synchronize_rcu/srcu and a GEM_BUG_ON for the reset
backoff here?

> +	cleanup_srcu_struct(&dev_priv->gpu_error.reset_backoff_srcu);
> +
>  	kmem_cache_destroy(dev_priv->priorities);
>  	kmem_cache_destroy(dev_priv->dependencies);
>  	kmem_cache_destroy(dev_priv->requests);
> diff --git a/drivers/gpu/drm/i915/i915_gem_fence_reg.c b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
> index e037e94792f3..36d548fa3aa2 100644
> --- a/drivers/gpu/drm/i915/i915_gem_fence_reg.c
> +++ b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
> @@ -240,6 +240,10 @@ static int fence_update(struct drm_i915_fence_reg *fence,
>  		i915_vma_flush_writes(old);
>  	}
>  
> +	ret = i915_reset_trylock(fence->i915);
> +	if (ret < 0)
> +		return ret;
> +
>  	if (fence->vma && fence->vma != vma) {
>  		/* Ensure that all userspace CPU access is completed before
>  		 * stealing the fence.
> @@ -272,6 +276,7 @@ static int fence_update(struct drm_i915_fence_reg *fence,
>  		list_move_tail(&fence->link, &fence->i915->mm.fence_list);
>  	}
>  
> +	i915_reset_unlock(fence->i915, ret);
>  	return 0;
>  }
>  
> @@ -435,32 +440,6 @@ void i915_unreserve_fence(struct drm_i915_fence_reg *fence)
>  	list_add(&fence->link, &fence->i915->mm.fence_list);
>  }
>  
> -/**
> - * i915_gem_revoke_fences - revoke fence state
> - * @dev_priv: i915 device private
> - *
> - * Removes all GTT mmappings via the fence registers. This forces any user
> - * of the fence to reacquire that fence before continuing with their access.
> - * One use is during GPU reset where the fence register is lost and we need to
> - * revoke concurrent userspace access via GTT mmaps until the hardware has been
> - * reset and the fence registers have been restored.
> - */
> -void i915_gem_revoke_fences(struct drm_i915_private *dev_priv)
> -{
> -	int i;
> -
> -	lockdep_assert_held(&dev_priv->drm.struct_mutex);
> -
> -	for (i = 0; i < dev_priv->num_fence_regs; i++) {
> -		struct drm_i915_fence_reg *fence = &dev_priv->fence_regs[i];
> -
> -		GEM_BUG_ON(fence->vma && fence->vma->fence != fence);
> -
> -		if (fence->vma)
> -			i915_vma_revoke_mmap(fence->vma);
> -	}
> -}
> -
>  /**
>   * i915_gem_restore_fences - restore fence state
>   * @dev_priv: i915 device private
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
> index 53b1f22dd365..d5c58e82508b 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.h
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.h
> @@ -231,12 +231,10 @@ struct i915_gpu_error {
>  	/**
>  	 * flags: Control various stages of the GPU reset
>  	 *
> -	 * #I915_RESET_BACKOFF - When we start a reset, we want to stop any
> -	 * other users acquiring the struct_mutex. To do this we set the
> -	 * #I915_RESET_BACKOFF bit in the error flags when we detect a reset
> -	 * and then check for that bit before acquiring the struct_mutex (in
> -	 * i915_mutex_lock_interruptible()?). I915_RESET_BACKOFF serves a
> -	 * secondary role in preventing two concurrent global reset attempts.
> +	 * #I915_RESET_BACKOFF - When we start a global reset, we need to
> +	 * serialise with any other users attempting to do the same, and
> +	 * any global resources that may be clobber by the reset (such as
> +	 * FENCE registers).
>  	 *
>  	 * #I915_RESET_ENGINE[num_engines] - Since the driver doesn't need to
>  	 * acquire the struct_mutex to reset an engine, we need an explicit
> @@ -272,6 +270,8 @@ struct i915_gpu_error {
>  	 */
>  	wait_queue_head_t reset_queue;
>  
> +	struct srcu_struct reset_backoff_srcu;
> +
>  	struct i915_gpu_restart *restart;
>  };
>  
> diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
> index 0e0ddf2e6815..272d00d4b8a3 100644
> --- a/drivers/gpu/drm/i915/i915_reset.c
> +++ b/drivers/gpu/drm/i915/i915_reset.c
> @@ -639,6 +639,31 @@ static void reset_prepare_engine(struct intel_engine_cs *engine)
>  	engine->reset.prepare(engine);
>  }
>  
> +static void revoke_mmaps(struct drm_i915_private *i915)
> +{
> +	int i;
> +
> +	for (i = 0; i < i915->num_fence_regs; i++) {
> +		struct i915_vma *vma = i915->fence_regs[i].vma;
> +		struct drm_vma_offset_node *node;
> +		u64 vma_offset;
> +
> +		if (!vma)
> +			continue;
> +
> +		GEM_BUG_ON(vma->fence != &i915->fence_regs[i]);
> +		if (!i915_vma_has_userfault(vma))
> +			continue;
> +
> +		node = &vma->obj->base.vma_node;
> +		vma_offset = vma->ggtt_view.partial.offset << PAGE_SHIFT;
> +		unmap_mapping_range(i915->drm.anon_inode->i_mapping,
> +				    drm_vma_node_offset_addr(node) + vma_offset,
> +				    vma->size,
> +				    1);
> +	}
> +}
> +
>  static void reset_prepare(struct drm_i915_private *i915)
>  {
>  	struct intel_engine_cs *engine;
> @@ -648,6 +673,7 @@ static void reset_prepare(struct drm_i915_private *i915)
>  		reset_prepare_engine(engine);
>  
>  	intel_uc_sanitize(i915);
> +	revoke_mmaps(i915);
>  }
>  
>  static int gt_reset(struct drm_i915_private *i915, unsigned int stalled_mask)
> @@ -911,50 +937,22 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
>  	return ret;
>  }
>  
> -struct __i915_reset {
> -	struct drm_i915_private *i915;
> -	unsigned int stalled_mask;
> -};
> -
> -static int __i915_reset__BKL(void *data)
> -{
> -	struct __i915_reset *arg = data;
> -	int err;
> -
> -	err = intel_gpu_reset(arg->i915, ALL_ENGINES);
> -	if (err)
> -		return err;
> -
> -	return gt_reset(arg->i915, arg->stalled_mask);
> -}
> -
> -#if RESET_UNDER_STOP_MACHINE
> -/*
> - * XXX An alternative to using stop_machine would be to park only the
> - * processes that have a GGTT mmap. By remote parking the threads (SIGSTOP)
> - * we should be able to prevent their memmory accesses via the lost fence
> - * registers over the course of the reset without the potential recursive
> - * of mutexes between the pagefault handler and reset.
> - *
> - * See igt/gem_mmap_gtt/hang
> - */
> -#define __do_reset(fn, arg) stop_machine(fn, arg, NULL)
> -#else
> -#define __do_reset(fn, arg) fn(arg)
> -#endif
> -
>  static int do_reset(struct drm_i915_private *i915, unsigned int stalled_mask)
>  {
> -	struct __i915_reset arg = { i915, stalled_mask };
>  	int err, i;
>  
> -	err = __do_reset(__i915_reset__BKL, &arg);
> +	/* Flush everyone currently using a resource about to be clobbered */
> +	synchronize_srcu(&i915->gpu_error.reset_backoff_srcu);
> +
> +	err = intel_gpu_reset(i915, ALL_ENGINES);
>  	for (i = 0; err && i < RESET_MAX_RETRIES; i++) {
> -		msleep(100);
> -		err = __do_reset(__i915_reset__BKL, &arg);
> +		msleep(10 * (i + 1));
> +		err = intel_gpu_reset(i915, ALL_ENGINES);
>  	}
> +	if (err)
> +		return err;
>  
> -	return err;
> +	return gt_reset(i915, stalled_mask);
>  }
>  
>  /**
> @@ -966,8 +964,6 @@ static int do_reset(struct drm_i915_private *i915, unsigned int stalled_mask)
>   * Reset the chip.  Useful if a hang is detected. Marks the device as wedged
>   * on failure.
>   *
> - * Caller must hold the struct_mutex.
> - *
>   * Procedure is fairly simple:
>   *   - reset the chip using the reset reg
>   *   - re-init context state
> @@ -1274,9 +1270,12 @@ void i915_handle_error(struct drm_i915_private *i915,
>  		wait_event(i915->gpu_error.reset_queue,
>  			   !test_bit(I915_RESET_BACKOFF,
>  				     &i915->gpu_error.flags));
> -		goto out;
> +		goto out; /* piggy-back on the other reset */
>  	}
>

I should have been more specific last time, sorry. The comment above
test_and_set_bit(I915_RESET_BACKOFF) is still erroneous.

Also, as I understand it from RCU/checklist.txt, RCU provides no help
for read ordering here. The test_and_set_bit provides the mb on this
side. But should we add an smp_mb__before_atomic before sampling the
bit in trylock?

-Mika


> +	/* Make sure i915_reset_trylock() sees the I915_RESET_BACKOFF */
> +	synchronize_rcu_expedited();
> +
>  	/* Prevent any other reset-engine attempt. */
>  	for_each_engine(engine, i915, tmp) {
>  		while (test_and_set_bit(I915_RESET_ENGINE + engine->id,
> @@ -1300,6 +1299,36 @@ void i915_handle_error(struct drm_i915_private *i915,
>  	intel_runtime_pm_put(i915, wakeref);
>  }
>  
> +int i915_reset_trylock(struct drm_i915_private *i915)
> +{
> +	struct i915_gpu_error *error = &i915->gpu_error;
> +	int srcu;
> +
> +	rcu_read_lock();
> +	while (test_bit(I915_RESET_BACKOFF, &error->flags)) {
> +		rcu_read_unlock();
> +
> +		if (wait_event_interruptible(error->reset_queue,
> +					     !test_bit(I915_RESET_BACKOFF,
> +						       &error->flags)))
> +			return -EINTR;
> +
> +		rcu_read_lock();
> +	}
> +	srcu = srcu_read_lock(&error->reset_backoff_srcu);
> +	rcu_read_unlock();
> +
> +	return srcu;
> +}
> +
> +void i915_reset_unlock(struct drm_i915_private *i915, int tag)
> +__releases(&i915->gpu_error.reset_backoff_srcu)
> +{
> +	struct i915_gpu_error *error = &i915->gpu_error;
> +
> +	srcu_read_unlock(&error->reset_backoff_srcu, tag);
> +}
> +
>  bool i915_reset_flush(struct drm_i915_private *i915)
>  {
>  	int err;
> diff --git a/drivers/gpu/drm/i915/i915_reset.h b/drivers/gpu/drm/i915/i915_reset.h
> index f2d347f319df..893c5d1c2eb8 100644
> --- a/drivers/gpu/drm/i915/i915_reset.h
> +++ b/drivers/gpu/drm/i915/i915_reset.h
> @@ -9,6 +9,7 @@
>  
>  #include <linux/compiler.h>
>  #include <linux/types.h>
> +#include <linux/srcu.h>
>  
>  struct drm_i915_private;
>  struct intel_engine_cs;
> @@ -32,6 +33,9 @@ int i915_reset_engine(struct intel_engine_cs *engine,
>  void i915_reset_request(struct i915_request *rq, bool guilty);
>  bool i915_reset_flush(struct drm_i915_private *i915);
>  
> +int __must_check i915_reset_trylock(struct drm_i915_private *i915);
> +void i915_reset_unlock(struct drm_i915_private *i915, int tag);
> +
>  bool intel_has_gpu_reset(struct drm_i915_private *i915);
>  bool intel_has_reset_engine(struct drm_i915_private *i915);
>  
> diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> index 7b6f3bea9ef8..4886fac12628 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> @@ -1039,8 +1039,6 @@ static int __igt_reset_evict_vma(struct drm_i915_private *i915,
>  
>  	/* Check that we can recover an unbind stuck on a hanging request */
>  
> -	igt_global_reset_lock(i915);
> -
>  	mutex_lock(&i915->drm.struct_mutex);
>  	err = hang_init(&h, i915);
>  	if (err)
> @@ -1138,7 +1136,9 @@ static int __igt_reset_evict_vma(struct drm_i915_private *i915,
>  	}
>  
>  out_reset:
> +	igt_global_reset_lock(i915);
>  	fake_hangcheck(rq->i915, intel_engine_flag(rq->engine));
> +	igt_global_reset_unlock(i915);
>  
>  	if (tsk) {
>  		struct igt_wedge_me w;
> @@ -1159,7 +1159,6 @@ static int __igt_reset_evict_vma(struct drm_i915_private *i915,
>  	hang_fini(&h);
>  unlock:
>  	mutex_unlock(&i915->drm.struct_mutex);
> -	igt_global_reset_unlock(i915);
>  
>  	if (i915_terminally_wedged(&i915->gpu_error))
>  		return -EIO;
> diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> index 14ae46fda49f..fc516a2970f4 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> @@ -189,6 +189,7 @@ struct drm_i915_private *mock_gem_device(void)
>  
>  	init_waitqueue_head(&i915->gpu_error.wait_queue);
>  	init_waitqueue_head(&i915->gpu_error.reset_queue);
> +	init_srcu_struct(&i915->gpu_error.reset_backoff_srcu);
>  	mutex_init(&i915->gpu_error.wedge_mutex);
>  
>  	i915->wq = alloc_ordered_workqueue("mock", 0);
> -- 
> 2.20.1
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 02/46] drm/i915: Revoke mmaps and prevent access to fence registers across reset
  2019-02-06 15:56   ` Mika Kuoppala
@ 2019-02-06 16:08     ` Chris Wilson
  2019-02-06 16:18       ` Chris Wilson
  0 siblings, 1 reply; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 16:08 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

Quoting Mika Kuoppala (2019-02-06 15:56:28)
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> > @@ -5358,6 +5326,8 @@ void i915_gem_cleanup_early(struct drm_i915_private *dev_priv)
> >       GEM_BUG_ON(atomic_read(&dev_priv->mm.free_count));
> >       WARN_ON(dev_priv->mm.object_count);
> >
> 
> Should there be a synchronize_rcu/srcu and a GEM_BUG_ON for the reset
> backoff here?

There can't be any readers by this point, and we don't use call_[s]rcu
so no deferred tasks to worry about.

I don't see anything that we can bug on to prove that though.
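
For reference, the belt-and-braces teardown being hinted at would look
something like this (illustrative only, not part of the patch; a flag
test here would not prove the absence of readers, hence no obvious
GEM_BUG_ON to add):

	/* hypothetical extra flushing before tearing down the srcu_struct */
	synchronize_rcu();
	synchronize_srcu(&dev_priv->gpu_error.reset_backoff_srcu);
	cleanup_srcu_struct(&dev_priv->gpu_error.reset_backoff_srcu);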

> > @@ -1274,9 +1270,12 @@ void i915_handle_error(struct drm_i915_private *i915,
> >               wait_event(i915->gpu_error.reset_queue,
> >                          !test_bit(I915_RESET_BACKOFF,
> >                                    &i915->gpu_error.flags));
> > -             goto out;
> > +             goto out; /* piggy-back on the other reset */
> >       }
> >
> 
> I should have been more specific last time, sorry. The comment above
> test_and_set_bit(I915_RESET_BACKOFF) is still erroneous.
> 
> Also, as I understand it from RCU/checklist.txt, RCU provides no help
> for read ordering here. The test_and_set_bit provides the mb on this
> side. But should we add an smp_mb__before_atomic before sampling the
> bit in trylock?

Wrong side, you apply that to the writer and we have the implicit mb in
the wakeup already for the write. The atomic reads are just fine in
conjunction with wait event serialisation.
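
Roughly, the pairing being discussed is (just a sketch of the pattern,
with the driver specifics elided):

	/* writer: reset path */
	clear_bit(I915_RESET_BACKOFF, &error->flags);
	wake_up_all(&error->reset_queue); /* provides the ordering argued above */

	/* reader: i915_reset_trylock() */
	if (wait_event_interruptible(error->reset_queue,
				     !test_bit(I915_RESET_BACKOFF,
					       &error->flags)))
		return -EINTR;
	srcu = srcu_read_lock(&error->reset_backoff_srcu);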
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 02/46] drm/i915: Revoke mmaps and prevent access to fence registers across reset
  2019-02-06 16:08     ` Chris Wilson
@ 2019-02-06 16:18       ` Chris Wilson
  0 siblings, 0 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-06 16:18 UTC (permalink / raw)
  To: Mika Kuoppala, intel-gfx

Quoting Chris Wilson (2019-02-06 16:08:23)
> Quoting Mika Kuoppala (2019-02-06 15:56:28)
> > Chris Wilson <chris@chris-wilson.co.uk> writes:
> > > @@ -5358,6 +5326,8 @@ void i915_gem_cleanup_early(struct drm_i915_private *dev_priv)
> > >       GEM_BUG_ON(atomic_read(&dev_priv->mm.free_count));
> > >       WARN_ON(dev_priv->mm.object_count);
> > >
> > 
> > Should there be a synchronize_rcu/srcu and a GEM_BUG_ON for the reset
> > backoff here?
> 
> There can't be any readers by this point, and we don't use call_[s]rcu
> so no deferred tasks to worry about.
> 
> I don't see anything that we can bug on to prove that though.
> 
> > > @@ -1274,9 +1270,12 @@ void i915_handle_error(struct drm_i915_private *i915,
> > >               wait_event(i915->gpu_error.reset_queue,
> > >                          !test_bit(I915_RESET_BACKOFF,
> > >                                    &i915->gpu_error.flags));
> > > -             goto out;
> > > +             goto out; /* piggy-back on the other reset */
> > >       }
> > >
> > 
> > I should have been more specific last time, sorry. The comment above
> > test_and_set_bit(I915_RESET_BACKOFF) is still erroneous.
> > 
> > Also, as I understand it from RCU/checklist.txt, RCU provides no help
> > for read ordering here. The test_and_set_bit provides the mb on this
> > side. But should we add an smp_mb__before_atomic before sampling the
> > bit in trylock?
> 
> Wrong side, you apply that to the writer and we have the implicit mb in
> the wakeup already for the write. The atomic reads are just fine in
> conjunction with wait event serialisation.

It's even clearer than that since the write to the bit has an explicit
barrier.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* ✗ Fi.CI.IGT: failure for series starting with [01/46] drm/i915: Hack and slash, throttle execbuffer hogs (rev2)
  2019-02-06 13:03 The road to load balancing Chris Wilson
                   ` (51 preceding siblings ...)
  2019-02-06 14:56 ` ✓ Fi.CI.BAT: success " Patchwork
@ 2019-02-06 16:18 ` Patchwork
  52 siblings, 0 replies; 97+ messages in thread
From: Patchwork @ 2019-02-06 16:18 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/46] drm/i915: Hack and slash, throttle execbuffer hogs (rev2)
URL   : https://patchwork.freedesktop.org/series/56281/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_5551_full -> Patchwork_12155_full
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_12155_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_12155_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_12155_full:

### IGT changes ###

#### Possible regressions ####

  * igt@gem_busy@extended-blt:
    - shard-glk:          PASS -> FAIL +8
    - shard-kbl:          NOTRUN -> FAIL

  * igt@gem_busy@extended-parallel-bsd2:
    - shard-kbl:          PASS -> FAIL +9

  * igt@gem_busy@extended-parallel-vebox:
    - shard-apl:          PASS -> FAIL +8

  * igt@gem_busy@extended-render:
    - shard-hsw:          PASS -> FAIL +8

  * igt@gem_exec_schedule@smoketest-render:
    - shard-kbl:          PASS -> TIMEOUT

  
#### Warnings ####

  * igt@gem_busy@extended-semaphore-blt:
    - shard-kbl:          {SKIP} [fdo#109271] -> FAIL +5
    - shard-glk:          {SKIP} [fdo#109271] -> FAIL +3

  * igt@gem_busy@extended-semaphore-vebox:
    - shard-apl:          {SKIP} [fdo#109271] -> FAIL +3

  
Known issues
------------

  Here are the changes found in Patchwork_12155_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_ctx_param@invalid-param-get:
    - shard-apl:          PASS -> FAIL [fdo#109559]
    - shard-glk:          PASS -> FAIL [fdo#109559]
    - shard-hsw:          PASS -> FAIL [fdo#109559]
    - shard-kbl:          PASS -> FAIL [fdo#109559]

  * igt@kms_cursor_crc@cursor-64x64-suspend:
    - shard-glk:          PASS -> FAIL [fdo#103232] +3

  * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-draw-mmap-gtt:
    - shard-apl:          PASS -> FAIL [fdo#103167]

  * igt@kms_frontbuffer_tracking@fbc-2p-primscrn-spr-indfb-draw-mmap-wc:
    - shard-glk:          PASS -> FAIL [fdo#103167] +2

  * igt@kms_plane@pixel-format-pipe-b-planes-source-clamping:
    - shard-glk:          PASS -> FAIL [fdo#108948]

  * igt@kms_plane_alpha_blend@pipe-a-alpha-opaque-fb:
    - shard-glk:          PASS -> FAIL [fdo#108145] +1

  * igt@kms_plane_multiple@atomic-pipe-a-tiling-y:
    - shard-glk:          PASS -> FAIL [fdo#103166]
    - shard-apl:          PASS -> FAIL [fdo#103166] +2

  * igt@kms_vblank@pipe-a-ts-continuation-suspend:
    - shard-hsw:          PASS -> INCOMPLETE [fdo#103540]

  
#### Possible fixes ####

  * igt@gem_exec_schedule@pi-ringfull-blt:
    - shard-apl:          FAIL [fdo#103158] -> PASS +3

  * igt@gem_exec_schedule@pi-ringfull-bsd:
    - shard-glk:          FAIL [fdo#103158] -> PASS +2

  * igt@gem_exec_schedule@pi-ringfull-bsd1:
    - shard-kbl:          FAIL [fdo#103158] -> PASS +4

  * igt@gem_mmap_gtt@hang:
    - shard-kbl:          FAIL [fdo#109469] -> PASS
    - shard-hsw:          FAIL [fdo#109469] -> PASS
    - shard-apl:          FAIL [fdo#109469] -> PASS

  * igt@kms_busy@extended-modeset-hang-newfb-with-reset-render-b:
    - shard-kbl:          DMESG-WARN [fdo#107956] -> PASS +1

  * igt@kms_busy@extended-modeset-hang-newfb-with-reset-render-c:
    - shard-hsw:          DMESG-WARN [fdo#107956] -> PASS

  * igt@kms_cursor_crc@cursor-256x85-random:
    - shard-apl:          FAIL [fdo#103232] -> PASS +2
    - shard-glk:          FAIL [fdo#103232] -> PASS +2

  * igt@kms_cursor_crc@cursor-alpha-opaque:
    - shard-glk:          FAIL [fdo#109350] -> PASS

  * igt@kms_cursor_legacy@cursor-vs-flip-atomic:
    - shard-hsw:          FAIL [fdo#103355] -> PASS

  * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-move:
    - shard-apl:          FAIL [fdo#103167] -> PASS

  * igt@kms_frontbuffer_tracking@fbc-2p-primscrn-cur-indfb-onoff:
    - shard-glk:          FAIL [fdo#103167] -> PASS +1

  * igt@kms_plane@plane-position-covered-pipe-c-planes:
    - shard-glk:          FAIL [fdo#103166] -> PASS +1

  * igt@kms_rotation_crc@multiplane-rotation:
    - shard-kbl:          DMESG-FAIL [fdo#105763] -> PASS

  * igt@kms_universal_plane@universal-plane-pipe-a-functional:
    - shard-apl:          FAIL [fdo#103166] -> PASS

  * igt@pm_rpm@system-suspend:
    - shard-kbl:          INCOMPLETE [fdo#103665] / [fdo#107807] -> PASS

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#103158]: https://bugs.freedesktop.org/show_bug.cgi?id=103158
  [fdo#103166]: https://bugs.freedesktop.org/show_bug.cgi?id=103166
  [fdo#103167]: https://bugs.freedesktop.org/show_bug.cgi?id=103167
  [fdo#103232]: https://bugs.freedesktop.org/show_bug.cgi?id=103232
  [fdo#103355]: https://bugs.freedesktop.org/show_bug.cgi?id=103355
  [fdo#103540]: https://bugs.freedesktop.org/show_bug.cgi?id=103540
  [fdo#103665]: https://bugs.freedesktop.org/show_bug.cgi?id=103665
  [fdo#105763]: https://bugs.freedesktop.org/show_bug.cgi?id=105763
  [fdo#107807]: https://bugs.freedesktop.org/show_bug.cgi?id=107807
  [fdo#107956]: https://bugs.freedesktop.org/show_bug.cgi?id=107956
  [fdo#108145]: https://bugs.freedesktop.org/show_bug.cgi?id=108145
  [fdo#108948]: https://bugs.freedesktop.org/show_bug.cgi?id=108948
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109278]: https://bugs.freedesktop.org/show_bug.cgi?id=109278
  [fdo#109350]: https://bugs.freedesktop.org/show_bug.cgi?id=109350
  [fdo#109469]: https://bugs.freedesktop.org/show_bug.cgi?id=109469
  [fdo#109559]: https://bugs.freedesktop.org/show_bug.cgi?id=109559


Participating hosts (6 -> 4)
------------------------------

  Missing    (2): shard-skl shard-iclb 


Build changes
-------------

    * Linux: CI_DRM_5551 -> Patchwork_12155

  CI_DRM_5551: 417d0e0cd0275705aed001d938e646879ee5afe9 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4812: 592b854fead32c2b0dac7198edfb9a6bffd66932 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_12155: 953cd4c6741fb00f1a09652ee676d06d50c72412 @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12155/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 08/46] drm/i915/execlists: Suppress mere WAIT preemption
  2019-02-06 13:03 ` [PATCH 08/46] drm/i915/execlists: Suppress mere WAIT preemption Chris Wilson
@ 2019-02-11 11:19   ` Tvrtko Ursulin
  2019-02-19 10:22   ` Matthew Auld
  1 sibling, 0 replies; 97+ messages in thread
From: Tvrtko Ursulin @ 2019-02-11 11:19 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 06/02/2019 13:03, Chris Wilson wrote:
> WAIT is occasionally suppressed by virtue of preempted requests being
> promoted to NEWCLIENT if they have not already received that boost.
> Make this consistent for all WAIT boosts that they are not allowed to
> preempt executing contexts and are merely granted the right to be at the
> front of the queue for the next execution slot. This is in keeping with
> the desire that the WAIT boost be a minor tweak that does not give
> excessive promotion to its user and open ourselves to trivial abuse.
> 
> The problem with the inconsistent WAIT preemption becomes more apparent
> as the preemption is propagated across the engines, where one engine may
> preempt and the other not, and we may be relying on the exact execution
> order being consistent across engines (e.g. using HW semaphores to
> coordinate parallel execution).
> 
> v2: Also protect GuC submission from false preemption loops.
> v3: Build bug safeguards and better debug messages for st.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_request.c        |  12 ++
>   drivers/gpu/drm/i915/i915_scheduler.h      |   2 +
>   drivers/gpu/drm/i915/intel_lrc.c           |   9 +-
>   drivers/gpu/drm/i915/selftests/intel_lrc.c | 161 +++++++++++++++++++++
>   4 files changed, 183 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index c2a5c48c7541..35acef74b93a 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -372,12 +372,24 @@ void __i915_request_submit(struct i915_request *request)
>   
>   	/* We may be recursing from the signal callback of another i915 fence */
>   	spin_lock_nested(&request->lock, SINGLE_DEPTH_NESTING);
> +
>   	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
>   	set_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
> +
>   	request->global_seqno = seqno;
>   	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &request->fence.flags) &&
>   	    !i915_request_enable_breadcrumb(request))
>   		intel_engine_queue_breadcrumbs(engine);
> +
> +	/*
> +	 * As we do not allow WAIT to preempt inflight requests,
> +	 * once we have executed a request, along with triggering
> +	 * any execution callbacks, we must preserve its ordering
> +	 * within the non-preemptible FIFO.
> +	 */
> +	BUILD_BUG_ON(__NO_PREEMPTION & ~I915_PRIORITY_MASK); /* only internal */
> +	request->sched.attr.priority |= __NO_PREEMPTION;
> +
>   	spin_unlock(&request->lock);
>   
>   	engine->emit_fini_breadcrumb(request,
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
> index dbe9cb7ecd82..54bd6c89817e 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.h
> +++ b/drivers/gpu/drm/i915/i915_scheduler.h
> @@ -33,6 +33,8 @@ enum {
>   #define I915_PRIORITY_WAIT	((u8)BIT(0))
>   #define I915_PRIORITY_NEWCLIENT	((u8)BIT(1))
>   
> +#define __NO_PREEMPTION (I915_PRIORITY_WAIT)
> +
>   struct i915_sched_attr {
>   	/**
>   	 * @priority: execution and service priority
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 5d5ce91a5dfa..afd05e25f911 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -188,6 +188,12 @@ static inline int rq_prio(const struct i915_request *rq)
>   	return rq->sched.attr.priority;
>   }
>   
> +static int effective_prio(const struct i915_request *rq)
> +{
> +	/* Restrict mere WAIT boosts from triggering preemption */
> +	return rq_prio(rq) | __NO_PREEMPTION;
> +}
> +
>   static int queue_prio(const struct intel_engine_execlists *execlists)
>   {
>   	struct i915_priolist *p;
> @@ -208,7 +214,7 @@ static int queue_prio(const struct intel_engine_execlists *execlists)
>   static inline bool need_preempt(const struct intel_engine_cs *engine,
>   				const struct i915_request *rq)
>   {
> -	const int last_prio = rq_prio(rq);
> +	int last_prio;
>   
>   	if (!intel_engine_has_preemption(engine))
>   		return false;
> @@ -228,6 +234,7 @@ static inline bool need_preempt(const struct intel_engine_cs *engine,
>   	 * preempt. If that hint is stale or we may be trying to preempt
>   	 * ourselves, ignore the request.
>   	 */
> +	last_prio = effective_prio(rq);
>   	if (!__execlists_need_preempt(engine->execlists.queue_priority_hint,
>   				      last_prio))
>   		return false;
> diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
> index 58144e024751..263afd2f1596 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
> @@ -407,6 +407,166 @@ static int live_suppress_self_preempt(void *arg)
>   	goto err_client_b;
>   }
>   
> +static int __i915_sw_fence_call
> +dummy_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
> +{
> +	return NOTIFY_DONE;
> +}
> +
> +static struct i915_request *dummy_request(struct intel_engine_cs *engine)
> +{
> +	struct i915_request *rq;
> +
> +	rq = kmalloc(sizeof(*rq), GFP_KERNEL | __GFP_ZERO);
> +	if (!rq)
> +		return NULL;
> +
> +	INIT_LIST_HEAD(&rq->active_list);
> +	rq->engine = engine;
> +
> +	i915_sched_node_init(&rq->sched);
> +
> +	/* mark this request as permanently incomplete */
> +	rq->fence.seqno = 1;
> +	BUILD_BUG_ON(sizeof(rq->fence.seqno) != 8); /* upper 32b == 0 */
> +	rq->hwsp_seqno = (u32 *)&rq->fence.seqno + 1;
> +	GEM_BUG_ON(i915_request_completed(rq));
> +
> +	i915_sw_fence_init(&rq->submit, dummy_notify);
> +	i915_sw_fence_commit(&rq->submit);
> +
> +	return rq;
> +}
> +
> +static void dummy_request_free(struct i915_request *dummy)
> +{
> +	i915_request_mark_complete(dummy);
> +	i915_sched_node_fini(dummy->engine->i915, &dummy->sched);
> +	kfree(dummy);
> +}
> +
> +static int live_suppress_wait_preempt(void *arg)
> +{
> +	struct drm_i915_private *i915 = arg;
> +	struct preempt_client client[4];
> +	struct intel_engine_cs *engine;
> +	enum intel_engine_id id;
> +	intel_wakeref_t wakeref;
> +	int err = -ENOMEM;
> +	int i;
> +
> +	/*
> +	 * Waiters are given a little priority nudge, but not enough
> +	 * to actually cause any preemption. Double check that we do
> +	 * not needlessly generate preempt-to-idle cycles.
> +	 */
> +
> +	if (!HAS_LOGICAL_RING_PREEMPTION(i915))
> +		return 0;
> +
> +	mutex_lock(&i915->drm.struct_mutex);
> +	wakeref = intel_runtime_pm_get(i915);
> +
> +	if (preempt_client_init(i915, &client[0])) /* ELSP[0] */
> +		goto err_unlock;
> +	if (preempt_client_init(i915, &client[1])) /* ELSP[1] */
> +		goto err_client_0;
> +	if (preempt_client_init(i915, &client[2])) /* head of queue */
> +		goto err_client_1;
> +	if (preempt_client_init(i915, &client[3])) /* bystander */
> +		goto err_client_2;
> +
> +	for_each_engine(engine, i915, id) {
> +		int depth;
> +
> +		if (!engine->emit_init_breadcrumb)
> +			continue;
> +
> +		for (depth = 0; depth < ARRAY_SIZE(client); depth++) {
> +			struct i915_request *rq[ARRAY_SIZE(client)];
> +			struct i915_request *dummy;
> +
> +			engine->execlists.preempt_hang.count = 0;
> +
> +			dummy = dummy_request(engine);
> +			if (!dummy)
> +				goto err_client_3;
> +
> +			for (i = 0; i < ARRAY_SIZE(client); i++) {
> +				rq[i] = igt_spinner_create_request(&client[i].spin,
> +								   client[i].ctx, engine,
> +								   MI_NOOP);
> +				if (IS_ERR(rq[i])) {
> +					err = PTR_ERR(rq[i]);
> +					goto err_wedged;
> +				}
> +
> +				/* Disable NEWCLIENT promotion */
> +				__i915_active_request_set(&rq[i]->timeline->last_request,
> +							  dummy);
> +				i915_request_add(rq[i]);
> +			}
> +
> +			dummy_request_free(dummy);
> +
> +			GEM_BUG_ON(i915_request_completed(rq[0]));
> +			if (!igt_wait_for_spinner(&client[0].spin, rq[0])) {
> +				pr_err("%s: First client failed to start\n",
> +				       engine->name);
> +				goto err_wedged;
> +			}
> +			GEM_BUG_ON(!i915_request_started(rq[0]));
> +
> +			if (i915_request_wait(rq[depth],
> +					      I915_WAIT_LOCKED |
> +					      I915_WAIT_PRIORITY,
> +					      1) != -ETIME) {
> +				pr_err("%s: Waiter depth:%d completed!\n",
> +				       engine->name, depth);
> +				goto err_wedged;
> +			}
> +
> +			for (i = 0; i < ARRAY_SIZE(client); i++)
> +				igt_spinner_end(&client[i].spin);
> +
> +			if (igt_flush_test(i915, I915_WAIT_LOCKED))
> +				goto err_wedged;
> +
> +			if (engine->execlists.preempt_hang.count) {
> +				pr_err("%s: Preemption recorded x%d, depth %d; should have been suppressed!\n",
> +				       engine->name,
> +				       engine->execlists.preempt_hang.count,
> +				       depth);
> +				err = -EINVAL;
> +				goto err_client_3;
> +			}
> +		}
> +	}
> +
> +	err = 0;
> +err_client_3:
> +	preempt_client_fini(&client[3]);
> +err_client_2:
> +	preempt_client_fini(&client[2]);
> +err_client_1:
> +	preempt_client_fini(&client[1]);
> +err_client_0:
> +	preempt_client_fini(&client[0]);
> +err_unlock:
> +	if (igt_flush_test(i915, I915_WAIT_LOCKED))
> +		err = -EIO;
> +	intel_runtime_pm_put(i915, wakeref);
> +	mutex_unlock(&i915->drm.struct_mutex);
> +	return err;
> +
> +err_wedged:
> +	for (i = 0; i < ARRAY_SIZE(client); i++)
> +		igt_spinner_end(&client[i].spin);
> +	i915_gem_set_wedged(i915);
> +	err = -EIO;
> +	goto err_client_3;
> +}
> +
>   static int live_chain_preempt(void *arg)
>   {
>   	struct drm_i915_private *i915 = arg;
> @@ -887,6 +1047,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
>   		SUBTEST(live_preempt),
>   		SUBTEST(live_late_preempt),
>   		SUBTEST(live_suppress_self_preempt),
> +		SUBTEST(live_suppress_wait_preempt),
>   		SUBTEST(live_chain_preempt),
>   		SUBTEST(live_preempt_hang),
>   		SUBTEST(live_preempt_smoke),
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 10/46] drm/i915: Make request allocation caches global
  2019-02-06 13:03 ` [PATCH 10/46] drm/i915: Make request allocation caches global Chris Wilson
@ 2019-02-11 11:43   ` Tvrtko Ursulin
  2019-02-11 12:40     ` Chris Wilson
  0 siblings, 1 reply; 97+ messages in thread
From: Tvrtko Ursulin @ 2019-02-11 11:43 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 06/02/2019 13:03, Chris Wilson wrote:
> As kmem_caches share the same properties (size, allocation/free behaviour)
> for all potential devices, we can use global caches. While this
> potential has worse fragmentation behaviour (one can argue that
> different devices would have different activity lifetimes, but you can
> also argue that activity is temporal across the system) it is the
> default behaviour of the system at large to amalgamate matching caches.
> 
> The benefit for us is much reduced pointer dancing along the frequent
> allocation paths.
> 
> v2: Defer shrinking until after a global grace period for futureproofing
> multiple consumers of the slab caches, similar to the current strategy
> for avoiding shrinking too early.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/Makefile                 |   1 +
>   drivers/gpu/drm/i915/i915_active.c            |   7 +-
>   drivers/gpu/drm/i915/i915_active.h            |   1 +
>   drivers/gpu/drm/i915/i915_drv.h               |   3 -
>   drivers/gpu/drm/i915/i915_gem.c               |  34 +-----
>   drivers/gpu/drm/i915/i915_globals.c           | 105 ++++++++++++++++++
>   drivers/gpu/drm/i915/i915_globals.h           |  15 +++
>   drivers/gpu/drm/i915/i915_pci.c               |   8 +-
>   drivers/gpu/drm/i915/i915_request.c           |  53 +++++++--
>   drivers/gpu/drm/i915/i915_request.h           |  10 ++
>   drivers/gpu/drm/i915/i915_scheduler.c         |  66 ++++++++---
>   drivers/gpu/drm/i915/i915_scheduler.h         |  34 +++++-
>   drivers/gpu/drm/i915/intel_guc_submission.c   |   3 +-
>   drivers/gpu/drm/i915/intel_lrc.c              |   6 +-
>   drivers/gpu/drm/i915/intel_ringbuffer.h       |  17 ---
>   drivers/gpu/drm/i915/selftests/intel_lrc.c    |   2 +-
>   drivers/gpu/drm/i915/selftests/mock_engine.c  |  48 ++++----
>   .../gpu/drm/i915/selftests/mock_gem_device.c  |  26 -----
>   drivers/gpu/drm/i915/selftests/mock_request.c |  12 +-
>   drivers/gpu/drm/i915/selftests/mock_request.h |   7 --
>   20 files changed, 306 insertions(+), 152 deletions(-)
>   create mode 100644 drivers/gpu/drm/i915/i915_globals.c
>   create mode 100644 drivers/gpu/drm/i915/i915_globals.h
> 
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index 1787e1299b1b..a1d834068765 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -77,6 +77,7 @@ i915-y += \
>   	  i915_gem_tiling.o \
>   	  i915_gem_userptr.o \
>   	  i915_gemfs.o \
> +	  i915_globals.o \
>   	  i915_query.o \
>   	  i915_request.o \
>   	  i915_scheduler.o \
> diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
> index 215b6ff8aa73..9026787ebdf8 100644
> --- a/drivers/gpu/drm/i915/i915_active.c
> +++ b/drivers/gpu/drm/i915/i915_active.c
> @@ -280,7 +280,12 @@ int __init i915_global_active_init(void)
>   	return 0;
>   }
>   
> -void __exit i915_global_active_exit(void)
> +void i915_global_active_shrink(void)
> +{
> +	kmem_cache_shrink(global.slab_cache);
> +}
> +
> +void i915_global_active_exit(void)
>   {
>   	kmem_cache_destroy(global.slab_cache);
>   }
> diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h
> index 12b5c1d287d1..5fbd9102384b 100644
> --- a/drivers/gpu/drm/i915/i915_active.h
> +++ b/drivers/gpu/drm/i915/i915_active.h
> @@ -420,6 +420,7 @@ static inline void i915_active_fini(struct i915_active *ref) { }
>   #endif
>   
>   int i915_global_active_init(void);
> +void i915_global_active_shrink(void);
>   void i915_global_active_exit(void);
>   
>   #endif /* _I915_ACTIVE_H_ */
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 37230ae7fbe6..a365b1a2ea9a 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1459,9 +1459,6 @@ struct drm_i915_private {
>   	struct kmem_cache *objects;
>   	struct kmem_cache *vmas;
>   	struct kmem_cache *luts;
> -	struct kmem_cache *requests;
> -	struct kmem_cache *dependencies;
> -	struct kmem_cache *priorities;
>   
>   	const struct intel_device_info __info; /* Use INTEL_INFO() to access. */
>   	struct intel_runtime_info __runtime; /* Use RUNTIME_INFO() to access. */
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 1eb3a5f8654c..d18c4ccff370 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -42,6 +42,7 @@
>   #include "i915_drv.h"
>   #include "i915_gem_clflush.h"
>   #include "i915_gemfs.h"
> +#include "i915_globals.h"
>   #include "i915_reset.h"
>   #include "i915_trace.h"
>   #include "i915_vgpu.h"
> @@ -187,6 +188,8 @@ void i915_gem_unpark(struct drm_i915_private *i915)
>   	if (unlikely(++i915->gt.epoch == 0)) /* keep 0 as invalid */
>   		i915->gt.epoch = 1;
>   
> +	i915_globals_unpark();
> +
>   	intel_enable_gt_powersave(i915);
>   	i915_update_gfx_val(i915);
>   	if (INTEL_GEN(i915) >= 6)
> @@ -2916,12 +2919,11 @@ static void shrink_caches(struct drm_i915_private *i915)
>   	 * filled slabs to prioritise allocating from the mostly full slabs,
>   	 * with the aim of reducing fragmentation.
>   	 */
> -	kmem_cache_shrink(i915->priorities);
> -	kmem_cache_shrink(i915->dependencies);
> -	kmem_cache_shrink(i915->requests);
>   	kmem_cache_shrink(i915->luts);
>   	kmem_cache_shrink(i915->vmas);
>   	kmem_cache_shrink(i915->objects);
> +
> +	i915_globals_park();

It is slightly confusing that the shrink caches path calls
i915_globals_park(), i.e. after the device has already been parked.
Would i915_globals_shrink and __i915_globals_shrink be clearer names?
Not sure.
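
Something like this, as a sketch of the naming only (the bodies stay as
in the patch), with i915_globals_park() becoming the public
i915_globals_shrink() that shrink_caches() would then call:

	/* hypothetical rename of the internal helper */
	static void __i915_globals_shrink(void) /* was i915_globals_shrink() */
	{
		i915_global_active_shrink();
		i915_global_request_shrink();
		i915_global_scheduler_shrink();
	}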

>   }
>   
>   struct sleep_rcu_work {
> @@ -5264,23 +5266,6 @@ int i915_gem_init_early(struct drm_i915_private *dev_priv)
>   	if (!dev_priv->luts)
>   		goto err_vmas;
>   
> -	dev_priv->requests = KMEM_CACHE(i915_request,
> -					SLAB_HWCACHE_ALIGN |
> -					SLAB_RECLAIM_ACCOUNT |
> -					SLAB_TYPESAFE_BY_RCU);
> -	if (!dev_priv->requests)
> -		goto err_luts;
> -
> -	dev_priv->dependencies = KMEM_CACHE(i915_dependency,
> -					    SLAB_HWCACHE_ALIGN |
> -					    SLAB_RECLAIM_ACCOUNT);
> -	if (!dev_priv->dependencies)
> -		goto err_requests;
> -
> -	dev_priv->priorities = KMEM_CACHE(i915_priolist, SLAB_HWCACHE_ALIGN);
> -	if (!dev_priv->priorities)
> -		goto err_dependencies;
> -
>   	INIT_LIST_HEAD(&dev_priv->gt.active_rings);
>   	INIT_LIST_HEAD(&dev_priv->gt.closed_vma);
>   
> @@ -5305,12 +5290,6 @@ int i915_gem_init_early(struct drm_i915_private *dev_priv)
>   
>   	return 0;
>   
> -err_dependencies:
> -	kmem_cache_destroy(dev_priv->dependencies);
> -err_requests:
> -	kmem_cache_destroy(dev_priv->requests);
> -err_luts:
> -	kmem_cache_destroy(dev_priv->luts);
>   err_vmas:
>   	kmem_cache_destroy(dev_priv->vmas);
>   err_objects:
> @@ -5328,9 +5307,6 @@ void i915_gem_cleanup_early(struct drm_i915_private *dev_priv)
>   
>   	cleanup_srcu_struct(&dev_priv->gpu_error.reset_backoff_srcu);
>   
> -	kmem_cache_destroy(dev_priv->priorities);
> -	kmem_cache_destroy(dev_priv->dependencies);
> -	kmem_cache_destroy(dev_priv->requests);
>   	kmem_cache_destroy(dev_priv->luts);
>   	kmem_cache_destroy(dev_priv->vmas);
>   	kmem_cache_destroy(dev_priv->objects);
> diff --git a/drivers/gpu/drm/i915/i915_globals.c b/drivers/gpu/drm/i915/i915_globals.c
> new file mode 100644
> index 000000000000..82ee6b1e7227
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/i915_globals.c
> @@ -0,0 +1,105 @@
> +/*
> + * SPDX-License-Identifier: MIT
> + *
> + * Copyright © 2019 Intel Corporation
> + */
> +
> +#include <linux/slab.h>
> +#include <linux/workqueue.h>
> +
> +#include "i915_active.h"
> +#include "i915_globals.h"
> +#include "i915_request.h"
> +#include "i915_scheduler.h"
> +
> +int __init i915_globals_init(void)
> +{
> +	int err;
> +
> +	err = i915_global_active_init();
> +	if (err)
> +		return err;
> +
> +	err = i915_global_request_init();
> +	if (err)
> +		goto err_active;
> +
> +	err = i915_global_scheduler_init();
> +	if (err)
> +		goto err_request;
> +
> +	return 0;
> +
> +err_request:
> +	i915_global_request_exit();
> +err_active:
> +	i915_global_active_exit();
> +	return err;
> +}
> +
> +static void i915_globals_shrink(void)
> +{
> +	i915_global_active_shrink();
> +	i915_global_request_shrink();
> +	i915_global_scheduler_shrink();
> +}
> +
> +static atomic_t active;
> +static atomic_t epoch;
> +struct park_work {
> +	struct rcu_work work;
> +	int epoch;
> +};
> +
> +static void __i915_globals_park(struct work_struct *work)
> +{
> +	struct park_work *wrk = container_of(work, typeof(*wrk), work.work);
> +
> +	/* Confirm nothing woke up in the last grace period */
> +	if (wrk->epoch == atomic_read(&epoch))
> +		i915_globals_shrink();
> +
> +	kfree(wrk);
> +}
> +
> +void i915_globals_park(void)
> +{
> +	struct park_work *wrk;
> +
> +	/*
> +	 * Defer shrinking the global slab caches (and other work) until
> +	 * after a RCU grace period has completed with no activity. This
> +	 * is to try and reduce the latency impact on the consumers caused
> +	 * by us shrinking the caches the same time as they are trying to
> +	 * allocate, with the assumption being that if we idle long enough
> +	 * for an RCU grace period to elapse since the last use, it is likely
> +	 * to be longer until we need the caches again.
> +	 */
> +	if (!atomic_dec_and_test(&active))
> +		return;
> +
> +	wrk = kmalloc(sizeof(*wrk), GFP_KERNEL);
> +	if (!wrk)
> +		return;
> +
> +	wrk->epoch = atomic_inc_return(&epoch);

Do you need to bump the epoch here? Unpark bumps it anyway, so by the
time the RCU work gets to run the check would already fail. As it
stands it looks like a double increment. I don't see a problem with the
double increment, I just failed to spot whether it is actually needed
for some subtle reason. There is a potential race with multiple device
park callers storing the same epoch, but is that really a problem?
Again, as soon as someone unparks it seems like it would do the right
thing.
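
I.e. something along the lines of the below would seem to be enough
(illustrative only, not a tested change):

	/* park: only snapshot the epoch, without bumping it */
	wrk->epoch = atomic_read(&epoch);
	INIT_RCU_WORK(&wrk->work, __i915_globals_park);
	queue_rcu_work(system_wq, &wrk->work);

	/* unpark: any wakeup makes the snapshot above stale */
	atomic_inc(&epoch);
	atomic_inc(&active);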

Regards,

Tvrtko

> +	INIT_RCU_WORK(&wrk->work, __i915_globals_park);
> +	queue_rcu_work(system_wq, &wrk->work);
> +}
> +
> +void i915_globals_unpark(void)
> +{
> +	atomic_inc(&epoch);
> +	atomic_inc(&active);
> +}
> +
> +void __exit i915_globals_exit(void)
> +{
> +	/* Flush any residual park_work */
> +	rcu_barrier();
> +	flush_scheduled_work();
> +
> +	i915_global_scheduler_exit();
> +	i915_global_request_exit();
> +	i915_global_active_exit();
> +}
> diff --git a/drivers/gpu/drm/i915/i915_globals.h b/drivers/gpu/drm/i915/i915_globals.h
> new file mode 100644
> index 000000000000..e468f0413a73
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/i915_globals.h
> @@ -0,0 +1,15 @@
> +/*
> + * SPDX-License-Identifier: MIT
> + *
> + * Copyright © 2019 Intel Corporation
> + */
> +
> +#ifndef _I915_GLOBALS_H_
> +#define _I915_GLOBALS_H_
> +
> +int i915_globals_init(void);
> +void i915_globals_park(void);
> +void i915_globals_unpark(void);
> +void i915_globals_exit(void);
> +
> +#endif /* _I915_GLOBALS_H_ */
> diff --git a/drivers/gpu/drm/i915/i915_pci.c b/drivers/gpu/drm/i915/i915_pci.c
> index 66f82f3f050f..b73e8d63b1af 100644
> --- a/drivers/gpu/drm/i915/i915_pci.c
> +++ b/drivers/gpu/drm/i915/i915_pci.c
> @@ -28,8 +28,8 @@
>   
>   #include <drm/drm_drv.h>
>   
> -#include "i915_active.h"
>   #include "i915_drv.h"
> +#include "i915_globals.h"
>   #include "i915_selftest.h"
>   
>   #define PLATFORM(x) .platform = (x), .platform_mask = BIT(x)
> @@ -801,7 +801,9 @@ static int __init i915_init(void)
>   	bool use_kms = true;
>   	int err;
>   
> -	i915_global_active_init();
> +	err = i915_globals_init();
> +	if (err)
> +		return err;
>   
>   	err = i915_mock_selftests();
>   	if (err)
> @@ -834,7 +836,7 @@ static void __exit i915_exit(void)
>   		return;
>   
>   	pci_unregister_driver(&i915_pci_driver);
> -	i915_global_active_exit();
> +	i915_globals_exit();
>   }
>   
>   module_init(i915_init);
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 35acef74b93a..174d15c9dd00 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -32,6 +32,11 @@
>   #include "i915_active.h"
>   #include "i915_reset.h"
>   
> +static struct i915_global_request {
> +	struct kmem_cache *slab_requests;
> +	struct kmem_cache *slab_dependencies;
> +} global;
> +
>   static const char *i915_fence_get_driver_name(struct dma_fence *fence)
>   {
>   	return "i915";
> @@ -84,7 +89,7 @@ static void i915_fence_release(struct dma_fence *fence)
>   	 */
>   	i915_sw_fence_fini(&rq->submit);
>   
> -	kmem_cache_free(rq->i915->requests, rq);
> +	kmem_cache_free(global.slab_requests, rq);
>   }
>   
>   const struct dma_fence_ops i915_fence_ops = {
> @@ -296,7 +301,7 @@ static void i915_request_retire(struct i915_request *request)
>   
>   	unreserve_gt(request->i915);
>   
> -	i915_sched_node_fini(request->i915, &request->sched);
> +	i915_sched_node_fini(&request->sched);
>   	i915_request_put(request);
>   }
>   
> @@ -530,7 +535,7 @@ i915_request_alloc_slow(struct intel_context *ce)
>   	ring_retire_requests(ring);
>   
>   out:
> -	return kmem_cache_alloc(ce->gem_context->i915->requests, GFP_KERNEL);
> +	return kmem_cache_alloc(global.slab_requests, GFP_KERNEL);
>   }
>   
>   static int add_timeline_barrier(struct i915_request *rq)
> @@ -617,7 +622,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>   	 *
>   	 * Do not use kmem_cache_zalloc() here!
>   	 */
> -	rq = kmem_cache_alloc(i915->requests,
> +	rq = kmem_cache_alloc(global.slab_requests,
>   			      GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN);
>   	if (unlikely(!rq)) {
>   		rq = i915_request_alloc_slow(ce);
> @@ -705,7 +710,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>   	GEM_BUG_ON(!list_empty(&rq->sched.signalers_list));
>   	GEM_BUG_ON(!list_empty(&rq->sched.waiters_list));
>   
> -	kmem_cache_free(i915->requests, rq);
> +	kmem_cache_free(global.slab_requests, rq);
>   err_unreserve:
>   	unreserve_gt(i915);
>   	intel_context_unpin(ce);
> @@ -724,9 +729,7 @@ i915_request_await_request(struct i915_request *to, struct i915_request *from)
>   		return 0;
>   
>   	if (to->engine->schedule) {
> -		ret = i915_sched_node_add_dependency(to->i915,
> -						     &to->sched,
> -						     &from->sched);
> +		ret = i915_sched_node_add_dependency(&to->sched, &from->sched);
>   		if (ret < 0)
>   			return ret;
>   	}
> @@ -1199,3 +1202,37 @@ void i915_retire_requests(struct drm_i915_private *i915)
>   #include "selftests/mock_request.c"
>   #include "selftests/i915_request.c"
>   #endif
> +
> +int __init i915_global_request_init(void)
> +{
> +	global.slab_requests = KMEM_CACHE(i915_request,
> +					  SLAB_HWCACHE_ALIGN |
> +					  SLAB_RECLAIM_ACCOUNT |
> +					  SLAB_TYPESAFE_BY_RCU);
> +	if (!global.slab_requests)
> +		return -ENOMEM;
> +
> +	global.slab_dependencies = KMEM_CACHE(i915_dependency,
> +					      SLAB_HWCACHE_ALIGN |
> +					      SLAB_RECLAIM_ACCOUNT);
> +	if (!global.slab_dependencies)
> +		goto err_requests;
> +
> +	return 0;
> +
> +err_requests:
> +	kmem_cache_destroy(global.slab_requests);
> +	return -ENOMEM;
> +}
> +
> +void i915_global_request_shrink(void)
> +{
> +	kmem_cache_shrink(global.slab_dependencies);
> +	kmem_cache_shrink(global.slab_requests);
> +}
> +
> +void i915_global_request_exit(void)
> +{
> +	kmem_cache_destroy(global.slab_dependencies);
> +	kmem_cache_destroy(global.slab_requests);
> +}
> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> index 40f3e8dcbdd5..071ff1064579 100644
> --- a/drivers/gpu/drm/i915/i915_request.h
> +++ b/drivers/gpu/drm/i915/i915_request.h
> @@ -29,6 +29,7 @@
>   
>   #include "i915_gem.h"
>   #include "i915_scheduler.h"
> +#include "i915_selftest.h"
>   #include "i915_sw_fence.h"
>   
>   #include <uapi/drm/i915_drm.h>
> @@ -204,6 +205,11 @@ struct i915_request {
>   	struct drm_i915_file_private *file_priv;
>   	/** file_priv list entry for this request */
>   	struct list_head client_link;
> +
> +	I915_SELFTEST_DECLARE(struct {
> +		struct list_head link;
> +		unsigned long delay;
> +	} mock;)
>   };
>   
>   #define I915_FENCE_GFP (GFP_KERNEL | __GFP_RETRY_MAYFAIL | __GFP_NOWARN)
> @@ -403,4 +409,8 @@ static inline void i915_request_mark_complete(struct i915_request *rq)
>   
>   void i915_retire_requests(struct drm_i915_private *i915);
>   
> +int i915_global_request_init(void);
> +void i915_global_request_shrink(void);
> +void i915_global_request_exit(void);
> +
>   #endif /* I915_REQUEST_H */
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> index d01683167c77..720cc91b4d10 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.c
> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> @@ -10,6 +10,11 @@
>   #include "i915_request.h"
>   #include "i915_scheduler.h"
>   
> +static struct i915_global_scheduler {
> +	struct kmem_cache *slab_dependencies;
> +	struct kmem_cache *slab_priorities;
> +} global;
> +
>   static DEFINE_SPINLOCK(schedule_lock);
>   
>   static const struct i915_request *
> @@ -32,16 +37,15 @@ void i915_sched_node_init(struct i915_sched_node *node)
>   }
>   
>   static struct i915_dependency *
> -i915_dependency_alloc(struct drm_i915_private *i915)
> +i915_dependency_alloc(void)
>   {
> -	return kmem_cache_alloc(i915->dependencies, GFP_KERNEL);
> +	return kmem_cache_alloc(global.slab_dependencies, GFP_KERNEL);
>   }
>   
>   static void
> -i915_dependency_free(struct drm_i915_private *i915,
> -		     struct i915_dependency *dep)
> +i915_dependency_free(struct i915_dependency *dep)
>   {
> -	kmem_cache_free(i915->dependencies, dep);
> +	kmem_cache_free(global.slab_dependencies, dep);
>   }
>   
>   bool __i915_sched_node_add_dependency(struct i915_sched_node *node,
> @@ -68,25 +72,23 @@ bool __i915_sched_node_add_dependency(struct i915_sched_node *node,
>   	return ret;
>   }
>   
> -int i915_sched_node_add_dependency(struct drm_i915_private *i915,
> -				   struct i915_sched_node *node,
> +int i915_sched_node_add_dependency(struct i915_sched_node *node,
>   				   struct i915_sched_node *signal)
>   {
>   	struct i915_dependency *dep;
>   
> -	dep = i915_dependency_alloc(i915);
> +	dep = i915_dependency_alloc();
>   	if (!dep)
>   		return -ENOMEM;
>   
>   	if (!__i915_sched_node_add_dependency(node, signal, dep,
>   					      I915_DEPENDENCY_ALLOC))
> -		i915_dependency_free(i915, dep);
> +		i915_dependency_free(dep);
>   
>   	return 0;
>   }
>   
> -void i915_sched_node_fini(struct drm_i915_private *i915,
> -			  struct i915_sched_node *node)
> +void i915_sched_node_fini(struct i915_sched_node *node)
>   {
>   	struct i915_dependency *dep, *tmp;
>   
> @@ -106,7 +108,7 @@ void i915_sched_node_fini(struct drm_i915_private *i915,
>   
>   		list_del(&dep->wait_link);
>   		if (dep->flags & I915_DEPENDENCY_ALLOC)
> -			i915_dependency_free(i915, dep);
> +			i915_dependency_free(dep);
>   	}
>   
>   	/* Remove ourselves from everyone who depends upon us */
> @@ -116,7 +118,7 @@ void i915_sched_node_fini(struct drm_i915_private *i915,
>   
>   		list_del(&dep->signal_link);
>   		if (dep->flags & I915_DEPENDENCY_ALLOC)
> -			i915_dependency_free(i915, dep);
> +			i915_dependency_free(dep);
>   	}
>   
>   	spin_unlock(&schedule_lock);
> @@ -193,7 +195,7 @@ i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio)
>   	if (prio == I915_PRIORITY_NORMAL) {
>   		p = &execlists->default_priolist;
>   	} else {
> -		p = kmem_cache_alloc(engine->i915->priorities, GFP_ATOMIC);
> +		p = kmem_cache_alloc(global.slab_priorities, GFP_ATOMIC);
>   		/* Convert an allocation failure to a priority bump */
>   		if (unlikely(!p)) {
>   			prio = I915_PRIORITY_NORMAL; /* recurses just once */
> @@ -408,3 +410,39 @@ void i915_schedule_bump_priority(struct i915_request *rq, unsigned int bump)
>   
>   	spin_unlock_bh(&schedule_lock);
>   }
> +
> +void __i915_priolist_free(struct i915_priolist *p)
> +{
> +	kmem_cache_free(global.slab_priorities, p);
> +}
> +
> +int __init i915_global_scheduler_init(void)
> +{
> +	global.slab_dependencies = KMEM_CACHE(i915_dependency,
> +					      SLAB_HWCACHE_ALIGN);
> +	if (!global.slab_dependencies)
> +		return -ENOMEM;
> +
> +	global.slab_priorities = KMEM_CACHE(i915_priolist,
> +					    SLAB_HWCACHE_ALIGN);
> +	if (!global.slab_priorities)
> +		goto err_priorities;
> +
> +	return 0;
> +
> +err_priorities:
> +	kmem_cache_destroy(global.slab_dependencies);
> +	return -ENOMEM;
> +}
> +
> +void i915_global_scheduler_shrink(void)
> +{
> +	kmem_cache_shrink(global.slab_dependencies);
> +	kmem_cache_shrink(global.slab_priorities);
> +}
> +
> +void i915_global_scheduler_exit(void)
> +{
> +	kmem_cache_destroy(global.slab_dependencies);
> +	kmem_cache_destroy(global.slab_priorities);
> +}
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
> index 54bd6c89817e..5196ce07b6c2 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.h
> +++ b/drivers/gpu/drm/i915/i915_scheduler.h
> @@ -85,6 +85,23 @@ struct i915_dependency {
>   #define I915_DEPENDENCY_ALLOC BIT(0)
>   };
>   
> +struct i915_priolist {
> +	struct list_head requests[I915_PRIORITY_COUNT];
> +	struct rb_node node;
> +	unsigned long used;
> +	int priority;
> +};
> +
> +#define priolist_for_each_request(it, plist, idx) \
> +	for (idx = 0; idx < ARRAY_SIZE((plist)->requests); idx++) \
> +		list_for_each_entry(it, &(plist)->requests[idx], sched.link)
> +
> +#define priolist_for_each_request_consume(it, n, plist, idx) \
> +	for (; (idx = ffs((plist)->used)); (plist)->used &= ~BIT(idx - 1)) \
> +		list_for_each_entry_safe(it, n, \
> +					 &(plist)->requests[idx - 1], \
> +					 sched.link)
> +
>   void i915_sched_node_init(struct i915_sched_node *node);
>   
>   bool __i915_sched_node_add_dependency(struct i915_sched_node *node,
> @@ -92,12 +109,10 @@ bool __i915_sched_node_add_dependency(struct i915_sched_node *node,
>   				      struct i915_dependency *dep,
>   				      unsigned long flags);
>   
> -int i915_sched_node_add_dependency(struct drm_i915_private *i915,
> -				   struct i915_sched_node *node,
> +int i915_sched_node_add_dependency(struct i915_sched_node *node,
>   				   struct i915_sched_node *signal);
>   
> -void i915_sched_node_fini(struct drm_i915_private *i915,
> -			  struct i915_sched_node *node);
> +void i915_sched_node_fini(struct i915_sched_node *node);
>   
>   void i915_schedule(struct i915_request *request,
>   		   const struct i915_sched_attr *attr);
> @@ -107,4 +122,15 @@ void i915_schedule_bump_priority(struct i915_request *rq, unsigned int bump);
>   struct list_head *
>   i915_sched_lookup_priolist(struct intel_engine_cs *engine, int prio);
>   
> +void __i915_priolist_free(struct i915_priolist *p);
> +static inline void i915_priolist_free(struct i915_priolist *p)
> +{
> +	if (p->priority != I915_PRIORITY_NORMAL)
> +		__i915_priolist_free(p);
> +}
> +
> +int i915_global_scheduler_init(void);
> +void i915_global_scheduler_shrink(void);
> +void i915_global_scheduler_exit(void);
> +
>   #endif /* _I915_SCHEDULER_H_ */
> diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
> index 8bc8aa54aa35..4cf94513615d 100644
> --- a/drivers/gpu/drm/i915/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/intel_guc_submission.c
> @@ -781,8 +781,7 @@ static bool __guc_dequeue(struct intel_engine_cs *engine)
>   		}
>   
>   		rb_erase_cached(&p->node, &execlists->queue);
> -		if (p->priority != I915_PRIORITY_NORMAL)
> -			kmem_cache_free(engine->i915->priorities, p);
> +		i915_priolist_free(p);
>   	}
>   done:
>   	execlists->queue_priority_hint =
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 58108aa290d8..553371e654d7 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -806,8 +806,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   		}
>   
>   		rb_erase_cached(&p->node, &execlists->queue);
> -		if (p->priority != I915_PRIORITY_NORMAL)
> -			kmem_cache_free(engine->i915->priorities, p);
> +		i915_priolist_free(p);
>   	}
>   
>   done:
> @@ -966,8 +965,7 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
>   		}
>   
>   		rb_erase_cached(&p->node, &execlists->queue);
> -		if (p->priority != I915_PRIORITY_NORMAL)
> -			kmem_cache_free(engine->i915->priorities, p);
> +		i915_priolist_free(p);
>   	}
>   
>   	intel_write_status_page(engine,
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 710ffb221775..e7d85aaee415 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -185,23 +185,6 @@ enum intel_engine_id {
>   #define _VECS(n) (VECS + (n))
>   };
>   
> -struct i915_priolist {
> -	struct list_head requests[I915_PRIORITY_COUNT];
> -	struct rb_node node;
> -	unsigned long used;
> -	int priority;
> -};
> -
> -#define priolist_for_each_request(it, plist, idx) \
> -	for (idx = 0; idx < ARRAY_SIZE((plist)->requests); idx++) \
> -		list_for_each_entry(it, &(plist)->requests[idx], sched.link)
> -
> -#define priolist_for_each_request_consume(it, n, plist, idx) \
> -	for (; (idx = ffs((plist)->used)); (plist)->used &= ~BIT(idx - 1)) \
> -		list_for_each_entry_safe(it, n, \
> -					 &(plist)->requests[idx - 1], \
> -					 sched.link)
> -
>   struct st_preempt_hang {
>   	struct completion completion;
>   	unsigned int count;
> diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
> index 263afd2f1596..1a3af4b4107d 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
> @@ -441,7 +441,7 @@ static struct i915_request *dummy_request(struct intel_engine_cs *engine)
>   static void dummy_request_free(struct i915_request *dummy)
>   {
>   	i915_request_mark_complete(dummy);
> -	i915_sched_node_fini(dummy->engine->i915, &dummy->sched);
> +	i915_sched_node_fini(&dummy->sched);
>   	kfree(dummy);
>   }
>   
> diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
> index 08f0cab02e0f..0d35af07867b 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_engine.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
> @@ -76,28 +76,27 @@ static void mock_ring_free(struct intel_ring *base)
>   	kfree(ring);
>   }
>   
> -static struct mock_request *first_request(struct mock_engine *engine)
> +static struct i915_request *first_request(struct mock_engine *engine)
>   {
>   	return list_first_entry_or_null(&engine->hw_queue,
> -					struct mock_request,
> -					link);
> +					struct i915_request,
> +					mock.link);
>   }
>   
> -static void advance(struct mock_request *request)
> +static void advance(struct i915_request *request)
>   {
> -	list_del_init(&request->link);
> -	intel_engine_write_global_seqno(request->base.engine,
> -					request->base.global_seqno);
> -	i915_request_mark_complete(&request->base);
> -	GEM_BUG_ON(!i915_request_completed(&request->base));
> +	list_del_init(&request->mock.link);
> +	intel_engine_write_global_seqno(request->engine, request->global_seqno);
> +	i915_request_mark_complete(request);
> +	GEM_BUG_ON(!i915_request_completed(request));
>   
> -	intel_engine_queue_breadcrumbs(request->base.engine);
> +	intel_engine_queue_breadcrumbs(request->engine);
>   }
>   
>   static void hw_delay_complete(struct timer_list *t)
>   {
>   	struct mock_engine *engine = from_timer(engine, t, hw_delay);
> -	struct mock_request *request;
> +	struct i915_request *request;
>   	unsigned long flags;
>   
>   	spin_lock_irqsave(&engine->hw_lock, flags);
> @@ -112,8 +111,9 @@ static void hw_delay_complete(struct timer_list *t)
>   	 * requeue the timer for the next delayed request.
>   	 */
>   	while ((request = first_request(engine))) {
> -		if (request->delay) {
> -			mod_timer(&engine->hw_delay, jiffies + request->delay);
> +		if (request->mock.delay) {
> +			mod_timer(&engine->hw_delay,
> +				  jiffies + request->mock.delay);
>   			break;
>   		}
>   
> @@ -171,10 +171,8 @@ mock_context_pin(struct intel_engine_cs *engine,
>   
>   static int mock_request_alloc(struct i915_request *request)
>   {
> -	struct mock_request *mock = container_of(request, typeof(*mock), base);
> -
> -	INIT_LIST_HEAD(&mock->link);
> -	mock->delay = 0;
> +	INIT_LIST_HEAD(&request->mock.link);
> +	request->mock.delay = 0;
>   
>   	return 0;
>   }
> @@ -192,7 +190,6 @@ static u32 *mock_emit_breadcrumb(struct i915_request *request, u32 *cs)
>   
>   static void mock_submit_request(struct i915_request *request)
>   {
> -	struct mock_request *mock = container_of(request, typeof(*mock), base);
>   	struct mock_engine *engine =
>   		container_of(request->engine, typeof(*engine), base);
>   	unsigned long flags;
> @@ -201,12 +198,13 @@ static void mock_submit_request(struct i915_request *request)
>   	GEM_BUG_ON(!request->global_seqno);
>   
>   	spin_lock_irqsave(&engine->hw_lock, flags);
> -	list_add_tail(&mock->link, &engine->hw_queue);
> -	if (mock->link.prev == &engine->hw_queue) {
> -		if (mock->delay)
> -			mod_timer(&engine->hw_delay, jiffies + mock->delay);
> +	list_add_tail(&request->mock.link, &engine->hw_queue);
> +	if (list_is_first(&request->mock.link, &engine->hw_queue)) {
> +		if (request->mock.delay)
> +			mod_timer(&engine->hw_delay,
> +				  jiffies + request->mock.delay);
>   		else
> -			advance(mock);
> +			advance(request);
>   	}
>   	spin_unlock_irqrestore(&engine->hw_lock, flags);
>   }
> @@ -266,12 +264,12 @@ void mock_engine_flush(struct intel_engine_cs *engine)
>   {
>   	struct mock_engine *mock =
>   		container_of(engine, typeof(*mock), base);
> -	struct mock_request *request, *rn;
> +	struct i915_request *request, *rn;
>   
>   	del_timer_sync(&mock->hw_delay);
>   
>   	spin_lock_irq(&mock->hw_lock);
> -	list_for_each_entry_safe(request, rn, &mock->hw_queue, link)
> +	list_for_each_entry_safe(request, rn, &mock->hw_queue, mock.link)
>   		advance(request);
>   	spin_unlock_irq(&mock->hw_lock);
>   }
> diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> index fc516a2970f4..5a98caba6d69 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> @@ -79,9 +79,6 @@ static void mock_device_release(struct drm_device *dev)
>   
>   	destroy_workqueue(i915->wq);
>   
> -	kmem_cache_destroy(i915->priorities);
> -	kmem_cache_destroy(i915->dependencies);
> -	kmem_cache_destroy(i915->requests);
>   	kmem_cache_destroy(i915->vmas);
>   	kmem_cache_destroy(i915->objects);
>   
> @@ -211,23 +208,6 @@ struct drm_i915_private *mock_gem_device(void)
>   	if (!i915->vmas)
>   		goto err_objects;
>   
> -	i915->requests = KMEM_CACHE(mock_request,
> -				    SLAB_HWCACHE_ALIGN |
> -				    SLAB_RECLAIM_ACCOUNT |
> -				    SLAB_TYPESAFE_BY_RCU);
> -	if (!i915->requests)
> -		goto err_vmas;
> -
> -	i915->dependencies = KMEM_CACHE(i915_dependency,
> -					SLAB_HWCACHE_ALIGN |
> -					SLAB_RECLAIM_ACCOUNT);
> -	if (!i915->dependencies)
> -		goto err_requests;
> -
> -	i915->priorities = KMEM_CACHE(i915_priolist, SLAB_HWCACHE_ALIGN);
> -	if (!i915->priorities)
> -		goto err_dependencies;
> -
>   	i915_timelines_init(i915);
>   
>   	INIT_LIST_HEAD(&i915->gt.active_rings);
> @@ -257,12 +237,6 @@ struct drm_i915_private *mock_gem_device(void)
>   err_unlock:
>   	mutex_unlock(&i915->drm.struct_mutex);
>   	i915_timelines_fini(i915);
> -	kmem_cache_destroy(i915->priorities);
> -err_dependencies:
> -	kmem_cache_destroy(i915->dependencies);
> -err_requests:
> -	kmem_cache_destroy(i915->requests);
> -err_vmas:
>   	kmem_cache_destroy(i915->vmas);
>   err_objects:
>   	kmem_cache_destroy(i915->objects);
> diff --git a/drivers/gpu/drm/i915/selftests/mock_request.c b/drivers/gpu/drm/i915/selftests/mock_request.c
> index 0dc29e242597..d1a7c9608712 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_request.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_request.c
> @@ -31,29 +31,25 @@ mock_request(struct intel_engine_cs *engine,
>   	     unsigned long delay)
>   {
>   	struct i915_request *request;
> -	struct mock_request *mock;
>   
>   	/* NB the i915->requests slab cache is enlarged to fit mock_request */
>   	request = i915_request_alloc(engine, context);
>   	if (IS_ERR(request))
>   		return NULL;
>   
> -	mock = container_of(request, typeof(*mock), base);
> -	mock->delay = delay;
> -
> -	return &mock->base;
> +	request->mock.delay = delay;
> +	return request;
>   }
>   
>   bool mock_cancel_request(struct i915_request *request)
>   {
> -	struct mock_request *mock = container_of(request, typeof(*mock), base);
>   	struct mock_engine *engine =
>   		container_of(request->engine, typeof(*engine), base);
>   	bool was_queued;
>   
>   	spin_lock_irq(&engine->hw_lock);
> -	was_queued = !list_empty(&mock->link);
> -	list_del_init(&mock->link);
> +	was_queued = !list_empty(&request->mock.link);
> +	list_del_init(&request->mock.link);
>   	spin_unlock_irq(&engine->hw_lock);
>   
>   	if (was_queued)
> diff --git a/drivers/gpu/drm/i915/selftests/mock_request.h b/drivers/gpu/drm/i915/selftests/mock_request.h
> index 995fb728380c..4acf0211df20 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_request.h
> +++ b/drivers/gpu/drm/i915/selftests/mock_request.h
> @@ -29,13 +29,6 @@
>   
>   #include "../i915_request.h"
>   
> -struct mock_request {
> -	struct i915_request base;
> -
> -	struct list_head link;
> -	unsigned long delay;
> -};
> -
>   struct i915_request *
>   mock_request(struct intel_engine_cs *engine,
>   	     struct i915_gem_context *context,
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 13/46] drm/i915: Compute the global scheduler caps
  2019-02-06 13:03 ` [PATCH 13/46] drm/i915: Compute the global scheduler caps Chris Wilson
@ 2019-02-11 12:24   ` Tvrtko Ursulin
  2019-02-11 12:33     ` Chris Wilson
  0 siblings, 1 reply; 97+ messages in thread
From: Tvrtko Ursulin @ 2019-02-11 12:24 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 06/02/2019 13:03, Chris Wilson wrote:
> Do a pass over all the engines upon starting to determine the global
> scheduler capability flags (those that are agreed upon by all).
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_gem.c         |  2 ++
>   drivers/gpu/drm/i915/intel_engine_cs.c  | 39 +++++++++++++++++++++++++
>   drivers/gpu/drm/i915/intel_lrc.c        |  6 ----
>   drivers/gpu/drm/i915/intel_ringbuffer.h |  2 ++
>   4 files changed, 43 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index d18c4ccff370..04fa184fdff5 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4728,6 +4728,8 @@ static int __i915_gem_restart_engines(void *data)
>   		}
>   	}
>   
> +	intel_engines_set_scheduler_caps(i915);
> +
>   	return 0;
>   }
>   
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index 49fa43ff02ba..02ee86159adc 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -614,6 +614,45 @@ int intel_engine_setup_common(struct intel_engine_cs *engine)
>   	return err;
>   }
>   
> +void intel_engines_set_scheduler_caps(struct drm_i915_private *i915)
> +{
> +	static const struct {
> +		u8 engine;
> +		u8 sched;
> +	} map[] = {
> +#define MAP(x, y) { ilog2(I915_ENGINE_HAS_##x), ilog2(I915_SCHEDULER_CAP_##y) }
> +		MAP(PREEMPTION, PREEMPTION),
> +#undef MAP
> +	};
> +	struct intel_engine_cs *engine;
> +	enum intel_engine_id id;
> +	u32 enabled, disabled;
> +
> +	enabled = 0;
> +	disabled = 0;
> +	for_each_engine(engine, i915, id) { /* all engines must agree! */
> +		int i;
> +
> +		if (engine->schedule)
> +			enabled |= (I915_SCHEDULER_CAP_ENABLED |
> +				    I915_SCHEDULER_CAP_PRIORITY);
> +		else
> +			disabled |= (I915_SCHEDULER_CAP_ENABLED |
> +				     I915_SCHEDULER_CAP_PRIORITY);
> +
> +		for (i = 0; i < ARRAY_SIZE(map); i++) {
> +			if (engine->flags & BIT(map[i].engine))
> +				enabled |= BIT(map[i].sched);
> +			else
> +				disabled |= BIT(map[i].sched);
> +		}
> +	}
> +
> +	i915->caps.scheduler = enabled & ~disabled;
> +	if (!(i915->caps.scheduler & I915_SCHEDULER_CAP_ENABLED))
> +		i915->caps.scheduler = 0;

This effectively means that as soon as engine->schedule is NULL for one
engine, the scheduler caps will be zero. I am thinking it would
potentially read clearer to just return from the else branch of the
if (engine->schedule) in that case.

We may or may not need to zero i915->caps.scheduler at the beginning of
the function then, depending on whether we think the configuration can
change dynamically at runtime.
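
Roughly what I had in mind, as an untested sketch only (reusing your
map[] and the same flags):

    i915->caps.scheduler = 0;

    for_each_engine(engine, i915, id) {
            int i;

            if (!engine->schedule)
                    return; /* no caps unless every engine agrees */

            enabled |= (I915_SCHEDULER_CAP_ENABLED |
                        I915_SCHEDULER_CAP_PRIORITY);

            for (i = 0; i < ARRAY_SIZE(map); i++) {
                    if (engine->flags & BIT(map[i].engine))
                            enabled |= BIT(map[i].sched);
                    else
                            disabled |= BIT(map[i].sched);
            }
    }

    i915->caps.scheduler = enabled & ~disabled;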

Regards,

Tvrtko


> +}
> +
>   static void __intel_context_unpin(struct i915_gem_context *ctx,
>   				  struct intel_engine_cs *engine)
>   {
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index da5120283263..59891cca35c1 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -2341,12 +2341,6 @@ void intel_execlists_set_default_submission(struct intel_engine_cs *engine)
>   	engine->flags |= I915_ENGINE_SUPPORTS_STATS;
>   	if (engine->i915->preempt_context)
>   		engine->flags |= I915_ENGINE_HAS_PREEMPTION;
> -
> -	engine->i915->caps.scheduler =
> -		I915_SCHEDULER_CAP_ENABLED |
> -		I915_SCHEDULER_CAP_PRIORITY;
> -	if (intel_engine_has_preemption(engine))
> -		engine->i915->caps.scheduler |= I915_SCHEDULER_CAP_PREEMPTION;
>   }
>   
>   static void
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index e7d85aaee415..19faa19f2529 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -574,6 +574,8 @@ intel_engine_has_preemption(const struct intel_engine_cs *engine)
>   	return engine->flags & I915_ENGINE_HAS_PREEMPTION;
>   }
>   
> +void intel_engines_set_scheduler_caps(struct drm_i915_private *i915);
> +
>   static inline bool __execlists_need_preempt(int prio, int last)
>   {
>   	/*
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 13/46] drm/i915: Compute the global scheduler caps
  2019-02-11 12:24   ` Tvrtko Ursulin
@ 2019-02-11 12:33     ` Chris Wilson
  0 siblings, 0 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-11 12:33 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-02-11 12:24:31)
> 
> On 06/02/2019 13:03, Chris Wilson wrote:
> > Do a pass over all the engines upon starting to determine the global
> > scheduler capability flags (those that are agreed upon by all).
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >   drivers/gpu/drm/i915/i915_gem.c         |  2 ++
> >   drivers/gpu/drm/i915/intel_engine_cs.c  | 39 +++++++++++++++++++++++++
> >   drivers/gpu/drm/i915/intel_lrc.c        |  6 ----
> >   drivers/gpu/drm/i915/intel_ringbuffer.h |  2 ++
> >   4 files changed, 43 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index d18c4ccff370..04fa184fdff5 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -4728,6 +4728,8 @@ static int __i915_gem_restart_engines(void *data)
> >               }
> >       }
> >   
> > +     intel_engines_set_scheduler_caps(i915);
> > +
> >       return 0;
> >   }
> >   
> > diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> > index 49fa43ff02ba..02ee86159adc 100644
> > --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> > +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> > @@ -614,6 +614,45 @@ int intel_engine_setup_common(struct intel_engine_cs *engine)
> >       return err;
> >   }
> >   
> > +void intel_engines_set_scheduler_caps(struct drm_i915_private *i915)
> > +{
> > +     static const struct {
> > +             u8 engine;
> > +             u8 sched;
> > +     } map[] = {
> > +#define MAP(x, y) { ilog2(I915_ENGINE_HAS_##x), ilog2(I915_SCHEDULER_CAP_##y) }
> > +             MAP(PREEMPTION, PREEMPTION),
> > +#undef MAP
> > +     };
> > +     struct intel_engine_cs *engine;
> > +     enum intel_engine_id id;
> > +     u32 enabled, disabled;
> > +
> > +     enabled = 0;
> > +     disabled = 0;
> > +     for_each_engine(engine, i915, id) { /* all engines must agree! */
> > +             int i;
> > +
> > +             if (engine->schedule)
> > +                     enabled |= (I915_SCHEDULER_CAP_ENABLED |
> > +                                 I915_SCHEDULER_CAP_PRIORITY);
> > +             else
> > +                     disabled |= (I915_SCHEDULER_CAP_ENABLED |
> > +                                  I915_SCHEDULER_CAP_PRIORITY);
> > +
> > +             for (i = 0; i < ARRAY_SIZE(map); i++) {
> > +                     if (engine->flags & BIT(map[i].engine))
> > +                             enabled |= BIT(map[i].sched);
> > +                     else
> > +                             disabled |= BIT(map[i].sched);
> > +             }
> > +     }
> > +
> > +     i915->caps.scheduler = enabled & ~disabled;
> > +     if (!(i915->caps.scheduler & I915_SCHEDULER_CAP_ENABLED))
> > +             i915->caps.scheduler = 0;
> 
> This effectively means that as soon as engine->schedule is NULL for one
> engine, the scheduler caps will be zero. I am thinking it would
> potentially read clearer to just return from the else branch of the
> if (engine->schedule) in that case.

I thought it was nice to have the same pattern throughout the loop, with
the final fixup making sure that all caps were zero if the global
scheduler was disabled. Whether that fixup actually makes sense is
another question, as it seems a little overprotective given that we
already have an explicit on/off bit (with the rest showing what you
could have won!).
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 18/46] drm/i915: Replace global_seqno with a hangcheck heartbeat seqno
  2019-02-06 13:03 ` [PATCH 18/46] drm/i915: Replace global_seqno with a hangcheck heartbeat seqno Chris Wilson
@ 2019-02-11 12:40   ` Tvrtko Ursulin
  2019-02-11 12:44     ` Chris Wilson
  0 siblings, 1 reply; 97+ messages in thread
From: Tvrtko Ursulin @ 2019-02-11 12:40 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 06/02/2019 13:03, Chris Wilson wrote:
> To determine whether an engine has 'stuck', we simply check whether or
> not is still on the same seqno for several seconds. To keep this simple
> mechanism intact over the loss of a global seqno, we can simply add a
> new global heartbeat seqno instead. As we cannot know the sequence in
> which requests will then be completed, we use a primitive random number
> generator instead (with a cycle long enough to not matter over an
> interval of a few thousand requests between hangcheck samples).

We couldn't keep the global seqno just for hangcheck purposes? I mean as 
long as it is unique, which would be guaranteed by obtaining an 
increment on every submission to hw and storing it in atomic_t 
i915->hangcheck_global_seqno / rq->hangcheck_global_seqno, hangcheck 
does not care about the order of execution, no?
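
I.e. something along these lines, purely illustrative:

     rq->hangcheck_global_seqno =
                atomic_inc_return(&i915->hangcheck_global_seqno);

assigned once per submission and then emitted in the breadcrumb in place
of the pseudo-random value.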

Regards,

Tvrtko


> The alternative to using a dedicated seqno on every request is to issue
> a heartbeat request and query its progress through the system. Sadly
> this requires us to reduce struct_mutex so that we can issue requests
> without requiring that bkl.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_debugfs.c     |  7 ++---
>   drivers/gpu/drm/i915/intel_engine_cs.c  |  5 ++--
>   drivers/gpu/drm/i915/intel_hangcheck.c  |  6 ++---
>   drivers/gpu/drm/i915/intel_lrc.c        | 15 +++++++++++
>   drivers/gpu/drm/i915/intel_ringbuffer.c | 36 +++++++++++++++++++++++--
>   drivers/gpu/drm/i915/intel_ringbuffer.h | 19 ++++++++++++-
>   6 files changed, 77 insertions(+), 11 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index af53a2d07f6b..846bd0de3cfa 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -1295,7 +1295,7 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
>   	with_intel_runtime_pm(dev_priv, wakeref) {
>   		for_each_engine(engine, dev_priv, id) {
>   			acthd[id] = intel_engine_get_active_head(engine);
> -			seqno[id] = intel_engine_get_seqno(engine);
> +			seqno[id] = intel_engine_get_hangcheck_seqno(engine);
>   		}
>   
>   		intel_engine_get_instdone(dev_priv->engine[RCS], &instdone);
> @@ -1315,8 +1315,9 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
>   	for_each_engine(engine, dev_priv, id) {
>   		seq_printf(m, "%s:\n", engine->name);
>   		seq_printf(m, "\tseqno = %x [current %x, last %x], %dms ago\n",
> -			   engine->hangcheck.seqno, seqno[id],
> -			   intel_engine_last_submit(engine),
> +			   engine->hangcheck.last_seqno,
> +			   seqno[id],
> +			   engine->hangcheck.next_seqno,
>   			   jiffies_to_msecs(jiffies -
>   					    engine->hangcheck.action_timestamp));
>   
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index 57cfc4c551c9..e1e54b7448b4 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -1538,10 +1538,11 @@ void intel_engine_dump(struct intel_engine_cs *engine,
>   	if (i915_terminally_wedged(&engine->i915->gpu_error))
>   		drm_printf(m, "*** WEDGED ***\n");
>   
> -	drm_printf(m, "\tcurrent seqno %x, last %x, hangcheck %x [%d ms]\n",
> +	drm_printf(m, "\tcurrent seqno %x, last %x, hangcheck %x/%x [%d ms]\n",
>   		   intel_engine_get_seqno(engine),
>   		   intel_engine_last_submit(engine),
> -		   engine->hangcheck.seqno,
> +		   engine->hangcheck.last_seqno,
> +		   engine->hangcheck.next_seqno,
>   		   jiffies_to_msecs(jiffies - engine->hangcheck.action_timestamp));
>   	drm_printf(m, "\tReset count: %d (global %d)\n",
>   		   i915_reset_engine_count(error, engine),
> diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
> index a219c796e56d..e04b2560369e 100644
> --- a/drivers/gpu/drm/i915/intel_hangcheck.c
> +++ b/drivers/gpu/drm/i915/intel_hangcheck.c
> @@ -133,21 +133,21 @@ static void hangcheck_load_sample(struct intel_engine_cs *engine,
>   				  struct hangcheck *hc)
>   {
>   	hc->acthd = intel_engine_get_active_head(engine);
> -	hc->seqno = intel_engine_get_seqno(engine);
> +	hc->seqno = intel_engine_get_hangcheck_seqno(engine);
>   }
>   
>   static void hangcheck_store_sample(struct intel_engine_cs *engine,
>   				   const struct hangcheck *hc)
>   {
>   	engine->hangcheck.acthd = hc->acthd;
> -	engine->hangcheck.seqno = hc->seqno;
> +	engine->hangcheck.last_seqno = hc->seqno;
>   }
>   
>   static enum intel_engine_hangcheck_action
>   hangcheck_get_action(struct intel_engine_cs *engine,
>   		     const struct hangcheck *hc)
>   {
> -	if (engine->hangcheck.seqno != hc->seqno)
> +	if (engine->hangcheck.last_seqno != hc->seqno)
>   		return ENGINE_ACTIVE_SEQNO;
>   
>   	if (intel_engine_is_idle(engine))
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index d105187070d4..342d3a91be03 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -180,6 +180,12 @@ static inline u32 intel_hws_seqno_address(struct intel_engine_cs *engine)
>   		I915_GEM_HWS_INDEX_ADDR);
>   }
>   
> +static inline u32 intel_hws_hangcheck_address(struct intel_engine_cs *engine)
> +{
> +	return (i915_ggtt_offset(engine->status_page.vma) +
> +		I915_GEM_HWS_HANGCHECK_ADDR);
> +}
> +
>   static inline struct i915_priolist *to_priolist(struct rb_node *rb)
>   {
>   	return rb_entry(rb, struct i915_priolist, node);
> @@ -2235,6 +2241,10 @@ static u32 *gen8_emit_fini_breadcrumb(struct i915_request *request, u32 *cs)
>   				  request->fence.seqno,
>   				  request->timeline->hwsp_offset);
>   
> +	cs = gen8_emit_ggtt_write(cs,
> +				  intel_engine_next_hangcheck_seqno(request->engine),
> +				  intel_hws_hangcheck_address(request->engine));
> +
>   	cs = gen8_emit_ggtt_write(cs,
>   				  request->global_seqno,
>   				  intel_hws_seqno_address(request->engine));
> @@ -2259,6 +2269,11 @@ static u32 *gen8_emit_fini_breadcrumb_rcs(struct i915_request *request, u32 *cs)
>   				      PIPE_CONTROL_FLUSH_ENABLE |
>   				      PIPE_CONTROL_CS_STALL);
>   
> +	cs = gen8_emit_ggtt_write_rcs(cs,
> +				      intel_engine_next_hangcheck_seqno(request->engine),
> +				      intel_hws_hangcheck_address(request->engine),
> +				      PIPE_CONTROL_CS_STALL);
> +
>   	cs = gen8_emit_ggtt_write_rcs(cs,
>   				      request->global_seqno,
>   				      intel_hws_seqno_address(request->engine),
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 7f841dba87b3..870184bbd169 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -43,6 +43,12 @@
>    */
>   #define LEGACY_REQUEST_SIZE 200
>   
> +static inline u32 hws_hangcheck_address(struct intel_engine_cs *engine)
> +{
> +	return (i915_ggtt_offset(engine->status_page.vma) +
> +		I915_GEM_HWS_HANGCHECK_ADDR);
> +}
> +
>   static inline u32 intel_hws_seqno_address(struct intel_engine_cs *engine)
>   {
>   	return (i915_ggtt_offset(engine->status_page.vma) +
> @@ -316,6 +322,11 @@ static u32 *gen6_rcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
>   	*cs++ = rq->timeline->hwsp_offset | PIPE_CONTROL_GLOBAL_GTT;
>   	*cs++ = rq->fence.seqno;
>   
> +	*cs++ = GFX_OP_PIPE_CONTROL(4);
> +	*cs++ = PIPE_CONTROL_QW_WRITE;
> +	*cs++ = hws_hangcheck_address(rq->engine) | PIPE_CONTROL_GLOBAL_GTT;
> +	*cs++ = intel_engine_next_hangcheck_seqno(rq->engine);
> +
>   	*cs++ = GFX_OP_PIPE_CONTROL(4);
>   	*cs++ = PIPE_CONTROL_QW_WRITE | PIPE_CONTROL_CS_STALL;
>   	*cs++ = intel_hws_seqno_address(rq->engine) | PIPE_CONTROL_GLOBAL_GTT;
> @@ -422,6 +433,11 @@ static u32 *gen7_rcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
>   	*cs++ = rq->timeline->hwsp_offset;
>   	*cs++ = rq->fence.seqno;
>   
> +	*cs++ = GFX_OP_PIPE_CONTROL(4);
> +	*cs++ = PIPE_CONTROL_QW_WRITE | PIPE_CONTROL_GLOBAL_GTT_IVB;
> +	*cs++ = hws_hangcheck_address(rq->engine);
> +	*cs++ = intel_engine_next_hangcheck_seqno(rq->engine);
> +
>   	*cs++ = GFX_OP_PIPE_CONTROL(4);
>   	*cs++ = (PIPE_CONTROL_QW_WRITE |
>   		 PIPE_CONTROL_GLOBAL_GTT_IVB |
> @@ -447,12 +463,15 @@ static u32 *gen6_xcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
>   	*cs++ = I915_GEM_HWS_SEQNO_ADDR | MI_FLUSH_DW_USE_GTT;
>   	*cs++ = rq->fence.seqno;
>   
> +	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX;
> +	*cs++ = I915_GEM_HWS_HANGCHECK_ADDR | MI_FLUSH_DW_USE_GTT;
> +	*cs++ = intel_engine_next_hangcheck_seqno(rq->engine);
> +
>   	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX;
>   	*cs++ = I915_GEM_HWS_INDEX_ADDR | MI_FLUSH_DW_USE_GTT;
>   	*cs++ = rq->global_seqno;
>   
>   	*cs++ = MI_USER_INTERRUPT;
> -	*cs++ = MI_NOOP;
>   
>   	rq->tail = intel_ring_offset(rq, cs);
>   	assert_ring_tail_valid(rq->ring, rq->tail);
> @@ -472,6 +491,10 @@ static u32 *gen7_xcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
>   	*cs++ = I915_GEM_HWS_SEQNO_ADDR | MI_FLUSH_DW_USE_GTT;
>   	*cs++ = rq->fence.seqno;
>   
> +	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX;
> +	*cs++ = I915_GEM_HWS_HANGCHECK_ADDR | MI_FLUSH_DW_USE_GTT;
> +	*cs++ = intel_engine_next_hangcheck_seqno(rq->engine);
> +
>   	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX;
>   	*cs++ = I915_GEM_HWS_INDEX_ADDR | MI_FLUSH_DW_USE_GTT;
>   	*cs++ = rq->global_seqno;
> @@ -487,6 +510,7 @@ static u32 *gen7_xcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
>   	*cs++ = 0;
>   
>   	*cs++ = MI_USER_INTERRUPT;
> +	*cs++ = MI_NOOP;
>   
>   	rq->tail = intel_ring_offset(rq, cs);
>   	assert_ring_tail_valid(rq->ring, rq->tail);
> @@ -930,11 +954,16 @@ static u32 *i9xx_emit_breadcrumb(struct i915_request *rq, u32 *cs)
>   	*cs++ = I915_GEM_HWS_SEQNO_ADDR;
>   	*cs++ = rq->fence.seqno;
>   
> +	*cs++ = MI_STORE_DWORD_INDEX;
> +	*cs++ = I915_GEM_HWS_HANGCHECK_ADDR;
> +	*cs++ = intel_engine_next_hangcheck_seqno(rq->engine);
> +
>   	*cs++ = MI_STORE_DWORD_INDEX;
>   	*cs++ = I915_GEM_HWS_INDEX_ADDR;
>   	*cs++ = rq->global_seqno;
>   
>   	*cs++ = MI_USER_INTERRUPT;
> +	*cs++ = MI_NOOP;
>   
>   	rq->tail = intel_ring_offset(rq, cs);
>   	assert_ring_tail_valid(rq->ring, rq->tail);
> @@ -956,6 +985,10 @@ static u32 *gen5_emit_breadcrumb(struct i915_request *rq, u32 *cs)
>   	*cs++ = I915_GEM_HWS_SEQNO_ADDR;
>   	*cs++ = rq->fence.seqno;
>   
> +	*cs++ = MI_STORE_DWORD_INDEX;
> +	*cs++ = I915_GEM_HWS_HANGCHECK_ADDR;
> +	*cs++ = intel_engine_next_hangcheck_seqno(rq->engine);
> +
>   	BUILD_BUG_ON(GEN5_WA_STORES < 1);
>   	for (i = 0; i < GEN5_WA_STORES; i++) {
>   		*cs++ = MI_STORE_DWORD_INDEX;
> @@ -964,7 +997,6 @@ static u32 *gen5_emit_breadcrumb(struct i915_request *rq, u32 *cs)
>   	}
>   
>   	*cs++ = MI_USER_INTERRUPT;
> -	*cs++ = MI_NOOP;
>   
>   	rq->tail = intel_ring_offset(rq, cs);
>   	assert_ring_tail_valid(rq->ring, rq->tail);
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 493a72ed01af..b30c37ac55a3 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -6,6 +6,7 @@
>   
>   #include <linux/hashtable.h>
>   #include <linux/irq_work.h>
> +#include <linux/random.h>
>   #include <linux/seqlock.h>
>   
>   #include "i915_gem_batch_pool.h"
> @@ -119,7 +120,8 @@ struct intel_instdone {
>   
>   struct intel_engine_hangcheck {
>   	u64 acthd;
> -	u32 seqno;
> +	u32 last_seqno;
> +	u32 next_seqno;
>   	unsigned long action_timestamp;
>   	struct intel_instdone instdone;
>   };
> @@ -718,6 +720,8 @@ intel_write_status_page(struct intel_engine_cs *engine, int reg, u32 value)
>   #define I915_GEM_HWS_INDEX_ADDR		(I915_GEM_HWS_INDEX * sizeof(u32))
>   #define I915_GEM_HWS_PREEMPT		0x32
>   #define I915_GEM_HWS_PREEMPT_ADDR	(I915_GEM_HWS_PREEMPT * sizeof(u32))
> +#define I915_GEM_HWS_HANGCHECK		0x34
> +#define I915_GEM_HWS_HANGCHECK_ADDR	(I915_GEM_HWS_HANGCHECK * sizeof(u32))
>   #define I915_GEM_HWS_SEQNO		0x40
>   #define I915_GEM_HWS_SEQNO_ADDR		(I915_GEM_HWS_SEQNO * sizeof(u32))
>   #define I915_GEM_HWS_SCRATCH		0x80
> @@ -1078,4 +1082,17 @@ static inline bool inject_preempt_hang(struct intel_engine_execlists *execlists)
>   
>   #endif
>   
> +static inline u32
> +intel_engine_next_hangcheck_seqno(struct intel_engine_cs *engine)
> +{
> +	return engine->hangcheck.next_seqno =
> +		next_pseudo_random32(engine->hangcheck.next_seqno);
> +}
> +
> +static inline u32
> +intel_engine_get_hangcheck_seqno(struct intel_engine_cs *engine)
> +{
> +	return intel_read_status_page(engine, I915_GEM_HWS_HANGCHECK);
> +}
> +
>   #endif /* _INTEL_RINGBUFFER_H_ */
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 10/46] drm/i915: Make request allocation caches global
  2019-02-11 11:43   ` Tvrtko Ursulin
@ 2019-02-11 12:40     ` Chris Wilson
  2019-02-11 17:02       ` Tvrtko Ursulin
  0 siblings, 1 reply; 97+ messages in thread
From: Chris Wilson @ 2019-02-11 12:40 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-02-11 11:43:41)
> 
> On 06/02/2019 13:03, Chris Wilson wrote:
> > As kmem_caches share the same properties (size, allocation/free behaviour)
> > for all potential devices, we can use global caches. While this
> > potential has worse fragmentation behaviour (one can argue that
> > different devices would have different activity lifetimes, but you can
> > also argue that activity is temporal across the system) it is the
> > default behaviour of the system at large to amalgamate matching caches.
> > 
> > The benefit for us is much reduced pointer dancing along the frequent
> > allocation paths.
> > 
> > v2: Defer shrinking until after a global grace period for futureproofing
> > multiple consumers of the slab caches, similar to the current strategy
> > for avoiding shrinking too early.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >   drivers/gpu/drm/i915/Makefile                 |   1 +
> >   drivers/gpu/drm/i915/i915_active.c            |   7 +-
> >   drivers/gpu/drm/i915/i915_active.h            |   1 +
> >   drivers/gpu/drm/i915/i915_drv.h               |   3 -
> >   drivers/gpu/drm/i915/i915_gem.c               |  34 +-----
> >   drivers/gpu/drm/i915/i915_globals.c           | 105 ++++++++++++++++++
> >   drivers/gpu/drm/i915/i915_globals.h           |  15 +++
> >   drivers/gpu/drm/i915/i915_pci.c               |   8 +-
> >   drivers/gpu/drm/i915/i915_request.c           |  53 +++++++--
> >   drivers/gpu/drm/i915/i915_request.h           |  10 ++
> >   drivers/gpu/drm/i915/i915_scheduler.c         |  66 ++++++++---
> >   drivers/gpu/drm/i915/i915_scheduler.h         |  34 +++++-
> >   drivers/gpu/drm/i915/intel_guc_submission.c   |   3 +-
> >   drivers/gpu/drm/i915/intel_lrc.c              |   6 +-
> >   drivers/gpu/drm/i915/intel_ringbuffer.h       |  17 ---
> >   drivers/gpu/drm/i915/selftests/intel_lrc.c    |   2 +-
> >   drivers/gpu/drm/i915/selftests/mock_engine.c  |  48 ++++----
> >   .../gpu/drm/i915/selftests/mock_gem_device.c  |  26 -----
> >   drivers/gpu/drm/i915/selftests/mock_request.c |  12 +-
> >   drivers/gpu/drm/i915/selftests/mock_request.h |   7 --
> >   20 files changed, 306 insertions(+), 152 deletions(-)
> >   create mode 100644 drivers/gpu/drm/i915/i915_globals.c
> >   create mode 100644 drivers/gpu/drm/i915/i915_globals.h
> > 
> > diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> > index 1787e1299b1b..a1d834068765 100644
> > --- a/drivers/gpu/drm/i915/Makefile
> > +++ b/drivers/gpu/drm/i915/Makefile
> > @@ -77,6 +77,7 @@ i915-y += \
> >         i915_gem_tiling.o \
> >         i915_gem_userptr.o \
> >         i915_gemfs.o \
> > +       i915_globals.o \
> >         i915_query.o \
> >         i915_request.o \
> >         i915_scheduler.o \
> > diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
> > index 215b6ff8aa73..9026787ebdf8 100644
> > --- a/drivers/gpu/drm/i915/i915_active.c
> > +++ b/drivers/gpu/drm/i915/i915_active.c
> > @@ -280,7 +280,12 @@ int __init i915_global_active_init(void)
> >       return 0;
> >   }
> >   
> > -void __exit i915_global_active_exit(void)
> > +void i915_global_active_shrink(void)
> > +{
> > +     kmem_cache_shrink(global.slab_cache);
> > +}
> > +
> > +void i915_global_active_exit(void)
> >   {
> >       kmem_cache_destroy(global.slab_cache);
> >   }
> > diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h
> > index 12b5c1d287d1..5fbd9102384b 100644
> > --- a/drivers/gpu/drm/i915/i915_active.h
> > +++ b/drivers/gpu/drm/i915/i915_active.h
> > @@ -420,6 +420,7 @@ static inline void i915_active_fini(struct i915_active *ref) { }
> >   #endif
> >   
> >   int i915_global_active_init(void);
> > +void i915_global_active_shrink(void);
> >   void i915_global_active_exit(void);
> >   
> >   #endif /* _I915_ACTIVE_H_ */
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index 37230ae7fbe6..a365b1a2ea9a 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -1459,9 +1459,6 @@ struct drm_i915_private {
> >       struct kmem_cache *objects;
> >       struct kmem_cache *vmas;
> >       struct kmem_cache *luts;
> > -     struct kmem_cache *requests;
> > -     struct kmem_cache *dependencies;
> > -     struct kmem_cache *priorities;
> >   
> >       const struct intel_device_info __info; /* Use INTEL_INFO() to access. */
> >       struct intel_runtime_info __runtime; /* Use RUNTIME_INFO() to access. */
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 1eb3a5f8654c..d18c4ccff370 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -42,6 +42,7 @@
> >   #include "i915_drv.h"
> >   #include "i915_gem_clflush.h"
> >   #include "i915_gemfs.h"
> > +#include "i915_globals.h"
> >   #include "i915_reset.h"
> >   #include "i915_trace.h"
> >   #include "i915_vgpu.h"
> > @@ -187,6 +188,8 @@ void i915_gem_unpark(struct drm_i915_private *i915)
> >       if (unlikely(++i915->gt.epoch == 0)) /* keep 0 as invalid */
> >               i915->gt.epoch = 1;
> >   
> > +     i915_globals_unpark();
> > +
> >       intel_enable_gt_powersave(i915);
> >       i915_update_gfx_val(i915);
> >       if (INTEL_GEN(i915) >= 6)
> > @@ -2916,12 +2919,11 @@ static void shrink_caches(struct drm_i915_private *i915)
> >        * filled slabs to prioritise allocating from the mostly full slabs,
> >        * with the aim of reducing fragmentation.
> >        */
> > -     kmem_cache_shrink(i915->priorities);
> > -     kmem_cache_shrink(i915->dependencies);
> > -     kmem_cache_shrink(i915->requests);
> >       kmem_cache_shrink(i915->luts);
> >       kmem_cache_shrink(i915->vmas);
> >       kmem_cache_shrink(i915->objects);
> > +
> > +     i915_globals_park();
> 
> Slightly confusing that the shrink caches path calls globals_park - ie 
> after the device has been parked. Would i915_globals_shrink and 
> __i915_globals_shrink be clearer? Not sure.

Final destination is __i915_gem_park. I could stick it there now, but
felt it clearer to have it as a sideways move atm.

With the last 3 slab caches converted over to globals, they all sit
behind the same rcu_work and we can remove our open-coded variant
(rcu_work is a recent invention).
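
i.e. the deferred work funnels into a single handler along these lines
(just a sketch, the exact epoch/active dance is in the patch):

    static void __i915_globals_park(struct work_struct *work)
    {
            struct park_work *wrk =
                    container_of(to_rcu_work(work), typeof(*wrk), work);

            /* Still idle after the grace period? Trim the global slabs. */
            if (wrk->epoch == atomic_read(&epoch)) {
                    i915_global_active_shrink();
                    i915_global_request_shrink();
                    i915_global_scheduler_shrink();
            }

            kfree(wrk);
    }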

> > +void i915_globals_park(void)
> > +{
> > +     struct park_work *wrk;
> > +
> > +     /*
> > +      * Defer shrinking the global slab caches (and other work) until
> > +      * after a RCU grace period has completed with no activity. This
> > +      * is to try and reduce the latency impact on the consumers caused
> > +      * by us shrinking the caches the same time as they are trying to
> > +      * allocate, with the assumption being that if we idle long enough
> > +      * for an RCU grace period to elapse since the last use, it is likely
> > +      * to be longer until we need the caches again.
> > +      */
> > +     if (!atomic_dec_and_test(&active))
> > +             return;
> > +
> > +     wrk = kmalloc(sizeof(*wrk), GFP_KERNEL);
> > +     if (!wrk)
> > +             return;
> > +
> > +     wrk->epoch = atomic_inc_return(&epoch);
> 
> Do you need to bump the epoch here?

Strictly, no. It does no harm, and it provides an explicit mb and a
known uniqueness to our sampling.

> Unpark would bump it anyway, so by the time the rcu work gets to run
> the epoch check would fail already. Like this it sounds like a double
> increment. I don't see a problem with the double increment, I just
> failed to spot whether it is actually needed for some subtle reason.
> There would also be a potential race with multiple device park callers
> storing the same epoch, but is that really a problem? Again, as soon
> as someone unparks it seems like it would do the right thing.

I did wonder if we could make use of it, but for the moment, all I can
say is that it may make debugging slightly easier.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 18/46] drm/i915: Replace global_seqno with a hangcheck heartbeat seqno
  2019-02-11 12:40   ` Tvrtko Ursulin
@ 2019-02-11 12:44     ` Chris Wilson
  2019-02-11 16:56       ` Tvrtko Ursulin
  0 siblings, 1 reply; 97+ messages in thread
From: Chris Wilson @ 2019-02-11 12:44 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-02-11 12:40:07)
> 
> On 06/02/2019 13:03, Chris Wilson wrote:
> > To determine whether an engine has 'stuck', we simply check whether or
> > not is still on the same seqno for several seconds. To keep this simple
> > mechanism intact over the loss of a global seqno, we can simply add a
> > new global heartbeat seqno instead. As we cannot know the sequence in
> > which requests will then be completed, we use a primitive random number
> > generator instead (with a cycle long enough to not matter over an
> > interval of a few thousand requests between hangcheck samples).
> 
> We couldn't keep the global seqno just for hangcheck purposes? I mean as 
> long as it is unique, which would be guaranteed by obtaining an 
> increment on every submission to hw and storing it in atomic_t 
> i915->hangcheck_global_seqno / rq->hangcheck_global_seqno, hangcheck 
> does not care about the order of execution, no?

s/global_seqno/hangcheck_seqno/ ?

(a) the goal is to kill off global_seqno entirely so we are all sure
there is no such seqno or ordering anymore
(b) this is a temporary patch and we kill off hangcheck_seqno, just as
soon as I can submit requests without struct_mutex
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 18/46] drm/i915: Replace global_seqno with a hangcheck heartbeat seqno
  2019-02-11 12:44     ` Chris Wilson
@ 2019-02-11 16:56       ` Tvrtko Ursulin
  2019-02-12 13:36         ` Chris Wilson
  0 siblings, 1 reply; 97+ messages in thread
From: Tvrtko Ursulin @ 2019-02-11 16:56 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 11/02/2019 12:44, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-02-11 12:40:07)
>>
>> On 06/02/2019 13:03, Chris Wilson wrote:
>>> To determine whether an engine has 'stuck', we simply check whether or
>>> not is still on the same seqno for several seconds. To keep this simple
>>> mechanism intact over the loss of a global seqno, we can simply add a
>>> new global heartbeat seqno instead. As we cannot know the sequence in
>>> which requests will then be completed, we use a primitive random number
>>> generator instead (with a cycle long enough to not matter over an
>>> interval of a few thousand requests between hangcheck samples).
>>
>> We couldn't keep the global seqno just for hangcheck purposes? I mean as
>> long as it is unique, which would be guaranteed by obtaining an
>> increment on every submission to hw and storing it in atomic_t
>> i915->hangcheck_global_seqno / rq->hangcheck_global_seqno, hangcheck
>> does not care about the order of execution, no?
> 
> s/global_seqno/hangcheck_seqno/ ?

Yes sure, I was just trying to express the idea that a "globally" unique 
number is all that I thought we need. Like:

     rq->hangcheck_seqno = atomic_inc_return(&i915->hangcheck_seqno);

Did I get that right then? That we don't really need the pseudo random 
number solution? We could even avoid calling it a seqno if desired. 
rq->unique, wait.. we possibly had this name for something in the past..

> (a) the goal is to kill off global_seqno entirely so we are all sure
> there is no such seqno or ordering anymore
> (b) this is a temporary patch and we kill off hangcheck_seqno, just as
> soon as I can submit requests without struct_mutex

The heartbeat request solution? Is that better than the hangcheck seqno?

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 10/46] drm/i915: Make request allocation caches global
  2019-02-11 12:40     ` Chris Wilson
@ 2019-02-11 17:02       ` Tvrtko Ursulin
  2019-02-12 11:51         ` Chris Wilson
  0 siblings, 1 reply; 97+ messages in thread
From: Tvrtko Ursulin @ 2019-02-11 17:02 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 11/02/2019 12:40, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-02-11 11:43:41)
>>
>> On 06/02/2019 13:03, Chris Wilson wrote:
>>> As kmem_caches share the same properties (size, allocation/free behaviour)
>>> for all potential devices, we can use global caches. While this
>>> potential has worse fragmentation behaviour (one can argue that
>>> different devices would have different activity lifetimes, but you can
>>> also argue that activity is temporal across the system) it is the
>>> default behaviour of the system at large to amalgamate matching caches.
>>>
>>> The benefit for us is much reduced pointer dancing along the frequent
>>> allocation paths.
>>>
>>> v2: Defer shrinking until after a global grace period for futureproofing
>>> multiple consumers of the slab caches, similar to the current strategy
>>> for avoiding shrinking too early.
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> ---
>>>    drivers/gpu/drm/i915/Makefile                 |   1 +
>>>    drivers/gpu/drm/i915/i915_active.c            |   7 +-
>>>    drivers/gpu/drm/i915/i915_active.h            |   1 +
>>>    drivers/gpu/drm/i915/i915_drv.h               |   3 -
>>>    drivers/gpu/drm/i915/i915_gem.c               |  34 +-----
>>>    drivers/gpu/drm/i915/i915_globals.c           | 105 ++++++++++++++++++
>>>    drivers/gpu/drm/i915/i915_globals.h           |  15 +++
>>>    drivers/gpu/drm/i915/i915_pci.c               |   8 +-
>>>    drivers/gpu/drm/i915/i915_request.c           |  53 +++++++--
>>>    drivers/gpu/drm/i915/i915_request.h           |  10 ++
>>>    drivers/gpu/drm/i915/i915_scheduler.c         |  66 ++++++++---
>>>    drivers/gpu/drm/i915/i915_scheduler.h         |  34 +++++-
>>>    drivers/gpu/drm/i915/intel_guc_submission.c   |   3 +-
>>>    drivers/gpu/drm/i915/intel_lrc.c              |   6 +-
>>>    drivers/gpu/drm/i915/intel_ringbuffer.h       |  17 ---
>>>    drivers/gpu/drm/i915/selftests/intel_lrc.c    |   2 +-
>>>    drivers/gpu/drm/i915/selftests/mock_engine.c  |  48 ++++----
>>>    .../gpu/drm/i915/selftests/mock_gem_device.c  |  26 -----
>>>    drivers/gpu/drm/i915/selftests/mock_request.c |  12 +-
>>>    drivers/gpu/drm/i915/selftests/mock_request.h |   7 --
>>>    20 files changed, 306 insertions(+), 152 deletions(-)
>>>    create mode 100644 drivers/gpu/drm/i915/i915_globals.c
>>>    create mode 100644 drivers/gpu/drm/i915/i915_globals.h
>>>
>>> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
>>> index 1787e1299b1b..a1d834068765 100644
>>> --- a/drivers/gpu/drm/i915/Makefile
>>> +++ b/drivers/gpu/drm/i915/Makefile
>>> @@ -77,6 +77,7 @@ i915-y += \
>>>          i915_gem_tiling.o \
>>>          i915_gem_userptr.o \
>>>          i915_gemfs.o \
>>> +       i915_globals.o \
>>>          i915_query.o \
>>>          i915_request.o \
>>>          i915_scheduler.o \
>>> diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
>>> index 215b6ff8aa73..9026787ebdf8 100644
>>> --- a/drivers/gpu/drm/i915/i915_active.c
>>> +++ b/drivers/gpu/drm/i915/i915_active.c
>>> @@ -280,7 +280,12 @@ int __init i915_global_active_init(void)
>>>        return 0;
>>>    }
>>>    
>>> -void __exit i915_global_active_exit(void)
>>> +void i915_global_active_shrink(void)
>>> +{
>>> +     kmem_cache_shrink(global.slab_cache);
>>> +}
>>> +
>>> +void i915_global_active_exit(void)
>>>    {
>>>        kmem_cache_destroy(global.slab_cache);
>>>    }
>>> diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h
>>> index 12b5c1d287d1..5fbd9102384b 100644
>>> --- a/drivers/gpu/drm/i915/i915_active.h
>>> +++ b/drivers/gpu/drm/i915/i915_active.h
>>> @@ -420,6 +420,7 @@ static inline void i915_active_fini(struct i915_active *ref) { }
>>>    #endif
>>>    
>>>    int i915_global_active_init(void);
>>> +void i915_global_active_shrink(void);
>>>    void i915_global_active_exit(void);
>>>    
>>>    #endif /* _I915_ACTIVE_H_ */
>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>>> index 37230ae7fbe6..a365b1a2ea9a 100644
>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>> @@ -1459,9 +1459,6 @@ struct drm_i915_private {
>>>        struct kmem_cache *objects;
>>>        struct kmem_cache *vmas;
>>>        struct kmem_cache *luts;
>>> -     struct kmem_cache *requests;
>>> -     struct kmem_cache *dependencies;
>>> -     struct kmem_cache *priorities;
>>>    
>>>        const struct intel_device_info __info; /* Use INTEL_INFO() to access. */
>>>        struct intel_runtime_info __runtime; /* Use RUNTIME_INFO() to access. */
>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>> index 1eb3a5f8654c..d18c4ccff370 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>> @@ -42,6 +42,7 @@
>>>    #include "i915_drv.h"
>>>    #include "i915_gem_clflush.h"
>>>    #include "i915_gemfs.h"
>>> +#include "i915_globals.h"
>>>    #include "i915_reset.h"
>>>    #include "i915_trace.h"
>>>    #include "i915_vgpu.h"
>>> @@ -187,6 +188,8 @@ void i915_gem_unpark(struct drm_i915_private *i915)
>>>        if (unlikely(++i915->gt.epoch == 0)) /* keep 0 as invalid */
>>>                i915->gt.epoch = 1;
>>>    
>>> +     i915_globals_unpark();
>>> +
>>>        intel_enable_gt_powersave(i915);
>>>        i915_update_gfx_val(i915);
>>>        if (INTEL_GEN(i915) >= 6)
>>> @@ -2916,12 +2919,11 @@ static void shrink_caches(struct drm_i915_private *i915)
>>>         * filled slabs to prioritise allocating from the mostly full slabs,
>>>         * with the aim of reducing fragmentation.
>>>         */
>>> -     kmem_cache_shrink(i915->priorities);
>>> -     kmem_cache_shrink(i915->dependencies);
>>> -     kmem_cache_shrink(i915->requests);
>>>        kmem_cache_shrink(i915->luts);
>>>        kmem_cache_shrink(i915->vmas);
>>>        kmem_cache_shrink(i915->objects);
>>> +
>>> +     i915_globals_park();
>>
>> Slightly confusing that the shrink caches path calls globals_park - ie
>> after the device has been parked. Would i915_globals_shrink and
>> __i915_globals_shrink be clearer? Not sure.
> 
> Final destination is __i915_gem_park. I could stick it there now, but
> felt it clearer to have it as a sideways move atm.
> 
> With the last 3 slab caches converted over to globals, they all sit
> behind the same rcu_work and we can remove our open-coded variant
> (rcu_work is a recent invention).

Is there some downside to calling i915_globals_park directly from the 
idle work handler straight away? (I mean in this patch.)

Converting the idle __sleep_rcu path to queue_rcu_work is then a 
completely separate task.
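
For reference, the queue_rcu_work end state I imagine would be roughly 
as below (sketch only, on top of the active/epoch counters from this 
patch; park_work and __i915_globals_shrink are made-up names):

     static struct rcu_work park_work;
     static int park_epoch;

     static void __i915_globals_shrink(struct work_struct *wrk)
     {
             /*
              * If anyone unparked (bumping the epoch) while we waited
              * for the RCU grace period, the shrink is no longer wanted.
              */
             if (park_epoch != atomic_read(&epoch))
                     return;

             i915_global_active_shrink();
             /* ... likewise for the other global slab caches ... */
     }

     void i915_globals_park(void)
     {
             if (!atomic_dec_and_test(&active))
                     return;

             park_epoch = atomic_read(&epoch);

             /* Shrink only after a grace period of idleness has elapsed. */
             queue_rcu_work(system_wq, &park_work);
     }

     /* with INIT_RCU_WORK(&park_work, __i915_globals_shrink) done at init */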

Regards,

Tvrtko

>>> +void i915_globals_park(void)
>>> +{
>>> +     struct park_work *wrk;
>>> +
>>> +     /*
>>> +      * Defer shrinking the global slab caches (and other work) until
>>> +      * after a RCU grace period has completed with no activity. This
>>> +      * is to try and reduce the latency impact on the consumers caused
>>> +      * by us shrinking the caches the same time as they are trying to
>>> +      * allocate, with the assumption being that if we idle long enough
>>> +      * for an RCU grace period to elapse since the last use, it is likely
>>> +      * to be longer until we need the caches again.
>>> +      */
>>> +     if (!atomic_dec_and_test(&active))
>>> +             return;
>>> +
>>> +     wrk = kmalloc(sizeof(*wrk), GFP_KERNEL);
>>> +     if (!wrk)
>>> +             return;
>>> +
>>> +     wrk->epoch = atomic_inc_return(&epoch);
>>
>> Do you need to bump the epoch here?
> 
> Strictly, no. It doesn't harm, provides an explicit mb and a known
> uniqueness to our sampling.
> 
>> Unpark would bump it anyway, so by the time the rcu work gets to run it
>> would already fail the epoch check. Like this it sounds like a double
>> increment. I don't see a problem with the double increment, I just
>> failed to spot whether it is actually needed for some subtle reason.
>> There would be a potential race with multiple device park callers
>> storing the same epoch, but is that really a problem? Again, as soon as
>> someone unparks it seems like it would be the right thing.
> 
> I did wonder if we could make use of it, but for the moment, all I can
> say is that it may make debugging slightly easier.
> -Chris
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 17/46] drm/i915: Apply rps waitboosting for dma_fence_wait_timeout()
  2019-02-06 13:03 ` [PATCH 17/46] drm/i915: Apply rps waitboosting for dma_fence_wait_timeout() Chris Wilson
@ 2019-02-11 18:06   ` Tvrtko Ursulin
  0 siblings, 0 replies; 97+ messages in thread
From: Tvrtko Ursulin @ 2019-02-11 18:06 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Eero Tamminen


On 06/02/2019 13:03, Chris Wilson wrote:
> As time goes by, usage of generic ioctls such as drm_syncobj and
> sync_file is on the increase, bypassing i915-specific ioctls like
> GEM_WAIT. Currently, we only apply waitboosting to our driver ioctls as
> we track the file/client and account the waitboosting to them. However,
> since commit 7b92c1bd0540 ("drm/i915: Avoid keeping waitboost active for
> signaling threads"), we have not been applying the client ratelimiting
> on waitboosts, so that information has only been used for debug
> tracking.
> 
> Push the application of waitboosting down to the common
> i915_request_wait, and apply it to all foreign fence waits as well.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Eero Tamminen <eero.t.tamminen@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_debugfs.c  | 19 +-----
>   drivers/gpu/drm/i915/i915_drv.h      |  7 +--
>   drivers/gpu/drm/i915/i915_gem.c      | 86 ++++++----------------------
>   drivers/gpu/drm/i915/i915_request.c  | 21 ++++++-
>   drivers/gpu/drm/i915/intel_display.c |  2 +-
>   drivers/gpu/drm/i915/intel_drv.h     |  2 +-
>   drivers/gpu/drm/i915/intel_pm.c      |  5 +-
>   7 files changed, 44 insertions(+), 98 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 8a488ffc8b7d..af53a2d07f6b 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -2020,11 +2020,9 @@ static const char *rps_power_to_str(unsigned int power)
>   static int i915_rps_boost_info(struct seq_file *m, void *data)
>   {
>   	struct drm_i915_private *dev_priv = node_to_i915(m->private);
> -	struct drm_device *dev = &dev_priv->drm;
>   	struct intel_rps *rps = &dev_priv->gt_pm.rps;
>   	u32 act_freq = rps->cur_freq;
>   	intel_wakeref_t wakeref;
> -	struct drm_file *file;
>   
>   	with_intel_runtime_pm_if_in_use(dev_priv, wakeref) {
>   		if (IS_VALLEYVIEW(dev_priv) || IS_CHERRYVIEW(dev_priv)) {
> @@ -2058,22 +2056,7 @@ static int i915_rps_boost_info(struct seq_file *m, void *data)
>   		   intel_gpu_freq(dev_priv, rps->efficient_freq),
>   		   intel_gpu_freq(dev_priv, rps->boost_freq));
>   
> -	mutex_lock(&dev->filelist_mutex);
> -	list_for_each_entry_reverse(file, &dev->filelist, lhead) {
> -		struct drm_i915_file_private *file_priv = file->driver_priv;
> -		struct task_struct *task;
> -
> -		rcu_read_lock();
> -		task = pid_task(file->pid, PIDTYPE_PID);
> -		seq_printf(m, "%s [%d]: %d boosts\n",
> -			   task ? task->comm : "<unknown>",
> -			   task ? task->pid : -1,
> -			   atomic_read(&file_priv->rps_client.boosts));
> -		rcu_read_unlock();
> -	}
> -	seq_printf(m, "Kernel (anonymous) boosts: %d\n",
> -		   atomic_read(&rps->boosts));
> -	mutex_unlock(&dev->filelist_mutex);
> +	seq_printf(m, "Wait boosts: %d\n", atomic_read(&rps->boosts));
>   
>   	if (INTEL_GEN(dev_priv) >= 6 &&
>   	    rps->enabled &&
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index a365b1a2ea9a..4d697b1002af 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -217,10 +217,6 @@ struct drm_i915_file_private {
>   	} mm;
>   	struct idr context_idr;
>   
> -	struct intel_rps_client {
> -		atomic_t boosts;
> -	} rps_client;
> -
>   	unsigned int bsd_engine;
>   
>   /*
> @@ -3041,8 +3037,7 @@ void i915_gem_resume(struct drm_i915_private *dev_priv);
>   vm_fault_t i915_gem_fault(struct vm_fault *vmf);
>   int i915_gem_object_wait(struct drm_i915_gem_object *obj,
>   			 unsigned int flags,
> -			 long timeout,
> -			 struct intel_rps_client *rps);
> +			 long timeout);
>   int i915_gem_object_wait_priority(struct drm_i915_gem_object *obj,
>   				  unsigned int flags,
>   				  const struct i915_sched_attr *attr);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 04fa184fdff5..78b9aa57932d 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -421,8 +421,7 @@ int i915_gem_object_unbind(struct drm_i915_gem_object *obj)
>   static long
>   i915_gem_object_wait_fence(struct dma_fence *fence,
>   			   unsigned int flags,
> -			   long timeout,
> -			   struct intel_rps_client *rps_client)
> +			   long timeout)
>   {
>   	struct i915_request *rq;
>   
> @@ -440,27 +439,6 @@ i915_gem_object_wait_fence(struct dma_fence *fence,
>   	if (i915_request_completed(rq))
>   		goto out;
>   
> -	/*
> -	 * This client is about to stall waiting for the GPU. In many cases
> -	 * this is undesirable and limits the throughput of the system, as
> -	 * many clients cannot continue processing user input/output whilst
> -	 * blocked. RPS autotuning may take tens of milliseconds to respond
> -	 * to the GPU load and thus incurs additional latency for the client.
> -	 * We can circumvent that by promoting the GPU frequency to maximum
> -	 * before we wait. This makes the GPU throttle up much more quickly
> -	 * (good for benchmarks and user experience, e.g. window animations),
> -	 * but at a cost of spending more power processing the workload
> -	 * (bad for battery). Not all clients even want their results
> -	 * immediately and for them we should just let the GPU select its own
> -	 * frequency to maximise efficiency. To prevent a single client from
> -	 * forcing the clocks too high for the whole system, we only allow
> -	 * each client to waitboost once in a busy period.
> -	 */
> -	if (rps_client && !i915_request_started(rq)) {
> -		if (INTEL_GEN(rq->i915) >= 6)
> -			gen6_rps_boost(rq, rps_client);
> -	}
> -
>   	timeout = i915_request_wait(rq, flags, timeout);
>   
>   out:
> @@ -473,8 +451,7 @@ i915_gem_object_wait_fence(struct dma_fence *fence,
>   static long
>   i915_gem_object_wait_reservation(struct reservation_object *resv,
>   				 unsigned int flags,
> -				 long timeout,
> -				 struct intel_rps_client *rps_client)
> +				 long timeout)
>   {
>   	unsigned int seq = __read_seqcount_begin(&resv->seq);
>   	struct dma_fence *excl;
> @@ -492,8 +469,7 @@ i915_gem_object_wait_reservation(struct reservation_object *resv,
>   
>   		for (i = 0; i < count; i++) {
>   			timeout = i915_gem_object_wait_fence(shared[i],
> -							     flags, timeout,
> -							     rps_client);
> +							     flags, timeout);
>   			if (timeout < 0)
>   				break;
>   
> @@ -519,8 +495,7 @@ i915_gem_object_wait_reservation(struct reservation_object *resv,
>   	}
>   
>   	if (excl && timeout >= 0)
> -		timeout = i915_gem_object_wait_fence(excl, flags, timeout,
> -						     rps_client);
> +		timeout = i915_gem_object_wait_fence(excl, flags, timeout);
>   
>   	dma_fence_put(excl);
>   
> @@ -614,30 +589,19 @@ i915_gem_object_wait_priority(struct drm_i915_gem_object *obj,
>    * @obj: i915 gem object
>    * @flags: how to wait (under a lock, for all rendering or just for writes etc)
>    * @timeout: how long to wait
> - * @rps_client: client (user process) to charge for any waitboosting
>    */
>   int
>   i915_gem_object_wait(struct drm_i915_gem_object *obj,
>   		     unsigned int flags,
> -		     long timeout,
> -		     struct intel_rps_client *rps_client)
> +		     long timeout)
>   {
>   	might_sleep();
>   	GEM_BUG_ON(timeout < 0);
>   
> -	timeout = i915_gem_object_wait_reservation(obj->resv,
> -						   flags, timeout,
> -						   rps_client);
> +	timeout = i915_gem_object_wait_reservation(obj->resv, flags, timeout);
>   	return timeout < 0 ? timeout : 0;
>   }
>   
> -static struct intel_rps_client *to_rps_client(struct drm_file *file)
> -{
> -	struct drm_i915_file_private *fpriv = file->driver_priv;
> -
> -	return &fpriv->rps_client;
> -}
> -
>   static int
>   i915_gem_phys_pwrite(struct drm_i915_gem_object *obj,
>   		     struct drm_i915_gem_pwrite *args,
> @@ -843,8 +807,7 @@ int i915_gem_obj_prepare_shmem_read(struct drm_i915_gem_object *obj,
>   	ret = i915_gem_object_wait(obj,
>   				   I915_WAIT_INTERRUPTIBLE |
>   				   I915_WAIT_LOCKED,
> -				   MAX_SCHEDULE_TIMEOUT,
> -				   NULL);
> +				   MAX_SCHEDULE_TIMEOUT);
>   	if (ret)
>   		return ret;
>   
> @@ -896,8 +859,7 @@ int i915_gem_obj_prepare_shmem_write(struct drm_i915_gem_object *obj,
>   				   I915_WAIT_INTERRUPTIBLE |
>   				   I915_WAIT_LOCKED |
>   				   I915_WAIT_ALL,
> -				   MAX_SCHEDULE_TIMEOUT,
> -				   NULL);
> +				   MAX_SCHEDULE_TIMEOUT);
>   	if (ret)
>   		return ret;
>   
> @@ -1159,8 +1121,7 @@ i915_gem_pread_ioctl(struct drm_device *dev, void *data,
>   
>   	ret = i915_gem_object_wait(obj,
>   				   I915_WAIT_INTERRUPTIBLE,
> -				   MAX_SCHEDULE_TIMEOUT,
> -				   to_rps_client(file));
> +				   MAX_SCHEDULE_TIMEOUT);
>   	if (ret)
>   		goto out;
>   
> @@ -1459,8 +1420,7 @@ i915_gem_pwrite_ioctl(struct drm_device *dev, void *data,
>   	ret = i915_gem_object_wait(obj,
>   				   I915_WAIT_INTERRUPTIBLE |
>   				   I915_WAIT_ALL,
> -				   MAX_SCHEDULE_TIMEOUT,
> -				   to_rps_client(file));
> +				   MAX_SCHEDULE_TIMEOUT);
>   	if (ret)
>   		goto err;
>   
> @@ -1558,8 +1518,7 @@ i915_gem_set_domain_ioctl(struct drm_device *dev, void *data,
>   				   I915_WAIT_INTERRUPTIBLE |
>   				   I915_WAIT_PRIORITY |
>   				   (write_domain ? I915_WAIT_ALL : 0),
> -				   MAX_SCHEDULE_TIMEOUT,
> -				   to_rps_client(file));
> +				   MAX_SCHEDULE_TIMEOUT);
>   	if (err)
>   		goto out;
>   
> @@ -1850,8 +1809,7 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
>   	 */
>   	ret = i915_gem_object_wait(obj,
>   				   I915_WAIT_INTERRUPTIBLE,
> -				   MAX_SCHEDULE_TIMEOUT,
> -				   NULL);
> +				   MAX_SCHEDULE_TIMEOUT);
>   	if (ret)
>   		goto err;
>   
> @@ -3181,8 +3139,7 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
>   				   I915_WAIT_INTERRUPTIBLE |
>   				   I915_WAIT_PRIORITY |
>   				   I915_WAIT_ALL,
> -				   to_wait_timeout(args->timeout_ns),
> -				   to_rps_client(file));
> +				   to_wait_timeout(args->timeout_ns));
>   
>   	if (args->timeout_ns > 0) {
>   		args->timeout_ns -= ktime_to_ns(ktime_sub(ktime_get(), start));
> @@ -3251,7 +3208,7 @@ wait_for_timelines(struct drm_i915_private *i915,
>   		 * stalls, so allow the gpu to boost to maximum clocks.
>   		 */
>   		if (flags & I915_WAIT_FOR_IDLE_BOOST)
> -			gen6_rps_boost(rq, NULL);
> +			gen6_rps_boost(rq);
>   
>   		timeout = i915_request_wait(rq, flags, timeout);
>   		i915_request_put(rq);
> @@ -3346,8 +3303,7 @@ i915_gem_object_set_to_wc_domain(struct drm_i915_gem_object *obj, bool write)
>   				   I915_WAIT_INTERRUPTIBLE |
>   				   I915_WAIT_LOCKED |
>   				   (write ? I915_WAIT_ALL : 0),
> -				   MAX_SCHEDULE_TIMEOUT,
> -				   NULL);
> +				   MAX_SCHEDULE_TIMEOUT);
>   	if (ret)
>   		return ret;
>   
> @@ -3409,8 +3365,7 @@ i915_gem_object_set_to_gtt_domain(struct drm_i915_gem_object *obj, bool write)
>   				   I915_WAIT_INTERRUPTIBLE |
>   				   I915_WAIT_LOCKED |
>   				   (write ? I915_WAIT_ALL : 0),
> -				   MAX_SCHEDULE_TIMEOUT,
> -				   NULL);
> +				   MAX_SCHEDULE_TIMEOUT);
>   	if (ret)
>   		return ret;
>   
> @@ -3525,8 +3480,7 @@ int i915_gem_object_set_cache_level(struct drm_i915_gem_object *obj,
>   					   I915_WAIT_INTERRUPTIBLE |
>   					   I915_WAIT_LOCKED |
>   					   I915_WAIT_ALL,
> -					   MAX_SCHEDULE_TIMEOUT,
> -					   NULL);
> +					   MAX_SCHEDULE_TIMEOUT);
>   		if (ret)
>   			return ret;
>   
> @@ -3664,8 +3618,7 @@ int i915_gem_set_caching_ioctl(struct drm_device *dev, void *data,
>   
>   	ret = i915_gem_object_wait(obj,
>   				   I915_WAIT_INTERRUPTIBLE,
> -				   MAX_SCHEDULE_TIMEOUT,
> -				   to_rps_client(file));
> +				   MAX_SCHEDULE_TIMEOUT);
>   	if (ret)
>   		goto out;
>   
> @@ -3791,8 +3744,7 @@ i915_gem_object_set_to_cpu_domain(struct drm_i915_gem_object *obj, bool write)
>   				   I915_WAIT_INTERRUPTIBLE |
>   				   I915_WAIT_LOCKED |
>   				   (write ? I915_WAIT_ALL : 0),
> -				   MAX_SCHEDULE_TIMEOUT,
> -				   NULL);
> +				   MAX_SCHEDULE_TIMEOUT);
>   	if (ret)
>   		return ret;
>   
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 678da705e222..eed66d3606d9 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -81,7 +81,9 @@ static signed long i915_fence_wait(struct dma_fence *fence,
>   				   bool interruptible,
>   				   signed long timeout)
>   {
> -	return i915_request_wait(to_request(fence), interruptible, timeout);
> +	return i915_request_wait(to_request(fence),
> +				 interruptible | I915_WAIT_PRIORITY,
> +				 timeout);
>   }
>   
>   static void i915_fence_release(struct dma_fence *fence)
> @@ -1288,8 +1290,23 @@ long i915_request_wait(struct i915_request *rq,
>   	if (__i915_spin_request(rq, state, 5))
>   		goto out;
>   
> -	if (flags & I915_WAIT_PRIORITY)
> +	/*
> +	 * This client is about to stall waiting for the GPU. In many cases
> +	 * this is undesirable and limits the throughput of the system, as
> +	 * many clients cannot continue processing user input/output whilst
> +	 * blocked. RPS autotuning may take tens of milliseconds to respond
> +	 * to the GPU load and thus incurs additional latency for the client.
> +	 * We can circumvent that by promoting the GPU frequency to maximum
> +	 * before we sleep. This makes the GPU throttle up much more quickly
> +	 * (good for benchmarks and user experience, e.g. window animations),
> +	 * but at a cost of spending more power processing the workload
> +	 * (bad for battery).
> +	 */
> +	if (flags & I915_WAIT_PRIORITY) {
> +		if (!i915_request_started(rq) && INTEL_GEN(rq->i915) >= 6)
> +			gen6_rps_boost(rq);
>   		i915_schedule_bump_priority(rq, I915_PRIORITY_WAIT);
> +	}
>   
>   	wait.tsk = current;
>   	if (dma_fence_add_callback(&rq->fence, &wait.cb, request_wait_wake))
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index 4d5ec929f987..f8657e53fe68 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -13396,7 +13396,7 @@ static int do_rps_boost(struct wait_queue_entry *_wait,
>   	 * vblank without our intervention, so leave RPS alone.
>   	 */
>   	if (!i915_request_started(rq))
> -		gen6_rps_boost(rq, NULL);
> +		gen6_rps_boost(rq);
>   	i915_request_put(rq);
>   
>   	drm_crtc_vblank_put(wait->crtc);
> diff --git a/drivers/gpu/drm/i915/intel_drv.h b/drivers/gpu/drm/i915/intel_drv.h
> index 90ba5436370e..47b4f88da7eb 100644
> --- a/drivers/gpu/drm/i915/intel_drv.h
> +++ b/drivers/gpu/drm/i915/intel_drv.h
> @@ -2249,7 +2249,7 @@ void intel_suspend_gt_powersave(struct drm_i915_private *dev_priv);
>   void gen6_rps_busy(struct drm_i915_private *dev_priv);
>   void gen6_rps_reset_ei(struct drm_i915_private *dev_priv);
>   void gen6_rps_idle(struct drm_i915_private *dev_priv);
> -void gen6_rps_boost(struct i915_request *rq, struct intel_rps_client *rps);
> +void gen6_rps_boost(struct i915_request *rq);
>   void g4x_wm_get_hw_state(struct drm_i915_private *dev_priv);
>   void vlv_wm_get_hw_state(struct drm_i915_private *dev_priv);
>   void ilk_wm_get_hw_state(struct drm_i915_private *dev_priv);
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index 737005bf6816..58514a17f134 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -6693,8 +6693,7 @@ void gen6_rps_idle(struct drm_i915_private *dev_priv)
>   	mutex_unlock(&dev_priv->pcu_lock);
>   }
>   
> -void gen6_rps_boost(struct i915_request *rq,
> -		    struct intel_rps_client *rps_client)
> +void gen6_rps_boost(struct i915_request *rq)
>   {
>   	struct intel_rps *rps = &rq->i915->gt_pm.rps;
>   	unsigned long flags;
> @@ -6723,7 +6722,7 @@ void gen6_rps_boost(struct i915_request *rq,
>   	if (READ_ONCE(rps->cur_freq) < rps->boost_freq)
>   		schedule_work(&rps->work);
>   
> -	atomic_inc(rps_client ? &rps_client->boosts : &rps->boosts);
> +	atomic_inc(&rps->boosts);
>   }
>   
>   int intel_set_rps(struct drm_i915_private *dev_priv, u8 val)
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 19/46] drm/i915/pmu: Always sample an active ringbuffer
  2019-02-06 13:03 ` [PATCH 19/46] drm/i915/pmu: Always sample an active ringbuffer Chris Wilson
@ 2019-02-11 18:18   ` Tvrtko Ursulin
  2019-02-12 13:40     ` Chris Wilson
  0 siblings, 1 reply; 97+ messages in thread
From: Tvrtko Ursulin @ 2019-02-11 18:18 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 06/02/2019 13:03, Chris Wilson wrote:
> As we no longer have a precise indication of requests queued to an
> engine, make no presumptions and just sample the ring registers to see
> if the engine is busy.
> 
> v2: Report busy while the ring is idling on a semaphore/event.

I was planning to take care of this detail but cool, no complaints. :)

> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_pmu.c | 55 +++++++++++++--------------------
>   1 file changed, 21 insertions(+), 34 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> index 13d70b90dd0f..157cbfa155d9 100644
> --- a/drivers/gpu/drm/i915/i915_pmu.c
> +++ b/drivers/gpu/drm/i915/i915_pmu.c
> @@ -101,7 +101,7 @@ static bool pmu_needs_timer(struct drm_i915_private *i915, bool gpu_active)
>   	 *
>   	 * Use RCS as proxy for all engines.
>   	 */
> -	else if (intel_engine_supports_stats(i915->engine[RCS]))
> +	else if (i915->caps.scheduler & I915_SCHEDULER_CAP_PMU)

Need to nuke the comment as well.

But my problem is that I still think I915_SCHEDULER_CAP_PMU is the wrong 
name and level. It is neither a scheduler feature nor the whole PMU. 
Maybe I915_SCHEDULER_CAP_ENGINE_STATS removes one contention point, but 
I am still wondering if I could refactor how the PMU tracks the need for 
the sampling timer and so remove the need to use RCS as a proxy via that 
route.
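
For instance something along these lines (sketch only, the pmu field 
name is invented), which would avoid both the scheduler cap and the RCS 
proxying:

     static bool all_engines_support_stats(struct drm_i915_private *i915)
     {
             struct intel_engine_cs *engine;
             enum intel_engine_id id;

             for_each_engine(engine, i915, id)
                     if (!intel_engine_supports_stats(engine))
                             return false;

             return true;
     }

     /* cached once at PMU registration */
     i915->pmu.busy_stats = all_engines_support_stats(i915);

     /* and then in pmu_needs_timer() */
     else if (i915->pmu.busy_stats)
             enable &= ~BIT(I915_SAMPLE_BUSY);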

>   		enable &= ~BIT(I915_SAMPLE_BUSY);
>   
>   	/*
> @@ -148,14 +148,6 @@ void i915_pmu_gt_unparked(struct drm_i915_private *i915)
>   	spin_unlock_irq(&i915->pmu.lock);
>   }
>   
> -static bool grab_forcewake(struct drm_i915_private *i915, bool fw)
> -{
> -	if (!fw)
> -		intel_uncore_forcewake_get(i915, FORCEWAKE_ALL);
> -
> -	return true;
> -}
> -
>   static void
>   add_sample(struct i915_pmu_sample *sample, u32 val)
>   {
> @@ -168,7 +160,6 @@ engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
>   	struct intel_engine_cs *engine;
>   	enum intel_engine_id id;
>   	intel_wakeref_t wakeref;
> -	bool fw = false;
>   
>   	if ((dev_priv->pmu.enable & ENGINE_SAMPLE_MASK) == 0)
>   		return;
> @@ -181,36 +172,32 @@ engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
>   		return;
>   
>   	for_each_engine(engine, dev_priv, id) {
> -		u32 current_seqno = intel_engine_get_seqno(engine);
> -		u32 last_seqno = intel_engine_last_submit(engine);
> +		typeof(engine->pmu) *pmu = &engine->pmu;

I would also prefer we did not start introducing the idiom of declaring 
locals with typeof outside of macros.
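
For example, if the currently anonymous engine->pmu struct gained a name 
(say intel_engine_pmu, purely illustrative), this could simply be 
written as:

     struct intel_engine_pmu *pmu = &engine->pmu;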

> +		bool busy;
>   		u32 val;
>   
> -		val = !i915_seqno_passed(current_seqno, last_seqno);
> -
> -		if (val)
> -			add_sample(&engine->pmu.sample[I915_SAMPLE_BUSY],
> -				   period_ns);
> -
> -		if (val && (engine->pmu.enable &
> -		    (BIT(I915_SAMPLE_WAIT) | BIT(I915_SAMPLE_SEMA)))) {
> -			fw = grab_forcewake(dev_priv, fw);
> -
> -			val = I915_READ_FW(RING_CTL(engine->mmio_base));
> -		} else {
> -			val = 0;
> -		}
> +		val = I915_READ_FW(RING_CTL(engine->mmio_base));
> +		if (val == 0 || val == ~0u) /* outside of powerwell */
> +			continue;

Would /* Powerwell not awake. */ be clearer?

So the claim is we can rely on the register reading either all zeros or 
all ones when powered down? Absolutely 100%? Is this documented 
somewhere? But do we still need the runtime pm ref?

Regards,

Tvrtko

>   
>   		if (val & RING_WAIT)
> -			add_sample(&engine->pmu.sample[I915_SAMPLE_WAIT],
> -				   period_ns);
> -
> +			add_sample(&pmu->sample[I915_SAMPLE_WAIT], period_ns);
>   		if (val & RING_WAIT_SEMAPHORE)
> -			add_sample(&engine->pmu.sample[I915_SAMPLE_SEMA],
> -				   period_ns);
> -	}
> +			add_sample(&pmu->sample[I915_SAMPLE_SEMA], period_ns);
>   
> -	if (fw)
> -		intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
> +		/*
> +		 * MI_MODE reports IDLE if the ring is waiting, but we regard
> +		 * this as being busy instead, as the engine is busy with the
> +		 * user request.
> +		 */
> +		busy = val & (RING_WAIT_SEMAPHORE | RING_WAIT);
> +		if (!busy) {
> +			val = I915_READ_FW(RING_MI_MODE(engine->mmio_base));
> +			busy = !(val & MODE_IDLE);
> +		}
> +		if (busy)
> +			add_sample(&pmu->sample[I915_SAMPLE_BUSY], period_ns);
> +	}
>   
>   	intel_runtime_pm_put(dev_priv, wakeref);
>   }
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 20/46] drm/i915: Remove access to global seqno in the HWSP
  2019-02-06 13:03 ` [PATCH 20/46] drm/i915: Remove access to global seqno in the HWSP Chris Wilson
@ 2019-02-11 18:22   ` Tvrtko Ursulin
  0 siblings, 0 replies; 97+ messages in thread
From: Tvrtko Ursulin @ 2019-02-11 18:22 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 06/02/2019 13:03, Chris Wilson wrote:
> Stop accessing the HWSP to read the global seqno, and stop tracking the
> mirror in the engine's execution timeline -- it is unused.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_gpu_error.c         |  4 --
>   drivers/gpu/drm/i915/i915_gpu_error.h         |  3 --
>   drivers/gpu/drm/i915/i915_request.c           | 27 +++++--------
>   drivers/gpu/drm/i915/i915_reset.c             |  1 -
>   drivers/gpu/drm/i915/intel_engine_cs.c        | 14 +------
>   drivers/gpu/drm/i915/intel_lrc.c              | 21 +++-------
>   drivers/gpu/drm/i915/intel_ringbuffer.c       |  7 +---
>   drivers/gpu/drm/i915/intel_ringbuffer.h       | 40 -------------------
>   drivers/gpu/drm/i915/selftests/i915_request.c |  3 +-
>   drivers/gpu/drm/i915/selftests/mock_engine.c  |  2 -
>   10 files changed, 19 insertions(+), 103 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index 9a65341fec09..a674c78ca1f8 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -533,8 +533,6 @@ static void error_print_engine(struct drm_i915_error_state_buf *m,
>   				   ee->vm_info.pp_dir_base);
>   		}
>   	}
> -	err_printf(m, "  seqno: 0x%08x\n", ee->seqno);
> -	err_printf(m, "  last_seqno: 0x%08x\n", ee->last_seqno);
>   	err_printf(m, "  ring->head: 0x%08x\n", ee->cpu_ring_head);
>   	err_printf(m, "  ring->tail: 0x%08x\n", ee->cpu_ring_tail);
>   	err_printf(m, "  hangcheck timestamp: %dms (%lu%s)\n",
> @@ -1227,8 +1225,6 @@ static void error_record_engine_registers(struct i915_gpu_state *error,
>   
>   	ee->instpm = I915_READ(RING_INSTPM(engine->mmio_base));
>   	ee->acthd = intel_engine_get_active_head(engine);
> -	ee->seqno = intel_engine_get_seqno(engine);
> -	ee->last_seqno = intel_engine_last_submit(engine);
>   	ee->start = I915_READ_START(engine);
>   	ee->head = I915_READ_HEAD(engine);
>   	ee->tail = I915_READ_TAIL(engine);
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
> index d5c58e82508b..4dbbd0f02edb 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.h
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.h
> @@ -94,8 +94,6 @@ struct i915_gpu_state {
>   		u32 cpu_ring_head;
>   		u32 cpu_ring_tail;
>   
> -		u32 last_seqno;
> -
>   		/* Register state */
>   		u32 start;
>   		u32 tail;
> @@ -108,7 +106,6 @@ struct i915_gpu_state {
>   		u32 bbstate;
>   		u32 instpm;
>   		u32 instps;
> -		u32 seqno;
>   		u64 bbaddr;
>   		u64 acthd;
>   		u32 fault_reg;
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index eed66d3606d9..85cf5cfbc7ed 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -192,12 +192,11 @@ static void free_capture_list(struct i915_request *request)
>   static void __retire_engine_request(struct intel_engine_cs *engine,
>   				    struct i915_request *rq)
>   {
> -	GEM_TRACE("%s(%s) fence %llx:%lld, global=%d, current %d:%d\n",
> +	GEM_TRACE("%s(%s) fence %llx:%lld, global=%d, current %d\n",
>   		  __func__, engine->name,
>   		  rq->fence.context, rq->fence.seqno,
>   		  rq->global_seqno,
> -		  hwsp_seqno(rq),
> -		  intel_engine_get_seqno(engine));
> +		  hwsp_seqno(rq));
>   
>   	GEM_BUG_ON(!i915_request_completed(rq));
>   
> @@ -256,12 +255,11 @@ static void i915_request_retire(struct i915_request *request)
>   {
>   	struct i915_active_request *active, *next;
>   
> -	GEM_TRACE("%s fence %llx:%lld, global=%d, current %d:%d\n",
> +	GEM_TRACE("%s fence %llx:%lld, global=%d, current %d\n",
>   		  request->engine->name,
>   		  request->fence.context, request->fence.seqno,
>   		  request->global_seqno,
> -		  hwsp_seqno(request),
> -		  intel_engine_get_seqno(request->engine));
> +		  hwsp_seqno(request));
>   
>   	lockdep_assert_held(&request->i915->drm.struct_mutex);
>   	GEM_BUG_ON(!i915_sw_fence_signaled(&request->submit));
> @@ -320,12 +318,11 @@ void i915_request_retire_upto(struct i915_request *rq)
>   	struct intel_ring *ring = rq->ring;
>   	struct i915_request *tmp;
>   
> -	GEM_TRACE("%s fence %llx:%lld, global=%d, current %d:%d\n",
> +	GEM_TRACE("%s fence %llx:%lld, global=%d, current %d\n",
>   		  rq->engine->name,
>   		  rq->fence.context, rq->fence.seqno,
>   		  rq->global_seqno,
> -		  hwsp_seqno(rq),
> -		  intel_engine_get_seqno(rq->engine));
> +		  hwsp_seqno(rq));
>   
>   	lockdep_assert_held(&rq->i915->drm.struct_mutex);
>   	GEM_BUG_ON(!i915_request_completed(rq));
> @@ -427,12 +424,11 @@ void __i915_request_submit(struct i915_request *request)
>   	struct intel_engine_cs *engine = request->engine;
>   	u32 seqno;
>   
> -	GEM_TRACE("%s fence %llx:%lld -> global=%d, current %d:%d\n",
> +	GEM_TRACE("%s fence %llx:%lld -> global=%d, current %d\n",
>   		  engine->name,
>   		  request->fence.context, request->fence.seqno,
>   		  engine->timeline.seqno + 1,
> -		  hwsp_seqno(request),
> -		  intel_engine_get_seqno(engine));
> +		  hwsp_seqno(request));
>   
>   	GEM_BUG_ON(!irqs_disabled());
>   	lockdep_assert_held(&engine->timeline.lock);
> @@ -441,7 +437,6 @@ void __i915_request_submit(struct i915_request *request)
>   
>   	seqno = next_global_seqno(&engine->timeline);
>   	GEM_BUG_ON(!seqno);
> -	GEM_BUG_ON(intel_engine_signaled(engine, seqno));
>   
>   	/* We may be recursing from the signal callback of another i915 fence */
>   	spin_lock_nested(&request->lock, SINGLE_DEPTH_NESTING);
> @@ -492,12 +487,11 @@ void __i915_request_unsubmit(struct i915_request *request)
>   {
>   	struct intel_engine_cs *engine = request->engine;
>   
> -	GEM_TRACE("%s fence %llx:%lld <- global=%d, current %d:%d\n",
> +	GEM_TRACE("%s fence %llx:%lld <- global=%d, current %d\n",
>   		  engine->name,
>   		  request->fence.context, request->fence.seqno,
>   		  request->global_seqno,
> -		  hwsp_seqno(request),
> -		  intel_engine_get_seqno(engine));
> +		  hwsp_seqno(request));
>   
>   	GEM_BUG_ON(!irqs_disabled());
>   	lockdep_assert_held(&engine->timeline.lock);
> @@ -508,7 +502,6 @@ void __i915_request_unsubmit(struct i915_request *request)
>   	 */
>   	GEM_BUG_ON(!request->global_seqno);
>   	GEM_BUG_ON(request->global_seqno != engine->timeline.seqno);
> -	GEM_BUG_ON(intel_engine_has_completed(engine, request->global_seqno));
>   	engine->timeline.seqno--;
>   
>   	/* We may be recursing from the signal callback of another i915 fence */
> diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
> index b629f25a81f0..7051c0a43941 100644
> --- a/drivers/gpu/drm/i915/i915_reset.c
> +++ b/drivers/gpu/drm/i915/i915_reset.c
> @@ -787,7 +787,6 @@ static void nop_submit_request(struct i915_request *request)
>   	spin_lock_irqsave(&engine->timeline.lock, flags);
>   	__i915_request_submit(request);
>   	i915_request_mark_complete(request);
> -	intel_engine_write_global_seqno(engine, request->global_seqno);
>   	spin_unlock_irqrestore(&engine->timeline.lock, flags);
>   
>   	intel_engine_queue_breadcrumbs(engine);
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index e1e54b7448b4..ea370ed094a5 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -455,12 +455,6 @@ int intel_engines_init(struct drm_i915_private *dev_priv)
>   	return err;
>   }
>   
> -void intel_engine_write_global_seqno(struct intel_engine_cs *engine, u32 seqno)
> -{
> -	intel_write_status_page(engine, I915_GEM_HWS_INDEX, seqno);
> -	GEM_BUG_ON(intel_engine_get_seqno(engine) != seqno);
> -}
> -
>   static void intel_engine_init_batch_pool(struct intel_engine_cs *engine)
>   {
>   	i915_gem_batch_pool_init(&engine->batch_pool, engine);
> @@ -1053,10 +1047,6 @@ bool intel_engine_is_idle(struct intel_engine_cs *engine)
>   	if (i915_terminally_wedged(&dev_priv->gpu_error))
>   		return true;
>   
> -	/* Any inflight/incomplete requests? */
> -	if (!intel_engine_signaled(engine, intel_engine_last_submit(engine)))
> -		return false;
> -
>   	/* Waiting to drain ELSP? */
>   	if (READ_ONCE(engine->execlists.active)) {
>   		struct tasklet_struct *t = &engine->execlists.tasklet;
> @@ -1538,9 +1528,7 @@ void intel_engine_dump(struct intel_engine_cs *engine,
>   	if (i915_terminally_wedged(&engine->i915->gpu_error))
>   		drm_printf(m, "*** WEDGED ***\n");
>   
> -	drm_printf(m, "\tcurrent seqno %x, last %x, hangcheck %x/%x [%d ms]\n",
> -		   intel_engine_get_seqno(engine),
> -		   intel_engine_last_submit(engine),
> +	drm_printf(m, "\tHangcheck %x:%x [%d ms]\n",
>   		   engine->hangcheck.last_seqno,
>   		   engine->hangcheck.next_seqno,
>   		   jiffies_to_msecs(jiffies - engine->hangcheck.action_timestamp));
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 342d3a91be03..2f2c27e6ae6d 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -565,13 +565,12 @@ static void execlists_submit_ports(struct intel_engine_cs *engine)
>   			desc = execlists_update_context(rq);
>   			GEM_DEBUG_EXEC(port[n].context_id = upper_32_bits(desc));
>   
> -			GEM_TRACE("%s in[%d]:  ctx=%d.%d, global=%d (fence %llx:%lld) (current %d:%d), prio=%d\n",
> +			GEM_TRACE("%s in[%d]:  ctx=%d.%d, global=%d (fence %llx:%lld) (current %d), prio=%d\n",
>   				  engine->name, n,
>   				  port[n].context_id, count,
>   				  rq->global_seqno,
>   				  rq->fence.context, rq->fence.seqno,
>   				  hwsp_seqno(rq),
> -				  intel_engine_get_seqno(engine),
>   				  rq_prio(rq));
>   		} else {
>   			GEM_BUG_ON(!n);
> @@ -876,13 +875,12 @@ execlists_cancel_port_requests(struct intel_engine_execlists * const execlists)
>   	while (num_ports-- && port_isset(port)) {
>   		struct i915_request *rq = port_request(port);
>   
> -		GEM_TRACE("%s:port%u global=%d (fence %llx:%lld), (current %d:%d)\n",
> +		GEM_TRACE("%s:port%u global=%d (fence %llx:%lld), (current %d)\n",
>   			  rq->engine->name,
>   			  (unsigned int)(port - execlists->port),
>   			  rq->global_seqno,
>   			  rq->fence.context, rq->fence.seqno,
> -			  hwsp_seqno(rq),
> -			  intel_engine_get_seqno(rq->engine));
> +			  hwsp_seqno(rq));
>   
>   		GEM_BUG_ON(!execlists->active);
>   		execlists_context_schedule_out(rq,
> @@ -938,8 +936,7 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
>   	struct rb_node *rb;
>   	unsigned long flags;
>   
> -	GEM_TRACE("%s current %d\n",
> -		  engine->name, intel_engine_get_seqno(engine));
> +	GEM_TRACE("%s\n", engine->name);
>   
>   	/*
>   	 * Before we call engine->cancel_requests(), we should have exclusive
> @@ -987,10 +984,6 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
>   		i915_priolist_free(p);
>   	}
>   
> -	intel_write_status_page(engine,
> -				I915_GEM_HWS_INDEX,
> -				intel_engine_last_submit(engine));
> -
>   	/* Remaining _unready_ requests will be nop'ed when submitted */
>   
>   	execlists->queue_priority_hint = INT_MIN;
> @@ -1106,14 +1099,13 @@ static void process_csb(struct intel_engine_cs *engine)
>   						EXECLISTS_ACTIVE_USER));
>   
>   		rq = port_unpack(port, &count);
> -		GEM_TRACE("%s out[0]: ctx=%d.%d, global=%d (fence %llx:%lld) (current %d:%d), prio=%d\n",
> +		GEM_TRACE("%s out[0]: ctx=%d.%d, global=%d (fence %llx:%lld) (current %d), prio=%d\n",
>   			  engine->name,
>   			  port->context_id, count,
>   			  rq ? rq->global_seqno : 0,
>   			  rq ? rq->fence.context : 0,
>   			  rq ? rq->fence.seqno : 0,
>   			  rq ? hwsp_seqno(rq) : 0,
> -			  intel_engine_get_seqno(engine),
>   			  rq ? rq_prio(rq) : 0);
>   
>   		/* Check the context/desc id for this event matches */
> @@ -1975,10 +1967,9 @@ static void execlists_reset(struct intel_engine_cs *engine, bool stalled)
>   	/* Following the reset, we need to reload the CSB read/write pointers */
>   	reset_csb_pointers(&engine->execlists);
>   
> -	GEM_TRACE("%s seqno=%d, current=%d, stalled? %s\n",
> +	GEM_TRACE("%s seqno=%d, stalled? %s\n",
>   		  engine->name,
>   		  rq ? rq->global_seqno : 0,
> -		  intel_engine_get_seqno(engine),
>   		  yesno(stalled));
>   	if (!rq)
>   		goto out_unlock;
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 870184bbd169..2d59e2990448 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -782,10 +782,9 @@ static void reset_ring(struct intel_engine_cs *engine, bool stalled)
>   		}
>   	}
>   
> -	GEM_TRACE("%s seqno=%d, current=%d, stalled? %s\n",
> +	GEM_TRACE("%s seqno=%d, stalled? %s\n",
>   		  engine->name,
>   		  rq ? rq->global_seqno : 0,
> -		  intel_engine_get_seqno(engine),
>   		  yesno(stalled));
>   	/*
>   	 * The guilty request will get skipped on a hung engine.
> @@ -924,10 +923,6 @@ static void cancel_requests(struct intel_engine_cs *engine)
>   		i915_request_mark_complete(request);
>   	}
>   
> -	intel_write_status_page(engine,
> -				I915_GEM_HWS_INDEX,
> -				intel_engine_last_submit(engine));
> -
>   	/* Remaining _unready_ requests will be nop'ed when submitted */
>   
>   	spin_unlock_irqrestore(&engine->timeline.lock, flags);
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index b30c37ac55a3..26bae7772208 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -840,8 +840,6 @@ __intel_ring_space(unsigned int head, unsigned int tail, unsigned int size)
>   	return (head - tail - CACHELINE_BYTES) & (size - 1);
>   }
>   
> -void intel_engine_write_global_seqno(struct intel_engine_cs *engine, u32 seqno);
> -
>   int intel_engine_setup_common(struct intel_engine_cs *engine);
>   int intel_engine_init_common(struct intel_engine_cs *engine);
>   void intel_engine_cleanup_common(struct intel_engine_cs *engine);
> @@ -859,44 +857,6 @@ void intel_engine_set_hwsp_writemask(struct intel_engine_cs *engine, u32 mask);
>   u64 intel_engine_get_active_head(const struct intel_engine_cs *engine);
>   u64 intel_engine_get_last_batch_head(const struct intel_engine_cs *engine);
>   
> -static inline u32 intel_engine_last_submit(struct intel_engine_cs *engine)
> -{
> -	/*
> -	 * We are only peeking at the tail of the submit queue (and not the
> -	 * queue itself) in order to gain a hint as to the current active
> -	 * state of the engine. Callers are not expected to be taking
> -	 * engine->timeline->lock, nor are they expected to be concerned
> -	 * wtih serialising this hint with anything, so document it as
> -	 * a hint and nothing more.
> -	 */
> -	return READ_ONCE(engine->timeline.seqno);
> -}
> -
> -static inline u32 intel_engine_get_seqno(struct intel_engine_cs *engine)
> -{
> -	return intel_read_status_page(engine, I915_GEM_HWS_INDEX);
> -}
> -
> -static inline bool intel_engine_signaled(struct intel_engine_cs *engine,
> -					 u32 seqno)
> -{
> -	return i915_seqno_passed(intel_engine_get_seqno(engine), seqno);
> -}
> -
> -static inline bool intel_engine_has_completed(struct intel_engine_cs *engine,
> -					      u32 seqno)
> -{
> -	GEM_BUG_ON(!seqno);
> -	return intel_engine_signaled(engine, seqno);
> -}
> -
> -static inline bool intel_engine_has_started(struct intel_engine_cs *engine,
> -					    u32 seqno)
> -{
> -	GEM_BUG_ON(!seqno);
> -	return intel_engine_signaled(engine, seqno - 1);
> -}
> -
>   void intel_engine_get_instdone(struct intel_engine_cs *engine,
>   			       struct intel_instdone *instdone);
>   
> diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
> index 6733dc5b6b4c..074d393f4a02 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_request.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_request.c
> @@ -226,8 +226,7 @@ static int igt_request_rewind(void *arg)
>   	mutex_unlock(&i915->drm.struct_mutex);
>   
>   	if (i915_request_wait(vip, 0, HZ) == -ETIME) {
> -		pr_err("timed out waiting for high priority request, vip.seqno=%d, current seqno=%d\n",
> -		       vip->global_seqno, intel_engine_get_seqno(i915->engine[RCS]));
> +		pr_err("timed out waiting for high priority request\n");
>   		goto err;
>   	}
>   
> diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
> index 0d35af07867b..f055da01ced9 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_engine.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
> @@ -86,7 +86,6 @@ static struct i915_request *first_request(struct mock_engine *engine)
>   static void advance(struct i915_request *request)
>   {
>   	list_del_init(&request->mock.link);
> -	intel_engine_write_global_seqno(request->engine, request->global_seqno);
>   	i915_request_mark_complete(request);
>   	GEM_BUG_ON(!i915_request_completed(request));
>   
> @@ -276,7 +275,6 @@ void mock_engine_flush(struct intel_engine_cs *engine)
>   
>   void mock_engine_reset(struct intel_engine_cs *engine)
>   {
> -	intel_engine_write_global_seqno(engine, 0);
>   }
>   
>   void mock_engine_free(struct intel_engine_cs *engine)
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 21/46] drm/i915: Remove i915_request.global_seqno
  2019-02-06 13:03 ` [PATCH 21/46] drm/i915: Remove i915_request.global_seqno Chris Wilson
@ 2019-02-11 18:44   ` Tvrtko Ursulin
  2019-02-12 13:45     ` Chris Wilson
  0 siblings, 1 reply; 97+ messages in thread
From: Tvrtko Ursulin @ 2019-02-11 18:44 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 06/02/2019 13:03, Chris Wilson wrote:
> Having weaned the interrupt handling off using a single global execution
> queue, we no longer need to emit a global_seqno.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_gpu_error.c         | 35 ++-----------
>   drivers/gpu/drm/i915/i915_gpu_error.h         |  2 -
>   drivers/gpu/drm/i915/i915_request.c           | 31 ++----------
>   drivers/gpu/drm/i915/i915_request.h           | 32 ------------
>   drivers/gpu/drm/i915/i915_trace.h             | 25 +++-------
>   drivers/gpu/drm/i915/intel_engine_cs.c        |  5 +-
>   drivers/gpu/drm/i915/intel_guc_submission.c   |  2 +-
>   drivers/gpu/drm/i915/intel_lrc.c              | 34 ++-----------
>   drivers/gpu/drm/i915/intel_ringbuffer.c       | 50 +++----------------
>   drivers/gpu/drm/i915/intel_ringbuffer.h       |  2 -
>   .../gpu/drm/i915/selftests/intel_hangcheck.c  |  5 +-
>   drivers/gpu/drm/i915/selftests/mock_engine.c  |  1 -
>   12 files changed, 31 insertions(+), 193 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index a674c78ca1f8..8792ad12373d 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -380,19 +380,16 @@ static void print_error_buffers(struct drm_i915_error_state_buf *m,
>   	err_printf(m, "%s [%d]:\n", name, count);
>   
>   	while (count--) {
> -		err_printf(m, "    %08x_%08x %8u %02x %02x %02x",
> +		err_printf(m, "    %08x_%08x %8u %02x %02x",
>   			   upper_32_bits(err->gtt_offset),
>   			   lower_32_bits(err->gtt_offset),
>   			   err->size,
>   			   err->read_domains,
> -			   err->write_domain,
> -			   err->wseqno);
> +			   err->write_domain);
>   		err_puts(m, tiling_flag(err->tiling));
>   		err_puts(m, dirty_flag(err->dirty));
>   		err_puts(m, purgeable_flag(err->purgeable));
>   		err_puts(m, err->userptr ? " userptr" : "");
> -		err_puts(m, err->engine != -1 ? " " : "");
> -		err_puts(m, engine_name(m->i915, err->engine));

Why remove this information?

>   		err_puts(m, i915_cache_level_str(m->i915, err->cache_level));
>   
>   		if (err->name)
> @@ -1059,27 +1056,6 @@ i915_error_object_create(struct drm_i915_private *i915,
>   	return dst;
>   }
>   
> -/* The error capture is special as tries to run underneath the normal
> - * locking rules - so we use the raw version of the i915_active_request lookup.
> - */
> -static inline u32
> -__active_get_seqno(struct i915_active_request *active)
> -{
> -	struct i915_request *request;
> -
> -	request = __i915_active_request_peek(active);
> -	return request ? request->global_seqno : 0;
> -}
> -
> -static inline int
> -__active_get_engine_id(struct i915_active_request *active)
> -{
> -	struct i915_request *request;
> -
> -	request = __i915_active_request_peek(active);
> -	return request ? request->engine->id : -1;
> -}
> -
>   static void capture_bo(struct drm_i915_error_buffer *err,
>   		       struct i915_vma *vma)
>   {
> @@ -1088,9 +1064,6 @@ static void capture_bo(struct drm_i915_error_buffer *err,
>   	err->size = obj->base.size;
>   	err->name = obj->base.name;
>   
> -	err->wseqno = __active_get_seqno(&obj->frontbuffer_write);
> -	err->engine = __active_get_engine_id(&obj->frontbuffer_write);
> -
>   	err->gtt_offset = vma->node.start;
>   	err->read_domains = obj->read_domains;
>   	err->write_domain = obj->write_domain;
> @@ -1295,10 +1268,10 @@ static void record_request(struct i915_request *request,
>   	struct i915_gem_context *ctx = request->gem_context;
>   
>   	erq->flags = request->fence.flags;
> -	erq->context = ctx->hw_id;
> +	erq->context = request->fence.context;
> +	erq->seqno = request->fence.seqno;
>   	erq->sched_attr = request->sched.attr;
>   	erq->ban_score = atomic_read(&ctx->ban_score);
> -	erq->seqno = request->global_seqno;
>   	erq->jiffies = request->emitted_jiffies;
>   	erq->start = i915_ggtt_offset(request->ring->vma);
>   	erq->head = request->head;
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
> index 4dbbd0f02edb..34fec5f00ef2 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.h
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.h
> @@ -167,7 +167,6 @@ struct i915_gpu_state {
>   	struct drm_i915_error_buffer {
>   		u32 size;
>   		u32 name;
> -		u32 wseqno;
>   		u64 gtt_offset;
>   		u32 read_domains;
>   		u32 write_domain;
> @@ -176,7 +175,6 @@ struct i915_gpu_state {
>   		u32 dirty:1;
>   		u32 purgeable:1;
>   		u32 userptr:1;
> -		s32 engine:4;
>   		u32 cache_level:3;
>   	} *active_bo[I915_NUM_ENGINES], *pinned_bo;
>   	u32 active_bo_count[I915_NUM_ENGINES], pinned_bo_count;
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 85cf5cfbc7ed..8321f2d8a301 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -192,10 +192,9 @@ static void free_capture_list(struct i915_request *request)
>   static void __retire_engine_request(struct intel_engine_cs *engine,
>   				    struct i915_request *rq)
>   {
> -	GEM_TRACE("%s(%s) fence %llx:%lld, global=%d, current %d\n",
> +	GEM_TRACE("%s(%s) fence %llx:%lld, current %d\n",
>   		  __func__, engine->name,
>   		  rq->fence.context, rq->fence.seqno,
> -		  rq->global_seqno,
>   		  hwsp_seqno(rq));
>   
>   	GEM_BUG_ON(!i915_request_completed(rq));
> @@ -255,10 +254,9 @@ static void i915_request_retire(struct i915_request *request)
>   {
>   	struct i915_active_request *active, *next;
>   
> -	GEM_TRACE("%s fence %llx:%lld, global=%d, current %d\n",
> +	GEM_TRACE("%s fence %llx:%lld, current %d\n",
>   		  request->engine->name,
>   		  request->fence.context, request->fence.seqno,
> -		  request->global_seqno,
>   		  hwsp_seqno(request));
>   
>   	lockdep_assert_held(&request->i915->drm.struct_mutex);
> @@ -318,10 +316,9 @@ void i915_request_retire_upto(struct i915_request *rq)
>   	struct intel_ring *ring = rq->ring;
>   	struct i915_request *tmp;
>   
> -	GEM_TRACE("%s fence %llx:%lld, global=%d, current %d\n",
> +	GEM_TRACE("%s fence %llx:%lld, current %d\n",
>   		  rq->engine->name,
>   		  rq->fence.context, rq->fence.seqno,
> -		  rq->global_seqno,
>   		  hwsp_seqno(rq));
>   
>   	lockdep_assert_held(&rq->i915->drm.struct_mutex);
> @@ -412,17 +409,9 @@ static void move_to_timeline(struct i915_request *request,
>   	spin_unlock(&request->timeline->lock);
>   }
>   
> -static u32 next_global_seqno(struct i915_timeline *tl)
> -{
> -	if (!++tl->seqno)
> -		++tl->seqno;
> -	return tl->seqno;
> -}
> -
>   void __i915_request_submit(struct i915_request *request)
>   {
>   	struct intel_engine_cs *engine = request->engine;
> -	u32 seqno;
>   
>   	GEM_TRACE("%s fence %llx:%lld -> global=%d, current %d\n",
>   		  engine->name,
> @@ -433,18 +422,12 @@ void __i915_request_submit(struct i915_request *request)
>   	GEM_BUG_ON(!irqs_disabled());
>   	lockdep_assert_held(&engine->timeline.lock);
>   
> -	GEM_BUG_ON(request->global_seqno);
> -
> -	seqno = next_global_seqno(&engine->timeline);
> -	GEM_BUG_ON(!seqno);
> -
>   	/* We may be recursing from the signal callback of another i915 fence */
>   	spin_lock_nested(&request->lock, SINGLE_DEPTH_NESTING);
>   
>   	GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
>   	set_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
>   
> -	request->global_seqno = seqno;
>   	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &request->fence.flags) &&
>   	    !i915_request_enable_breadcrumb(request))
>   		intel_engine_queue_breadcrumbs(engine);
> @@ -487,10 +470,9 @@ void __i915_request_unsubmit(struct i915_request *request)
>   {
>   	struct intel_engine_cs *engine = request->engine;
>   
> -	GEM_TRACE("%s fence %llx:%lld <- global=%d, current %d\n",
> +	GEM_TRACE("%s fence %llx:%lld <- current %d\n",
>   		  engine->name,
>   		  request->fence.context, request->fence.seqno,
> -		  request->global_seqno,
>   		  hwsp_seqno(request));
>   
>   	GEM_BUG_ON(!irqs_disabled());
> @@ -500,13 +482,9 @@ void __i915_request_unsubmit(struct i915_request *request)
>   	 * Only unwind in reverse order, required so that the per-context list
>   	 * is kept in seqno/ring order.
>   	 */
> -	GEM_BUG_ON(!request->global_seqno);
> -	GEM_BUG_ON(request->global_seqno != engine->timeline.seqno);
> -	engine->timeline.seqno--;
>   
>   	/* We may be recursing from the signal callback of another i915 fence */
>   	spin_lock_nested(&request->lock, SINGLE_DEPTH_NESTING);
> -	request->global_seqno = 0;
>   	if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &request->fence.flags))
>   		i915_request_cancel_breadcrumb(request);
>   	GEM_BUG_ON(!test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
> @@ -724,7 +702,6 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>   	i915_sched_node_init(&rq->sched);
>   
>   	/* No zalloc, must clear what we need by hand */
> -	rq->global_seqno = 0;
>   	rq->file_priv = NULL;
>   	rq->batch = NULL;
>   	rq->capture_list = NULL;
> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> index df52776b26cf..5a32167ee892 100644
> --- a/drivers/gpu/drm/i915/i915_request.h
> +++ b/drivers/gpu/drm/i915/i915_request.h
> @@ -149,14 +149,6 @@ struct i915_request {
>   	 */
>   	const u32 *hwsp_seqno;
>   
> -	/**
> -	 * GEM sequence number associated with this request on the
> -	 * global execution timeline. It is zero when the request is not
> -	 * on the HW queue (i.e. not on the engine timeline list).
> -	 * Its value is guarded by the timeline spinlock.
> -	 */
> -	u32 global_seqno;
> -
>   	/** Position in the ring of the start of the request */
>   	u32 head;
>   
> @@ -254,30 +246,6 @@ i915_request_put(struct i915_request *rq)
>   	dma_fence_put(&rq->fence);
>   }
>   
> -/**
> - * i915_request_global_seqno - report the current global seqno
> - * @request - the request
> - *
> - * A request is assigned a global seqno only when it is on the hardware
> - * execution queue. The global seqno can be used to maintain a list of
> - * requests on the same engine in retirement order, for example for
> - * constructing a priority queue for waiting. Prior to its execution, or
> - * if it is subsequently removed in the event of preemption, its global
> - * seqno is zero. As both insertion and removal from the execution queue
> - * may operate in IRQ context, it is not guarded by the usual struct_mutex
> - * BKL. Instead those relying on the global seqno must be prepared for its
> - * value to change between reads. Only when the request is complete can
> - * the global seqno be stable (due to the memory barriers on submitting
> - * the commands to the hardware to write the breadcrumb, if the HWS shows
> - * that it has passed the global seqno and the global seqno is unchanged
> - * after the read, it is indeed complete).
> - */
> -static inline u32
> -i915_request_global_seqno(const struct i915_request *request)
> -{
> -	return READ_ONCE(request->global_seqno);
> -}
> -
>   int i915_request_await_object(struct i915_request *to,
>   			      struct drm_i915_gem_object *obj,
>   			      bool write);
> diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
> index eab313c3163c..d1d0d9e5f384 100644
> --- a/drivers/gpu/drm/i915/i915_trace.h
> +++ b/drivers/gpu/drm/i915/i915_trace.h
> @@ -627,7 +627,6 @@ DECLARE_EVENT_CLASS(i915_request,
>   			     __field(u16, class)
>   			     __field(u16, instance)
>   			     __field(u32, seqno)
> -			     __field(u32, global)
>   			     ),
>   
>   	    TP_fast_assign(
> @@ -637,13 +636,11 @@ DECLARE_EVENT_CLASS(i915_request,
>   			   __entry->instance = rq->engine->instance;
>   			   __entry->ctx = rq->fence.context;
>   			   __entry->seqno = rq->fence.seqno;
> -			   __entry->global = rq->global_seqno;
>   			   ),
>   
> -	    TP_printk("dev=%u, engine=%u:%u, hw_id=%u, ctx=%llu, seqno=%u, global=%u",
> +	    TP_printk("dev=%u, engine=%u:%u, hw_id=%u, ctx=%llu, seqno=%u",
>   		      __entry->dev, __entry->class, __entry->instance,
> -		      __entry->hw_id, __entry->ctx, __entry->seqno,
> -		      __entry->global)
> +		      __entry->hw_id, __entry->ctx, __entry->seqno)
>   );
>   
>   DEFINE_EVENT(i915_request, i915_request_add,
> @@ -673,7 +670,6 @@ TRACE_EVENT(i915_request_in,
>   			     __field(u16, class)
>   			     __field(u16, instance)
>   			     __field(u32, seqno)
> -			     __field(u32, global_seqno)
>   			     __field(u32, port)
>   			     __field(u32, prio)
>   			    ),
> @@ -685,15 +681,14 @@ TRACE_EVENT(i915_request_in,
>   			   __entry->instance = rq->engine->instance;
>   			   __entry->ctx = rq->fence.context;
>   			   __entry->seqno = rq->fence.seqno;
> -			   __entry->global_seqno = rq->global_seqno;
>   			   __entry->prio = rq->sched.attr.priority;
>   			   __entry->port = port;
>   			   ),
>   
> -	    TP_printk("dev=%u, engine=%u:%u, hw_id=%u, ctx=%llu, seqno=%u, prio=%u, global=%u, port=%u",
> +	    TP_printk("dev=%u, engine=%u:%u, hw_id=%u, ctx=%llu, seqno=%u, prio=%u, port=%u",
>   		      __entry->dev, __entry->class, __entry->instance,
>   		      __entry->hw_id, __entry->ctx, __entry->seqno,
> -		      __entry->prio, __entry->global_seqno, __entry->port)
> +		      __entry->prio, __entry->port)
>   );
>   
>   TRACE_EVENT(i915_request_out,
> @@ -707,7 +702,6 @@ TRACE_EVENT(i915_request_out,
>   			     __field(u16, class)
>   			     __field(u16, instance)
>   			     __field(u32, seqno)
> -			     __field(u32, global_seqno)
>   			     __field(u32, completed)
>   			    ),
>   
> @@ -718,14 +712,13 @@ TRACE_EVENT(i915_request_out,
>   			   __entry->instance = rq->engine->instance;
>   			   __entry->ctx = rq->fence.context;
>   			   __entry->seqno = rq->fence.seqno;
> -			   __entry->global_seqno = rq->global_seqno;
>   			   __entry->completed = i915_request_completed(rq);
>   			   ),
>   
> -		    TP_printk("dev=%u, engine=%u:%u, hw_id=%u, ctx=%llu, seqno=%u, global=%u, completed?=%u",
> +		    TP_printk("dev=%u, engine=%u:%u, hw_id=%u, ctx=%llu, seqno=%u, completed?=%u",
>   			      __entry->dev, __entry->class, __entry->instance,
>   			      __entry->hw_id, __entry->ctx, __entry->seqno,
> -			      __entry->global_seqno, __entry->completed)
> +			      __entry->completed)
>   );
>   
>   #else
> @@ -768,7 +761,6 @@ TRACE_EVENT(i915_request_wait_begin,
>   			     __field(u16, class)
>   			     __field(u16, instance)
>   			     __field(u32, seqno)
> -			     __field(u32, global)
>   			     __field(unsigned int, flags)
>   			     ),
>   
> @@ -785,14 +777,13 @@ TRACE_EVENT(i915_request_wait_begin,
>   			   __entry->instance = rq->engine->instance;
>   			   __entry->ctx = rq->fence.context;
>   			   __entry->seqno = rq->fence.seqno;
> -			   __entry->global = rq->global_seqno;
>   			   __entry->flags = flags;
>   			   ),
>   
> -	    TP_printk("dev=%u, engine=%u:%u, hw_id=%u, ctx=%llu, seqno=%u, global=%u, blocking=%u, flags=0x%x",
> +	    TP_printk("dev=%u, engine=%u:%u, hw_id=%u, ctx=%llu, seqno=%u, blocking=%u, flags=0x%x",
>   		      __entry->dev, __entry->class, __entry->instance,
>   		      __entry->hw_id, __entry->ctx, __entry->seqno,
> -		      __entry->global, !!(__entry->flags & I915_WAIT_LOCKED),
> +		      !!(__entry->flags & I915_WAIT_LOCKED),
>   		      __entry->flags)
>   );
>   
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index ea370ed094a5..ce7c19f2ae49 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -1313,15 +1313,14 @@ static void print_request(struct drm_printer *m,
>   
>   	x = print_sched_attr(rq->i915, &rq->sched.attr, buf, x, sizeof(buf));
>   
> -	drm_printf(m, "%s%x%s%s [%llx:%llx]%s @ %dms: %s\n",
> +	drm_printf(m, "%s %llx:%llx%s%s %s @ %dms: %s\n",
>   		   prefix,
> -		   rq->global_seqno,
> +		   rq->fence.context, rq->fence.seqno,
>   		   i915_request_completed(rq) ? "!" :
>   		   i915_request_started(rq) ? "*" :
>   		   "",
>   		   test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT,
>   			    &rq->fence.flags) ?  "+" : "",
> -		   rq->fence.context, rq->fence.seqno,
>   		   buf,
>   		   jiffies_to_msecs(jiffies - rq->emitted_jiffies),
>   		   name);
> diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
> index 4cf94513615d..4366db7978a8 100644
> --- a/drivers/gpu/drm/i915/intel_guc_submission.c
> +++ b/drivers/gpu/drm/i915/intel_guc_submission.c
> @@ -535,7 +535,7 @@ static void guc_add_request(struct intel_guc *guc, struct i915_request *rq)
>   	spin_lock(&client->wq_lock);
>   
>   	guc_wq_item_append(client, engine->guc_id, ctx_desc,
> -			   ring_tail, rq->global_seqno);
> +			   ring_tail, rq->fence.seqno);
>   	guc_ring_doorbell(client);
>   
>   	client->submissions[engine->id] += 1;
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 2f2c27e6ae6d..2424eb2b1fc6 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -174,12 +174,6 @@ static void execlists_init_reg_state(u32 *reg_state,
>   				     struct intel_engine_cs *engine,
>   				     struct intel_ring *ring);
>   
> -static inline u32 intel_hws_seqno_address(struct intel_engine_cs *engine)
> -{
> -	return (i915_ggtt_offset(engine->status_page.vma) +
> -		I915_GEM_HWS_INDEX_ADDR);
> -}
> -
>   static inline u32 intel_hws_hangcheck_address(struct intel_engine_cs *engine)
>   {
>   	return (i915_ggtt_offset(engine->status_page.vma) +
> @@ -565,10 +559,9 @@ static void execlists_submit_ports(struct intel_engine_cs *engine)
>   			desc = execlists_update_context(rq);
>   			GEM_DEBUG_EXEC(port[n].context_id = upper_32_bits(desc));
>   
> -			GEM_TRACE("%s in[%d]:  ctx=%d.%d, global=%d (fence %llx:%lld) (current %d), prio=%d\n",
> +			GEM_TRACE("%s in[%d]:  ctx=%d.%d, fence %llx:%lld (current %d), prio=%d\n",
>   				  engine->name, n,
>   				  port[n].context_id, count,
> -				  rq->global_seqno,
>   				  rq->fence.context, rq->fence.seqno,
>   				  hwsp_seqno(rq),
>   				  rq_prio(rq));
> @@ -875,10 +868,9 @@ execlists_cancel_port_requests(struct intel_engine_execlists * const execlists)
>   	while (num_ports-- && port_isset(port)) {
>   		struct i915_request *rq = port_request(port);
>   
> -		GEM_TRACE("%s:port%u global=%d (fence %llx:%lld), (current %d)\n",
> +		GEM_TRACE("%s:port%u fence %llx:%lld, (current %d)\n",
>   			  rq->engine->name,
>   			  (unsigned int)(port - execlists->port),
> -			  rq->global_seqno,
>   			  rq->fence.context, rq->fence.seqno,
>   			  hwsp_seqno(rq));
>   
> @@ -960,8 +952,6 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
>   
>   	/* Mark all executing requests as skipped. */
>   	list_for_each_entry(rq, &engine->timeline.requests, link) {
> -		GEM_BUG_ON(!rq->global_seqno);
> -
>   		if (!i915_request_signaled(rq))
>   			dma_fence_set_error(&rq->fence, -EIO);
>   
> @@ -1099,10 +1089,9 @@ static void process_csb(struct intel_engine_cs *engine)
>   						EXECLISTS_ACTIVE_USER));
>   
>   		rq = port_unpack(port, &count);
> -		GEM_TRACE("%s out[0]: ctx=%d.%d, global=%d (fence %llx:%lld) (current %d), prio=%d\n",
> +		GEM_TRACE("%s out[0]: ctx=%d.%d, fence %llx:%lld (current %d), prio=%d\n",
>   			  engine->name,
>   			  port->context_id, count,
> -			  rq ? rq->global_seqno : 0,
>   			  rq ? rq->fence.context : 0,
>   			  rq ? rq->fence.seqno : 0,
>   			  rq ? hwsp_seqno(rq) : 0,
> @@ -1967,10 +1956,7 @@ static void execlists_reset(struct intel_engine_cs *engine, bool stalled)
>   	/* Following the reset, we need to reload the CSB read/write pointers */
>   	reset_csb_pointers(&engine->execlists);
>   
> -	GEM_TRACE("%s seqno=%d, stalled? %s\n",
> -		  engine->name,
> -		  rq ? rq->global_seqno : 0,
> -		  yesno(stalled));
> +	GEM_TRACE("%s stalled? %s\n", engine->name, yesno(stalled));
>   	if (!rq)
>   		goto out_unlock;
>   
> @@ -2225,9 +2211,6 @@ static u32 *gen8_emit_wa_tail(struct i915_request *request, u32 *cs)
>   
>   static u32 *gen8_emit_fini_breadcrumb(struct i915_request *request, u32 *cs)
>   {
> -	/* w/a: bit 5 needs to be zero for MI_FLUSH_DW address. */
> -	BUILD_BUG_ON(I915_GEM_HWS_INDEX_ADDR & (1 << 5));
> -
>   	cs = gen8_emit_ggtt_write(cs,
>   				  request->fence.seqno,
>   				  request->timeline->hwsp_offset);
> @@ -2236,10 +2219,6 @@ static u32 *gen8_emit_fini_breadcrumb(struct i915_request *request, u32 *cs)
>   				  intel_engine_next_hangcheck_seqno(request->engine),
>   				  intel_hws_hangcheck_address(request->engine));
>   
> -	cs = gen8_emit_ggtt_write(cs,
> -				  request->global_seqno,
> -				  intel_hws_seqno_address(request->engine));
> -
>   	*cs++ = MI_USER_INTERRUPT;
>   	*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
>   
> @@ -2265,11 +2244,6 @@ static u32 *gen8_emit_fini_breadcrumb_rcs(struct i915_request *request, u32 *cs)
>   				      intel_hws_hangcheck_address(request->engine),
>   				      PIPE_CONTROL_CS_STALL);
>   
> -	cs = gen8_emit_ggtt_write_rcs(cs,
> -				      request->global_seqno,
> -				      intel_hws_seqno_address(request->engine),
> -				      PIPE_CONTROL_CS_STALL);
> -
>   	*cs++ = MI_USER_INTERRUPT;
>   	*cs++ = MI_ARB_ON_OFF | MI_ARB_ENABLE;
>   
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 2d59e2990448..1b96b0960adc 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -49,12 +49,6 @@ static inline u32 hws_hangcheck_address(struct intel_engine_cs *engine)
>   		I915_GEM_HWS_HANGCHECK_ADDR);
>   }
>   
> -static inline u32 intel_hws_seqno_address(struct intel_engine_cs *engine)
> -{
> -	return (i915_ggtt_offset(engine->status_page.vma) +
> -		I915_GEM_HWS_INDEX_ADDR);
> -}
> -
>   unsigned int intel_ring_update_space(struct intel_ring *ring)
>   {
>   	unsigned int space;
> @@ -327,11 +321,6 @@ static u32 *gen6_rcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
>   	*cs++ = hws_hangcheck_address(rq->engine) | PIPE_CONTROL_GLOBAL_GTT;
>   	*cs++ = intel_engine_next_hangcheck_seqno(rq->engine);
>   
> -	*cs++ = GFX_OP_PIPE_CONTROL(4);
> -	*cs++ = PIPE_CONTROL_QW_WRITE | PIPE_CONTROL_CS_STALL;
> -	*cs++ = intel_hws_seqno_address(rq->engine) | PIPE_CONTROL_GLOBAL_GTT;
> -	*cs++ = rq->global_seqno;
> -
>   	*cs++ = MI_USER_INTERRUPT;
>   	*cs++ = MI_NOOP;
>   
> @@ -438,13 +427,6 @@ static u32 *gen7_rcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
>   	*cs++ = hws_hangcheck_address(rq->engine);
>   	*cs++ = intel_engine_next_hangcheck_seqno(rq->engine);
>   
> -	*cs++ = GFX_OP_PIPE_CONTROL(4);
> -	*cs++ = (PIPE_CONTROL_QW_WRITE |
> -		 PIPE_CONTROL_GLOBAL_GTT_IVB |
> -		 PIPE_CONTROL_CS_STALL);
> -	*cs++ = intel_hws_seqno_address(rq->engine);
> -	*cs++ = rq->global_seqno;
> -
>   	*cs++ = MI_USER_INTERRUPT;
>   	*cs++ = MI_NOOP;
>   
> @@ -467,11 +449,8 @@ static u32 *gen6_xcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
>   	*cs++ = I915_GEM_HWS_HANGCHECK_ADDR | MI_FLUSH_DW_USE_GTT;
>   	*cs++ = intel_engine_next_hangcheck_seqno(rq->engine);
>   
> -	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX;
> -	*cs++ = I915_GEM_HWS_INDEX_ADDR | MI_FLUSH_DW_USE_GTT;
> -	*cs++ = rq->global_seqno;
> -
>   	*cs++ = MI_USER_INTERRUPT;
> +	*cs++ = MI_NOOP;
>   
>   	rq->tail = intel_ring_offset(rq, cs);
>   	assert_ring_tail_valid(rq->ring, rq->tail);
> @@ -495,10 +474,6 @@ static u32 *gen7_xcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
>   	*cs++ = I915_GEM_HWS_HANGCHECK_ADDR | MI_FLUSH_DW_USE_GTT;
>   	*cs++ = intel_engine_next_hangcheck_seqno(rq->engine);
>   
> -	*cs++ = MI_FLUSH_DW | MI_FLUSH_DW_OP_STOREDW | MI_FLUSH_DW_STORE_INDEX;
> -	*cs++ = I915_GEM_HWS_INDEX_ADDR | MI_FLUSH_DW_USE_GTT;
> -	*cs++ = rq->global_seqno;
> -
>   	for (i = 0; i < GEN7_XCS_WA; i++) {
>   		*cs++ = MI_STORE_DWORD_INDEX;
>   		*cs++ = I915_GEM_HWS_SEQNO_ADDR;
> @@ -510,7 +485,6 @@ static u32 *gen7_xcs_emit_breadcrumb(struct i915_request *rq, u32 *cs)
>   	*cs++ = 0;
>   
>   	*cs++ = MI_USER_INTERRUPT;
> -	*cs++ = MI_NOOP;
>   
>   	rq->tail = intel_ring_offset(rq, cs);
>   	assert_ring_tail_valid(rq->ring, rq->tail);
> @@ -782,10 +756,8 @@ static void reset_ring(struct intel_engine_cs *engine, bool stalled)
>   		}
>   	}
>   
> -	GEM_TRACE("%s seqno=%d, stalled? %s\n",
> -		  engine->name,
> -		  rq ? rq->global_seqno : 0,
> -		  yesno(stalled));
> +	GEM_TRACE("%s stalled? %s\n", engine->name, yesno(stalled));
> +
>   	/*
>   	 * The guilty request will get skipped on a hung engine.
>   	 *
> @@ -915,8 +887,6 @@ static void cancel_requests(struct intel_engine_cs *engine)
>   
>   	/* Mark all submitted requests as skipped. */
>   	list_for_each_entry(request, &engine->timeline.requests, link) {
> -		GEM_BUG_ON(!request->global_seqno);
> -
>   		if (!i915_request_signaled(request))
>   			dma_fence_set_error(&request->fence, -EIO);
>   
> @@ -953,12 +923,7 @@ static u32 *i9xx_emit_breadcrumb(struct i915_request *rq, u32 *cs)
>   	*cs++ = I915_GEM_HWS_HANGCHECK_ADDR;
>   	*cs++ = intel_engine_next_hangcheck_seqno(rq->engine);
>   
> -	*cs++ = MI_STORE_DWORD_INDEX;
> -	*cs++ = I915_GEM_HWS_INDEX_ADDR;
> -	*cs++ = rq->global_seqno;
> -
>   	*cs++ = MI_USER_INTERRUPT;
> -	*cs++ = MI_NOOP;
>   
>   	rq->tail = intel_ring_offset(rq, cs);
>   	assert_ring_tail_valid(rq->ring, rq->tail);
> @@ -976,10 +941,6 @@ static u32 *gen5_emit_breadcrumb(struct i915_request *rq, u32 *cs)
>   
>   	*cs++ = MI_FLUSH;
>   
> -	*cs++ = MI_STORE_DWORD_INDEX;
> -	*cs++ = I915_GEM_HWS_SEQNO_ADDR;
> -	*cs++ = rq->fence.seqno;
> -
>   	*cs++ = MI_STORE_DWORD_INDEX;
>   	*cs++ = I915_GEM_HWS_HANGCHECK_ADDR;
>   	*cs++ = intel_engine_next_hangcheck_seqno(rq->engine);
> @@ -987,11 +948,12 @@ static u32 *gen5_emit_breadcrumb(struct i915_request *rq, u32 *cs)
>   	BUILD_BUG_ON(GEN5_WA_STORES < 1);
>   	for (i = 0; i < GEN5_WA_STORES; i++) {
>   		*cs++ = MI_STORE_DWORD_INDEX;
> -		*cs++ = I915_GEM_HWS_INDEX_ADDR;
> -		*cs++ = rq->global_seqno;
> +		*cs++ = I915_GEM_HWS_SEQNO_ADDR;
> +		*cs++ = rq->fence.seqno;
>   	}
>   
>   	*cs++ = MI_USER_INTERRUPT;
> +	*cs++ = MI_NOOP;
>   
>   	rq->tail = intel_ring_offset(rq, cs);
>   	assert_ring_tail_valid(rq->ring, rq->tail);
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 26bae7772208..39a9ee7b61e2 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -716,8 +716,6 @@ intel_write_status_page(struct intel_engine_cs *engine, int reg, u32 value)
>    *
>    * The area from dword 0x30 to 0x3ff is available for driver usage.
>    */
> -#define I915_GEM_HWS_INDEX		0x30
> -#define I915_GEM_HWS_INDEX_ADDR		(I915_GEM_HWS_INDEX * sizeof(u32))
>   #define I915_GEM_HWS_PREEMPT		0x32
>   #define I915_GEM_HWS_PREEMPT_ADDR	(I915_GEM_HWS_PREEMPT * sizeof(u32))
>   #define I915_GEM_HWS_HANGCHECK		0x34
> diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> index 36c17bfe05a7..4aa57d0d1b92 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> @@ -571,11 +571,10 @@ static int active_request_put(struct i915_request *rq)
>   		return 0;
>   
>   	if (i915_request_wait(rq, 0, 5 * HZ) < 0) {
> -		GEM_TRACE("%s timed out waiting for completion of fence %llx:%lld, seqno %d.\n",
> +		GEM_TRACE("%s timed out waiting for completion of fence %llx:%lld\n",
>   			  rq->engine->name,
>   			  rq->fence.context,
> -			  rq->fence.seqno,
> -			  i915_request_global_seqno(rq));
> +			  rq->fence.seqno);
>   		GEM_TRACE_DUMP();
>   
>   		i915_gem_set_wedged(rq->i915);
> diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
> index f055da01ced9..ec1ae948954c 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_engine.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
> @@ -194,7 +194,6 @@ static void mock_submit_request(struct i915_request *request)
>   	unsigned long flags;
>   
>   	i915_request_submit(request);
> -	GEM_BUG_ON(!request->global_seqno);
>   
>   	spin_lock_irqsave(&engine->hw_lock, flags);
>   	list_add_tail(&request->mock.link, &engine->hw_queue);
> 

Rest looks fine.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 25/46] drm/i915: Store the BIT(engine->id) as the engine's mask
  2019-02-06 13:03 ` [PATCH 25/46] drm/i915: Store the BIT(engine->id) as the engine's mask Chris Wilson
@ 2019-02-11 18:51   ` Tvrtko Ursulin
  2019-02-12 13:51     ` Chris Wilson
  0 siblings, 1 reply; 97+ messages in thread
From: Tvrtko Ursulin @ 2019-02-11 18:51 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 06/02/2019 13:03, Chris Wilson wrote:
> In the next patch, we are introducing a broad virtual engine to encompass
> multiple physical engines, losing the 1:1 nature of BIT(engine->id). To
> reflect the broader set of engines implied by the virtual instance, let's
> store the full bitmask.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_reset.c                | 4 ++--
>   drivers/gpu/drm/i915/intel_engine_cs.c           | 3 +++
>   drivers/gpu/drm/i915/intel_hangcheck.c           | 8 ++++----
>   drivers/gpu/drm/i915/intel_ringbuffer.c          | 4 ++--
>   drivers/gpu/drm/i915/intel_ringbuffer.h          | 7 +------
>   drivers/gpu/drm/i915/selftests/intel_hangcheck.c | 2 +-
>   drivers/gpu/drm/i915/selftests/mock_engine.c     | 1 +
>   7 files changed, 14 insertions(+), 15 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
> index 7051c0a43941..78c9689629a0 100644
> --- a/drivers/gpu/drm/i915/i915_reset.c
> +++ b/drivers/gpu/drm/i915/i915_reset.c
> @@ -1053,7 +1053,7 @@ void i915_reset(struct drm_i915_private *i915,
>   static inline int intel_gt_reset_engine(struct drm_i915_private *i915,
>   					struct intel_engine_cs *engine)
>   {
> -	return intel_gpu_reset(i915, intel_engine_flag(engine));
> +	return intel_gpu_reset(i915, engine->mask);
>   }
>   
>   /**
> @@ -1253,7 +1253,7 @@ void i915_handle_error(struct drm_i915_private *i915,
>   				continue;
>   
>   			if (i915_reset_engine(engine, msg) == 0)
> -				engine_mask &= ~intel_engine_flag(engine);
> +				engine_mask &= ~engine->mask;
>   
>   			clear_bit(I915_RESET_ENGINE + engine->id,
>   				  &error->flags);
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index ce7c19f2ae49..45e38877ab17 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -313,7 +313,10 @@ intel_engine_setup(struct drm_i915_private *dev_priv,
>   	if (!engine)
>   		return -ENOMEM;
>   
> +	BUILD_BUG_ON(BITS_PER_TYPE(engine->mask) < I915_NUM_ENGINES);
> +
>   	engine->id = id;
> +	engine->mask = BIT(id);
>   	engine->i915 = dev_priv;
>   	__sprint_engine_name(engine->name, info);
>   	engine->hw_id = engine->guc_id = info->hw_id;
> diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
> index e04b2560369e..58b6ff8453dc 100644
> --- a/drivers/gpu/drm/i915/intel_hangcheck.c
> +++ b/drivers/gpu/drm/i915/intel_hangcheck.c
> @@ -120,7 +120,7 @@ engine_stuck(struct intel_engine_cs *engine, u64 acthd)
>   	 */
>   	tmp = I915_READ_CTL(engine);
>   	if (tmp & RING_WAIT) {
> -		i915_handle_error(dev_priv, BIT(engine->id), 0,
> +		i915_handle_error(dev_priv, engine->mask, 0,
>   				  "stuck wait on %s", engine->name);
>   		I915_WRITE_CTL(engine, tmp);
>   		return ENGINE_WAIT_KICK;
> @@ -282,13 +282,13 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
>   		hangcheck_store_sample(engine, &hc);
>   
>   		if (hc.stalled) {
> -			hung |= intel_engine_flag(engine);
> +			hung |= engine->mask;
>   			if (hc.action != ENGINE_DEAD)
> -				stuck |= intel_engine_flag(engine);
> +				stuck |= engine->mask;
>   		}
>   
>   		if (hc.wedged)
> -			wedged |= intel_engine_flag(engine);
> +			wedged |= engine->mask;
>   	}
>   
>   	if (GEM_SHOW_DEBUG() && (hung | stuck)) {
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 1b96b0960adc..91c49f644898 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1859,8 +1859,8 @@ static int switch_context(struct i915_request *rq)
>   				goto err;
>   		} while (--loops);
>   
> -		if (intel_engine_flag(engine) & ppgtt->pd_dirty_rings) {
> -			unwind_mm = intel_engine_flag(engine);
> +		if (ppgtt->pd_dirty_rings & engine->mask) {
> +			unwind_mm = engine->mask;
>   			ppgtt->pd_dirty_rings &= ~unwind_mm;
>   			hw_flags = MI_FORCE_RESTORE;
>   		}
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 39a9ee7b61e2..7777d46784f9 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -334,6 +334,7 @@ struct intel_engine_cs {
>   	enum intel_engine_id id;
>   	unsigned int hw_id;
>   	unsigned int guc_id;
> +	unsigned long mask;

Could use intel_ring_mask_t - if we renamed it to intel_engine_mask_t - 
which is already checked with a BUILD_BUG_ON.
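
Roughly what I have in mind, as a sketch (assuming the typedef is
renamed as suggested; exact width and placement to taste):

	typedef u32 intel_engine_mask_t;

	/* struct intel_engine_cs */
	-	unsigned long mask;
	+	intel_engine_mask_t mask;

	/* intel_engine_setup() */
	BUILD_BUG_ON(BITS_PER_TYPE(intel_engine_mask_t) < I915_NUM_ENGINES);
	engine->mask = BIT(id);

That keeps the width check attached to the type rather than to each
field that uses it.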

>   
>   	u8 uabi_id;
>   	u8 uabi_class;
> @@ -668,12 +669,6 @@ execlists_port_complete(struct intel_engine_execlists * const execlists,
>   	return port;
>   }
>   
> -static inline unsigned int
> -intel_engine_flag(const struct intel_engine_cs *engine)
> -{
> -	return BIT(engine->id);
> -}
> -
>   static inline u32
>   intel_read_status_page(const struct intel_engine_cs *engine, int reg)
>   {
> diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> index 4aa57d0d1b92..50a7f57a00a4 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> @@ -1142,7 +1142,7 @@ static int __igt_reset_evict_vma(struct drm_i915_private *i915,
>   
>   out_reset:
>   	igt_global_reset_lock(i915);
> -	fake_hangcheck(rq->i915, intel_engine_flag(rq->engine));
> +	fake_hangcheck(rq->i915, rq->engine->mask);
>   	igt_global_reset_unlock(i915);
>   
>   	if (tsk) {
> diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
> index ec1ae948954c..c2c954f64226 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_engine.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
> @@ -223,6 +223,7 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
>   	engine->base.i915 = i915;
>   	snprintf(engine->base.name, sizeof(engine->base.name), "%s", name);
>   	engine->base.id = id;
> +	engine->base.mask = BIT(id);
>   	engine->base.status_page.addr = (void *)(engine + 1);
>   
>   	engine->base.context_pin = mock_context_pin;
> 

No other suggestions.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 29/46] drm/i915: Introduce the i915_user_extension_method
  2019-02-06 13:03 ` [PATCH 29/46] drm/i915: Introduce the i915_user_extension_method Chris Wilson
@ 2019-02-11 19:00   ` Tvrtko Ursulin
  2019-02-12 13:56     ` Chris Wilson
  0 siblings, 1 reply; 97+ messages in thread
From: Tvrtko Ursulin @ 2019-02-11 19:00 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 06/02/2019 13:03, Chris Wilson wrote:
> An idea for extending uABI inspired by Vulkan's extension chains.
> Instead of expanding the data struct for each ioctl every time we need
> to add a new feature, define an extension chain instead. As we add
> optional interfaces to control the ioctl, we define a new extension
> struct that can be linked into the ioctl data only when required by the
> user. The key advantage is being able to ignore large control structs for
> optional interfaces/extensions, while being able to process them in a
> consistent manner.
> 
> In comparison to other extensible ioctls, the key difference is the
> use of a linked chain of extension structs vs an array of tagged
> pointers. For example,
> 
> struct drm_amdgpu_cs_chunk {
>          __u32           chunk_id;
>          __u32           length_dw;
>          __u64           chunk_data;
> };
> 
> struct drm_amdgpu_cs_in {
>          __u32           ctx_id;
>          __u32           bo_list_handle;
>          __u32           num_chunks;
>          __u32           _pad;
>          __u64           chunks;
> };
> 
> allows userspace to pass in an array of pointers to extension structs, but
> must therefore keep constructing that array alongside the command stream.
> In dynamic situations like that, a linked list is preferred and does not
> suffer from extra cache line misses, as the extension structs themselves
> must still be loaded separately from the chunks array.
> 
> v2: Apply the tail call optimisation directly to nip the worry of stack
> overflow in the bud.
> v3: Defend against recursion.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/Makefile               |  1 +
>   drivers/gpu/drm/i915/i915_user_extensions.c | 43 +++++++++++++++++++++
>   drivers/gpu/drm/i915/i915_user_extensions.h | 20 ++++++++++
>   drivers/gpu/drm/i915/i915_utils.h           |  7 ++++
>   include/uapi/drm/i915_drm.h                 | 20 ++++++++++
>   5 files changed, 91 insertions(+)
>   create mode 100644 drivers/gpu/drm/i915/i915_user_extensions.c
>   create mode 100644 drivers/gpu/drm/i915/i915_user_extensions.h
> 
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index a1d834068765..89105b1aaf12 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -46,6 +46,7 @@ i915-y := i915_drv.o \
>   	  i915_sw_fence.o \
>   	  i915_syncmap.o \
>   	  i915_sysfs.o \
> +	  i915_user_extensions.o \
>   	  intel_csr.o \
>   	  intel_device_info.o \
>   	  intel_pm.o \
> diff --git a/drivers/gpu/drm/i915/i915_user_extensions.c b/drivers/gpu/drm/i915/i915_user_extensions.c
> new file mode 100644
> index 000000000000..879b4094b2d7
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/i915_user_extensions.c
> @@ -0,0 +1,43 @@
> +/*
> + * SPDX-License-Identifier: MIT
> + *
> + * Copyright © 2018 Intel Corporation
> + */
> +
> +#include <linux/sched/signal.h>
> +#include <linux/uaccess.h>
> +#include <uapi/drm/i915_drm.h>
> +
> +#include "i915_user_extensions.h"
> +
> +int i915_user_extensions(struct i915_user_extension __user *ext,
> +			 const i915_user_extension_fn *tbl,
> +			 unsigned long count,
> +			 void *data)
> +{
> +	unsigned int stackdepth = 512;
> +
> +	while (ext) {
> +		int err;
> +		u64 x;
> +
> +		if (!stackdepth--) /* recursion vs useful flexibility */
> +			return -EINVAL;

I don't get this - which recursion? Variable is also a local automatic.

> +
> +		if (get_user(x, &ext->name))
> +			return -EFAULT;
> +
> +		err = -EINVAL;
> +		if (x < count && tbl[x])
> +			err = tbl[x](ext, data);
> +		if (err)
> +			return err;

Do you plan to add the unwind as previously discussed? Or do we define 
the interface as leaving things in an undefined state if one extension 
fails? That would be a bit suboptimal for userspace, since it would mean 
having to throw away and recreate the object in use cases where a 
user-extension-capable ioctl is executed after the object has already 
been created.
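
Just to illustrate what I mean by unwind, something along these lines
(pure sketch; the undo table and its typedef are made up here and are
not part of the patch, and I drop the depth limit to keep it short):

	typedef void (*i915_user_extension_undo_fn)(struct i915_user_extension __user *ext,
						    void *data);

	int i915_user_extensions(struct i915_user_extension __user *ext,
				 const i915_user_extension_fn *tbl,
				 const i915_user_extension_undo_fn *undo,
				 unsigned long count,
				 void *data)
	{
		struct i915_user_extension __user *it = ext;
		int err = 0;
		u64 x;

		/* first pass: apply each extension in chain order */
		while (it) {
			if (get_user(x, &it->name))
				return -EFAULT; /* EFAULT paths skip the unwind for brevity */

			err = -EINVAL;
			if (x < count && tbl[x])
				err = tbl[x](it, data);
			if (err)
				break;

			if (get_user(x, &it->next_extension))
				return -EFAULT;

			it = u64_to_user_ptr(x);
		}
		if (!err)
			return 0;

		/* second pass: undo everything applied before the failure */
		while (ext != it) {
			if (get_user(x, &ext->name))
				break;
			if (x < count && undo[x])
				undo[x](ext, data);
			if (get_user(x, &ext->next_extension))
				break;
			ext = u64_to_user_ptr(x);
		}

		return err;
	}

Whether that is worth the churn, versus documenting that the object is
in an unspecified state after a failed call and must be recreated, is
really the question I am asking.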

Regards,

Tvrtko

> +
> +		if (get_user(x, &ext->next_extension))
> +			return -EFAULT;
> +
> +		ext = u64_to_user_ptr(x);
> +	}
> +
> +	return 0;
> +}
> diff --git a/drivers/gpu/drm/i915/i915_user_extensions.h b/drivers/gpu/drm/i915/i915_user_extensions.h
> new file mode 100644
> index 000000000000..313a510b068a
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/i915_user_extensions.h
> @@ -0,0 +1,20 @@
> +/*
> + * SPDX-License-Identifier: MIT
> + *
> + * Copyright © 2018 Intel Corporation
> + */
> +
> +#ifndef I915_USER_EXTENSIONS_H
> +#define I915_USER_EXTENSIONS_H
> +
> +struct i915_user_extension;
> +
> +typedef int (*i915_user_extension_fn)(struct i915_user_extension __user *ext,
> +				      void *data);
> +
> +int i915_user_extensions(struct i915_user_extension __user *ext,
> +			 const i915_user_extension_fn *tbl,
> +			 unsigned long count,
> +			 void *data);
> +
> +#endif /* I915_USER_EXTENSIONS_H */
> diff --git a/drivers/gpu/drm/i915/i915_utils.h b/drivers/gpu/drm/i915/i915_utils.h
> index 9726df37c4c4..fcc751aa1ea8 100644
> --- a/drivers/gpu/drm/i915/i915_utils.h
> +++ b/drivers/gpu/drm/i915/i915_utils.h
> @@ -105,6 +105,13 @@
>   	__T;								\
>   })
>   
> +#define container_of_user(ptr, type, member) ({				\
> +	void __user *__mptr = (void __user *)(ptr);			\
> +	BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) &&	\
> +			 !__same_type(*(ptr), void),			\
> +			 "pointer type mismatch in container_of()");	\
> +	((type __user *)(__mptr - offsetof(type, member))); })
> +
>   static inline u64 ptr_to_u64(const void *ptr)
>   {
>   	return (uintptr_t)ptr;
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index b641c55420b6..be2fcdf3ba90 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -62,6 +62,26 @@ extern "C" {
>   #define I915_ERROR_UEVENT		"ERROR"
>   #define I915_RESET_UEVENT		"RESET"
>   
> +/*
> + * i915_user_extension: Base class for defining a chain of extensions
> + *
> + * Many interfaces need to grow over time. In most cases we can simply
> + * extend the struct and have userspace pass in more data. Another option,
> + * as demonstrated by Vulkan's approach to providing extensions for forward
> + * and backward compatibility, is to use a list of optional structs to
> + * provide those extra details.
> + *
> + * The key advantage to using an extension chain is that it allows us to
> + * redefine the interface more easily than an ever growing struct of
> + * increasing complexity, and for large parts of that interface to be
> + * entirely optional. The downside is more pointer chasing; chasing across
> + * the __user boundary with pointers encapsulated inside u64.
> + */
> +struct i915_user_extension {
> +	__u64 next_extension;
> +	__u64 name;
> +};
> +
>   /*
>    * MOCS indexes used for GPU surfaces, defining the cacheability of the
>    * surface data and the coherency for this data wrt. CPU vs. GPU accesses.
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 30/46] drm/i915: Track active engines within a context
  2019-02-06 13:03 ` [PATCH 30/46] drm/i915: Track active engines within a context Chris Wilson
@ 2019-02-11 19:11   ` Tvrtko Ursulin
  2019-02-12 13:59     ` Chris Wilson
  0 siblings, 1 reply; 97+ messages in thread
From: Tvrtko Ursulin @ 2019-02-11 19:11 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 06/02/2019 13:03, Chris Wilson wrote:
> For use in the next patch, if we track which engines have been used by
> the HW, we can reduce the work required to flush our state off the HW to
> those engines.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_gem_context.c       | 3 +++
>   drivers/gpu/drm/i915/i915_gem_context.h       | 4 ++++
>   drivers/gpu/drm/i915/intel_lrc.c              | 2 ++
>   drivers/gpu/drm/i915/intel_ringbuffer.c       | 2 ++
>   drivers/gpu/drm/i915/selftests/mock_context.c | 1 +
>   drivers/gpu/drm/i915/selftests/mock_engine.c  | 2 ++
>   6 files changed, 14 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 582a7015e6a4..91037ca96be1 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -210,6 +210,7 @@ static void i915_gem_context_free(struct i915_gem_context *ctx)
>   
>   	lockdep_assert_held(&ctx->i915->drm.struct_mutex);
>   	GEM_BUG_ON(!i915_gem_context_is_closed(ctx));
> +	GEM_BUG_ON(!list_empty(&ctx->active_engines));
>   
>   	release_hw_id(ctx);
>   	i915_ppgtt_put(ctx->ppgtt);
> @@ -337,6 +338,7 @@ intel_context_init(struct intel_context *ce,
>   		   struct intel_engine_cs *engine)
>   {
>   	ce->gem_context = ctx;
> +	ce->engine = engine;
>   
>   	INIT_LIST_HEAD(&ce->signal_link);
>   	INIT_LIST_HEAD(&ce->signals);
> @@ -364,6 +366,7 @@ __create_hw_context(struct drm_i915_private *dev_priv,
>   	list_add_tail(&ctx->link, &dev_priv->contexts.list);
>   	ctx->i915 = dev_priv;
>   	ctx->sched.priority = I915_USER_PRIORITY(I915_PRIORITY_NORMAL);
> +	INIT_LIST_HEAD(&ctx->active_engines);
>   
>   	for (n = 0; n < ARRAY_SIZE(ctx->__engine); n++)
>   		intel_context_init(&ctx->__engine[n], ctx, dev_priv->engine[n]);
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
> index 651f2e4badb6..ab89c7501408 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.h
> +++ b/drivers/gpu/drm/i915/i915_gem_context.h
> @@ -161,6 +161,8 @@ struct i915_gem_context {
>   	atomic_t hw_id_pin_count;
>   	struct list_head hw_id_link;
>   
> +	struct list_head active_engines;
> +
>   	/**
>   	 * @user_handle: userspace identifier
>   	 *
> @@ -174,7 +176,9 @@ struct i915_gem_context {
>   	/** engine: per-engine logical HW state */
>   	struct intel_context {
>   		struct i915_gem_context *gem_context;
> +		struct intel_engine_cs *engine;
>   		struct intel_engine_cs *active;
> +		struct list_head active_link;
>   		struct list_head signal_link;
>   		struct list_head signals;
>   		struct i915_vma *state;
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 2424eb2b1fc6..b3555b1b0e07 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1276,6 +1276,7 @@ static void execlists_context_unpin(struct intel_context *ce)
>   	i915_gem_object_unpin_map(ce->state->obj);
>   	i915_vma_unpin(ce->state);
>   
> +	list_del(&ce->active_link);
>   	i915_gem_context_put(ce->gem_context);
>   }
>   
> @@ -1361,6 +1362,7 @@ __execlists_context_pin(struct intel_engine_cs *engine,
>   
>   	ce->state->obj->pin_global++;
>   	i915_gem_context_get(ctx);
> +	list_add(&ce->active_link, &ctx->active_engines);

Why is it called active_engines if it lists active_contexts? :)

>   	return ce;
>   
>   unpin_ring:
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 91c49f644898..4557f715663d 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1430,6 +1430,7 @@ static void intel_ring_context_unpin(struct intel_context *ce)
>   	__context_unpin_ppgtt(ce->gem_context);
>   	__context_unpin(ce);
>   
> +	list_del(&ce->active_link);
>   	i915_gem_context_put(ce->gem_context);
>   }
>   
> @@ -1530,6 +1531,7 @@ __ring_context_pin(struct intel_engine_cs *engine,
>   		goto err_unpin;
>   
>   	i915_gem_context_get(ctx);
> +	list_add(&ce->active_link, &ctx->active_engines);
>   
>   	/* One ringbuffer to rule them all */
>   	GEM_BUG_ON(!engine->buffer);
> diff --git a/drivers/gpu/drm/i915/selftests/mock_context.c b/drivers/gpu/drm/i915/selftests/mock_context.c
> index b646cdcdd602..1b5073a362eb 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_context.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_context.c
> @@ -44,6 +44,7 @@ mock_context(struct drm_i915_private *i915,
>   	INIT_RADIX_TREE(&ctx->handles_vma, GFP_KERNEL);
>   	INIT_LIST_HEAD(&ctx->handles_list);
>   	INIT_LIST_HEAD(&ctx->hw_id_link);
> +	INIT_LIST_HEAD(&ctx->active_engines);
>   
>   	for (n = 0; n < ARRAY_SIZE(ctx->__engine); n++)
>   		intel_context_init(&ctx->__engine[n], ctx, i915->engine[n]);
> diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
> index c2c954f64226..b8c6769571c4 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_engine.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
> @@ -125,6 +125,7 @@ static void hw_delay_complete(struct timer_list *t)
>   static void mock_context_unpin(struct intel_context *ce)
>   {
>   	mock_timeline_unpin(ce->ring->timeline);
> +	list_del(&ce->active_link);
>   	i915_gem_context_put(ce->gem_context);
>   }
>   
> @@ -161,6 +162,7 @@ mock_context_pin(struct intel_engine_cs *engine,
>   
>   	ce->ops = &mock_context_ops;
>   	i915_gem_context_get(ctx);
> +	list_add(&ce->active_link, &ctx->active_engines);
>   	return ce;
>   
>   err:
> 

No other complaints! :)

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 32/46] drm/i915: Create/destroy VM (ppGTT) for use with contexts
  2019-02-06 13:03 ` [PATCH 32/46] drm/i915: Create/destroy VM (ppGTT) for use with contexts Chris Wilson
@ 2019-02-12 11:18   ` Tvrtko Ursulin
  2019-02-12 14:11     ` Chris Wilson
  0 siblings, 1 reply; 97+ messages in thread
From: Tvrtko Ursulin @ 2019-02-12 11:18 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 06/02/2019 13:03, Chris Wilson wrote:
> In preparation for making the ppGTT binding for a context explicit (to
> facilitate reusing the same ppGTT between different contexts), allow the
> user to create and destroy named ppGTT.
> 
> v2: Replace global barrier for swapping over the ppgtt and tlbs with a
> local context barrier (Tvrtko)
> v3: serialise with struct_mutex; it's lazy but required dammit
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_drv.c               |   2 +
>   drivers/gpu/drm/i915/i915_drv.h               |   3 +
>   drivers/gpu/drm/i915/i915_gem_context.c       | 254 +++++++++++++++++-
>   drivers/gpu/drm/i915/i915_gem_context.h       |   5 +
>   drivers/gpu/drm/i915/i915_gem_gtt.c           |  17 +-
>   drivers/gpu/drm/i915/i915_gem_gtt.h           |  16 +-
>   drivers/gpu/drm/i915/selftests/huge_pages.c   |   1 -
>   .../gpu/drm/i915/selftests/i915_gem_context.c | 239 ++++++++++++----
>   drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |   1 -
>   drivers/gpu/drm/i915/selftests/mock_context.c |   8 +-
>   include/uapi/drm/i915_drm.h                   |  35 +++
>   11 files changed, 510 insertions(+), 71 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 36da8ab1e7ce..487e78094e93 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -3005,6 +3005,8 @@ static const struct drm_ioctl_desc i915_ioctls[] = {
>   	DRM_IOCTL_DEF_DRV(I915_PERF_ADD_CONFIG, i915_perf_add_config_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
>   	DRM_IOCTL_DEF_DRV(I915_PERF_REMOVE_CONFIG, i915_perf_remove_config_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
>   	DRM_IOCTL_DEF_DRV(I915_QUERY, i915_query_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(I915_GEM_VM_CREATE, i915_gem_vm_create_ioctl, DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(I915_GEM_VM_DESTROY, i915_gem_vm_destroy_ioctl, DRM_RENDER_ALLOW),
>   };
>   
>   static struct drm_driver driver = {
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index e554691304dc..523de3644570 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -217,6 +217,9 @@ struct drm_i915_file_private {
>   	} mm;
>   	struct idr context_idr;
>   
> +	struct mutex vm_lock;
> +	struct idr vm_idr;
> +
>   	unsigned int bsd_engine;
>   
>   /*
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index c3f41f501276..dd49b1ef3ff2 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -110,6 +110,8 @@ static void lut_close(struct i915_gem_context *ctx)
>   		struct i915_vma *vma = rcu_dereference_raw(*slot);
>   
>   		radix_tree_iter_delete(&ctx->handles_vma, &iter, slot);
> +
> +		vma->open_count--;

I am finding the vma->open_count handling seriously confusing, so maybe 
that means there should be a comment here at a minimum.

What is the path in the current code that brings vma->open_count back 
to zero? If it is not done from lut_close, but the object is removed from 
the lut_list, then the only current decrement in i915_gem_close_object 
won't run. Surely I am missing something.

>   		__i915_gem_object_release_unless_active(vma->obj);
>   	}
>   	rcu_read_unlock();
> @@ -293,7 +295,7 @@ static void context_close(struct i915_gem_context *ctx)
>   	 */
>   	lut_close(ctx);
>   	if (ctx->ppgtt)
> -		i915_ppgtt_close(&ctx->ppgtt->vm);
> +		i915_ppgtt_close(ctx->ppgtt);
>   
>   	ctx->file_priv = ERR_PTR(-EBADF);
>   	i915_gem_context_put(ctx);
> @@ -425,6 +427,32 @@ static void __destroy_hw_context(struct i915_gem_context *ctx,
>   	context_close(ctx);
>   }
>   
> +static struct i915_hw_ppgtt *
> +__set_ppgtt(struct i915_gem_context *ctx, struct i915_hw_ppgtt *ppgtt)
> +{
> +	struct i915_hw_ppgtt *old = ctx->ppgtt;
> +
> +	i915_ppgtt_open(ppgtt);
> +	ctx->ppgtt = i915_ppgtt_get(ppgtt);
> +
> +	ctx->desc_template = default_desc_template(ctx->i915, ppgtt);
> +
> +	return old;
> +}
> +
> +static void __assign_ppgtt(struct i915_gem_context *ctx,
> +			   struct i915_hw_ppgtt *ppgtt)
> +{
> +	if (ppgtt == ctx->ppgtt)
> +		return;
> +
> +	ppgtt = __set_ppgtt(ctx, ppgtt);
> +	if (ppgtt) {
> +		i915_ppgtt_close(ppgtt);
> +		i915_ppgtt_put(ppgtt);
> +	}
> +}
> +
>   static struct i915_gem_context *
>   i915_gem_create_context(struct drm_i915_private *dev_priv,
>   			struct drm_i915_file_private *file_priv)
> @@ -451,8 +479,8 @@ i915_gem_create_context(struct drm_i915_private *dev_priv,
>   			return ERR_CAST(ppgtt);
>   		}
>   
> -		ctx->ppgtt = ppgtt;
> -		ctx->desc_template = default_desc_template(dev_priv, ppgtt);
> +		__assign_ppgtt(ctx, ppgtt);
> +		i915_ppgtt_put(ppgtt);

This looks strange until one realizes it is dropping the reference that 
__assign_ppgtt takes. Not sure if it wouldn't be better to just open-code 
what this site needs.
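
Something like this, perhaps (sketch only, simply keeping the reference
returned by i915_ppgtt_create() instead of the extra get/put pair):

	i915_ppgtt_open(ppgtt);
	ctx->ppgtt = ppgtt; /* consumes the creation reference */
	ctx->desc_template = default_desc_template(dev_priv, ppgtt);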

>   	}
>   
>   	trace_i915_context_create(ctx);
> @@ -633,19 +661,29 @@ static int context_idr_cleanup(int id, void *p, void *data)
>   	return 0;
>   }
>   
> +static int vm_idr_cleanup(int id, void *p, void *data)
> +{
> +	i915_ppgtt_put(p);
> +	return 0;
> +}
> +
>   int i915_gem_context_open(struct drm_i915_private *i915,
>   			  struct drm_file *file)
>   {
>   	struct drm_i915_file_private *file_priv = file->driver_priv;
>   	struct i915_gem_context *ctx;
>   
> +	mutex_init(&file_priv->vm_lock);
> +
>   	idr_init(&file_priv->context_idr);
> +	idr_init_base(&file_priv->vm_idr, 1);
>   
>   	mutex_lock(&i915->drm.struct_mutex);
>   	ctx = i915_gem_create_context(i915, file_priv);
>   	mutex_unlock(&i915->drm.struct_mutex);
>   	if (IS_ERR(ctx)) {
>   		idr_destroy(&file_priv->context_idr);
> +		idr_destroy(&file_priv->vm_idr);
>   		return PTR_ERR(ctx);
>   	}
>   
> @@ -662,6 +700,89 @@ void i915_gem_context_close(struct drm_file *file)
>   
>   	idr_for_each(&file_priv->context_idr, context_idr_cleanup, NULL);
>   	idr_destroy(&file_priv->context_idr);
> +
> +	idr_for_each(&file_priv->vm_idr, vm_idr_cleanup, NULL);
> +	idr_destroy(&file_priv->vm_idr);

The name of this function always confuses me. Should we rename it to 
i915_gem_close_contexts or something?

> +
> +	mutex_destroy(&file_priv->vm_lock);
> +}
> +
> +int i915_gem_vm_create_ioctl(struct drm_device *dev, void *data,
> +			     struct drm_file *file)
> +{
> +	struct drm_i915_private *i915 = to_i915(dev);
> +	struct drm_i915_gem_vm_control *args = data;
> +	struct drm_i915_file_private *file_priv = file->driver_priv;
> +	struct i915_hw_ppgtt *ppgtt;
> +	int err;
> +
> +	if (!HAS_FULL_PPGTT(i915))
> +		return -ENODEV;
> +
> +	if (args->flags)
> +		return -EINVAL;
> +
> +	if (args->extensions)
> +		return -EINVAL;
> +
> +	ppgtt = i915_ppgtt_create(i915, file_priv);
> +	if (IS_ERR(ppgtt))
> +		return PTR_ERR(ppgtt);
> +
> +	err = mutex_lock_interruptible(&file_priv->vm_lock);
> +	if (err)
> +		goto err_put;
> +
> +	err = idr_alloc(&file_priv->vm_idr, ppgtt, 0, 0, GFP_KERNEL);
> +	mutex_unlock(&file_priv->vm_lock);
> +	if (err < 0)
> +		goto err_put;
> +
> +	GEM_BUG_ON(err == 0); /* reserved for default/unassigned ppgtt */
> +	ppgtt->user_handle = err;
> +	args->id = err;
> +	return 0;
> +
> +err_put:
> +	i915_ppgtt_put(ppgtt);
> +	return err;
> +}
> +
> +int i915_gem_vm_destroy_ioctl(struct drm_device *dev, void *data,
> +			      struct drm_file *file)
> +{
> +	struct drm_i915_file_private *file_priv = file->driver_priv;
> +	struct drm_i915_gem_vm_control *args = data;
> +	struct i915_hw_ppgtt *ppgtt;
> +	int err;
> +	u32 id;
> +
> +	if (args->flags)
> +		return -EINVAL;
> +
> +	if (args->extensions)
> +		return -EINVAL;
> +
> +	id = args->id;
> +	if (!id)
> +		return -ENOENT;
> +
> +	err = mutex_lock_interruptible(&file_priv->vm_lock);
> +	if (err)
> +		return err;
> +
> +	ppgtt = idr_remove(&file_priv->vm_idr, id);
> +	if (ppgtt) {
> +		GEM_BUG_ON(!ppgtt->user_handle);
> +		ppgtt->user_handle = 0;
> +	}
> +
> +	mutex_unlock(&file_priv->vm_lock);
> +	if (!ppgtt)
> +		return -ENOENT;
> +
> +	i915_ppgtt_put(ppgtt);
> +	return 0;
>   }
>   
>   static struct i915_request *
> @@ -809,6 +930,120 @@ int i915_gem_switch_to_kernel_context(struct drm_i915_private *i915,
>   	return 0;
>   }
>   
> +static int get_ppgtt(struct i915_gem_context *ctx,
> +		     struct drm_i915_gem_context_param *args)
> +{
> +	struct drm_i915_file_private *file_priv = ctx->file_priv;
> +	struct i915_hw_ppgtt *ppgtt;
> +	int ret;
> +
> +	if (!ctx->ppgtt)
> +		return -ENODEV;
> +
> +	/* XXX rcu acquire? */
> +	ret = mutex_lock_interruptible(&ctx->i915->drm.struct_mutex);

Is this only to serialize threads working on the same ctx? Why not do 
that under the new vm_lock instead?

> +	if (ret)
> +		return ret;
> +
> +	ppgtt = i915_ppgtt_get(ctx->ppgtt);
> +	mutex_unlock(&ctx->i915->drm.struct_mutex);
> +
> +	ret = mutex_lock_interruptible(&file_priv->vm_lock);
> +	if (ret)
> +		goto err_put;
> +
> +	if (!ppgtt->user_handle) {
> +		ret = idr_alloc(&file_priv->vm_idr, ppgtt, 0, 0, GFP_KERNEL);
> +		GEM_BUG_ON(!ret);
> +		if (ret < 0)
> +			goto err_unlock;
> +
> +		ppgtt->user_handle = ret;
> +		i915_ppgtt_get(ppgtt);
> +	}
> +
> +	args->size = 0;
> +	args->value = ppgtt->user_handle;
> +
> +	ret = 0;
> +err_unlock:
> +	mutex_unlock(&file_priv->vm_lock);
> +err_put:
> +	i915_ppgtt_put(ppgtt);
> +	return ret;
> +}
> +
> +static void set_ppgtt_barrier(void *data)
> +{
> +	struct i915_hw_ppgtt *old = data;
> +
> +	i915_ppgtt_close(old);
> +	i915_ppgtt_put(old);
> +}
> +
> +static int set_ppgtt(struct i915_gem_context *ctx,
> +		     struct drm_i915_gem_context_param *args)
> +{
> +	struct drm_i915_file_private *file_priv = ctx->file_priv;
> +	struct i915_hw_ppgtt *ppgtt, *old;
> +	int err;
> +
> +	if (args->size)
> +		return -EINVAL;
> +
> +	if (upper_32_bits(args->value))
> +		return -EINVAL;
> +
> +	if (!ctx->ppgtt)
> +		return -ENODEV;
> +
> +	err = mutex_lock_interruptible(&file_priv->vm_lock);
> +	if (err)
> +		return err;
> +
> +	ppgtt = idr_find(&file_priv->vm_idr, args->value);
> +	if (ppgtt) {
> +		GEM_BUG_ON(ppgtt->user_handle != args->value);
> +		i915_ppgtt_get(ppgtt);
> +	}
> +	mutex_unlock(&file_priv->vm_lock);
> +	if (!ppgtt)
> +		return -ENOENT;
> +
> +	err = mutex_lock_interruptible(&ctx->i915->drm.struct_mutex);
> +	if (err)
> +		goto out;
> +
> +	if (ppgtt == ctx->ppgtt)
> +		goto unlock;
> +
> +	/* Teardown the existing obj:vma cache, it will have to be rebuilt. */
> +	lut_close(ctx);

Nesting issues, I guess; that answers my previous question.

> +
> +	old = __set_ppgtt(ctx, ppgtt);
> +
> +	/*
> +	 * We need to flush any requests using the current ppgtt before
> +	 * we release it as the requests do not hold a reference themselves,
> +	 * only indirectly through the context.
> +	 */
> +	err = context_barrier_task(ctx, -1, set_ppgtt_barrier, old);

But the barrier can be retired on the user interrupt while the context 
save is still running, no?

Regards,

Tvrtko

> +	if (err) {
> +		ctx->ppgtt = old;
> +		ctx->desc_template = default_desc_template(ctx->i915, old);
> +
> +		i915_ppgtt_close(ppgtt);
> +		i915_ppgtt_put(ppgtt);
> +	}
> +
> +unlock:
> +	mutex_unlock(&ctx->i915->drm.struct_mutex);
> +
> +out:
> +	i915_ppgtt_put(ppgtt);
> +	return err;
> +}
> +
>   static bool client_is_banned(struct drm_i915_file_private *file_priv)
>   {
>   	return atomic_read(&file_priv->ban_score) >= I915_CLIENT_SCORE_BANNED;
> @@ -979,6 +1214,9 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
>   	case I915_CONTEXT_PARAM_SSEU:
>   		ret = get_sseu(ctx, args);
>   		break;
> +	case I915_CONTEXT_PARAM_VM:
> +		ret = get_ppgtt(ctx, args);
> +		break;
>   	default:
>   		ret = -EINVAL;
>   		break;
> @@ -1276,9 +1514,6 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>   		return -ENOENT;
>   
>   	switch (args->param) {
> -	case I915_CONTEXT_PARAM_BAN_PERIOD:
> -		ret = -EINVAL;
> -		break;
>   	case I915_CONTEXT_PARAM_NO_ZEROMAP:
>   		if (args->size)
>   			ret = -EINVAL;
> @@ -1325,9 +1560,16 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>   					I915_USER_PRIORITY(priority);
>   		}
>   		break;
> +
>   	case I915_CONTEXT_PARAM_SSEU:
>   		ret = set_sseu(ctx, args);
>   		break;
> +
> +	case I915_CONTEXT_PARAM_VM:
> +		ret = set_ppgtt(ctx, args);
> +		break;
> +
> +	case I915_CONTEXT_PARAM_BAN_PERIOD:
>   	default:
>   		ret = -EINVAL;
>   		break;
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
> index ab89c7501408..c5a6cb10dbda 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.h
> +++ b/drivers/gpu/drm/i915/i915_gem_context.h
> @@ -365,6 +365,11 @@ void i915_gem_context_release(struct kref *ctx_ref);
>   struct i915_gem_context *
>   i915_gem_context_create_gvt(struct drm_device *dev);
>   
> +int i915_gem_vm_create_ioctl(struct drm_device *dev, void *data,
> +			     struct drm_file *file);
> +int i915_gem_vm_destroy_ioctl(struct drm_device *dev, void *data,
> +			      struct drm_file *file);
> +
>   int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
>   				  struct drm_file *file);
>   int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index d646d37eec2f..ccf10306b1f5 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -2103,10 +2103,21 @@ i915_ppgtt_create(struct drm_i915_private *i915,
>   	return ppgtt;
>   }
>   
> -void i915_ppgtt_close(struct i915_address_space *vm)
> +void i915_ppgtt_open(struct i915_hw_ppgtt *ppgtt)
>   {
> -	GEM_BUG_ON(vm->closed);
> -	vm->closed = true;
> +	GEM_BUG_ON(ppgtt->vm.closed);
> +
> +	ppgtt->open_count++;
> +}
> +
> +void i915_ppgtt_close(struct i915_hw_ppgtt *ppgtt)
> +{
> +	GEM_BUG_ON(!ppgtt->open_count);
> +	if (--ppgtt->open_count)
> +		return;
> +
> +	GEM_BUG_ON(ppgtt->vm.closed);
> +	ppgtt->vm.closed = true;
>   }
>   
>   static void ppgtt_destroy_vma(struct i915_address_space *vm)
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index 03ade71b8d9a..bb750318f52a 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -391,11 +391,15 @@ struct i915_hw_ppgtt {
>   	struct kref ref;
>   
>   	unsigned long pd_dirty_rings;
> +	unsigned int open_count;
> +
>   	union {
>   		struct i915_pml4 pml4;		/* GEN8+ & 48b PPGTT */
>   		struct i915_page_directory_pointer pdp;	/* GEN8+ */
>   		struct i915_page_directory pd;		/* GEN6-7 */
>   	};
> +
> +	u32 user_handle;
>   };
>   
>   struct gen6_hw_ppgtt {
> @@ -606,12 +610,16 @@ int i915_ppgtt_init_hw(struct drm_i915_private *dev_priv);
>   void i915_ppgtt_release(struct kref *kref);
>   struct i915_hw_ppgtt *i915_ppgtt_create(struct drm_i915_private *dev_priv,
>   					struct drm_i915_file_private *fpriv);
> -void i915_ppgtt_close(struct i915_address_space *vm);
> -static inline void i915_ppgtt_get(struct i915_hw_ppgtt *ppgtt)
> +
> +void i915_ppgtt_open(struct i915_hw_ppgtt *ppgtt);
> +void i915_ppgtt_close(struct i915_hw_ppgtt *ppgtt);
> +
> +static inline struct i915_hw_ppgtt *i915_ppgtt_get(struct i915_hw_ppgtt *ppgtt)
>   {
> -	if (ppgtt)
> -		kref_get(&ppgtt->ref);
> +	kref_get(&ppgtt->ref);
> +	return ppgtt;
>   }
> +
>   static inline void i915_ppgtt_put(struct i915_hw_ppgtt *ppgtt)
>   {
>   	if (ppgtt)
> diff --git a/drivers/gpu/drm/i915/selftests/huge_pages.c b/drivers/gpu/drm/i915/selftests/huge_pages.c
> index a9a2fa35876f..a7ee8e97bcee 100644
> --- a/drivers/gpu/drm/i915/selftests/huge_pages.c
> +++ b/drivers/gpu/drm/i915/selftests/huge_pages.c
> @@ -1734,7 +1734,6 @@ int i915_gem_huge_page_mock_selftests(void)
>   	err = i915_subtests(tests, ppgtt);
>   
>   out_close:
> -	i915_ppgtt_close(&ppgtt->vm);
>   	i915_ppgtt_put(ppgtt);
>   
>   out_unlock:
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
> index 4b6df1c55345..a76a4f6f67e4 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
> @@ -372,7 +372,8 @@ static int cpu_fill(struct drm_i915_gem_object *obj, u32 value)
>   	return 0;
>   }
>   
> -static int cpu_check(struct drm_i915_gem_object *obj, unsigned int max)
> +static noinline int cpu_check(struct drm_i915_gem_object *obj,
> +			      unsigned int idx, unsigned int max)
>   {
>   	unsigned int n, m, needs_flush;
>   	int err;
> @@ -390,8 +391,10 @@ static int cpu_check(struct drm_i915_gem_object *obj, unsigned int max)
>   
>   		for (m = 0; m < max; m++) {
>   			if (map[m] != m) {
> -				pr_err("Invalid value at page %d, offset %d: found %x expected %x\n",
> -				       n, m, map[m], m);
> +				pr_err("%pS: Invalid value at object %d page %d/%ld, offset %d/%d: found %x expected %x\n",
> +				       __builtin_return_address(0), idx,
> +				       n, real_page_count(obj), m, max,
> +				       map[m], m);
>   				err = -EINVAL;
>   				goto out_unmap;
>   			}
> @@ -399,8 +402,9 @@ static int cpu_check(struct drm_i915_gem_object *obj, unsigned int max)
>   
>   		for (; m < DW_PER_PAGE; m++) {
>   			if (map[m] != STACK_MAGIC) {
> -				pr_err("Invalid value at page %d, offset %d: found %x expected %x\n",
> -				       n, m, map[m], STACK_MAGIC);
> +				pr_err("%pS: Invalid value at object %d page %d, offset %d: found %x expected %x (uninitialised)\n",
> +				       __builtin_return_address(0), idx, n, m,
> +				       map[m], STACK_MAGIC);
>   				err = -EINVAL;
>   				goto out_unmap;
>   			}
> @@ -478,12 +482,8 @@ static unsigned long max_dwords(struct drm_i915_gem_object *obj)
>   static int igt_ctx_exec(void *arg)
>   {
>   	struct drm_i915_private *i915 = arg;
> -	struct drm_i915_gem_object *obj = NULL;
> -	unsigned long ncontexts, ndwords, dw;
> -	struct igt_live_test t;
> -	struct drm_file *file;
> -	IGT_TIMEOUT(end_time);
> -	LIST_HEAD(objects);
> +	struct intel_engine_cs *engine;
> +	enum intel_engine_id id;
>   	int err = -ENODEV;
>   
>   	/*
> @@ -495,38 +495,42 @@ static int igt_ctx_exec(void *arg)
>   	if (!DRIVER_CAPS(i915)->has_logical_contexts)
>   		return 0;
>   
> -	file = mock_file(i915);
> -	if (IS_ERR(file))
> -		return PTR_ERR(file);
> +	for_each_engine(engine, i915, id) {
> +		struct drm_i915_gem_object *obj = NULL;
> +		unsigned long ncontexts, ndwords, dw;
> +		struct igt_live_test t;
> +		struct drm_file *file;
> +		IGT_TIMEOUT(end_time);
> +		LIST_HEAD(objects);
>   
> -	mutex_lock(&i915->drm.struct_mutex);
> +		if (!intel_engine_can_store_dword(engine))
> +			continue;
>   
> -	err = igt_live_test_begin(&t, i915, __func__, "");
> -	if (err)
> -		goto out_unlock;
> +		if (!engine->context_size)
> +			continue; /* No logical context support in HW */
>   
> -	ncontexts = 0;
> -	ndwords = 0;
> -	dw = 0;
> -	while (!time_after(jiffies, end_time)) {
> -		struct intel_engine_cs *engine;
> -		struct i915_gem_context *ctx;
> -		unsigned int id;
> +		file = mock_file(i915);
> +		if (IS_ERR(file))
> +			return PTR_ERR(file);
>   
> -		ctx = i915_gem_create_context(i915, file->driver_priv);
> -		if (IS_ERR(ctx)) {
> -			err = PTR_ERR(ctx);
> +		mutex_lock(&i915->drm.struct_mutex);
> +
> +		err = igt_live_test_begin(&t, i915, __func__, engine->name);
> +		if (err)
>   			goto out_unlock;
> -		}
>   
> -		for_each_engine(engine, i915, id) {
> +		ncontexts = 0;
> +		ndwords = 0;
> +		dw = 0;
> +		while (!time_after(jiffies, end_time)) {
> +			struct i915_gem_context *ctx;
>   			intel_wakeref_t wakeref;
>   
> -			if (!engine->context_size)
> -				continue; /* No logical context support in HW */
> -
> -			if (!intel_engine_can_store_dword(engine))
> -				continue;
> +			ctx = i915_gem_create_context(i915, file->driver_priv);
> +			if (IS_ERR(ctx)) {
> +				err = PTR_ERR(ctx);
> +				goto out_unlock;
> +			}
>   
>   			if (!obj) {
>   				obj = create_test_object(ctx, file, &objects);
> @@ -536,7 +540,6 @@ static int igt_ctx_exec(void *arg)
>   				}
>   			}
>   
> -			err = 0;
>   			with_intel_runtime_pm(i915, wakeref)
>   				err = gpu_fill(obj, ctx, engine, dw);
>   			if (err) {
> @@ -551,32 +554,158 @@ static int igt_ctx_exec(void *arg)
>   				obj = NULL;
>   				dw = 0;
>   			}
> +
>   			ndwords++;
> +			ncontexts++;
>   		}
> -		ncontexts++;
> +
> +		pr_info("Submitted %lu contexts to %s, filling %lu dwords\n",
> +			ncontexts, engine->name, ndwords);
> +
> +		ncontexts = dw = 0;
> +		list_for_each_entry(obj, &objects, st_link) {
> +			unsigned int rem =
> +				min_t(unsigned int, ndwords - dw, max_dwords(obj));
> +
> +			err = cpu_check(obj, ncontexts++, rem);
> +			if (err)
> +				break;
> +
> +			dw += rem;
> +		}
> +
> +out_unlock:
> +		if (igt_live_test_end(&t))
> +			err = -EIO;
> +		mutex_unlock(&i915->drm.struct_mutex);
> +
> +		mock_file_free(i915, file);
> +		if (err)
> +			return err;
>   	}
> -	pr_info("Submitted %lu contexts (across %u engines), filling %lu dwords\n",
> -		ncontexts, RUNTIME_INFO(i915)->num_rings, ndwords);
>   
> -	dw = 0;
> -	list_for_each_entry(obj, &objects, st_link) {
> -		unsigned int rem =
> -			min_t(unsigned int, ndwords - dw, max_dwords(obj));
> +	return 0;
> +}
> +
> +static int igt_shared_ctx_exec(void *arg)
> +{
> +	struct drm_i915_private *i915 = arg;
> +	struct intel_engine_cs *engine;
> +	enum intel_engine_id id;
> +	int err = -ENODEV;
> +
> +	/*
> +	 * Create a few different contexts with the same mm and write
> +	 * through each ctx using the GPU making sure those writes end
> +	 * up in the expected pages of our obj.
> +	 */
> +
> +	for_each_engine(engine, i915, id) {
> +		unsigned long ncontexts, ndwords, dw;
> +		struct drm_i915_gem_object *obj = NULL;
> +		struct i915_gem_context *ctx = NULL;
> +		struct i915_gem_context *parent;
> +		struct igt_live_test t;
> +		struct drm_file *file;
> +		IGT_TIMEOUT(end_time);
> +		LIST_HEAD(objects);
>   
> -		err = cpu_check(obj, rem);
> +		if (!intel_engine_can_store_dword(engine))
> +			continue;
> +
> +		file = mock_file(i915);
> +		if (IS_ERR(file))
> +			return PTR_ERR(file);
> +
> +		mutex_lock(&i915->drm.struct_mutex);
> +
> +		err = igt_live_test_begin(&t, i915, __func__, engine->name);
>   		if (err)
> -			break;
> +			goto out_unlock;
>   
> -		dw += rem;
> -	}
> +		parent = i915_gem_create_context(i915, file->driver_priv);
> +		if (IS_ERR(parent)) {
> +			err = PTR_ERR(parent);
> +			if (err == -ENODEV) /* no logical ctx support */
> +				err = 0;
> +			goto out_unlock;
> +		}
> +
> +		if (!parent->ppgtt) {
> +			err = 0;
> +			goto out_unlock;
> +		}
> +
> +		ncontexts = 0;
> +		ndwords = 0;
> +		dw = 0;
> +		while (!time_after(jiffies, end_time)) {
> +			intel_wakeref_t wakeref;
> +
> +			if (ctx)
> +				__destroy_hw_context(ctx, file->driver_priv);
> +
> +			ctx = i915_gem_create_context(i915, file->driver_priv);
> +			if (IS_ERR(ctx)) {
> +				err = PTR_ERR(ctx);
> +				goto out_unlock;
> +			}
> +
> +			__assign_ppgtt(ctx, parent->ppgtt);
> +
> +			if (!obj) {
> +				obj = create_test_object(parent, file, &objects);
> +				if (IS_ERR(obj)) {
> +					err = PTR_ERR(obj);
> +					goto out_unlock;
> +				}
> +			}
> +
> +			err = 0;
> +			with_intel_runtime_pm(i915, wakeref)
> +				err = gpu_fill(obj, ctx, engine, dw);
> +			if (err) {
> +				pr_err("Failed to fill dword %lu [%lu/%lu] with gpu (%s) in ctx %u [full-ppgtt? %s], err=%d\n",
> +				       ndwords, dw, max_dwords(obj),
> +				       engine->name, ctx->hw_id,
> +				       yesno(!!ctx->ppgtt), err);
> +				goto out_unlock;
> +			}
> +
> +			if (++dw == max_dwords(obj)) {
> +				obj = NULL;
> +				dw = 0;
> +			}
> +
> +			ndwords++;
> +			ncontexts++;
> +		}
> +		pr_info("Submitted %lu contexts to %s, filling %lu dwords\n",
> +			ncontexts, engine->name, ndwords);
> +
> +		ncontexts = dw = 0;
> +		list_for_each_entry(obj, &objects, st_link) {
> +			unsigned int rem =
> +				min_t(unsigned int, ndwords - dw, max_dwords(obj));
> +
> +			err = cpu_check(obj, ncontexts++, rem);
> +			if (err)
> +				break;
> +
> +			dw += rem;
> +		}
>   
>   out_unlock:
> -	if (igt_live_test_end(&t))
> -		err = -EIO;
> -	mutex_unlock(&i915->drm.struct_mutex);
> +		if (igt_live_test_end(&t))
> +			err = -EIO;
> +		mutex_unlock(&i915->drm.struct_mutex);
>   
> -	mock_file_free(i915, file);
> -	return err;
> +		mock_file_free(i915, file);
> +		if (err)
> +			return err;
> +	}
> +
> +	return 0;
>   }
>   
>   static struct i915_vma *rpcs_query_batch(struct i915_vma *vma)
> @@ -1048,7 +1177,7 @@ static int igt_ctx_readonly(void *arg)
>   	struct drm_i915_gem_object *obj = NULL;
>   	struct i915_gem_context *ctx;
>   	struct i915_hw_ppgtt *ppgtt;
> -	unsigned long ndwords, dw;
> +	unsigned long idx, ndwords, dw;
>   	struct igt_live_test t;
>   	struct drm_file *file;
>   	I915_RND_STATE(prng);
> @@ -1129,6 +1258,7 @@ static int igt_ctx_readonly(void *arg)
>   		ndwords, RUNTIME_INFO(i915)->num_rings);
>   
>   	dw = 0;
> +	idx = 0;
>   	list_for_each_entry(obj, &objects, st_link) {
>   		unsigned int rem =
>   			min_t(unsigned int, ndwords - dw, max_dwords(obj));
> @@ -1138,7 +1268,7 @@ static int igt_ctx_readonly(void *arg)
>   		if (i915_gem_object_is_readonly(obj))
>   			num_writes = 0;
>   
> -		err = cpu_check(obj, num_writes);
> +		err = cpu_check(obj, idx++, num_writes);
>   		if (err)
>   			break;
>   
> @@ -1723,6 +1853,7 @@ int i915_gem_context_live_selftests(struct drm_i915_private *dev_priv)
>   		SUBTEST(igt_ctx_exec),
>   		SUBTEST(igt_ctx_readonly),
>   		SUBTEST(igt_ctx_sseu),
> +		SUBTEST(igt_shared_ctx_exec),
>   		SUBTEST(igt_vm_isolation),
>   	};
>   
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> index 3850ef4a5ec8..08a8f3d20854 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> @@ -1020,7 +1020,6 @@ static int exercise_ppgtt(struct drm_i915_private *dev_priv,
>   
>   	err = func(dev_priv, &ppgtt->vm, 0, ppgtt->vm.total, end_time);
>   
> -	i915_ppgtt_close(&ppgtt->vm);
>   	i915_ppgtt_put(ppgtt);
>   out_unlock:
>   	mutex_unlock(&dev_priv->drm.struct_mutex);
> diff --git a/drivers/gpu/drm/i915/selftests/mock_context.c b/drivers/gpu/drm/i915/selftests/mock_context.c
> index 1b5073a362eb..2189b606ca41 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_context.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_context.c
> @@ -54,13 +54,17 @@ mock_context(struct drm_i915_private *i915,
>   		goto err_handles;
>   
>   	if (name) {
> +		struct i915_hw_ppgtt *ppgtt;
> +
>   		ctx->name = kstrdup(name, GFP_KERNEL);
>   		if (!ctx->name)
>   			goto err_put;
>   
> -		ctx->ppgtt = mock_ppgtt(i915, name);
> -		if (!ctx->ppgtt)
> +		ppgtt = mock_ppgtt(i915, name);
> +		if (!ppgtt)
>   			goto err_put;
> +
> +		__set_ppgtt(ctx, ppgtt);
>   	}
>   
>   	return ctx;
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index be2fcdf3ba90..2cd79639d6b5 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -339,6 +339,8 @@ typedef struct _drm_i915_sarea {
>   #define DRM_I915_PERF_ADD_CONFIG	0x37
>   #define DRM_I915_PERF_REMOVE_CONFIG	0x38
>   #define DRM_I915_QUERY			0x39
> +#define DRM_I915_GEM_VM_CREATE		0x3a
> +#define DRM_I915_GEM_VM_DESTROY		0x3b
>   
>   #define DRM_IOCTL_I915_INIT		DRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT, drm_i915_init_t)
>   #define DRM_IOCTL_I915_FLUSH		DRM_IO ( DRM_COMMAND_BASE + DRM_I915_FLUSH)
> @@ -397,6 +399,8 @@ typedef struct _drm_i915_sarea {
>   #define DRM_IOCTL_I915_PERF_ADD_CONFIG	DRM_IOW(DRM_COMMAND_BASE + DRM_I915_PERF_ADD_CONFIG, struct drm_i915_perf_oa_config)
>   #define DRM_IOCTL_I915_PERF_REMOVE_CONFIG	DRM_IOW(DRM_COMMAND_BASE + DRM_I915_PERF_REMOVE_CONFIG, __u64)
>   #define DRM_IOCTL_I915_QUERY			DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_QUERY, struct drm_i915_query)
> +#define DRM_IOCTL_I915_GEM_VM_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_CREATE, struct drm_i915_gem_vm_control)
> +#define DRM_IOCTL_I915_GEM_VM_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_VM_DESTROY, struct drm_i915_gem_vm_control)
>   
>   /* Allow drivers to submit batchbuffers directly to hardware, relying
>    * on the security mechanisms provided by hardware.
> @@ -1444,6 +1448,26 @@ struct drm_i915_gem_context_destroy {
>   	__u32 pad;
>   };
>   
> +/*
> + * DRM_I915_GEM_VM_CREATE -
> + *
> + * Create a new virtual memory address space (ppGTT) for use within a context
> + * on the same file. Extensions can be provided to configure exactly how the
> + * address space is setup upon creation.
> + *
> + * The id of new VM (bound to the fd) for use with I915_CONTEXT_PARAM_VM is
> + * returned.
> + *
> + * DRM_I915_GEM_VM_DESTROY -
> + *
> + * Destroys a previously created VM id.
> + */
> +struct drm_i915_gem_vm_control {
> +	__u64 extensions;
> +	__u32 flags;
> +	__u32 id;
> +};
> +
>   struct drm_i915_reg_read {
>   	/*
>   	 * Register offset.
> @@ -1513,6 +1537,17 @@ struct drm_i915_gem_context_param {
>   	 * drm_i915_gem_context_param_sseu.
>   	 */
>   #define I915_CONTEXT_PARAM_SSEU		0x7
> +
> +	/*
> +	 * The id of the associated virtual memory address space (ppGTT) of
> +	 * this context. Can be retrieved and passed to another context
> +	 * (on the same fd) for both to use the same ppGTT and so share
> +	 * address layouts, and avoid reloading the page tables on context
> +	 * switches between themselves.
> +	 *
> +	 * See DRM_I915_GEM_VM_CREATE and DRM_I915_GEM_VM_DESTROY.
> +	 */
> +#define I915_CONTEXT_PARAM_VM		0x8
>   	__u64 value;
>   };
>   
> 
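
To illustrate the intended flow, a minimal userspace sketch combining the
new VM ioctls with I915_CONTEXT_PARAM_VM (illustrative only; error handling
is trimmed, ctx_a/ctx_b are assumed to be pre-created context ids, and the
early VM_DESTROY assumes the contexts keep their own reference to the ppGTT):

#include <stdint.h>
#include <xf86drm.h>
#include <drm/i915_drm.h>

/* Create one ppGTT and point two contexts at it so they share an address space. */
static int share_vm(int fd, uint32_t ctx_a, uint32_t ctx_b)
{
	struct drm_i915_gem_vm_control vm = {};
	struct drm_i915_gem_context_param arg = { .param = I915_CONTEXT_PARAM_VM };
	int err;

	err = drmIoctl(fd, DRM_IOCTL_I915_GEM_VM_CREATE, &vm);
	if (err)
		return err;

	arg.value = vm.id; /* fd-local VM id, assumed returned in .id */

	arg.ctx_id = ctx_a;
	err = drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
	if (err)
		return err;

	arg.ctx_id = ctx_b;
	err = drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
	if (err)
		return err;

	/* Drop the fd-local id; assumes the contexts now keep the ppGTT alive. */
	return drmIoctl(fd, DRM_IOCTL_I915_GEM_VM_DESTROY, &vm);
}
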
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 10/46] drm/i915: Make request allocation caches global
  2019-02-11 17:02       ` Tvrtko Ursulin
@ 2019-02-12 11:51         ` Chris Wilson
  0 siblings, 0 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-12 11:51 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-02-11 17:02:19)
> 
> On 11/02/2019 12:40, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2019-02-11 11:43:41)
> >>
> >> On 06/02/2019 13:03, Chris Wilson wrote:
> >>> As kmem_caches share the same properties (size, allocation/free behaviour)
> >>> for all potential devices, we can use global caches. While this
> >>> potentially has worse fragmentation behaviour (one can argue that
> >>> different devices would have different activity lifetimes, but you can
> >>> also argue that activity is temporal across the system) it is the
> >>> default behaviour of the system at large to amalgamate matching caches.
> >>>
> >>> The benefit for us is much reduced pointer dancing along the frequent
> >>> allocation paths.
> >>>
> >>> v2: Defer shrinking until after a global grace period for futureproofing
> >>> multiple consumers of the slab caches, similar to the current strategy
> >>> for avoiding shrinking too early.
> >>>
> >>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>> ---
> >>>    drivers/gpu/drm/i915/Makefile                 |   1 +
> >>>    drivers/gpu/drm/i915/i915_active.c            |   7 +-
> >>>    drivers/gpu/drm/i915/i915_active.h            |   1 +
> >>>    drivers/gpu/drm/i915/i915_drv.h               |   3 -
> >>>    drivers/gpu/drm/i915/i915_gem.c               |  34 +-----
> >>>    drivers/gpu/drm/i915/i915_globals.c           | 105 ++++++++++++++++++
> >>>    drivers/gpu/drm/i915/i915_globals.h           |  15 +++
> >>>    drivers/gpu/drm/i915/i915_pci.c               |   8 +-
> >>>    drivers/gpu/drm/i915/i915_request.c           |  53 +++++++--
> >>>    drivers/gpu/drm/i915/i915_request.h           |  10 ++
> >>>    drivers/gpu/drm/i915/i915_scheduler.c         |  66 ++++++++---
> >>>    drivers/gpu/drm/i915/i915_scheduler.h         |  34 +++++-
> >>>    drivers/gpu/drm/i915/intel_guc_submission.c   |   3 +-
> >>>    drivers/gpu/drm/i915/intel_lrc.c              |   6 +-
> >>>    drivers/gpu/drm/i915/intel_ringbuffer.h       |  17 ---
> >>>    drivers/gpu/drm/i915/selftests/intel_lrc.c    |   2 +-
> >>>    drivers/gpu/drm/i915/selftests/mock_engine.c  |  48 ++++----
> >>>    .../gpu/drm/i915/selftests/mock_gem_device.c  |  26 -----
> >>>    drivers/gpu/drm/i915/selftests/mock_request.c |  12 +-
> >>>    drivers/gpu/drm/i915/selftests/mock_request.h |   7 --
> >>>    20 files changed, 306 insertions(+), 152 deletions(-)
> >>>    create mode 100644 drivers/gpu/drm/i915/i915_globals.c
> >>>    create mode 100644 drivers/gpu/drm/i915/i915_globals.h
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> >>> index 1787e1299b1b..a1d834068765 100644
> >>> --- a/drivers/gpu/drm/i915/Makefile
> >>> +++ b/drivers/gpu/drm/i915/Makefile
> >>> @@ -77,6 +77,7 @@ i915-y += \
> >>>          i915_gem_tiling.o \
> >>>          i915_gem_userptr.o \
> >>>          i915_gemfs.o \
> >>> +       i915_globals.o \
> >>>          i915_query.o \
> >>>          i915_request.o \
> >>>          i915_scheduler.o \
> >>> diff --git a/drivers/gpu/drm/i915/i915_active.c b/drivers/gpu/drm/i915/i915_active.c
> >>> index 215b6ff8aa73..9026787ebdf8 100644
> >>> --- a/drivers/gpu/drm/i915/i915_active.c
> >>> +++ b/drivers/gpu/drm/i915/i915_active.c
> >>> @@ -280,7 +280,12 @@ int __init i915_global_active_init(void)
> >>>        return 0;
> >>>    }
> >>>    
> >>> -void __exit i915_global_active_exit(void)
> >>> +void i915_global_active_shrink(void)
> >>> +{
> >>> +     kmem_cache_shrink(global.slab_cache);
> >>> +}
> >>> +
> >>> +void i915_global_active_exit(void)
> >>>    {
> >>>        kmem_cache_destroy(global.slab_cache);
> >>>    }
> >>> diff --git a/drivers/gpu/drm/i915/i915_active.h b/drivers/gpu/drm/i915/i915_active.h
> >>> index 12b5c1d287d1..5fbd9102384b 100644
> >>> --- a/drivers/gpu/drm/i915/i915_active.h
> >>> +++ b/drivers/gpu/drm/i915/i915_active.h
> >>> @@ -420,6 +420,7 @@ static inline void i915_active_fini(struct i915_active *ref) { }
> >>>    #endif
> >>>    
> >>>    int i915_global_active_init(void);
> >>> +void i915_global_active_shrink(void);
> >>>    void i915_global_active_exit(void);
> >>>    
> >>>    #endif /* _I915_ACTIVE_H_ */
> >>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> >>> index 37230ae7fbe6..a365b1a2ea9a 100644
> >>> --- a/drivers/gpu/drm/i915/i915_drv.h
> >>> +++ b/drivers/gpu/drm/i915/i915_drv.h
> >>> @@ -1459,9 +1459,6 @@ struct drm_i915_private {
> >>>        struct kmem_cache *objects;
> >>>        struct kmem_cache *vmas;
> >>>        struct kmem_cache *luts;
> >>> -     struct kmem_cache *requests;
> >>> -     struct kmem_cache *dependencies;
> >>> -     struct kmem_cache *priorities;
> >>>    
> >>>        const struct intel_device_info __info; /* Use INTEL_INFO() to access. */
> >>>        struct intel_runtime_info __runtime; /* Use RUNTIME_INFO() to access. */
> >>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> >>> index 1eb3a5f8654c..d18c4ccff370 100644
> >>> --- a/drivers/gpu/drm/i915/i915_gem.c
> >>> +++ b/drivers/gpu/drm/i915/i915_gem.c
> >>> @@ -42,6 +42,7 @@
> >>>    #include "i915_drv.h"
> >>>    #include "i915_gem_clflush.h"
> >>>    #include "i915_gemfs.h"
> >>> +#include "i915_globals.h"
> >>>    #include "i915_reset.h"
> >>>    #include "i915_trace.h"
> >>>    #include "i915_vgpu.h"
> >>> @@ -187,6 +188,8 @@ void i915_gem_unpark(struct drm_i915_private *i915)
> >>>        if (unlikely(++i915->gt.epoch == 0)) /* keep 0 as invalid */
> >>>                i915->gt.epoch = 1;
> >>>    
> >>> +     i915_globals_unpark();
> >>> +
> >>>        intel_enable_gt_powersave(i915);
> >>>        i915_update_gfx_val(i915);
> >>>        if (INTEL_GEN(i915) >= 6)
> >>> @@ -2916,12 +2919,11 @@ static void shrink_caches(struct drm_i915_private *i915)
> >>>         * filled slabs to prioritise allocating from the mostly full slabs,
> >>>         * with the aim of reducing fragmentation.
> >>>         */
> >>> -     kmem_cache_shrink(i915->priorities);
> >>> -     kmem_cache_shrink(i915->dependencies);
> >>> -     kmem_cache_shrink(i915->requests);
> >>>        kmem_cache_shrink(i915->luts);
> >>>        kmem_cache_shrink(i915->vmas);
> >>>        kmem_cache_shrink(i915->objects);
> >>> +
> >>> +     i915_globals_park();
> >>
> >> Slightly confusing that the shrink caches path calls globals_park - ie
> >> after the device has been parked. Would i915_globals_shrink and
> >> __i915_globals_shrink be clearer? Not sure.
> > 
> > Final destination is __i915_gem_park. I could stick it there now, but
> > felt it clearer to have it as a sideways move atm.
> > 
> > With the last 3 slab caches converted over to globals, they all sit
> > behind the same rcu_work and we can remove our open-coded variant
> > (rcu_work is a recent invention).
> 
> Is there some downside to calling i915_globals_park directly from the 
> idle work handler straight away? (I mean in this patch.)

I don't think so, I was just trying to make the smallest change. Conversion to
using i915_globals_park directly is a separate patch just so we can
revert easily.
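
As a rough sketch of where that ends up (illustrative names only; the real
series keeps the per-cache i915_global_*_shrink() hooks and its own
park/unpark accounting, this just shows the rcu_work idea):

static struct rcu_work shrink_work;

static void __i915_globals_shrink(struct work_struct *wrk)
{
	/*
	 * Only runs after an RCU grace period, so RCU-freed requests have
	 * been returned to their slabs before we try to shrink them.
	 */
	i915_global_active_shrink();
	/* ...and the other global slab caches... */
}

void i915_globals_park(void)
{
	/* In real code the INIT would happen once at module init. */
	INIT_RCU_WORK(&shrink_work, __i915_globals_shrink);
	queue_rcu_work(system_wq, &shrink_work);
}
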
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 18/46] drm/i915: Replace global_seqno with a hangcheck heartbeat seqno
  2019-02-11 16:56       ` Tvrtko Ursulin
@ 2019-02-12 13:36         ` Chris Wilson
  0 siblings, 0 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-12 13:36 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-02-11 16:56:03)
> 
> On 11/02/2019 12:44, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2019-02-11 12:40:07)
> >>
> >> On 06/02/2019 13:03, Chris Wilson wrote:
> >>> To determine whether an engine is 'stuck', we simply check whether or
> >>> not it is still on the same seqno for several seconds. To keep this simple
> >>> mechanism intact over the loss of a global seqno, we can simply add a
> >>> new global heartbeat seqno instead. As we cannot know the sequence in
> >>> which requests will then be completed, we use a primitive random number
> >>> generator instead (with a cycle long enough to not matter over an
> >>> interval of a few thousand requests between hangcheck samples).
> >>
> >> We couldn't keep the global seqno just for hangcheck puposes? I mean as
> >> long as it is unique, which would be guaranteed by obtaining an
> >> increment on every submission to hw and storing it in atomic_t
> >> i915->hangcheck_global_seqno / rq->hangcheck_global_seqno, hangcheck
> >> does not care about the order of execution, no?
> > 
> > s/global_seqno/hangcheck_seqno/ ?
> 
> Yes sure, I was just trying to express the idea that a "globally" unique 
> number is all that I thought we need. Like:
> 
>      rq->hangcheck_seqno = atomic_inc_return(&i915->hangcheck_seqno);
> 
> Did I get that right then? That we don't really need the pseudo random 
> number solution? We could even avoid calling it a seqno if desired. 
> rq->unique, wait.. we possibly had this name for something in the past..

We don't need it to be random, I just picked the pseudo-random number so
we got used to not expecting it to be sequential and to be sure we
didn't make the mistake of assuming it was.
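
Something along these lines, with purely illustrative names (assuming a
per-engine u32 holding the last value; next_pseudo_random32() is the
kernel's LCG helper):

static u32 next_heartbeat_seqno(struct intel_engine_cs *engine)
{
	/* Changes with every request, but carries no ordering information. */
	engine->heartbeat_seqno =
		next_pseudo_random32(engine->heartbeat_seqno);

	return engine->heartbeat_seqno;
}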
 
> > (a) the goal is to kill off global_seqno entirely so we are all sure
> > there is no such seqno or ordering anymore
> > (b) this is a temporary patch and we kill off hangcheck_seqno, just as
> > soon as I can submit requests without struct_mutex
> 
> The heartbeat request solution? Is that better than the hangcheck seqno?

Yes. We don't need an extra seqno for every request, and it handles
preemptible OpenCL persistent kernels, as well as any other long-running
compute batch (thinking of some of the WebGL tests; they both expect
hangcheck and expect that it isn't too quick, AFAIR).
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 19/46] drm/i915/pmu: Always sample an active ringbuffer
  2019-02-11 18:18   ` Tvrtko Ursulin
@ 2019-02-12 13:40     ` Chris Wilson
  0 siblings, 0 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-12 13:40 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-02-11 18:18:36)
> 
> On 06/02/2019 13:03, Chris Wilson wrote:
> > As we no longer have a precise indication of requests queued to an
> > engine, make no presumptions and just sample the ring registers to see
> > if the engine is busy.
> > 
> > v2: Report busy while the ring is idling on a semaphore/event.
> 
> I was planning to take care of this detail but cool, no complaints. :)
> 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >   drivers/gpu/drm/i915/i915_pmu.c | 55 +++++++++++++--------------------
> >   1 file changed, 21 insertions(+), 34 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_pmu.c b/drivers/gpu/drm/i915/i915_pmu.c
> > index 13d70b90dd0f..157cbfa155d9 100644
> > --- a/drivers/gpu/drm/i915/i915_pmu.c
> > +++ b/drivers/gpu/drm/i915/i915_pmu.c
> > @@ -101,7 +101,7 @@ static bool pmu_needs_timer(struct drm_i915_private *i915, bool gpu_active)
> >        *
> >        * Use RCS as proxy for all engines.
> >        */
> > -     else if (intel_engine_supports_stats(i915->engine[RCS]))
> > +     else if (i915->caps.scheduler & I915_SCHEDULER_CAP_PMU)
> 
> Need to nuke the comment as well.
> 
> But my problem is I still think I915_SCHEDULER_CAP_PMU is the wrong name
> and level. It is neither a scheduler feature, nor the whole PMU. Maybe 
> I915_SCHEDULER_CAP_ENGINE_STATS removes one contention point, but still 
> I am wondering if I could refactor how the PMU tracks the need for 
> having the sampling timer and so remove the need for proxying from RCS 
> via that route.

This chunk wasn't meant to be in this patch, sorry.

> >               enable &= ~BIT(I915_SAMPLE_BUSY);
> >   
> >       /*
> > @@ -148,14 +148,6 @@ void i915_pmu_gt_unparked(struct drm_i915_private *i915)
> >       spin_unlock_irq(&i915->pmu.lock);
> >   }
> >   
> > -static bool grab_forcewake(struct drm_i915_private *i915, bool fw)
> > -{
> > -     if (!fw)
> > -             intel_uncore_forcewake_get(i915, FORCEWAKE_ALL);
> > -
> > -     return true;
> > -}
> > -
> >   static void
> >   add_sample(struct i915_pmu_sample *sample, u32 val)
> >   {
> > @@ -168,7 +160,6 @@ engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
> >       struct intel_engine_cs *engine;
> >       enum intel_engine_id id;
> >       intel_wakeref_t wakeref;
> > -     bool fw = false;
> >   
> >       if ((dev_priv->pmu.enable & ENGINE_SAMPLE_MASK) == 0)
> >               return;
> > @@ -181,36 +172,32 @@ engines_sample(struct drm_i915_private *dev_priv, unsigned int period_ns)
> >               return;
> >   
> >       for_each_engine(engine, dev_priv, id) {
> > -             u32 current_seqno = intel_engine_get_seqno(engine);
> > -             u32 last_seqno = intel_engine_last_submit(engine);
> > +             typeof(engine->pmu) *pmu = &engine->pmu;
> 
> I would also prefer we did not start introducing the idiom of declaring 
> locals outside macros with typeof.

You didn't give me a convenient type name :) I don't mind it too much,
auto locals.

> 
> > +             bool busy;
> >               u32 val;
> >   
> > -             val = !i915_seqno_passed(current_seqno, last_seqno);
> > -
> > -             if (val)
> > -                     add_sample(&engine->pmu.sample[I915_SAMPLE_BUSY],
> > -                                period_ns);
> > -
> > -             if (val && (engine->pmu.enable &
> > -                 (BIT(I915_SAMPLE_WAIT) | BIT(I915_SAMPLE_SEMA)))) {
> > -                     fw = grab_forcewake(dev_priv, fw);
> > -
> > -                     val = I915_READ_FW(RING_CTL(engine->mmio_base));
> > -             } else {
> > -                     val = 0;
> > -             }
> > +             val = I915_READ_FW(RING_CTL(engine->mmio_base));
> > +             if (val == 0 || val == ~0u) /* outside of powerwell */
> > +                     continue;
> Would /* Powerwell not awake. */ be clearer?
> 
> So the claim is we can rely on the register being either all zeros or all 
> ones when powered down? Absolutely 100%? Is this documented somewhere? 
> But still need the runtime pm ref?

We still need the runtime pm ref or else we upset our own sanitychecks,
IIRC, although we may be bypassing those and not leaving an error in the
mmio-debug register; I haven't checked.

As far as I remember, it always reads 0 outside the powerwell. Some
registers return ~0u so I hedged my bets.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 33/46] drm/i915: Extend CONTEXT_CREATE to set parameters upon construction
  2019-02-06 13:03 ` [PATCH 33/46] drm/i915: Extend CONTEXT_CREATE to set parameters upon construction Chris Wilson
@ 2019-02-12 13:43   ` Tvrtko Ursulin
  0 siblings, 0 replies; 97+ messages in thread
From: Tvrtko Ursulin @ 2019-02-12 13:43 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 06/02/2019 13:03, Chris Wilson wrote:
> It can be useful to have a single ioctl to create a context with all
> the initial parameters instead of a series of create + setparam + setparam
> ioctls. This extension to create context allows any of the parameters
> to be passed in as a linked list to be applied to the newly constructed
> context.
> 
> v2: Make a local copy of user setparam (Tvrtko)
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_drv.c         |   2 +-
>   drivers/gpu/drm/i915/i915_gem_context.c | 428 +++++++++++++-----------
>   include/uapi/drm/i915_drm.h             | 163 ++++-----
>   3 files changed, 325 insertions(+), 268 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 487e78094e93..fc11460f8327 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -2994,7 +2994,7 @@ static const struct drm_ioctl_desc i915_ioctls[] = {
>   	DRM_IOCTL_DEF_DRV(I915_SET_SPRITE_COLORKEY, intel_sprite_set_colorkey_ioctl, DRM_MASTER),
>   	DRM_IOCTL_DEF_DRV(I915_GET_SPRITE_COLORKEY, drm_noop, DRM_MASTER),
>   	DRM_IOCTL_DEF_DRV(I915_GEM_WAIT, i915_gem_wait_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
> -	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_CREATE, i915_gem_context_create_ioctl, DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_CREATE_EXT, i915_gem_context_create_ioctl, DRM_RENDER_ALLOW),
>   	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_DESTROY, i915_gem_context_destroy_ioctl, DRM_RENDER_ALLOW),
>   	DRM_IOCTL_DEF_DRV(I915_REG_READ, i915_reg_read_ioctl, DRM_RENDER_ALLOW),
>   	DRM_IOCTL_DEF_DRV(I915_GET_RESET_STATS, i915_gem_context_reset_stats_ioctl, DRM_RENDER_ALLOW),
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index dd49b1ef3ff2..609ef59f4d95 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -89,6 +89,7 @@
>   #include <drm/i915_drm.h>
>   #include "i915_drv.h"
>   #include "i915_trace.h"
> +#include "i915_user_extensions.h"
>   #include "intel_lrc_reg.h"
>   #include "intel_workarounds.h"
>   
> @@ -1044,188 +1045,6 @@ static int set_ppgtt(struct i915_gem_context *ctx,
>   	return err;
>   }
>   
> -static bool client_is_banned(struct drm_i915_file_private *file_priv)
> -{
> -	return atomic_read(&file_priv->ban_score) >= I915_CLIENT_SCORE_BANNED;
> -}
> -
> -int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
> -				  struct drm_file *file)
> -{
> -	struct drm_i915_private *dev_priv = to_i915(dev);
> -	struct drm_i915_gem_context_create *args = data;
> -	struct drm_i915_file_private *file_priv = file->driver_priv;
> -	struct i915_gem_context *ctx;
> -	int ret;
> -
> -	if (!DRIVER_CAPS(dev_priv)->has_logical_contexts)
> -		return -ENODEV;
> -
> -	if (args->pad != 0)
> -		return -EINVAL;
> -
> -	if (client_is_banned(file_priv)) {
> -		DRM_DEBUG("client %s[%d] banned from creating ctx\n",
> -			  current->comm,
> -			  pid_nr(get_task_pid(current, PIDTYPE_PID)));
> -
> -		return -EIO;
> -	}
> -
> -	ret = i915_mutex_lock_interruptible(dev);
> -	if (ret)
> -		return ret;
> -
> -	ctx = i915_gem_create_context(dev_priv, file_priv);
> -	mutex_unlock(&dev->struct_mutex);
> -	if (IS_ERR(ctx))
> -		return PTR_ERR(ctx);
> -
> -	GEM_BUG_ON(i915_gem_context_is_kernel(ctx));
> -
> -	args->ctx_id = ctx->user_handle;
> -	DRM_DEBUG("HW context %d created\n", args->ctx_id);
> -
> -	return 0;
> -}
> -
> -int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
> -				   struct drm_file *file)
> -{
> -	struct drm_i915_gem_context_destroy *args = data;
> -	struct drm_i915_file_private *file_priv = file->driver_priv;
> -	struct i915_gem_context *ctx;
> -	int ret;
> -
> -	if (args->pad != 0)
> -		return -EINVAL;
> -
> -	if (args->ctx_id == DEFAULT_CONTEXT_HANDLE)
> -		return -ENOENT;
> -
> -	ctx = i915_gem_context_lookup(file_priv, args->ctx_id);
> -	if (!ctx)
> -		return -ENOENT;
> -
> -	ret = mutex_lock_interruptible(&dev->struct_mutex);
> -	if (ret)
> -		goto out;
> -
> -	__destroy_hw_context(ctx, file_priv);
> -	mutex_unlock(&dev->struct_mutex);
> -
> -out:
> -	i915_gem_context_put(ctx);
> -	return 0;
> -}
> -
> -static int get_sseu(struct i915_gem_context *ctx,
> -		    struct drm_i915_gem_context_param *args)
> -{
> -	struct drm_i915_gem_context_param_sseu user_sseu;
> -	struct intel_engine_cs *engine;
> -	struct intel_context *ce;
> -	int ret;
> -
> -	if (args->size == 0)
> -		goto out;
> -	else if (args->size < sizeof(user_sseu))
> -		return -EINVAL;
> -
> -	if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value),
> -			   sizeof(user_sseu)))
> -		return -EFAULT;
> -
> -	if (user_sseu.flags || user_sseu.rsvd)
> -		return -EINVAL;
> -
> -	engine = intel_engine_lookup_user(ctx->i915,
> -					  user_sseu.engine_class,
> -					  user_sseu.engine_instance);
> -	if (!engine)
> -		return -EINVAL;
> -
> -	/* Only use for mutex here is to serialize get_param and set_param. */
> -	ret = mutex_lock_interruptible(&ctx->i915->drm.struct_mutex);
> -	if (ret)
> -		return ret;
> -
> -	ce = to_intel_context(ctx, engine);
> -
> -	user_sseu.slice_mask = ce->sseu.slice_mask;
> -	user_sseu.subslice_mask = ce->sseu.subslice_mask;
> -	user_sseu.min_eus_per_subslice = ce->sseu.min_eus_per_subslice;
> -	user_sseu.max_eus_per_subslice = ce->sseu.max_eus_per_subslice;
> -
> -	mutex_unlock(&ctx->i915->drm.struct_mutex);
> -
> -	if (copy_to_user(u64_to_user_ptr(args->value), &user_sseu,
> -			 sizeof(user_sseu)))
> -		return -EFAULT;
> -
> -out:
> -	args->size = sizeof(user_sseu);
> -
> -	return 0;
> -}
> -
> -int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
> -				    struct drm_file *file)
> -{
> -	struct drm_i915_file_private *file_priv = file->driver_priv;
> -	struct drm_i915_gem_context_param *args = data;
> -	struct i915_gem_context *ctx;
> -	int ret = 0;
> -
> -	ctx = i915_gem_context_lookup(file_priv, args->ctx_id);
> -	if (!ctx)
> -		return -ENOENT;
> -
> -	switch (args->param) {
> -	case I915_CONTEXT_PARAM_BAN_PERIOD:
> -		ret = -EINVAL;
> -		break;
> -	case I915_CONTEXT_PARAM_NO_ZEROMAP:
> -		args->size = 0;
> -		args->value = test_bit(UCONTEXT_NO_ZEROMAP, &ctx->user_flags);
> -		break;
> -	case I915_CONTEXT_PARAM_GTT_SIZE:
> -		args->size = 0;
> -
> -		if (ctx->ppgtt)
> -			args->value = ctx->ppgtt->vm.total;
> -		else if (to_i915(dev)->mm.aliasing_ppgtt)
> -			args->value = to_i915(dev)->mm.aliasing_ppgtt->vm.total;
> -		else
> -			args->value = to_i915(dev)->ggtt.vm.total;
> -		break;
> -	case I915_CONTEXT_PARAM_NO_ERROR_CAPTURE:
> -		args->size = 0;
> -		args->value = i915_gem_context_no_error_capture(ctx);
> -		break;
> -	case I915_CONTEXT_PARAM_BANNABLE:
> -		args->size = 0;
> -		args->value = i915_gem_context_is_bannable(ctx);
> -		break;
> -	case I915_CONTEXT_PARAM_PRIORITY:
> -		args->size = 0;
> -		args->value = ctx->sched.priority >> I915_USER_PRIORITY_SHIFT;
> -		break;
> -	case I915_CONTEXT_PARAM_SSEU:
> -		ret = get_sseu(ctx, args);
> -		break;
> -	case I915_CONTEXT_PARAM_VM:
> -		ret = get_ppgtt(ctx, args);
> -		break;
> -	default:
> -		ret = -EINVAL;
> -		break;
> -	}
> -
> -	i915_gem_context_put(ctx);
> -	return ret;
> -}
> -
>   static int gen8_emit_rpcs_config(struct i915_request *rq,
>   				 struct intel_context *ce,
>   				 struct intel_sseu sseu)
> @@ -1501,18 +1320,11 @@ static int set_sseu(struct i915_gem_context *ctx,
>   	return 0;
>   }
>   
> -int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
> -				    struct drm_file *file)
> +static int ctx_setparam(struct i915_gem_context *ctx,
> +			struct drm_i915_gem_context_param *args)
>   {
> -	struct drm_i915_file_private *file_priv = file->driver_priv;
> -	struct drm_i915_gem_context_param *args = data;
> -	struct i915_gem_context *ctx;
>   	int ret = 0;
>   
> -	ctx = i915_gem_context_lookup(file_priv, args->ctx_id);
> -	if (!ctx)
> -		return -ENOENT;
> -
>   	switch (args->param) {
>   	case I915_CONTEXT_PARAM_NO_ZEROMAP:
>   		if (args->size)
> @@ -1522,6 +1334,7 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>   		else
>   			clear_bit(UCONTEXT_NO_ZEROMAP, &ctx->user_flags);
>   		break;
> +
>   	case I915_CONTEXT_PARAM_NO_ERROR_CAPTURE:
>   		if (args->size)
>   			ret = -EINVAL;
> @@ -1530,6 +1343,7 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>   		else
>   			i915_gem_context_clear_no_error_capture(ctx);
>   		break;
> +
>   	case I915_CONTEXT_PARAM_BANNABLE:
>   		if (args->size)
>   			ret = -EINVAL;
> @@ -1547,7 +1361,7 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>   
>   			if (args->size)
>   				ret = -EINVAL;
> -			else if (!(to_i915(dev)->caps.scheduler & I915_SCHEDULER_CAP_PRIORITY))
> +			else if (!(ctx->i915->caps.scheduler & I915_SCHEDULER_CAP_PRIORITY))
>   				ret = -ENODEV;
>   			else if (priority > I915_CONTEXT_MAX_USER_PRIORITY ||
>   				 priority < I915_CONTEXT_MIN_USER_PRIORITY)
> @@ -1575,6 +1389,236 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>   		break;
>   	}
>   
> +	return ret;
> +}
> +
> +static int create_setparam(struct i915_user_extension __user *ext, void *data)
> +{
> +	struct drm_i915_gem_context_create_ext_setparam local;
> +
> +	if (copy_from_user(&local, ext, sizeof(local)))
> +		return -EFAULT;
> +
> +	if (local.setparam.ctx_id)
> +		return -EINVAL;
> +
> +	return ctx_setparam(data, &local.setparam);
> +}
> +
> +static const i915_user_extension_fn create_extensions[] = {
> +	[I915_CONTEXT_CREATE_EXT_SETPARAM] = create_setparam,
> +};
> +
> +static bool client_is_banned(struct drm_i915_file_private *file_priv)
> +{
> +	return atomic_read(&file_priv->ban_score) >= I915_CLIENT_SCORE_BANNED;
> +}
> +
> +int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
> +				  struct drm_file *file)
> +{
> +	struct drm_i915_private *dev_priv = to_i915(dev);
> +	struct drm_i915_gem_context_create_ext *args = data;
> +	struct drm_i915_file_private *file_priv = file->driver_priv;
> +	struct i915_gem_context *ctx;
> +	int ret;
> +
> +	if (!DRIVER_CAPS(dev_priv)->has_logical_contexts)
> +		return -ENODEV;
> +
> +	if (args->flags)
> +		return -EINVAL;
> +
> +	if (client_is_banned(file_priv)) {
> +		DRM_DEBUG("client %s[%d] banned from creating ctx\n",
> +			  current->comm,
> +			  pid_nr(get_task_pid(current, PIDTYPE_PID)));
> +
> +		return -EIO;
> +	}
> +
> +	ret = i915_mutex_lock_interruptible(dev);
> +	if (ret)
> +		return ret;
> +
> +	ctx = i915_gem_create_context(dev_priv, file_priv);
> +	mutex_unlock(&dev->struct_mutex);
> +	if (IS_ERR(ctx))
> +		return PTR_ERR(ctx);
> +
> +	GEM_BUG_ON(i915_gem_context_is_kernel(ctx));
> +
> +	ret = i915_user_extensions(u64_to_user_ptr(args->extensions),
> +				   create_extensions,
> +				   ARRAY_SIZE(create_extensions),
> +				   ctx);
> +	if (ret) {
> +		idr_remove(&file_priv->context_idr, ctx->user_handle);
> +		context_close(ctx);
> +		return ret;
> +	}
> +
> +	args->ctx_id = ctx->user_handle;
> +	DRM_DEBUG("HW context %d created\n", args->ctx_id);
> +
> +	return 0;
> +}
> +
> +int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
> +				   struct drm_file *file)
> +{
> +	struct drm_i915_gem_context_destroy *args = data;
> +	struct drm_i915_file_private *file_priv = file->driver_priv;
> +	struct i915_gem_context *ctx;
> +	int ret;
> +
> +	if (args->pad != 0)
> +		return -EINVAL;
> +
> +	if (args->ctx_id == DEFAULT_CONTEXT_HANDLE)
> +		return -ENOENT;
> +
> +	ctx = i915_gem_context_lookup(file_priv, args->ctx_id);
> +	if (!ctx)
> +		return -ENOENT;
> +
> +	ret = mutex_lock_interruptible(&dev->struct_mutex);
> +	if (ret)
> +		goto out;
> +
> +	__destroy_hw_context(ctx, file_priv);
> +	mutex_unlock(&dev->struct_mutex);
> +
> +out:
> +	i915_gem_context_put(ctx);
> +	return 0;
> +}
> +
> +static int get_sseu(struct i915_gem_context *ctx,
> +		    struct drm_i915_gem_context_param *args)
> +{
> +	struct drm_i915_gem_context_param_sseu user_sseu;
> +	struct intel_engine_cs *engine;
> +	struct intel_context *ce;
> +	int ret;
> +
> +	if (args->size == 0)
> +		goto out;
> +	else if (args->size < sizeof(user_sseu))
> +		return -EINVAL;
> +
> +	if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value),
> +			   sizeof(user_sseu)))
> +		return -EFAULT;
> +
> +	if (user_sseu.flags || user_sseu.rsvd)
> +		return -EINVAL;
> +
> +	engine = intel_engine_lookup_user(ctx->i915,
> +					  user_sseu.engine_class,
> +					  user_sseu.engine_instance);
> +	if (!engine)
> +		return -EINVAL;
> +
> +	/* Only use for mutex here is to serialize get_param and set_param. */
> +	ret = mutex_lock_interruptible(&ctx->i915->drm.struct_mutex);
> +	if (ret)
> +		return ret;
> +
> +	ce = to_intel_context(ctx, engine);
> +
> +	user_sseu.slice_mask = ce->sseu.slice_mask;
> +	user_sseu.subslice_mask = ce->sseu.subslice_mask;
> +	user_sseu.min_eus_per_subslice = ce->sseu.min_eus_per_subslice;
> +	user_sseu.max_eus_per_subslice = ce->sseu.max_eus_per_subslice;
> +
> +	mutex_unlock(&ctx->i915->drm.struct_mutex);
> +
> +	if (copy_to_user(u64_to_user_ptr(args->value), &user_sseu,
> +			 sizeof(user_sseu)))
> +		return -EFAULT;
> +
> +out:
> +	args->size = sizeof(user_sseu);
> +
> +	return 0;
> +}
> +
> +int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
> +				    struct drm_file *file)
> +{
> +	struct drm_i915_file_private *file_priv = file->driver_priv;
> +	struct drm_i915_gem_context_param *args = data;
> +	struct i915_gem_context *ctx;
> +	int ret = 0;
> +
> +	ctx = i915_gem_context_lookup(file_priv, args->ctx_id);
> +	if (!ctx)
> +		return -ENOENT;
> +
> +	switch (args->param) {
> +	case I915_CONTEXT_PARAM_NO_ZEROMAP:
> +		args->size = 0;
> +		args->value = test_bit(UCONTEXT_NO_ZEROMAP, &ctx->user_flags);
> +		break;
> +
> +	case I915_CONTEXT_PARAM_GTT_SIZE:
> +		args->size = 0;
> +		if (ctx->ppgtt)
> +			args->value = ctx->ppgtt->vm.total;
> +		else if (to_i915(dev)->mm.aliasing_ppgtt)
> +			args->value = to_i915(dev)->mm.aliasing_ppgtt->vm.total;
> +		else
> +			args->value = to_i915(dev)->ggtt.vm.total;
> +		break;
> +
> +	case I915_CONTEXT_PARAM_NO_ERROR_CAPTURE:
> +		args->size = 0;
> +		args->value = i915_gem_context_no_error_capture(ctx);
> +		break;
> +
> +	case I915_CONTEXT_PARAM_BANNABLE:
> +		args->size = 0;
> +		args->value = i915_gem_context_is_bannable(ctx);
> +		break;
> +
> +	case I915_CONTEXT_PARAM_PRIORITY:
> +		args->size = 0;
> +		args->value = ctx->sched.priority >> I915_USER_PRIORITY_SHIFT;
> +		break;
> +
> +	case I915_CONTEXT_PARAM_SSEU:
> +		ret = get_sseu(ctx, args);
> +		break;
> +
> +	case I915_CONTEXT_PARAM_VM:
> +		ret = get_ppgtt(ctx, args);
> +		break;
> +
> +	case I915_CONTEXT_PARAM_BAN_PERIOD:
> +	default:
> +		ret = -EINVAL;
> +		break;
> +	}
> +
> +	i915_gem_context_put(ctx);
> +	return ret;
> +}
> +
> +int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
> +				    struct drm_file *file)
> +{
> +	struct drm_i915_file_private *file_priv = file->driver_priv;
> +	struct drm_i915_gem_context_param *args = data;
> +	struct i915_gem_context *ctx;
> +	int ret;
> +
> +	ctx = i915_gem_context_lookup(file_priv, args->ctx_id);
> +	if (!ctx)
> +		return -ENOENT;
> +
> +	ret = ctx_setparam(ctx, args);
> +
>   	i915_gem_context_put(ctx);
>   	return ret;
>   }
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 2cd79639d6b5..c9ba7e408117 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -389,6 +389,7 @@ typedef struct _drm_i915_sarea {
>   #define DRM_IOCTL_I915_GET_SPRITE_COLORKEY DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GET_SPRITE_COLORKEY, struct drm_intel_sprite_colorkey)
>   #define DRM_IOCTL_I915_GEM_WAIT		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_WAIT, struct drm_i915_gem_wait)
>   #define DRM_IOCTL_I915_GEM_CONTEXT_CREATE	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE, struct drm_i915_gem_context_create)
> +#define DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE, struct drm_i915_gem_context_create_ext)
>   #define DRM_IOCTL_I915_GEM_CONTEXT_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_DESTROY, struct drm_i915_gem_context_destroy)
>   #define DRM_IOCTL_I915_REG_READ			DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_REG_READ, struct drm_i915_reg_read)
>   #define DRM_IOCTL_I915_GET_RESET_STATS		DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GET_RESET_STATS, struct drm_i915_reset_stats)
> @@ -1438,85 +1439,14 @@ struct drm_i915_gem_wait {
>   };
>   
>   struct drm_i915_gem_context_create {
> -	/*  output: id of new context*/
> -	__u32 ctx_id;
> -	__u32 pad;
> -};
> -
> -struct drm_i915_gem_context_destroy {
> -	__u32 ctx_id;
> +	__u32 ctx_id; /* output: id of new context*/
>   	__u32 pad;
>   };
>   
> -/*
> - * DRM_I915_GEM_VM_CREATE -
> - *
> - * Create a new virtual memory address space (ppGTT) for use within a context
> - * on the same file. Extensions can be provided to configure exactly how the
> - * address space is setup upon creation.
> - *
> - * The id of new VM (bound to the fd) for use with I915_CONTEXT_PARAM_VM is
> - * returned.
> - *
> - * DRM_I915_GEM_VM_DESTROY -
> - *
> - * Destroys a previously created VM id.
> - */
> -struct drm_i915_gem_vm_control {
> -	__u64 extensions;
> +struct drm_i915_gem_context_create_ext {
> +	__u32 ctx_id; /* output: id of new context*/
>   	__u32 flags;
> -	__u32 id;
> -};
> -
> -struct drm_i915_reg_read {
> -	/*
> -	 * Register offset.
> -	 * For 64bit wide registers where the upper 32bits don't immediately
> -	 * follow the lower 32bits, the offset of the lower 32bits must
> -	 * be specified
> -	 */
> -	__u64 offset;
> -#define I915_REG_READ_8B_WA (1ul << 0)
> -
> -	__u64 val; /* Return value */
> -};
> -/* Known registers:
> - *
> - * Render engine timestamp - 0x2358 + 64bit - gen7+
> - * - Note this register returns an invalid value if using the default
> - *   single instruction 8byte read, in order to workaround that pass
> - *   flag I915_REG_READ_8B_WA in offset field.
> - *
> - */
> -
> -struct drm_i915_reset_stats {
> -	__u32 ctx_id;
> -	__u32 flags;
> -
> -	/* All resets since boot/module reload, for all contexts */
> -	__u32 reset_count;
> -
> -	/* Number of batches lost when active in GPU, for this context */
> -	__u32 batch_active;
> -
> -	/* Number of batches lost pending for execution, for this context */
> -	__u32 batch_pending;
> -
> -	__u32 pad;
> -};
> -
> -struct drm_i915_gem_userptr {
> -	__u64 user_ptr;
> -	__u64 user_size;
> -	__u32 flags;
> -#define I915_USERPTR_READ_ONLY 0x1
> -#define I915_USERPTR_UNSYNCHRONIZED 0x80000000
> -	/**
> -	 * Returned handle for the object.
> -	 *
> -	 * Object handles are nonzero.
> -	 */
> -	__u32 handle;
> +	__u64 extensions;
>   };
>   
>   struct drm_i915_gem_context_param {
> @@ -1610,6 +1540,89 @@ struct drm_i915_gem_context_param_sseu {
>   	__u32 rsvd;
>   };
>   
> +struct drm_i915_gem_context_create_ext_setparam {
> +#define I915_CONTEXT_CREATE_EXT_SETPARAM 0
> +	struct i915_user_extension base;
> +	struct drm_i915_gem_context_param setparam;
> +};
> +
> +struct drm_i915_gem_context_destroy {
> +	__u32 ctx_id;
> +	__u32 pad;
> +};
> +
> +/*
> + * DRM_I915_GEM_VM_CREATE -
> + *
> + * Create a new virtual memory address space (ppGTT) for use within a context
> + * on the same file. Extensions can be provided to configure exactly how the
> + * address space is setup upon creation.
> + *
> + * The id of new VM (bound to the fd) for use with I915_CONTEXT_PARAM_VM is
> + * returned.
> + *
> + * DRM_I915_GEM_VM_DESTROY -
> + *
> + * Destroys a previously created VM id.
> + */
> +struct drm_i915_gem_vm_control {
> +	__u64 extensions;
> +	__u32 flags;
> +	__u32 id;
> +};
> +
> +struct drm_i915_reg_read {
> +	/*
> +	 * Register offset.
> +	 * For 64bit wide registers where the upper 32bits don't immediately
> +	 * follow the lower 32bits, the offset of the lower 32bits must
> +	 * be specified
> +	 */
> +	__u64 offset;
> +#define I915_REG_READ_8B_WA (1ul << 0)
> +
> +	__u64 val; /* Return value */
> +};
> +
> +/* Known registers:
> + *
> + * Render engine timestamp - 0x2358 + 64bit - gen7+
> + * - Note this register returns an invalid value if using the default
> + *   single instruction 8byte read, in order to workaround that pass
> + *   flag I915_REG_READ_8B_WA in offset field.
> + *
> + */
> +
> +struct drm_i915_reset_stats {
> +	__u32 ctx_id;
> +	__u32 flags;
> +
> +	/* All resets since boot/module reload, for all contexts */
> +	__u32 reset_count;
> +
> +	/* Number of batches lost when active in GPU, for this context */
> +	__u32 batch_active;
> +
> +	/* Number of batches lost pending for execution, for this context */
> +	__u32 batch_pending;
> +
> +	__u32 pad;
> +};
> +
> +struct drm_i915_gem_userptr {
> +	__u64 user_ptr;
> +	__u64 user_size;
> +	__u32 flags;
> +#define I915_USERPTR_READ_ONLY 0x1
> +#define I915_USERPTR_UNSYNCHRONIZED 0x80000000
> +	/**
> +	 * Returned handle for the object.
> +	 *
> +	 * Object handles are nonzero.
> +	 */
> +	__u32 handle;
> +};
> +
>   enum drm_i915_oa_format {
>   	I915_OA_FORMAT_A13 = 1,	    /* HSW only */
>   	I915_OA_FORMAT_A29,	    /* HSW only */
> 

LGTM. Feels like r-b is premature given the series will probably still 
change but effectively it's an r-b.
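
For completeness, a minimal userspace sketch of the single-ioctl flow
(illustrative only; assumes libdrm's drmIoctl(), the updated i915_drm.h, and
the i915_user_extension header from the earlier patch with its name field;
error handling is trimmed):

#include <stdint.h>
#include <xf86drm.h>
#include <drm/i915_drm.h>

/* Create a context and apply an initial priority in one ioctl. */
static int create_ctx_with_priority(int fd, int prio, uint32_t *ctx_id)
{
	struct drm_i915_gem_context_create_ext_setparam p = {
		.base = { .name = I915_CONTEXT_CREATE_EXT_SETPARAM },
		.setparam = {
			.param = I915_CONTEXT_PARAM_PRIORITY,
			.value = prio,
		},
	};
	struct drm_i915_gem_context_create_ext create = {
		.extensions = (uintptr_t)&p,
	};
	int err;

	err = drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT, &create);
	if (err)
		return err;

	*ctx_id = create.ctx_id;
	return 0;
}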

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 21/46] drm/i915: Remove i915_request.global_seqno
  2019-02-11 18:44   ` Tvrtko Ursulin
@ 2019-02-12 13:45     ` Chris Wilson
  0 siblings, 0 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-12 13:45 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-02-11 18:44:50)
> 
> On 06/02/2019 13:03, Chris Wilson wrote:
> > diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> > index a674c78ca1f8..8792ad12373d 100644
> > --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> > +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> > @@ -380,19 +380,16 @@ static void print_error_buffers(struct drm_i915_error_state_buf *m,
> >       err_printf(m, "%s [%d]:\n", name, count);
> >   
> >       while (count--) {
> > -             err_printf(m, "    %08x_%08x %8u %02x %02x %02x",
> > +             err_printf(m, "    %08x_%08x %8u %02x %02x",
> >                          upper_32_bits(err->gtt_offset),
> >                          lower_32_bits(err->gtt_offset),
> >                          err->size,
> >                          err->read_domains,
> > -                        err->write_domain,
> > -                        err->wseqno);
> > +                        err->write_domain);
> >               err_puts(m, tiling_flag(err->tiling));
> >               err_puts(m, dirty_flag(err->dirty));
> >               err_puts(m, purgeable_flag(err->purgeable));
> >               err_puts(m, err->userptr ? " userptr" : "");
> > -             err_puts(m, err->engine != -1 ? " " : "");
> > -             err_puts(m, engine_name(m->i915, err->engine));
> 
> Why remove this information?

Because I don't like it. Having removed the global write seqno tracking,
the concept is defunct. The information that we need to present, if
anyone actually cared, would be the obj->resv.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 25/46] drm/i915: Store the BIT(engine->id) as the engine's mask
  2019-02-11 18:51   ` Tvrtko Ursulin
@ 2019-02-12 13:51     ` Chris Wilson
  0 siblings, 0 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-12 13:51 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-02-11 18:51:09)
> 
> On 06/02/2019 13:03, Chris Wilson wrote:
> > In the next patch, we are introducing a broad virtual engine to encompass
> > multiple physical engines, losing the 1:1 nature of BIT(engine->id). To
> > reflect the broader set of engines implied by the virtual instance, let's
> > store the full bitmask.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >   drivers/gpu/drm/i915/i915_reset.c                | 4 ++--
> >   drivers/gpu/drm/i915/intel_engine_cs.c           | 3 +++
> >   drivers/gpu/drm/i915/intel_hangcheck.c           | 8 ++++----
> >   drivers/gpu/drm/i915/intel_ringbuffer.c          | 4 ++--
> >   drivers/gpu/drm/i915/intel_ringbuffer.h          | 7 +------
> >   drivers/gpu/drm/i915/selftests/intel_hangcheck.c | 2 +-
> >   drivers/gpu/drm/i915/selftests/mock_engine.c     | 1 +
> >   7 files changed, 14 insertions(+), 15 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
> > index 7051c0a43941..78c9689629a0 100644
> > --- a/drivers/gpu/drm/i915/i915_reset.c
> > +++ b/drivers/gpu/drm/i915/i915_reset.c
> > @@ -1053,7 +1053,7 @@ void i915_reset(struct drm_i915_private *i915,
> >   static inline int intel_gt_reset_engine(struct drm_i915_private *i915,
> >                                       struct intel_engine_cs *engine)
> >   {
> > -     return intel_gpu_reset(i915, intel_engine_flag(engine));
> > +     return intel_gpu_reset(i915, engine->mask);
> >   }
> >   
> >   /**
> > @@ -1253,7 +1253,7 @@ void i915_handle_error(struct drm_i915_private *i915,
> >                               continue;
> >   
> >                       if (i915_reset_engine(engine, msg) == 0)
> > -                             engine_mask &= ~intel_engine_flag(engine);
> > +                             engine_mask &= ~engine->mask;
> >   
> >                       clear_bit(I915_RESET_ENGINE + engine->id,
> >                                 &error->flags);
> > diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> > index ce7c19f2ae49..45e38877ab17 100644
> > --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> > +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> > @@ -313,7 +313,10 @@ intel_engine_setup(struct drm_i915_private *dev_priv,
> >       if (!engine)
> >               return -ENOMEM;
> >   
> > +     BUILD_BUG_ON(BITS_PER_TYPE(engine->mask) < I915_NUM_ENGINES);
> > +
> >       engine->id = id;
> > +     engine->mask = BIT(id);
> >       engine->i915 = dev_priv;
> >       __sprint_engine_name(engine->name, info);
> >       engine->hw_id = engine->guc_id = info->hw_id;
> > diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
> > index e04b2560369e..58b6ff8453dc 100644
> > --- a/drivers/gpu/drm/i915/intel_hangcheck.c
> > +++ b/drivers/gpu/drm/i915/intel_hangcheck.c
> > @@ -120,7 +120,7 @@ engine_stuck(struct intel_engine_cs *engine, u64 acthd)
> >        */
> >       tmp = I915_READ_CTL(engine);
> >       if (tmp & RING_WAIT) {
> > -             i915_handle_error(dev_priv, BIT(engine->id), 0,
> > +             i915_handle_error(dev_priv, engine->mask, 0,
> >                                 "stuck wait on %s", engine->name);
> >               I915_WRITE_CTL(engine, tmp);
> >               return ENGINE_WAIT_KICK;
> > @@ -282,13 +282,13 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
> >               hangcheck_store_sample(engine, &hc);
> >   
> >               if (hc.stalled) {
> > -                     hung |= intel_engine_flag(engine);
> > +                     hung |= engine->mask;
> >                       if (hc.action != ENGINE_DEAD)
> > -                             stuck |= intel_engine_flag(engine);
> > +                             stuck |= engine->mask;
> >               }
> >   
> >               if (hc.wedged)
> > -                     wedged |= intel_engine_flag(engine);
> > +                     wedged |= engine->mask;
> >       }
> >   
> >       if (GEM_SHOW_DEBUG() && (hung | stuck)) {
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > index 1b96b0960adc..91c49f644898 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> > @@ -1859,8 +1859,8 @@ static int switch_context(struct i915_request *rq)
> >                               goto err;
> >               } while (--loops);
> >   
> > -             if (intel_engine_flag(engine) & ppgtt->pd_dirty_rings) {
> > -                     unwind_mm = intel_engine_flag(engine);
> > +             if (ppgtt->pd_dirty_rings & engine->mask) {
> > +                     unwind_mm = engine->mask;
> >                       ppgtt->pd_dirty_rings &= ~unwind_mm;
> >                       hw_flags = MI_FORCE_RESTORE;
> >               }
> > diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> > index 39a9ee7b61e2..7777d46784f9 100644
> > --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> > +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> > @@ -334,6 +334,7 @@ struct intel_engine_cs {
> >       enum intel_engine_id id;
> >       unsigned int hw_id;
> >       unsigned int guc_id;
> > +     unsigned long mask;
> 
> Could use intel_ring_mask_t - if we renamed it to intel_engine_mask_t - 
> which is already checked with a BUILD_BUG_ON.

Only downside, it was hidden away in intel_device_info.h

[snip]
 
> No other suggestions.

The name is open to suggestions, engine->physmask was a possibility we
discussed, but none of the suggestions stuck over a plain old boring
engine->mask.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 29/46] drm/i915: Introduce the i915_user_extension_method
  2019-02-11 19:00   ` Tvrtko Ursulin
@ 2019-02-12 13:56     ` Chris Wilson
  0 siblings, 0 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-12 13:56 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-02-11 19:00:59)
> 
> On 06/02/2019 13:03, Chris Wilson wrote:
> > An idea for extending uABI inspired by Vulkan's extension chains.
> > Instead of expanding the data struct for each ioctl every time we need
> > to add a new feature, define an extension chain. As we add
> > optional interfaces to control the ioctl, we define a new extension
> > struct that can be linked into the ioctl data only when required by the
> > user. The key advantage being able to ignore large control structs for
> > optional interfaces/extensions, while being able to process them in a
> > consistent manner.
> > 
> > In comparison to other extensible ioctls, the key difference is the
> > use of a linked chain of extension structs vs an array of tagged
> > pointers. For example,
> > 
> > struct drm_amdgpu_cs_chunk {
> >          __u32           chunk_id;
> >          __u32           length_dw;
> >          __u64           chunk_data;
> > };
> > 
> > struct drm_amdgpu_cs_in {
> >          __u32           ctx_id;
> >          __u32           bo_list_handle;
> >          __u32           num_chunks;
> >          __u32           _pad;
> >          __u64           chunks;
> > };
> > 
> > allows userspace to pass in an array of pointers to extension structs, but
> > must therefore keep constructing that array alongside the command stream.
> > In dynamic situations like that, a linked list is preferred and does not
> > suffer from extra cache line misses, as the extension structs themselves
> > must still be loaded separately from the chunks array.
> > 
> > v2: Apply the tail call optimisation directly to nip the worry of stack
> > overflow in the bud.
> > v3: Defend against recursion.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >   drivers/gpu/drm/i915/Makefile               |  1 +
> >   drivers/gpu/drm/i915/i915_user_extensions.c | 43 +++++++++++++++++++++
> >   drivers/gpu/drm/i915/i915_user_extensions.h | 20 ++++++++++
> >   drivers/gpu/drm/i915/i915_utils.h           |  7 ++++
> >   include/uapi/drm/i915_drm.h                 | 20 ++++++++++
> >   5 files changed, 91 insertions(+)
> >   create mode 100644 drivers/gpu/drm/i915/i915_user_extensions.c
> >   create mode 100644 drivers/gpu/drm/i915/i915_user_extensions.h
> > 
> > diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> > index a1d834068765..89105b1aaf12 100644
> > --- a/drivers/gpu/drm/i915/Makefile
> > +++ b/drivers/gpu/drm/i915/Makefile
> > @@ -46,6 +46,7 @@ i915-y := i915_drv.o \
> >         i915_sw_fence.o \
> >         i915_syncmap.o \
> >         i915_sysfs.o \
> > +       i915_user_extensions.o \
> >         intel_csr.o \
> >         intel_device_info.o \
> >         intel_pm.o \
> > diff --git a/drivers/gpu/drm/i915/i915_user_extensions.c b/drivers/gpu/drm/i915/i915_user_extensions.c
> > new file mode 100644
> > index 000000000000..879b4094b2d7
> > --- /dev/null
> > +++ b/drivers/gpu/drm/i915/i915_user_extensions.c
> > @@ -0,0 +1,43 @@
> > +/*
> > + * SPDX-License-Identifier: MIT
> > + *
> > + * Copyright © 2018 Intel Corporation
> > + */
> > +
> > +#include <linux/sched/signal.h>
> > +#include <linux/uaccess.h>
> > +#include <uapi/drm/i915_drm.h>
> > +
> > +#include "i915_user_extensions.h"
> > +
> > +int i915_user_extensions(struct i915_user_extension __user *ext,
> > +                      const i915_user_extension_fn *tbl,
> > +                      unsigned long count,
> > +                      void *data)
> > +{
> > +     unsigned int stackdepth = 512;
> > +
> > +     while (ext) {
> > +             int err;
> > +             u64 x;
> > +
> > +             if (!stackdepth--) /* recursion vs useful flexibility */
> > +                     return -EINVAL;
> 
> I don't get this - which recursion? Variable is also a local automatic.

User recursion:
my_uapi_extension = {
	.name = MY_UAPI_EXTENSION,
	.next = to_user_pointer(&my_uapi_extension),
};
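
For contrast, a well-formed chain (a minimal sketch reusing the illustrative
names above, not the literal uapi fields) simply terminates in a NULL next
pointer rather than looping back on itself:

my_other_extension = {
	.name = MY_OTHER_EXTENSION,
	.next = 0, /* end of chain */
};
my_uapi_extension = {
	.name = MY_UAPI_EXTENSION,
	.next = to_user_pointer(&my_other_extension),
};

The stackdepth counter just bounds how many links we are prepared to walk
before declaring the chain degenerate.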

> > +             if (get_user(x, &ext->name))
> > +                     return -EFAULT;
> > +
> > +             err = -EINVAL;
> > +             if (x < count && tbl[x])
> > +                     err = tbl[x](ext, data);
> > +             if (err)
> > +                     return err;
> 
> Do you plan to add the unwind as previously discussed? Or do we define 
> the interface as having undefined state if one extension failed? It 
> would be a bit suboptimal for userspace since it would mean having to 
> throw away and recreate the object in use cases where a user-extension 
> capable ioctl is executed after creation of some object.

I don't have a nice plan for adding unwind support, and it isn't
required to handle the current usecases. Real recursion would do the
trick, or manual callstacks. But each unwind callback basically has to
handle the struct in an unknown state, and in the end it all boils down
to a fini anyway (I think).
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 30/46] drm/i915: Track active engines within a context
  2019-02-11 19:11   ` Tvrtko Ursulin
@ 2019-02-12 13:59     ` Chris Wilson
  0 siblings, 0 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-12 13:59 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-02-11 19:11:08)
> 
> On 06/02/2019 13:03, Chris Wilson wrote:
> > @@ -1361,6 +1362,7 @@ __execlists_context_pin(struct intel_engine_cs *engine,
> >   
> >       ce->state->obj->pin_global++;
> >       i915_gem_context_get(ctx);
> > +     list_add(&ce->active_link, &ctx->active_engines);
> 
> Why is it called active_engines if it lists active_contexts? :)

active_context_engines doesn't quite have the same ring to it. It's the
list of engines on which this GEM context has been pinned, which is my
approximation for active and is the list I need to flush if we change
logical state.
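
Roughly, the consumer ends up looking something like this (just a sketch of
the intended use, not the actual code in the series):

	struct intel_context *ce;

	list_for_each_entry(ce, &ctx->active_engines, active_link)
		/* queue a request on ce's engine to rewrite its logical state */;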
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 32/46] drm/i915: Create/destroy VM (ppGTT) for use with contexts
  2019-02-12 11:18   ` Tvrtko Ursulin
@ 2019-02-12 14:11     ` Chris Wilson
  0 siblings, 0 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-12 14:11 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-02-12 11:18:16)
> 
> On 06/02/2019 13:03, Chris Wilson wrote:
> > In preparation to making the ppGTT binding for a context explicit (to
> > facilitate reusing the same ppGTT between different contexts), allow the
> > user to create and destroy named ppGTT.
> > 
> > v2: Replace global barrier for swapping over the ppgtt and tlbs with a
> > local context barrier (Tvrtko)
> > v3: serialise with struct_mutex; it's lazy but required dammit
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >   drivers/gpu/drm/i915/i915_drv.c               |   2 +
> >   drivers/gpu/drm/i915/i915_drv.h               |   3 +
> >   drivers/gpu/drm/i915/i915_gem_context.c       | 254 +++++++++++++++++-
> >   drivers/gpu/drm/i915/i915_gem_context.h       |   5 +
> >   drivers/gpu/drm/i915/i915_gem_gtt.c           |  17 +-
> >   drivers/gpu/drm/i915/i915_gem_gtt.h           |  16 +-
> >   drivers/gpu/drm/i915/selftests/huge_pages.c   |   1 -
> >   .../gpu/drm/i915/selftests/i915_gem_context.c | 239 ++++++++++++----
> >   drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |   1 -
> >   drivers/gpu/drm/i915/selftests/mock_context.c |   8 +-
> >   include/uapi/drm/i915_drm.h                   |  35 +++
> >   11 files changed, 510 insertions(+), 71 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> > index 36da8ab1e7ce..487e78094e93 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.c
> > +++ b/drivers/gpu/drm/i915/i915_drv.c
> > @@ -3005,6 +3005,8 @@ static const struct drm_ioctl_desc i915_ioctls[] = {
> >       DRM_IOCTL_DEF_DRV(I915_PERF_ADD_CONFIG, i915_perf_add_config_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
> >       DRM_IOCTL_DEF_DRV(I915_PERF_REMOVE_CONFIG, i915_perf_remove_config_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
> >       DRM_IOCTL_DEF_DRV(I915_QUERY, i915_query_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
> > +     DRM_IOCTL_DEF_DRV(I915_GEM_VM_CREATE, i915_gem_vm_create_ioctl, DRM_RENDER_ALLOW),
> > +     DRM_IOCTL_DEF_DRV(I915_GEM_VM_DESTROY, i915_gem_vm_destroy_ioctl, DRM_RENDER_ALLOW),
> >   };
> >   
> >   static struct drm_driver driver = {
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index e554691304dc..523de3644570 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -217,6 +217,9 @@ struct drm_i915_file_private {
> >       } mm;
> >       struct idr context_idr;
> >   
> > +     struct mutex vm_lock;
> > +     struct idr vm_idr;
> > +
> >       unsigned int bsd_engine;
> >   
> >   /*
> > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> > index c3f41f501276..dd49b1ef3ff2 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> > @@ -110,6 +110,8 @@ static void lut_close(struct i915_gem_context *ctx)
> >               struct i915_vma *vma = rcu_dereference_raw(*slot);
> >   
> >               radix_tree_iter_delete(&ctx->handles_vma, &iter, slot);
> > +
> > +             vma->open_count--;
> 
> I am finding vma->open_count handling seriously confusing. So maybe that 
> means there should be a comment here as a minimum.
> 
> What is the path in the current code which brings vma->open_count back 
> to zero? If it is not done from lut_close, but the object is removed from 
> the lut_list, then the only current decrement in i915_gem_close_object 
> won't run. Surely I am missing something...

In ppgtt, it is single instance, so we just know at this point it hits
zero. The open_count was originally to handle the reuse via the shared
GGTT.

> >   static struct i915_gem_context *
> >   i915_gem_create_context(struct drm_i915_private *dev_priv,
> >                       struct drm_i915_file_private *file_priv)
> > @@ -451,8 +479,8 @@ i915_gem_create_context(struct drm_i915_private *dev_priv,
> >                       return ERR_CAST(ppgtt);
> >               }
> >   
> > -             ctx->ppgtt = ppgtt;
> > -             ctx->desc_template = default_desc_template(dev_priv, ppgtt);
> > +             __assign_ppgtt(ctx, ppgtt);
> > +             i915_ppgtt_put(ppgtt);
> 
> Looks strange and one realizes it is dropping the ref __assign_ppgtt 
> takes. Not sure if it wouldn't be better to just open code what this 
> site needs.

Don't tempt me; you know you'll regret the amount of code I'll copy
around just because it's easier :)

I think having assign add a ref and not borrow the caller's is simpler in
the long run.

> > @@ -662,6 +700,89 @@ void i915_gem_context_close(struct drm_file *file)
> >   
> >       idr_for_each(&file_priv->context_idr, context_idr_cleanup, NULL);
> >       idr_destroy(&file_priv->context_idr);
> > +
> > +     idr_for_each(&file_priv->vm_idr, vm_idr_cleanup, NULL);
> > +     idr_destroy(&file_priv->vm_idr);
> 
> The name of this function always confuses me. Should we rename it to 
> i915_gem_close_contexts or something?

Can do. Though not in this patch, so go right ahead ;)

> > +static int get_ppgtt(struct i915_gem_context *ctx,
> > +                  struct drm_i915_gem_context_param *args)
> > +{
> > +     struct drm_i915_file_private *file_priv = ctx->file_priv;
> > +     struct i915_hw_ppgtt *ppgtt;
> > +     int ret;
> > +
> > +     if (!ctx->ppgtt)
> > +             return -ENODEV;
> > +
> > +     /* XXX rcu acquire? */
> > +     ret = mutex_lock_interruptible(&ctx->i915->drm.struct_mutex);
> 
> Only to serialize threads working on the same ctx? Why not do that under 
> the new vm->lock instead?


> > +     err = mutex_lock_interruptible(&file_priv->vm_lock);
> > +     if (err)
> > +             return err;
> > +
> > +     ppgtt = idr_find(&file_priv->vm_idr, args->value);
> > +     if (ppgtt) {
> > +             GEM_BUG_ON(ppgtt->user_handle != args->value);
> > +             i915_ppgtt_get(ppgtt);
> > +     }
> > +     mutex_unlock(&file_priv->vm_lock);
> > +     if (!ppgtt)
> > +             return -ENOENT;
> > +
> > +     err = mutex_lock_interruptible(&ctx->i915->drm.struct_mutex);
> > +     if (err)
> > +             goto out;
> > +
> > +     if (ppgtt == ctx->ppgtt)
> > +             goto unlock;
> > +
> > +     /* Teardown the existing obj:vma cache, it will have to be rebuilt. */
> > +     lut_close(ctx);
> 
> Nesting issues I guess, the answer to my previous question.

I'll take another look, I think my thinking was that file_priv->vm
didn't have wide enough scope. But ppgtt are definitely restricted to
being inside a single fd.

I've also a new lock for you to play with here, ctx->pin_mutex (solves
ce->sseu locking requirements as well as pinning in general).

> > +     old = __set_ppgtt(ctx, ppgtt);
> > +
> > +     /*
> > +      * We need to flush any requests using the current ppgtt before
> > +      * we release it as the requests do not hold a reference themselves,
> > +      * only indirectly through the context.
> > +      */
> > +     err = context_barrier_task(ctx, -1, set_ppgtt_barrier, old);
> 
> But the barrier can be retired on user interrupt with context save still 
> running, no?

User interrupt while building the barrier requests? That results in the
barrier being cancelled with err = EINTR. The whole point is that the
context barrier is retired on the completion (maybe you meant
MI_USER_INTERRUPT) after the context has been run with the new ppgtt.
The context barrier itself is using the new ctx->ppgtt.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 08/46] drm/i915/execlists: Suppress mere WAIT preemption
  2019-02-06 13:03 ` [PATCH 08/46] drm/i915/execlists: Suppress mere WAIT preemption Chris Wilson
  2019-02-11 11:19   ` Tvrtko Ursulin
@ 2019-02-19 10:22   ` Matthew Auld
  2019-02-19 10:34     ` Chris Wilson
  1 sibling, 1 reply; 97+ messages in thread
From: Matthew Auld @ 2019-02-19 10:22 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Intel Graphics Development

On Wed, 6 Feb 2019 at 13:05, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>
> WAIT is occasionally suppressed by virtue of preempted requests being
> promoted to NEWCLIENT if they have not already received that boost.
> Make this consistent for all WAIT boosts that they are not allowed to
> preempt executing contexts and are merely granted the right to be at the
> front of the queue for the next execution slot. This is in keeping with
> the desire that the WAIT boost be a minor tweak that does not give
> excessive promotion to its user and open ourselves to trivial abuse.
>
> The problem with the inconsistent WAIT preemption becomes more apparent
> as the preemption is propagated across the engines, where one engine may
> preempt and the other not, and we may be relying on the exact execution
> order being consistent across engines (e.g. using HW semaphores to
> coordinate parallel execution).
>
> v2: Also protect GuC submission from false preemption loops.
> v3: Build bug safeguards and better debug messages for st.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_request.c        |  12 ++
>  drivers/gpu/drm/i915/i915_scheduler.h      |   2 +
>  drivers/gpu/drm/i915/intel_lrc.c           |   9 +-
>  drivers/gpu/drm/i915/selftests/intel_lrc.c | 161 +++++++++++++++++++++
>  4 files changed, 183 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index c2a5c48c7541..35acef74b93a 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -372,12 +372,24 @@ void __i915_request_submit(struct i915_request *request)
>
>         /* We may be recursing from the signal callback of another i915 fence */
>         spin_lock_nested(&request->lock, SINGLE_DEPTH_NESTING);
> +
>         GEM_BUG_ON(test_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags));
>         set_bit(I915_FENCE_FLAG_ACTIVE, &request->fence.flags);
> +
>         request->global_seqno = seqno;
>         if (test_bit(DMA_FENCE_FLAG_ENABLE_SIGNAL_BIT, &request->fence.flags) &&
>             !i915_request_enable_breadcrumb(request))
>                 intel_engine_queue_breadcrumbs(engine);
> +
> +       /*
> +        * As we do not allow WAIT to preempt inflight requests,
> +        * once we have executed a request, along with triggering
> +        * any execution callbacks, we must preserve its ordering
> +        * within the non-preemptible FIFO.
> +        */
> +       BUILD_BUG_ON(__NO_PREEMPTION & ~I915_PRIORITY_MASK); /* only internal */
> +       request->sched.attr.priority |= __NO_PREEMPTION;
> +
>         spin_unlock(&request->lock);
>
>         engine->emit_fini_breadcrumb(request,
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
> index dbe9cb7ecd82..54bd6c89817e 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.h
> +++ b/drivers/gpu/drm/i915/i915_scheduler.h
> @@ -33,6 +33,8 @@ enum {
>  #define I915_PRIORITY_WAIT     ((u8)BIT(0))
>  #define I915_PRIORITY_NEWCLIENT        ((u8)BIT(1))
>
> +#define __NO_PREEMPTION (I915_PRIORITY_WAIT)
> +
>  struct i915_sched_attr {
>         /**
>          * @priority: execution and service priority
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 5d5ce91a5dfa..afd05e25f911 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -188,6 +188,12 @@ static inline int rq_prio(const struct i915_request *rq)
>         return rq->sched.attr.priority;
>  }
>
> +static int effective_prio(const struct i915_request *rq)
> +{
> +       /* Restrict mere WAIT boosts from triggering preemption */
> +       return rq_prio(rq) | __NO_PREEMPTION;
> +}
> +
>  static int queue_prio(const struct intel_engine_execlists *execlists)
>  {
>         struct i915_priolist *p;
> @@ -208,7 +214,7 @@ static int queue_prio(const struct intel_engine_execlists *execlists)
>  static inline bool need_preempt(const struct intel_engine_cs *engine,
>                                 const struct i915_request *rq)
>  {
> -       const int last_prio = rq_prio(rq);
> +       int last_prio;
>
>         if (!intel_engine_has_preemption(engine))
>                 return false;
> @@ -228,6 +234,7 @@ static inline bool need_preempt(const struct intel_engine_cs *engine,
>          * preempt. If that hint is stale or we may be trying to preempt
>          * ourselves, ignore the request.
>          */
> +       last_prio = effective_prio(rq);
>         if (!__execlists_need_preempt(engine->execlists.queue_priority_hint,
>                                       last_prio))
>                 return false;
> diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
> index 58144e024751..263afd2f1596 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
> @@ -407,6 +407,166 @@ static int live_suppress_self_preempt(void *arg)
>         goto err_client_b;
>  }
>
> +static int __i915_sw_fence_call
> +dummy_notify(struct i915_sw_fence *fence, enum i915_sw_fence_notify state)
> +{
> +       return NOTIFY_DONE;
> +}
> +
> +static struct i915_request *dummy_request(struct intel_engine_cs *engine)
> +{
> +       struct i915_request *rq;
> +
> +       rq = kmalloc(sizeof(*rq), GFP_KERNEL | __GFP_ZERO);
> +       if (!rq)
> +               return NULL;
> +
> +       INIT_LIST_HEAD(&rq->active_list);
> +       rq->engine = engine;
> +
> +       i915_sched_node_init(&rq->sched);
> +
> +       /* mark this request as permanently incomplete */
> +       rq->fence.seqno = 1;
> +       BUILD_BUG_ON(sizeof(rq->fence.seqno) != 8); /* upper 32b == 0 */
> +       rq->hwsp_seqno = (u32 *)&rq->fence.seqno + 1;
> +       GEM_BUG_ON(i915_request_completed(rq));
> +
> +       i915_sw_fence_init(&rq->submit, dummy_notify);
> +       i915_sw_fence_commit(&rq->submit);
> +
> +       return rq;
> +}
> +
> +static void dummy_request_free(struct i915_request *dummy)
> +{
> +       i915_request_mark_complete(dummy);
> +       i915_sched_node_fini(dummy->engine->i915, &dummy->sched);

Do we need i915_sw_fence_fini() in here somewhere?

While looking at something unrelated I hit something like:
ODEBUG: init destroyed (active state 0) object type: i915_sw_fence
hint:           (null)
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 08/46] drm/i915/execlists: Suppress mere WAIT preemption
  2019-02-19 10:22   ` Matthew Auld
@ 2019-02-19 10:34     ` Chris Wilson
  0 siblings, 0 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-19 10:34 UTC (permalink / raw)
  To: Matthew Auld; +Cc: Intel Graphics Development

Quoting Matthew Auld (2019-02-19 10:22:57)
> On Wed, 6 Feb 2019 at 13:05, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > +static struct i915_request *dummy_request(struct intel_engine_cs *engine)
> > +{
> > +       struct i915_request *rq;
> > +
> > +       rq = kmalloc(sizeof(*rq), GFP_KERNEL | __GFP_ZERO);
> > +       if (!rq)
> > +               return NULL;
> > +
> > +       INIT_LIST_HEAD(&rq->active_list);
> > +       rq->engine = engine;
> > +
> > +       i915_sched_node_init(&rq->sched);
> > +
> > +       /* mark this request as permanently incomplete */
> > +       rq->fence.seqno = 1;
> > +       BUILD_BUG_ON(sizeof(rq->fence.seqno) != 8); /* upper 32b == 0 */
> > +       rq->hwsp_seqno = (u32 *)&rq->fence.seqno + 1;
> > +       GEM_BUG_ON(i915_request_completed(rq));
> > +
> > +       i915_sw_fence_init(&rq->submit, dummy_notify);
> > +       i915_sw_fence_commit(&rq->submit);
> > +
> > +       return rq;
> > +}
> > +
> > +static void dummy_request_free(struct i915_request *dummy)
> > +{
> > +       i915_request_mark_complete(dummy);
> > +       i915_sched_node_fini(dummy->engine->i915, &dummy->sched);
> 
> Do we need i915_sw_fence_fini() in here somewhere?
> 
> While looking at something unrelated I hit something like:
> ODEBUG: init destroyed (active state 0) object type: i915_sw_fence
> hint:           (null)

Yeah, a missing sw_fence_fini would account for that. We should also use
dma_fence_release if I haven't already, just in case it ends up being
RCU sensitive.

As for requiring dummy_request in the first place, I think it indicates
that the i915_request_add() api is inadequate. At the moment, the only
sore points are this particular test and later on when we have to
manually fudge the priority after submission (for heartbeat requests).
So, it's not a pressing issue, but definitely a weak point.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 24/46] drm/i915: Do a synchronous switch-to-kernel-context on idling
  2019-02-06 13:03 ` [PATCH 24/46] drm/i915: Do a synchronous switch-to-kernel-context on idling Chris Wilson
@ 2019-02-21 19:48   ` Daniele Ceraolo Spurio
  2019-02-21 21:17     ` Chris Wilson
  0 siblings, 1 reply; 97+ messages in thread
From: Daniele Ceraolo Spurio @ 2019-02-21 19:48 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Sundaresan, Sujaritha


<snip>

> @@ -4481,19 +4471,7 @@ int i915_gem_suspend(struct drm_i915_private *i915)
>   	 * state. Fortunately, the kernel_context is disposable and we do
>   	 * not rely on its state.
>   	 */
> -	if (!i915_terminally_wedged(&i915->gpu_error)) {
> -		ret = i915_gem_switch_to_kernel_context(i915);
> -		if (ret)
> -			goto err_unlock;
> -
> -		ret = i915_gem_wait_for_idle(i915,
> -					     I915_WAIT_INTERRUPTIBLE |
> -					     I915_WAIT_LOCKED |
> -					     I915_WAIT_FOR_IDLE_BOOST,
> -					     HZ / 5);
> -		if (ret == -EINTR)
> -			goto err_unlock;
> -
> +	if (!switch_to_kernel_context_sync(i915)) {
>   		/* Forcibly cancel outstanding work and leave the gpu quiet. */
>   		i915_gem_set_wedged(i915);
>   	}

GuC-related question: what's your expectation here in regards to GuC 
status? The current i915 flow expects either uc_reset_prepare() or 
uc_suspend() to be called to clean up the guc status, but we're calling 
neither of them here if the switch is successful. Do you expect the 
resume code to always blank out the GuC status before a reload?

Thanks,
Daniele
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 24/46] drm/i915: Do a synchronous switch-to-kernel-context on idling
  2019-02-21 19:48   ` Daniele Ceraolo Spurio
@ 2019-02-21 21:17     ` Chris Wilson
  2019-02-21 21:31       ` Daniele Ceraolo Spurio
  0 siblings, 1 reply; 97+ messages in thread
From: Chris Wilson @ 2019-02-21 21:17 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio, intel-gfx; +Cc: Sundaresan, Sujaritha

Quoting Daniele Ceraolo Spurio (2019-02-21 19:48:01)
> 
> <snip>
> 
> > @@ -4481,19 +4471,7 @@ int i915_gem_suspend(struct drm_i915_private *i915)
> >        * state. Fortunately, the kernel_context is disposable and we do
> >        * not rely on its state.
> >        */
> > -     if (!i915_terminally_wedged(&i915->gpu_error)) {
> > -             ret = i915_gem_switch_to_kernel_context(i915);
> > -             if (ret)
> > -                     goto err_unlock;
> > -
> > -             ret = i915_gem_wait_for_idle(i915,
> > -                                          I915_WAIT_INTERRUPTIBLE |
> > -                                          I915_WAIT_LOCKED |
> > -                                          I915_WAIT_FOR_IDLE_BOOST,
> > -                                          HZ / 5);
> > -             if (ret == -EINTR)
> > -                     goto err_unlock;
> > -
> > +     if (!switch_to_kernel_context_sync(i915)) {
> >               /* Forcibly cancel outstanding work and leave the gpu quiet. */
> >               i915_gem_set_wedged(i915);
> >       }
> 
> GuC-related question: what's your expectation here in regards to GuC 
> status? The current i915 flow expect either uc_reset_prepare() or 
> uc_suspend() to be called to clean up the guc status, but we're calling 
> neither of them here if the switch is successful. Do you expect the 
> resume code to always blank out the GuC status before a reload?

(A few patches later on I propose that we always just do a reset+wedge
on suspend in lieu of hangcheck.)

On resume, we have to bring the HW up from scratch and do another reset
in the process. Some platforms have been known to survive the trips to
PCI_D3 (someone is lying!) and so we _have_ to do a reset to be sure we
clear the HW state. I expect we would need to force a reset on resume
even for the guc, to be sure we cover all cases such as kexec.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 24/46] drm/i915: Do a synchronous switch-to-kernel-context on idling
  2019-02-21 21:17     ` Chris Wilson
@ 2019-02-21 21:31       ` Daniele Ceraolo Spurio
  2019-02-21 21:42         ` Chris Wilson
  0 siblings, 1 reply; 97+ messages in thread
From: Daniele Ceraolo Spurio @ 2019-02-21 21:31 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Sundaresan, Sujaritha



On 2/21/19 1:17 PM, Chris Wilson wrote:
> Quoting Daniele Ceraolo Spurio (2019-02-21 19:48:01)
>>
>> <snip>
>>
>>> @@ -4481,19 +4471,7 @@ int i915_gem_suspend(struct drm_i915_private *i915)
>>>         * state. Fortunately, the kernel_context is disposable and we do
>>>         * not rely on its state.
>>>         */
>>> -     if (!i915_terminally_wedged(&i915->gpu_error)) {
>>> -             ret = i915_gem_switch_to_kernel_context(i915);
>>> -             if (ret)
>>> -                     goto err_unlock;
>>> -
>>> -             ret = i915_gem_wait_for_idle(i915,
>>> -                                          I915_WAIT_INTERRUPTIBLE |
>>> -                                          I915_WAIT_LOCKED |
>>> -                                          I915_WAIT_FOR_IDLE_BOOST,
>>> -                                          HZ / 5);
>>> -             if (ret == -EINTR)
>>> -                     goto err_unlock;
>>> -
>>> +     if (!switch_to_kernel_context_sync(i915)) {
>>>                /* Forcibly cancel outstanding work and leave the gpu quiet. */
>>>                i915_gem_set_wedged(i915);
>>>        }
>>
>> GuC-related question: what's your expectation here in regards to GuC
>> status? The current i915 flow expect either uc_reset_prepare() or
>> uc_suspend() to be called to clean up the guc status, but we're calling
>> neither of them here if the switch is successful. Do you expect the
>> resume code to always blank out the GuC status before a reload?
> 
> (A few patches later on I propose that we always just do a reset+wedge
> on suspend in lieu of hangcheck.)
> 
> On resume, we have to bring the HW up from scratch and do another reset
> in the process. Some platforms have been known to survive the trips to
> PCI_D3 (someone is lying!) and so we _have_ to do a reset to be sure we
> clear the HW state. I expect we would need to force a reset on resume
> even for the guc, to be sure we cover all cases such as kexec.
> -Chris
> 
More than about the HW state, my question here was about the SW 
tracking. At which point do we go and stop guc communication and mark 
guc as not loaded/accessible? e.g. we need to disable and re-enable CT 
buffers before GuC is reset/suspended to make sure the shared memory 
area is cleaned correctly (we currently avoid memsetting all of it on 
reload since it is quite big). Also, communication with GuC is going to 
increase going forward, so we'll need to make sure we accurately track 
its state and do all the relevant cleanups.

Daniele
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 24/46] drm/i915: Do a synchronous switch-to-kernel-context on idling
  2019-02-21 21:31       ` Daniele Ceraolo Spurio
@ 2019-02-21 21:42         ` Chris Wilson
  2019-02-21 22:53           ` Daniele Ceraolo Spurio
  0 siblings, 1 reply; 97+ messages in thread
From: Chris Wilson @ 2019-02-21 21:42 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio, intel-gfx; +Cc: Sundaresan, Sujaritha

Quoting Daniele Ceraolo Spurio (2019-02-21 21:31:45)
> 
> 
> On 2/21/19 1:17 PM, Chris Wilson wrote:
> > Quoting Daniele Ceraolo Spurio (2019-02-21 19:48:01)
> >>
> >> <snip>
> >>
> >>> @@ -4481,19 +4471,7 @@ int i915_gem_suspend(struct drm_i915_private *i915)
> >>>         * state. Fortunately, the kernel_context is disposable and we do
> >>>         * not rely on its state.
> >>>         */
> >>> -     if (!i915_terminally_wedged(&i915->gpu_error)) {
> >>> -             ret = i915_gem_switch_to_kernel_context(i915);
> >>> -             if (ret)
> >>> -                     goto err_unlock;
> >>> -
> >>> -             ret = i915_gem_wait_for_idle(i915,
> >>> -                                          I915_WAIT_INTERRUPTIBLE |
> >>> -                                          I915_WAIT_LOCKED |
> >>> -                                          I915_WAIT_FOR_IDLE_BOOST,
> >>> -                                          HZ / 5);
> >>> -             if (ret == -EINTR)
> >>> -                     goto err_unlock;
> >>> -
> >>> +     if (!switch_to_kernel_context_sync(i915)) {
> >>>                /* Forcibly cancel outstanding work and leave the gpu quiet. */
> >>>                i915_gem_set_wedged(i915);
> >>>        }
> >>
> >> GuC-related question: what's your expectation here in regards to GuC
> >> status? The current i915 flow expect either uc_reset_prepare() or
> >> uc_suspend() to be called to clean up the guc status, but we're calling
> >> neither of them here if the switch is successful. Do you expect the
> >> resume code to always blank out the GuC status before a reload?
> > 
> > (A few patches later on I propose that we always just do a reset+wedge
> > on suspend in lieu of hangcheck.)
> > 
> > On resume, we have to bring the HW up from scratch and do another reset
> > in the process. Some platforms have been known to survive the trips to
> > PCI_D3 (someone is lying!) and so we _have_ to do a reset to be sure we
> > clear the HW state. I expect we would need to force a reset on resume
> > even for the guc, to be sure we cover all cases such as kexec.
> > -Chris
> > 
> More than about the HW state, my question here was about the SW 
> tracking. At which point do we go and stop guc communication and mark 
> guc as not loaded/accessible? e.g. we need to disable and re-enable CT 
> buffers before GuC is reset/suspended to make sure the shared memory 
> area is cleaned correctly (we currently avoid memsetting all of it on 
> reload since it is quite big). Also, communication with GuC is going to 
> increase going forward, so we'll need to make sure we accurately track 
> its state and do all the relevant cleanups.

Across suspend/resume, we issue a couple of resets and scrub/sanitize our
state tracking. By the time we load the fw again, both the fw and our
state should be starting from scratch.

That all seems unavoidable, so I am not understanding the essence of
your question.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 24/46] drm/i915: Do a synchronous switch-to-kernel-context on idling
  2019-02-21 21:42         ` Chris Wilson
@ 2019-02-21 22:53           ` Daniele Ceraolo Spurio
  2019-02-21 23:25             ` Chris Wilson
  0 siblings, 1 reply; 97+ messages in thread
From: Daniele Ceraolo Spurio @ 2019-02-21 22:53 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Sundaresan, Sujaritha



On 2/21/19 1:42 PM, Chris Wilson wrote:
> Quoting Daniele Ceraolo Spurio (2019-02-21 21:31:45)
>>
>>
>> On 2/21/19 1:17 PM, Chris Wilson wrote:
>>> Quoting Daniele Ceraolo Spurio (2019-02-21 19:48:01)
>>>>
>>>> <snip>
>>>>
>>>>> @@ -4481,19 +4471,7 @@ int i915_gem_suspend(struct drm_i915_private *i915)
>>>>>          * state. Fortunately, the kernel_context is disposable and we do
>>>>>          * not rely on its state.
>>>>>          */
>>>>> -     if (!i915_terminally_wedged(&i915->gpu_error)) {
>>>>> -             ret = i915_gem_switch_to_kernel_context(i915);
>>>>> -             if (ret)
>>>>> -                     goto err_unlock;
>>>>> -
>>>>> -             ret = i915_gem_wait_for_idle(i915,
>>>>> -                                          I915_WAIT_INTERRUPTIBLE |
>>>>> -                                          I915_WAIT_LOCKED |
>>>>> -                                          I915_WAIT_FOR_IDLE_BOOST,
>>>>> -                                          HZ / 5);
>>>>> -             if (ret == -EINTR)
>>>>> -                     goto err_unlock;
>>>>> -
>>>>> +     if (!switch_to_kernel_context_sync(i915)) {
>>>>>                 /* Forcibly cancel outstanding work and leave the gpu quiet. */
>>>>>                 i915_gem_set_wedged(i915);
>>>>>         }
>>>>
>>>> GuC-related question: what's your expectation here in regards to GuC
>>>> status? The current i915 flow expect either uc_reset_prepare() or
>>>> uc_suspend() to be called to clean up the guc status, but we're calling
>>>> neither of them here if the switch is successful. Do you expect the
>>>> resume code to always blank out the GuC status before a reload?
>>>
>>> (A few patches later on I propose that we always just do a reset+wedge
>>> on suspend in lieu of hangcheck.)
>>>
>>> On resume, we have to bring the HW up from scratch and do another reset
>>> in the process. Some platforms have been known to survive the trips to
>>> PCI_D3 (someone is lying!) and so we _have_ to do a reset to be sure we
>>> clear the HW state. I expect we would need to force a reset on resume
>>> even for the guc, to be sure we cover all cases such as kexec.
>>> -Chris
>>>
>> More than about the HW state, my question here was about the SW
>> tracking. At which point do we go and stop guc communication and mark
>> guc as not loaded/accessible? e.g. we need to disable and re-enable CT
>> buffers before GuC is reset/suspended to make sure the shared memory
>> area is cleaned correctly (we currently avoid memsetting all of it on
>> reload since it is quite big). Also, communication with GuC is going to
>> increase going forward, so we'll need to make sure we accurately track
>> its state and do all the relevant cleanups.
> 
> Across suspend/resume, we issue a couple of resets and scrub/sanitize our
> state tracking. By the time we load the fw again, both the fw and our
> state should be starting from scratch.
> 
> That all seems unavoidable, so I am not understanding the essence of
> your question.
> -Chris
> 

We're not doing the state scrubbing for guc in all paths at the moment. 
There is logic in gem_suspend_late(), but that doesn't seem to be called 
on all paths; e.g. it isn't when we run 
igt@gem_exec_suspend@basic-s4-devices and that's why Suja's patch moved 
the disabling of communication from uc_sanitize to uc_suspend. The guc 
resume code also doesn't currently clean everything as some of the 
structures (including stuff we allocate for guc usage) are carried over. 
We can either add something more in the cleanup path or go and rework 
the resume to blank everything (which would be time consuming since 
there are tens of MBs involved), but before putting down any code one way 
or another I wanted to understand what the expectation is.

Thanks,
Daniele


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 24/46] drm/i915: Do a synchronous switch-to-kernel-context on idling
  2019-02-21 22:53           ` Daniele Ceraolo Spurio
@ 2019-02-21 23:25             ` Chris Wilson
  2019-02-22  0:29               ` Daniele Ceraolo Spurio
  0 siblings, 1 reply; 97+ messages in thread
From: Chris Wilson @ 2019-02-21 23:25 UTC (permalink / raw)
  To: Daniele Ceraolo Spurio, intel-gfx; +Cc: Sundaresan, Sujaritha

Quoting Daniele Ceraolo Spurio (2019-02-21 22:53:41)
> 
> 
> On 2/21/19 1:42 PM, Chris Wilson wrote:
> > Quoting Daniele Ceraolo Spurio (2019-02-21 21:31:45)
> >>
> >>
> >> On 2/21/19 1:17 PM, Chris Wilson wrote:
> >>> Quoting Daniele Ceraolo Spurio (2019-02-21 19:48:01)
> >>>>
> >>>> <snip>
> >>>>
> >>>>> @@ -4481,19 +4471,7 @@ int i915_gem_suspend(struct drm_i915_private *i915)
> >>>>>          * state. Fortunately, the kernel_context is disposable and we do
> >>>>>          * not rely on its state.
> >>>>>          */
> >>>>> -     if (!i915_terminally_wedged(&i915->gpu_error)) {
> >>>>> -             ret = i915_gem_switch_to_kernel_context(i915);
> >>>>> -             if (ret)
> >>>>> -                     goto err_unlock;
> >>>>> -
> >>>>> -             ret = i915_gem_wait_for_idle(i915,
> >>>>> -                                          I915_WAIT_INTERRUPTIBLE |
> >>>>> -                                          I915_WAIT_LOCKED |
> >>>>> -                                          I915_WAIT_FOR_IDLE_BOOST,
> >>>>> -                                          HZ / 5);
> >>>>> -             if (ret == -EINTR)
> >>>>> -                     goto err_unlock;
> >>>>> -
> >>>>> +     if (!switch_to_kernel_context_sync(i915)) {
> >>>>>                 /* Forcibly cancel outstanding work and leave the gpu quiet. */
> >>>>>                 i915_gem_set_wedged(i915);
> >>>>>         }
> >>>>
> >>>> GuC-related question: what's your expectation here in regards to GuC
> >>>> status? The current i915 flow expect either uc_reset_prepare() or
> >>>> uc_suspend() to be called to clean up the guc status, but we're calling
> >>>> neither of them here if the switch is successful. Do you expect the
> >>>> resume code to always blank out the GuC status before a reload?
> >>>
> >>> (A few patches later on I propose that we always just do a reset+wedge
> >>> on suspend in lieu of hangcheck.)
> >>>
> >>> On resume, we have to bring the HW up from scratch and do another reset
> >>> in the process. Some platforms have been known to survive the trips to
> >>> PCI_D3 (someone is lying!) and so we _have_ to do a reset to be sure we
> >>> clear the HW state. I expect we would need to force a reset on resume
> >>> even for the guc, to be sure we cover all cases such as kexec.
> >>> -Chris
> >>>
> >> More than about the HW state, my question here was about the SW
> >> tracking. At which point do we go and stop guc communication and mark
> >> guc as not loaded/accessible? e.g. we need to disable and re-enable CT
> >> buffers before GuC is reset/suspended to make sure the shared memory
> >> area is cleaned correctly (we currently avoid memsetting all of it on
> >> reload since it is quite big). Also, communication with GuC is going to
> >> increase going forward, so we'll need to make sure we accurately track
> >> its state and do all the relevant cleanups.
> > 
> > Across suspend/resume, we issue a couple of resets and scrub/sanitize our
> > state tracking. By the time we load the fw again, both the fw and our
> > state should be starting from scratch.
> > 
> > That all seems unavoidable, so I am not understanding the essence of
> > your question.
> > -Chris
> > 
> 
> We're not doing the state scrubbing for guc in all paths at the moment. 
> There is logic in gem_suspend_late(), but that doesn't seem to be called 
> on all paths; e.g. it isn't when we run 
> igt@gem_exec_suspend@basic-s4-devices

Yup, the dummy hibernate code throws a few surprises, and why
i915_gem_sanitize is so fiddly to get right between that and
gem_eio/suspend.

> and that's why Suja's patch moved 
> the disabling of communication from uc_sanitize to uc_suspend.

That should also help as previously it tried to talk to the guc after we
reset it.

> The guc 
> resume code also doesn't currently clean everything as some of the 
> structures (including stuff we allocate for guc usage) are carried over. 
> We can either add something more in the cleanup path or go and rework 
> the resume to blank everything (which would be time consuming since 
> there is tens of MBs involved), but before putting down any code one way 
> or another I wanted to understand what the expectation is.

I may be naive, but my expectation is that we just have to reset the
comm ringbuffer pointers. We shouldn't need to hand the guc pristine
pages, it will zero on allocate when it needs to, surely? We do have to
rebuild the set of clients every time we load the guc, so that can't be
the issue (as that has to be done on resume, device reset etc today),
although that should only have to be the pinned clients?

We have to restart the comm channels on loading the guc, so what's
changing?

On suspend, hit the device reset & kill guc. On resume, load the guc fw,
restart comm. After fiddling about making sure we are in the right
callpaths, the intent is that resume just looks like a fresh module
load (so we only have to reason about init sequence [nearly] once).
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 24/46] drm/i915: Do a synchronous switch-to-kernel-context on idling
  2019-02-21 23:25             ` Chris Wilson
@ 2019-02-22  0:29               ` Daniele Ceraolo Spurio
  0 siblings, 0 replies; 97+ messages in thread
From: Daniele Ceraolo Spurio @ 2019-02-22  0:29 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx; +Cc: Sundaresan, Sujaritha



On 2/21/19 3:25 PM, Chris Wilson wrote:
> Quoting Daniele Ceraolo Spurio (2019-02-21 22:53:41)
>>
>>
>> On 2/21/19 1:42 PM, Chris Wilson wrote:
>>> Quoting Daniele Ceraolo Spurio (2019-02-21 21:31:45)
>>>>
>>>>
>>>> On 2/21/19 1:17 PM, Chris Wilson wrote:
>>>>> Quoting Daniele Ceraolo Spurio (2019-02-21 19:48:01)
>>>>>>
>>>>>> <snip>
>>>>>>
>>>>>>> @@ -4481,19 +4471,7 @@ int i915_gem_suspend(struct drm_i915_private *i915)
>>>>>>>           * state. Fortunately, the kernel_context is disposable and we do
>>>>>>>           * not rely on its state.
>>>>>>>           */
>>>>>>> -     if (!i915_terminally_wedged(&i915->gpu_error)) {
>>>>>>> -             ret = i915_gem_switch_to_kernel_context(i915);
>>>>>>> -             if (ret)
>>>>>>> -                     goto err_unlock;
>>>>>>> -
>>>>>>> -             ret = i915_gem_wait_for_idle(i915,
>>>>>>> -                                          I915_WAIT_INTERRUPTIBLE |
>>>>>>> -                                          I915_WAIT_LOCKED |
>>>>>>> -                                          I915_WAIT_FOR_IDLE_BOOST,
>>>>>>> -                                          HZ / 5);
>>>>>>> -             if (ret == -EINTR)
>>>>>>> -                     goto err_unlock;
>>>>>>> -
>>>>>>> +     if (!switch_to_kernel_context_sync(i915)) {
>>>>>>>                  /* Forcibly cancel outstanding work and leave the gpu quiet. */
>>>>>>>                  i915_gem_set_wedged(i915);
>>>>>>>          }
>>>>>>
>>>>>> GuC-related question: what's your expectation here in regards to GuC
>>>>>> status? The current i915 flow expect either uc_reset_prepare() or
>>>>>> uc_suspend() to be called to clean up the guc status, but we're calling
>>>>>> neither of them here if the switch is successful. Do you expect the
>>>>>> resume code to always blank out the GuC status before a reload?
>>>>>
>>>>> (A few patches later on I propose that we always just do a reset+wedge
>>>>> on suspend in lieu of hangcheck.)
>>>>>
>>>>> On resume, we have to bring the HW up from scratch and do another reset
>>>>> in the process. Some platforms have been known to survive the trips to
>>>>> PCI_D3 (someone is lying!) and so we _have_ to do a reset to be sure we
>>>>> clear the HW state. I expect we would need to force a reset on resume
>>>>> even for the guc, to be sure we cover all cases such as kexec.
>>>>> -Chris
>>>>>
>>>> More than about the HW state, my question here was about the SW
>>>> tracking. At which point do we go and stop guc communication and mark
>>>> guc as not loaded/accessible? e.g. we need to disable and re-enable CT
>>>> buffers before GuC is reset/suspended to make sure the shared memory
>>>> area is cleaned correctly (we currently avoid memsetting all of it on
>>>> reload since it is quite big). Also, communication with GuC is going to
>>>> increase going forward, so we'll need to make sure we accurately track
>>>> its state and do all the relevant cleanups.
>>>
>>> Across suspend/resume, we issue a couple of resets and scrub/sanitize our
>>> state tracking. By the time we load the fw again, both the fw and our
>>> state should be starting from scratch.
>>>
>>> That all seems unavoidable, so I am not understanding the essence of
>>> your question.
>>> -Chris
>>>
>>
>> We're not doing the state scrubbing for guc in all paths at the moment.
>> There is logic in gem_suspend_late(), but that doesn't seem to be called
>> on all paths; e.g. it isn't when we run
>> igt@gem_exec_suspend@basic-s4-devices
> 
> Yup, the dummy hibernate code throws a few surprises, and why
> i915_gem_sanitize is so fiddly to get right between that and
> gem_eio/suspend.
> 
>> and that's why Suja's patch moved
>> the disabling of communication from uc_sanitize to uc_suspend.
> 
> That should also help as previously it tried to talk to the guc after we
> reset it.

But only helps if we do call uc_suspend ;). I'm wondering if it ends up 
being better to call it from both places.

> 
>> The guc
>> resume code also doesn't currently clean everything as some of the
>> structures (including stuff we allocate for guc usage) are carried over.
>> We can either add something more in the cleanup path or go and rework
>> the resume to blank everything (which would be time consuming since
>> there is tens of MBs involved), but before putting down any code one way
>> or another I wanted to understand what the expectation is.
> 
> I may be naive, but my expectations is that we just have to reset the
> comm ringbuffer pointers. We shouldn't need to hand the guc pristine
> pages, it will zero on allocate when its needs to, surely? We do have to
> rebuild the set of clients everytime we load the guc, so that can't be
> the issue (as that has to be done on resume, device reset etc today),
> although that should only have to be the pinned clients?

GuC doesn't clean up some of the state stored in the memory we allocate 
for its use. In the specific example of the CT buffers, the registration 
is not automatically cleaned by GuC, it is only cleaned when the 
disable_communication H2G is issued or if we just memset the guc memory. 
This is to allow re-use of the same buffers across resets without having 
to issue an H2G to re-enable them. Similar approach is taken for other 
info (e.g. lrc info required gen11+), again to allow the host to 
seamlessly restart after a reset or suspend/resume. We always need to 
recreate the clients because the doorbells are a HW state and thus they 
can get reset with the guc; the firmware also saves db status in the 
WOPCM rather than in the shared memory for speed, so that does get 
cleaned on reload.

In the current gen11 guc code (which hopefully will hit the ML soon) we 
assumed that uc_suspend would be called on all suspend paths to make 
sure the state in the shared structures was clean, but if it isn't 
then we'll have to do some tweaks to cope. BTW, we need to add 
uc_reset_prepare() to __i915_gem_set_wedged as well.

Daniele

> 
> We have to restart the comm channels on loading the guc, so what's
> changing?
> 
> On suspend, hit the device reset & kill guc. On resume, load the guc fw,
> restart comm. After fiddling about making sure we are in the right
> callpaths, the intent is that resume just looks like a fresh module
> load (so we only have to reason about init sequence [nearly] once).
> -Chris
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 38/46] drm/i915: Allow a context to define its set of engines
  2019-02-06 13:03 ` [PATCH 38/46] drm/i915: Allow a context to define its set of engines Chris Wilson
@ 2019-02-25 10:41   ` Tvrtko Ursulin
  2019-02-25 10:47     ` Chris Wilson
  0 siblings, 1 reply; 97+ messages in thread
From: Tvrtko Ursulin @ 2019-02-25 10:41 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 06/02/2019 13:03, Chris Wilson wrote:
> Over the last few years, we have debated how to extend the user API to
> support an increase in the number of engines that may be sparse and
> even be heterogeneous within a class (not all video decoders created
> equal). We settled on using (class, instance) tuples to identify a
> specific engine, with an API for the user to construct a map of engines
> to capabilities. Into this picture, we then add a challenge of virtual
> engines; one user engine that maps behind the scenes to any number of
> physical engines. To keep it general, we want the user to have full
> control over that mapping. To that end, we allow the user to constrain a
> context to define the set of engines that it can access, order fully
> controlled by the user via (class, instance). With such precise control
> in context setup, we can continue to use the existing execbuf uABI of
> specifying a single index; only now it doesn't automagically map onto
> the engines, it uses the user defined engine map from the context.
> 
> The I915_EXEC_DEFAULT slot is left empty, and invalid for use by
> execbuf. Its use will be revealed in the next patch.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_gem_context.c    | 184 ++++++++++++++++++++-
>   drivers/gpu/drm/i915/i915_gem_context.h    |   4 +
>   drivers/gpu/drm/i915/i915_gem_execbuffer.c |  22 ++-
>   include/uapi/drm/i915_drm.h                |  32 ++++
>   4 files changed, 230 insertions(+), 12 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 2e2de0532c08..ad8052235f37 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -95,6 +95,21 @@
>   
>   #define ALL_L3_SLICES(dev) (1 << NUM_L3_SLICES(dev)) - 1
>   
> +static struct intel_engine_cs *
> +lookup_user_engine(struct i915_gem_context *ctx,
> +		   unsigned long flags, u16 class, u16 instance)
> +#define LOOKUP_USER_INDEX BIT(0)
> +{
> +	if (flags & LOOKUP_USER_INDEX) {
> +		if (instance >= ctx->nengine)
> +			return NULL;
> +
> +		return ctx->engines[instance];
> +	}
> +
> +	return intel_engine_lookup_user(ctx->i915, class, instance);
> +}
> +
>   static void lut_close(struct i915_gem_context *ctx)
>   {
>   	struct i915_lut_handle *lut, *ln;
> @@ -218,6 +233,8 @@ static void i915_gem_context_free(struct i915_gem_context *ctx)
>   	release_hw_id(ctx);
>   	i915_ppgtt_put(ctx->ppgtt);
>   
> +	kfree(ctx->engines);
> +
>   	for (n = 0; n < ARRAY_SIZE(ctx->__engine); n++) {
>   		struct intel_context *ce = &ctx->__engine[n];
>   
> @@ -1317,9 +1334,9 @@ static int set_sseu(struct i915_gem_context *ctx,
>   	if (user_sseu.flags || user_sseu.rsvd)
>   		return -EINVAL;
>   
> -	engine = intel_engine_lookup_user(i915,
> -					  user_sseu.engine_class,
> -					  user_sseu.engine_instance);
> +	engine = lookup_user_engine(ctx, 0,
> +				    user_sseu.engine_class,
> +				    user_sseu.engine_instance);
>   	if (!engine)
>   		return -EINVAL;
>   
> @@ -1337,9 +1354,156 @@ static int set_sseu(struct i915_gem_context *ctx,
>   
>   	args->size = sizeof(user_sseu);
>   
> +	return 0;
> +};
> +
> +struct set_engines {
> +	struct i915_gem_context *ctx;
> +	struct intel_engine_cs **engines;
> +	unsigned int nengine;
> +};
> +
> +static const i915_user_extension_fn set_engines__extensions[] = {
> +};
> +
> +static int
> +set_engines(struct i915_gem_context *ctx,
> +	    const struct drm_i915_gem_context_param *args)
> +{
> +	struct i915_context_param_engines __user *user;
> +	struct set_engines set = { .ctx = ctx };
> +	u64 size, extensions;
> +	unsigned int n;
> +	int err;
> +
> +	user = u64_to_user_ptr(args->value);
> +	size = args->size;
> +	if (!size)
> +		goto out;
> +
> +	BUILD_BUG_ON(!IS_ALIGNED(sizeof(*user), sizeof(*user->class_instance)));
> +	if (size < sizeof(*user) || size % sizeof(*user->class_instance))
> +		return -EINVAL;
> +
> +	set.nengine = (size - sizeof(*user)) / sizeof(*user->class_instance);
> +	if (set.nengine == 0 || set.nengine > I915_EXEC_RING_MASK)
> +		return -EINVAL;
> +
> +	set.engines = kmalloc_array(set.nengine,
> +				    sizeof(*set.engines),
> +				    GFP_KERNEL);
> +	if (!set.engines)
> +		return -ENOMEM;
> +
> +	for (n = 0; n < set.nengine; n++) {
> +		u16 class, inst;
> +
> +		if (get_user(class, &user->class_instance[n].engine_class) ||
> +		    get_user(inst, &user->class_instance[n].engine_instance)) {
> +			kfree(set.engines);
> +			return -EFAULT;
> +		}
> +
> +		if (class == (u16)I915_ENGINE_CLASS_INVALID &&
> +		    inst == (u16)I915_ENGINE_CLASS_INVALID_NONE) {
> +			set.engines[n] = NULL;
> +			continue;
> +		}
> +
> +		set.engines[n] = lookup_user_engine(ctx, 0, class, inst);
> +		if (!set.engines[n]) {
> +			kfree(set.engines);
> +			return -ENOENT;
> +		}
> +	}
> +
> +	err = -EFAULT;
> +	if (!get_user(extensions, &user->extensions))
> +		err = i915_user_extensions(u64_to_user_ptr(extensions),
> +					   set_engines__extensions,
> +					   ARRAY_SIZE(set_engines__extensions),
> +					   &set);
> +	if (err) {
> +		kfree(set.engines);
> +		return err;
> +	}
> +
> +out:
> +	mutex_lock(&ctx->i915->drm.struct_mutex);
> +	kfree(ctx->engines);
> +	ctx->engines = set.engines;
> +	ctx->nengine = set.nengine;
> +	mutex_unlock(&ctx->i915->drm.struct_mutex);
> +
>   	return 0;
>   }
>   
> +static int
> +get_engines(struct i915_gem_context *ctx,
> +	    struct drm_i915_gem_context_param *args)
> +{
> +	struct i915_context_param_engines *local;
> +	unsigned int n, count, size;
> +	int err;
> +
> +restart:
> +	count = READ_ONCE(ctx->nengine);
> +	if (count > (INT_MAX - sizeof(*local)) / sizeof(*local->class_instance))
> +		return -ENOMEM; /* unrepresentable! */
> +
> +	size = sizeof(*local) + count * sizeof(*local->class_instance);
> +	if (!args->size) {
> +		args->size = size;
> +		return 0;
> +	}
> +	if (args->size < size)
> +		return -EINVAL;
> +
> +	local = kmalloc(size, GFP_KERNEL);
> +	if (!local)
> +		return -ENOMEM;
> +
> +	if (mutex_lock_interruptible(&ctx->i915->drm.struct_mutex)) {
> +		err = -EINTR;
> +		goto err;
> +	}
> +
> +	if (READ_ONCE(ctx->nengine) != count) {
> +		mutex_unlock(&ctx->i915->drm.struct_mutex);
> +		kfree(local);
> +		goto restart;
> +	}
> +
> +	local->extensions = 0;
> +	for (n = 0; n < count; n++) {
> +		if (ctx->engines[n]) {
> +			local->class_instance[n].engine_class =
> +				ctx->engines[n]->uabi_class;
> +			local->class_instance[n].engine_instance =
> +				ctx->engines[n]->instance;
> +		} else {
> +			local->class_instance[n].engine_class =
> +				I915_ENGINE_CLASS_INVALID;
> +			local->class_instance[n].engine_instance =
> +				I915_ENGINE_CLASS_INVALID_NONE;
> +		}
> +	}
> +
> +	mutex_unlock(&ctx->i915->drm.struct_mutex);
> +
> +	if (copy_to_user(u64_to_user_ptr(args->value), local, size)) {
> +		err = -EFAULT;
> +		goto err;
> +	}
> +

Sripada, Radhakrishna <radhakrishna.sripada@intel.com> reports leakage 
of local on the success path here.

Alternatively, perhaps you could simplify by using stack space, with 
some reasonable max size for future proofing, and a GEM_WARN_ON error 
return or something.

Regards,

Tvrtko

> +	args->size = size;
> +	return 0;
> +
> +err:
> +	kfree(local);
> +	return err;
> +}
> +
>   static int ctx_setparam(struct i915_gem_context *ctx,
>   			struct drm_i915_gem_context_param *args)
>   {
> @@ -1403,6 +1567,10 @@ static int ctx_setparam(struct i915_gem_context *ctx,
>   		ret = set_ppgtt(ctx, args);
>   		break;
>   
> +	case I915_CONTEXT_PARAM_ENGINES:
> +		ret = set_engines(ctx, args);
> +		break;
> +
>   	case I915_CONTEXT_PARAM_BAN_PERIOD:
>   	default:
>   		ret = -EINVAL;
> @@ -1535,9 +1703,9 @@ static int get_sseu(struct i915_gem_context *ctx,
>   	if (user_sseu.flags || user_sseu.rsvd)
>   		return -EINVAL;
>   
> -	engine = intel_engine_lookup_user(ctx->i915,
> -					  user_sseu.engine_class,
> -					  user_sseu.engine_instance);
> +	engine = lookup_user_engine(ctx, 0,
> +				    user_sseu.engine_class,
> +				    user_sseu.engine_instance);
>   	if (!engine)
>   		return -EINVAL;
>   
> @@ -1616,6 +1784,10 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
>   		ret = get_ppgtt(ctx, args);
>   		break;
>   
> +	case I915_CONTEXT_PARAM_ENGINES:
> +		ret = get_engines(ctx, args);
> +		break;
> +
>   	case I915_CONTEXT_PARAM_BAN_PERIOD:
>   	default:
>   		ret = -EINVAL;
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
> index 3bd1faabbc3f..775de1af1b10 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.h
> +++ b/drivers/gpu/drm/i915/i915_gem_context.h
> @@ -78,6 +78,8 @@ struct i915_gem_context {
>   	/** file_priv: owning file descriptor */
>   	struct drm_i915_file_private *file_priv;
>   
> +	struct intel_engine_cs **engines;
> +
>   	struct i915_timeline *timeline;
>   
>   	/**
> @@ -146,6 +148,8 @@ struct i915_gem_context {
>   #define CONTEXT_CLOSED			1
>   #define CONTEXT_FORCE_SINGLE_SUBMISSION	2
>   
> +	unsigned int nengine;
> +
>   	/**
>   	 * @hw_id: - unique identifier for the context
>   	 *
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 859625474f58..5052b49f8dcd 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -2086,13 +2086,23 @@ static const enum intel_engine_id user_ring_map[I915_USER_RINGS + 1] = {
>   };
>   
>   static struct intel_engine_cs *
> -eb_select_engine(struct drm_i915_private *dev_priv,
> +eb_select_engine(struct i915_execbuffer *eb,
>   		 struct drm_file *file,
>   		 struct drm_i915_gem_execbuffer2 *args)
>   {
>   	unsigned int user_ring_id = args->flags & I915_EXEC_RING_MASK;
>   	struct intel_engine_cs *engine;
>   
> +	if (eb->ctx->engines) {
> +		if (user_ring_id >= eb->ctx->nengine) {
> +			DRM_DEBUG("execbuf with unknown ring: %u\n",
> +				  user_ring_id);
> +			return NULL;
> +		}
> +
> +		return eb->ctx->engines[user_ring_id];
> +	}
> +
>   	if (user_ring_id > I915_USER_RINGS) {
>   		DRM_DEBUG("execbuf with unknown ring: %u\n", user_ring_id);
>   		return NULL;
> @@ -2105,11 +2115,11 @@ eb_select_engine(struct drm_i915_private *dev_priv,
>   		return NULL;
>   	}
>   
> -	if (user_ring_id == I915_EXEC_BSD && HAS_BSD2(dev_priv)) {
> +	if (user_ring_id == I915_EXEC_BSD && HAS_BSD2(eb->i915)) {
>   		unsigned int bsd_idx = args->flags & I915_EXEC_BSD_MASK;
>   
>   		if (bsd_idx == I915_EXEC_BSD_DEFAULT) {
> -			bsd_idx = gen8_dispatch_bsd_engine(dev_priv, file);
> +			bsd_idx = gen8_dispatch_bsd_engine(eb->i915, file);
>   		} else if (bsd_idx >= I915_EXEC_BSD_RING1 &&
>   			   bsd_idx <= I915_EXEC_BSD_RING2) {
>   			bsd_idx >>= I915_EXEC_BSD_SHIFT;
> @@ -2120,9 +2130,9 @@ eb_select_engine(struct drm_i915_private *dev_priv,
>   			return NULL;
>   		}
>   
> -		engine = dev_priv->engine[_VCS(bsd_idx)];
> +		engine = eb->i915->engine[_VCS(bsd_idx)];
>   	} else {
> -		engine = dev_priv->engine[user_ring_map[user_ring_id]];
> +		engine = eb->i915->engine[user_ring_map[user_ring_id]];
>   	}
>   
>   	if (!engine) {
> @@ -2332,7 +2342,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
>   	if (unlikely(err))
>   		goto err_destroy;
>   
> -	eb.engine = eb_select_engine(eb.i915, file, args);
> +	eb.engine = eb_select_engine(&eb, file, args);
>   	if (!eb.engine) {
>   		err = -EINVAL;
>   		goto err_engine;
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 0c5566b2d244..eb5799fe3868 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -122,6 +122,8 @@ enum drm_i915_gem_engine_class {
>   	I915_ENGINE_CLASS_INVALID	= -1
>   };
>   
> +#define I915_ENGINE_CLASS_INVALID_NONE -1
> +
>   /**
>    * DOC: perf_events exposed by i915 through /sys/bus/event_sources/drivers/i915
>    *
> @@ -1481,6 +1483,27 @@ struct drm_i915_gem_context_param {
>   	 * See DRM_I915_GEM_VM_CREATE and DRM_I915_GEM_VM_DESTROY.
>   	 */
>   #define I915_CONTEXT_PARAM_VM		0x8
> +
> +/*
> + * I915_CONTEXT_PARAM_ENGINES:
> + *
> + * Bind this context to operate on this subset of available engines. Henceforth,
> + * the I915_EXEC_RING selector for DRM_IOCTL_I915_GEM_EXECBUFFER2 operates as
> + * an index into this array of engines; I915_EXEC_DEFAULT selecting engine[0]
> + * and upwards. Slots 0...N are filled in using the specified (class, instance).
> + * Use
> + *	engine_class: I915_ENGINE_CLASS_INVALID,
> + *	engine_instance: I915_ENGINE_CLASS_INVALID_NONE
> + * to specify a gap in the array that can be filled in later, e.g. by a
> + * virtual engine used for load balancing.
> + *
> + * Setting the number of engines bound to the context to 0, by passing a zero
> + * sized argument, will revert back to default settings.
> + *
> + * See struct i915_context_param_engines.
> + */
> +#define I915_CONTEXT_PARAM_ENGINES	0x9
> +
>   	__u64 value;
>   };
>   
> @@ -1543,6 +1566,15 @@ struct drm_i915_gem_context_param_sseu {
>   	__u32 rsvd;
>   };
>   
> +struct i915_context_param_engines {
> +	__u64 extensions; /* linked chain of extension blocks, 0 terminates */
> +
> +	struct {
> +		__u16 engine_class; /* see enum drm_i915_gem_engine_class */
> +		__u16 engine_instance;
> +	} class_instance[0];
> +};
> +
>   struct drm_i915_gem_context_create_ext_setparam {
>   #define I915_CONTEXT_CREATE_EXT_SETPARAM 0
>   	struct i915_user_extension base;
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread
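
For readers new to the proposed uAPI, here is a minimal userspace sketch of
binding an engine map to a context and then selecting an engine by index at
execbuf time. The names follow the uapi additions quoted above; the file
descriptor, context id and execbuffer object setup are assumed to exist
elsewhere and error handling is omitted, so treat this as an illustration of
the ioctl plumbing rather than a complete or authoritative program.

    /* Needs i915_drm.h with this series applied, plus libdrm's xf86drm.h. */
    struct {
            __u64 extensions;               /* mirrors i915_context_param_engines */
            struct {
                    __u16 engine_class;
                    __u16 engine_instance;
            } class_instance[3];
    } map = {
            .extensions = 0,
            .class_instance = {
                    { I915_ENGINE_CLASS_RENDER, 0 },    /* slot 0: rcs0 */
                    { I915_ENGINE_CLASS_VIDEO,  0 },    /* slot 1: vcs0 */
                    { I915_ENGINE_CLASS_VIDEO,  1 },    /* slot 2: vcs1 */
            },
    };
    struct drm_i915_gem_context_param arg = {
            .ctx_id = ctx_id,               /* assumed: context created earlier */
            .param  = I915_CONTEXT_PARAM_ENGINES,
            /* 8 + 3 * 4 bytes, the layout set_engines() parses above */
            .size   = sizeof(map.extensions) + sizeof(map.class_instance),
            .value  = (uintptr_t)&map,
    };

    drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);

    /* The execbuf ring selector is now an index into the map: 2 is vcs1. */
    execbuf.rsvd1 = ctx_id;
    execbuf.flags = (execbuf.flags & ~I915_EXEC_RING_MASK) | 2;
    drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &execbuf);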

* Re: [PATCH 38/46] drm/i915: Allow a context to define its set of engines
  2019-02-25 10:41   ` Tvrtko Ursulin
@ 2019-02-25 10:47     ` Chris Wilson
  0 siblings, 0 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-25 10:47 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-02-25 10:41:28)
> 
> On 06/02/2019 13:03, Chris Wilson wrote:
> > +static int
> > +get_engines(struct i915_gem_context *ctx,
> > +         struct drm_i915_gem_context_param *args)
> > +{
> > +     struct i915_context_param_engines *local;
> > +     unsigned int n, count, size;
> > +     int err;
> > +
> > +restart:
> > +     count = READ_ONCE(ctx->nengine);
> > +     if (count > (INT_MAX - sizeof(*local)) / sizeof(*local->class_instance))
> > +             return -ENOMEM; /* unrepresentable! */
> > +
> > +     size = sizeof(*local) + count * sizeof(*local->class_instance);
> > +     if (!args->size) {
> > +             args->size = size;
> > +             return 0;
> > +     }
> > +     if (args->size < size)
> > +             return -EINVAL;
> > +
> > +     local = kmalloc(size, GFP_KERNEL);
> > +     if (!local)
> > +             return -ENOMEM;
> > +
> > +     if (mutex_lock_interruptible(&ctx->i915->drm.struct_mutex)) {
> > +             err = -EINTR;
> > +             goto err;
> > +     }
> > +
> > +     if (READ_ONCE(ctx->nengine) != count) {
> > +             mutex_unlock(&ctx->i915->drm.struct_mutex);
> > +             kfree(local);
> > +             goto restart;
> > +     }
> > +
> > +     local->extensions = 0;
> > +     for (n = 0; n < count; n++) {
> > +             if (ctx->engines[n]) {
> > +                     local->class_instance[n].engine_class =
> > +                             ctx->engines[n]->uabi_class;
> > +                     local->class_instance[n].engine_instance =
> > +                             ctx->engines[n]->instance;
> > +             } else {
> > +                     local->class_instance[n].engine_class =
> > +                             I915_ENGINE_CLASS_INVALID;
> > +                     local->class_instance[n].engine_instance =
> > +                             I915_ENGINE_CLASS_INVALID_NONE;
> > +             }
> > +     }
> > +
> > +     mutex_unlock(&ctx->i915->drm.struct_mutex);
> > +
> > +     if (copy_to_user(u64_to_user_ptr(args->value), local, size)) {
> > +             err = -EFAULT;
> > +             goto err;
> > +     }
> > +
> 
> Sripada, Radhakrishna <radhakrishna.sripada@intel.com> reports leakage 
> of local on the success path here.
> 
> Alternatively, perhaps you could simplify by using stack space, with 
> some reasonable max size for future proofing, and a GEM_WARN_ON error 
> return or something.

I don't expect this getter to be called frequently, so I'm resisting the
urge to optimise. And similarly I'd rather not impose restrictions we may
need to lift later on. For once, KISS.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread
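
For reference, a minimal sketch of the fix the review above asks for while
keeping the kmalloc-based approach: funnel the success exit through the same
cleanup as the error paths so that 'local' is always released. This is only
the obvious local fix implied by the report, not necessarily the revision
that eventually landed. (Tvrtko's on-stack alternative would avoid the
allocation entirely, at the cost of a hard cap on the number of engines.)

    /* tail of get_engines(), reworked so that 'local' is never leaked */
    if (copy_to_user(u64_to_user_ptr(args->value), local, size)) {
            err = -EFAULT;
            goto err;
    }

    args->size = size;
    err = 0;
err:
    kfree(local);       /* freed on both the success and the error paths */
    return err;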

* Re: [PATCH 02/46] drm/i915: Revoke mmaps and prevent access to fence registers across reset
  2019-02-06 13:03 ` [PATCH 02/46] drm/i915: Revoke mmaps and prevent access to fence registers across reset Chris Wilson
  2019-02-06 15:56   ` Mika Kuoppala
@ 2019-02-26 19:53   ` Rodrigo Vivi
  2019-02-26 20:27     ` Chris Wilson
  1 sibling, 1 reply; 97+ messages in thread
From: Rodrigo Vivi @ 2019-02-26 19:53 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx, Mika Kuoppala

On Wed, Feb 06, 2019 at 01:03:12PM +0000, Chris Wilson wrote:
> Previously, we were able to rely on the recursive properties of
> struct_mutex to allow us to serialise revoking mmaps and reacquiring the
> FENCE registers with them being clobbered over a global device reset.
> I then proceeded to throw out the baby with the bath water in order to
> pursue a struct_mutex-less reset.
> 
> Perusing LWN for alternative strategies, the dilemma on how to serialise
> access to a global resource on one side was answered by
> https://lwn.net/Articles/202847/ -- Sleepable RCU:
> 
>     1  int readside(void) {
>     2      int idx;
>     3      rcu_read_lock();
>     4	   if (nomoresrcu) {
>     5          rcu_read_unlock();
>     6	       return -EINVAL;
>     7      }
>     8	   idx = srcu_read_lock(&ss);
>     9	   rcu_read_unlock();
>     10	   /* SRCU read-side critical section. */
>     11	   srcu_read_unlock(&ss, idx);
>     12	   return 0;
>     13 }
>     14
>     15 void cleanup(void)
>     16 {
>     17     nomoresrcu = 1;
>     18     synchronize_rcu();
>     19     synchronize_srcu(&ss);
>     20     cleanup_srcu_struct(&ss);
>     21 }
> 
> No more worrying about stop_machine, just an uber-complex mutex,
> optimised for reads, with the overhead pushed to the rare reset path.
> 
> However, we do run the risk of a deadlock as we allocate underneath the
> SRCU read lock, and the allocation may require a GPU reset, causing a
> dependency cycle via the in-flight requests. We resolve that by declaring
> the driver wedged and cancelling all in-flight rendering.
> 
> v2: Use expedited rcu barriers to match our earlier timing
> characteristics.
> v3: Try to annotate locking contexts for sparse
> v4: Reduce selftest lock duration to avoid a reset deadlock with fences
> v5: s/srcu/reset_backoff_srcu/
> 
> Testcase: igt/gem_mmap_gtt/hang
> Fixes: eb8d0f5af4ec ("drm/i915: Remove GPU reset dependence on struct_mutex")

Hi Chris,

this patch didn't apply cleanly on dinf
so I noticed that I could also get

115ff80a97cf ("drm/i915: Defer removing fence register tracking to rpm wakeup")

so that applies, but then I noticed that there's a commit carrying a
Fixes: tag for this patch here:

de3a87cf6352 ("drm/i915: Recursive i915_reset_trylock() verboten")

It seems sane to pick all 3 of them, but I'd like to confirm with you first.

Thoughts?

Thanks in advance,
Rodrigo.

> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c           |  12 +-
>  drivers/gpu/drm/i915/i915_drv.h               |  18 +--
>  drivers/gpu/drm/i915/i915_gem.c               |  56 +++------
>  drivers/gpu/drm/i915/i915_gem_fence_reg.c     |  31 +----
>  drivers/gpu/drm/i915/i915_gpu_error.h         |  12 +-
>  drivers/gpu/drm/i915/i915_reset.c             | 109 +++++++++++-------
>  drivers/gpu/drm/i915/i915_reset.h             |   4 +
>  .../gpu/drm/i915/selftests/intel_hangcheck.c  |   5 +-
>  .../gpu/drm/i915/selftests/mock_gem_device.c  |   1 +
>  9 files changed, 109 insertions(+), 139 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 0bd890c04fe4..a6fd157b1637 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -1281,14 +1281,11 @@ static int i915_hangcheck_info(struct seq_file *m, void *unused)
>  	intel_wakeref_t wakeref;
>  	enum intel_engine_id id;
>  
> +	seq_printf(m, "Reset flags: %lx\n", dev_priv->gpu_error.flags);
>  	if (test_bit(I915_WEDGED, &dev_priv->gpu_error.flags))
> -		seq_puts(m, "Wedged\n");
> +		seq_puts(m, "\tWedged\n");
>  	if (test_bit(I915_RESET_BACKOFF, &dev_priv->gpu_error.flags))
> -		seq_puts(m, "Reset in progress: struct_mutex backoff\n");
> -	if (waitqueue_active(&dev_priv->gpu_error.wait_queue))
> -		seq_puts(m, "Waiter holding struct mutex\n");
> -	if (waitqueue_active(&dev_priv->gpu_error.reset_queue))
> -		seq_puts(m, "struct_mutex blocked for reset\n");
> +		seq_puts(m, "\tDevice (global) reset in progress\n");
>  
>  	if (!i915_modparams.enable_hangcheck) {
>  		seq_puts(m, "Hangcheck disabled\n");
> @@ -3885,9 +3882,6 @@ i915_wedged_set(void *data, u64 val)
>  	 * while it is writing to 'i915_wedged'
>  	 */
>  
> -	if (i915_reset_backoff(&i915->gpu_error))
> -		return -EAGAIN;
> -
>  	i915_handle_error(i915, val, I915_ERROR_CAPTURE,
>  			  "Manually set wedged engine mask = %llx", val);
>  	return 0;
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index a2293152cb6a..37230ae7fbe6 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2989,7 +2989,12 @@ i915_gem_obj_finish_shmem_access(struct drm_i915_gem_object *obj)
>  	i915_gem_object_unpin_pages(obj);
>  }
>  
> -int __must_check i915_mutex_lock_interruptible(struct drm_device *dev);
> +static inline int __must_check
> +i915_mutex_lock_interruptible(struct drm_device *dev)
> +{
> +	return mutex_lock_interruptible(&dev->struct_mutex);
> +}
> +
>  int i915_gem_dumb_create(struct drm_file *file_priv,
>  			 struct drm_device *dev,
>  			 struct drm_mode_create_dumb *args);
> @@ -3006,21 +3011,11 @@ int __must_check i915_gem_set_global_seqno(struct drm_device *dev, u32 seqno);
>  struct i915_request *
>  i915_gem_find_active_request(struct intel_engine_cs *engine);
>  
> -static inline bool i915_reset_backoff(struct i915_gpu_error *error)
> -{
> -	return unlikely(test_bit(I915_RESET_BACKOFF, &error->flags));
> -}
> -
>  static inline bool i915_terminally_wedged(struct i915_gpu_error *error)
>  {
>  	return unlikely(test_bit(I915_WEDGED, &error->flags));
>  }
>  
> -static inline bool i915_reset_backoff_or_wedged(struct i915_gpu_error *error)
> -{
> -	return i915_reset_backoff(error) | i915_terminally_wedged(error);
> -}
> -
>  static inline u32 i915_reset_count(struct i915_gpu_error *error)
>  {
>  	return READ_ONCE(error->reset_count);
> @@ -3093,7 +3088,6 @@ struct drm_i915_fence_reg *
>  i915_reserve_fence(struct drm_i915_private *dev_priv);
>  void i915_unreserve_fence(struct drm_i915_fence_reg *fence);
>  
> -void i915_gem_revoke_fences(struct drm_i915_private *dev_priv);
>  void i915_gem_restore_fences(struct drm_i915_private *dev_priv);
>  
>  void i915_gem_detect_bit_6_swizzle(struct drm_i915_private *dev_priv);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 05ce9176ac4e..1eb3a5f8654c 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -100,47 +100,6 @@ static void i915_gem_info_remove_obj(struct drm_i915_private *dev_priv,
>  	spin_unlock(&dev_priv->mm.object_stat_lock);
>  }
>  
> -static int
> -i915_gem_wait_for_error(struct i915_gpu_error *error)
> -{
> -	int ret;
> -
> -	might_sleep();
> -
> -	/*
> -	 * Only wait 10 seconds for the gpu reset to complete to avoid hanging
> -	 * userspace. If it takes that long something really bad is going on and
> -	 * we should simply try to bail out and fail as gracefully as possible.
> -	 */
> -	ret = wait_event_interruptible_timeout(error->reset_queue,
> -					       !i915_reset_backoff(error),
> -					       I915_RESET_TIMEOUT);
> -	if (ret == 0) {
> -		DRM_ERROR("Timed out waiting for the gpu reset to complete\n");
> -		return -EIO;
> -	} else if (ret < 0) {
> -		return ret;
> -	} else {
> -		return 0;
> -	}
> -}
> -
> -int i915_mutex_lock_interruptible(struct drm_device *dev)
> -{
> -	struct drm_i915_private *dev_priv = to_i915(dev);
> -	int ret;
> -
> -	ret = i915_gem_wait_for_error(&dev_priv->gpu_error);
> -	if (ret)
> -		return ret;
> -
> -	ret = mutex_lock_interruptible(&dev->struct_mutex);
> -	if (ret)
> -		return ret;
> -
> -	return 0;
> -}
> -
>  static u32 __i915_gem_park(struct drm_i915_private *i915)
>  {
>  	intel_wakeref_t wakeref;
> @@ -1869,6 +1828,7 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
>  	intel_wakeref_t wakeref;
>  	struct i915_vma *vma;
>  	pgoff_t page_offset;
> +	int srcu;
>  	int ret;
>  
>  	/* Sanity check that we allow writing into this object */
> @@ -1908,7 +1868,6 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
>  		goto err_unlock;
>  	}
>  
> -
>  	/* Now pin it into the GTT as needed */
>  	vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
>  				       PIN_MAPPABLE |
> @@ -1946,9 +1905,15 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
>  	if (ret)
>  		goto err_unpin;
>  
> +	srcu = i915_reset_trylock(dev_priv);
> +	if (srcu < 0) {
> +		ret = srcu;
> +		goto err_unpin;
> +	}
> +
>  	ret = i915_vma_pin_fence(vma);
>  	if (ret)
> -		goto err_unpin;
> +		goto err_reset;
>  
>  	/* Finally, remap it using the new GTT offset */
>  	ret = remap_io_mapping(area,
> @@ -1969,6 +1934,8 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
>  
>  err_fence:
>  	i915_vma_unpin_fence(vma);
> +err_reset:
> +	i915_reset_unlock(dev_priv, srcu);
>  err_unpin:
>  	__i915_vma_unpin(vma);
>  err_unlock:
> @@ -5326,6 +5293,7 @@ int i915_gem_init_early(struct drm_i915_private *dev_priv)
>  	init_waitqueue_head(&dev_priv->gpu_error.wait_queue);
>  	init_waitqueue_head(&dev_priv->gpu_error.reset_queue);
>  	mutex_init(&dev_priv->gpu_error.wedge_mutex);
> +	init_srcu_struct(&dev_priv->gpu_error.reset_backoff_srcu);
>  
>  	atomic_set(&dev_priv->mm.bsd_engine_dispatch_index, 0);
>  
> @@ -5358,6 +5326,8 @@ void i915_gem_cleanup_early(struct drm_i915_private *dev_priv)
>  	GEM_BUG_ON(atomic_read(&dev_priv->mm.free_count));
>  	WARN_ON(dev_priv->mm.object_count);
>  
> +	cleanup_srcu_struct(&dev_priv->gpu_error.reset_backoff_srcu);
> +
>  	kmem_cache_destroy(dev_priv->priorities);
>  	kmem_cache_destroy(dev_priv->dependencies);
>  	kmem_cache_destroy(dev_priv->requests);
> diff --git a/drivers/gpu/drm/i915/i915_gem_fence_reg.c b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
> index e037e94792f3..36d548fa3aa2 100644
> --- a/drivers/gpu/drm/i915/i915_gem_fence_reg.c
> +++ b/drivers/gpu/drm/i915/i915_gem_fence_reg.c
> @@ -240,6 +240,10 @@ static int fence_update(struct drm_i915_fence_reg *fence,
>  		i915_vma_flush_writes(old);
>  	}
>  
> +	ret = i915_reset_trylock(fence->i915);
> +	if (ret < 0)
> +		return ret;
> +
>  	if (fence->vma && fence->vma != vma) {
>  		/* Ensure that all userspace CPU access is completed before
>  		 * stealing the fence.
> @@ -272,6 +276,7 @@ static int fence_update(struct drm_i915_fence_reg *fence,
>  		list_move_tail(&fence->link, &fence->i915->mm.fence_list);
>  	}
>  
> +	i915_reset_unlock(fence->i915, ret);
>  	return 0;
>  }
>  
> @@ -435,32 +440,6 @@ void i915_unreserve_fence(struct drm_i915_fence_reg *fence)
>  	list_add(&fence->link, &fence->i915->mm.fence_list);
>  }
>  
> -/**
> - * i915_gem_revoke_fences - revoke fence state
> - * @dev_priv: i915 device private
> - *
> - * Removes all GTT mmappings via the fence registers. This forces any user
> - * of the fence to reacquire that fence before continuing with their access.
> - * One use is during GPU reset where the fence register is lost and we need to
> - * revoke concurrent userspace access via GTT mmaps until the hardware has been
> - * reset and the fence registers have been restored.
> - */
> -void i915_gem_revoke_fences(struct drm_i915_private *dev_priv)
> -{
> -	int i;
> -
> -	lockdep_assert_held(&dev_priv->drm.struct_mutex);
> -
> -	for (i = 0; i < dev_priv->num_fence_regs; i++) {
> -		struct drm_i915_fence_reg *fence = &dev_priv->fence_regs[i];
> -
> -		GEM_BUG_ON(fence->vma && fence->vma->fence != fence);
> -
> -		if (fence->vma)
> -			i915_vma_revoke_mmap(fence->vma);
> -	}
> -}
> -
>  /**
>   * i915_gem_restore_fences - restore fence state
>   * @dev_priv: i915 device private
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
> index 53b1f22dd365..d5c58e82508b 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.h
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.h
> @@ -231,12 +231,10 @@ struct i915_gpu_error {
>  	/**
>  	 * flags: Control various stages of the GPU reset
>  	 *
> -	 * #I915_RESET_BACKOFF - When we start a reset, we want to stop any
> -	 * other users acquiring the struct_mutex. To do this we set the
> -	 * #I915_RESET_BACKOFF bit in the error flags when we detect a reset
> -	 * and then check for that bit before acquiring the struct_mutex (in
> -	 * i915_mutex_lock_interruptible()?). I915_RESET_BACKOFF serves a
> -	 * secondary role in preventing two concurrent global reset attempts.
> +	 * #I915_RESET_BACKOFF - When we start a global reset, we need to
> +	 * serialise with any other users attempting to do the same, and
> +	 * any global resources that may be clobber by the reset (such as
> +	 * FENCE registers).
>  	 *
>  	 * #I915_RESET_ENGINE[num_engines] - Since the driver doesn't need to
>  	 * acquire the struct_mutex to reset an engine, we need an explicit
> @@ -272,6 +270,8 @@ struct i915_gpu_error {
>  	 */
>  	wait_queue_head_t reset_queue;
>  
> +	struct srcu_struct reset_backoff_srcu;
> +
>  	struct i915_gpu_restart *restart;
>  };
>  
> diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
> index 0e0ddf2e6815..272d00d4b8a3 100644
> --- a/drivers/gpu/drm/i915/i915_reset.c
> +++ b/drivers/gpu/drm/i915/i915_reset.c
> @@ -639,6 +639,31 @@ static void reset_prepare_engine(struct intel_engine_cs *engine)
>  	engine->reset.prepare(engine);
>  }
>  
> +static void revoke_mmaps(struct drm_i915_private *i915)
> +{
> +	int i;
> +
> +	for (i = 0; i < i915->num_fence_regs; i++) {
> +		struct i915_vma *vma = i915->fence_regs[i].vma;
> +		struct drm_vma_offset_node *node;
> +		u64 vma_offset;
> +
> +		if (!vma)
> +			continue;
> +
> +		GEM_BUG_ON(vma->fence != &i915->fence_regs[i]);
> +		if (!i915_vma_has_userfault(vma))
> +			continue;
> +
> +		node = &vma->obj->base.vma_node;
> +		vma_offset = vma->ggtt_view.partial.offset << PAGE_SHIFT;
> +		unmap_mapping_range(i915->drm.anon_inode->i_mapping,
> +				    drm_vma_node_offset_addr(node) + vma_offset,
> +				    vma->size,
> +				    1);
> +	}
> +}
> +
>  static void reset_prepare(struct drm_i915_private *i915)
>  {
>  	struct intel_engine_cs *engine;
> @@ -648,6 +673,7 @@ static void reset_prepare(struct drm_i915_private *i915)
>  		reset_prepare_engine(engine);
>  
>  	intel_uc_sanitize(i915);
> +	revoke_mmaps(i915);
>  }
>  
>  static int gt_reset(struct drm_i915_private *i915, unsigned int stalled_mask)
> @@ -911,50 +937,22 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
>  	return ret;
>  }
>  
> -struct __i915_reset {
> -	struct drm_i915_private *i915;
> -	unsigned int stalled_mask;
> -};
> -
> -static int __i915_reset__BKL(void *data)
> -{
> -	struct __i915_reset *arg = data;
> -	int err;
> -
> -	err = intel_gpu_reset(arg->i915, ALL_ENGINES);
> -	if (err)
> -		return err;
> -
> -	return gt_reset(arg->i915, arg->stalled_mask);
> -}
> -
> -#if RESET_UNDER_STOP_MACHINE
> -/*
> - * XXX An alternative to using stop_machine would be to park only the
> - * processes that have a GGTT mmap. By remote parking the threads (SIGSTOP)
> - * we should be able to prevent their memmory accesses via the lost fence
> - * registers over the course of the reset without the potential recursive
> - * of mutexes between the pagefault handler and reset.
> - *
> - * See igt/gem_mmap_gtt/hang
> - */
> -#define __do_reset(fn, arg) stop_machine(fn, arg, NULL)
> -#else
> -#define __do_reset(fn, arg) fn(arg)
> -#endif
> -
>  static int do_reset(struct drm_i915_private *i915, unsigned int stalled_mask)
>  {
> -	struct __i915_reset arg = { i915, stalled_mask };
>  	int err, i;
>  
> -	err = __do_reset(__i915_reset__BKL, &arg);
> +	/* Flush everyone currently using a resource about to be clobbered */
> +	synchronize_srcu(&i915->gpu_error.reset_backoff_srcu);
> +
> +	err = intel_gpu_reset(i915, ALL_ENGINES);
>  	for (i = 0; err && i < RESET_MAX_RETRIES; i++) {
> -		msleep(100);
> -		err = __do_reset(__i915_reset__BKL, &arg);
> +		msleep(10 * (i + 1));
> +		err = intel_gpu_reset(i915, ALL_ENGINES);
>  	}
> +	if (err)
> +		return err;
>  
> -	return err;
> +	return gt_reset(i915, stalled_mask);
>  }
>  
>  /**
> @@ -966,8 +964,6 @@ static int do_reset(struct drm_i915_private *i915, unsigned int stalled_mask)
>   * Reset the chip.  Useful if a hang is detected. Marks the device as wedged
>   * on failure.
>   *
> - * Caller must hold the struct_mutex.
> - *
>   * Procedure is fairly simple:
>   *   - reset the chip using the reset reg
>   *   - re-init context state
> @@ -1274,9 +1270,12 @@ void i915_handle_error(struct drm_i915_private *i915,
>  		wait_event(i915->gpu_error.reset_queue,
>  			   !test_bit(I915_RESET_BACKOFF,
>  				     &i915->gpu_error.flags));
> -		goto out;
> +		goto out; /* piggy-back on the other reset */
>  	}
>  
> +	/* Make sure i915_reset_trylock() sees the I915_RESET_BACKOFF */
> +	synchronize_rcu_expedited();
> +
>  	/* Prevent any other reset-engine attempt. */
>  	for_each_engine(engine, i915, tmp) {
>  		while (test_and_set_bit(I915_RESET_ENGINE + engine->id,
> @@ -1300,6 +1299,36 @@ void i915_handle_error(struct drm_i915_private *i915,
>  	intel_runtime_pm_put(i915, wakeref);
>  }
>  
> +int i915_reset_trylock(struct drm_i915_private *i915)
> +{
> +	struct i915_gpu_error *error = &i915->gpu_error;
> +	int srcu;
> +
> +	rcu_read_lock();
> +	while (test_bit(I915_RESET_BACKOFF, &error->flags)) {
> +		rcu_read_unlock();
> +
> +		if (wait_event_interruptible(error->reset_queue,
> +					     !test_bit(I915_RESET_BACKOFF,
> +						       &error->flags)))
> +			return -EINTR;
> +
> +		rcu_read_lock();
> +	}
> +	srcu = srcu_read_lock(&error->reset_backoff_srcu);
> +	rcu_read_unlock();
> +
> +	return srcu;
> +}
> +
> +void i915_reset_unlock(struct drm_i915_private *i915, int tag)
> +__releases(&i915->gpu_error.reset_backoff_srcu)
> +{
> +	struct i915_gpu_error *error = &i915->gpu_error;
> +
> +	srcu_read_unlock(&error->reset_backoff_srcu, tag);
> +}
> +
>  bool i915_reset_flush(struct drm_i915_private *i915)
>  {
>  	int err;
> diff --git a/drivers/gpu/drm/i915/i915_reset.h b/drivers/gpu/drm/i915/i915_reset.h
> index f2d347f319df..893c5d1c2eb8 100644
> --- a/drivers/gpu/drm/i915/i915_reset.h
> +++ b/drivers/gpu/drm/i915/i915_reset.h
> @@ -9,6 +9,7 @@
>  
>  #include <linux/compiler.h>
>  #include <linux/types.h>
> +#include <linux/srcu.h>
>  
>  struct drm_i915_private;
>  struct intel_engine_cs;
> @@ -32,6 +33,9 @@ int i915_reset_engine(struct intel_engine_cs *engine,
>  void i915_reset_request(struct i915_request *rq, bool guilty);
>  bool i915_reset_flush(struct drm_i915_private *i915);
>  
> +int __must_check i915_reset_trylock(struct drm_i915_private *i915);
> +void i915_reset_unlock(struct drm_i915_private *i915, int tag);
> +
>  bool intel_has_gpu_reset(struct drm_i915_private *i915);
>  bool intel_has_reset_engine(struct drm_i915_private *i915);
>  
> diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> index 7b6f3bea9ef8..4886fac12628 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
> @@ -1039,8 +1039,6 @@ static int __igt_reset_evict_vma(struct drm_i915_private *i915,
>  
>  	/* Check that we can recover an unbind stuck on a hanging request */
>  
> -	igt_global_reset_lock(i915);
> -
>  	mutex_lock(&i915->drm.struct_mutex);
>  	err = hang_init(&h, i915);
>  	if (err)
> @@ -1138,7 +1136,9 @@ static int __igt_reset_evict_vma(struct drm_i915_private *i915,
>  	}
>  
>  out_reset:
> +	igt_global_reset_lock(i915);
>  	fake_hangcheck(rq->i915, intel_engine_flag(rq->engine));
> +	igt_global_reset_unlock(i915);
>  
>  	if (tsk) {
>  		struct igt_wedge_me w;
> @@ -1159,7 +1159,6 @@ static int __igt_reset_evict_vma(struct drm_i915_private *i915,
>  	hang_fini(&h);
>  unlock:
>  	mutex_unlock(&i915->drm.struct_mutex);
> -	igt_global_reset_unlock(i915);
>  
>  	if (i915_terminally_wedged(&i915->gpu_error))
>  		return -EIO;
> diff --git a/drivers/gpu/drm/i915/selftests/mock_gem_device.c b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> index 14ae46fda49f..fc516a2970f4 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_gem_device.c
> @@ -189,6 +189,7 @@ struct drm_i915_private *mock_gem_device(void)
>  
>  	init_waitqueue_head(&i915->gpu_error.wait_queue);
>  	init_waitqueue_head(&i915->gpu_error.reset_queue);
> +	init_srcu_struct(&i915->gpu_error.reset_backoff_srcu);
>  	mutex_init(&i915->gpu_error.wedge_mutex);
>  
>  	i915->wq = alloc_ordered_workqueue("mock", 0);
> -- 
> 2.20.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/intel-gfx
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread

* Re: [PATCH 02/46] drm/i915: Revoke mmaps and prevent access to fence registers across reset
  2019-02-26 19:53   ` Rodrigo Vivi
@ 2019-02-26 20:27     ` Chris Wilson
  0 siblings, 0 replies; 97+ messages in thread
From: Chris Wilson @ 2019-02-26 20:27 UTC (permalink / raw)
  To: Rodrigo Vivi; +Cc: intel-gfx, Mika Kuoppala

Quoting Rodrigo Vivi (2019-02-26 19:53:47)
> On Wed, Feb 06, 2019 at 01:03:12PM +0000, Chris Wilson wrote:
> > Previously, we were able to rely on the recursive properties of
> > struct_mutex to allow us to serialise revoking mmaps and reacquiring the
> > FENCE registers with them being clobbered over a global device reset.
> > I then proceeded to throw out the baby with the bath water in order to
> > pursue a struct_mutex-less reset.
> > 
> > Perusing LWN for alternative strategies, the dilemma on how to serialise
> > access to a global resource on one side was answered by
> > https://lwn.net/Articles/202847/ -- Sleepable RCU:
> > 
> >     1  int readside(void) {
> >     2      int idx;
> >     3      rcu_read_lock();
> >     4      if (nomoresrcu) {
> >     5          rcu_read_unlock();
> >     6          return -EINVAL;
> >     7      }
> >     8      idx = srcu_read_lock(&ss);
> >     9      rcu_read_unlock();
> >     10     /* SRCU read-side critical section. */
> >     11     srcu_read_unlock(&ss, idx);
> >     12     return 0;
> >     13 }
> >     14
> >     15 void cleanup(void)
> >     16 {
> >     17     nomoresrcu = 1;
> >     18     synchronize_rcu();
> >     19     synchronize_srcu(&ss);
> >     20     cleanup_srcu_struct(&ss);
> >     21 }
> > 
> > No more worrying about stop_machine, just an uber-complex mutex,
> > optimised for reads, with the overhead pushed to the rare reset path.
> > 
> > However, we do run the risk of a deadlock as we allocate underneath the
> > SRCU read lock, and the allocation may require a GPU reset, causing a
> > dependency cycle via the in-flight requests. We resolve that by declaring
> > the driver wedged and cancelling all in-flight rendering.
> > 
> > v2: Use expedited rcu barriers to match our earlier timing
> > characteristics.
> > v3: Try to annotate locking contexts for sparse
> > v4: Reduce selftest lock duration to avoid a reset deadlock with fences
> > v5: s/srcu/reset_backoff_srcu/
> > 
> > Testcase: igt/gem_mmap_gtt/hang
> > Fixes: eb8d0f5af4ec ("drm/i915: Remove GPU reset dependence on struct_mutex")
> 
> Hi Chris,
> 
> this patch didn't applied cleanly on dinf
> so I noticed that I could also get
> 
> 115ff80a97cf ("drm/i915: Defer removing fence register tracking to rpm wakeup")
> 
> so that applies, but then I noticed that there's a Fixes of
> this patch here:
> 
> de3a87cf6352 ("drm/i915: Recursive i915_reset_trylock() verboten")
> 
> It seems sane to pick 3 of them, but I'd like to confirm with you first.
> 
> Thoughts?

There's at least one later patch as well. I think the safest option is to
ignore these fixups, as the glitch is very rare (and only after a GPU hang),
only temporary, and does not impact stability or security (users can only
see their own buffers/data in a slightly different order).
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 97+ messages in thread
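
To recap the locking scheme under discussion for backporting, a condensed
sketch of the reader/writer pairing the patch introduces around the fence
registers; the names are taken from the patch itself, with the fault-handler
and reset bookkeeping trimmed down to the SRCU skeleton.

    /* reader side, e.g. i915_gem_fault() or fence_update() */
    int srcu = i915_reset_trylock(i915);  /* waits out I915_RESET_BACKOFF */
    if (srcu < 0)
            return srcu;
    /* SRCU read-side critical section: fence registers / GGTT mmaps are stable */
    i915_reset_unlock(i915, srcu);

    /* writer side, in do_reset(): flush current readers before clobbering */
    synchronize_srcu(&i915->gpu_error.reset_backoff_srcu);
    err = intel_gpu_reset(i915, ALL_ENGINES);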

end of thread, other threads:[~2019-02-26 20:27 UTC | newest]

Thread overview: 97+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-02-06 13:03 The road to load balancing Chris Wilson
2019-02-06 13:03 ` [PATCH 01/46] drm/i915: Hack and slash, throttle execbuffer hogs Chris Wilson
2019-02-06 13:03 ` [PATCH 02/46] drm/i915: Revoke mmaps and prevent access to fence registers across reset Chris Wilson
2019-02-06 15:56   ` Mika Kuoppala
2019-02-06 16:08     ` Chris Wilson
2019-02-06 16:18       ` Chris Wilson
2019-02-26 19:53   ` Rodrigo Vivi
2019-02-26 20:27     ` Chris Wilson
2019-02-06 13:03 ` [PATCH 03/46] drm/i915: Force the GPU reset upon wedging Chris Wilson
2019-02-06 13:03 ` [PATCH 04/46] drm/i915: Uninterruptibly drain the timelines on unwedging Chris Wilson
2019-02-06 13:03 ` [PATCH 05/46] drm/i915: Wait for old resets before applying debugfs/i915_wedged Chris Wilson
2019-02-06 13:03 ` [PATCH 06/46] drm/i915: Serialise resets with wedging Chris Wilson
2019-02-06 13:03 ` [PATCH 07/46] drm/i915: Don't claim an unstarted request was guilty Chris Wilson
2019-02-06 13:03 ` [PATCH 08/46] drm/i915/execlists: Suppress mere WAIT preemption Chris Wilson
2019-02-11 11:19   ` Tvrtko Ursulin
2019-02-19 10:22   ` Matthew Auld
2019-02-19 10:34     ` Chris Wilson
2019-02-06 13:03 ` [PATCH 09/46] drm/i915/execlists: Suppress redundant preemption Chris Wilson
2019-02-06 13:03 ` [PATCH 10/46] drm/i915: Make request allocation caches global Chris Wilson
2019-02-11 11:43   ` Tvrtko Ursulin
2019-02-11 12:40     ` Chris Wilson
2019-02-11 17:02       ` Tvrtko Ursulin
2019-02-12 11:51         ` Chris Wilson
2019-02-06 13:03 ` [PATCH 11/46] drm/i915: Keep timeline HWSP allocated until idle across the system Chris Wilson
2019-02-06 13:03 ` [PATCH 12/46] drm/i915/execlists: Refactor out can_merge_rq() Chris Wilson
2019-02-06 13:03 ` [PATCH 13/46] drm/i915: Compute the global scheduler caps Chris Wilson
2019-02-11 12:24   ` Tvrtko Ursulin
2019-02-11 12:33     ` Chris Wilson
2019-02-06 13:03 ` [PATCH 14/46] drm/i915: Use HW semaphores for inter-engine synchronisation on gen8+ Chris Wilson
2019-02-06 13:03 ` [PATCH 15/46] drm/i915: Prioritise non-busywait semaphore workloads Chris Wilson
2019-02-06 13:03 ` [PATCH 16/46] drm/i915: Show support for accurate sw PMU busyness tracking Chris Wilson
2019-02-06 13:03 ` [PATCH 17/46] drm/i915: Apply rps waitboosting for dma_fence_wait_timeout() Chris Wilson
2019-02-11 18:06   ` Tvrtko Ursulin
2019-02-06 13:03 ` [PATCH 18/46] drm/i915: Replace global_seqno with a hangcheck heartbeat seqno Chris Wilson
2019-02-11 12:40   ` Tvrtko Ursulin
2019-02-11 12:44     ` Chris Wilson
2019-02-11 16:56       ` Tvrtko Ursulin
2019-02-12 13:36         ` Chris Wilson
2019-02-06 13:03 ` [PATCH 19/46] drm/i915/pmu: Always sample an active ringbuffer Chris Wilson
2019-02-11 18:18   ` Tvrtko Ursulin
2019-02-12 13:40     ` Chris Wilson
2019-02-06 13:03 ` [PATCH 20/46] drm/i915: Remove access to global seqno in the HWSP Chris Wilson
2019-02-11 18:22   ` Tvrtko Ursulin
2019-02-06 13:03 ` [PATCH 21/46] drm/i915: Remove i915_request.global_seqno Chris Wilson
2019-02-11 18:44   ` Tvrtko Ursulin
2019-02-12 13:45     ` Chris Wilson
2019-02-06 13:03 ` [PATCH 22/46] drm/i915: Force GPU idle on suspend Chris Wilson
2019-02-06 13:03 ` [PATCH 23/46] drm/i915/selftests: Improve switch-to-kernel-context checking Chris Wilson
2019-02-06 13:03 ` [PATCH 24/46] drm/i915: Do a synchronous switch-to-kernel-context on idling Chris Wilson
2019-02-21 19:48   ` Daniele Ceraolo Spurio
2019-02-21 21:17     ` Chris Wilson
2019-02-21 21:31       ` Daniele Ceraolo Spurio
2019-02-21 21:42         ` Chris Wilson
2019-02-21 22:53           ` Daniele Ceraolo Spurio
2019-02-21 23:25             ` Chris Wilson
2019-02-22  0:29               ` Daniele Ceraolo Spurio
2019-02-06 13:03 ` [PATCH 25/46] drm/i915: Store the BIT(engine->id) as the engine's mask Chris Wilson
2019-02-11 18:51   ` Tvrtko Ursulin
2019-02-12 13:51     ` Chris Wilson
2019-02-06 13:03 ` [PATCH 26/46] drm/i915: Refactor common code to load initial power context Chris Wilson
2019-02-06 13:03 ` [PATCH 27/46] drm/i915: Reduce presumption of request ordering for barriers Chris Wilson
2019-02-06 13:03 ` [PATCH 28/46] drm/i915: Remove has-kernel-context Chris Wilson
2019-02-06 13:03 ` [PATCH 29/46] drm/i915: Introduce the i915_user_extension_method Chris Wilson
2019-02-11 19:00   ` Tvrtko Ursulin
2019-02-12 13:56     ` Chris Wilson
2019-02-06 13:03 ` [PATCH 30/46] drm/i915: Track active engines within a context Chris Wilson
2019-02-11 19:11   ` Tvrtko Ursulin
2019-02-12 13:59     ` Chris Wilson
2019-02-06 13:03 ` [PATCH 31/46] drm/i915: Introduce a context barrier callback Chris Wilson
2019-02-06 13:03 ` [PATCH 32/46] drm/i915: Create/destroy VM (ppGTT) for use with contexts Chris Wilson
2019-02-12 11:18   ` Tvrtko Ursulin
2019-02-12 14:11     ` Chris Wilson
2019-02-06 13:03 ` [PATCH 33/46] drm/i915: Extend CONTEXT_CREATE to set parameters upon construction Chris Wilson
2019-02-12 13:43   ` Tvrtko Ursulin
2019-02-06 13:03 ` [PATCH 34/46] drm/i915: Allow contexts to share a single timeline across all engines Chris Wilson
2019-02-06 13:03 ` [PATCH 35/46] drm/i915: Fix I915_EXEC_RING_MASK Chris Wilson
2019-02-06 13:03 ` [PATCH 36/46] drm/i915: Remove last traces of exec-id (GEM_BUSY) Chris Wilson
2019-02-06 13:03 ` [PATCH 37/46] drm/i915: Re-arrange execbuf so context is known before engine Chris Wilson
2019-02-06 13:03 ` [PATCH 38/46] drm/i915: Allow a context to define its set of engines Chris Wilson
2019-02-25 10:41   ` Tvrtko Ursulin
2019-02-25 10:47     ` Chris Wilson
2019-02-06 13:03 ` [PATCH 39/46] drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[] Chris Wilson
2019-02-06 13:03 ` [PATCH 40/46] drm/i915: Pass around the intel_context Chris Wilson
2019-02-06 13:03 ` [PATCH 41/46] drm/i915: Split struct intel_context definition to its own header Chris Wilson
2019-02-06 13:03 ` [PATCH 42/46] drm/i915: Move over to intel_context_lookup() Chris Wilson
2019-02-06 14:27   ` [PATCH] " Chris Wilson
2019-02-06 13:03 ` [PATCH 43/46] drm/i915: Load balancing across a virtual engine Chris Wilson
2019-02-06 13:03 ` [PATCH 44/46] drm/i915: Extend execution fence to support a callback Chris Wilson
2019-02-06 13:03 ` [PATCH 45/46] drm/i915/execlists: Virtual engine bonding Chris Wilson
2019-02-06 13:03 ` [PATCH 46/46] drm/i915: Allow specification of parallel execbuf Chris Wilson
2019-02-06 13:52 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/46] drm/i915: Hack and slash, throttle execbuffer hogs Patchwork
2019-02-06 14:09 ` ✗ Fi.CI.BAT: failure " Patchwork
2019-02-06 14:11 ` ✗ Fi.CI.SPARSE: warning " Patchwork
2019-02-06 14:37 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/46] drm/i915: Hack and slash, throttle execbuffer hogs (rev2) Patchwork
2019-02-06 14:55 ` ✗ Fi.CI.SPARSE: " Patchwork
2019-02-06 14:56 ` ✓ Fi.CI.BAT: success " Patchwork
2019-02-06 16:18 ` ✗ Fi.CI.IGT: failure " Patchwork
