* [PATCH 01/18] drm/i915/selftests: Provide stub reset functions
@ 2019-03-19 11:57 Chris Wilson
From: Chris Wilson @ 2019-03-19 11:57 UTC (permalink / raw)
  To: intel-gfx

If a test fails, we quite often mark the device as wedged. Provide the
stub functions so that we can wedge the mock device, and avoid exploding
on test failures.
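
To illustrate why the stubs are needed: wedging the device walks every
engine and invokes its reset and cancellation hooks, which the mock
engines previously left NULL. A simplified sketch of that per-engine
path (the real wedging code does more — locking, tracing — which is
elided here):

	/* sketch: what wedging does for each engine (simplified) */
	for_each_engine(engine, i915, id) {
		engine->reset.prepare(engine);   /* NULL on mock engines before this patch */
		engine->cancel_requests(engine); /* likewise NULL */
		engine->reset.finish(engine);
	}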

Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=109981
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/selftests/mock_engine.c | 36 ++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
index 639d36eb904a..61744819172b 100644
--- a/drivers/gpu/drm/i915/selftests/mock_engine.c
+++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
@@ -198,6 +198,37 @@ static void mock_submit_request(struct i915_request *request)
 	spin_unlock_irqrestore(&engine->hw_lock, flags);
 }
 
+static void mock_reset_prepare(struct intel_engine_cs *engine)
+{
+}
+
+static void mock_reset(struct intel_engine_cs *engine, bool stalled)
+{
+	GEM_BUG_ON(stalled);
+}
+
+static void mock_reset_finish(struct intel_engine_cs *engine)
+{
+}
+
+static void mock_cancel_requests(struct intel_engine_cs *engine)
+{
+	struct i915_request *request;
+	unsigned long flags;
+
+	spin_lock_irqsave(&engine->timeline.lock, flags);
+
+	/* Mark all submitted requests as skipped. */
+	list_for_each_entry(request, &engine->timeline.requests, sched.link) {
+		if (!i915_request_signaled(request))
+			dma_fence_set_error(&request->fence, -EIO);
+
+		i915_request_mark_complete(request);
+	}
+
+	spin_unlock_irqrestore(&engine->timeline.lock, flags);
+}
+
 struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
 				    const char *name,
 				    int id)
@@ -223,6 +254,11 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
 	engine->base.emit_fini_breadcrumb = mock_emit_breadcrumb;
 	engine->base.submit_request = mock_submit_request;
 
+	engine->base.reset.prepare = mock_reset_prepare;
+	engine->base.reset.reset = mock_reset;
+	engine->base.reset.finish = mock_reset_finish;
+	engine->base.cancel_requests = mock_cancel_requests;
+
 	if (i915_timeline_init(i915,
 			       &engine->base.timeline,
 			       engine->base.name,
-- 
2.20.1



* [PATCH 02/18] drm/i915: Flush pages on acquisition
  2019-03-19 11:57 [PATCH 01/18] drm/i915/selftests: Provide stub reset functions Chris Wilson
@ 2019-03-19 11:57 ` Chris Wilson
From: Chris Wilson @ 2019-03-19 11:57 UTC (permalink / raw)
  To: intel-gfx

When we return pages to the system, we mark them as being in the CPU
domain, since any external access is uncontrolled and we must assume the
worst. This means that we always need to flush the pages on acquisition
if we want to use them on the GPU, and from the beginning we have used
set-domain for that. Set-domain is overkill for the purpose, as it is a
general synchronisation barrier, whereas our intent is only to flush the
pages being swapped in. If we move that flush into the page-acquisition
phase, then whenever we have obj->mm.pages we know they are coherent
with the GPU, and we need only maintain that status without resorting to
heavy-handed use of set-domain.

The principal knock-on effect for userspace is through mmap-gtt
pagefaulting. Our uAPI has always implied that the GTT mmap was async
(especially as when any pagefault occurs is unpredictable to userspace)
and so userspace had to apply explicit domain control itself
(set-domain). However, swapping is transparent to the kernel, and so on
first fault we need to acquire the pages and make them coherent for
access through the GTT. Our use of set-domain here leaked into the uABI
the expectation that the first pagefault was synchronous. This was
unintentional and, barring a few igt, should go unnoticed; nevertheless
we bump the uABI version for mmap-gtt to reflect the change in
behaviour.
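
For userspace that wants to detect the new behaviour, the bump is
visible through the usual getparam query; a minimal sketch, assuming an
open i915 DRM fd and libdrm's drmIoctl() (headers: xf86drm.h,
drm/i915_drm.h), error handling omitted:

	/* sketch: query the mmap-gtt version from userspace */
	int version = 0;
	struct drm_i915_getparam gp = {
		.param = I915_PARAM_MMAP_GTT_VERSION,
		.value = &version,
	};
	drmIoctl(fd, DRM_IOCTL_I915_GETPARAM, &gp);
	/* version >= 3: the first GTT pagefault no longer implies set-domain(GTT) */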

Another implication of the change is that gem_create() is presumed to
create an object that is coherent with the CPU and in the CPU write
domain, so a set-domain(CPU) following a gem_create() used to be a minor
operation that merely checked whether we could allocate all the pages
for the object. With this change, a set-domain(CPU) causes a clflush as
we acquire the pages. For mesa on !llc this moves the clflush from
execbuf time to create time, but the performance impact should be
minimal: the same clflush still occurs, only earlier, and because of
that clflush cost userspace already recycles buffer objects rather than
allocating fresh ones.
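
The sequence in question is the familiar create-then-set-domain pattern;
a rough sketch of it from the userspace side (uAPI structs from
drm/i915_drm.h, fd is an open i915 device fd, error handling omitted):

	/* sketch: gem_create followed by set-domain(CPU) */
	struct drm_i915_gem_create create = { .size = 4096 };
	drmIoctl(fd, DRM_IOCTL_I915_GEM_CREATE, &create);

	struct drm_i915_gem_set_domain sd = {
		.handle = create.handle,
		.read_domains = I915_GEM_DOMAIN_CPU,
		.write_domain = I915_GEM_DOMAIN_CPU,
	};
	/* after this patch, this acquires the pages and, on !llc, does the clflush */
	drmIoctl(fd, DRM_IOCTL_I915_GEM_SET_DOMAIN, &sd);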

Internally, the presumption that objects are created in the CPU
write-domain, and remain so through writes to obj->mm.mapping, is more
prevalent than I expected; but it is easy enough to catch and apply a
manual flush.
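
In those cases the fix is the new flush helper around CPU writes through
a pinned map; the pattern used throughout this patch looks roughly like:

	/* write via the CPU map, then flush before the GPU reads it */
	vaddr = i915_gem_object_pin_map(obj, I915_MAP_WB);
	/* ... CPU writes through vaddr ... */
	i915_gem_object_flush_map(obj);	/* no-op if coherent for writes */
	i915_gem_object_unpin_map(obj);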

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Matthew Auld <matthew.william.auld@gmail.com>
Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
Cc: Antonio Argenziano <antonio.argenziano@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h               |  8 +++
 drivers/gpu/drm/i915/i915_gem.c               | 57 ++++++++++++-----
 drivers/gpu/drm/i915/i915_gem_dmabuf.c        |  1 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c    |  7 +--
 drivers/gpu/drm/i915/i915_gem_render_state.c  |  2 +-
 drivers/gpu/drm/i915/i915_perf.c              |  4 +-
 drivers/gpu/drm/i915/intel_engine_cs.c        |  4 +-
 drivers/gpu/drm/i915/intel_lrc.c              | 63 +++++++++----------
 drivers/gpu/drm/i915/intel_ringbuffer.c       | 62 +++++++-----------
 drivers/gpu/drm/i915/selftests/huge_pages.c   |  5 +-
 .../gpu/drm/i915/selftests/i915_gem_context.c | 17 ++---
 .../gpu/drm/i915/selftests/i915_gem_dmabuf.c  |  1 +
 .../gpu/drm/i915/selftests/i915_gem_evict.c   |  4 +-
 drivers/gpu/drm/i915/selftests/i915_request.c | 14 ++---
 drivers/gpu/drm/i915/selftests/igt_spinner.c  |  2 +-
 .../gpu/drm/i915/selftests/intel_hangcheck.c  |  2 +-
 drivers/gpu/drm/i915/selftests/intel_lrc.c    |  5 +-
 .../drm/i915/selftests/intel_workarounds.c    |  3 +
 18 files changed, 127 insertions(+), 134 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index c65c2e6649df..395aa9d5ba02 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2959,6 +2959,14 @@ i915_coherent_map_type(struct drm_i915_private *i915)
 void *__must_check i915_gem_object_pin_map(struct drm_i915_gem_object *obj,
 					   enum i915_map_type type);
 
+void __i915_gem_object_flush_map(struct drm_i915_gem_object *obj,
+				 unsigned long offset,
+				 unsigned long size);
+static inline void i915_gem_object_flush_map(struct drm_i915_gem_object *obj)
+{
+	__i915_gem_object_flush_map(obj, 0, obj->base.size);
+}
+
 /**
  * i915_gem_object_unpin_map - releases an earlier mapping
  * @obj: the object to unmap
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index b7086c8d4726..41d96414ef18 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1713,6 +1713,9 @@ static unsigned int tile_row_pages(const struct drm_i915_gem_object *obj)
  * 2 - Recognise WC as a separate cache domain so that we can flush the
  *     delayed writes via GTT before performing direct access via WC.
  *
+ * 3 - Remove implicit set-domain(GTT) and synchronisation on initial
+ *     pagefault; swapin remains transparent.
+ *
  * Restrictions:
  *
  *  * snoopable objects cannot be accessed via the GTT. It can cause machine
@@ -1740,7 +1743,7 @@ static unsigned int tile_row_pages(const struct drm_i915_gem_object *obj)
  */
 int i915_gem_mmap_gtt_version(void)
 {
-	return 2;
+	return 3;
 }
 
 static inline struct i915_ggtt_view
@@ -1808,17 +1811,6 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
 
 	trace_i915_gem_object_fault(obj, page_offset, true, write);
 
-	/* Try to flush the object off the GPU first without holding the lock.
-	 * Upon acquiring the lock, we will perform our sanity checks and then
-	 * repeat the flush holding the lock in the normal manner to catch cases
-	 * where we are gazumped.
-	 */
-	ret = i915_gem_object_wait(obj,
-				   I915_WAIT_INTERRUPTIBLE,
-				   MAX_SCHEDULE_TIMEOUT);
-	if (ret)
-		goto err;
-
 	ret = i915_gem_object_pin_pages(obj);
 	if (ret)
 		goto err;
@@ -1874,10 +1866,6 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
 		goto err_unlock;
 	}
 
-	ret = i915_gem_object_set_to_gtt_domain(obj, write);
-	if (ret)
-		goto err_unpin;
-
 	ret = i915_vma_pin_fence(vma);
 	if (ret)
 		goto err_unpin;
@@ -2534,6 +2522,14 @@ void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj,
 
 	lockdep_assert_held(&obj->mm.lock);
 
+	/* Make the pages coherent with the GPU (flushing any swapin). */
+	if (obj->cache_dirty) {
+		obj->write_domain = 0;
+		if (i915_gem_object_has_struct_page(obj))
+			drm_clflush_sg(pages);
+		obj->cache_dirty = false;
+	}
+
 	obj->mm.get_page.sg_pos = pages->sgl;
 	obj->mm.get_page.sg_idx = 0;
 
@@ -2735,6 +2731,33 @@ void *i915_gem_object_pin_map(struct drm_i915_gem_object *obj,
 	goto out_unlock;
 }
 
+void __i915_gem_object_flush_map(struct drm_i915_gem_object *obj,
+				 unsigned long offset,
+				 unsigned long size)
+{
+	enum i915_map_type has_type;
+	void *ptr;
+
+	GEM_BUG_ON(!i915_gem_object_has_pinned_pages(obj));
+	GEM_BUG_ON(range_overflows_t(typeof(obj->base.size),
+				     offset, size, obj->base.size));
+
+	obj->mm.dirty = true;
+
+	if (obj->cache_coherent & I915_BO_CACHE_COHERENT_FOR_WRITE)
+		return;
+
+	ptr = page_unpack_bits(obj->mm.mapping, &has_type);
+	if (has_type == I915_MAP_WC)
+		return;
+
+	drm_clflush_virt_range(ptr + offset, size);
+	if (size == obj->base.size) {
+		obj->write_domain &= ~I915_GEM_DOMAIN_CPU;
+		obj->cache_dirty = false;
+	}
+}
+
 static int
 i915_gem_object_pwrite_gtt(struct drm_i915_gem_object *obj,
 			   const struct drm_i915_gem_pwrite *arg)
@@ -4692,6 +4715,8 @@ static int __intel_engines_record_defaults(struct drm_i915_private *i915)
 			goto err_active;
 
 		engine->default_state = i915_gem_object_get(state->obj);
+		i915_gem_object_set_cache_coherency(engine->default_state,
+						    I915_CACHE_LLC);
 
 		/* Check we can acquire the image of the context state */
 		vaddr = i915_gem_object_pin_map(engine->default_state,
diff --git a/drivers/gpu/drm/i915/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
index 33181678990e..5a101a9462d8 100644
--- a/drivers/gpu/drm/i915/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/i915_gem_dmabuf.c
@@ -107,6 +107,7 @@ static void i915_gem_dmabuf_vunmap(struct dma_buf *dma_buf, void *vaddr)
 {
 	struct drm_i915_gem_object *obj = dma_buf_to_obj(dma_buf);
 
+	i915_gem_object_flush_map(obj);
 	i915_gem_object_unpin_map(obj);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index ee6d301a9627..3d672c9edb94 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -1001,7 +1001,10 @@ static void reloc_gpu_flush(struct reloc_cache *cache)
 {
 	GEM_BUG_ON(cache->rq_size >= cache->rq->batch->obj->base.size / sizeof(u32));
 	cache->rq_cmd[cache->rq_size] = MI_BATCH_BUFFER_END;
+
+	__i915_gem_object_flush_map(cache->rq->batch->obj, 0, cache->rq_size);
 	i915_gem_object_unpin_map(cache->rq->batch->obj);
+
 	i915_gem_chipset_flush(cache->rq->i915);
 
 	i915_request_add(cache->rq);
@@ -1214,10 +1217,6 @@ static int __reloc_gpu_alloc(struct i915_execbuffer *eb,
 	if (IS_ERR(cmd))
 		return PTR_ERR(cmd);
 
-	err = i915_gem_object_set_to_wc_domain(obj, false);
-	if (err)
-		goto err_unmap;
-
 	batch = i915_vma_instance(obj, vma->vm, NULL);
 	if (IS_ERR(batch)) {
 		err = PTR_ERR(batch);
diff --git a/drivers/gpu/drm/i915/i915_gem_render_state.c b/drivers/gpu/drm/i915/i915_gem_render_state.c
index 91196348c68c..9440024c763f 100644
--- a/drivers/gpu/drm/i915/i915_gem_render_state.c
+++ b/drivers/gpu/drm/i915/i915_gem_render_state.c
@@ -164,7 +164,7 @@ static int render_state_setup(struct intel_render_state *so,
 		drm_clflush_virt_range(d, i * sizeof(u32));
 	kunmap_atomic(d);
 
-	ret = i915_gem_object_set_to_gtt_domain(so->obj, false);
+	ret = 0;
 out:
 	i915_gem_obj_finish_shmem_access(so->obj);
 	return ret;
diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 9b0292a38865..7f92d52579bd 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -1509,9 +1509,7 @@ static int alloc_oa_buffer(struct drm_i915_private *dev_priv)
 		goto unlock;
 	}
 
-	ret = i915_gem_object_set_cache_level(bo, I915_CACHE_LLC);
-	if (ret)
-		goto err_unref;
+	i915_gem_object_set_cache_coherency(bo, I915_CACHE_LLC);
 
 	/* PreHSW required 512K alignment, HSW requires 16M */
 	vma = i915_gem_object_ggtt_pin(bo, NULL, 0, SZ_16M, 0);
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 652c1b3ba190..314b86b6f88d 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -528,9 +528,7 @@ static int init_status_page(struct intel_engine_cs *engine)
 		return PTR_ERR(obj);
 	}
 
-	ret = i915_gem_object_set_cache_level(obj, I915_CACHE_LLC);
-	if (ret)
-		goto err;
+	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
 
 	vma = i915_vma_instance(obj, &engine->i915->ggtt.vm, NULL);
 	if (IS_ERR(vma)) {
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 51c2ea164b36..7e0c20a2d733 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1248,6 +1248,30 @@ static void execlists_context_destroy(struct kref *kref)
 	intel_context_free(ce);
 }
 
+static int __context_pin(struct i915_vma *vma)
+{
+	unsigned int flags;
+	int err;
+
+	flags = PIN_GLOBAL | PIN_HIGH;
+	flags |= PIN_OFFSET_BIAS | i915_ggtt_pin_bias(vma);
+
+	err = i915_vma_pin(vma, 0, 0, flags);
+	if (err)
+		return err;
+
+	vma->obj->pin_global++;
+	vma->obj->mm.dirty = true;
+
+	return 0;
+}
+
+static void __context_unpin(struct i915_vma *vma)
+{
+	vma->obj->pin_global--;
+	__i915_vma_unpin(vma);
+}
+
 static void execlists_context_unpin(struct intel_context *ce)
 {
 	struct intel_engine_cs *engine;
@@ -1276,31 +1300,8 @@ static void execlists_context_unpin(struct intel_context *ce)
 
 	intel_ring_unpin(ce->ring);
 
-	ce->state->obj->pin_global--;
 	i915_gem_object_unpin_map(ce->state->obj);
-	i915_vma_unpin(ce->state);
-}
-
-static int __context_pin(struct i915_vma *vma)
-{
-	unsigned int flags;
-	int err;
-
-	/*
-	 * Clear this page out of any CPU caches for coherent swap-in/out.
-	 * We only want to do this on the first bind so that we do not stall
-	 * on an active context (which by nature is already on the GPU).
-	 */
-	if (!(vma->flags & I915_VMA_GLOBAL_BIND)) {
-		err = i915_gem_object_set_to_wc_domain(vma->obj, true);
-		if (err)
-			return err;
-	}
-
-	flags = PIN_GLOBAL | PIN_HIGH;
-	flags |= PIN_OFFSET_BIAS | i915_ggtt_pin_bias(vma);
-
-	return i915_vma_pin(vma, 0, 0, flags);
+	__context_unpin(ce->state);
 }
 
 static void
@@ -1361,7 +1362,6 @@ __execlists_context_pin(struct intel_context *ce,
 	ce->lrc_reg_state = vaddr + LRC_STATE_PN * PAGE_SIZE;
 	__execlists_update_reg_state(ce, engine);
 
-	ce->state->obj->pin_global++;
 	return 0;
 
 unpin_ring:
@@ -1369,7 +1369,7 @@ __execlists_context_pin(struct intel_context *ce,
 unpin_map:
 	i915_gem_object_unpin_map(ce->state->obj);
 unpin_vma:
-	__i915_vma_unpin(ce->state);
+	__context_unpin(ce->state);
 err:
 	return ret;
 }
@@ -2751,19 +2751,12 @@ populate_lr_context(struct intel_context *ce,
 	u32 *regs;
 	int ret;
 
-	ret = i915_gem_object_set_to_cpu_domain(ctx_obj, true);
-	if (ret) {
-		DRM_DEBUG_DRIVER("Could not set to CPU domain\n");
-		return ret;
-	}
-
 	vaddr = i915_gem_object_pin_map(ctx_obj, I915_MAP_WB);
 	if (IS_ERR(vaddr)) {
 		ret = PTR_ERR(vaddr);
 		DRM_DEBUG_DRIVER("Could not map object pages! (%d)\n", ret);
 		return ret;
 	}
-	ctx_obj->mm.dirty = true;
 
 	if (engine->default_state) {
 		/*
@@ -2798,7 +2791,11 @@ populate_lr_context(struct intel_context *ce,
 			_MASKED_BIT_ENABLE(CTX_CTRL_ENGINE_CTX_RESTORE_INHIBIT |
 					   CTX_CTRL_ENGINE_CTX_SAVE_INHIBIT);
 
+	ret = 0;
 err_unpin_ctx:
+	__i915_gem_object_flush_map(ctx_obj,
+				    LRC_HEADER_PAGES * PAGE_SIZE,
+				    engine->context_size);
 	i915_gem_object_unpin_map(ctx_obj);
 	return ret;
 }
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 9e7ad17b5250..0310d5d53bf9 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1195,15 +1195,6 @@ int intel_ring_pin(struct intel_ring *ring)
 	else
 		flags |= PIN_HIGH;
 
-	if (!(vma->flags & I915_VMA_GLOBAL_BIND)) {
-		if (flags & PIN_MAPPABLE || map == I915_MAP_WC)
-			ret = i915_gem_object_set_to_gtt_domain(vma->obj, true);
-		else
-			ret = i915_gem_object_set_to_cpu_domain(vma->obj, true);
-		if (unlikely(ret))
-			goto unpin_timeline;
-	}
-
 	ret = i915_vma_pin(vma, 0, 0, flags);
 	if (unlikely(ret))
 		goto unpin_timeline;
@@ -1392,17 +1383,6 @@ static int __context_pin(struct intel_context *ce)
 	if (!vma)
 		return 0;
 
-	/*
-	 * Clear this page out of any CPU caches for coherent swap-in/out.
-	 * We only want to do this on the first bind so that we do not stall
-	 * on an active context (which by nature is already on the GPU).
-	 */
-	if (!(vma->flags & I915_VMA_GLOBAL_BIND)) {
-		err = i915_gem_object_set_to_gtt_domain(vma->obj, true);
-		if (err)
-			return err;
-	}
-
 	err = i915_vma_pin(vma, 0, 0, PIN_GLOBAL | PIN_HIGH);
 	if (err)
 		return err;
@@ -1412,6 +1392,7 @@ static int __context_pin(struct intel_context *ce)
 	 * it cannot reclaim the object until we release it.
 	 */
 	vma->obj->pin_global++;
+	vma->obj->mm.dirty = true;
 
 	return 0;
 }
@@ -1446,6 +1427,24 @@ alloc_context_vma(struct intel_engine_cs *engine)
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
 
+	/*
+	 * Try to make the context utilize L3 as well as LLC.
+	 *
+	 * On VLV we don't have L3 controls in the PTEs so we
+	 * shouldn't touch the cache level, especially as that
+	 * would make the object snooped which might have a
+	 * negative performance impact.
+	 *
+	 * Snooping is required on non-llc platforms in execlist
+	 * mode, but since all GGTT accesses use PAT entry 0 we
+	 * get snooping anyway regardless of cache_level.
+	 *
+	 * This is only applicable for Ivy Bridge devices since
+	 * later platforms don't have L3 control bits in the PTE.
+	 */
+	if (IS_IVYBRIDGE(i915))
+		i915_gem_object_set_cache_coherency(obj, I915_CACHE_L3_LLC);
+
 	if (engine->default_state) {
 		void *defaults, *vaddr;
 
@@ -1463,29 +1462,10 @@ alloc_context_vma(struct intel_engine_cs *engine)
 		}
 
 		memcpy(vaddr, defaults, engine->context_size);
-
 		i915_gem_object_unpin_map(engine->default_state);
-		i915_gem_object_unpin_map(obj);
-	}
 
-	/*
-	 * Try to make the context utilize L3 as well as LLC.
-	 *
-	 * On VLV we don't have L3 controls in the PTEs so we
-	 * shouldn't touch the cache level, especially as that
-	 * would make the object snooped which might have a
-	 * negative performance impact.
-	 *
-	 * Snooping is required on non-llc platforms in execlist
-	 * mode, but since all GGTT accesses use PAT entry 0 we
-	 * get snooping anyway regardless of cache_level.
-	 *
-	 * This is only applicable for Ivy Bridge devices since
-	 * later platforms don't have L3 control bits in the PTE.
-	 */
-	if (IS_IVYBRIDGE(i915)) {
-		/* Ignore any error, regard it as a simple optimisation */
-		i915_gem_object_set_cache_level(obj, I915_CACHE_L3_LLC);
+		i915_gem_object_flush_map(obj);
+		i915_gem_object_unpin_map(obj);
 	}
 
 	vma = i915_vma_instance(obj, &i915->ggtt.vm, NULL);
diff --git a/drivers/gpu/drm/i915/selftests/huge_pages.c b/drivers/gpu/drm/i915/selftests/huge_pages.c
index 2e1db30af477..218cfc361de3 100644
--- a/drivers/gpu/drm/i915/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/selftests/huge_pages.c
@@ -908,10 +908,6 @@ gpu_write_dw(struct i915_vma *vma, u64 offset, u32 val)
 	if (IS_ERR(obj))
 		return ERR_CAST(obj);
 
-	err = i915_gem_object_set_to_wc_domain(obj, true);
-	if (err)
-		goto err;
-
 	cmd = i915_gem_object_pin_map(obj, I915_MAP_WC);
 	if (IS_ERR(cmd)) {
 		err = PTR_ERR(cmd);
@@ -1584,6 +1580,7 @@ static int igt_tmpfs_fallback(void *arg)
 	}
 	*vaddr = 0xdeadbeaf;
 
+	__i915_gem_object_flush_map(obj, 0, 64);
 	i915_gem_object_unpin_map(obj);
 
 	vma = i915_vma_instance(obj, vm, NULL);
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
index 4399ef9ebf15..0759a90c0d5a 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
@@ -220,6 +220,7 @@ gpu_fill_dw(struct i915_vma *vma, u64 offset, unsigned long count, u32 value)
 		offset += PAGE_SIZE;
 	}
 	*cmd = MI_BATCH_BUFFER_END;
+	i915_gem_object_flush_map(obj);
 	i915_gem_object_unpin_map(obj);
 
 	err = i915_gem_object_set_to_gtt_domain(obj, false);
@@ -604,12 +605,9 @@ static struct i915_vma *rpcs_query_batch(struct i915_vma *vma)
 	*cmd++ = upper_32_bits(vma->node.start);
 	*cmd = MI_BATCH_BUFFER_END;
 
+	__i915_gem_object_flush_map(obj, 0, 64);
 	i915_gem_object_unpin_map(obj);
 
-	err = i915_gem_object_set_to_gtt_domain(obj, false);
-	if (err)
-		goto err;
-
 	vma = i915_vma_instance(obj, vma->vm, NULL);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
@@ -1202,12 +1200,9 @@ static int write_to_scratch(struct i915_gem_context *ctx,
 	}
 	*cmd++ = value;
 	*cmd = MI_BATCH_BUFFER_END;
+	__i915_gem_object_flush_map(obj, 0, 64);
 	i915_gem_object_unpin_map(obj);
 
-	err = i915_gem_object_set_to_gtt_domain(obj, false);
-	if (err)
-		goto err;
-
 	vma = i915_vma_instance(obj, &ctx->ppgtt->vm, NULL);
 	if (IS_ERR(vma)) {
 		err = PTR_ERR(vma);
@@ -1299,11 +1294,9 @@ static int read_from_scratch(struct i915_gem_context *ctx,
 		*cmd++ = result;
 	}
 	*cmd = MI_BATCH_BUFFER_END;
-	i915_gem_object_unpin_map(obj);
 
-	err = i915_gem_object_set_to_gtt_domain(obj, false);
-	if (err)
-		goto err;
+	i915_gem_object_flush_map(obj);
+	i915_gem_object_unpin_map(obj);
 
 	vma = i915_vma_instance(obj, &ctx->ppgtt->vm, NULL);
 	if (IS_ERR(vma)) {
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_dmabuf.c b/drivers/gpu/drm/i915/selftests/i915_gem_dmabuf.c
index a7055b12e53c..2b943ee246c9 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_dmabuf.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_dmabuf.c
@@ -315,6 +315,7 @@ static int igt_dmabuf_export_kmap(void *arg)
 		goto err;
 	}
 	memset(ptr + PAGE_SIZE, 0xaa, PAGE_SIZE);
+	i915_gem_object_flush_map(obj);
 	i915_gem_object_unpin_map(obj);
 
 	ptr = dma_buf_kmap(dmabuf, 1);
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
index b270eab1cad1..9a9451846b33 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_evict.c
@@ -274,7 +274,7 @@ static int igt_evict_for_cache_color(void *arg)
 		err = PTR_ERR(obj);
 		goto cleanup;
 	}
-	i915_gem_object_set_cache_level(obj, I915_CACHE_LLC);
+	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
 	quirk_add(obj, &objects);
 
 	vma = i915_gem_object_ggtt_pin(obj, NULL, 0, 0,
@@ -290,7 +290,7 @@ static int igt_evict_for_cache_color(void *arg)
 		err = PTR_ERR(obj);
 		goto cleanup;
 	}
-	i915_gem_object_set_cache_level(obj, I915_CACHE_LLC);
+	i915_gem_object_set_cache_coherency(obj, I915_CACHE_LLC);
 	quirk_add(obj, &objects);
 
 	/* Neighbouring; same colour - should fit */
diff --git a/drivers/gpu/drm/i915/selftests/i915_request.c b/drivers/gpu/drm/i915/selftests/i915_request.c
index 3eb6a6b075ab..e6ffe2240126 100644
--- a/drivers/gpu/drm/i915/selftests/i915_request.c
+++ b/drivers/gpu/drm/i915/selftests/i915_request.c
@@ -619,13 +619,11 @@ static struct i915_vma *empty_batch(struct drm_i915_private *i915)
 	}
 
 	*cmd = MI_BATCH_BUFFER_END;
-	i915_gem_chipset_flush(i915);
 
+	__i915_gem_object_flush_map(obj, 0, 64);
 	i915_gem_object_unpin_map(obj);
 
-	err = i915_gem_object_set_to_gtt_domain(obj, false);
-	if (err)
-		goto err;
+	i915_gem_chipset_flush(i915);
 
 	vma = i915_vma_instance(obj, &i915->ggtt.vm, NULL);
 	if (IS_ERR(vma)) {
@@ -777,10 +775,6 @@ static struct i915_vma *recursive_batch(struct drm_i915_private *i915)
 	if (err)
 		goto err;
 
-	err = i915_gem_object_set_to_wc_domain(obj, true);
-	if (err)
-		goto err;
-
 	cmd = i915_gem_object_pin_map(obj, I915_MAP_WC);
 	if (IS_ERR(cmd)) {
 		err = PTR_ERR(cmd);
@@ -799,10 +793,12 @@ static struct i915_vma *recursive_batch(struct drm_i915_private *i915)
 		*cmd++ = lower_32_bits(vma->node.start);
 	}
 	*cmd++ = MI_BATCH_BUFFER_END; /* terminate early in case of error */
-	i915_gem_chipset_flush(i915);
 
+	__i915_gem_object_flush_map(obj, 0, 64);
 	i915_gem_object_unpin_map(obj);
 
+	i915_gem_chipset_flush(i915);
+
 	return vma;
 
 err:
diff --git a/drivers/gpu/drm/i915/selftests/igt_spinner.c b/drivers/gpu/drm/i915/selftests/igt_spinner.c
index d0b93a3fbc54..16890dfe74c0 100644
--- a/drivers/gpu/drm/i915/selftests/igt_spinner.c
+++ b/drivers/gpu/drm/i915/selftests/igt_spinner.c
@@ -29,7 +29,7 @@ int igt_spinner_init(struct igt_spinner *spin, struct drm_i915_private *i915)
 		goto err_hws;
 	}
 
-	i915_gem_object_set_cache_level(spin->hws, I915_CACHE_LLC);
+	i915_gem_object_set_cache_coherency(spin->hws, I915_CACHE_LLC);
 	vaddr = i915_gem_object_pin_map(spin->hws, I915_MAP_WB);
 	if (IS_ERR(vaddr)) {
 		err = PTR_ERR(vaddr);
diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
index b5e35b2a925f..76b4fa150f2e 100644
--- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
@@ -70,7 +70,7 @@ static int hang_init(struct hang *h, struct drm_i915_private *i915)
 		goto err_hws;
 	}
 
-	i915_gem_object_set_cache_level(h->hws, I915_CACHE_LLC);
+	i915_gem_object_set_cache_coherency(h->hws, I915_CACHE_LLC);
 	vaddr = i915_gem_object_pin_map(h->hws, I915_MAP_WB);
 	if (IS_ERR(vaddr)) {
 		err = PTR_ERR(vaddr);
diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
index d61520ea03c1..9e871eb0bfb1 100644
--- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
+++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
@@ -1018,12 +1018,9 @@ static int live_preempt_smoke(void *arg)
 	for (n = 0; n < PAGE_SIZE / sizeof(*cs) - 1; n++)
 		cs[n] = MI_ARB_CHECK;
 	cs[n] = MI_BATCH_BUFFER_END;
+	i915_gem_object_flush_map(smoke.batch);
 	i915_gem_object_unpin_map(smoke.batch);
 
-	err = i915_gem_object_set_to_gtt_domain(smoke.batch, false);
-	if (err)
-		goto err_batch;
-
 	for (n = 0; n < smoke.ncontext; n++) {
 		smoke.contexts[n] = kernel_context(smoke.i915);
 		if (!smoke.contexts[n])
diff --git a/drivers/gpu/drm/i915/selftests/intel_workarounds.c b/drivers/gpu/drm/i915/selftests/intel_workarounds.c
index f2a2b51a4662..3baed59008d7 100644
--- a/drivers/gpu/drm/i915/selftests/intel_workarounds.c
+++ b/drivers/gpu/drm/i915/selftests/intel_workarounds.c
@@ -90,6 +90,7 @@ read_nonprivs(struct i915_gem_context *ctx, struct intel_engine_cs *engine)
 		goto err_obj;
 	}
 	memset(cs, 0xc5, PAGE_SIZE);
+	i915_gem_object_flush_map(result);
 	i915_gem_object_unpin_map(result);
 
 	vma = i915_vma_instance(result, &engine->i915->ggtt.vm, NULL);
@@ -358,6 +359,7 @@ static struct i915_vma *create_scratch(struct i915_gem_context *ctx)
 		goto err_obj;
 	}
 	memset(ptr, 0xc5, PAGE_SIZE);
+	i915_gem_object_flush_map(obj);
 	i915_gem_object_unpin_map(obj);
 
 	vma = i915_vma_instance(obj, &ctx->ppgtt->vm, NULL);
@@ -551,6 +553,7 @@ static int check_dirty_whitelist(struct i915_gem_context *ctx,
 
 		*cs++ = MI_BATCH_BUFFER_END;
 
+		i915_gem_object_flush_map(batch->obj);
 		i915_gem_object_unpin_map(batch->obj);
 		i915_gem_chipset_flush(ctx->i915);
 
-- 
2.20.1



* [PATCH 03/18] drm/i915: Move intel_engine_mask_t around for use by i915_request_types.h
  2019-03-19 11:57 [PATCH 01/18] drm/i915/selftests: Provide stub reset functions Chris Wilson
@ 2019-03-19 11:57 ` Chris Wilson
From: Chris Wilson @ 2019-03-19 11:57 UTC (permalink / raw)
  To: intel-gfx

We want to use intel_engine_mask_t inside i915_request.h, which means
extracting it from the general header file mess and placing it inside a
types.h. A knock-on effect is that the compiler wants to warn about the
type-contraction of ALL_ENGINES into intel_engine_mask_t, so prepare for
the worst.
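
One way to avoid that warning is to give the mask an explicit type and
define ALL_ENGINES in terms of it; roughly (a sketch, the exact
definition that lands in intel_engine_types.h may differ):

	typedef u8 intel_engine_mask_t;
	#define ALL_ENGINES ((intel_engine_mask_t)~0ul)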

v2: Use intel_engine_mask_t consistently

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Mika Kuoppala <mika.kuoppala@linux.intel.com>
---
 drivers/gpu/drm/i915/Makefile                 |  1 +
 drivers/gpu/drm/i915/gvt/execlist.c           | 11 ++-
 drivers/gpu/drm/i915/gvt/execlist.h           |  2 +-
 drivers/gpu/drm/i915/gvt/gvt.h                |  8 +-
 drivers/gpu/drm/i915/gvt/handlers.c           |  2 +-
 drivers/gpu/drm/i915/gvt/scheduler.c          |  8 +-
 drivers/gpu/drm/i915/gvt/scheduler.h          |  6 +-
 drivers/gpu/drm/i915/gvt/vgpu.c               |  4 +-
 drivers/gpu/drm/i915/i915_debugfs.c           |  2 +-
 drivers/gpu/drm/i915/i915_drv.h               |  1 -
 drivers/gpu/drm/i915/i915_gem_context.c       |  6 +-
 drivers/gpu/drm/i915/i915_gem_context.h       |  2 +-
 drivers/gpu/drm/i915/i915_gem_gtt.h           |  2 +-
 drivers/gpu/drm/i915/i915_gpu_error.c         |  9 +-
 drivers/gpu/drm/i915/i915_gpu_error.h         |  2 +-
 drivers/gpu/drm/i915/i915_reset.c             | 43 ++++----
 drivers/gpu/drm/i915/i915_reset.h             |  9 +-
 drivers/gpu/drm/i915/i915_scheduler.h         | 86 +---------------
 drivers/gpu/drm/i915/i915_scheduler_types.h   | 98 +++++++++++++++++++
 drivers/gpu/drm/i915/i915_timeline.h          |  1 +
 drivers/gpu/drm/i915/i915_timeline_types.h    |  3 +-
 drivers/gpu/drm/i915/intel_device_info.h      |  3 +-
 drivers/gpu/drm/i915/intel_engine_types.h     |  8 +-
 drivers/gpu/drm/i915/intel_guc_submission.c   |  2 +-
 drivers/gpu/drm/i915/intel_hangcheck.c        |  2 +-
 .../gpu/drm/i915/selftests/i915_gem_context.c |  8 +-
 .../gpu/drm/i915/selftests/intel_hangcheck.c  |  3 +-
 .../test_i915_scheduler_types_standalone.c    |  7 ++
 28 files changed, 189 insertions(+), 150 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_scheduler_types.h
 create mode 100644 drivers/gpu/drm/i915/test_i915_scheduler_types_standalone.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 68fecf355471..197b081769b5 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -60,6 +60,7 @@ i915-$(CONFIG_PERF_EVENTS) += i915_pmu.o
 i915-$(CONFIG_DRM_I915_WERROR) += \
 	test_i915_active_types_standalone.o \
 	test_i915_gem_context_types_standalone.o \
+	test_i915_scheduler_types_standalone.o \
 	test_i915_timeline_types_standalone.o \
 	test_intel_context_types_standalone.o \
 	test_intel_engine_types_standalone.o \
diff --git a/drivers/gpu/drm/i915/gvt/execlist.c b/drivers/gpu/drm/i915/gvt/execlist.c
index 1a93472cb34e..f21b8fb5b37e 100644
--- a/drivers/gpu/drm/i915/gvt/execlist.c
+++ b/drivers/gpu/drm/i915/gvt/execlist.c
@@ -526,12 +526,13 @@ static void init_vgpu_execlist(struct intel_vgpu *vgpu, int ring_id)
 	vgpu_vreg(vgpu, ctx_status_ptr_reg) = ctx_status_ptr.dw;
 }
 
-static void clean_execlist(struct intel_vgpu *vgpu, unsigned long engine_mask)
+static void clean_execlist(struct intel_vgpu *vgpu,
+			   intel_engine_mask_t engine_mask)
 {
-	unsigned int tmp;
 	struct drm_i915_private *dev_priv = vgpu->gvt->dev_priv;
 	struct intel_engine_cs *engine;
 	struct intel_vgpu_submission *s = &vgpu->submission;
+	intel_engine_mask_t tmp;
 
 	for_each_engine_masked(engine, dev_priv, engine_mask, tmp) {
 		kfree(s->ring_scan_buffer[engine->id]);
@@ -541,18 +542,18 @@ static void clean_execlist(struct intel_vgpu *vgpu, unsigned long engine_mask)
 }
 
 static void reset_execlist(struct intel_vgpu *vgpu,
-		unsigned long engine_mask)
+			   intel_engine_mask_t engine_mask)
 {
 	struct drm_i915_private *dev_priv = vgpu->gvt->dev_priv;
 	struct intel_engine_cs *engine;
-	unsigned int tmp;
+	intel_engine_mask_t tmp;
 
 	for_each_engine_masked(engine, dev_priv, engine_mask, tmp)
 		init_vgpu_execlist(vgpu, engine->id);
 }
 
 static int init_execlist(struct intel_vgpu *vgpu,
-			 unsigned long engine_mask)
+			 intel_engine_mask_t engine_mask)
 {
 	reset_execlist(vgpu, engine_mask);
 	return 0;
diff --git a/drivers/gpu/drm/i915/gvt/execlist.h b/drivers/gpu/drm/i915/gvt/execlist.h
index 714d709829a2..5ccc2c695848 100644
--- a/drivers/gpu/drm/i915/gvt/execlist.h
+++ b/drivers/gpu/drm/i915/gvt/execlist.h
@@ -180,6 +180,6 @@ int intel_vgpu_init_execlist(struct intel_vgpu *vgpu);
 int intel_vgpu_submit_execlist(struct intel_vgpu *vgpu, int ring_id);
 
 void intel_vgpu_reset_execlist(struct intel_vgpu *vgpu,
-		unsigned long engine_mask);
+			       intel_engine_mask_t engine_mask);
 
 #endif /*_GVT_EXECLIST_H_*/
diff --git a/drivers/gpu/drm/i915/gvt/gvt.h b/drivers/gpu/drm/i915/gvt/gvt.h
index 8bce09de4b82..7a4e1a6387e5 100644
--- a/drivers/gpu/drm/i915/gvt/gvt.h
+++ b/drivers/gpu/drm/i915/gvt/gvt.h
@@ -144,9 +144,9 @@ enum {
 
 struct intel_vgpu_submission_ops {
 	const char *name;
-	int (*init)(struct intel_vgpu *vgpu, unsigned long engine_mask);
-	void (*clean)(struct intel_vgpu *vgpu, unsigned long engine_mask);
-	void (*reset)(struct intel_vgpu *vgpu, unsigned long engine_mask);
+	int (*init)(struct intel_vgpu *vgpu, intel_engine_mask_t engine_mask);
+	void (*clean)(struct intel_vgpu *vgpu, intel_engine_mask_t engine_mask);
+	void (*reset)(struct intel_vgpu *vgpu, intel_engine_mask_t engine_mask);
 };
 
 struct intel_vgpu_submission {
@@ -488,7 +488,7 @@ struct intel_vgpu *intel_gvt_create_vgpu(struct intel_gvt *gvt,
 void intel_gvt_destroy_vgpu(struct intel_vgpu *vgpu);
 void intel_gvt_release_vgpu(struct intel_vgpu *vgpu);
 void intel_gvt_reset_vgpu_locked(struct intel_vgpu *vgpu, bool dmlr,
-				 unsigned int engine_mask);
+				 intel_engine_mask_t engine_mask);
 void intel_gvt_reset_vgpu(struct intel_vgpu *vgpu);
 void intel_gvt_activate_vgpu(struct intel_vgpu *vgpu);
 void intel_gvt_deactivate_vgpu(struct intel_vgpu *vgpu);
diff --git a/drivers/gpu/drm/i915/gvt/handlers.c b/drivers/gpu/drm/i915/gvt/handlers.c
index b596cb42e24e..5d44db21acc4 100644
--- a/drivers/gpu/drm/i915/gvt/handlers.c
+++ b/drivers/gpu/drm/i915/gvt/handlers.c
@@ -311,7 +311,7 @@ static int mul_force_wake_write(struct intel_vgpu *vgpu,
 static int gdrst_mmio_write(struct intel_vgpu *vgpu, unsigned int offset,
 			    void *p_data, unsigned int bytes)
 {
-	unsigned int engine_mask = 0;
+	intel_engine_mask_t engine_mask = 0;
 	u32 data;
 
 	write_vreg(vgpu, offset, p_data, bytes);
diff --git a/drivers/gpu/drm/i915/gvt/scheduler.c b/drivers/gpu/drm/i915/gvt/scheduler.c
index 7550e09939ae..7459162b38c9 100644
--- a/drivers/gpu/drm/i915/gvt/scheduler.c
+++ b/drivers/gpu/drm/i915/gvt/scheduler.c
@@ -838,13 +838,13 @@ static void update_guest_context(struct intel_vgpu_workload *workload)
 }
 
 void intel_vgpu_clean_workloads(struct intel_vgpu *vgpu,
-				unsigned long engine_mask)
+				intel_engine_mask_t engine_mask)
 {
 	struct intel_vgpu_submission *s = &vgpu->submission;
 	struct drm_i915_private *dev_priv = vgpu->gvt->dev_priv;
 	struct intel_engine_cs *engine;
 	struct intel_vgpu_workload *pos, *n;
-	unsigned int tmp;
+	intel_engine_mask_t tmp;
 
 	/* free the unsubmited workloads in the queues. */
 	for_each_engine_masked(engine, dev_priv, engine_mask, tmp) {
@@ -1137,7 +1137,7 @@ void intel_vgpu_clean_submission(struct intel_vgpu *vgpu)
  *
  */
 void intel_vgpu_reset_submission(struct intel_vgpu *vgpu,
-		unsigned long engine_mask)
+				 intel_engine_mask_t engine_mask)
 {
 	struct intel_vgpu_submission *s = &vgpu->submission;
 
@@ -1227,7 +1227,7 @@ int intel_vgpu_setup_submission(struct intel_vgpu *vgpu)
  *
  */
 int intel_vgpu_select_submission_ops(struct intel_vgpu *vgpu,
-				     unsigned long engine_mask,
+				     intel_engine_mask_t engine_mask,
 				     unsigned int interface)
 {
 	struct intel_vgpu_submission *s = &vgpu->submission;
diff --git a/drivers/gpu/drm/i915/gvt/scheduler.h b/drivers/gpu/drm/i915/gvt/scheduler.h
index 0635b2c4bed7..90c6756f5453 100644
--- a/drivers/gpu/drm/i915/gvt/scheduler.h
+++ b/drivers/gpu/drm/i915/gvt/scheduler.h
@@ -142,12 +142,12 @@ void intel_gvt_wait_vgpu_idle(struct intel_vgpu *vgpu);
 int intel_vgpu_setup_submission(struct intel_vgpu *vgpu);
 
 void intel_vgpu_reset_submission(struct intel_vgpu *vgpu,
-				 unsigned long engine_mask);
+				 intel_engine_mask_t engine_mask);
 
 void intel_vgpu_clean_submission(struct intel_vgpu *vgpu);
 
 int intel_vgpu_select_submission_ops(struct intel_vgpu *vgpu,
-				     unsigned long engine_mask,
+				     intel_engine_mask_t engine_mask,
 				     unsigned int interface);
 
 extern const struct intel_vgpu_submission_ops
@@ -160,6 +160,6 @@ intel_vgpu_create_workload(struct intel_vgpu *vgpu, int ring_id,
 void intel_vgpu_destroy_workload(struct intel_vgpu_workload *workload);
 
 void intel_vgpu_clean_workloads(struct intel_vgpu *vgpu,
-				unsigned long engine_mask);
+				intel_engine_mask_t engine_mask);
 
 #endif
diff --git a/drivers/gpu/drm/i915/gvt/vgpu.c b/drivers/gpu/drm/i915/gvt/vgpu.c
index 314e40121e47..44ce3c2b9ac1 100644
--- a/drivers/gpu/drm/i915/gvt/vgpu.c
+++ b/drivers/gpu/drm/i915/gvt/vgpu.c
@@ -526,11 +526,11 @@ struct intel_vgpu *intel_gvt_create_vgpu(struct intel_gvt *gvt,
  * GPU engines. For FLR, engine_mask is ignored.
  */
 void intel_gvt_reset_vgpu_locked(struct intel_vgpu *vgpu, bool dmlr,
-				 unsigned int engine_mask)
+				 intel_engine_mask_t engine_mask)
 {
 	struct intel_gvt *gvt = vgpu->gvt;
 	struct intel_gvt_workload_scheduler *scheduler = &gvt->scheduler;
-	unsigned int resetting_eng = dmlr ? ALL_ENGINES : engine_mask;
+	intel_engine_mask_t resetting_eng = dmlr ? ALL_ENGINES : engine_mask;
 
 	gvt_dbg_core("------------------------------------------\n");
 	gvt_dbg_core("resseting vgpu%d, dmlr %d, engine_mask %08x\n",
diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 08683dca7775..f4a07190a0e8 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -2245,7 +2245,7 @@ static int i915_guc_stage_pool(struct seq_file *m, void *data)
 	const struct intel_guc *guc = &dev_priv->guc;
 	struct guc_stage_desc *desc = guc->stage_desc_pool_vaddr;
 	struct intel_guc_client *client = guc->execbuf_client;
-	unsigned int tmp;
+	intel_engine_mask_t tmp;
 	int index;
 
 	if (!USES_GUC_SUBMISSION(dev_priv))
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 395aa9d5ba02..86080a6e0f45 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2432,7 +2432,6 @@ static inline unsigned int i915_sg_segment_size(void)
 #define IS_GEN9_LP(dev_priv)	(IS_GEN(dev_priv, 9) && IS_LP(dev_priv))
 #define IS_GEN9_BC(dev_priv)	(IS_GEN(dev_priv, 9) && !IS_LP(dev_priv))
 
-#define ALL_ENGINES	(~0u)
 #define HAS_ENGINE(dev_priv, id) (INTEL_INFO(dev_priv)->engine_mask & BIT(id))
 
 #define HAS_LLC(dev_priv)	(INTEL_INFO(dev_priv)->has_llc)
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index d776d43707e0..2fa24326307a 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -694,9 +694,9 @@ static void cb_retire(struct i915_active *base)
 	kfree(cb);
 }
 
-I915_SELFTEST_DECLARE(static unsigned long context_barrier_inject_fault);
+I915_SELFTEST_DECLARE(static intel_engine_mask_t context_barrier_inject_fault);
 static int context_barrier_task(struct i915_gem_context *ctx,
-				unsigned long engines,
+				intel_engine_mask_t engines,
 				void (*task)(void *data),
 				void *data)
 {
@@ -752,7 +752,7 @@ static int context_barrier_task(struct i915_gem_context *ctx,
 }
 
 int i915_gem_switch_to_kernel_context(struct drm_i915_private *i915,
-				      unsigned long mask)
+				      intel_engine_mask_t mask)
 {
 	struct intel_engine_cs *engine;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index 5a32c4b4816f..8a1377691d6d 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -147,7 +147,7 @@ void i915_gem_context_close(struct drm_file *file);
 
 int i915_switch_context(struct i915_request *rq);
 int i915_gem_switch_to_kernel_context(struct drm_i915_private *i915,
-				      unsigned long engine_mask);
+				      intel_engine_mask_t engine_mask);
 
 void i915_gem_context_release(struct kref *ctx_ref);
 struct i915_gem_context *
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 35f21a2ae36c..47a54fbd30bf 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -390,7 +390,7 @@ struct i915_hw_ppgtt {
 	struct i915_address_space vm;
 	struct kref ref;
 
-	unsigned long pd_dirty_engines;
+	intel_engine_mask_t pd_dirty_engines;
 	union {
 		struct i915_pml4 pml4;		/* GEN8+ & 48b PPGTT */
 		struct i915_page_directory_pointer pdp;	/* GEN8+ */
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index 26bac517e383..e8674347f589 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -1095,7 +1095,7 @@ static u32 capture_error_bo(struct drm_i915_error_buffer *err,
  * It's only a small step better than a random number in its current form.
  */
 static u32 i915_error_generate_code(struct i915_gpu_state *error,
-				    unsigned long engine_mask)
+				    intel_engine_mask_t engine_mask)
 {
 	/*
 	 * IPEHR would be an ideal way to detect errors, as it's the gross
@@ -1641,7 +1641,8 @@ static void capture_reg_state(struct i915_gpu_state *error)
 }
 
 static const char *
-error_msg(struct i915_gpu_state *error, unsigned long engines, const char *msg)
+error_msg(struct i915_gpu_state *error,
+	  intel_engine_mask_t engines, const char *msg)
 {
 	int len;
 	int i;
@@ -1651,7 +1652,7 @@ error_msg(struct i915_gpu_state *error, unsigned long engines, const char *msg)
 			engines &= ~BIT(i);
 
 	len = scnprintf(error->error_msg, sizeof(error->error_msg),
-			"GPU HANG: ecode %d:%lx:0x%08x",
+			"GPU HANG: ecode %d:%x:0x%08x",
 			INTEL_GEN(error->i915), engines,
 			i915_error_generate_code(error, engines));
 	if (engines) {
@@ -1790,7 +1791,7 @@ i915_capture_gpu_state(struct drm_i915_private *i915)
  * to pick up.
  */
 void i915_capture_error_state(struct drm_i915_private *i915,
-			      unsigned long engine_mask,
+			      intel_engine_mask_t engine_mask,
 			      const char *msg)
 {
 	static bool warned;
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
index 99d6b7b270c2..d011cb90bee1 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.h
+++ b/drivers/gpu/drm/i915/i915_gpu_error.h
@@ -264,7 +264,7 @@ void i915_error_printf(struct drm_i915_error_state_buf *e, const char *f, ...);
 
 struct i915_gpu_state *i915_capture_gpu_state(struct drm_i915_private *i915);
 void i915_capture_error_state(struct drm_i915_private *dev_priv,
-			      unsigned long engine_mask,
+			      intel_engine_mask_t engine_mask,
 			      const char *error_msg);
 
 static inline struct i915_gpu_state *
diff --git a/drivers/gpu/drm/i915/i915_reset.c b/drivers/gpu/drm/i915/i915_reset.c
index fb86d5ca5d8b..9e0347681474 100644
--- a/drivers/gpu/drm/i915/i915_reset.c
+++ b/drivers/gpu/drm/i915/i915_reset.c
@@ -144,15 +144,15 @@ static void gen3_stop_engine(struct intel_engine_cs *engine)
 }
 
 static void i915_stop_engines(struct drm_i915_private *i915,
-			      unsigned int engine_mask)
+			      intel_engine_mask_t engine_mask)
 {
 	struct intel_engine_cs *engine;
-	enum intel_engine_id id;
+	intel_engine_mask_t tmp;
 
 	if (INTEL_GEN(i915) < 3)
 		return;
 
-	for_each_engine_masked(engine, i915, engine_mask, id)
+	for_each_engine_masked(engine, i915, engine_mask, tmp)
 		gen3_stop_engine(engine);
 }
 
@@ -165,7 +165,7 @@ static bool i915_in_reset(struct pci_dev *pdev)
 }
 
 static int i915_do_reset(struct drm_i915_private *i915,
-			 unsigned int engine_mask,
+			 intel_engine_mask_t engine_mask,
 			 unsigned int retry)
 {
 	struct pci_dev *pdev = i915->drm.pdev;
@@ -194,7 +194,7 @@ static bool g4x_reset_complete(struct pci_dev *pdev)
 }
 
 static int g33_do_reset(struct drm_i915_private *i915,
-			unsigned int engine_mask,
+			intel_engine_mask_t engine_mask,
 			unsigned int retry)
 {
 	struct pci_dev *pdev = i915->drm.pdev;
@@ -204,7 +204,7 @@ static int g33_do_reset(struct drm_i915_private *i915,
 }
 
 static int g4x_do_reset(struct drm_i915_private *dev_priv,
-			unsigned int engine_mask,
+			intel_engine_mask_t engine_mask,
 			unsigned int retry)
 {
 	struct pci_dev *pdev = dev_priv->drm.pdev;
@@ -242,7 +242,7 @@ static int g4x_do_reset(struct drm_i915_private *dev_priv,
 }
 
 static int ironlake_do_reset(struct drm_i915_private *dev_priv,
-			     unsigned int engine_mask,
+			     intel_engine_mask_t engine_mask,
 			     unsigned int retry)
 {
 	int ret;
@@ -299,7 +299,7 @@ static int gen6_hw_domain_reset(struct drm_i915_private *dev_priv,
 }
 
 static int gen6_reset_engines(struct drm_i915_private *i915,
-			      unsigned int engine_mask,
+			      intel_engine_mask_t engine_mask,
 			      unsigned int retry)
 {
 	struct intel_engine_cs *engine;
@@ -315,7 +315,7 @@ static int gen6_reset_engines(struct drm_i915_private *i915,
 	if (engine_mask == ALL_ENGINES) {
 		hw_mask = GEN6_GRDOM_FULL;
 	} else {
-		unsigned int tmp;
+		intel_engine_mask_t tmp;
 
 		hw_mask = 0;
 		for_each_engine_masked(engine, i915, engine_mask, tmp) {
@@ -425,7 +425,7 @@ static void gen11_unlock_sfc(struct drm_i915_private *dev_priv,
 }
 
 static int gen11_reset_engines(struct drm_i915_private *i915,
-			       unsigned int engine_mask,
+			       intel_engine_mask_t engine_mask,
 			       unsigned int retry)
 {
 	const u32 hw_engine_mask[] = {
@@ -439,7 +439,7 @@ static int gen11_reset_engines(struct drm_i915_private *i915,
 		[VECS1] = GEN11_GRDOM_VECS2,
 	};
 	struct intel_engine_cs *engine;
-	unsigned int tmp;
+	intel_engine_mask_t tmp;
 	u32 hw_mask;
 	int ret;
 
@@ -492,12 +492,12 @@ static void gen8_engine_reset_cancel(struct intel_engine_cs *engine)
 }
 
 static int gen8_reset_engines(struct drm_i915_private *i915,
-			      unsigned int engine_mask,
+			      intel_engine_mask_t engine_mask,
 			      unsigned int retry)
 {
 	struct intel_engine_cs *engine;
 	const bool reset_non_ready = retry >= 1;
-	unsigned int tmp;
+	intel_engine_mask_t tmp;
 	int ret;
 
 	for_each_engine_masked(engine, i915, engine_mask, tmp) {
@@ -533,7 +533,7 @@ static int gen8_reset_engines(struct drm_i915_private *i915,
 }
 
 typedef int (*reset_func)(struct drm_i915_private *,
-			  unsigned int engine_mask,
+			  intel_engine_mask_t engine_mask,
 			  unsigned int retry);
 
 static reset_func intel_get_gpu_reset(struct drm_i915_private *i915)
@@ -554,7 +554,8 @@ static reset_func intel_get_gpu_reset(struct drm_i915_private *i915)
 		return NULL;
 }
 
-int intel_gpu_reset(struct drm_i915_private *i915, unsigned int engine_mask)
+int intel_gpu_reset(struct drm_i915_private *i915,
+		    intel_engine_mask_t engine_mask)
 {
 	const int retries = engine_mask == ALL_ENGINES ? RESET_MAX_RETRIES : 1;
 	reset_func reset;
@@ -688,7 +689,8 @@ static void gt_revoke(struct drm_i915_private *i915)
 	revoke_mmaps(i915);
 }
 
-static int gt_reset(struct drm_i915_private *i915, unsigned int stalled_mask)
+static int gt_reset(struct drm_i915_private *i915,
+		    intel_engine_mask_t stalled_mask)
 {
 	struct intel_engine_cs *engine;
 	enum intel_engine_id id;
@@ -947,7 +949,8 @@ bool i915_gem_unset_wedged(struct drm_i915_private *i915)
 	return result;
 }
 
-static int do_reset(struct drm_i915_private *i915, unsigned int stalled_mask)
+static int do_reset(struct drm_i915_private *i915,
+		    intel_engine_mask_t stalled_mask)
 {
 	int err, i;
 
@@ -982,7 +985,7 @@ static int do_reset(struct drm_i915_private *i915, unsigned int stalled_mask)
  *   - re-init display
  */
 void i915_reset(struct drm_i915_private *i915,
-		unsigned int stalled_mask,
+		intel_engine_mask_t stalled_mask,
 		const char *reason)
 {
 	struct i915_gpu_error *error = &i915->gpu_error;
@@ -1224,14 +1227,14 @@ void i915_clear_error_registers(struct drm_i915_private *dev_priv)
  * of a ring dump etc.).
  */
 void i915_handle_error(struct drm_i915_private *i915,
-		       u32 engine_mask,
+		       intel_engine_mask_t engine_mask,
 		       unsigned long flags,
 		       const char *fmt, ...)
 {
 	struct i915_gpu_error *error = &i915->gpu_error;
 	struct intel_engine_cs *engine;
 	intel_wakeref_t wakeref;
-	unsigned int tmp;
+	intel_engine_mask_t tmp;
 	char error_msg[80];
 	char *msg = NULL;
 
diff --git a/drivers/gpu/drm/i915/i915_reset.h b/drivers/gpu/drm/i915/i915_reset.h
index 16f2389f656f..86b1ac8116ce 100644
--- a/drivers/gpu/drm/i915/i915_reset.h
+++ b/drivers/gpu/drm/i915/i915_reset.h
@@ -11,13 +11,15 @@
 #include <linux/types.h>
 #include <linux/srcu.h>
 
+#include "intel_engine_types.h"
+
 struct drm_i915_private;
 struct intel_engine_cs;
 struct intel_guc;
 
 __printf(4, 5)
 void i915_handle_error(struct drm_i915_private *i915,
-		       u32 engine_mask,
+		       intel_engine_mask_t engine_mask,
 		       unsigned long flags,
 		       const char *fmt, ...);
 #define I915_ERROR_CAPTURE BIT(0)
@@ -25,7 +27,7 @@ void i915_handle_error(struct drm_i915_private *i915,
 void i915_clear_error_registers(struct drm_i915_private *i915);
 
 void i915_reset(struct drm_i915_private *i915,
-		unsigned int stalled_mask,
+		intel_engine_mask_t stalled_mask,
 		const char *reason);
 int i915_reset_engine(struct intel_engine_cs *engine,
 		      const char *reason);
@@ -41,7 +43,8 @@ int i915_terminally_wedged(struct drm_i915_private *i915);
 bool intel_has_gpu_reset(struct drm_i915_private *i915);
 bool intel_has_reset_engine(struct drm_i915_private *i915);
 
-int intel_gpu_reset(struct drm_i915_private *i915, u32 engine_mask);
+int intel_gpu_reset(struct drm_i915_private *i915,
+		    intel_engine_mask_t engine_mask);
 
 int intel_reset_guc(struct drm_i915_private *i915);
 
diff --git a/drivers/gpu/drm/i915/i915_scheduler.h b/drivers/gpu/drm/i915/i915_scheduler.h
index 9a1d257f3d6e..07d243acf553 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.h
+++ b/drivers/gpu/drm/i915/i915_scheduler.h
@@ -8,92 +8,10 @@
 #define _I915_SCHEDULER_H_
 
 #include <linux/bitops.h>
+#include <linux/list.h>
 #include <linux/kernel.h>
 
-#include <uapi/drm/i915_drm.h>
-
-struct drm_i915_private;
-struct i915_request;
-struct intel_engine_cs;
-
-enum {
-	I915_PRIORITY_MIN = I915_CONTEXT_MIN_USER_PRIORITY - 1,
-	I915_PRIORITY_NORMAL = I915_CONTEXT_DEFAULT_PRIORITY,
-	I915_PRIORITY_MAX = I915_CONTEXT_MAX_USER_PRIORITY + 1,
-
-	I915_PRIORITY_INVALID = INT_MIN
-};
-
-#define I915_USER_PRIORITY_SHIFT 3
-#define I915_USER_PRIORITY(x) ((x) << I915_USER_PRIORITY_SHIFT)
-
-#define I915_PRIORITY_COUNT BIT(I915_USER_PRIORITY_SHIFT)
-#define I915_PRIORITY_MASK (I915_PRIORITY_COUNT - 1)
-
-#define I915_PRIORITY_WAIT		((u8)BIT(0))
-#define I915_PRIORITY_NEWCLIENT		((u8)BIT(1))
-#define I915_PRIORITY_NOSEMAPHORE	((u8)BIT(2))
-
-#define __NO_PREEMPTION (I915_PRIORITY_WAIT)
-
-struct i915_sched_attr {
-	/**
-	 * @priority: execution and service priority
-	 *
-	 * All clients are equal, but some are more equal than others!
-	 *
-	 * Requests from a context with a greater (more positive) value of
-	 * @priority will be executed before those with a lower @priority
-	 * value, forming a simple QoS.
-	 *
-	 * The &drm_i915_private.kernel_context is assigned the lowest priority.
-	 */
-	int priority;
-};
-
-/*
- * "People assume that time is a strict progression of cause to effect, but
- * actually, from a nonlinear, non-subjective viewpoint, it's more like a big
- * ball of wibbly-wobbly, timey-wimey ... stuff." -The Doctor, 2015
- *
- * Requests exist in a complex web of interdependencies. Each request
- * has to wait for some other request to complete before it is ready to be run
- * (e.g. we have to wait until the pixels have been rendering into a texture
- * before we can copy from it). We track the readiness of a request in terms
- * of fences, but we also need to keep the dependency tree for the lifetime
- * of the request (beyond the life of an individual fence). We use the tree
- * at various points to reorder the requests whilst keeping the requests
- * in order with respect to their various dependencies.
- *
- * There is no active component to the "scheduler". As we know the dependency
- * DAG of each request, we are able to insert it into a sorted queue when it
- * is ready, and are able to reorder its portion of the graph to accommodate
- * dynamic priority changes.
- */
-struct i915_sched_node {
-	struct list_head signalers_list; /* those before us, we depend upon */
-	struct list_head waiters_list; /* those after us, they depend upon us */
-	struct list_head link;
-	struct i915_sched_attr attr;
-	unsigned int flags;
-#define I915_SCHED_HAS_SEMAPHORE	BIT(0)
-};
-
-struct i915_dependency {
-	struct i915_sched_node *signaler;
-	struct list_head signal_link;
-	struct list_head wait_link;
-	struct list_head dfs_link;
-	unsigned long flags;
-#define I915_DEPENDENCY_ALLOC BIT(0)
-};
-
-struct i915_priolist {
-	struct list_head requests[I915_PRIORITY_COUNT];
-	struct rb_node node;
-	unsigned long used;
-	int priority;
-};
+#include "i915_scheduler_types.h"
 
 #define priolist_for_each_request(it, plist, idx) \
 	for (idx = 0; idx < ARRAY_SIZE((plist)->requests); idx++) \
diff --git a/drivers/gpu/drm/i915/i915_scheduler_types.h b/drivers/gpu/drm/i915/i915_scheduler_types.h
new file mode 100644
index 000000000000..5c94b3eb5c81
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_scheduler_types.h
@@ -0,0 +1,98 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2018 Intel Corporation
+ */
+
+#ifndef _I915_SCHEDULER_TYPES_H_
+#define _I915_SCHEDULER_TYPES_H_
+
+#include <linux/list.h>
+#include <linux/rbtree.h>
+
+#include <uapi/drm/i915_drm.h>
+
+struct drm_i915_private;
+struct i915_request;
+struct intel_engine_cs;
+
+enum {
+	I915_PRIORITY_MIN = I915_CONTEXT_MIN_USER_PRIORITY - 1,
+	I915_PRIORITY_NORMAL = I915_CONTEXT_DEFAULT_PRIORITY,
+	I915_PRIORITY_MAX = I915_CONTEXT_MAX_USER_PRIORITY + 1,
+
+	I915_PRIORITY_INVALID = INT_MIN
+};
+
+#define I915_USER_PRIORITY_SHIFT 3
+#define I915_USER_PRIORITY(x) ((x) << I915_USER_PRIORITY_SHIFT)
+
+#define I915_PRIORITY_COUNT BIT(I915_USER_PRIORITY_SHIFT)
+#define I915_PRIORITY_MASK (I915_PRIORITY_COUNT - 1)
+
+#define I915_PRIORITY_WAIT		((u8)BIT(0))
+#define I915_PRIORITY_NEWCLIENT		((u8)BIT(1))
+#define I915_PRIORITY_NOSEMAPHORE	((u8)BIT(2))
+
+#define __NO_PREEMPTION (I915_PRIORITY_WAIT)
+
+struct i915_sched_attr {
+	/**
+	 * @priority: execution and service priority
+	 *
+	 * All clients are equal, but some are more equal than others!
+	 *
+	 * Requests from a context with a greater (more positive) value of
+	 * @priority will be executed before those with a lower @priority
+	 * value, forming a simple QoS.
+	 *
+	 * The &drm_i915_private.kernel_context is assigned the lowest priority.
+	 */
+	int priority;
+};
+
+/*
+ * "People assume that time is a strict progression of cause to effect, but
+ * actually, from a nonlinear, non-subjective viewpoint, it's more like a big
+ * ball of wibbly-wobbly, timey-wimey ... stuff." -The Doctor, 2015
+ *
+ * Requests exist in a complex web of interdependencies. Each request
+ * has to wait for some other request to complete before it is ready to be run
+ * (e.g. we have to wait until the pixels have been rendering into a texture
+ * before we can copy from it). We track the readiness of a request in terms
+ * of fences, but we also need to keep the dependency tree for the lifetime
+ * of the request (beyond the life of an individual fence). We use the tree
+ * at various points to reorder the requests whilst keeping the requests
+ * in order with respect to their various dependencies.
+ *
+ * There is no active component to the "scheduler". As we know the dependency
+ * DAG of each request, we are able to insert it into a sorted queue when it
+ * is ready, and are able to reorder its portion of the graph to accommodate
+ * dynamic priority changes.
+ */
+struct i915_sched_node {
+	struct list_head signalers_list; /* those before us, we depend upon */
+	struct list_head waiters_list; /* those after us, they depend upon us */
+	struct list_head link;
+	struct i915_sched_attr attr;
+	unsigned int flags;
+#define I915_SCHED_HAS_SEMAPHORE	BIT(0)
+};
+
+struct i915_dependency {
+	struct i915_sched_node *signaler;
+	struct list_head signal_link;
+	struct list_head wait_link;
+	struct list_head dfs_link;
+	unsigned long flags;
+#define I915_DEPENDENCY_ALLOC BIT(0)
+};
+
+struct i915_priolist {
+	struct list_head requests[I915_PRIORITY_COUNT];
+	struct rb_node node;
+	unsigned long used;
+	int priority;
+};
+
+#endif /* _I915_SCHEDULER_TYPES_H_ */
diff --git a/drivers/gpu/drm/i915/i915_timeline.h b/drivers/gpu/drm/i915/i915_timeline.h
index 9126c8206490..454aa72aee18 100644
--- a/drivers/gpu/drm/i915/i915_timeline.h
+++ b/drivers/gpu/drm/i915/i915_timeline.h
@@ -27,6 +27,7 @@
 
 #include <linux/lockdep.h>
 
+#include "i915_active.h"
 #include "i915_syncmap.h"
 #include "i915_timeline_types.h"
 
diff --git a/drivers/gpu/drm/i915/i915_timeline_types.h b/drivers/gpu/drm/i915/i915_timeline_types.h
index 8ff146dc05ba..d42053544d7c 100644
--- a/drivers/gpu/drm/i915/i915_timeline_types.h
+++ b/drivers/gpu/drm/i915/i915_timeline_types.h
@@ -9,9 +9,10 @@
 
 #include <linux/list.h>
 #include <linux/kref.h>
+#include <linux/mutex.h>
 #include <linux/types.h>
 
-#include "i915_active.h"
+#include "i915_active_types.h"
 
 struct drm_i915_private;
 struct i915_vma;
diff --git a/drivers/gpu/drm/i915/intel_device_info.h b/drivers/gpu/drm/i915/intel_device_info.h
index 6234570a9b17..d20c33a10c11 100644
--- a/drivers/gpu/drm/i915/intel_device_info.h
+++ b/drivers/gpu/drm/i915/intel_device_info.h
@@ -27,6 +27,7 @@
 
 #include <uapi/drm/i915_drm.h>
 
+#include "intel_engine_types.h"
 #include "intel_display.h"
 
 struct drm_printer;
@@ -149,8 +150,6 @@ struct sseu_dev_info {
 	u8 eu_mask[GEN_MAX_SLICES * GEN_MAX_SUBSLICES];
 };
 
-typedef u8 intel_engine_mask_t;
-
 struct intel_device_info {
 	u16 gen_mask;
 
diff --git a/drivers/gpu/drm/i915/intel_engine_types.h b/drivers/gpu/drm/i915/intel_engine_types.h
index 88ed7ba8886f..cef1e71a7401 100644
--- a/drivers/gpu/drm/i915/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/intel_engine_types.h
@@ -13,8 +13,10 @@
 #include <linux/list.h>
 #include <linux/types.h>
 
+#include "i915_gem.h"
+#include "i915_scheduler_types.h"
+#include "i915_selftest.h"
 #include "i915_timeline_types.h"
-#include "intel_device_info.h"
 #include "intel_workarounds_types.h"
 
 #include "i915_gem_batch_pool.h"
@@ -25,11 +27,15 @@
 
 #define I915_CMD_HASH_ORDER 9
 
+struct dma_fence;
 struct drm_i915_reg_table;
 struct i915_gem_context;
 struct i915_request;
 struct i915_sched_attr;
 
+typedef u8 intel_engine_mask_t;
+#define ALL_ENGINES ((intel_engine_mask_t)~0ul)
+
 struct intel_hw_status_page {
 	struct i915_vma *vma;
 	u32 *addr;
diff --git a/drivers/gpu/drm/i915/intel_guc_submission.c b/drivers/gpu/drm/i915/intel_guc_submission.c
index c4ad73980988..9fdfa6585403 100644
--- a/drivers/gpu/drm/i915/intel_guc_submission.c
+++ b/drivers/gpu/drm/i915/intel_guc_submission.c
@@ -367,7 +367,7 @@ static void guc_stage_desc_init(struct intel_guc_client *client)
 	struct intel_engine_cs *engine;
 	struct i915_gem_context *ctx = client->owner;
 	struct guc_stage_desc *desc;
-	unsigned int tmp;
+	intel_engine_mask_t tmp;
 	u32 gfx_addr;
 
 	desc = __get_stage_desc(client);
diff --git a/drivers/gpu/drm/i915/intel_hangcheck.c b/drivers/gpu/drm/i915/intel_hangcheck.c
index 57ed49dc19c4..4e3a7afa7540 100644
--- a/drivers/gpu/drm/i915/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/intel_hangcheck.c
@@ -221,8 +221,8 @@ static void hangcheck_declare_hang(struct drm_i915_private *i915,
 				   unsigned int stuck)
 {
 	struct intel_engine_cs *engine;
+	intel_engine_mask_t tmp;
 	char msg[80];
-	unsigned int tmp;
 	int len;
 
 	/* If some rings hung but others were still busy, only
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
index 0759a90c0d5a..028bdbb5f3a7 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
@@ -1466,10 +1466,10 @@ static int igt_vm_isolation(void *arg)
 }
 
 static __maybe_unused const char *
-__engine_name(struct drm_i915_private *i915, unsigned int engines)
+__engine_name(struct drm_i915_private *i915, intel_engine_mask_t engines)
 {
 	struct intel_engine_cs *engine;
-	unsigned int tmp;
+	intel_engine_mask_t tmp;
 
 	if (engines == ALL_ENGINES)
 		return "all";
@@ -1482,10 +1482,10 @@ __engine_name(struct drm_i915_private *i915, unsigned int engines)
 
 static int __igt_switch_to_kernel_context(struct drm_i915_private *i915,
 					  struct i915_gem_context *ctx,
-					  unsigned int engines)
+					  intel_engine_mask_t engines)
 {
 	struct intel_engine_cs *engine;
-	unsigned int tmp;
+	intel_engine_mask_t tmp;
 	int pass;
 
 	GEM_TRACE("Testing %s\n", __engine_name(i915, engines));
diff --git a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
index 76b4fa150f2e..050bd1e19e02 100644
--- a/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
+++ b/drivers/gpu/drm/i915/selftests/intel_hangcheck.c
@@ -1124,7 +1124,8 @@ static int igt_reset_engines(void *arg)
 	return 0;
 }
 
-static u32 fake_hangcheck(struct drm_i915_private *i915, u32 mask)
+static u32 fake_hangcheck(struct drm_i915_private *i915,
+			  intel_engine_mask_t mask)
 {
 	u32 count = i915_reset_count(&i915->gpu_error);
 
diff --git a/drivers/gpu/drm/i915/test_i915_scheduler_types_standalone.c b/drivers/gpu/drm/i915/test_i915_scheduler_types_standalone.c
new file mode 100644
index 000000000000..8afa2c3719fb
--- /dev/null
+++ b/drivers/gpu/drm/i915/test_i915_scheduler_types_standalone.c
@@ -0,0 +1,7 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2019 Intel Corporation
+ */
+
+#include "i915_scheduler_types.h"
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 04/18] drm/i915: Separate GEM context construction and registration to userspace
  2019-03-19 11:57 [PATCH 01/18] drm/i915/selftests: Provide stub reset functions Chris Wilson
  2019-03-19 11:57 ` [PATCH 02/18] drm/i915: Flush pages on acquisition Chris Wilson
  2019-03-19 11:57 ` [PATCH 03/18] drm/i915: Move intel_engine_mask_t around for use by i915_request_types.h Chris Wilson
@ 2019-03-19 11:57 ` Chris Wilson
  2019-03-19 13:41   ` Tvrtko Ursulin
  2019-03-19 11:57 ` [PATCH 05/18] drm/i915: Introduce a mutex for file_priv->context_idr Chris Wilson
                   ` (20 subsequent siblings)
  23 siblings, 1 reply; 47+ messages in thread
From: Chris Wilson @ 2019-03-19 11:57 UTC (permalink / raw)
  To: intel-gfx

In later patches, it became apparent that userspace can see a partially
constructed GEM context and begin using it before it is ready, to much
hilarity. Close this window of opportunity by lifting the registration of
the context with userspace (the insertion of the context into the filp's
idr) to the very end of the CONTEXT_CREATE ioctl.
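
For reference, the tail of the CONTEXT_CREATE ioctl then becomes roughly
(sketch only; the full error handling is in the diff below):

	ctx = i915_gem_create_context(i915);        /* construct everything first */
	if (IS_ERR(ctx))
		goto err_unlock;

	ret = gem_context_register(ctx, file_priv); /* only now visible to userspace */
	if (ret)
		goto err_ctx;

	args->ctx_id = ctx->user_handle;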

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c       | 138 ++++++++++--------
 drivers/gpu/drm/i915/i915_gem_gtt.c           |   7 +-
 drivers/gpu/drm/i915/i915_gem_gtt.h           |   8 +-
 drivers/gpu/drm/i915/selftests/huge_pages.c   |   2 +-
 .../gpu/drm/i915/selftests/i915_gem_context.c |  12 +-
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |   2 +-
 drivers/gpu/drm/i915/selftests/mock_context.c |  17 ++-
 7 files changed, 111 insertions(+), 75 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 2fa24326307a..dff4220df911 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -337,15 +337,13 @@ static u32 default_desc_template(const struct drm_i915_private *i915,
 }
 
 static struct i915_gem_context *
-__create_hw_context(struct drm_i915_private *dev_priv,
-		    struct drm_i915_file_private *file_priv)
+__create_context(struct drm_i915_private *dev_priv)
 {
 	struct i915_gem_context *ctx;
-	int ret;
 	int i;
 
 	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
-	if (ctx == NULL)
+	if (!ctx)
 		return ERR_PTR(-ENOMEM);
 
 	kref_init(&ctx->ref);
@@ -362,29 +360,6 @@ __create_hw_context(struct drm_i915_private *dev_priv,
 	INIT_LIST_HEAD(&ctx->handles_list);
 	INIT_LIST_HEAD(&ctx->hw_id_link);
 
-	/* Default context will never have a file_priv */
-	ret = DEFAULT_CONTEXT_HANDLE;
-	if (file_priv) {
-		ret = idr_alloc(&file_priv->context_idr, ctx,
-				DEFAULT_CONTEXT_HANDLE, 0, GFP_KERNEL);
-		if (ret < 0)
-			goto err_lut;
-	}
-	ctx->user_handle = ret;
-
-	ctx->file_priv = file_priv;
-	if (file_priv) {
-		ctx->pid = get_task_pid(current, PIDTYPE_PID);
-		ctx->name = kasprintf(GFP_KERNEL, "%s[%d]/%x",
-				      current->comm,
-				      pid_nr(ctx->pid),
-				      ctx->user_handle);
-		if (!ctx->name) {
-			ret = -ENOMEM;
-			goto err_pid;
-		}
-	}
-
 	/* NB: Mark all slices as needing a remap so that when the context first
 	 * loads it will restore whatever remap state already exists. If there
 	 * is no remap info, it will be a NOP. */
@@ -401,25 +376,10 @@ __create_hw_context(struct drm_i915_private *dev_priv,
 		ctx->hang_timestamp[i] = jiffies - CONTEXT_FAST_HANG_JIFFIES;
 
 	return ctx;
-
-err_pid:
-	put_pid(ctx->pid);
-	idr_remove(&file_priv->context_idr, ctx->user_handle);
-err_lut:
-	context_close(ctx);
-	return ERR_PTR(ret);
-}
-
-static void __destroy_hw_context(struct i915_gem_context *ctx,
-				 struct drm_i915_file_private *file_priv)
-{
-	idr_remove(&file_priv->context_idr, ctx->user_handle);
-	context_close(ctx);
 }
 
 static struct i915_gem_context *
-i915_gem_create_context(struct drm_i915_private *dev_priv,
-			struct drm_i915_file_private *file_priv)
+i915_gem_create_context(struct drm_i915_private *dev_priv)
 {
 	struct i915_gem_context *ctx;
 
@@ -428,18 +388,18 @@ i915_gem_create_context(struct drm_i915_private *dev_priv,
 	/* Reap the most stale context */
 	contexts_free_first(dev_priv);
 
-	ctx = __create_hw_context(dev_priv, file_priv);
+	ctx = __create_context(dev_priv);
 	if (IS_ERR(ctx))
 		return ctx;
 
 	if (HAS_FULL_PPGTT(dev_priv)) {
 		struct i915_hw_ppgtt *ppgtt;
 
-		ppgtt = i915_ppgtt_create(dev_priv, file_priv);
+		ppgtt = i915_ppgtt_create(dev_priv);
 		if (IS_ERR(ppgtt)) {
 			DRM_DEBUG_DRIVER("PPGTT setup failed (%ld)\n",
 					 PTR_ERR(ppgtt));
-			__destroy_hw_context(ctx, file_priv);
+			context_close(ctx);
 			return ERR_CAST(ppgtt);
 		}
 
@@ -475,7 +435,7 @@ i915_gem_context_create_gvt(struct drm_device *dev)
 	if (ret)
 		return ERR_PTR(ret);
 
-	ctx = i915_gem_create_context(to_i915(dev), NULL);
+	ctx = i915_gem_create_context(to_i915(dev));
 	if (IS_ERR(ctx))
 		goto out;
 
@@ -511,7 +471,7 @@ i915_gem_context_create_kernel(struct drm_i915_private *i915, int prio)
 	struct i915_gem_context *ctx;
 	int err;
 
-	ctx = i915_gem_create_context(i915, NULL);
+	ctx = i915_gem_create_context(i915);
 	if (IS_ERR(ctx))
 		return ctx;
 
@@ -625,25 +585,74 @@ static int context_idr_cleanup(int id, void *p, void *data)
 	return 0;
 }
 
+static int gem_context_register(struct i915_gem_context *ctx,
+				struct drm_i915_file_private *fpriv)
+{
+	int ret;
+
+	ctx->file_priv = fpriv;
+	if (ctx->ppgtt)
+		ctx->ppgtt->vm.file = fpriv;
+
+	ctx->pid = get_task_pid(current, PIDTYPE_PID);
+	ctx->name = kasprintf(GFP_KERNEL, "%s[%d]",
+			      current->comm, pid_nr(ctx->pid));
+	if (!ctx->name) {
+		ret = -ENOMEM;
+		goto err_pid;
+	}
+
+	/* And finally expose ourselves to userspace via the idr */
+	ret = idr_alloc(&fpriv->context_idr, ctx,
+			DEFAULT_CONTEXT_HANDLE, 0, GFP_KERNEL);
+	if (ret < 0)
+		goto err_name;
+
+	ctx->user_handle = ret;
+
+	return 0;
+
+err_name:
+	kfree(fetch_and_zero(&ctx->name));
+err_pid:
+	put_pid(fetch_and_zero(&ctx->pid));
+	return ret;
+}
+
 int i915_gem_context_open(struct drm_i915_private *i915,
 			  struct drm_file *file)
 {
 	struct drm_i915_file_private *file_priv = file->driver_priv;
 	struct i915_gem_context *ctx;
+	int err;
 
 	idr_init(&file_priv->context_idr);
 
 	mutex_lock(&i915->drm.struct_mutex);
-	ctx = i915_gem_create_context(i915, file_priv);
-	mutex_unlock(&i915->drm.struct_mutex);
+
+	ctx = i915_gem_create_context(i915);
 	if (IS_ERR(ctx)) {
-		idr_destroy(&file_priv->context_idr);
-		return PTR_ERR(ctx);
+		err = PTR_ERR(ctx);
+		goto err;
 	}
 
+	err = gem_context_register(ctx, file_priv);
+	if (err)
+		goto err_ctx;
+
+	GEM_BUG_ON(ctx->user_handle != DEFAULT_CONTEXT_HANDLE);
 	GEM_BUG_ON(i915_gem_context_is_kernel(ctx));
 
+	mutex_unlock(&i915->drm.struct_mutex);
+
 	return 0;
+
+err_ctx:
+	context_close(ctx);
+err:
+	mutex_unlock(&i915->drm.struct_mutex);
+	idr_destroy(&file_priv->context_idr);
+	return PTR_ERR(ctx);
 }
 
 void i915_gem_context_close(struct drm_file *file)
@@ -835,17 +844,28 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
 	if (ret)
 		return ret;
 
-	ctx = i915_gem_create_context(i915, file_priv);
-	mutex_unlock(&dev->struct_mutex);
-	if (IS_ERR(ctx))
-		return PTR_ERR(ctx);
+	ctx = i915_gem_create_context(i915);
+	if (IS_ERR(ctx)) {
+		ret = PTR_ERR(ctx);
+		goto err_unlock;
+	}
 
-	GEM_BUG_ON(i915_gem_context_is_kernel(ctx));
+	ret = gem_context_register(ctx, file_priv);
+	if (ret)
+		goto err_ctx;
+
+	mutex_unlock(&dev->struct_mutex);
 
 	args->ctx_id = ctx->user_handle;
 	DRM_DEBUG("HW context %d created\n", args->ctx_id);
 
 	return 0;
+
+err_ctx:
+	context_close(ctx);
+err_unlock:
+	mutex_unlock(&dev->struct_mutex);
+	return ret;
 }
 
 int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
@@ -870,7 +890,9 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
 	if (ret)
 		goto out;
 
-	__destroy_hw_context(ctx, file_priv);
+	idr_remove(&file_priv->context_idr, ctx->user_handle);
+	context_close(ctx);
+
 	mutex_unlock(&dev->struct_mutex);
 
 out:
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index b8055c8d4e71..b9e0e3a00223 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2069,8 +2069,7 @@ __hw_ppgtt_create(struct drm_i915_private *i915)
 }
 
 struct i915_hw_ppgtt *
-i915_ppgtt_create(struct drm_i915_private *i915,
-		  struct drm_i915_file_private *fpriv)
+i915_ppgtt_create(struct drm_i915_private *i915)
 {
 	struct i915_hw_ppgtt *ppgtt;
 
@@ -2078,8 +2077,6 @@ i915_ppgtt_create(struct drm_i915_private *i915,
 	if (IS_ERR(ppgtt))
 		return ppgtt;
 
-	ppgtt->vm.file = fpriv;
-
 	trace_i915_ppgtt_create(&ppgtt->vm);
 
 	return ppgtt;
@@ -2657,7 +2654,7 @@ int i915_gem_init_aliasing_ppgtt(struct drm_i915_private *i915)
 	struct i915_hw_ppgtt *ppgtt;
 	int err;
 
-	ppgtt = i915_ppgtt_create(i915, ERR_PTR(-EPERM));
+	ppgtt = i915_ppgtt_create(i915);
 	if (IS_ERR(ppgtt))
 		return PTR_ERR(ppgtt);
 
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 47a54fbd30bf..8fe03067e698 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -603,15 +603,17 @@ int i915_gem_init_ggtt(struct drm_i915_private *dev_priv);
 void i915_ggtt_cleanup_hw(struct drm_i915_private *dev_priv);
 
 int i915_ppgtt_init_hw(struct drm_i915_private *dev_priv);
-void i915_ppgtt_release(struct kref *kref);
-struct i915_hw_ppgtt *i915_ppgtt_create(struct drm_i915_private *dev_priv,
-					struct drm_i915_file_private *fpriv);
+
+struct i915_hw_ppgtt *i915_ppgtt_create(struct drm_i915_private *dev_priv);
 void i915_ppgtt_close(struct i915_address_space *vm);
+void i915_ppgtt_release(struct kref *kref);
+
 static inline void i915_ppgtt_get(struct i915_hw_ppgtt *ppgtt)
 {
 	if (ppgtt)
 		kref_get(&ppgtt->ref);
 }
+
 static inline void i915_ppgtt_put(struct i915_hw_ppgtt *ppgtt)
 {
 	if (ppgtt)
diff --git a/drivers/gpu/drm/i915/selftests/huge_pages.c b/drivers/gpu/drm/i915/selftests/huge_pages.c
index 218cfc361de3..c5c8ba6c059f 100644
--- a/drivers/gpu/drm/i915/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/selftests/huge_pages.c
@@ -1710,7 +1710,7 @@ int i915_gem_huge_page_mock_selftests(void)
 	mkwrite_device_info(dev_priv)->ppgtt_size = 48;
 
 	mutex_lock(&dev_priv->drm.struct_mutex);
-	ppgtt = i915_ppgtt_create(dev_priv, ERR_PTR(-ENODEV));
+	ppgtt = i915_ppgtt_create(dev_priv);
 	if (IS_ERR(ppgtt)) {
 		err = PTR_ERR(ppgtt);
 		goto out_unlock;
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
index 028bdbb5f3a7..ed72400f2395 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
@@ -76,7 +76,7 @@ static int live_nop_switch(void *arg)
 	}
 
 	for (n = 0; n < nctx; n++) {
-		ctx[n] = i915_gem_create_context(i915, file->driver_priv);
+		ctx[n] = live_context(i915, file);
 		if (IS_ERR(ctx[n])) {
 			err = PTR_ERR(ctx[n]);
 			goto out_unlock;
@@ -514,7 +514,7 @@ static int igt_ctx_exec(void *arg)
 		struct i915_gem_context *ctx;
 		unsigned int id;
 
-		ctx = i915_gem_create_context(i915, file->driver_priv);
+		ctx = live_context(i915, file);
 		if (IS_ERR(ctx)) {
 			err = PTR_ERR(ctx);
 			goto out_unlock;
@@ -960,7 +960,7 @@ __igt_ctx_sseu(struct drm_i915_private *i915,
 
 	mutex_lock(&i915->drm.struct_mutex);
 
-	ctx = i915_gem_create_context(i915, file->driver_priv);
+	ctx = live_context(i915, file);
 	if (IS_ERR(ctx)) {
 		ret = PTR_ERR(ctx);
 		goto out_unlock;
@@ -1070,7 +1070,7 @@ static int igt_ctx_readonly(void *arg)
 	if (err)
 		goto out_unlock;
 
-	ctx = i915_gem_create_context(i915, file->driver_priv);
+	ctx = live_context(i915, file);
 	if (IS_ERR(ctx)) {
 		err = PTR_ERR(ctx);
 		goto out_unlock;
@@ -1390,13 +1390,13 @@ static int igt_vm_isolation(void *arg)
 	if (err)
 		goto out_unlock;
 
-	ctx_a = i915_gem_create_context(i915, file->driver_priv);
+	ctx_a = live_context(i915, file);
 	if (IS_ERR(ctx_a)) {
 		err = PTR_ERR(ctx_a);
 		goto out_unlock;
 	}
 
-	ctx_b = i915_gem_create_context(i915, file->driver_priv);
+	ctx_b = live_context(i915, file);
 	if (IS_ERR(ctx_b)) {
 		err = PTR_ERR(ctx_b);
 		goto out_unlock;
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index 826fd51c331e..01084f6b4fb7 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -1010,7 +1010,7 @@ static int exercise_ppgtt(struct drm_i915_private *dev_priv,
 		return PTR_ERR(file);
 
 	mutex_lock(&dev_priv->drm.struct_mutex);
-	ppgtt = i915_ppgtt_create(dev_priv, file->driver_priv);
+	ppgtt = i915_ppgtt_create(dev_priv);
 	if (IS_ERR(ppgtt)) {
 		err = PTR_ERR(ppgtt);
 		goto out_unlock;
diff --git a/drivers/gpu/drm/i915/selftests/mock_context.c b/drivers/gpu/drm/i915/selftests/mock_context.c
index 8efa6892c6cd..1cc8be732435 100644
--- a/drivers/gpu/drm/i915/selftests/mock_context.c
+++ b/drivers/gpu/drm/i915/selftests/mock_context.c
@@ -88,9 +88,24 @@ void mock_init_contexts(struct drm_i915_private *i915)
 struct i915_gem_context *
 live_context(struct drm_i915_private *i915, struct drm_file *file)
 {
+	struct i915_gem_context *ctx;
+	int err;
+
 	lockdep_assert_held(&i915->drm.struct_mutex);
 
-	return i915_gem_create_context(i915, file->driver_priv);
+	ctx = i915_gem_create_context(i915);
+	if (IS_ERR(ctx))
+		return ctx;
+
+	err = gem_context_register(ctx, file->driver_priv);
+	if (err)
+		goto err_ctx;
+
+	return ctx;
+
+err_ctx:
+	i915_gem_context_put(ctx);
+	return ERR_PTR(err);
 }
 
 struct i915_gem_context *
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 05/18] drm/i915: Introduce a mutex for file_priv->context_idr
  2019-03-19 11:57 [PATCH 01/18] drm/i915/selftests: Provide stub reset functions Chris Wilson
                   ` (2 preceding siblings ...)
  2019-03-19 11:57 ` [PATCH 04/18] drm/i915: Separate GEM context construction and registration to userspace Chris Wilson
@ 2019-03-19 11:57 ` Chris Wilson
  2019-03-20 10:36   ` Tvrtko Ursulin
  2019-03-19 11:57 ` [PATCH 06/18] drm/i915: Stop storing ctx->user_handle Chris Wilson
                   ` (19 subsequent siblings)
  23 siblings, 1 reply; 47+ messages in thread
From: Chris Wilson @ 2019-03-19 11:57 UTC (permalink / raw)
  To: intel-gfx

Define a mutex for the exclusive use of interacting with the per-file
context_idr, which was previously guarded by struct_mutex. This allows us
to reduce the coverage of struct_mutex, with a view to removing the last
bits coordinating GEM context later. (In the short term, we avoid taking
struct_mutex while using the extended constructor functions, preventing
some nasty recursion.)

v2: s/context_lock/context_idr_lock/
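
As a small illustration, the publication step under the new lock is simply
(sketch, mirroring the diff below; struct_mutex is no longer held here):

	mutex_lock(&file_priv->context_idr_lock);
	ret = idr_alloc(&file_priv->context_idr, ctx,
			DEFAULT_CONTEXT_HANDLE, 0, GFP_KERNEL);
	mutex_unlock(&file_priv->context_idr_lock);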

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         |  2 ++
 drivers/gpu/drm/i915/i915_gem_context.c | 47 +++++++++++--------------
 2 files changed, 23 insertions(+), 26 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 86080a6e0f45..219348121897 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -216,7 +216,9 @@ struct drm_i915_file_private {
  */
 #define DRM_I915_THROTTLE_JIFFIES msecs_to_jiffies(20)
 	} mm;
+
 	struct idr context_idr;
+	struct mutex context_idr_lock; /* guards context_idr */
 
 	unsigned int bsd_engine;
 
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index dff4220df911..799684d05704 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -579,9 +579,7 @@ void i915_gem_contexts_fini(struct drm_i915_private *i915)
 
 static int context_idr_cleanup(int id, void *p, void *data)
 {
-	struct i915_gem_context *ctx = p;
-
-	context_close(ctx);
+	context_close(p);
 	return 0;
 }
 
@@ -603,13 +601,15 @@ static int gem_context_register(struct i915_gem_context *ctx,
 	}
 
 	/* And finally expose ourselves to userspace via the idr */
+	mutex_lock(&fpriv->context_idr_lock);
 	ret = idr_alloc(&fpriv->context_idr, ctx,
 			DEFAULT_CONTEXT_HANDLE, 0, GFP_KERNEL);
+	if (ret >= 0)
+		ctx->user_handle = ret;
+	mutex_unlock(&fpriv->context_idr_lock);
 	if (ret < 0)
 		goto err_name;
 
-	ctx->user_handle = ret;
-
 	return 0;
 
 err_name:
@@ -627,10 +627,11 @@ int i915_gem_context_open(struct drm_i915_private *i915,
 	int err;
 
 	idr_init(&file_priv->context_idr);
+	mutex_init(&file_priv->context_idr_lock);
 
 	mutex_lock(&i915->drm.struct_mutex);
-
 	ctx = i915_gem_create_context(i915);
+	mutex_unlock(&i915->drm.struct_mutex);
 	if (IS_ERR(ctx)) {
 		err = PTR_ERR(ctx);
 		goto err;
@@ -643,14 +644,14 @@ int i915_gem_context_open(struct drm_i915_private *i915,
 	GEM_BUG_ON(ctx->user_handle != DEFAULT_CONTEXT_HANDLE);
 	GEM_BUG_ON(i915_gem_context_is_kernel(ctx));
 
-	mutex_unlock(&i915->drm.struct_mutex);
-
 	return 0;
 
 err_ctx:
+	mutex_lock(&i915->drm.struct_mutex);
 	context_close(ctx);
-err:
 	mutex_unlock(&i915->drm.struct_mutex);
+err:
+	mutex_destroy(&file_priv->context_idr_lock);
 	idr_destroy(&file_priv->context_idr);
 	return PTR_ERR(ctx);
 }
@@ -663,6 +664,7 @@ void i915_gem_context_close(struct drm_file *file)
 
 	idr_for_each(&file_priv->context_idr, context_idr_cleanup, NULL);
 	idr_destroy(&file_priv->context_idr);
+	mutex_destroy(&file_priv->context_idr_lock);
 }
 
 static struct i915_request *
@@ -845,25 +847,22 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
 		return ret;
 
 	ctx = i915_gem_create_context(i915);
-	if (IS_ERR(ctx)) {
-		ret = PTR_ERR(ctx);
-		goto err_unlock;
-	}
+	mutex_unlock(&dev->struct_mutex);
+	if (IS_ERR(ctx))
+		return PTR_ERR(ctx);
 
 	ret = gem_context_register(ctx, file_priv);
 	if (ret)
 		goto err_ctx;
 
-	mutex_unlock(&dev->struct_mutex);
-
 	args->ctx_id = ctx->user_handle;
 	DRM_DEBUG("HW context %d created\n", args->ctx_id);
 
 	return 0;
 
 err_ctx:
+	mutex_lock(&dev->struct_mutex);
 	context_close(ctx);
-err_unlock:
 	mutex_unlock(&dev->struct_mutex);
 	return ret;
 }
@@ -874,7 +873,6 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
 	struct drm_i915_gem_context_destroy *args = data;
 	struct drm_i915_file_private *file_priv = file->driver_priv;
 	struct i915_gem_context *ctx;
-	int ret;
 
 	if (args->pad != 0)
 		return -EINVAL;
@@ -882,21 +880,18 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
 	if (args->ctx_id == DEFAULT_CONTEXT_HANDLE)
 		return -ENOENT;
 
-	ctx = i915_gem_context_lookup(file_priv, args->ctx_id);
+	if (mutex_lock_interruptible(&file_priv->context_idr_lock))
+		return -EINTR;
+
+	ctx = idr_remove(&file_priv->context_idr, args->ctx_id);
+	mutex_unlock(&file_priv->context_idr_lock);
 	if (!ctx)
 		return -ENOENT;
 
-	ret = mutex_lock_interruptible(&dev->struct_mutex);
-	if (ret)
-		goto out;
-
-	idr_remove(&file_priv->context_idr, ctx->user_handle);
+	mutex_lock(&dev->struct_mutex);
 	context_close(ctx);
-
 	mutex_unlock(&dev->struct_mutex);
 
-out:
-	i915_gem_context_put(ctx);
 	return 0;
 }
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 06/18] drm/i915: Stop storing ctx->user_handle
  2019-03-19 11:57 [PATCH 01/18] drm/i915/selftests: Provide stub reset functions Chris Wilson
                   ` (3 preceding siblings ...)
  2019-03-19 11:57 ` [PATCH 05/18] drm/i915: Introduce a mutex for file_priv->context_idr Chris Wilson
@ 2019-03-19 11:57 ` Chris Wilson
  2019-03-19 12:58   ` [PATCH v2] " Chris Wilson
  2019-03-19 11:57 ` [PATCH 07/18] drm/i915: Stop storing the context name as the timeline name Chris Wilson
                   ` (18 subsequent siblings)
  23 siblings, 1 reply; 47+ messages in thread
From: Chris Wilson @ 2019-03-19 11:57 UTC (permalink / raw)
  To: intel-gfx

The user_handle need only be known by userspace for it to look up the
context via the idr; internally we have no use for it.
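
That is, after this patch the id reported to userspace is simply the idr
slot (sketch with error handling elided; see the diffs below):

	args->ctx_id = idr_alloc(&fpriv->context_idr, ctx, 0, 0, GFP_KERNEL);
	...
	ctx = idr_remove(&file_priv->context_idr, args->ctx_id); /* destroy */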

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c           |  5 ++--
 drivers/gpu/drm/i915/i915_gem_context.c       | 23 ++++++++-----------
 drivers/gpu/drm/i915/i915_gem_context.h       |  5 ----
 drivers/gpu/drm/i915/i915_gem_context_types.h |  9 --------
 drivers/gpu/drm/i915/i915_gpu_error.c         | 11 ++++-----
 drivers/gpu/drm/i915/i915_gpu_error.h         |  1 -
 6 files changed, 15 insertions(+), 39 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index f4a07190a0e8..7970770f23a9 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -409,9 +409,8 @@ static void print_context_stats(struct seq_file *m,
 
 			rcu_read_lock();
 			task = pid_task(ctx->pid ?: file->pid, PIDTYPE_PID);
-			snprintf(name, sizeof(name), "%s/%d",
-				 task ? task->comm : "<unknown>",
-				 ctx->user_handle);
+			snprintf(name, sizeof(name), "%s",
+				 task ? task->comm : "<unknown>");
 			rcu_read_unlock();
 
 			print_file_stats(m, name, stats);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 799684d05704..95c5103e15a5 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -602,20 +602,15 @@ static int gem_context_register(struct i915_gem_context *ctx,
 
 	/* And finally expose ourselves to userspace via the idr */
 	mutex_lock(&fpriv->context_idr_lock);
-	ret = idr_alloc(&fpriv->context_idr, ctx,
-			DEFAULT_CONTEXT_HANDLE, 0, GFP_KERNEL);
-	if (ret >= 0)
-		ctx->user_handle = ret;
+	ret = idr_alloc(&fpriv->context_idr, ctx, 0, 0, GFP_KERNEL);
 	mutex_unlock(&fpriv->context_idr_lock);
-	if (ret < 0)
-		goto err_name;
-
-	return 0;
+	if (ret >= 0)
+		goto out;
 
-err_name:
 	kfree(fetch_and_zero(&ctx->name));
 err_pid:
 	put_pid(fetch_and_zero(&ctx->pid));
+out:
 	return ret;
 }
 
@@ -638,11 +633,11 @@ int i915_gem_context_open(struct drm_i915_private *i915,
 	}
 
 	err = gem_context_register(ctx, file_priv);
-	if (err)
+	if (err < 0)
 		goto err_ctx;
 
-	GEM_BUG_ON(ctx->user_handle != DEFAULT_CONTEXT_HANDLE);
 	GEM_BUG_ON(i915_gem_context_is_kernel(ctx));
+	GEM_BUG_ON(err > 0);
 
 	return 0;
 
@@ -852,10 +847,10 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
 		return PTR_ERR(ctx);
 
 	ret = gem_context_register(ctx, file_priv);
-	if (ret)
+	if (ret < 0)
 		goto err_ctx;
 
-	args->ctx_id = ctx->user_handle;
+	args->ctx_id = ret;
 	DRM_DEBUG("HW context %d created\n", args->ctx_id);
 
 	return 0;
@@ -877,7 +872,7 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
 	if (args->pad != 0)
 		return -EINVAL;
 
-	if (args->ctx_id == DEFAULT_CONTEXT_HANDLE)
+	if (!args->ctx_id)
 		return -ENOENT;
 
 	if (mutex_lock_interruptible(&file_priv->context_idr_lock))
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index 8a1377691d6d..849b2a83c1ec 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -126,11 +126,6 @@ static inline void i915_gem_context_unpin_hw_id(struct i915_gem_context *ctx)
 	atomic_dec(&ctx->hw_id_pin_count);
 }
 
-static inline bool i915_gem_context_is_default(const struct i915_gem_context *c)
-{
-	return c->user_handle == DEFAULT_CONTEXT_HANDLE;
-}
-
 static inline bool i915_gem_context_is_kernel(struct i915_gem_context *ctx)
 {
 	return !ctx->file_priv;
diff --git a/drivers/gpu/drm/i915/i915_gem_context_types.h b/drivers/gpu/drm/i915/i915_gem_context_types.h
index 2bf19730eaa9..63ae8eb21939 100644
--- a/drivers/gpu/drm/i915/i915_gem_context_types.h
+++ b/drivers/gpu/drm/i915/i915_gem_context_types.h
@@ -129,15 +129,6 @@ struct i915_gem_context {
 	struct list_head active_engines;
 	struct mutex mutex;
 
-	/**
-	 * @user_handle: userspace identifier
-	 *
-	 * A unique per-file identifier is generated from
-	 * &drm_i915_file_private.contexts.
-	 */
-	u32 user_handle;
-#define DEFAULT_CONTEXT_HANDLE 0
-
 	struct i915_sched_attr sched;
 
 	/** hw_contexts: per-engine logical HW state */
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index e8674347f589..b101f037b61f 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -454,8 +454,8 @@ static void error_print_context(struct drm_i915_error_state_buf *m,
 				const char *header,
 				const struct drm_i915_error_context *ctx)
 {
-	err_printf(m, "%s%s[%d] user_handle %d hw_id %d, prio %d, guilty %d active %d\n",
-		   header, ctx->comm, ctx->pid, ctx->handle, ctx->hw_id,
+	err_printf(m, "%s%s[%d] hw_id %d, prio %d, guilty %d active %d\n",
+		   header, ctx->comm, ctx->pid, ctx->hw_id,
 		   ctx->sched_attr.priority, ctx->guilty, ctx->active);
 }
 
@@ -758,11 +758,9 @@ static void __err_print_to_sgl(struct drm_i915_error_state_buf *m,
 		if (obj) {
 			err_puts(m, m->i915->engine[i]->name);
 			if (ee->context.pid)
-				err_printf(m, " (submitted by %s [%d], ctx %d [%d])",
+				err_printf(m, " (submitted by %s [%d])",
 					   ee->context.comm,
-					   ee->context.pid,
-					   ee->context.handle,
-					   ee->context.hw_id);
+					   ee->context.pid);
 			err_printf(m, " --- gtt_offset = 0x%08x %08x\n",
 				   upper_32_bits(obj->gtt_offset),
 				   lower_32_bits(obj->gtt_offset));
@@ -1330,7 +1328,6 @@ static void record_context(struct drm_i915_error_context *e,
 		rcu_read_unlock();
 	}
 
-	e->handle = ctx->user_handle;
 	e->hw_id = ctx->hw_id;
 	e->sched_attr = ctx->sched;
 	e->guilty = atomic_read(&ctx->guilty_count);
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
index d011cb90bee1..5dc761e85d9d 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.h
+++ b/drivers/gpu/drm/i915/i915_gpu_error.h
@@ -116,7 +116,6 @@ struct i915_gpu_state {
 		struct drm_i915_error_context {
 			char comm[TASK_COMM_LEN];
 			pid_t pid;
-			u32 handle;
 			u32 hw_id;
 			int active;
 			int guilty;
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 07/18] drm/i915: Stop storing the context name as the timeline name
  2019-03-19 11:57 [PATCH 01/18] drm/i915/selftests: Provide stub reset functions Chris Wilson
                   ` (4 preceding siblings ...)
  2019-03-19 11:57 ` [PATCH 06/18] drm/i915: Stop storing ctx->user_handle Chris Wilson
@ 2019-03-19 11:57 ` Chris Wilson
  2019-03-20 12:46   ` Tvrtko Ursulin
  2019-03-19 11:57 ` [PATCH 08/18] drm/i915: Introduce the i915_user_extension_method Chris Wilson
                   ` (17 subsequent siblings)
  23 siblings, 1 reply; 47+ messages in thread
From: Chris Wilson @ 2019-03-19 11:57 UTC (permalink / raw)
  To: intel-gfx

The timeline->name is only used for convenience in pretty printing the
i915_request.fence->ops->get_timeline_name() and it is just as
convenient to pull it from the gem_context directly. The few instances
of its use inside GEM_TRACE() have proven more of a nuisance than
helpful, so not worth saving imo.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_context.c        | 5 ++---
 drivers/gpu/drm/i915/i915_request.c            | 7 ++-----
 drivers/gpu/drm/i915/i915_timeline.c           | 5 +----
 drivers/gpu/drm/i915/i915_timeline.h           | 2 --
 drivers/gpu/drm/i915/i915_timeline_types.h     | 1 -
 drivers/gpu/drm/i915/intel_engine_cs.c         | 3 +--
 drivers/gpu/drm/i915/intel_lrc.c               | 2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c        | 4 +---
 drivers/gpu/drm/i915/selftests/i915_timeline.c | 6 +++---
 drivers/gpu/drm/i915/selftests/mock_engine.c   | 9 ++-------
 10 files changed, 13 insertions(+), 31 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 95c5103e15a5..196982f38a28 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -673,9 +673,8 @@ last_request_on_engine(struct i915_timeline *timeline,
 	rq = i915_active_request_raw(&timeline->last_request,
 				     &engine->i915->drm.struct_mutex);
 	if (rq && rq->engine == engine) {
-		GEM_TRACE("last request for %s on engine %s: %llx:%llu\n",
-			  timeline->name, engine->name,
-			  rq->fence.context, rq->fence.seqno);
+		GEM_TRACE("last request on engine %s: %llx:%llu\n",
+			  engine->name, rq->fence.context, rq->fence.seqno);
 		GEM_BUG_ON(rq->timeline != timeline);
 		return rq;
 	}
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 0a3d94517d0a..1529824d7c61 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -66,7 +66,7 @@ static const char *i915_fence_get_timeline_name(struct dma_fence *fence)
 	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
 		return "signaled";
 
-	return to_request(fence)->timeline->name;
+	return to_request(fence)->gem_context->name ?: "[i915]";
 }
 
 static bool i915_fence_signaled(struct dma_fence *fence)
@@ -167,7 +167,6 @@ static void advance_ring(struct i915_request *request)
 		 * is just about to be. Either works, if we miss the last two
 		 * noops - they are safe to be replayed on a reset.
 		 */
-		GEM_TRACE("marking %s as inactive\n", ring->timeline->name);
 		tail = READ_ONCE(request->tail);
 		list_del(&ring->active_link);
 	} else {
@@ -1064,10 +1063,8 @@ void i915_request_add(struct i915_request *request)
 	__i915_active_request_set(&timeline->last_request, request);
 
 	list_add_tail(&request->ring_link, &ring->request_list);
-	if (list_is_first(&request->ring_link, &ring->request_list)) {
-		GEM_TRACE("marking %s as active\n", ring->timeline->name);
+	if (list_is_first(&request->ring_link, &ring->request_list))
 		list_add(&ring->active_link, &request->i915->gt.active_rings);
-	}
 	request->i915->gt.active_engines |= request->engine->mask;
 	request->emitted_jiffies = jiffies;
 
diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
index 8484ba6e51d1..2f4907364920 100644
--- a/drivers/gpu/drm/i915/i915_timeline.c
+++ b/drivers/gpu/drm/i915/i915_timeline.c
@@ -197,7 +197,6 @@ static void cacheline_free(struct i915_timeline_cacheline *cl)
 
 int i915_timeline_init(struct drm_i915_private *i915,
 		       struct i915_timeline *timeline,
-		       const char *name,
 		       struct i915_vma *hwsp)
 {
 	void *vaddr;
@@ -213,7 +212,6 @@ int i915_timeline_init(struct drm_i915_private *i915,
 	BUILD_BUG_ON(KSYNCMAP < I915_NUM_ENGINES);
 
 	timeline->i915 = i915;
-	timeline->name = name;
 	timeline->pin_count = 0;
 	timeline->has_initial_breadcrumb = !hwsp;
 	timeline->hwsp_cacheline = NULL;
@@ -342,7 +340,6 @@ void i915_timeline_fini(struct i915_timeline *timeline)
 
 struct i915_timeline *
 i915_timeline_create(struct drm_i915_private *i915,
-		     const char *name,
 		     struct i915_vma *global_hwsp)
 {
 	struct i915_timeline *timeline;
@@ -352,7 +349,7 @@ i915_timeline_create(struct drm_i915_private *i915,
 	if (!timeline)
 		return ERR_PTR(-ENOMEM);
 
-	err = i915_timeline_init(i915, timeline, name, global_hwsp);
+	err = i915_timeline_init(i915, timeline, global_hwsp);
 	if (err) {
 		kfree(timeline);
 		return ERR_PTR(err);
diff --git a/drivers/gpu/drm/i915/i915_timeline.h b/drivers/gpu/drm/i915/i915_timeline.h
index 454aa72aee18..4ca7f80bdf6d 100644
--- a/drivers/gpu/drm/i915/i915_timeline.h
+++ b/drivers/gpu/drm/i915/i915_timeline.h
@@ -33,7 +33,6 @@
 
 int i915_timeline_init(struct drm_i915_private *i915,
 		       struct i915_timeline *tl,
-		       const char *name,
 		       struct i915_vma *hwsp);
 void i915_timeline_fini(struct i915_timeline *tl);
 
@@ -58,7 +57,6 @@ i915_timeline_set_subclass(struct i915_timeline *timeline,
 
 struct i915_timeline *
 i915_timeline_create(struct drm_i915_private *i915,
-		     const char *name,
 		     struct i915_vma *global_hwsp);
 
 static inline struct i915_timeline *
diff --git a/drivers/gpu/drm/i915/i915_timeline_types.h b/drivers/gpu/drm/i915/i915_timeline_types.h
index d42053544d7c..1f5b55d9ffb5 100644
--- a/drivers/gpu/drm/i915/i915_timeline_types.h
+++ b/drivers/gpu/drm/i915/i915_timeline_types.h
@@ -72,7 +72,6 @@ struct i915_timeline {
 	struct i915_active_request barrier;
 
 	struct list_head link;
-	const char *name;
 	struct drm_i915_private *i915;
 
 	struct kref kref;
diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
index 314b86b6f88d..d2a051c53c4a 100644
--- a/drivers/gpu/drm/i915/intel_engine_cs.c
+++ b/drivers/gpu/drm/i915/intel_engine_cs.c
@@ -579,7 +579,6 @@ int intel_engine_setup_common(struct intel_engine_cs *engine)
 
 	err = i915_timeline_init(engine->i915,
 				 &engine->timeline,
-				 engine->name,
 				 engine->status_page.vma);
 	if (err)
 		goto err_hwsp;
@@ -658,7 +657,7 @@ static int measure_breadcrumb_dw(struct intel_engine_cs *engine)
 		return -ENOMEM;
 
 	if (i915_timeline_init(engine->i915,
-			       &frame->timeline, "measure",
+			       &frame->timeline,
 			       engine->status_page.vma))
 		goto out_frame;
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 7e0c20a2d733..b3009086b50e 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2802,7 +2802,7 @@ populate_lr_context(struct intel_context *ce,
 
 static struct i915_timeline *get_timeline(struct i915_gem_context *ctx)
 {
-	return i915_timeline_create(ctx->i915, ctx->name, NULL);
+	return i915_timeline_create(ctx->i915, NULL);
 }
 
 static int execlists_context_deferred_alloc(struct intel_context *ce,
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 0310d5d53bf9..4405ac1b32f3 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1533,9 +1533,7 @@ static int intel_init_ring_buffer(struct intel_engine_cs *engine)
 	if (err)
 		return err;
 
-	timeline = i915_timeline_create(engine->i915,
-					engine->name,
-					engine->status_page.vma);
+	timeline = i915_timeline_create(engine->i915, engine->status_page.vma);
 	if (IS_ERR(timeline)) {
 		err = PTR_ERR(timeline);
 		goto err;
diff --git a/drivers/gpu/drm/i915/selftests/i915_timeline.c b/drivers/gpu/drm/i915/selftests/i915_timeline.c
index 844701759ffc..8e7bcaa1eb66 100644
--- a/drivers/gpu/drm/i915/selftests/i915_timeline.c
+++ b/drivers/gpu/drm/i915/selftests/i915_timeline.c
@@ -64,7 +64,7 @@ static int __mock_hwsp_timeline(struct mock_hwsp_freelist *state,
 		unsigned long cacheline;
 		int err;
 
-		tl = i915_timeline_create(state->i915, "mock", NULL);
+		tl = i915_timeline_create(state->i915, NULL);
 		if (IS_ERR(tl))
 			return PTR_ERR(tl);
 
@@ -476,7 +476,7 @@ checked_i915_timeline_create(struct drm_i915_private *i915)
 {
 	struct i915_timeline *tl;
 
-	tl = i915_timeline_create(i915, "live", NULL);
+	tl = i915_timeline_create(i915, NULL);
 	if (IS_ERR(tl))
 		return tl;
 
@@ -658,7 +658,7 @@ static int live_hwsp_wrap(void *arg)
 	mutex_lock(&i915->drm.struct_mutex);
 	wakeref = intel_runtime_pm_get(i915);
 
-	tl = i915_timeline_create(i915, __func__, NULL);
+	tl = i915_timeline_create(i915, NULL);
 	if (IS_ERR(tl)) {
 		err = PTR_ERR(tl);
 		goto out_rpm;
diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
index 61744819172b..61a8206ed677 100644
--- a/drivers/gpu/drm/i915/selftests/mock_engine.c
+++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
@@ -50,9 +50,7 @@ static struct intel_ring *mock_ring(struct intel_engine_cs *engine)
 	if (!ring)
 		return NULL;
 
-	if (i915_timeline_init(engine->i915,
-			       &ring->timeline, engine->name,
-			       NULL)) {
+	if (i915_timeline_init(engine->i915, &ring->timeline, NULL)) {
 		kfree(ring);
 		return NULL;
 	}
@@ -259,10 +257,7 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
 	engine->base.reset.finish = mock_reset_finish;
 	engine->base.cancel_requests = mock_cancel_requests;
 
-	if (i915_timeline_init(i915,
-			       &engine->base.timeline,
-			       engine->base.name,
-			       NULL))
+	if (i915_timeline_init(i915, &engine->base.timeline, NULL))
 		goto err_free;
 	i915_timeline_set_subclass(&engine->base.timeline, TIMELINE_ENGINE);
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 08/18] drm/i915: Introduce the i915_user_extension_method
  2019-03-19 11:57 [PATCH 01/18] drm/i915/selftests: Provide stub reset functions Chris Wilson
                   ` (5 preceding siblings ...)
  2019-03-19 11:57 ` [PATCH 07/18] drm/i915: Stop storing the context name as the timeline name Chris Wilson
@ 2019-03-19 11:57 ` Chris Wilson
  2019-03-19 11:57 ` [PATCH 09/18] drm/i915: Create/destroy VM (ppGTT) for use with contexts Chris Wilson
                   ` (16 subsequent siblings)
  23 siblings, 0 replies; 47+ messages in thread
From: Chris Wilson @ 2019-03-19 11:57 UTC (permalink / raw)
  To: intel-gfx

An idea for extending uABI inspired by Vulkan's extension chains.
Instead of expanding the data struct for each ioctl every time we need
to add a new feature, define an extension chain. As we add optional
interfaces to control the ioctl, we define a new extension struct that
can be linked into the ioctl data only when required by the user. The
key advantage is being able to ignore large control structs for
optional interfaces/extensions, while still being able to process them
in a consistent manner.

In comparison to other extensible ioctls, the key difference is the
use of a linked chain of extension structs vs an array of tagged
pointers. For example,

struct drm_amdgpu_cs_chunk {
        __u32           chunk_id;
        __u32           length_dw;
        __u64           chunk_data;
};

struct drm_amdgpu_cs_in {
        __u32           ctx_id;
        __u32           bo_list_handle;
        __u32           num_chunks;
        __u32           _pad;
        __u64           chunks;
};

allows userspace to pass in an array of pointers to extension structs,
but must therefore keep constructing that array alongside the command
stream. In dynamic situations like that, a linked list is preferred and
does not suffer from extra cache-line misses, as the extension structs
themselves must still be loaded separately from the chunks array.

v2: Apply the tail call optimisation directly to nip the worry of stack
overflow in the bud.
v3: Defend against recursion.
v4: Fixup local types to match new uabi

Opens:
- do we include the result as an out-field in each chain?
struct i915_user_extension {
	__u64 next_extension;
	__u64 name;
	__s32 result;
	__u32 mbz; /* reserved for future use */
};
* Undecided, so provision some room for future expansion.
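
For a purely hypothetical illustration of the chaining (the FOO/BAR
extension names and structs below are made up, not part of this patch),
userspace would build the list in place:

	struct hypothetical_bar_ext bar = {
		.base.name = HYPOTHETICAL_BAR_EXT,
	};
	struct hypothetical_foo_ext foo = {
		.base.name = HYPOTHETICAL_FOO_EXT,
		.base.next_extension = (__u64)(uintptr_t)&bar,
	};

	/* the ioctl argument then points at the head of the chain */
	arg.extensions = (__u64)(uintptr_t)&foo;

where each hypothetical_*_ext embeds a struct i915_user_extension as its
first member (.base), followed by that extension's own payload.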

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/Makefile               |  1 +
 drivers/gpu/drm/i915/i915_user_extensions.c | 61 +++++++++++++++++++++
 drivers/gpu/drm/i915/i915_user_extensions.h | 20 +++++++
 drivers/gpu/drm/i915/i915_utils.h           | 31 +++++++++++
 include/uapi/drm/i915_drm.h                 | 22 ++++++++
 5 files changed, 135 insertions(+)
 create mode 100644 drivers/gpu/drm/i915/i915_user_extensions.c
 create mode 100644 drivers/gpu/drm/i915/i915_user_extensions.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 197b081769b5..1f3e8b145fc0 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -46,6 +46,7 @@ i915-y := i915_drv.o \
 	  i915_sw_fence.o \
 	  i915_syncmap.o \
 	  i915_sysfs.o \
+	  i915_user_extensions.o \
 	  intel_csr.o \
 	  intel_device_info.o \
 	  intel_pm.o \
diff --git a/drivers/gpu/drm/i915/i915_user_extensions.c b/drivers/gpu/drm/i915/i915_user_extensions.c
new file mode 100644
index 000000000000..c822d0aafd2d
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_user_extensions.c
@@ -0,0 +1,61 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2018 Intel Corporation
+ */
+
+#include <linux/nospec.h>
+#include <linux/sched/signal.h>
+#include <linux/uaccess.h>
+
+#include <uapi/drm/i915_drm.h>
+
+#include "i915_user_extensions.h"
+#include "i915_utils.h"
+
+int i915_user_extensions(struct i915_user_extension __user *ext,
+			 const i915_user_extension_fn *tbl,
+			 unsigned int count,
+			 void *data)
+{
+	unsigned int stackdepth = 512;
+
+	while (ext) {
+		int i, err;
+		u32 name;
+		u64 next;
+
+		if (!stackdepth--) /* recursion vs useful flexibility */
+			return -E2BIG;
+
+		err = check_user_mbz(&ext->flags);
+		if (err)
+			return err;
+
+		for (i = 0; i < ARRAY_SIZE(ext->rsvd); i++) {
+			err = check_user_mbz(&ext->rsvd[i]);
+			if (err)
+				return err;
+		}
+
+		if (get_user(name, &ext->name))
+			return -EFAULT;
+
+		err = -EINVAL;
+		if (name < count) {
+			name = array_index_nospec(name, count);
+			if (tbl[name])
+				err = tbl[name](ext, data);
+		}
+		if (err)
+			return err;
+
+		if (get_user(next, &ext->next_extension) ||
+		    overflows_type(next, ext))
+			return -EFAULT;
+
+		ext = u64_to_user_ptr(next);
+	}
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/i915/i915_user_extensions.h b/drivers/gpu/drm/i915/i915_user_extensions.h
new file mode 100644
index 000000000000..a14bf6bba9a1
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_user_extensions.h
@@ -0,0 +1,20 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2018 Intel Corporation
+ */
+
+#ifndef I915_USER_EXTENSIONS_H
+#define I915_USER_EXTENSIONS_H
+
+struct i915_user_extension;
+
+typedef int (*i915_user_extension_fn)(struct i915_user_extension __user *ext,
+				      void *data);
+
+int i915_user_extensions(struct i915_user_extension __user *ext,
+			 const i915_user_extension_fn *tbl,
+			 unsigned int count,
+			 void *data);
+
+#endif /* I915_USER_EXTENSIONS_H */
diff --git a/drivers/gpu/drm/i915/i915_utils.h b/drivers/gpu/drm/i915/i915_utils.h
index 540e20eb032c..2dbe8933b50a 100644
--- a/drivers/gpu/drm/i915/i915_utils.h
+++ b/drivers/gpu/drm/i915/i915_utils.h
@@ -105,6 +105,37 @@
 	__T;								\
 })
 
+/*
+ * container_of_user: Extract the superclass from a pointer to a member.
+ *
+ * Exactly like container_of() with the exception that it plays nicely
+ * with sparse for __user @ptr.
+ */
+#define container_of_user(ptr, type, member) ({				\
+	void __user *__mptr = (void __user *)(ptr);			\
+	BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) &&	\
+			 !__same_type(*(ptr), void),			\
+			 "pointer type mismatch in container_of()");	\
+	((type __user *)(__mptr - offsetof(type, member))); })
+
+/*
+ * check_user_mbz: Check that a user value exists and is zero
+ *
+ * Frequently in our uABI we reserve space for future extensions, and
+ * to ensure that userspace is prepared we enforce that space must
+ * be zero. (Then any future extension can safely assume a default value
+ * of 0.)
+ *
+ * check_user_mbz() combines checking that the user pointer is accessible
+ * and that the contained value is zero.
+ *
+ * Returns: -EFAULT if not accessible, -EINVAL if !zero, or 0 on success.
+ */
+#define check_user_mbz(U) ({						\
+	typeof(*(U)) mbz__;						\
+	get_user(mbz__, (U)) ? -EFAULT : mbz__ ? -EINVAL : 0;		\
+})
+
 static inline u64 ptr_to_u64(const void *ptr)
 {
 	return (uintptr_t)ptr;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index aa2d4c73a97d..1c69ed16a923 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -62,6 +62,28 @@ extern "C" {
 #define I915_ERROR_UEVENT		"ERROR"
 #define I915_RESET_UEVENT		"RESET"
 
+/*
+ * i915_user_extension: Base class for defining a chain of extensions
+ *
+ * Many interfaces need to grow over time. In most cases we can simply
+ * extend the struct and have userspace pass in more data. Another option,
+ * as demonstrated by Vulkan's approach to providing extensions for forward
+ * and backward compatibility, is to use a list of optional structs to
+ * provide those extra details.
+ *
+ * The key advantage to using an extension chain is that it allows us to
+ * redefine the interface more easily than an ever growing struct of
+ * increasing complexity, and for large parts of that interface to be
+ * entirely optional. The downside is more pointer chasing; chasing across
+ * the __user boundary with pointers encapsulated inside u64.
+ */
+struct i915_user_extension {
+	__u64 next_extension;
+	__u32 name;
+	__u32 flags; /* All undefined bits must be zero. */
+	__u32 rsvd[4]; /* Reserved for future use; must be zero. */
+};
+
 /*
  * MOCS indexes used for GPU surfaces, defining the cacheability of the
  * surface data and the coherency for this data wrt. CPU vs. GPU accesses.
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 09/18] drm/i915: Create/destroy VM (ppGTT) for use with contexts
  2019-03-19 11:57 [PATCH 01/18] drm/i915/selftests: Provide stub reset functions Chris Wilson
                   ` (6 preceding siblings ...)
  2019-03-19 11:57 ` [PATCH 08/18] drm/i915: Introduce the i915_user_extension_method Chris Wilson
@ 2019-03-19 11:57 ` Chris Wilson
  2019-03-20 13:00   ` Tvrtko Ursulin
  2019-03-19 11:57 ` [PATCH 10/18] drm/i915: Extend CONTEXT_CREATE to set parameters upon construction Chris Wilson
                   ` (15 subsequent siblings)
  23 siblings, 1 reply; 47+ messages in thread
From: Chris Wilson @ 2019-03-19 11:57 UTC (permalink / raw)
  To: intel-gfx

In preparation for making the ppGTT binding for a context explicit (to
facilitate reusing the same ppGTT between different contexts), allow the
user to create and destroy named ppGTT.
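
As a rough usage sketch (assuming the drm_i915_gem_vm_control layout
added by this patch, with vm_id filled in on create; error handling
elided):

	struct drm_i915_gem_vm_control ctl = {};

	ioctl(fd, DRM_IOCTL_I915_GEM_VM_CREATE, &ctl);
	/* ctl.vm_id now names the new ppGTT for this file */

	ioctl(fd, DRM_IOCTL_I915_GEM_VM_DESTROY, &ctl);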

v2: Replace global barrier for swapping over the ppgtt and tlbs with a
local context barrier (Tvrtko)
v3: serialise with struct_mutex; it's lazy but required dammit
v4: Rewrite igt_ctx_shared_exec to be more different (aimed to be more
similar, turned out different!)

v5: Fix up test unwind for aliasing-ppgtt (snb)
v6: Tighten language for uapi struct drm_i915_gem_vm_control.
v7: Patch the context image for runtime ppgtt switching!

Testcase: igt/gem_vm_create
Testcase: igt/gem_ctx_param/vm
Testcase: igt/gem_ctx_clone/vm
Testcase: igt/gem_ctx_shared
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c               |   2 +
 drivers/gpu/drm/i915/i915_drv.h               |   3 +
 drivers/gpu/drm/i915/i915_gem_context.c       | 331 +++++++++++++++++-
 drivers/gpu/drm/i915/i915_gem_context.h       |   5 +
 drivers/gpu/drm/i915/i915_gem_gtt.c           |  19 +-
 drivers/gpu/drm/i915/i915_gem_gtt.h           |  10 +-
 drivers/gpu/drm/i915/selftests/huge_pages.c   |   1 -
 .../gpu/drm/i915/selftests/i915_gem_context.c | 238 ++++++++++---
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |   1 -
 drivers/gpu/drm/i915/selftests/mock_context.c |   8 +-
 include/uapi/drm/i915_drm.h                   |  43 +++
 11 files changed, 580 insertions(+), 81 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index a3b00ecc58c9..fa991144e0f2 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -3121,6 +3121,8 @@ static const struct drm_ioctl_desc i915_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(I915_PERF_ADD_CONFIG, i915_perf_add_config_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_PERF_REMOVE_CONFIG, i915_perf_remove_config_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_QUERY, i915_query_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(I915_GEM_VM_CREATE, i915_gem_vm_create_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(I915_GEM_VM_DESTROY, i915_gem_vm_destroy_ioctl, DRM_RENDER_ALLOW),
 };
 
 static struct drm_driver driver = {
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 219348121897..87ef2e031b2e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -220,6 +220,9 @@ struct drm_i915_file_private {
 	struct idr context_idr;
 	struct mutex context_idr_lock; /* guards context_idr */
 
+	struct idr vm_idr;
+	struct mutex vm_idr_lock; /* guards vm_idr */
+
 	unsigned int bsd_engine;
 
 /*
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 196982f38a28..966fbbc154d3 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -90,6 +90,7 @@
 #include "i915_drv.h"
 #include "i915_globals.h"
 #include "i915_trace.h"
+#include "i915_user_extensions.h"
 #include "intel_lrc_reg.h"
 #include "intel_workarounds.h"
 
@@ -120,12 +121,15 @@ static void lut_close(struct i915_gem_context *ctx)
 		list_del(&lut->obj_link);
 		i915_lut_handle_free(lut);
 	}
+	INIT_LIST_HEAD(&ctx->handles_list);
 
 	rcu_read_lock();
 	radix_tree_for_each_slot(slot, &ctx->handles_vma, &iter, 0) {
 		struct i915_vma *vma = rcu_dereference_raw(*slot);
 
 		radix_tree_iter_delete(&ctx->handles_vma, &iter, slot);
+
+		vma->open_count--;
 		__i915_gem_object_release_unless_active(vma->obj);
 	}
 	rcu_read_unlock();
@@ -305,8 +309,6 @@ static void context_close(struct i915_gem_context *ctx)
 	 * the ppgtt).
 	 */
 	lut_close(ctx);
-	if (ctx->ppgtt)
-		i915_ppgtt_close(&ctx->ppgtt->vm);
 
 	ctx->file_priv = ERR_PTR(-EBADF);
 	i915_gem_context_put(ctx);
@@ -378,6 +380,28 @@ __create_context(struct drm_i915_private *dev_priv)
 	return ctx;
 }
 
+static struct i915_hw_ppgtt *
+__set_ppgtt(struct i915_gem_context *ctx, struct i915_hw_ppgtt *ppgtt)
+{
+	struct i915_hw_ppgtt *old = ctx->ppgtt;
+
+	ctx->ppgtt = i915_ppgtt_get(ppgtt);
+	ctx->desc_template = default_desc_template(ctx->i915, ppgtt);
+
+	return old;
+}
+
+static void __assign_ppgtt(struct i915_gem_context *ctx,
+			   struct i915_hw_ppgtt *ppgtt)
+{
+	if (ppgtt == ctx->ppgtt)
+		return;
+
+	ppgtt = __set_ppgtt(ctx, ppgtt);
+	if (ppgtt)
+		i915_ppgtt_put(ppgtt);
+}
+
 static struct i915_gem_context *
 i915_gem_create_context(struct drm_i915_private *dev_priv)
 {
@@ -403,8 +427,8 @@ i915_gem_create_context(struct drm_i915_private *dev_priv)
 			return ERR_CAST(ppgtt);
 		}
 
-		ctx->ppgtt = ppgtt;
-		ctx->desc_template = default_desc_template(dev_priv, ppgtt);
+		__assign_ppgtt(ctx, ppgtt);
+		i915_ppgtt_put(ppgtt);
 	}
 
 	trace_i915_context_create(ctx);
@@ -583,6 +607,12 @@ static int context_idr_cleanup(int id, void *p, void *data)
 	return 0;
 }
 
+static int vm_idr_cleanup(int id, void *p, void *data)
+{
+	i915_ppgtt_put(p);
+	return 0;
+}
+
 static int gem_context_register(struct i915_gem_context *ctx,
 				struct drm_i915_file_private *fpriv)
 {
@@ -621,8 +651,11 @@ int i915_gem_context_open(struct drm_i915_private *i915,
 	struct i915_gem_context *ctx;
 	int err;
 
-	idr_init(&file_priv->context_idr);
 	mutex_init(&file_priv->context_idr_lock);
+	mutex_init(&file_priv->vm_idr_lock);
+
+	idr_init(&file_priv->context_idr);
+	idr_init_base(&file_priv->vm_idr, 1);
 
 	mutex_lock(&i915->drm.struct_mutex);
 	ctx = i915_gem_create_context(i915);
@@ -646,8 +679,10 @@ int i915_gem_context_open(struct drm_i915_private *i915,
 	context_close(ctx);
 	mutex_unlock(&i915->drm.struct_mutex);
 err:
-	mutex_destroy(&file_priv->context_idr_lock);
+	idr_destroy(&file_priv->vm_idr);
 	idr_destroy(&file_priv->context_idr);
+	mutex_destroy(&file_priv->vm_idr_lock);
+	mutex_destroy(&file_priv->context_idr_lock);
 	return PTR_ERR(ctx);
 }
 
@@ -660,6 +695,99 @@ void i915_gem_context_close(struct drm_file *file)
 	idr_for_each(&file_priv->context_idr, context_idr_cleanup, NULL);
 	idr_destroy(&file_priv->context_idr);
 	mutex_destroy(&file_priv->context_idr_lock);
+
+	idr_for_each(&file_priv->vm_idr, vm_idr_cleanup, NULL);
+	idr_destroy(&file_priv->vm_idr);
+	mutex_destroy(&file_priv->vm_idr_lock);
+}
+
+int i915_gem_vm_create_ioctl(struct drm_device *dev, void *data,
+			     struct drm_file *file)
+{
+	struct drm_i915_private *i915 = to_i915(dev);
+	struct drm_i915_gem_vm_control *args = data;
+	struct drm_i915_file_private *file_priv = file->driver_priv;
+	struct i915_hw_ppgtt *ppgtt;
+	int err;
+
+	if (!HAS_FULL_PPGTT(i915))
+		return -ENODEV;
+
+	if (args->flags)
+		return -EINVAL;
+
+	ppgtt = i915_ppgtt_create(i915);
+	if (IS_ERR(ppgtt))
+		return PTR_ERR(ppgtt);
+
+	ppgtt->vm.file = file_priv;
+
+	if (args->extensions) {
+		err = i915_user_extensions(u64_to_user_ptr(args->extensions),
+					   NULL, 0,
+					   ppgtt);
+		if (err)
+			goto err_put;
+	}
+
+	err = mutex_lock_interruptible(&file_priv->vm_idr_lock);
+	if (err)
+		goto err_put;
+
+	err = idr_alloc(&file_priv->vm_idr, ppgtt, 0, 0, GFP_KERNEL);
+	if (err < 0)
+		goto err_unlock;
+
+	GEM_BUG_ON(err == 0); /* reserved for default/unassigned ppgtt */
+	ppgtt->user_handle = err;
+
+	mutex_unlock(&file_priv->vm_idr_lock);
+
+	args->vm_id = err;
+	return 0;
+
+err_unlock:
+	mutex_unlock(&file_priv->vm_idr_lock);
+err_put:
+	i915_ppgtt_put(ppgtt);
+	return err;
+}
+
+int i915_gem_vm_destroy_ioctl(struct drm_device *dev, void *data,
+			      struct drm_file *file)
+{
+	struct drm_i915_file_private *file_priv = file->driver_priv;
+	struct drm_i915_gem_vm_control *args = data;
+	struct i915_hw_ppgtt *ppgtt;
+	int err;
+	u32 id;
+
+	if (args->flags)
+		return -EINVAL;
+
+	if (args->extensions)
+		return -EINVAL;
+
+	id = args->vm_id;
+	if (!id)
+		return -ENOENT;
+
+	err = mutex_lock_interruptible(&file_priv->vm_idr_lock);
+	if (err)
+		return err;
+
+	ppgtt = idr_remove(&file_priv->vm_idr, id);
+	if (ppgtt) {
+		GEM_BUG_ON(ppgtt->user_handle != id);
+		ppgtt->user_handle = 0;
+	}
+
+	mutex_unlock(&file_priv->vm_idr_lock);
+	if (!ppgtt)
+		return -ENOENT;
+
+	i915_ppgtt_put(ppgtt);
+	return 0;
 }
 
 static struct i915_request *
@@ -702,12 +830,13 @@ static void cb_retire(struct i915_active *base)
 I915_SELFTEST_DECLARE(static intel_engine_mask_t context_barrier_inject_fault);
 static int context_barrier_task(struct i915_gem_context *ctx,
 				intel_engine_mask_t engines,
+				int (*emit)(struct i915_request *rq, void *data),
 				void (*task)(void *data),
 				void *data)
 {
 	struct drm_i915_private *i915 = ctx->i915;
 	struct context_barrier_task *cb;
-	struct intel_context *ce;
+	struct intel_context *ce, *next;
 	intel_wakeref_t wakeref;
 	int err = 0;
 
@@ -722,11 +851,11 @@ static int context_barrier_task(struct i915_gem_context *ctx,
 	i915_active_acquire(&cb->base);
 
 	wakeref = intel_runtime_pm_get(i915);
-	list_for_each_entry(ce, &ctx->active_engines, active_link) {
+	rbtree_postorder_for_each_entry_safe(ce, next, &ctx->hw_contexts, node) {
 		struct intel_engine_cs *engine = ce->engine;
 		struct i915_request *rq;
 
-		if (!(ce->engine->mask & engines))
+		if (!(engine->mask & engines))
 			continue;
 
 		if (I915_SELFTEST_ONLY(context_barrier_inject_fault &
@@ -741,7 +870,12 @@ static int context_barrier_task(struct i915_gem_context *ctx,
 			break;
 		}
 
-		err = i915_active_ref(&cb->base, rq->fence.context, rq);
+		err = 0;
+		if (emit)
+			err = emit(rq, data);
+		if (err == 0)
+			err = i915_active_ref(&cb->base, rq->fence.context, rq);
+
 		i915_request_add(rq);
 		if (err)
 			break;
@@ -804,6 +938,170 @@ int i915_gem_switch_to_kernel_context(struct drm_i915_private *i915,
 	return 0;
 }
 
+static int get_ppgtt(struct i915_gem_context *ctx,
+		     struct drm_i915_gem_context_param *args)
+{
+	struct drm_i915_file_private *file_priv = ctx->file_priv;
+	struct i915_hw_ppgtt *ppgtt;
+	int ret;
+
+	if (!ctx->ppgtt)
+		return -ENODEV;
+
+	/* XXX rcu acquire? */
+	ret = mutex_lock_interruptible(&ctx->i915->drm.struct_mutex);
+	if (ret)
+		return ret;
+
+	ppgtt = i915_ppgtt_get(ctx->ppgtt);
+	mutex_unlock(&ctx->i915->drm.struct_mutex);
+
+	ret = mutex_lock_interruptible(&file_priv->vm_idr_lock);
+	if (ret)
+		goto err_put;
+
+	if (!ppgtt->user_handle) {
+		ret = idr_alloc(&file_priv->vm_idr, ppgtt, 0, 0, GFP_KERNEL);
+		GEM_BUG_ON(!ret);
+		if (ret < 0)
+			goto err_unlock;
+
+		ppgtt->user_handle = ret;
+		i915_ppgtt_get(ppgtt);
+	}
+
+	args->size = 0;
+	args->value = ppgtt->user_handle;
+
+	ret = 0;
+err_unlock:
+	mutex_unlock(&file_priv->vm_idr_lock);
+err_put:
+	i915_ppgtt_put(ppgtt);
+	return ret;
+}
+
+static void set_ppgtt_barrier(void *data)
+{
+	struct i915_hw_ppgtt *old = data;
+
+	if (INTEL_GEN(old->vm.i915) < 8)
+		gen6_ppgtt_unpin_all(old);
+
+	i915_ppgtt_put(old);
+}
+
+static int emit_ppgtt_update(struct i915_request *rq, void *data)
+{
+	struct i915_hw_ppgtt *ppgtt = rq->gem_context->ppgtt;
+	struct intel_engine_cs *engine = rq->engine;
+	u32 *cs;
+	int i;
+
+	if (i915_vm_is_4lvl(&ppgtt->vm)) {
+		const dma_addr_t pd_daddr = px_dma(&ppgtt->pml4);
+
+		cs = intel_ring_begin(rq, 6);
+		if (IS_ERR(cs))
+			return PTR_ERR(cs);
+
+		*cs++ = MI_LOAD_REGISTER_IMM(2);
+
+		*cs++ = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(engine, 0));
+		*cs++ = upper_32_bits(pd_daddr);
+		*cs++ = i915_mmio_reg_offset(GEN8_RING_PDP_LDW(engine, 0));
+		*cs++ = lower_32_bits(pd_daddr);
+
+		*cs++ = MI_NOOP;
+		intel_ring_advance(rq, cs);
+	} else if (HAS_LOGICAL_RING_CONTEXTS(engine->i915)) {
+		cs = intel_ring_begin(rq, 4 * GEN8_3LVL_PDPES + 2);
+		if (IS_ERR(cs))
+			return PTR_ERR(cs);
+
+		*cs++ = MI_LOAD_REGISTER_IMM(2 * GEN8_3LVL_PDPES);
+		for (i = GEN8_3LVL_PDPES; i--; ) {
+			const dma_addr_t pd_daddr = i915_page_dir_dma_addr(ppgtt, i);
+
+			*cs++ = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(engine, i));
+			*cs++ = upper_32_bits(pd_daddr);
+			*cs++ = i915_mmio_reg_offset(GEN8_RING_PDP_LDW(engine, i));
+			*cs++ = lower_32_bits(pd_daddr);
+		}
+		*cs++ = MI_NOOP;
+		intel_ring_advance(rq, cs);
+	} else {
+		/* ppGTT is not part of the legacy context image */
+		gen6_ppgtt_pin(ppgtt);
+	}
+
+	return 0;
+}
+
+static int set_ppgtt(struct i915_gem_context *ctx,
+		     struct drm_i915_gem_context_param *args)
+{
+	struct drm_i915_file_private *file_priv = ctx->file_priv;
+	struct i915_hw_ppgtt *ppgtt, *old;
+	int err;
+
+	if (args->size)
+		return -EINVAL;
+
+	if (!ctx->ppgtt)
+		return -ENODEV;
+
+	if (upper_32_bits(args->value))
+		return -ENOENT;
+
+	err = mutex_lock_interruptible(&file_priv->vm_idr_lock);
+	if (err)
+		return err;
+
+	ppgtt = idr_find(&file_priv->vm_idr, args->value);
+	if (ppgtt) {
+		GEM_BUG_ON(ppgtt->user_handle != args->value);
+		i915_ppgtt_get(ppgtt);
+	}
+	mutex_unlock(&file_priv->vm_idr_lock);
+	if (!ppgtt)
+		return -ENOENT;
+
+	err = mutex_lock_interruptible(&ctx->i915->drm.struct_mutex);
+	if (err)
+		goto out;
+
+	if (ppgtt == ctx->ppgtt)
+		goto unlock;
+
+	/* Teardown the existing obj:vma cache, it will have to be rebuilt. */
+	lut_close(ctx);
+
+	old = __set_ppgtt(ctx, ppgtt);
+
+	/*
+	 * We need to flush any requests using the current ppgtt before
+	 * we release it as the requests do not hold a reference themselves,
+	 * only indirectly through the context.
+	 */
+	err = context_barrier_task(ctx, ALL_ENGINES,
+				   emit_ppgtt_update,
+				   set_ppgtt_barrier,
+				   old);
+	if (err) {
+		ctx->ppgtt = old;
+		ctx->desc_template = default_desc_template(ctx->i915, old);
+		i915_ppgtt_put(ppgtt);
+	}
+
+unlock:
+	mutex_unlock(&ctx->i915->drm.struct_mutex);
+
+out:
+	i915_ppgtt_put(ppgtt);
+	return err;
+}
+
 static bool client_is_banned(struct drm_i915_file_private *file_priv)
 {
 	return atomic_read(&file_priv->ban_score) >= I915_CLIENT_SCORE_BANNED;
@@ -984,6 +1282,9 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 	case I915_CONTEXT_PARAM_SSEU:
 		ret = get_sseu(ctx, args);
 		break;
+	case I915_CONTEXT_PARAM_VM:
+		ret = get_ppgtt(ctx, args);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
@@ -1285,9 +1586,6 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 		return -ENOENT;
 
 	switch (args->param) {
-	case I915_CONTEXT_PARAM_BAN_PERIOD:
-		ret = -EINVAL;
-		break;
 	case I915_CONTEXT_PARAM_NO_ZEROMAP:
 		if (args->size)
 			ret = -EINVAL;
@@ -1343,9 +1641,16 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 					I915_USER_PRIORITY(priority);
 		}
 		break;
+
 	case I915_CONTEXT_PARAM_SSEU:
 		ret = set_sseu(ctx, args);
 		break;
+
+	case I915_CONTEXT_PARAM_VM:
+		ret = set_ppgtt(ctx, args);
+		break;
+
+	case I915_CONTEXT_PARAM_BAN_PERIOD:
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index 849b2a83c1ec..23dcb01bfd82 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -148,6 +148,11 @@ void i915_gem_context_release(struct kref *ctx_ref);
 struct i915_gem_context *
 i915_gem_context_create_gvt(struct drm_device *dev);
 
+int i915_gem_vm_create_ioctl(struct drm_device *dev, void *data,
+			     struct drm_file *file);
+int i915_gem_vm_destroy_ioctl(struct drm_device *dev, void *data,
+			      struct drm_file *file);
+
 int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
 				  struct drm_file *file);
 int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index b9e0e3a00223..736c845eb77f 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -1937,6 +1937,8 @@ int gen6_ppgtt_pin(struct i915_hw_ppgtt *base)
 	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(base);
 	int err;
 
+	GEM_BUG_ON(ppgtt->base.vm.closed);
+
 	/*
 	 * Workaround the limited maximum vma->pin_count and the aliasing_ppgtt
 	 * which will be pinned into every active context.
@@ -1975,6 +1977,17 @@ void gen6_ppgtt_unpin(struct i915_hw_ppgtt *base)
 	i915_vma_unpin(ppgtt->vma);
 }
 
+void gen6_ppgtt_unpin_all(struct i915_hw_ppgtt *base)
+{
+	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(base);
+
+	if (!ppgtt->pin_count)
+		return;
+
+	ppgtt->pin_count = 0;
+	i915_vma_unpin(ppgtt->vma);
+}
+
 static struct i915_hw_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915)
 {
 	struct i915_ggtt * const ggtt = &i915->ggtt;
@@ -2082,12 +2095,6 @@ i915_ppgtt_create(struct drm_i915_private *i915)
 	return ppgtt;
 }
 
-void i915_ppgtt_close(struct i915_address_space *vm)
-{
-	GEM_BUG_ON(vm->closed);
-	vm->closed = true;
-}
-
 static void ppgtt_destroy_vma(struct i915_address_space *vm)
 {
 	struct list_head *phases[] = {
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index 8fe03067e698..f597f35b109b 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -396,6 +396,8 @@ struct i915_hw_ppgtt {
 		struct i915_page_directory_pointer pdp;	/* GEN8+ */
 		struct i915_page_directory pd;		/* GEN6-7 */
 	};
+
+	u32 user_handle;
 };
 
 struct gen6_hw_ppgtt {
@@ -605,13 +607,12 @@ void i915_ggtt_cleanup_hw(struct drm_i915_private *dev_priv);
 int i915_ppgtt_init_hw(struct drm_i915_private *dev_priv);
 
 struct i915_hw_ppgtt *i915_ppgtt_create(struct drm_i915_private *dev_priv);
-void i915_ppgtt_close(struct i915_address_space *vm);
 void i915_ppgtt_release(struct kref *kref);
 
-static inline void i915_ppgtt_get(struct i915_hw_ppgtt *ppgtt)
+static inline struct i915_hw_ppgtt *i915_ppgtt_get(struct i915_hw_ppgtt *ppgtt)
 {
-	if (ppgtt)
-		kref_get(&ppgtt->ref);
+	kref_get(&ppgtt->ref);
+	return ppgtt;
 }
 
 static inline void i915_ppgtt_put(struct i915_hw_ppgtt *ppgtt)
@@ -622,6 +623,7 @@ static inline void i915_ppgtt_put(struct i915_hw_ppgtt *ppgtt)
 
 int gen6_ppgtt_pin(struct i915_hw_ppgtt *base);
 void gen6_ppgtt_unpin(struct i915_hw_ppgtt *base);
+void gen6_ppgtt_unpin_all(struct i915_hw_ppgtt *base);
 
 void i915_check_and_clear_faults(struct drm_i915_private *dev_priv);
 void i915_gem_suspend_gtt_mappings(struct drm_i915_private *dev_priv);
diff --git a/drivers/gpu/drm/i915/selftests/huge_pages.c b/drivers/gpu/drm/i915/selftests/huge_pages.c
index c5c8ba6c059f..90721b54e7ae 100644
--- a/drivers/gpu/drm/i915/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/selftests/huge_pages.c
@@ -1732,7 +1732,6 @@ int i915_gem_huge_page_mock_selftests(void)
 	err = i915_subtests(tests, ppgtt);
 
 out_close:
-	i915_ppgtt_close(&ppgtt->vm);
 	i915_ppgtt_put(ppgtt);
 
 out_unlock:
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
index ed72400f2395..5e7e2a9193fe 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
@@ -373,7 +373,8 @@ static int cpu_fill(struct drm_i915_gem_object *obj, u32 value)
 	return 0;
 }
 
-static int cpu_check(struct drm_i915_gem_object *obj, unsigned int max)
+static noinline int cpu_check(struct drm_i915_gem_object *obj,
+			      unsigned int idx, unsigned int max)
 {
 	unsigned int n, m, needs_flush;
 	int err;
@@ -391,8 +392,10 @@ static int cpu_check(struct drm_i915_gem_object *obj, unsigned int max)
 
 		for (m = 0; m < max; m++) {
 			if (map[m] != m) {
-				pr_err("Invalid value at page %d, offset %d: found %x expected %x\n",
-				       n, m, map[m], m);
+				pr_err("%pS: Invalid value at object %d page %d/%ld, offset %d/%d: found %x expected %x\n",
+				       __builtin_return_address(0), idx,
+				       n, real_page_count(obj), m, max,
+				       map[m], m);
 				err = -EINVAL;
 				goto out_unmap;
 			}
@@ -400,8 +403,9 @@ static int cpu_check(struct drm_i915_gem_object *obj, unsigned int max)
 
 		for (; m < DW_PER_PAGE; m++) {
 			if (map[m] != STACK_MAGIC) {
-				pr_err("Invalid value at page %d, offset %d: found %x expected %x\n",
-				       n, m, map[m], STACK_MAGIC);
+				pr_err("%pS: Invalid value at object %d page %d, offset %d: found %x expected %x (uninitialised)\n",
+				       __builtin_return_address(0), idx, n, m,
+				       map[m], STACK_MAGIC);
 				err = -EINVAL;
 				goto out_unmap;
 			}
@@ -479,12 +483,8 @@ static unsigned long max_dwords(struct drm_i915_gem_object *obj)
 static int igt_ctx_exec(void *arg)
 {
 	struct drm_i915_private *i915 = arg;
-	struct drm_i915_gem_object *obj = NULL;
-	unsigned long ncontexts, ndwords, dw;
-	struct igt_live_test t;
-	struct drm_file *file;
-	IGT_TIMEOUT(end_time);
-	LIST_HEAD(objects);
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
 	int err = -ENODEV;
 
 	/*
@@ -496,44 +496,167 @@ static int igt_ctx_exec(void *arg)
 	if (!DRIVER_CAPS(i915)->has_logical_contexts)
 		return 0;
 
+	for_each_engine(engine, i915, id) {
+		struct drm_i915_gem_object *obj = NULL;
+		unsigned long ncontexts, ndwords, dw;
+		struct igt_live_test t;
+		struct drm_file *file;
+		IGT_TIMEOUT(end_time);
+		LIST_HEAD(objects);
+
+		if (!intel_engine_can_store_dword(engine))
+			continue;
+
+		if (!engine->context_size)
+			continue; /* No logical context support in HW */
+
+		file = mock_file(i915);
+		if (IS_ERR(file))
+			return PTR_ERR(file);
+
+		mutex_lock(&i915->drm.struct_mutex);
+
+		err = igt_live_test_begin(&t, i915, __func__, engine->name);
+		if (err)
+			goto out_unlock;
+
+		ncontexts = 0;
+		ndwords = 0;
+		dw = 0;
+		while (!time_after(jiffies, end_time)) {
+			struct i915_gem_context *ctx;
+			intel_wakeref_t wakeref;
+
+			ctx = live_context(i915, file);
+			if (IS_ERR(ctx)) {
+				err = PTR_ERR(ctx);
+				goto out_unlock;
+			}
+
+			if (!obj) {
+				obj = create_test_object(ctx, file, &objects);
+				if (IS_ERR(obj)) {
+					err = PTR_ERR(obj);
+					goto out_unlock;
+				}
+			}
+
+			with_intel_runtime_pm(i915, wakeref)
+				err = gpu_fill(obj, ctx, engine, dw);
+			if (err) {
+				pr_err("Failed to fill dword %lu [%lu/%lu] with gpu (%s) in ctx %u [full-ppgtt? %s], err=%d\n",
+				       ndwords, dw, max_dwords(obj),
+				       engine->name, ctx->hw_id,
+				       yesno(!!ctx->ppgtt), err);
+				goto out_unlock;
+			}
+
+			if (++dw == max_dwords(obj)) {
+				obj = NULL;
+				dw = 0;
+			}
+
+			ndwords++;
+			ncontexts++;
+		}
+
+		pr_info("Submitted %lu contexts to %s, filling %lu dwords\n",
+			ncontexts, engine->name, ndwords);
+
+		ncontexts = dw = 0;
+		list_for_each_entry(obj, &objects, st_link) {
+			unsigned int rem =
+				min_t(unsigned int, ndwords - dw, max_dwords(obj));
+
+			err = cpu_check(obj, ncontexts++, rem);
+			if (err)
+				break;
+
+			dw += rem;
+		}
+
+out_unlock:
+		if (igt_live_test_end(&t))
+			err = -EIO;
+		mutex_unlock(&i915->drm.struct_mutex);
+
+		mock_file_free(i915, file);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+static int igt_shared_ctx_exec(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct i915_gem_context *parent;
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	struct igt_live_test t;
+	struct drm_file *file;
+	int err = 0;
+
+	/*
+	 * Create a few different contexts with the same mm and write
+	 * through each ctx using the GPU making sure those writes end
+	 * up in the expected pages of our obj.
+	 */
+	if (!DRIVER_CAPS(i915)->has_logical_contexts)
+		return 0;
+
 	file = mock_file(i915);
 	if (IS_ERR(file))
 		return PTR_ERR(file);
 
 	mutex_lock(&i915->drm.struct_mutex);
 
+	parent = live_context(i915, file);
+	if (IS_ERR(parent)) {
+		err = PTR_ERR(parent);
+		goto out_unlock;
+	}
+
+	if (!parent->ppgtt) { /* not full-ppgtt; nothing to share */
+		err = 0;
+		goto out_unlock;
+	}
+
 	err = igt_live_test_begin(&t, i915, __func__, "");
 	if (err)
 		goto out_unlock;
 
-	ncontexts = 0;
-	ndwords = 0;
-	dw = 0;
-	while (!time_after(jiffies, end_time)) {
-		struct intel_engine_cs *engine;
-		struct i915_gem_context *ctx;
-		unsigned int id;
+	for_each_engine(engine, i915, id) {
+		unsigned long ncontexts, ndwords, dw;
+		struct drm_i915_gem_object *obj = NULL;
+		IGT_TIMEOUT(end_time);
+		LIST_HEAD(objects);
 
-		ctx = live_context(i915, file);
-		if (IS_ERR(ctx)) {
-			err = PTR_ERR(ctx);
-			goto out_unlock;
-		}
+		if (!intel_engine_can_store_dword(engine))
+			continue;
 
-		for_each_engine(engine, i915, id) {
+		dw = 0;
+		ndwords = 0;
+		ncontexts = 0;
+		while (!time_after(jiffies, end_time)) {
+			struct i915_gem_context *ctx;
 			intel_wakeref_t wakeref;
 
-			if (!engine->context_size)
-				continue; /* No logical context support in HW */
+			ctx = kernel_context(i915);
+			if (IS_ERR(ctx)) {
+				err = PTR_ERR(ctx);
+				goto out_test;
+			}
 
-			if (!intel_engine_can_store_dword(engine))
-				continue;
+			__assign_ppgtt(ctx, parent->ppgtt);
 
 			if (!obj) {
-				obj = create_test_object(ctx, file, &objects);
+				obj = create_test_object(parent, file, &objects);
 				if (IS_ERR(obj)) {
 					err = PTR_ERR(obj);
-					goto out_unlock;
+					kernel_context_close(ctx);
+					goto out_test;
 				}
 			}
 
@@ -545,35 +668,39 @@ static int igt_ctx_exec(void *arg)
 				       ndwords, dw, max_dwords(obj),
 				       engine->name, ctx->hw_id,
 				       yesno(!!ctx->ppgtt), err);
-				goto out_unlock;
+				kernel_context_close(ctx);
+				goto out_test;
 			}
 
 			if (++dw == max_dwords(obj)) {
 				obj = NULL;
 				dw = 0;
 			}
+
 			ndwords++;
+			ncontexts++;
+
+			kernel_context_close(ctx);
 		}
-		ncontexts++;
-	}
-	pr_info("Submitted %lu contexts (across %u engines), filling %lu dwords\n",
-		ncontexts, RUNTIME_INFO(i915)->num_engines, ndwords);
+		pr_info("Submitted %lu contexts to %s, filling %lu dwords\n",
+			ncontexts, engine->name, ndwords);
 
-	dw = 0;
-	list_for_each_entry(obj, &objects, st_link) {
-		unsigned int rem =
-			min_t(unsigned int, ndwords - dw, max_dwords(obj));
+		ncontexts = dw = 0;
+		list_for_each_entry(obj, &objects, st_link) {
+			unsigned int rem =
+				min_t(unsigned int, ndwords - dw, max_dwords(obj));
 
-		err = cpu_check(obj, rem);
-		if (err)
-			break;
+			err = cpu_check(obj, ncontexts++, rem);
+			if (err)
+				goto out_test;
 
-		dw += rem;
+			dw += rem;
+		}
 	}
-
-out_unlock:
+out_test:
 	if (igt_live_test_end(&t))
 		err = -EIO;
+out_unlock:
 	mutex_unlock(&i915->drm.struct_mutex);
 
 	mock_file_free(i915, file);
@@ -1046,7 +1173,7 @@ static int igt_ctx_readonly(void *arg)
 	struct drm_i915_gem_object *obj = NULL;
 	struct i915_gem_context *ctx;
 	struct i915_hw_ppgtt *ppgtt;
-	unsigned long ndwords, dw;
+	unsigned long idx, ndwords, dw;
 	struct igt_live_test t;
 	struct drm_file *file;
 	I915_RND_STATE(prng);
@@ -1127,6 +1254,7 @@ static int igt_ctx_readonly(void *arg)
 		ndwords, RUNTIME_INFO(i915)->num_engines);
 
 	dw = 0;
+	idx = 0;
 	list_for_each_entry(obj, &objects, st_link) {
 		unsigned int rem =
 			min_t(unsigned int, ndwords - dw, max_dwords(obj));
@@ -1136,7 +1264,7 @@ static int igt_ctx_readonly(void *arg)
 		if (i915_gem_object_is_readonly(obj))
 			num_writes = 0;
 
-		err = cpu_check(obj, num_writes);
+		err = cpu_check(obj, idx++, num_writes);
 		if (err)
 			break;
 
@@ -1619,7 +1747,8 @@ static int mock_context_barrier(void *arg)
 	}
 
 	counter = 0;
-	err = context_barrier_task(ctx, 0, mock_barrier_task, &counter);
+	err = context_barrier_task(ctx, 0,
+				   NULL, mock_barrier_task, &counter);
 	if (err) {
 		pr_err("Failed at line %d, err=%d\n", __LINE__, err);
 		goto out;
@@ -1631,8 +1760,8 @@ static int mock_context_barrier(void *arg)
 	}
 
 	counter = 0;
-	err = context_barrier_task(ctx,
-				   ALL_ENGINES, mock_barrier_task, &counter);
+	err = context_barrier_task(ctx, ALL_ENGINES,
+				   NULL, mock_barrier_task, &counter);
 	if (err) {
 		pr_err("Failed at line %d, err=%d\n", __LINE__, err);
 		goto out;
@@ -1655,8 +1784,8 @@ static int mock_context_barrier(void *arg)
 
 	counter = 0;
 	context_barrier_inject_fault = BIT(RCS0);
-	err = context_barrier_task(ctx,
-				   ALL_ENGINES, mock_barrier_task, &counter);
+	err = context_barrier_task(ctx, ALL_ENGINES,
+				   NULL, mock_barrier_task, &counter);
 	context_barrier_inject_fault = 0;
 	if (err == -ENXIO)
 		err = 0;
@@ -1670,8 +1799,8 @@ static int mock_context_barrier(void *arg)
 		goto out;
 
 	counter = 0;
-	err = context_barrier_task(ctx,
-				   ALL_ENGINES, mock_barrier_task, &counter);
+	err = context_barrier_task(ctx, ALL_ENGINES,
+				   NULL, mock_barrier_task, &counter);
 	if (err) {
 		pr_err("Failed at line %d, err=%d\n", __LINE__, err);
 		goto out;
@@ -1719,6 +1848,7 @@ int i915_gem_context_live_selftests(struct drm_i915_private *dev_priv)
 		SUBTEST(igt_ctx_exec),
 		SUBTEST(igt_ctx_readonly),
 		SUBTEST(igt_ctx_sseu),
+		SUBTEST(igt_shared_ctx_exec),
 		SUBTEST(igt_vm_isolation),
 	};
 
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index 01084f6b4fb7..9cca66e4420a 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -1020,7 +1020,6 @@ static int exercise_ppgtt(struct drm_i915_private *dev_priv,
 
 	err = func(dev_priv, &ppgtt->vm, 0, ppgtt->vm.total, end_time);
 
-	i915_ppgtt_close(&ppgtt->vm);
 	i915_ppgtt_put(ppgtt);
 out_unlock:
 	mutex_unlock(&dev_priv->drm.struct_mutex);
diff --git a/drivers/gpu/drm/i915/selftests/mock_context.c b/drivers/gpu/drm/i915/selftests/mock_context.c
index 1cc8be732435..cfc9012c8e49 100644
--- a/drivers/gpu/drm/i915/selftests/mock_context.c
+++ b/drivers/gpu/drm/i915/selftests/mock_context.c
@@ -54,13 +54,17 @@ mock_context(struct drm_i915_private *i915,
 		goto err_handles;
 
 	if (name) {
+		struct i915_hw_ppgtt *ppgtt;
+
 		ctx->name = kstrdup(name, GFP_KERNEL);
 		if (!ctx->name)
 			goto err_put;
 
-		ctx->ppgtt = mock_ppgtt(i915, name);
-		if (!ctx->ppgtt)
+		ppgtt = mock_ppgtt(i915, name);
+		if (!ppgtt)
 			goto err_put;
+
+		__set_ppgtt(ctx, ppgtt);
 	}
 
 	return ctx;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 1c69ed16a923..9af7a8e6a46e 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -343,6 +343,8 @@ typedef struct _drm_i915_sarea {
 #define DRM_I915_PERF_ADD_CONFIG	0x37
 #define DRM_I915_PERF_REMOVE_CONFIG	0x38
 #define DRM_I915_QUERY			0x39
+#define DRM_I915_GEM_VM_CREATE		0x3a
+#define DRM_I915_GEM_VM_DESTROY		0x3b
 /* Must be kept compact -- no holes */
 
 #define DRM_IOCTL_I915_INIT		DRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT, drm_i915_init_t)
@@ -402,6 +404,8 @@ typedef struct _drm_i915_sarea {
 #define DRM_IOCTL_I915_PERF_ADD_CONFIG	DRM_IOW(DRM_COMMAND_BASE + DRM_I915_PERF_ADD_CONFIG, struct drm_i915_perf_oa_config)
 #define DRM_IOCTL_I915_PERF_REMOVE_CONFIG	DRM_IOW(DRM_COMMAND_BASE + DRM_I915_PERF_REMOVE_CONFIG, __u64)
 #define DRM_IOCTL_I915_QUERY			DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_QUERY, struct drm_i915_query)
+#define DRM_IOCTL_I915_GEM_VM_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_CREATE, struct drm_i915_gem_vm_control)
+#define DRM_IOCTL_I915_GEM_VM_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_VM_DESTROY, struct drm_i915_gem_vm_control)
 
 /* Allow drivers to submit batchbuffers directly to hardware, relying
  * on the security mechanisms provided by hardware.
@@ -1453,6 +1457,33 @@ struct drm_i915_gem_context_destroy {
 	__u32 pad;
 };
 
+/*
+ * DRM_I915_GEM_VM_CREATE -
+ *
+ * Create a new virtual memory address space (ppGTT) for use within a context
+ * on the same file. Extensions can be provided to configure exactly how the
+ * address space is setup upon creation.
+ *
+ * The id of new VM (bound to the fd) for use with I915_CONTEXT_PARAM_VM is
+ * returned in the outparam @vm_id.
+ *
+ * No flags are defined, with all bits reserved and must be zero.
+ *
+ * An extension chain maybe provided, starting with @extensions, and terminated
+ * by the @next_extension being 0. Currently, no extensions are defined.
+ *
+ * DRM_I915_GEM_VM_DESTROY -
+ *
+ * Destroys a previously created VM id, specified in @vm_id.
+ *
+ * No extensions or flags are allowed currently, and so must be zero.
+ */
+struct drm_i915_gem_vm_control {
+	__u64 extensions;
+	__u32 flags;
+	__u32 vm_id;
+};
+
 struct drm_i915_reg_read {
 	/*
 	 * Register offset.
@@ -1542,7 +1573,19 @@ struct drm_i915_gem_context_param {
  * On creation, all new contexts are marked as recoverable.
  */
 #define I915_CONTEXT_PARAM_RECOVERABLE	0x8
+
+	/*
+	 * The id of the associated virtual memory address space (ppGTT) of
+	 * this context. Can be retrieved and passed to another context
+	 * (on the same fd) for both to use the same ppGTT and so share
+	 * address layouts, and avoid reloading the page tables on context
+	 * switches between themselves.
+	 *
+	 * See DRM_I915_GEM_VM_CREATE and DRM_I915_GEM_VM_DESTROY.
+	 */
+#define I915_CONTEXT_PARAM_VM		0x9
 /* Must be kept compact -- no holes and well documented */
+
 	__u64 value;
 };
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 10/18] drm/i915: Extend CONTEXT_CREATE to set parameters upon construction
  2019-03-19 11:57 [PATCH 01/18] drm/i915/selftests: Provide stub reset functions Chris Wilson
                   ` (7 preceding siblings ...)
  2019-03-19 11:57 ` [PATCH 09/18] drm/i915: Create/destroy VM (ppGTT) for use with contexts Chris Wilson
@ 2019-03-19 11:57 ` Chris Wilson
  2019-03-19 11:57 ` [PATCH 11/18] drm/i915: Allow contexts to share a single timeline across all engines Chris Wilson
                   ` (14 subsequent siblings)
  23 siblings, 0 replies; 47+ messages in thread
From: Chris Wilson @ 2019-03-19 11:57 UTC (permalink / raw)
  To: intel-gfx

It can be useful to have a single ioctl to create a context with all
the initial parameters instead of a series of create + setparam + setparam
ioctls. This extension to context create allows any of those parameters
to be passed in as a linked list and applied to the newly constructed
context.
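
For illustration only (not part of the patch), a hedged userspace sketch
of the one-shot flow, assuming the uapi names added here, an open DRM fd
and a kernel that exposes the priority scheduler capability:

#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

/* Create a context and raise its priority in a single ioctl. */
static int create_high_prio_ctx(int fd, uint32_t *ctx_id)
{
	struct drm_i915_gem_context_create_ext_setparam prio = {
		.base = { .name = I915_CONTEXT_CREATE_EXT_SETPARAM },
		.param = {
			.param = I915_CONTEXT_PARAM_PRIORITY,
			.value = 512, /* within I915_CONTEXT_MAX_USER_PRIORITY */
		},
	};
	struct drm_i915_gem_context_create_ext create = {
		.flags = I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS,
		.extensions = (uintptr_t)&prio, /* next_extension == 0 ends the chain */
	};

	if (ioctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT, &create))
		return -1;

	*ctx_id = create.ctx_id;
	return 0;
}

If any extension fails, the context is closed again before the ioctl
returns, so userspace never sees a half-configured context id.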

v2: Make a local copy of user setparam (Tvrtko)
v3: Use flags to detect availability of extension interface

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c         |   2 +-
 drivers/gpu/drm/i915/i915_gem_context.c | 452 +++++++++++++-----------
 include/uapi/drm/i915_drm.h             | 180 +++++-----
 3 files changed, 349 insertions(+), 285 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index fa991144e0f2..9a0fa3b21e9d 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -3110,7 +3110,7 @@ static const struct drm_ioctl_desc i915_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(I915_SET_SPRITE_COLORKEY, intel_sprite_set_colorkey_ioctl, DRM_MASTER),
 	DRM_IOCTL_DEF_DRV(I915_GET_SPRITE_COLORKEY, drm_noop, DRM_MASTER),
 	DRM_IOCTL_DEF_DRV(I915_GEM_WAIT, i915_gem_wait_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
-	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_CREATE, i915_gem_context_create_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_CREATE_EXT, i915_gem_context_create_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_DESTROY, i915_gem_context_destroy_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_REG_READ, i915_reg_read_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GET_RESET_STATS, i915_gem_context_reset_stats_ioctl, DRM_RENDER_ALLOW),
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 966fbbc154d3..0d16edbb38c3 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -1102,198 +1102,6 @@ static int set_ppgtt(struct i915_gem_context *ctx,
 	return err;
 }
 
-static bool client_is_banned(struct drm_i915_file_private *file_priv)
-{
-	return atomic_read(&file_priv->ban_score) >= I915_CLIENT_SCORE_BANNED;
-}
-
-int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
-				  struct drm_file *file)
-{
-	struct drm_i915_private *i915 = to_i915(dev);
-	struct drm_i915_gem_context_create *args = data;
-	struct drm_i915_file_private *file_priv = file->driver_priv;
-	struct i915_gem_context *ctx;
-	int ret;
-
-	if (!DRIVER_CAPS(i915)->has_logical_contexts)
-		return -ENODEV;
-
-	if (args->pad != 0)
-		return -EINVAL;
-
-	ret = i915_terminally_wedged(i915);
-	if (ret)
-		return ret;
-
-	if (client_is_banned(file_priv)) {
-		DRM_DEBUG("client %s[%d] banned from creating ctx\n",
-			  current->comm,
-			  pid_nr(get_task_pid(current, PIDTYPE_PID)));
-
-		return -EIO;
-	}
-
-	ret = i915_mutex_lock_interruptible(dev);
-	if (ret)
-		return ret;
-
-	ctx = i915_gem_create_context(i915);
-	mutex_unlock(&dev->struct_mutex);
-	if (IS_ERR(ctx))
-		return PTR_ERR(ctx);
-
-	ret = gem_context_register(ctx, file_priv);
-	if (ret < 0)
-		goto err_ctx;
-
-	args->ctx_id = ret;
-	DRM_DEBUG("HW context %d created\n", args->ctx_id);
-
-	return 0;
-
-err_ctx:
-	mutex_lock(&dev->struct_mutex);
-	context_close(ctx);
-	mutex_unlock(&dev->struct_mutex);
-	return ret;
-}
-
-int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
-				   struct drm_file *file)
-{
-	struct drm_i915_gem_context_destroy *args = data;
-	struct drm_i915_file_private *file_priv = file->driver_priv;
-	struct i915_gem_context *ctx;
-
-	if (args->pad != 0)
-		return -EINVAL;
-
-	if (!args->ctx_id)
-		return -ENOENT;
-
-	if (mutex_lock_interruptible(&file_priv->context_idr_lock))
-		return -EINTR;
-
-	ctx = idr_remove(&file_priv->context_idr, args->ctx_id);
-	mutex_unlock(&file_priv->context_idr_lock);
-	if (!ctx)
-		return -ENOENT;
-
-	mutex_lock(&dev->struct_mutex);
-	context_close(ctx);
-	mutex_unlock(&dev->struct_mutex);
-
-	return 0;
-}
-
-static int get_sseu(struct i915_gem_context *ctx,
-		    struct drm_i915_gem_context_param *args)
-{
-	struct drm_i915_gem_context_param_sseu user_sseu;
-	struct intel_engine_cs *engine;
-	struct intel_context *ce;
-
-	if (args->size == 0)
-		goto out;
-	else if (args->size < sizeof(user_sseu))
-		return -EINVAL;
-
-	if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value),
-			   sizeof(user_sseu)))
-		return -EFAULT;
-
-	if (user_sseu.flags || user_sseu.rsvd)
-		return -EINVAL;
-
-	engine = intel_engine_lookup_user(ctx->i915,
-					  user_sseu.engine_class,
-					  user_sseu.engine_instance);
-	if (!engine)
-		return -EINVAL;
-
-	ce = intel_context_pin_lock(ctx, engine); /* serialises with set_sseu */
-	if (IS_ERR(ce))
-		return PTR_ERR(ce);
-
-	user_sseu.slice_mask = ce->sseu.slice_mask;
-	user_sseu.subslice_mask = ce->sseu.subslice_mask;
-	user_sseu.min_eus_per_subslice = ce->sseu.min_eus_per_subslice;
-	user_sseu.max_eus_per_subslice = ce->sseu.max_eus_per_subslice;
-
-	intel_context_pin_unlock(ce);
-
-	if (copy_to_user(u64_to_user_ptr(args->value), &user_sseu,
-			 sizeof(user_sseu)))
-		return -EFAULT;
-
-out:
-	args->size = sizeof(user_sseu);
-
-	return 0;
-}
-
-int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
-				    struct drm_file *file)
-{
-	struct drm_i915_file_private *file_priv = file->driver_priv;
-	struct drm_i915_gem_context_param *args = data;
-	struct i915_gem_context *ctx;
-	int ret = 0;
-
-	ctx = i915_gem_context_lookup(file_priv, args->ctx_id);
-	if (!ctx)
-		return -ENOENT;
-
-	switch (args->param) {
-	case I915_CONTEXT_PARAM_BAN_PERIOD:
-		ret = -EINVAL;
-		break;
-	case I915_CONTEXT_PARAM_NO_ZEROMAP:
-		args->size = 0;
-		args->value = test_bit(UCONTEXT_NO_ZEROMAP, &ctx->user_flags);
-		break;
-	case I915_CONTEXT_PARAM_GTT_SIZE:
-		args->size = 0;
-
-		if (ctx->ppgtt)
-			args->value = ctx->ppgtt->vm.total;
-		else if (to_i915(dev)->mm.aliasing_ppgtt)
-			args->value = to_i915(dev)->mm.aliasing_ppgtt->vm.total;
-		else
-			args->value = to_i915(dev)->ggtt.vm.total;
-		break;
-	case I915_CONTEXT_PARAM_NO_ERROR_CAPTURE:
-		args->size = 0;
-		args->value = i915_gem_context_no_error_capture(ctx);
-		break;
-	case I915_CONTEXT_PARAM_BANNABLE:
-		args->size = 0;
-		args->value = i915_gem_context_is_bannable(ctx);
-		break;
-	case I915_CONTEXT_PARAM_RECOVERABLE:
-		args->size = 0;
-		args->value = i915_gem_context_is_recoverable(ctx);
-		break;
-	case I915_CONTEXT_PARAM_PRIORITY:
-		args->size = 0;
-		args->value = ctx->sched.priority >> I915_USER_PRIORITY_SHIFT;
-		break;
-	case I915_CONTEXT_PARAM_SSEU:
-		ret = get_sseu(ctx, args);
-		break;
-	case I915_CONTEXT_PARAM_VM:
-		ret = get_ppgtt(ctx, args);
-		break;
-	default:
-		ret = -EINVAL;
-		break;
-	}
-
-	i915_gem_context_put(ctx);
-	return ret;
-}
-
 static int gen8_emit_rpcs_config(struct i915_request *rq,
 				 struct intel_context *ce,
 				 struct intel_sseu sseu)
@@ -1573,18 +1381,11 @@ static int set_sseu(struct i915_gem_context *ctx,
 	return 0;
 }
 
-int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
-				    struct drm_file *file)
+static int ctx_setparam(struct i915_gem_context *ctx,
+			struct drm_i915_gem_context_param *args)
 {
-	struct drm_i915_file_private *file_priv = file->driver_priv;
-	struct drm_i915_gem_context_param *args = data;
-	struct i915_gem_context *ctx;
 	int ret = 0;
 
-	ctx = i915_gem_context_lookup(file_priv, args->ctx_id);
-	if (!ctx)
-		return -ENOENT;
-
 	switch (args->param) {
 	case I915_CONTEXT_PARAM_NO_ZEROMAP:
 		if (args->size)
@@ -1594,6 +1395,7 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 		else
 			clear_bit(UCONTEXT_NO_ZEROMAP, &ctx->user_flags);
 		break;
+
 	case I915_CONTEXT_PARAM_NO_ERROR_CAPTURE:
 		if (args->size)
 			ret = -EINVAL;
@@ -1602,6 +1404,7 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 		else
 			i915_gem_context_clear_no_error_capture(ctx);
 		break;
+
 	case I915_CONTEXT_PARAM_BANNABLE:
 		if (args->size)
 			ret = -EINVAL;
@@ -1628,7 +1431,7 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 
 			if (args->size)
 				ret = -EINVAL;
-			else if (!(to_i915(dev)->caps.scheduler & I915_SCHEDULER_CAP_PRIORITY))
+			else if (!(ctx->i915->caps.scheduler & I915_SCHEDULER_CAP_PRIORITY))
 				ret = -ENODEV;
 			else if (priority > I915_CONTEXT_MAX_USER_PRIORITY ||
 				 priority < I915_CONTEXT_MIN_USER_PRIORITY)
@@ -1656,6 +1459,251 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 		break;
 	}
 
+	return ret;
+}
+
+struct create_ext {
+	struct i915_gem_context *ctx;
+	struct drm_i915_file_private *fpriv;
+};
+
+static int create_setparam(struct i915_user_extension __user *ext, void *data)
+{
+	struct drm_i915_gem_context_create_ext_setparam local;
+	const struct create_ext *arg = data;
+
+	if (copy_from_user(&local, ext, sizeof(local)))
+		return -EFAULT;
+
+	if (local.param.ctx_id)
+		return -EINVAL;
+
+	return ctx_setparam(arg->ctx, &local.param);
+}
+
+static const i915_user_extension_fn create_extensions[] = {
+	[I915_CONTEXT_CREATE_EXT_SETPARAM] = create_setparam,
+};
+
+static bool client_is_banned(struct drm_i915_file_private *file_priv)
+{
+	return atomic_read(&file_priv->ban_score) >= I915_CLIENT_SCORE_BANNED;
+}
+
+int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
+				  struct drm_file *file)
+{
+	struct drm_i915_private *i915 = to_i915(dev);
+	struct drm_i915_gem_context_create_ext *args = data;
+	struct create_ext ext_data;
+	int ret;
+
+	if (!DRIVER_CAPS(i915)->has_logical_contexts)
+		return -ENODEV;
+
+	if (args->flags & I915_CONTEXT_CREATE_FLAGS_UNKNOWN)
+		return -EINVAL;
+
+	ret = i915_terminally_wedged(i915);
+	if (ret)
+		return ret;
+
+	ext_data.fpriv = file->driver_priv;
+	if (client_is_banned(ext_data.fpriv)) {
+		DRM_DEBUG("client %s[%d] banned from creating ctx\n",
+			  current->comm,
+			  pid_nr(get_task_pid(current, PIDTYPE_PID)));
+		return -EIO;
+	}
+
+	ret = i915_mutex_lock_interruptible(dev);
+	if (ret)
+		return ret;
+
+	ext_data.ctx = i915_gem_create_context(i915);
+	mutex_unlock(&dev->struct_mutex);
+	if (IS_ERR(ext_data.ctx))
+		return PTR_ERR(ext_data.ctx);
+
+	if (args->flags & I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS) {
+		ret = i915_user_extensions(u64_to_user_ptr(args->extensions),
+					   create_extensions,
+					   ARRAY_SIZE(create_extensions),
+					   &ext_data);
+		if (ret)
+			goto err_ctx;
+	}
+
+	ret = gem_context_register(ext_data.ctx, ext_data.fpriv);
+	if (ret < 0)
+		goto err_ctx;
+
+	args->ctx_id = ret;
+	DRM_DEBUG("HW context %d created\n", args->ctx_id);
+
+	return 0;
+
+err_ctx:
+	mutex_lock(&dev->struct_mutex);
+	context_close(ext_data.ctx);
+	mutex_unlock(&dev->struct_mutex);
+	return ret;
+}
+
+int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
+				   struct drm_file *file)
+{
+	struct drm_i915_gem_context_destroy *args = data;
+	struct drm_i915_file_private *file_priv = file->driver_priv;
+	struct i915_gem_context *ctx;
+
+	if (args->pad != 0)
+		return -EINVAL;
+
+	if (!args->ctx_id)
+		return -ENOENT;
+
+	if (mutex_lock_interruptible(&file_priv->context_idr_lock))
+		return -EINTR;
+
+	ctx = idr_remove(&file_priv->context_idr, args->ctx_id);
+	mutex_unlock(&file_priv->context_idr_lock);
+	if (!ctx)
+		return -ENOENT;
+
+	mutex_lock(&dev->struct_mutex);
+	context_close(ctx);
+	mutex_unlock(&dev->struct_mutex);
+
+	return 0;
+}
+
+static int get_sseu(struct i915_gem_context *ctx,
+		    struct drm_i915_gem_context_param *args)
+{
+	struct drm_i915_gem_context_param_sseu user_sseu;
+	struct intel_engine_cs *engine;
+	struct intel_context *ce;
+
+	if (args->size == 0)
+		goto out;
+	else if (args->size < sizeof(user_sseu))
+		return -EINVAL;
+
+	if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value),
+			   sizeof(user_sseu)))
+		return -EFAULT;
+
+	if (user_sseu.flags || user_sseu.rsvd)
+		return -EINVAL;
+
+	engine = intel_engine_lookup_user(ctx->i915,
+					  user_sseu.engine_class,
+					  user_sseu.engine_instance);
+	if (!engine)
+		return -EINVAL;
+
+	ce = intel_context_pin_lock(ctx, engine); /* serialises with set_sseu */
+	if (IS_ERR(ce))
+		return PTR_ERR(ce);
+
+	user_sseu.slice_mask = ce->sseu.slice_mask;
+	user_sseu.subslice_mask = ce->sseu.subslice_mask;
+	user_sseu.min_eus_per_subslice = ce->sseu.min_eus_per_subslice;
+	user_sseu.max_eus_per_subslice = ce->sseu.max_eus_per_subslice;
+
+	intel_context_pin_unlock(ce);
+
+	if (copy_to_user(u64_to_user_ptr(args->value), &user_sseu,
+			 sizeof(user_sseu)))
+		return -EFAULT;
+
+out:
+	args->size = sizeof(user_sseu);
+
+	return 0;
+}
+
+int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
+				    struct drm_file *file)
+{
+	struct drm_i915_file_private *file_priv = file->driver_priv;
+	struct drm_i915_gem_context_param *args = data;
+	struct i915_gem_context *ctx;
+	int ret = 0;
+
+	ctx = i915_gem_context_lookup(file_priv, args->ctx_id);
+	if (!ctx)
+		return -ENOENT;
+
+	switch (args->param) {
+	case I915_CONTEXT_PARAM_NO_ZEROMAP:
+		args->size = 0;
+		args->value = test_bit(UCONTEXT_NO_ZEROMAP, &ctx->user_flags);
+		break;
+
+	case I915_CONTEXT_PARAM_GTT_SIZE:
+		args->size = 0;
+		if (ctx->ppgtt)
+			args->value = ctx->ppgtt->vm.total;
+		else if (to_i915(dev)->mm.aliasing_ppgtt)
+			args->value = to_i915(dev)->mm.aliasing_ppgtt->vm.total;
+		else
+			args->value = to_i915(dev)->ggtt.vm.total;
+		break;
+
+	case I915_CONTEXT_PARAM_NO_ERROR_CAPTURE:
+		args->size = 0;
+		args->value = i915_gem_context_no_error_capture(ctx);
+		break;
+
+	case I915_CONTEXT_PARAM_BANNABLE:
+		args->size = 0;
+		args->value = i915_gem_context_is_bannable(ctx);
+		break;
+
+	case I915_CONTEXT_PARAM_RECOVERABLE:
+		args->size = 0;
+		args->value = i915_gem_context_is_recoverable(ctx);
+		break;
+
+	case I915_CONTEXT_PARAM_PRIORITY:
+		args->size = 0;
+		args->value = ctx->sched.priority >> I915_USER_PRIORITY_SHIFT;
+		break;
+
+	case I915_CONTEXT_PARAM_SSEU:
+		ret = get_sseu(ctx, args);
+		break;
+
+	case I915_CONTEXT_PARAM_VM:
+		ret = get_ppgtt(ctx, args);
+		break;
+
+	case I915_CONTEXT_PARAM_BAN_PERIOD:
+	default:
+		ret = -EINVAL;
+		break;
+	}
+
+	i915_gem_context_put(ctx);
+	return ret;
+}
+
+int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
+				    struct drm_file *file)
+{
+	struct drm_i915_file_private *file_priv = file->driver_priv;
+	struct drm_i915_gem_context_param *args = data;
+	struct i915_gem_context *ctx;
+	int ret;
+
+	ctx = i915_gem_context_lookup(file_priv, args->ctx_id);
+	if (!ctx)
+		return -ENOENT;
+
+	ret = ctx_setparam(ctx, args);
+
 	i915_gem_context_put(ctx);
 	return ret;
 }
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 9af7a8e6a46e..d45b79746fc4 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -394,6 +394,7 @@ typedef struct _drm_i915_sarea {
 #define DRM_IOCTL_I915_GET_SPRITE_COLORKEY DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GET_SPRITE_COLORKEY, struct drm_intel_sprite_colorkey)
 #define DRM_IOCTL_I915_GEM_WAIT		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_WAIT, struct drm_i915_gem_wait)
 #define DRM_IOCTL_I915_GEM_CONTEXT_CREATE	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE, struct drm_i915_gem_context_create)
+#define DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE, struct drm_i915_gem_context_create_ext)
 #define DRM_IOCTL_I915_GEM_CONTEXT_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_DESTROY, struct drm_i915_gem_context_destroy)
 #define DRM_IOCTL_I915_REG_READ			DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_REG_READ, struct drm_i915_reg_read)
 #define DRM_IOCTL_I915_GET_RESET_STATS		DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GET_RESET_STATS, struct drm_i915_reset_stats)
@@ -1447,92 +1448,17 @@ struct drm_i915_gem_wait {
 };
 
 struct drm_i915_gem_context_create {
-	/*  output: id of new context*/
-	__u32 ctx_id;
-	__u32 pad;
-};
-
-struct drm_i915_gem_context_destroy {
-	__u32 ctx_id;
-	__u32 pad;
-};
-
-/*
- * DRM_I915_GEM_VM_CREATE -
- *
- * Create a new virtual memory address space (ppGTT) for use within a context
- * on the same file. Extensions can be provided to configure exactly how the
- * address space is setup upon creation.
- *
- * The id of new VM (bound to the fd) for use with I915_CONTEXT_PARAM_VM is
- * returned in the outparam @vm_id.
- *
- * No flags are defined, with all bits reserved and must be zero.
- *
- * An extension chain maybe provided, starting with @extensions, and terminated
- * by the @next_extension being 0. Currently, no extensions are defined.
- *
- * DRM_I915_GEM_VM_DESTROY -
- *
- * Destroys a previously created VM id, specified in @vm_id.
- *
- * No extensions or flags are allowed currently, and so must be zero.
- */
-struct drm_i915_gem_vm_control {
-	__u64 extensions;
-	__u32 flags;
-	__u32 vm_id;
-};
-
-struct drm_i915_reg_read {
-	/*
-	 * Register offset.
-	 * For 64bit wide registers where the upper 32bits don't immediately
-	 * follow the lower 32bits, the offset of the lower 32bits must
-	 * be specified
-	 */
-	__u64 offset;
-#define I915_REG_READ_8B_WA (1ul << 0)
-
-	__u64 val; /* Return value */
-};
-/* Known registers:
- *
- * Render engine timestamp - 0x2358 + 64bit - gen7+
- * - Note this register returns an invalid value if using the default
- *   single instruction 8byte read, in order to workaround that pass
- *   flag I915_REG_READ_8B_WA in offset field.
- *
- */
-
-struct drm_i915_reset_stats {
-	__u32 ctx_id;
-	__u32 flags;
-
-	/* All resets since boot/module reload, for all contexts */
-	__u32 reset_count;
-
-	/* Number of batches lost when active in GPU, for this context */
-	__u32 batch_active;
-
-	/* Number of batches lost pending for execution, for this context */
-	__u32 batch_pending;
-
+	__u32 ctx_id; /* output: id of new context*/
 	__u32 pad;
 };
 
-struct drm_i915_gem_userptr {
-	__u64 user_ptr;
-	__u64 user_size;
+struct drm_i915_gem_context_create_ext {
+	__u32 ctx_id; /* output: id of new context*/
 	__u32 flags;
-#define I915_USERPTR_READ_ONLY 0x1
-#define I915_USERPTR_UNSYNCHRONIZED 0x80000000
-	/**
-	 * Returned handle for the object.
-	 *
-	 * Object handles are nonzero.
-	 */
-	__u32 handle;
+#define I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS	(1u << 0)
+#define I915_CONTEXT_CREATE_FLAGS_UNKNOWN \
+	(-(I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS << 1))
+	__u64 extensions;
 };
 
 struct drm_i915_gem_context_param {
@@ -1648,6 +1574,96 @@ struct drm_i915_gem_context_param_sseu {
 	__u32 rsvd;
 };
 
+struct drm_i915_gem_context_create_ext_setparam {
+#define I915_CONTEXT_CREATE_EXT_SETPARAM 0
+	struct i915_user_extension base;
+	struct drm_i915_gem_context_param param;
+};
+
+struct drm_i915_gem_context_destroy {
+	__u32 ctx_id;
+	__u32 pad;
+};
+
+/*
+ * DRM_I915_GEM_VM_CREATE -
+ *
+ * Create a new virtual memory address space (ppGTT) for use within a context
+ * on the same file. Extensions can be provided to configure exactly how the
+ * address space is setup upon creation.
+ *
+ * The id of new VM (bound to the fd) for use with I915_CONTEXT_PARAM_VM is
+ * returned in the outparam @id.
+ *
+ * No flags are defined, with all bits reserved and must be zero.
+ *
+ * An extension chain maybe provided, starting with @extensions, and terminated
+ * by the @next_extension being 0. Currently, no extensions are defined.
+ *
+ * DRM_I915_GEM_VM_DESTROY -
+ *
+ * Destroys a previously created VM id, specified in @id.
+ *
+ * No extensions or flags are allowed currently, and so must be zero.
+ */
+struct drm_i915_gem_vm_control {
+	__u64 extensions;
+	__u32 flags;
+	__u32 vm_id;
+};
+
+struct drm_i915_reg_read {
+	/*
+	 * Register offset.
+	 * For 64bit wide registers where the upper 32bits don't immediately
+	 * follow the lower 32bits, the offset of the lower 32bits must
+	 * be specified
+	 */
+	__u64 offset;
+#define I915_REG_READ_8B_WA (1ul << 0)
+
+	__u64 val; /* Return value */
+};
+
+/* Known registers:
+ *
+ * Render engine timestamp - 0x2358 + 64bit - gen7+
+ * - Note this register returns an invalid value if using the default
+ *   single instruction 8byte read, in order to workaround that pass
+ *   flag I915_REG_READ_8B_WA in offset field.
+ *
+ */
+
+struct drm_i915_reset_stats {
+	__u32 ctx_id;
+	__u32 flags;
+
+	/* All resets since boot/module reload, for all contexts */
+	__u32 reset_count;
+
+	/* Number of batches lost when active in GPU, for this context */
+	__u32 batch_active;
+
+	/* Number of batches lost pending for execution, for this context */
+	__u32 batch_pending;
+
+	__u32 pad;
+};
+
+struct drm_i915_gem_userptr {
+	__u64 user_ptr;
+	__u64 user_size;
+	__u32 flags;
+#define I915_USERPTR_READ_ONLY 0x1
+#define I915_USERPTR_UNSYNCHRONIZED 0x80000000
+	/**
+	 * Returned handle for the object.
+	 *
+	 * Object handles are nonzero.
+	 */
+	__u32 handle;
+};
+
 enum drm_i915_oa_format {
 	I915_OA_FORMAT_A13 = 1,	    /* HSW only */
 	I915_OA_FORMAT_A29,	    /* HSW only */
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 11/18] drm/i915: Allow contexts to share a single timeline across all engines
  2019-03-19 11:57 [PATCH 01/18] drm/i915/selftests: Provide stub reset functions Chris Wilson
                   ` (8 preceding siblings ...)
  2019-03-19 11:57 ` [PATCH 10/18] drm/i915: Extend CONTEXT_CREATE to set parameters upon construction Chris Wilson
@ 2019-03-19 11:57 ` Chris Wilson
  2019-03-19 11:57 ` [PATCH 12/18] drm/i915: Allow userspace to clone contexts on creation Chris Wilson
                   ` (13 subsequent siblings)
  23 siblings, 0 replies; 47+ messages in thread
From: Chris Wilson @ 2019-03-19 11:57 UTC (permalink / raw)
  To: intel-gfx

Previously, our view has always been to run the engines independently
within a context. (Multiple engines happened before we had contexts and
timelines, so they always operated independently and that behaviour
persisted into contexts.) However, at the user level the context often
represents a single timeline (e.g. GL contexts) and userspace must
ensure that the individual engines are serialised to present that
ordering to the client (or forget about this detail entirely and hope no
one notices - a fair ploy if the client can only directly control one
engine themselves ;)

In the next patch, we will want to construct a set of engines that
operate as one, that have a single timeline interwoven between them, to
present a single virtual engine to the user. (They submit to the virtual
engine, and we then decide which physical engine to execute on, based on
load.)

To that end, we want to be able to create contexts which have a single
timeline (fence context) shared between all engines, rather than multiple
timelines.
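
As a hedged illustration (not part of the patch), requesting the shared
timeline from userspace is just another create flag, assuming the uapi
flag added by this patch:

#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

static int create_single_timeline_ctx(int fd, uint32_t *ctx_id)
{
	struct drm_i915_gem_context_create_ext create = {
		.flags = I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE,
	};

	/* Expect -EINVAL on hardware without execlists. */
	if (ioctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT, &create))
		return -1;

	*ctx_id = create.ctx_id;
	return 0;
}

Requests submitted to any engine of such a context are then ordered
against each other on the one fence context, rather than one per engine.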

v2: Move the specialised timeline ordering to its own function.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c       | 31 +++++--
 drivers/gpu/drm/i915/i915_gem_context_types.h |  2 +
 drivers/gpu/drm/i915/i915_request.c           | 80 +++++++++++++------
 drivers/gpu/drm/i915/i915_request.h           |  5 +-
 drivers/gpu/drm/i915/i915_sw_fence.c          | 39 +++++++--
 drivers/gpu/drm/i915/i915_sw_fence.h          | 13 ++-
 drivers/gpu/drm/i915/intel_lrc.c              |  5 +-
 drivers/gpu/drm/i915/selftests/mock_context.c |  2 +-
 include/uapi/drm/i915_drm.h                   |  3 +-
 9 files changed, 138 insertions(+), 42 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 0d16edbb38c3..fc1f64e19507 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -238,6 +238,9 @@ static void i915_gem_context_free(struct i915_gem_context *ctx)
 	rbtree_postorder_for_each_entry_safe(it, n, &ctx->hw_contexts, node)
 		intel_context_put(it);
 
+	if (ctx->timeline)
+		i915_timeline_put(ctx->timeline);
+
 	kfree(ctx->name);
 	put_pid(ctx->pid);
 
@@ -403,12 +406,16 @@ static void __assign_ppgtt(struct i915_gem_context *ctx,
 }
 
 static struct i915_gem_context *
-i915_gem_create_context(struct drm_i915_private *dev_priv)
+i915_gem_create_context(struct drm_i915_private *dev_priv, unsigned int flags)
 {
 	struct i915_gem_context *ctx;
 
 	lockdep_assert_held(&dev_priv->drm.struct_mutex);
 
+	if (flags & I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE &&
+	    !HAS_EXECLISTS(dev_priv))
+		return ERR_PTR(-EINVAL);
+
 	/* Reap the most stale context */
 	contexts_free_first(dev_priv);
 
@@ -431,6 +438,18 @@ i915_gem_create_context(struct drm_i915_private *dev_priv)
 		i915_ppgtt_put(ppgtt);
 	}
 
+	if (flags & I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE) {
+		struct i915_timeline *timeline;
+
+		timeline = i915_timeline_create(dev_priv, NULL);
+		if (IS_ERR(timeline)) {
+			context_close(ctx);
+			return ERR_CAST(timeline);
+		}
+
+		ctx->timeline = timeline;
+	}
+
 	trace_i915_context_create(ctx);
 
 	return ctx;
@@ -459,7 +478,7 @@ i915_gem_context_create_gvt(struct drm_device *dev)
 	if (ret)
 		return ERR_PTR(ret);
 
-	ctx = i915_gem_create_context(to_i915(dev));
+	ctx = i915_gem_create_context(to_i915(dev), 0);
 	if (IS_ERR(ctx))
 		goto out;
 
@@ -495,7 +514,7 @@ i915_gem_context_create_kernel(struct drm_i915_private *i915, int prio)
 	struct i915_gem_context *ctx;
 	int err;
 
-	ctx = i915_gem_create_context(i915);
+	ctx = i915_gem_create_context(i915, 0);
 	if (IS_ERR(ctx))
 		return ctx;
 
@@ -658,7 +677,7 @@ int i915_gem_context_open(struct drm_i915_private *i915,
 	idr_init_base(&file_priv->vm_idr, 1);
 
 	mutex_lock(&i915->drm.struct_mutex);
-	ctx = i915_gem_create_context(i915);
+	ctx = i915_gem_create_context(i915, 0);
 	mutex_unlock(&i915->drm.struct_mutex);
 	if (IS_ERR(ctx)) {
 		err = PTR_ERR(ctx);
@@ -800,7 +819,7 @@ last_request_on_engine(struct i915_timeline *timeline,
 
 	rq = i915_active_request_raw(&timeline->last_request,
 				     &engine->i915->drm.struct_mutex);
-	if (rq && rq->engine == engine) {
+	if (rq && rq->engine->mask & engine->mask) {
 		GEM_TRACE("last request on engine %s: %llx:%llu\n",
 			  engine->name, rq->fence.context, rq->fence.seqno);
 		GEM_BUG_ON(rq->timeline != timeline);
@@ -1520,7 +1539,7 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
 	if (ret)
 		return ret;
 
-	ext_data.ctx = i915_gem_create_context(i915);
+	ext_data.ctx = i915_gem_create_context(i915, args->flags);
 	mutex_unlock(&dev->struct_mutex);
 	if (IS_ERR(ext_data.ctx))
 		return PTR_ERR(ext_data.ctx);
diff --git a/drivers/gpu/drm/i915/i915_gem_context_types.h b/drivers/gpu/drm/i915/i915_gem_context_types.h
index 63ae8eb21939..e2ec58b10fb2 100644
--- a/drivers/gpu/drm/i915/i915_gem_context_types.h
+++ b/drivers/gpu/drm/i915/i915_gem_context_types.h
@@ -41,6 +41,8 @@ struct i915_gem_context {
 	/** file_priv: owning file descriptor */
 	struct drm_i915_file_private *file_priv;
 
+	struct i915_timeline *timeline;
+
 	/**
 	 * @ppgtt: unique address space (GTT)
 	 *
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 1529824d7c61..e9c2094ab8ea 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -992,6 +992,60 @@ void i915_request_skip(struct i915_request *rq, int error)
 	memset(vaddr + head, 0, rq->postfix - head);
 }
 
+static struct i915_request *
+__i915_request_add_to_timeline(struct i915_request *rq)
+{
+	struct i915_timeline *timeline = rq->timeline;
+	struct i915_request *prev;
+
+	/*
+	 * Dependency tracking and request ordering along the timeline
+	 * is special cased so that we can eliminate redundant ordering
+	 * operations while building the request (we know that the timeline
+	 * itself is ordered, and here we guarantee it).
+	 *
+	 * As we know we will need to emit tracking along the timeline,
+	 * we embed the hooks into our request struct -- at the cost of
+	 * having to have specialised no-allocation interfaces (which will
+	 * be beneficial elsewhere).
+	 *
+	 * A second benefit to open-coding i915_request_await_request is
+	 * that we can apply a slight variant of the rules specialised
+	 * for timelines that jump between engines (such as virtual engines).
+	 * If we consider the case of a virtual engine, we must emit a dma-fence
+	 * to prevent scheduling of the second request until the first is
+	 * complete (to maximise our greedy late load balancing) and this
+	 * precludes optimising to use semaphore serialisation of a single
+	 * timeline across engines.
+	 */
+	prev = i915_active_request_raw(&timeline->last_request,
+				       &rq->i915->drm.struct_mutex);
+	if (prev && !i915_request_completed(prev)) {
+		if (is_power_of_2(prev->engine->mask | rq->engine->mask))
+			i915_sw_fence_await_sw_fence(&rq->submit,
+						     &prev->submit,
+						     &rq->submitq);
+		else
+			__i915_sw_fence_await_dma_fence(&rq->submit,
+							&prev->fence,
+							&rq->dmaq);
+		if (rq->engine->schedule)
+			__i915_sched_node_add_dependency(&rq->sched,
+							 &prev->sched,
+							 &rq->dep,
+							 0);
+	}
+
+	spin_lock_irq(&timeline->lock);
+	list_add_tail(&rq->link, &timeline->requests);
+	spin_unlock_irq(&timeline->lock);
+
+	GEM_BUG_ON(timeline->seqno != rq->fence.seqno);
+	__i915_active_request_set(&timeline->last_request, rq);
+
+	return prev;
+}
+
 /*
  * NB: This function is not allowed to fail. Doing so would mean the the
  * request is not being tracked for completion but the work itself is
@@ -1036,31 +1090,7 @@ void i915_request_add(struct i915_request *request)
 	GEM_BUG_ON(IS_ERR(cs));
 	request->postfix = intel_ring_offset(request, cs);
 
-	/*
-	 * Seal the request and mark it as pending execution. Note that
-	 * we may inspect this state, without holding any locks, during
-	 * hangcheck. Hence we apply the barrier to ensure that we do not
-	 * see a more recent value in the hws than we are tracking.
-	 */
-
-	prev = i915_active_request_raw(&timeline->last_request,
-				       &request->i915->drm.struct_mutex);
-	if (prev && !i915_request_completed(prev)) {
-		i915_sw_fence_await_sw_fence(&request->submit, &prev->submit,
-					     &request->submitq);
-		if (engine->schedule)
-			__i915_sched_node_add_dependency(&request->sched,
-							 &prev->sched,
-							 &request->dep,
-							 0);
-	}
-
-	spin_lock_irq(&timeline->lock);
-	list_add_tail(&request->link, &timeline->requests);
-	spin_unlock_irq(&timeline->lock);
-
-	GEM_BUG_ON(timeline->seqno != request->fence.seqno);
-	__i915_active_request_set(&timeline->last_request, request);
+	prev = __i915_request_add_to_timeline(request);
 
 	list_add_tail(&request->ring_link, &ring->request_list);
 	if (list_is_first(&request->ring_link, &ring->request_list))
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 8c8fa5010644..cd6c130964cd 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -128,7 +128,10 @@ struct i915_request {
 	 * It is used by the driver to then queue the request for execution.
 	 */
 	struct i915_sw_fence submit;
-	wait_queue_entry_t submitq;
+	union {
+		wait_queue_entry_t submitq;
+		struct i915_sw_dma_fence_cb dmaq;
+	};
 	struct list_head execute_cb;
 
 	/*
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c
index 8d1400d378d7..5387aafd3424 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence.c
@@ -359,11 +359,6 @@ int i915_sw_fence_await_sw_fence_gfp(struct i915_sw_fence *fence,
 	return __i915_sw_fence_await_sw_fence(fence, signaler, NULL, gfp);
 }
 
-struct i915_sw_dma_fence_cb {
-	struct dma_fence_cb base;
-	struct i915_sw_fence *fence;
-};
-
 struct i915_sw_dma_fence_cb_timer {
 	struct i915_sw_dma_fence_cb base;
 	struct dma_fence *dma;
@@ -480,6 +475,40 @@ int i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
 	return ret;
 }
 
+static void __dma_i915_sw_fence_wake(struct dma_fence *dma,
+				     struct dma_fence_cb *data)
+{
+	struct i915_sw_dma_fence_cb *cb = container_of(data, typeof(*cb), base);
+
+	i915_sw_fence_complete(cb->fence);
+}
+
+int __i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
+				    struct dma_fence *dma,
+				    struct i915_sw_dma_fence_cb *cb)
+{
+	int ret;
+
+	debug_fence_assert(fence);
+
+	if (dma_fence_is_signaled(dma))
+		return 0;
+
+	cb->fence = fence;
+	i915_sw_fence_await(fence);
+
+	ret = dma_fence_add_callback(dma, &cb->base, __dma_i915_sw_fence_wake);
+	if (ret == 0) {
+		ret = 1;
+	} else {
+		i915_sw_fence_complete(fence);
+		if (ret == -ENOENT) /* fence already signaled */
+			ret = 0;
+	}
+
+	return ret;
+}
+
 int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
 				    struct reservation_object *resv,
 				    const struct dma_fence_ops *exclude,
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.h b/drivers/gpu/drm/i915/i915_sw_fence.h
index 6dec9e1d1102..9cb5c3b307a6 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.h
+++ b/drivers/gpu/drm/i915/i915_sw_fence.h
@@ -9,14 +9,13 @@
 #ifndef _I915_SW_FENCE_H_
 #define _I915_SW_FENCE_H_
 
+#include <linux/dma-fence.h>
 #include <linux/gfp.h>
 #include <linux/kref.h>
 #include <linux/notifier.h> /* for NOTIFY_DONE */
 #include <linux/wait.h>
 
 struct completion;
-struct dma_fence;
-struct dma_fence_ops;
 struct reservation_object;
 
 struct i915_sw_fence {
@@ -68,10 +67,20 @@ int i915_sw_fence_await_sw_fence(struct i915_sw_fence *fence,
 int i915_sw_fence_await_sw_fence_gfp(struct i915_sw_fence *fence,
 				     struct i915_sw_fence *after,
 				     gfp_t gfp);
+
+struct i915_sw_dma_fence_cb {
+	struct dma_fence_cb base;
+	struct i915_sw_fence *fence;
+};
+
+int __i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
+				    struct dma_fence *dma,
+				    struct i915_sw_dma_fence_cb *cb);
 int i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
 				  struct dma_fence *dma,
 				  unsigned long timeout,
 				  gfp_t gfp);
+
 int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
 				    struct reservation_object *resv,
 				    const struct dma_fence_ops *exclude,
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index b3009086b50e..0e64317ddcbf 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2802,7 +2802,10 @@ populate_lr_context(struct intel_context *ce,
 
 static struct i915_timeline *get_timeline(struct i915_gem_context *ctx)
 {
-	return i915_timeline_create(ctx->i915, NULL);
+	if (ctx->timeline)
+		return i915_timeline_get(ctx->timeline);
+	else
+		return i915_timeline_create(ctx->i915, NULL);
 }
 
 static int execlists_context_deferred_alloc(struct intel_context *ce,
diff --git a/drivers/gpu/drm/i915/selftests/mock_context.c b/drivers/gpu/drm/i915/selftests/mock_context.c
index cfc9012c8e49..163aa9b66f25 100644
--- a/drivers/gpu/drm/i915/selftests/mock_context.c
+++ b/drivers/gpu/drm/i915/selftests/mock_context.c
@@ -97,7 +97,7 @@ live_context(struct drm_i915_private *i915, struct drm_file *file)
 
 	lockdep_assert_held(&i915->drm.struct_mutex);
 
-	ctx = i915_gem_create_context(i915);
+	ctx = i915_gem_create_context(i915, 0);
 	if (IS_ERR(ctx))
 		return ctx;
 
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index d45b79746fc4..9999f7d6a5a9 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1456,8 +1456,9 @@ struct drm_i915_gem_context_create_ext {
 	__u32 ctx_id; /* output: id of new context*/
 	__u32 flags;
 #define I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS	(1u << 0)
+#define I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE	(1u << 1)
 #define I915_CONTEXT_CREATE_FLAGS_UNKNOWN \
-	(-(I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS << 1))
+	(-(I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE << 1))
 	__u64 extensions;
 };
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 12/18] drm/i915: Allow userspace to clone contexts on creation
  2019-03-19 11:57 [PATCH 01/18] drm/i915/selftests: Provide stub reset functions Chris Wilson
                   ` (9 preceding siblings ...)
  2019-03-19 11:57 ` [PATCH 11/18] drm/i915: Allow contexts to share a single timeline across all engines Chris Wilson
@ 2019-03-19 11:57 ` Chris Wilson
  2019-03-20 13:13   ` Tvrtko Ursulin
  2019-03-19 11:57 ` [PATCH 13/18] drm/i915: Allow a context to define its set of engines Chris Wilson
                   ` (12 subsequent siblings)
  23 siblings, 1 reply; 47+ messages in thread
From: Chris Wilson @ 2019-03-19 11:57 UTC (permalink / raw)
  To: intel-gfx

A usecase arose out of handling context recovery in mesa, whereby they
wish to recreate a context with fresh logical state but preserving all
other details of the original. Currently, they create a new context and
iterate over which bits they want to copy across, but it would be much more
convenient if they were able to just pass in a target context to clone
during creation. This essentially extends the setparam during creation
to pull the details from a target context instead of the user supplied
parameters.
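
A sketch of the intended flow from userspace (illustrative only; it assumes
the DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT wrapper and the i915_user_extension
chaining introduced earlier in the series, and clones only a subset of the
available bits):

#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

static int clone_context(int fd, uint32_t src_id, uint32_t *new_id)
{
	struct drm_i915_gem_context_create_ext_clone clone = {
		.base = { .name = I915_CONTEXT_CREATE_EXT_CLONE },
		.clone_id = src_id,
		/* fresh logical state, but same flags/priority/VM as src */
		.flags = I915_CONTEXT_CLONE_FLAGS |
			 I915_CONTEXT_CLONE_SCHEDATTR |
			 I915_CONTEXT_CLONE_VM,
	};
	struct drm_i915_gem_context_create_ext arg = {
		.flags = I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS,
		.extensions = (uintptr_t)&clone,
	};
	int err;

	err = ioctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT, &arg);
	if (err == 0)
		*new_id = arg.ctx_id;

	return err;
}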

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 154 ++++++++++++++++++++++++
 include/uapi/drm/i915_drm.h             |  14 +++
 2 files changed, 168 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index fc1f64e19507..f36648329074 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -1500,8 +1500,162 @@ static int create_setparam(struct i915_user_extension __user *ext, void *data)
 	return ctx_setparam(arg->ctx, &local.param);
 }
 
+static int clone_flags(struct i915_gem_context *dst,
+		       struct i915_gem_context *src)
+{
+	dst->user_flags = src->user_flags;
+	return 0;
+}
+
+static int clone_schedattr(struct i915_gem_context *dst,
+			   struct i915_gem_context *src)
+{
+	dst->sched = src->sched;
+	return 0;
+}
+
+static int clone_sseu(struct i915_gem_context *dst,
+		      struct i915_gem_context *src)
+{
+	const struct intel_sseu default_sseu =
+		intel_device_default_sseu(dst->i915);
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+
+	for_each_engine(engine, dst->i915, id) {
+		struct intel_context *ce;
+		struct intel_sseu sseu;
+
+		ce = intel_context_lookup(src, engine);
+		if (!ce)
+			continue;
+
+		sseu = ce->sseu;
+		if (!memcmp(&sseu, &default_sseu, sizeof(sseu)))
+			continue;
+
+		ce = intel_context_pin_lock(dst, engine);
+		if (IS_ERR(ce))
+			return PTR_ERR(ce);
+
+		ce->sseu = sseu;
+		intel_context_pin_unlock(ce);
+	}
+
+	return 0;
+}
+
+static int clone_timeline(struct i915_gem_context *dst,
+			  struct i915_gem_context *src)
+{
+	if (src->timeline) {
+		GEM_BUG_ON(src->timeline == dst->timeline);
+
+		if (dst->timeline)
+			i915_timeline_put(dst->timeline);
+		dst->timeline = i915_timeline_get(src->timeline);
+	}
+
+	return 0;
+}
+
+static int clone_vm(struct i915_gem_context *dst,
+		    struct i915_gem_context *src)
+{
+	struct i915_hw_ppgtt *ppgtt;
+
+	rcu_read_lock();
+	do {
+		ppgtt = READ_ONCE(src->ppgtt);
+		if (!ppgtt)
+			break;
+
+		if (!kref_get_unless_zero(&ppgtt->ref))
+			continue;
+
+		/*
+		 * This ppgtt may have been reallocated between
+		 * the read and the kref, and reassigned to a third
+		 * context. In order to avoid inadvertent sharing
+		 * of this ppgtt with that third context (and not
+		 * src), we have to confirm that we have the same
+		 * ppgtt after passing through the strong memory
+		 * barrier implied by a successful
+		 * kref_get_unless_zero().
+		 *
+		 * Once we have acquired the current ppgtt of src,
+		 * we no longer care if it is released from src, as
+		 * it cannot be reallocated elsewhere.
+		 */
+
+		if (ppgtt == READ_ONCE(src->ppgtt))
+			break;
+
+		i915_ppgtt_put(ppgtt);
+	} while (1);
+	rcu_read_unlock();
+
+	if (ppgtt) {
+		__assign_ppgtt(dst, ppgtt);
+		i915_ppgtt_put(ppgtt);
+	}
+
+	return 0;
+}
+
+static int create_clone(struct i915_user_extension __user *ext, void *data)
+{
+	static int (* const fn[])(struct i915_gem_context *dst,
+				  struct i915_gem_context *src) = {
+#define MAP(x, y) [ilog2(I915_CONTEXT_CLONE_##x)] = y
+		MAP(FLAGS, clone_flags),
+		MAP(SCHEDATTR, clone_schedattr),
+		MAP(SSEU, clone_sseu),
+		MAP(TIMELINE, clone_timeline),
+		MAP(VM, clone_vm),
+#undef MAP
+	};
+	struct drm_i915_gem_context_create_ext_clone local;
+	const struct create_ext *arg = data;
+	struct i915_gem_context *dst = arg->ctx;
+	struct i915_gem_context *src;
+	int err, bit;
+
+	if (copy_from_user(&local, ext, sizeof(local)))
+		return -EFAULT;
+
+	BUILD_BUG_ON(GENMASK(BITS_PER_TYPE(local.flags) - 1, ARRAY_SIZE(fn)) !=
+		     I915_CONTEXT_CLONE_UNKNOWN);
+
+	if (local.flags & I915_CONTEXT_CLONE_UNKNOWN)
+		return -EINVAL;
+
+	if (local.rsvd)
+		return -EINVAL;
+
+	rcu_read_lock();
+	src = __i915_gem_context_lookup_rcu(arg->fpriv, local.clone_id);
+	rcu_read_unlock();
+	if (!src)
+		return -ENOENT;
+
+	GEM_BUG_ON(src == dst);
+
+	for (bit = 0; bit < ARRAY_SIZE(fn); bit++) {
+		if (!(local.flags & BIT(bit)))
+			continue;
+
+		err = fn[bit](dst, src);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
 static const i915_user_extension_fn create_extensions[] = {
 	[I915_CONTEXT_CREATE_EXT_SETPARAM] = create_setparam,
+	[I915_CONTEXT_CREATE_EXT_CLONE] = create_clone,
 };
 
 static bool client_is_banned(struct drm_i915_file_private *file_priv)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 9999f7d6a5a9..a5bdb86858f6 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1581,6 +1581,20 @@ struct drm_i915_gem_context_create_ext_setparam {
 	struct drm_i915_gem_context_param param;
 };
 
+struct drm_i915_gem_context_create_ext_clone {
+#define I915_CONTEXT_CREATE_EXT_CLONE 1
+	struct i915_user_extension base;
+	__u32 clone_id;
+	__u32 flags;
+#define I915_CONTEXT_CLONE_FLAGS	(1u << 0)
+#define I915_CONTEXT_CLONE_SCHEDATTR	(1u << 1)
+#define I915_CONTEXT_CLONE_SSEU		(1u << 2)
+#define I915_CONTEXT_CLONE_TIMELINE	(1u << 3)
+#define I915_CONTEXT_CLONE_VM		(1u << 4)
+#define I915_CONTEXT_CLONE_UNKNOWN -(I915_CONTEXT_CLONE_VM << 1)
+	__u64 rsvd;
+};
+
 struct drm_i915_gem_context_destroy {
 	__u32 ctx_id;
 	__u32 pad;
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 13/18] drm/i915: Allow a context to define its set of engines
  2019-03-19 11:57 [PATCH 01/18] drm/i915/selftests: Provide stub reset functions Chris Wilson
                   ` (10 preceding siblings ...)
  2019-03-19 11:57 ` [PATCH 12/18] drm/i915: Allow userspace to clone contexts on creation Chris Wilson
@ 2019-03-19 11:57 ` Chris Wilson
  2019-03-20 13:20   ` Tvrtko Ursulin
  2019-03-19 11:57 ` [PATCH 14/18] drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[] Chris Wilson
                   ` (11 subsequent siblings)
  23 siblings, 1 reply; 47+ messages in thread
From: Chris Wilson @ 2019-03-19 11:57 UTC (permalink / raw)
  To: intel-gfx

Over the last few years, we have debated how to extend the user API to
support an increase in the number of engines, which may be sparse and
even be heterogeneous within a class (not all video decoders created
equal). We settled on using (class, instance) tuples to identify a
specific engine, with an API for the user to construct a map of engines
to capabilities. Into this picture, we then add a challenge of virtual
engines; one user engine that maps behind the scenes to any number of
physical engines. To keep it general, we want the user to have full
control over that mapping. To that end, we allow the user to constrain a
context to define the set of engines that it can access, order fully
controlled by the user via (class, instance). With such precise control
in context setup, we can continue to use the existing execbuf uABI of
specifying a single index; only now it doesn't automagically map onto
the engines, it uses the user defined engine map from the context.

The I915_EXEC_DEFAULT slot is left empty, and invalid for use by
execbuf. Its use will be revealed in the next patch.
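
A sketch of binding a context to an explicit engine map from userspace
(illustrative only; error handling elided, uapi headers assumed to carry the
additions below):

#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

static int set_two_engines(int fd, uint32_t ctx_id)
{
	I915_DEFINE_CONTEXT_PARAM_ENGINES(engines, 2) = {
		.extensions = 0,
		.class_instance = {
			{ I915_ENGINE_CLASS_RENDER, 0 }, /* execbuf index 0 */
			{ I915_ENGINE_CLASS_VIDEO,  0 }, /* execbuf index 1 */
		},
	};
	struct drm_i915_gem_context_param arg = {
		.ctx_id = ctx_id,
		.param = I915_CONTEXT_PARAM_ENGINES,
		.size = sizeof(engines),
		.value = (uintptr_t)&engines,
	};

	return ioctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
}

Subsequent execbufs on ctx_id then select engines by index into this map
rather than via the legacy ring selectors.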

v2: Fixup freeing of local on success of get_engines()
v3: Allow empty engines[]
v4: s/nengine/num_engines/

Testcase: igt/gem_ctx_engines
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c       | 226 +++++++++++++++++-
 drivers/gpu/drm/i915/i915_gem_context_types.h |  21 ++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c    |  19 +-
 drivers/gpu/drm/i915/i915_utils.h             |  36 +++
 include/uapi/drm/i915_drm.h                   |  42 +++-
 5 files changed, 331 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index f36648329074..f038c15e73d8 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -86,7 +86,9 @@
  */
 
 #include <linux/log2.h>
+
 #include <drm/i915_drm.h>
+
 #include "i915_drv.h"
 #include "i915_globals.h"
 #include "i915_trace.h"
@@ -101,6 +103,21 @@ static struct i915_global_gem_context {
 	struct kmem_cache *slab_luts;
 } global;
 
+static struct intel_engine_cs *
+lookup_user_engine(struct i915_gem_context *ctx,
+		   unsigned long flags, u16 class, u16 instance)
+#define LOOKUP_USER_INDEX BIT(0)
+{
+	if (flags & LOOKUP_USER_INDEX) {
+		if (instance >= ctx->num_engines)
+			return NULL;
+
+		return ctx->engines[instance];
+	}
+
+	return intel_engine_lookup_user(ctx->i915, class, instance);
+}
+
 struct i915_lut_handle *i915_lut_handle_alloc(void)
 {
 	return kmem_cache_alloc(global.slab_luts, GFP_KERNEL);
@@ -235,6 +252,8 @@ static void i915_gem_context_free(struct i915_gem_context *ctx)
 	release_hw_id(ctx);
 	i915_ppgtt_put(ctx->ppgtt);
 
+	kfree(ctx->engines);
+
 	rbtree_postorder_for_each_entry_safe(it, n, &ctx->hw_contexts, node)
 		intel_context_put(it);
 
@@ -1377,9 +1396,9 @@ static int set_sseu(struct i915_gem_context *ctx,
 	if (user_sseu.flags || user_sseu.rsvd)
 		return -EINVAL;
 
-	engine = intel_engine_lookup_user(i915,
-					  user_sseu.engine_class,
-					  user_sseu.engine_instance);
+	engine = lookup_user_engine(ctx, 0,
+				    user_sseu.engine_class,
+				    user_sseu.engine_instance);
 	if (!engine)
 		return -EINVAL;
 
@@ -1397,9 +1416,166 @@ static int set_sseu(struct i915_gem_context *ctx,
 
 	args->size = sizeof(user_sseu);
 
+	return 0;
+};
+
+struct set_engines {
+	struct i915_gem_context *ctx;
+	struct intel_engine_cs **engines;
+	unsigned int num_engines;
+};
+
+static const i915_user_extension_fn set_engines__extensions[] = {
+};
+
+static int
+set_engines(struct i915_gem_context *ctx,
+	    const struct drm_i915_gem_context_param *args)
+{
+	struct i915_context_param_engines __user *user;
+	struct set_engines set = { .ctx = ctx };
+	u64 size, extensions;
+	unsigned int n;
+	int err;
+
+	user = u64_to_user_ptr(args->value);
+	size = args->size;
+	if (!size)
+		goto out;
+
+	BUILD_BUG_ON(!IS_ALIGNED(sizeof(*user), sizeof(*user->class_instance)));
+	if (size < sizeof(*user) ||
+	    !IS_ALIGNED(size, sizeof(*user->class_instance)))
+		return -EINVAL;
+
+	/* Internal limitation of u64 bitmaps + a few bits of u64 in the uABI */
+	set.num_engines =
+		(size - sizeof(*user)) / sizeof(*user->class_instance);
+	if (set.num_engines > I915_EXEC_RING_MASK + 1)
+		return -EINVAL;
+
+	set.engines = kmalloc_array(set.num_engines,
+				    sizeof(*set.engines),
+				    GFP_KERNEL);
+	if (!set.engines)
+		return -ENOMEM;
+
+	for (n = 0; n < set.num_engines; n++) {
+		u16 class, inst;
+
+		if (get_user(class, &user->class_instance[n].engine_class) ||
+		    get_user(inst, &user->class_instance[n].engine_instance)) {
+			kfree(set.engines);
+			return -EFAULT;
+		}
+
+		if (class == (u16)I915_ENGINE_CLASS_INVALID &&
+		    inst == (u16)I915_ENGINE_CLASS_INVALID_NONE) {
+			set.engines[n] = NULL;
+			continue;
+		}
+
+		set.engines[n] = lookup_user_engine(ctx, 0, class, inst);
+		if (!set.engines[n]) {
+			kfree(set.engines);
+			return -ENOENT;
+		}
+	}
+
+	err = -EFAULT;
+	if (!get_user(extensions, &user->extensions))
+		err = i915_user_extensions(u64_to_user_ptr(extensions),
+					   set_engines__extensions,
+					   ARRAY_SIZE(set_engines__extensions),
+					   &set);
+	if (err) {
+		kfree(set.engines);
+		return err;
+	}
+
+out:
+	mutex_lock(&ctx->i915->drm.struct_mutex);
+	kfree(ctx->engines);
+	ctx->engines = set.engines;
+	ctx->num_engines = set.num_engines;
+	mutex_unlock(&ctx->i915->drm.struct_mutex);
+
 	return 0;
 }
 
+static int
+get_engines(struct i915_gem_context *ctx,
+	    struct drm_i915_gem_context_param *args)
+{
+	struct i915_context_param_engines *local;
+	size_t n, count, size;
+	int err = 0;
+
+restart:
+	if (!READ_ONCE(ctx->engines)) {
+		args->size = 0;
+		return 0;
+	}
+
+	count = READ_ONCE(ctx->num_engines);
+
+	/* Be paranoid in case we have an impedance mismatch */
+	if (!check_struct_size(local, class_instance, count, &size))
+		return -ENOMEM;
+	if (unlikely(overflows_type(size, args->size)))
+		return -ENOMEM;
+
+	if (!args->size) {
+		args->size = size;
+		return 0;
+	}
+
+	if (args->size < size)
+		return -EINVAL;
+
+	local = kmalloc(size, GFP_KERNEL);
+	if (!local)
+		return -ENOMEM;
+
+	if (mutex_lock_interruptible(&ctx->i915->drm.struct_mutex)) {
+		err = -EINTR;
+		goto out;
+	}
+
+	if (!ctx->engines || ctx->num_engines != count) {
+		mutex_unlock(&ctx->i915->drm.struct_mutex);
+		kfree(local);
+		goto restart;
+	}
+
+	local->extensions = 0;
+	for (n = 0; n < count; n++) {
+		if (ctx->engines[n]) {
+			local->class_instance[n].engine_class =
+				ctx->engines[n]->uabi_class;
+			local->class_instance[n].engine_instance =
+				ctx->engines[n]->instance;
+		} else {
+			local->class_instance[n].engine_class =
+				I915_ENGINE_CLASS_INVALID;
+			local->class_instance[n].engine_instance =
+				I915_ENGINE_CLASS_INVALID_NONE;
+		}
+	}
+
+	mutex_unlock(&ctx->i915->drm.struct_mutex);
+
+	if (copy_to_user(u64_to_user_ptr(args->value), local, size)) {
+		err = -EFAULT;
+		goto out;
+	}
+	args->size = size;
+
+out:
+	kfree(local);
+	return err;
+}
+
 static int ctx_setparam(struct i915_gem_context *ctx,
 			struct drm_i915_gem_context_param *args)
 {
@@ -1472,6 +1648,10 @@ static int ctx_setparam(struct i915_gem_context *ctx,
 		ret = set_ppgtt(ctx, args);
 		break;
 
+	case I915_CONTEXT_PARAM_ENGINES:
+		ret = set_engines(ctx, args);
+		break;
+
 	case I915_CONTEXT_PARAM_BAN_PERIOD:
 	default:
 		ret = -EINVAL;
@@ -1500,6 +1680,35 @@ static int create_setparam(struct i915_user_extension __user *ext, void *data)
 	return ctx_setparam(arg->ctx, &local.param);
 }
 
+static int clone_engines(struct i915_gem_context *dst,
+			 struct i915_gem_context *src)
+{
+	struct intel_engine_cs **engines;
+	unsigned int num_engines;
+
+	mutex_lock(&src->i915->drm.struct_mutex); /* serialise src->engine[] */
+
+	/* handle ZERO_SIZE_PTR on behalf of kmemdup */
+	num_engines = src->num_engines;
+	engines = src->engines;
+	if (!ZERO_OR_NULL_PTR(engines)) {
+		engines = kmemdup(engines,
+				  sizeof(*engines) * num_engines,
+				  GFP_KERNEL);
+		if (!engines) {
+			mutex_unlock(&src->i915->drm.struct_mutex);
+			return -ENOMEM;
+		}
+	}
+
+	mutex_unlock(&src->i915->drm.struct_mutex);
+
+	kfree(dst->engines);
+	dst->engines = engines;
+	dst->num_engines = num_engines;
+	return 0;
+}
+
 static int clone_flags(struct i915_gem_context *dst,
 		       struct i915_gem_context *src)
 {
@@ -1608,6 +1817,7 @@ static int create_clone(struct i915_user_extension __user *ext, void *data)
 	static int (* const fn[])(struct i915_gem_context *dst,
 				  struct i915_gem_context *src) = {
 #define MAP(x, y) [ilog2(I915_CONTEXT_CLONE_##x)] = y
+		MAP(ENGINES, clone_engines),
 		MAP(FLAGS, clone_flags),
 		MAP(SCHEDATTR, clone_schedattr),
 		MAP(SSEU, clone_sseu),
@@ -1770,9 +1980,9 @@ static int get_sseu(struct i915_gem_context *ctx,
 	if (user_sseu.flags || user_sseu.rsvd)
 		return -EINVAL;
 
-	engine = intel_engine_lookup_user(ctx->i915,
-					  user_sseu.engine_class,
-					  user_sseu.engine_instance);
+	engine = lookup_user_engine(ctx, 0,
+				    user_sseu.engine_class,
+				    user_sseu.engine_instance);
 	if (!engine)
 		return -EINVAL;
 
@@ -1853,6 +2063,10 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 		ret = get_ppgtt(ctx, args);
 		break;
 
+	case I915_CONTEXT_PARAM_ENGINES:
+		ret = get_engines(ctx, args);
+		break;
+
 	case I915_CONTEXT_PARAM_BAN_PERIOD:
 	default:
 		ret = -EINVAL;
diff --git a/drivers/gpu/drm/i915/i915_gem_context_types.h b/drivers/gpu/drm/i915/i915_gem_context_types.h
index e2ec58b10fb2..46b6080b2240 100644
--- a/drivers/gpu/drm/i915/i915_gem_context_types.h
+++ b/drivers/gpu/drm/i915/i915_gem_context_types.h
@@ -41,6 +41,20 @@ struct i915_gem_context {
 	/** file_priv: owning file descriptor */
 	struct drm_i915_file_private *file_priv;
 
+	/**
+	 * @engines: User defined engines for this context
+	 *
+	 * NULL means to use legacy definitions (including random meaning of
+	 * I915_EXEC_BSD with I915_EXEC_BSD_SELECTOR overrides).
+	 *
+	 * If defined, execbuf uses the I915_EXEC_RING_MASK value as an index
+	 * into this array, and various other uAPI offer the ability to look up
+	 * an index from this array to select an engine to operate on.
+	 *
+	 * User defined by I915_CONTEXT_PARAM_ENGINES.
+	 */
+	struct intel_engine_cs **engines;
+
 	struct i915_timeline *timeline;
 
 	/**
@@ -110,6 +124,13 @@ struct i915_gem_context {
 #define CONTEXT_CLOSED			1
 #define CONTEXT_FORCE_SINGLE_SUBMISSION	2
 
+	/**
+	 * @num_engines: Number of user defined engines for this context
+	 *
+	 * See @engines for the elements.
+	 */
+	unsigned int num_engines;
+
 	/**
 	 * @hw_id: - unique identifier for the context
 	 *
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 3d672c9edb94..66b3921cc8bd 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -2089,13 +2089,20 @@ static const enum intel_engine_id user_ring_map[I915_USER_RINGS + 1] = {
 };
 
 static struct intel_engine_cs *
-eb_select_engine(struct drm_i915_private *dev_priv,
+eb_select_engine(struct i915_execbuffer *eb,
 		 struct drm_file *file,
 		 struct drm_i915_gem_execbuffer2 *args)
 {
 	unsigned int user_ring_id = args->flags & I915_EXEC_RING_MASK;
 	struct intel_engine_cs *engine;
 
+	if (eb->ctx->engines) {
+		if (user_ring_id >= eb->ctx->num_engines)
+			return NULL;
+
+		return eb->ctx->engines[user_ring_id];
+	}
+
 	if (user_ring_id > I915_USER_RINGS) {
 		DRM_DEBUG("execbuf with unknown ring: %u\n", user_ring_id);
 		return NULL;
@@ -2108,11 +2115,11 @@ eb_select_engine(struct drm_i915_private *dev_priv,
 		return NULL;
 	}
 
-	if (user_ring_id == I915_EXEC_BSD && HAS_ENGINE(dev_priv, VCS1)) {
+	if (user_ring_id == I915_EXEC_BSD && HAS_ENGINE(eb->i915, VCS1)) {
 		unsigned int bsd_idx = args->flags & I915_EXEC_BSD_MASK;
 
 		if (bsd_idx == I915_EXEC_BSD_DEFAULT) {
-			bsd_idx = gen8_dispatch_bsd_engine(dev_priv, file);
+			bsd_idx = gen8_dispatch_bsd_engine(eb->i915, file);
 		} else if (bsd_idx >= I915_EXEC_BSD_RING1 &&
 			   bsd_idx <= I915_EXEC_BSD_RING2) {
 			bsd_idx >>= I915_EXEC_BSD_SHIFT;
@@ -2123,9 +2130,9 @@ eb_select_engine(struct drm_i915_private *dev_priv,
 			return NULL;
 		}
 
-		engine = dev_priv->engine[_VCS(bsd_idx)];
+		engine = eb->i915->engine[_VCS(bsd_idx)];
 	} else {
-		engine = dev_priv->engine[user_ring_map[user_ring_id]];
+		engine = eb->i915->engine[user_ring_map[user_ring_id]];
 	}
 
 	if (!engine) {
@@ -2335,7 +2342,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	if (unlikely(err))
 		goto err_destroy;
 
-	eb.engine = eb_select_engine(eb.i915, file, args);
+	eb.engine = eb_select_engine(&eb, file, args);
 	if (!eb.engine) {
 		err = -EINVAL;
 		goto err_engine;
diff --git a/drivers/gpu/drm/i915/i915_utils.h b/drivers/gpu/drm/i915/i915_utils.h
index 2dbe8933b50a..1436fe2fb5f8 100644
--- a/drivers/gpu/drm/i915/i915_utils.h
+++ b/drivers/gpu/drm/i915/i915_utils.h
@@ -25,6 +25,9 @@
 #ifndef __I915_UTILS_H
 #define __I915_UTILS_H
 
+#include <linux/kernel.h>
+#include <linux/overflow.h>
+
 #undef WARN_ON
 /* Many gcc seem to no see through this and fall over :( */
 #if 0
@@ -73,6 +76,39 @@
 #define overflows_type(x, T) \
 	(sizeof(x) > sizeof(T) && (x) >> BITS_PER_TYPE(T))
 
+static inline bool
+__check_struct_size(size_t base, size_t arr, size_t count, size_t *size)
+{
+	size_t sz;
+
+	if (check_mul_overflow(count, arr, &sz))
+		return false;
+
+	if (check_add_overflow(sz, base, &sz))
+		return false;
+
+	*size = sz;
+	return true;
+}
+
+/**
+ * check_struct_size() - Calculate size of structure with trailing array.
+ * @p: Pointer to the structure.
+ * @member: Name of the array member.
+ * @n: Number of elements in the array.
+ * @sz: Total size of structure and array
+ *
+ * Calculates size of memory needed for structure @p followed by an
+ * array of @n @member elements, like struct_size() but reports
+ * whether it overflowed, and the resultant size in @sz
+ *
+ * Return: false if the calculation overflowed.
+ */
+#define check_struct_size(p, member, n, sz) \
+	likely(__check_struct_size(sizeof(*(p)), \
+				   sizeof(*(p)->member) + __must_be_array((p)->member), \
+				   n, sz))
+
 #define ptr_mask_bits(ptr, n) ({					\
 	unsigned long __v = (unsigned long)(ptr);			\
 	(typeof(ptr))(__v & -BIT(n));					\
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index a5bdb86858f6..4e67c2395b46 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -126,6 +126,8 @@ enum drm_i915_gem_engine_class {
 	I915_ENGINE_CLASS_INVALID	= -1
 };
 
+#define I915_ENGINE_CLASS_INVALID_NONE -1
+
 /**
  * DOC: perf_events exposed by i915 through /sys/bus/event_sources/drivers/i915
  *
@@ -1511,6 +1513,26 @@ struct drm_i915_gem_context_param {
 	 * See DRM_I915_GEM_VM_CREATE and DRM_I915_GEM_VM_DESTROY.
 	 */
 #define I915_CONTEXT_PARAM_VM		0x9
+
+/*
+ * I915_CONTEXT_PARAM_ENGINES:
+ *
+ * Bind this context to operate on this subset of available engines. Henceforth,
+ * the I915_EXEC_RING selector for DRM_IOCTL_I915_GEM_EXECBUFFER2 operates as
+ * an index into this array of engines; I915_EXEC_DEFAULT selecting engine[0]
+ * and upwards. Slots 0...N are filled in using the specified (class, instance).
+ * Use
+ *	engine_class: I915_ENGINE_CLASS_INVALID,
+ *	engine_instance: I915_ENGINE_CLASS_INVALID_NONE
+ * to specify a gap in the array that can be filled in later, e.g. by a
+ * virtual engine used for load balancing.
+ *
+ * Setting the number of engines bound to the context to 0, by passing a zero
+ * sized argument, will revert back to default settings.
+ *
+ * See struct i915_context_param_engines.
+ */
+#define I915_CONTEXT_PARAM_ENGINES	0xa
 /* Must be kept compact -- no holes and well documented */
 
 	__u64 value;
@@ -1575,6 +1597,23 @@ struct drm_i915_gem_context_param_sseu {
 	__u32 rsvd;
 };
 
+struct i915_context_param_engines {
+	__u64 extensions; /* linked chain of extension blocks, 0 terminates */
+
+	struct {
+		__u16 engine_class; /* see enum drm_i915_gem_engine_class */
+		__u16 engine_instance;
+	} class_instance[0];
+} __attribute__((packed));
+
+#define I915_DEFINE_CONTEXT_PARAM_ENGINES(name__, N__) struct { \
+	__u64 extensions; \
+	struct { \
+		__u16 engine_class; \
+		__u16 engine_instance; \
+	} class_instance[N__]; \
+} __attribute__((packed)) name__
+
 struct drm_i915_gem_context_create_ext_setparam {
 #define I915_CONTEXT_CREATE_EXT_SETPARAM 0
 	struct i915_user_extension base;
@@ -1591,7 +1630,8 @@ struct drm_i915_gem_context_create_ext_clone {
 #define I915_CONTEXT_CLONE_SSEU		(1u << 2)
 #define I915_CONTEXT_CLONE_TIMELINE	(1u << 3)
 #define I915_CONTEXT_CLONE_VM		(1u << 4)
-#define I915_CONTEXT_CLONE_UNKNOWN -(I915_CONTEXT_CLONE_VM << 1)
+#define I915_CONTEXT_CLONE_ENGINES	(1u << 5)
+#define I915_CONTEXT_CLONE_UNKNOWN -(I915_CONTEXT_CLONE_ENGINES << 1)
 	__u64 rsvd;
 };
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 14/18] drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[]
  2019-03-19 11:57 [PATCH 01/18] drm/i915/selftests: Provide stub reset functions Chris Wilson
                   ` (11 preceding siblings ...)
  2019-03-19 11:57 ` [PATCH 13/18] drm/i915: Allow a context to define its set of engines Chris Wilson
@ 2019-03-19 11:57 ` Chris Wilson
  2019-03-20 13:22   ` Tvrtko Ursulin
  2019-03-19 11:57 ` [PATCH 15/18] drm/i915: Load balancing across a virtual engine Chris Wilson
                   ` (10 subsequent siblings)
  23 siblings, 1 reply; 47+ messages in thread
From: Chris Wilson @ 2019-03-19 11:57 UTC (permalink / raw)
  To: intel-gfx

Allow the user to specify a local engine index (as opposed to
class:index) that they can use to refer to a preset engine inside the
ctx->engine[] array defined by an earlier I915_CONTEXT_PARAM_ENGINES.
This will be useful for setting SSEU parameters on virtual engines that
are local to the context and do not have a valid global class:instance
lookup.
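
A sketch of looking up the SSEU configuration of a context-local engine by
its index (illustrative only; the kernel fills in the remaining fields of
drm_i915_gem_context_param_sseu on success):

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

static int get_sseu_by_index(int fd, uint32_t ctx_id, uint16_t idx,
			     struct drm_i915_gem_context_param_sseu *sseu)
{
	struct drm_i915_gem_context_param arg = {
		.ctx_id = ctx_id,
		.param = I915_CONTEXT_PARAM_SSEU,
		.size = sizeof(*sseu),
		.value = (uintptr_t)sseu,
	};

	memset(sseu, 0, sizeof(*sseu));
	/* engine_instance is an index into ctx->engine[], not class:instance */
	sseu->flags = I915_CONTEXT_SSEU_FLAG_ENGINE_INDEX;
	sseu->engine_instance = idx;

	return ioctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg);
}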

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 24 ++++++++++++++++++++----
 include/uapi/drm/i915_drm.h             |  3 ++-
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index f038c15e73d8..cbd76ef95115 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -1381,6 +1381,7 @@ static int set_sseu(struct i915_gem_context *ctx,
 	struct drm_i915_gem_context_param_sseu user_sseu;
 	struct intel_engine_cs *engine;
 	struct intel_sseu sseu;
+	unsigned long lookup;
 	int ret;
 
 	if (args->size < sizeof(user_sseu))
@@ -1393,10 +1394,17 @@ static int set_sseu(struct i915_gem_context *ctx,
 			   sizeof(user_sseu)))
 		return -EFAULT;
 
-	if (user_sseu.flags || user_sseu.rsvd)
+	if (user_sseu.rsvd)
 		return -EINVAL;
 
-	engine = lookup_user_engine(ctx, 0,
+	if (user_sseu.flags & ~(I915_CONTEXT_SSEU_FLAG_ENGINE_INDEX))
+		return -EINVAL;
+
+	lookup = 0;
+	if (user_sseu.flags & I915_CONTEXT_SSEU_FLAG_ENGINE_INDEX)
+		lookup |= LOOKUP_USER_INDEX;
+
+	engine = lookup_user_engine(ctx, lookup,
 				    user_sseu.engine_class,
 				    user_sseu.engine_instance);
 	if (!engine)
@@ -1967,6 +1975,7 @@ static int get_sseu(struct i915_gem_context *ctx,
 	struct drm_i915_gem_context_param_sseu user_sseu;
 	struct intel_engine_cs *engine;
 	struct intel_context *ce;
+	unsigned long lookup;
 
 	if (args->size == 0)
 		goto out;
@@ -1977,10 +1986,17 @@ static int get_sseu(struct i915_gem_context *ctx,
 			   sizeof(user_sseu)))
 		return -EFAULT;
 
-	if (user_sseu.flags || user_sseu.rsvd)
+	if (user_sseu.rsvd)
 		return -EINVAL;
 
-	engine = lookup_user_engine(ctx, 0,
+	if (user_sseu.flags & ~(I915_CONTEXT_SSEU_FLAG_ENGINE_INDEX))
+		return -EINVAL;
+
+	lookup = 0;
+	if (user_sseu.flags & I915_CONTEXT_SSEU_FLAG_ENGINE_INDEX)
+		lookup |= LOOKUP_USER_INDEX;
+
+	engine = lookup_user_engine(ctx, lookup,
 				    user_sseu.engine_class,
 				    user_sseu.engine_instance);
 	if (!engine)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 4e67c2395b46..8ef6d60929c6 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1567,9 +1567,10 @@ struct drm_i915_gem_context_param_sseu {
 	__u16 engine_instance;
 
 	/*
-	 * Unused for now. Must be cleared to zero.
+	 * Unknown flags must be cleared to zero.
 	 */
 	__u32 flags;
+#define I915_CONTEXT_SSEU_FLAG_ENGINE_INDEX (1u << 0)
 
 	/*
 	 * Mask of slices to enable for the context. Valid values are a subset
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 15/18] drm/i915: Load balancing across a virtual engine
  2019-03-19 11:57 [PATCH 01/18] drm/i915/selftests: Provide stub reset functions Chris Wilson
                   ` (12 preceding siblings ...)
  2019-03-19 11:57 ` [PATCH 14/18] drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[] Chris Wilson
@ 2019-03-19 11:57 ` Chris Wilson
  2019-03-20 15:59   ` Tvrtko Ursulin
  2019-03-19 11:57 ` [PATCH 16/18] drm/i915: Extend execution fence to support a callback Chris Wilson
                   ` (9 subsequent siblings)
  23 siblings, 1 reply; 47+ messages in thread
From: Chris Wilson @ 2019-03-19 11:57 UTC (permalink / raw)
  To: intel-gfx

Having allowed the user to define a set of engines that they will want
to only use, we go one step further and allow them to bind those engines
into a single virtual instance. Submitting a batch to the virtual engine
will then forward it to any one of the set in a manner as best to
distribute load.  The virtual engine has a single timeline across all
engines (it operates as a single queue), so it is not able to concurrently
run batches across multiple engines by itself; that is left up to the user
to submit multiple concurrent batches to multiple queues. Multiple users
will be load balanced across the system.

The mechanism used for load balancing in this patch is a late greedy
balancer. When a request is ready for execution, it is added to each
engine's queue, and when an engine is ready for its next request it
claims it from the virtual engine. The first engine to do so wins, i.e.
the request is executed at the earliest opportunity (idle moment) in the
system.

As not all HW is created equal, the user is still able to skip the
virtual engine and execute the batch on a specific engine, all within the
same queue. It will then be executed in order on the correct engine,
with execution on other virtual engines being moved away due to the load
detection.

A few areas for potential improvement are left!

- The virtual engine always takes priority over equal-priority tasks.
Mostly broken up by applying FQ_CODEL rules for prioritising new clients,
and hopefully the virtual and real engines are not then congested (i.e.
all work is via virtual engines, or all work is to the real engine).

- We require the breadcrumb irq around every virtual engine request. For
normal engines, we eliminate the need for the slow round trip via
interrupt by using the submit fence and queueing in order. For virtual
engines, we have to allow any job to transfer to a new ring, and cannot
coalesce the submissions, so require the completion fence instead,
forcing the persistent use of interrupts.

- We only drip feed single requests through each virtual engine and onto
the physical engines, even if there was enough work to fill all ELSP,
leaving small stalls with an idle CS event at the end of every request.
Could we be greedy and fill both slots? Being lazy is virtuous for load
distribution on less-than-full workloads though.

Other areas of improvement are more general, such as reducing lock
contention, reducing dispatch overhead, looking at direct submission
rather than bouncing around tasklets etc.
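
For illustration, a userspace sketch of constructing such a virtual engine on
top of I915_CONTEXT_PARAM_ENGINES. It assumes a part with two VCS engines;
only the load-balance fields exercised by this patch are set explicitly (the
mbz fields stay zero from the initialiser), so treat it as a sketch rather
than a reference implementation.

#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

static int setup_balanced_vcs(int fd, uint32_t ctx_id)
{
	struct i915_context_engines_load_balance balance = {
		.base = { .name = I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE },
		.engine_index = 0,	   /* fill the placeholder slot below */
		.engines_mask = 0x2 | 0x4, /* balance across engines[1..2] */
	};
	I915_DEFINE_CONTEXT_PARAM_ENGINES(engines, 3) = {
		.extensions = (uintptr_t)&balance,
		.class_instance = {
			{ (uint16_t)I915_ENGINE_CLASS_INVALID,
			  (uint16_t)I915_ENGINE_CLASS_INVALID_NONE }, /* veng */
			{ I915_ENGINE_CLASS_VIDEO, 0 },
			{ I915_ENGINE_CLASS_VIDEO, 1 },
		},
	};
	struct drm_i915_gem_context_param arg = {
		.ctx_id = ctx_id,
		.param = I915_CONTEXT_PARAM_ENGINES,
		.size = sizeof(engines),
		.value = (uintptr_t)&engines,
	};

	return ioctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
}

Execbuf with ring index 0 then runs on whichever VCS goes idle first, while
indices 1 and 2 still pin work to a specific instance within the same queue.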

sseu: Lift the restriction to allow sseu to be reconfigured on virtual
engines composed of RENDER_CLASS (rcs).

v2: macroize check_user_mbz()
v3: Cancel virtual engines on wedging
v4: Commence commenting

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.h            |   5 +
 drivers/gpu/drm/i915/i915_gem_context.c    | 126 ++++-
 drivers/gpu/drm/i915/i915_scheduler.c      |  18 +-
 drivers/gpu/drm/i915/i915_timeline_types.h |   1 +
 drivers/gpu/drm/i915/intel_engine_types.h  |   8 +
 drivers/gpu/drm/i915/intel_lrc.c           | 567 ++++++++++++++++++++-
 drivers/gpu/drm/i915/intel_lrc.h           |  11 +
 drivers/gpu/drm/i915/selftests/intel_lrc.c | 165 ++++++
 include/uapi/drm/i915_drm.h                |  30 ++
 9 files changed, 912 insertions(+), 19 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.h b/drivers/gpu/drm/i915/i915_gem.h
index 5c073fe73664..3ca855505715 100644
--- a/drivers/gpu/drm/i915/i915_gem.h
+++ b/drivers/gpu/drm/i915/i915_gem.h
@@ -96,4 +96,9 @@ static inline bool __tasklet_enable(struct tasklet_struct *t)
 	return atomic_dec_and_test(&t->count);
 }
 
+static inline bool __tasklet_is_scheduled(struct tasklet_struct *t)
+{
+	return test_bit(TASKLET_STATE_SCHED, &t->state);
+}
+
 #endif /* __I915_GEM_H__ */
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index cbd76ef95115..8d8fcc8c7a86 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -86,6 +86,7 @@
  */
 
 #include <linux/log2.h>
+#include <linux/nospec.h>
 
 #include <drm/i915_drm.h>
 
@@ -94,6 +95,7 @@
 #include "i915_trace.h"
 #include "i915_user_extensions.h"
 #include "intel_lrc_reg.h"
+#include "intel_lrc.h"
 #include "intel_workarounds.h"
 
 #define ALL_L3_SLICES(dev) (1 << NUM_L3_SLICES(dev)) - 1
@@ -241,6 +243,20 @@ static void release_hw_id(struct i915_gem_context *ctx)
 	mutex_unlock(&i915->contexts.mutex);
 }
 
+static void free_engines(struct intel_engine_cs **engines, int count)
+{
+	int i;
+
+	if (ZERO_OR_NULL_PTR(engines))
+		return;
+
+	/* We own the veng we created; regular engines are ignored */
+	for (i = 0; i < count; i++)
+		intel_virtual_engine_destroy(engines[i]);
+
+	kfree(engines);
+}
+
 static void i915_gem_context_free(struct i915_gem_context *ctx)
 {
 	struct intel_context *it, *n;
@@ -251,8 +267,7 @@ static void i915_gem_context_free(struct i915_gem_context *ctx)
 
 	release_hw_id(ctx);
 	i915_ppgtt_put(ctx->ppgtt);
-
-	kfree(ctx->engines);
+	free_engines(ctx->engines, ctx->num_engines);
 
 	rbtree_postorder_for_each_entry_safe(it, n, &ctx->hw_contexts, node)
 		intel_context_put(it);
@@ -1239,7 +1254,6 @@ __i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
 	int ret = 0;
 
 	GEM_BUG_ON(INTEL_GEN(ctx->i915) < 8);
-	GEM_BUG_ON(engine->id != RCS0);
 
 	ce = intel_context_pin_lock(ctx, engine);
 	if (IS_ERR(ce))
@@ -1433,7 +1447,80 @@ struct set_engines {
 	unsigned int num_engines;
 };
 
+static int
+set_engines__load_balance(struct i915_user_extension __user *base, void *data)
+{
+	struct i915_context_engines_load_balance __user *ext =
+		container_of_user(base, typeof(*ext), base);
+	const struct set_engines *set = data;
+	struct intel_engine_cs *ve;
+	unsigned int n;
+	u64 mask;
+	u16 idx;
+	int err;
+
+	if (!HAS_EXECLISTS(set->ctx->i915))
+		return -ENODEV;
+
+	if (USES_GUC_SUBMISSION(set->ctx->i915))
+		return -ENODEV; /* not implemented yet */
+
+	if (get_user(idx, &ext->engine_index))
+		return -EFAULT;
+
+	if (idx >= set->num_engines)
+		return -EINVAL;
+
+	idx = array_index_nospec(idx, set->num_engines);
+	if (set->engines[idx])
+		return -EEXIST;
+
+	err = check_user_mbz(&ext->mbz16);
+	if (err)
+		return err;
+
+	err = check_user_mbz(&ext->flags);
+	if (err)
+		return err;
+
+	for (n = 0; n < ARRAY_SIZE(ext->mbz64); n++) {
+		err = check_user_mbz(&ext->mbz64[n]);
+		if (err)
+			return err;
+	}
+
+	if (get_user(mask, &ext->engines_mask))
+		return -EFAULT;
+
+	mask &= GENMASK_ULL(set->num_engines - 1, 0) & ~BIT_ULL(idx);
+	if (!mask)
+		return -EINVAL;
+
+	if (is_power_of_2(mask)) {
+		ve = set->engines[__ffs64(mask)];
+	} else {
+		struct intel_engine_cs *stack[64];
+		int bit;
+
+		n = 0;
+		for_each_set_bit(bit, (unsigned long *)&mask, set->num_engines)
+			stack[n++] = set->engines[bit];
+
+		ve = intel_execlists_create_virtual(set->ctx, stack, n);
+	}
+	if (IS_ERR(ve))
+		return PTR_ERR(ve);
+
+	if (cmpxchg(&set->engines[idx], NULL, ve)) {
+		intel_virtual_engine_destroy(ve);
+		return -EEXIST;
+	}
+
+	return 0;
+}
+
 static const i915_user_extension_fn set_engines__extensions[] = {
+	[I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE] = set_engines__load_balance,
 };
 
 static int
@@ -1497,13 +1584,13 @@ set_engines(struct i915_gem_context *ctx,
 					   ARRAY_SIZE(set_engines__extensions),
 					   &set);
 	if (err) {
-		kfree(set.engines);
+		free_engines(set.engines, set.num_engines);
 		return err;
 	}
 
 out:
 	mutex_lock(&ctx->i915->drm.struct_mutex);
-	kfree(ctx->engines);
+	free_engines(ctx->engines, ctx->num_engines);
 	ctx->engines = set.engines;
 	ctx->num_engines = set.num_engines;
 	mutex_unlock(&ctx->i915->drm.struct_mutex);
@@ -1692,7 +1779,7 @@ static int clone_engines(struct i915_gem_context *dst,
 			 struct i915_gem_context *src)
 {
 	struct intel_engine_cs **engines;
-	unsigned int num_engines;
+	unsigned int num_engines, i;
 
 	mutex_lock(&src->i915->drm.struct_mutex); /* serialise src->engine[] */
 
@@ -1707,11 +1794,36 @@ static int clone_engines(struct i915_gem_context *dst,
 			mutex_unlock(&src->i915->drm.struct_mutex);
 			return -ENOMEM;
 		}
+
+		/*
+		 * Virtual engines are singletons; they can only exist
+		 * inside a single context, because they embed their
+		 * HW context... As each virtual context implies a single
+		 * timeline (each engine can only dequeue a single request
+		 * at any time), it would be surprising for two contexts
+		 * to use the same engine. So let's create a copy of
+		 * the virtual engine instead.
+		 */
+		for (i = 0; i < num_engines; i++) {
+			struct intel_engine_cs *engine = engines[i];
+
+			if (!engine || !intel_engine_is_virtual(engine))
+				continue;
+
+			engine = intel_execlists_clone_virtual(dst, engine);
+			if (IS_ERR(engine)) {
+				free_engines(engines, i);
+				mutex_unlock(&src->i915->drm.struct_mutex);
+				return PTR_ERR(engine);
+			}
+
+			engines[i] = engine;
+		}
 	}
 
 	mutex_unlock(&src->i915->drm.struct_mutex);
 
-	kfree(dst->engines);
+	free_engines(dst->engines, dst->num_engines);
 	dst->engines = engines;
 	dst->num_engines = num_engines;
 	return 0;
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index e0f609d01564..8cff4f6d6158 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -247,17 +247,26 @@ sched_lock_engine(const struct i915_sched_node *node,
 		  struct intel_engine_cs *locked,
 		  struct sched_cache *cache)
 {
-	struct intel_engine_cs *engine = node_to_request(node)->engine;
+	const struct i915_request *rq = node_to_request(node);
+	struct intel_engine_cs *engine;
 
 	GEM_BUG_ON(!locked);
 
-	if (engine != locked) {
+	/*
+	 * Virtual engines complicate acquiring the engine timeline lock,
+	 * as their rq->engine pointer is not stable until under that
+	 * engine lock. The simple ploy we use is to take the lock then
+	 * check that the rq still belongs to the newly locked engine.
+	 */
+	while (locked != (engine = READ_ONCE(rq->engine))) {
 		spin_unlock(&locked->timeline.lock);
 		memset(cache, 0, sizeof(*cache));
 		spin_lock(&engine->timeline.lock);
+		locked = engine;
 	}
 
-	return engine;
+	GEM_BUG_ON(locked != engine);
+	return locked;
 }
 
 static bool inflight(const struct i915_request *rq,
@@ -370,8 +379,11 @@ static void __i915_schedule(struct i915_request *rq,
 		if (prio <= node->attr.priority || node_signaled(node))
 			continue;
 
+		GEM_BUG_ON(node_to_request(node)->engine != engine);
+
 		node->attr.priority = prio;
 		if (!list_empty(&node->link)) {
+			GEM_BUG_ON(intel_engine_is_virtual(engine));
 			if (!cache.priolist)
 				cache.priolist =
 					i915_sched_lookup_priolist(engine,
diff --git a/drivers/gpu/drm/i915/i915_timeline_types.h b/drivers/gpu/drm/i915/i915_timeline_types.h
index 1f5b55d9ffb5..57c79830bb8c 100644
--- a/drivers/gpu/drm/i915/i915_timeline_types.h
+++ b/drivers/gpu/drm/i915/i915_timeline_types.h
@@ -26,6 +26,7 @@ struct i915_timeline {
 	spinlock_t lock;
 #define TIMELINE_CLIENT 0 /* default subclass */
 #define TIMELINE_ENGINE 1
+#define TIMELINE_VIRTUAL 2
 	struct mutex mutex; /* protects the flow of requests */
 
 	unsigned int pin_count;
diff --git a/drivers/gpu/drm/i915/intel_engine_types.h b/drivers/gpu/drm/i915/intel_engine_types.h
index cef1e71a7401..4d526767c371 100644
--- a/drivers/gpu/drm/i915/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/intel_engine_types.h
@@ -224,6 +224,7 @@ struct intel_engine_execlists {
 	 * @queue: queue of requests, in priority lists
 	 */
 	struct rb_root_cached queue;
+	struct rb_root_cached virtual;
 
 	/**
 	 * @csb_write: control register for Context Switch buffer
@@ -429,6 +430,7 @@ struct intel_engine_cs {
 #define I915_ENGINE_SUPPORTS_STATS   BIT(1)
 #define I915_ENGINE_HAS_PREEMPTION   BIT(2)
 #define I915_ENGINE_HAS_SEMAPHORES   BIT(3)
+#define I915_ENGINE_IS_VIRTUAL       BIT(4)
 	unsigned int flags;
 
 	/*
@@ -512,6 +514,12 @@ intel_engine_has_semaphores(const struct intel_engine_cs *engine)
 	return engine->flags & I915_ENGINE_HAS_SEMAPHORES;
 }
 
+static inline bool
+intel_engine_is_virtual(const struct intel_engine_cs *engine)
+{
+	return engine->flags & I915_ENGINE_IS_VIRTUAL;
+}
+
 #define instdone_slice_mask(dev_priv__) \
 	(IS_GEN(dev_priv__, 7) ? \
 	 1 : RUNTIME_INFO(dev_priv__)->sseu.slice_mask)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 0e64317ddcbf..a7f0235f19c5 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -166,6 +166,41 @@
 
 #define ACTIVE_PRIORITY (I915_PRIORITY_NEWCLIENT | I915_PRIORITY_NOSEMAPHORE)
 
+struct virtual_engine {
+	struct intel_engine_cs base;
+	struct intel_context context;
+
+	/*
+	 * We allow only a single request through the virtual engine at a time
+	 * (each request in the timeline waits for the completion fence of
+	 * the previous before being submitted). By restricting ourselves to
+	 * only submitting a single request, each request is placed on to a
+	 * physical engine to maximise load spreading (by virtue of the late greedy
+	 * scheduling -- each real engine takes the next available request
+	 * upon idling).
+	 */
+	struct i915_request *request;
+
+	/*
+	 * We keep a rbtree of available virtual engines inside each physical
+	 * engine, sorted by priority. Here we preallocate the nodes we need
+	 * for the virtual engine, indexed by physical_engine->id.
+	 */
+	struct ve_node {
+		struct rb_node rb;
+		int prio;
+	} nodes[I915_NUM_ENGINES];
+
+	/* And finally, which physical engines this virtual engine maps onto. */
+	unsigned int count;
+	struct intel_engine_cs *siblings[0];
+};
+
+static struct virtual_engine *to_virtual_engine(struct intel_engine_cs *engine)
+{
+	return container_of(engine, struct virtual_engine, base);
+}
+
 static int execlists_context_deferred_alloc(struct intel_context *ce,
 					    struct intel_engine_cs *engine);
 static void execlists_init_reg_state(u32 *reg_state,
@@ -229,7 +264,8 @@ static int queue_prio(const struct intel_engine_execlists *execlists)
 }
 
 static inline bool need_preempt(const struct intel_engine_cs *engine,
-				const struct i915_request *rq)
+				const struct i915_request *rq,
+				struct rb_node *rb)
 {
 	int last_prio;
 
@@ -264,6 +300,22 @@ static inline bool need_preempt(const struct intel_engine_cs *engine,
 	    rq_prio(list_next_entry(rq, link)) > last_prio)
 		return true;
 
+	if (rb) { /* XXX virtual precedence */
+		struct virtual_engine *ve =
+			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
+		bool preempt = false;
+
+		if (engine == ve->siblings[0]) { /* only preempt one sibling */
+			spin_lock(&ve->base.timeline.lock);
+			if (ve->request)
+				preempt = rq_prio(ve->request) > last_prio;
+			spin_unlock(&ve->base.timeline.lock);
+		}
+
+		if (preempt)
+			return preempt;
+	}
+
 	/*
 	 * If the inflight context did not trigger the preemption, then maybe
 	 * it was the set of queued requests? Pick the highest priority in
@@ -382,6 +434,8 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
 	list_for_each_entry_safe_reverse(rq, rn,
 					 &engine->timeline.requests,
 					 link) {
+		struct intel_engine_cs *owner;
+
 		if (i915_request_completed(rq))
 			break;
 
@@ -390,14 +444,30 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
 
 		GEM_BUG_ON(rq->hw_context->active);
 
-		GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
-		if (rq_prio(rq) != prio) {
-			prio = rq_prio(rq);
-			pl = i915_sched_lookup_priolist(engine, prio);
-		}
-		GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
+		/*
+		 * Push the request back into the queue for later resubmission.
+		 * If this request is not native to this physical engine (i.e.
+		 * it came from a virtual source), push it back onto the virtual
+		 * engine so that it can be moved across onto another physical
+		 * engine as load dictates.
+		 */
+		owner = rq->hw_context->engine;
+		if (likely(owner == engine)) {
+			GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
+			if (rq_prio(rq) != prio) {
+				prio = rq_prio(rq);
+				pl = i915_sched_lookup_priolist(engine, prio);
+			}
+			GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
 
-		list_add(&rq->sched.link, pl);
+			list_add(&rq->sched.link, pl);
+		} else {
+			if (__i915_request_has_started(rq))
+				rq->sched.attr.priority |= ACTIVE_PRIORITY;
+
+			rq->engine = owner;
+			owner->submit_request(rq);
+		}
 
 		active = rq;
 	}
@@ -659,6 +729,50 @@ static void complete_preempt_context(struct intel_engine_execlists *execlists)
 						  execlists));
 }
 
+static void virtual_update_register_offsets(u32 *regs,
+					    struct intel_engine_cs *engine)
+{
+	u32 base = engine->mmio_base;
+
+	regs[CTX_CONTEXT_CONTROL] =
+		i915_mmio_reg_offset(RING_CONTEXT_CONTROL(engine));
+	regs[CTX_RING_HEAD] = i915_mmio_reg_offset(RING_HEAD(base));
+	regs[CTX_RING_TAIL] = i915_mmio_reg_offset(RING_TAIL(base));
+	regs[CTX_RING_BUFFER_START] = i915_mmio_reg_offset(RING_START(base));
+	regs[CTX_RING_BUFFER_CONTROL] = i915_mmio_reg_offset(RING_CTL(base));
+
+	regs[CTX_BB_HEAD_U] = i915_mmio_reg_offset(RING_BBADDR_UDW(base));
+	regs[CTX_BB_HEAD_L] = i915_mmio_reg_offset(RING_BBADDR(base));
+	regs[CTX_BB_STATE] = i915_mmio_reg_offset(RING_BBSTATE(base));
+	regs[CTX_SECOND_BB_HEAD_U] =
+		i915_mmio_reg_offset(RING_SBBADDR_UDW(base));
+	regs[CTX_SECOND_BB_HEAD_L] = i915_mmio_reg_offset(RING_SBBADDR(base));
+	regs[CTX_SECOND_BB_STATE] = i915_mmio_reg_offset(RING_SBBSTATE(base));
+
+	regs[CTX_CTX_TIMESTAMP] =
+		i915_mmio_reg_offset(RING_CTX_TIMESTAMP(base));
+	regs[CTX_PDP3_UDW] = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(engine, 3));
+	regs[CTX_PDP3_LDW] = i915_mmio_reg_offset(GEN8_RING_PDP_LDW(engine, 3));
+	regs[CTX_PDP2_UDW] = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(engine, 2));
+	regs[CTX_PDP2_LDW] = i915_mmio_reg_offset(GEN8_RING_PDP_LDW(engine, 2));
+	regs[CTX_PDP1_UDW] = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(engine, 1));
+	regs[CTX_PDP1_LDW] = i915_mmio_reg_offset(GEN8_RING_PDP_LDW(engine, 1));
+	regs[CTX_PDP0_UDW] = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(engine, 0));
+	regs[CTX_PDP0_LDW] = i915_mmio_reg_offset(GEN8_RING_PDP_LDW(engine, 0));
+
+	if (engine->class == RENDER_CLASS) {
+		regs[CTX_RCS_INDIRECT_CTX] =
+			i915_mmio_reg_offset(RING_INDIRECT_CTX(base));
+		regs[CTX_RCS_INDIRECT_CTX_OFFSET] =
+			i915_mmio_reg_offset(RING_INDIRECT_CTX_OFFSET(base));
+		regs[CTX_BB_PER_CTX_PTR] =
+			i915_mmio_reg_offset(RING_BB_PER_CTX_PTR(base));
+
+		regs[CTX_R_PWR_CLK_STATE] =
+			i915_mmio_reg_offset(GEN8_R_PWR_CLK_STATE);
+	}
+}
+
 static void execlists_dequeue(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
@@ -691,6 +805,37 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	 * and context switches) submission.
 	 */
 
+	for (rb = rb_first_cached(&execlists->virtual); rb; ) {
+		struct virtual_engine *ve =
+			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
+		struct i915_request *rq = READ_ONCE(ve->request);
+		struct intel_engine_cs *active;
+
+		if (!rq) { /* lazily cleanup after another engine handled rq */
+			rb_erase_cached(rb, &execlists->virtual);
+			RB_CLEAR_NODE(rb);
+			rb = rb_first_cached(&execlists->virtual);
+			continue;
+		}
+
+		/*
+		 * We track when the HW has completed saving the context image
+		 * (i.e. when we have seen the final CS event switching out of
+		 * the context) and must not overwrite the context image before
+		 * then. This restricts us to only using the active engine
+		 * while the previous virtualized request is inflight (so
+		 * we reuse the register offsets). This is a very small
+		 * hysteresis on the greedy selection algorithm.
+		 */
+		active = READ_ONCE(ve->context.active);
+		if (active && active != engine) {
+			rb = rb_next(rb);
+			continue;
+		}
+
+		break;
+	}
+
 	if (last) {
 		/*
 		 * Don't resubmit or switch until all outstanding
@@ -712,7 +857,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		if (!execlists_is_active(execlists, EXECLISTS_ACTIVE_HWACK))
 			return;
 
-		if (need_preempt(engine, last)) {
+		if (need_preempt(engine, last, rb)) {
 			inject_preempt_context(engine);
 			return;
 		}
@@ -752,6 +897,72 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		last->tail = last->wa_tail;
 	}
 
+	while (rb) { /* XXX virtual is always taking precedence */
+		struct virtual_engine *ve =
+			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
+		struct i915_request *rq;
+
+		spin_lock(&ve->base.timeline.lock);
+
+		rq = ve->request;
+		if (unlikely(!rq)) { /* lost the race to a sibling */
+			spin_unlock(&ve->base.timeline.lock);
+			rb_erase_cached(rb, &execlists->virtual);
+			RB_CLEAR_NODE(rb);
+			rb = rb_first_cached(&execlists->virtual);
+			continue;
+		}
+
+		if (rq_prio(rq) >= queue_prio(execlists)) {
+			if (last && !can_merge_rq(last, rq)) {
+				spin_unlock(&ve->base.timeline.lock);
+				return; /* leave this rq for another engine */
+			}
+
+			GEM_BUG_ON(rq->engine != &ve->base);
+			ve->request = NULL;
+			ve->base.execlists.queue_priority_hint = INT_MIN;
+			rb_erase_cached(rb, &execlists->virtual);
+			RB_CLEAR_NODE(rb);
+
+			GEM_BUG_ON(rq->hw_context != &ve->context);
+			rq->engine = engine;
+
+			if (engine != ve->siblings[0]) {
+				u32 *regs = ve->context.lrc_reg_state;
+				unsigned int n;
+
+				GEM_BUG_ON(READ_ONCE(ve->context.active));
+				virtual_update_register_offsets(regs, engine);
+
+				/*
+				 * Move the bound engine to the top of the list
+				 * for future execution. We then kick this
+				 * tasklet first before checking others, so that
+				 * we preferentially reuse this set of bound
+				 * registers.
+				 */
+				for (n = 1; n < ve->count; n++) {
+					if (ve->siblings[n] == engine) {
+						swap(ve->siblings[n],
+						     ve->siblings[0]);
+						break;
+					}
+				}
+
+				GEM_BUG_ON(ve->siblings[0] != engine);
+			}
+
+			__i915_request_submit(rq);
+			trace_i915_request_in(rq, port_index(port, execlists));
+			submit = true;
+			last = rq;
+		}
+
+		spin_unlock(&ve->base.timeline.lock);
+		break;
+	}
+
 	while ((rb = rb_first_cached(&execlists->queue))) {
 		struct i915_priolist *p = to_priolist(rb);
 		struct i915_request *rq, *rn;
@@ -971,6 +1182,24 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
 		i915_priolist_free(p);
 	}
 
+	/* Cancel all attached virtual engines */
+	while ((rb = rb_first_cached(&execlists->virtual))) {
+		struct virtual_engine *ve =
+			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
+
+		rb_erase_cached(rb, &execlists->virtual);
+		RB_CLEAR_NODE(rb);
+
+		spin_lock(&ve->base.timeline.lock);
+		if (ve->request) {
+			__i915_request_submit(ve->request);
+			dma_fence_set_error(&ve->request->fence, -EIO);
+			i915_request_mark_complete(ve->request);
+			ve->request = NULL;
+		}
+		spin_unlock(&ve->base.timeline.lock);
+	}
+
 	/* Remaining _unready_ requests will be nop'ed when submitted */
 
 	execlists->queue_priority_hint = INT_MIN;
@@ -2897,6 +3126,303 @@ void intel_lr_context_resume(struct drm_i915_private *i915)
 	}
 }
 
+static void virtual_context_destroy(struct kref *kref)
+{
+	struct virtual_engine *ve =
+		container_of(kref, typeof(*ve), context.ref);
+	unsigned int n;
+
+	GEM_BUG_ON(ve->request);
+	GEM_BUG_ON(ve->context.active);
+
+	for (n = 0; n < ve->count; n++) {
+		struct intel_engine_cs *sibling = ve->siblings[n];
+		struct rb_node *node = &ve->nodes[sibling->id].rb;
+
+		if (RB_EMPTY_NODE(node))
+			continue;
+
+		spin_lock_irq(&sibling->timeline.lock);
+
+		if (!RB_EMPTY_NODE(node))
+			rb_erase_cached(node, &sibling->execlists.virtual);
+
+		spin_unlock_irq(&sibling->timeline.lock);
+	}
+	GEM_BUG_ON(__tasklet_is_scheduled(&ve->base.execlists.tasklet));
+
+	if (ve->context.state)
+		__execlists_context_fini(&ve->context);
+
+	i915_timeline_fini(&ve->base.timeline);
+	kfree(ve);
+}
+
+static void virtual_engine_initial_hint(struct virtual_engine *ve)
+{
+	int swp;
+
+	/*
+	 * Pick a random sibling on starting to help spread the load around.
+	 *
+	 * New contexts are typically created with exactly the same order
+	 * of siblings, and often started in batches. Due to the way we iterate
+	 * the array of siblings when submitting requests, sibling[0] is
+	 * prioritised for dequeuing. If we make sure that sibling[0] is fairly
+	 * randomised across the system, we also help spread the load by the
+	 * first engine we inspect being different each time.
+	 *
+	 * NB This does not force us to execute on this engine; it will just
+	 * typically be the first we inspect for submission.
+	 */
+	swp = prandom_u32_max(ve->count);
+	if (!swp)
+		return;
+
+	swap(ve->siblings[swp], ve->siblings[0]);
+	virtual_update_register_offsets(ve->context.lrc_reg_state,
+					ve->siblings[0]);
+}
+
+static int virtual_context_pin(struct intel_context *ce)
+{
+	struct virtual_engine *ve = container_of(ce, typeof(*ve), context);
+	int err;
+
+	/* Note: we must use a real engine class for setting up reg state */
+	err = __execlists_context_pin(ce, ve->siblings[0]);
+	if (err)
+		return err;
+
+	virtual_engine_initial_hint(ve);
+	return 0;
+}
+
+static const struct intel_context_ops virtual_context_ops = {
+	.pin = virtual_context_pin,
+	.unpin = execlists_context_unpin,
+
+	.destroy = virtual_context_destroy,
+};
+
+static void virtual_submission_tasklet(unsigned long data)
+{
+	struct virtual_engine * const ve = (struct virtual_engine *)data;
+	unsigned int n;
+	int prio;
+
+	prio = READ_ONCE(ve->base.execlists.queue_priority_hint);
+	if (prio == INT_MIN)
+		return;
+
+	local_irq_disable();
+	for (n = 0; READ_ONCE(ve->request) && n < ve->count; n++) {
+		struct intel_engine_cs *sibling = ve->siblings[n];
+		struct ve_node * const node = &ve->nodes[sibling->id];
+		struct rb_node **parent, *rb;
+		bool first;
+
+		spin_lock(&sibling->timeline.lock);
+
+		if (!RB_EMPTY_NODE(&node->rb)) {
+			/*
+			 * Cheat and avoid rebalancing the tree if we can
+			 * reuse this node in situ.
+			 */
+			first = rb_first_cached(&sibling->execlists.virtual) ==
+				&node->rb;
+			if (prio == node->prio || (prio > node->prio && first))
+				goto submit_engine;
+
+			rb_erase_cached(&node->rb, &sibling->execlists.virtual);
+		}
+
+		rb = NULL;
+		first = true;
+		parent = &sibling->execlists.virtual.rb_root.rb_node;
+		while (*parent) {
+			struct ve_node *other;
+
+			rb = *parent;
+			other = rb_entry(rb, typeof(*other), rb);
+			if (prio > other->prio) {
+				parent = &rb->rb_left;
+			} else {
+				parent = &rb->rb_right;
+				first = false;
+			}
+		}
+
+		rb_link_node(&node->rb, rb, parent);
+		rb_insert_color_cached(&node->rb,
+				       &sibling->execlists.virtual,
+				       first);
+
+submit_engine:
+		GEM_BUG_ON(RB_EMPTY_NODE(&node->rb));
+		node->prio = prio;
+		if (first && prio > sibling->execlists.queue_priority_hint) {
+			sibling->execlists.queue_priority_hint = prio;
+			tasklet_hi_schedule(&sibling->execlists.tasklet);
+		}
+
+		spin_unlock(&sibling->timeline.lock);
+	}
+	local_irq_enable();
+}
+
+static void virtual_submit_request(struct i915_request *request)
+{
+	struct virtual_engine *ve = to_virtual_engine(request->engine);
+
+	GEM_BUG_ON(ve->base.submit_request != virtual_submit_request);
+
+	GEM_BUG_ON(ve->request);
+	ve->base.execlists.queue_priority_hint = rq_prio(request);
+	WRITE_ONCE(ve->request, request);
+
+	tasklet_schedule(&ve->base.execlists.tasklet);
+}
+
+struct intel_engine_cs *
+intel_execlists_create_virtual(struct i915_gem_context *ctx,
+			       struct intel_engine_cs **siblings,
+			       unsigned int count)
+{
+	struct virtual_engine *ve;
+	unsigned int n;
+	int err;
+
+	if (!count)
+		return ERR_PTR(-EINVAL);
+
+	ve = kzalloc(struct_size(ve, siblings, count), GFP_KERNEL);
+	if (!ve)
+		return ERR_PTR(-ENOMEM);
+
+	ve->base.i915 = ctx->i915;
+	ve->base.id = -1;
+	ve->base.class = OTHER_CLASS;
+	ve->base.uabi_class = I915_ENGINE_CLASS_INVALID;
+	ve->base.instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
+	ve->base.flags = I915_ENGINE_IS_VIRTUAL;
+
+	snprintf(ve->base.name, sizeof(ve->base.name), "virtual");
+
+	err = i915_timeline_init(ctx->i915, &ve->base.timeline, NULL);
+	if (err)
+		goto err_put;
+	i915_timeline_set_subclass(&ve->base.timeline, TIMELINE_VIRTUAL);
+
+	ve->base.cops = &virtual_context_ops;
+	ve->base.request_alloc = execlists_request_alloc;
+
+	ve->base.schedule = i915_schedule;
+	ve->base.submit_request = virtual_submit_request;
+
+	ve->base.execlists.queue_priority_hint = INT_MIN;
+	tasklet_init(&ve->base.execlists.tasklet,
+		     virtual_submission_tasklet,
+		     (unsigned long)ve);
+
+	intel_context_init(&ve->context, ctx, &ve->base);
+
+	for (n = 0; n < count; n++) {
+		struct intel_engine_cs *sibling = siblings[n];
+
+		GEM_BUG_ON(!is_power_of_2(sibling->mask));
+		if (sibling->mask & ve->base.mask)
+			continue;
+
+		/*
+		 * The virtual engine implementation is tightly coupled to
+	 * the execlists backend -- we push requests directly
+	 * into a tree inside each physical engine. We could support
+	 * layering if we handled cloning of the requests and
+	 * submitting a copy into each backend.
+		 */
+		if (sibling->execlists.tasklet.func !=
+		    execlists_submission_tasklet) {
+			err = -ENODEV;
+			goto err_put;
+		}
+
+		GEM_BUG_ON(RB_EMPTY_NODE(&ve->nodes[sibling->id].rb));
+		RB_CLEAR_NODE(&ve->nodes[sibling->id].rb);
+
+		ve->siblings[ve->count++] = sibling;
+		ve->base.mask |= sibling->mask;
+
+		/*
+		 * All physical engines must be compatible for their emission
+		 * functions (as we build the instructions during request
+		 * construction and do not alter them before submission
+		 * on the physical engine). We use the engine class as a guide
+		 * here, although that could be refined.
+		 */
+		if (ve->base.class != OTHER_CLASS) {
+			if (ve->base.class != sibling->class) {
+				err = -EINVAL;
+				goto err_put;
+			}
+			continue;
+		}
+
+		ve->base.class = sibling->class;
+		snprintf(ve->base.name, sizeof(ve->base.name),
+			 "v%dx%d", ve->base.class, count);
+		ve->base.context_size = sibling->context_size;
+
+		ve->base.emit_bb_start = sibling->emit_bb_start;
+		ve->base.emit_flush = sibling->emit_flush;
+		ve->base.emit_init_breadcrumb = sibling->emit_init_breadcrumb;
+		ve->base.emit_fini_breadcrumb = sibling->emit_fini_breadcrumb;
+		ve->base.emit_fini_breadcrumb_dw =
+			sibling->emit_fini_breadcrumb_dw;
+	}
+
+	/* gracefully replace a degenerate virtual engine */
+	if (ve->count == 1) {
+		struct intel_engine_cs *actual = ve->siblings[0];
+		intel_context_put(&ve->context);
+		return actual;
+	}
+
+	__intel_context_insert(ctx, &ve->base, &ve->context);
+	return &ve->base;
+
+err_put:
+	intel_context_put(&ve->context);
+	return ERR_PTR(err);
+}
+
+struct intel_engine_cs *
+intel_execlists_clone_virtual(struct i915_gem_context *ctx,
+			      struct intel_engine_cs *src)
+{
+	struct virtual_engine *se = to_virtual_engine(src);
+	struct intel_engine_cs *dst;
+
+	dst = intel_execlists_create_virtual(ctx,
+					     se->siblings,
+					     se->count);
+	if (IS_ERR(dst))
+		return dst;
+
+	return dst;
+}
+
+void intel_virtual_engine_destroy(struct intel_engine_cs *engine)
+{
+	struct virtual_engine *ve = to_virtual_engine(engine);
+
+	if (!engine || !intel_engine_is_virtual(engine))
+		return;
+
+	__intel_context_remove(&ve->context);
+	intel_context_put(&ve->context);
+}
+
 void intel_execlists_show_requests(struct intel_engine_cs *engine,
 				   struct drm_printer *m,
 				   void (*show_request)(struct drm_printer *m,
@@ -2954,6 +3480,29 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine,
 		show_request(m, last, "\t\tQ ");
 	}
 
+	last = NULL;
+	count = 0;
+	for (rb = rb_first_cached(&execlists->virtual); rb; rb = rb_next(rb)) {
+		struct virtual_engine *ve =
+			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
+		struct i915_request *rq = READ_ONCE(ve->request);
+
+		if (rq) {
+			if (count++ < max - 1)
+				show_request(m, rq, "\t\tV ");
+			else
+				last = rq;
+		}
+	}
+	if (last) {
+		if (count > max) {
+			drm_printf(m,
+				   "\t\t...skipping %d virtual requests...\n",
+				   count - max);
+		}
+		show_request(m, last, "\t\tV ");
+	}
+
 	spin_unlock_irqrestore(&engine->timeline.lock, flags);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index f1aec8a6986f..9d90dc68e02b 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -112,6 +112,17 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine,
 							const char *prefix),
 				   unsigned int max);
 
+struct intel_engine_cs *
+intel_execlists_create_virtual(struct i915_gem_context *ctx,
+			       struct intel_engine_cs **siblings,
+			       unsigned int count);
+
+struct intel_engine_cs *
+intel_execlists_clone_virtual(struct i915_gem_context *ctx,
+			      struct intel_engine_cs *src);
+
+void intel_virtual_engine_destroy(struct intel_engine_cs *engine);
+
 u32 gen8_make_rpcs(struct drm_i915_private *i915, struct intel_sseu *ctx_sseu);
 
 #endif /* _INTEL_LRC_H_ */
diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
index 9e871eb0bfb1..6df033960350 100644
--- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
+++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
@@ -10,6 +10,7 @@
 
 #include "../i915_selftest.h"
 #include "igt_flush_test.h"
+#include "igt_live_test.h"
 #include "igt_spinner.h"
 #include "i915_random.h"
 
@@ -1057,6 +1058,169 @@ static int live_preempt_smoke(void *arg)
 	return err;
 }
 
+static int nop_virtual_engine(struct drm_i915_private *i915,
+			      struct intel_engine_cs **siblings,
+			      unsigned int nsibling,
+			      unsigned int nctx,
+			      unsigned int flags)
+#define CHAIN BIT(0)
+{
+	IGT_TIMEOUT(end_time);
+	struct i915_request *request[16];
+	struct i915_gem_context *ctx[16];
+	struct intel_engine_cs *ve[16];
+	unsigned long n, prime, nc;
+	struct igt_live_test t;
+	ktime_t times[2] = {};
+	int err;
+
+	GEM_BUG_ON(!nctx || nctx > ARRAY_SIZE(ctx));
+
+	for (n = 0; n < nctx; n++) {
+		ctx[n] = kernel_context(i915);
+		if (!ctx[n])
+			return -ENOMEM;
+
+		ve[n] = intel_execlists_create_virtual(ctx[n],
+						       siblings, nsibling);
+		if (IS_ERR(ve[n]))
+			return PTR_ERR(ve[n]);
+	}
+
+	err = igt_live_test_begin(&t, i915, __func__, ve[0]->name);
+	if (err)
+		goto out;
+
+	for_each_prime_number_from(prime, 1, 8192) {
+		times[1] = ktime_get_raw();
+
+		if (flags & CHAIN) {
+			for (nc = 0; nc < nctx; nc++) {
+				for (n = 0; n < prime; n++) {
+					request[nc] =
+						i915_request_alloc(ve[nc], ctx[nc]);
+					if (IS_ERR(request[nc])) {
+						err = PTR_ERR(request[nc]);
+						goto out;
+					}
+
+					i915_request_add(request[nc]);
+				}
+			}
+		} else {
+			for (n = 0; n < prime; n++) {
+				for (nc = 0; nc < nctx; nc++) {
+					request[nc] =
+						i915_request_alloc(ve[nc], ctx[nc]);
+					if (IS_ERR(request[nc])) {
+						err = PTR_ERR(request[nc]);
+						goto out;
+					}
+
+					i915_request_add(request[nc]);
+				}
+			}
+		}
+
+		for (nc = 0; nc < nctx; nc++) {
+			if (i915_request_wait(request[nc],
+					      I915_WAIT_LOCKED,
+					      HZ / 10) < 0) {
+				pr_err("%s(%s): wait for %llx:%lld timed out\n",
+				       __func__, ve[0]->name,
+				       request[nc]->fence.context,
+				       request[nc]->fence.seqno);
+
+				GEM_TRACE("%s(%s) failed at request %llx:%lld\n",
+					  __func__, ve[0]->name,
+					  request[nc]->fence.context,
+					  request[nc]->fence.seqno);
+				GEM_TRACE_DUMP();
+				i915_gem_set_wedged(i915);
+				break;
+			}
+		}
+
+		times[1] = ktime_sub(ktime_get_raw(), times[1]);
+		if (prime == 1)
+			times[0] = times[1];
+
+		if (__igt_timeout(end_time, NULL))
+			break;
+	}
+
+	err = igt_live_test_end(&t);
+	if (err)
+		goto out;
+
+	pr_info("Requestx%d latencies on %s: 1 = %lluns, %lu = %lluns\n",
+		nctx, ve[0]->name, ktime_to_ns(times[0]),
+		prime, div64_u64(ktime_to_ns(times[1]), prime));
+
+out:
+	if (igt_flush_test(i915, I915_WAIT_LOCKED))
+		err = -EIO;
+
+	for (nc = 0; nc < nctx; nc++) {
+		intel_virtual_engine_destroy(ve[nc]);
+		kernel_context_close(ctx[nc]);
+	}
+	return err;
+}
+
+static int live_virtual_engine(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct intel_engine_cs *siblings[MAX_ENGINE_INSTANCE + 1];
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	unsigned int class, inst;
+	int err = -ENODEV;
+
+	if (USES_GUC_SUBMISSION(i915))
+		return 0;
+
+	mutex_lock(&i915->drm.struct_mutex);
+
+	for_each_engine(engine, i915, id) {
+		err = nop_virtual_engine(i915, &engine, 1, 1, 0);
+		if (err) {
+			pr_err("Failed to wrap engine %s: err=%d\n",
+			       engine->name, err);
+			goto out_unlock;
+		}
+	}
+
+	for (class = 0; class <= MAX_ENGINE_CLASS; class++) {
+		int nsibling, n;
+
+		nsibling = 0;
+		for (inst = 0; inst <= MAX_ENGINE_INSTANCE; inst++) {
+			if (!i915->engine_class[class][inst])
+				break;
+
+			siblings[nsibling++] = i915->engine_class[class][inst];
+		}
+		if (nsibling < 2)
+			continue;
+
+		for (n = 1; n <= nsibling + 1; n++) {
+			err = nop_virtual_engine(i915, siblings, nsibling,
+						 n, 0);
+			if (err)
+				goto out_unlock;
+		}
+
+		err = nop_virtual_engine(i915, siblings, nsibling, n, CHAIN);
+		if (err)
+			goto out_unlock;
+	}
+
+out_unlock:
+	mutex_unlock(&i915->drm.struct_mutex);
+	return err;
+}
+
 int intel_execlists_live_selftests(struct drm_i915_private *i915)
 {
 	static const struct i915_subtest tests[] = {
@@ -1068,6 +1232,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_chain_preempt),
 		SUBTEST(live_preempt_hang),
 		SUBTEST(live_preempt_smoke),
+		SUBTEST(live_virtual_engine),
 	};
 
 	if (!HAS_EXECLISTS(i915))
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 8ef6d60929c6..9c94c037d13b 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -127,6 +127,7 @@ enum drm_i915_gem_engine_class {
 };
 
 #define I915_ENGINE_CLASS_INVALID_NONE -1
+#define I915_ENGINE_CLASS_INVALID_VIRTUAL 0
 
 /**
  * DOC: perf_events exposed by i915 through /sys/bus/event_sources/drivers/i915
@@ -1598,8 +1599,37 @@ struct drm_i915_gem_context_param_sseu {
 	__u32 rsvd;
 };
 
+/*
+ * i915_context_engines_load_balance:
+ *
+ * Enable load balancing across this set of engines.
+ *
+ * Into the I915_EXEC_DEFAULT slot [0], a virtual engine is created that,
+ * when used, will proxy the execbuffer request onto one of the set of engines
+ * in such a way as to distribute the load evenly across the set.
+ *
+ * The set of engines must be compatible (e.g. the same HW class) as they
+ * will share the same logical GPU context and ring.
+ *
+ * To intermix rendering with the virtual engine and direct rendering onto
+ * the backing engines (bypassing the load balancing proxy), the context must
+ * be defined to use a single timeline for all engines.
+ */
+struct i915_context_engines_load_balance {
+	struct i915_user_extension base;
+
+	__u16 engine_index;
+	__u16 mbz16; /* reserved for future use; must be zero */
+	__u32 flags; /* all undefined flags must be zero */
+
+	__u64 engines_mask; /* selection mask of engines[] */
+
+	__u64 mbz64[4]; /* reserved for future use; must be zero */
+};
+
 struct i915_context_param_engines {
 	__u64 extensions; /* linked chain of extension blocks, 0 terminates */
+#define I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE 0
 
 	struct {
 		__u16 engine_class; /* see enum drm_i915_gem_engine_class */
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 16/18] drm/i915: Extend execution fence to support a callback
  2019-03-19 11:57 [PATCH 01/18] drm/i915/selftests: Provide stub reset functions Chris Wilson
                   ` (13 preceding siblings ...)
  2019-03-19 11:57 ` [PATCH 15/18] drm/i915: Load balancing across a virtual engine Chris Wilson
@ 2019-03-19 11:57 ` Chris Wilson
  2019-03-19 11:57 ` [PATCH 17/18] drm/i915/execlists: Virtual engine bonding Chris Wilson
                   ` (8 subsequent siblings)
  23 siblings, 0 replies; 47+ messages in thread
From: Chris Wilson @ 2019-03-19 11:57 UTC (permalink / raw)
  To: intel-gfx

In the next patch, we will want to configure the slave request
depending on which physical engine the master request is executed on.
For this, we introduce a callback from the execute fence to convey this
information.
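
As a rough illustration (the helper below is hypothetical and not part of
this patch; only i915_request_await_execution() and the hook signature come
from it), a caller could use the hook to observe which physical engine the
signaling request (master below, some earlier i915_request) was submitted to:

	static void note_signaler_engine(struct i915_request *rq,
					 struct dma_fence *signal)
	{
		/* Invoked once the signaler is submitted to HW, so
		 * to_request(signal)->engine names the physical engine.
		 */
		pr_debug("%s: signaler running on %s\n",
			 rq->engine->name,
			 to_request(signal)->engine->name);
	}

	err = i915_request_await_execution(rq, &master->fence,
					   note_signaler_engine);
	if (err)
		return err;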

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_request.c | 84 +++++++++++++++++++++++++++--
 drivers/gpu/drm/i915/i915_request.h |  4 ++
 2 files changed, 83 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index e9c2094ab8ea..3faf06d2a9b0 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -38,6 +38,8 @@ struct execute_cb {
 	struct list_head link;
 	struct irq_work work;
 	struct i915_sw_fence *fence;
+	void (*hook)(struct i915_request *rq, struct dma_fence *signal);
+	struct i915_request *signal;
 };
 
 static struct i915_global_request {
@@ -342,6 +344,17 @@ static void irq_execute_cb(struct irq_work *wrk)
 	kmem_cache_free(global.slab_execute_cbs, cb);
 }
 
+static void irq_execute_cb_hook(struct irq_work *wrk)
+{
+	struct execute_cb *cb = container_of(wrk, typeof(*cb), work);
+
+	cb->hook(container_of(cb->fence, struct i915_request, submit),
+		 &cb->signal->fence);
+	i915_request_put(cb->signal);
+
+	irq_execute_cb(wrk);
+}
+
 static void __notify_execute_cb(struct i915_request *rq)
 {
 	struct execute_cb *cb;
@@ -368,14 +381,19 @@ static void __notify_execute_cb(struct i915_request *rq)
 }
 
 static int
-i915_request_await_execution(struct i915_request *rq,
-			     struct i915_request *signal,
-			     gfp_t gfp)
+__i915_request_await_execution(struct i915_request *rq,
+			       struct i915_request *signal,
+			       void (*hook)(struct i915_request *rq,
+					    struct dma_fence *signal),
+			       gfp_t gfp)
 {
 	struct execute_cb *cb;
 
-	if (i915_request_is_active(signal))
+	if (i915_request_is_active(signal)) {
+		if (hook)
+			hook(rq, &signal->fence);
 		return 0;
+	}
 
 	cb = kmem_cache_alloc(global.slab_execute_cbs, gfp);
 	if (!cb)
@@ -385,8 +403,18 @@ i915_request_await_execution(struct i915_request *rq,
 	i915_sw_fence_await(cb->fence);
 	init_irq_work(&cb->work, irq_execute_cb);
 
+	if (hook) {
+		cb->hook = hook;
+		cb->signal = i915_request_get(signal);
+		cb->work.func = irq_execute_cb_hook;
+	}
+
 	spin_lock_irq(&signal->lock);
 	if (i915_request_is_active(signal)) {
+		if (hook) {
+			hook(rq, &signal->fence);
+			i915_request_put(signal);
+		}
 		i915_sw_fence_complete(cb->fence);
 		kmem_cache_free(global.slab_execute_cbs, cb);
 	} else {
@@ -789,7 +817,7 @@ emit_semaphore_wait(struct i915_request *to,
 		return err;
 
 	/* Only submit our spinner after the signaler is running! */
-	err = i915_request_await_execution(to, from, gfp);
+	err = __i915_request_await_execution(to, from, NULL, gfp);
 	if (err)
 		return err;
 
@@ -909,6 +937,52 @@ i915_request_await_dma_fence(struct i915_request *rq, struct dma_fence *fence)
 	return 0;
 }
 
+int
+i915_request_await_execution(struct i915_request *rq,
+			     struct dma_fence *fence,
+			     void (*hook)(struct i915_request *rq,
+					  struct dma_fence *signal))
+{
+	struct dma_fence **child = &fence;
+	unsigned int nchild = 1;
+	int ret;
+
+	if (dma_fence_is_array(fence)) {
+		struct dma_fence_array *array = to_dma_fence_array(fence);
+
+		/* XXX Error for signal-on-any fence arrays */
+
+		child = array->fences;
+		nchild = array->num_fences;
+		GEM_BUG_ON(!nchild);
+	}
+
+	do {
+		fence = *child++;
+		if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
+			continue;
+
+		/*
+		 * We don't squash repeated fence dependencies here as we
+		 * want to run our callback in all cases.
+		 */
+
+		if (dma_fence_is_i915(fence))
+			ret = __i915_request_await_execution(rq,
+							     to_request(fence),
+							     hook,
+							     I915_FENCE_GFP);
+		else
+			ret = i915_sw_fence_await_dma_fence(&rq->submit, fence,
+							    I915_FENCE_TIMEOUT,
+							    GFP_KERNEL);
+		if (ret < 0)
+			return ret;
+	} while (--nchild);
+
+	return 0;
+}
+
 /**
  * i915_request_await_object - set this request to (async) wait upon a bo
  * @to: request we are wishing to use
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index cd6c130964cd..d4f6b2940130 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -265,6 +265,10 @@ int i915_request_await_object(struct i915_request *to,
 			      bool write);
 int i915_request_await_dma_fence(struct i915_request *rq,
 				 struct dma_fence *fence);
+int i915_request_await_execution(struct i915_request *rq,
+				 struct dma_fence *fence,
+				 void (*hook)(struct i915_request *rq,
+					      struct dma_fence *signal));
 
 void i915_request_add(struct i915_request *rq);
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 17/18] drm/i915/execlists: Virtual engine bonding
  2019-03-19 11:57 [PATCH 01/18] drm/i915/selftests: Provide stub reset functions Chris Wilson
                   ` (14 preceding siblings ...)
  2019-03-19 11:57 ` [PATCH 16/18] drm/i915: Extend execution fence to support a callback Chris Wilson
@ 2019-03-19 11:57 ` Chris Wilson
  2019-03-19 11:57 ` [PATCH 18/18] drm/i915: Allow specification of parallel execbuf Chris Wilson
                   ` (7 subsequent siblings)
  23 siblings, 0 replies; 47+ messages in thread
From: Chris Wilson @ 2019-03-19 11:57 UTC (permalink / raw)
  To: intel-gfx

Some users require that when a master batch is executed on one particular
engine, a companion batch is run simultaneously on a specific slave
engine. For this purpose, we introduce virtual engine bonding, allowing
maps of master:slaves to be constructed to constrain which physical
engines a virtual engine may select given a fence on a master engine.

For the moment, we continue to ignore the issue of preemption deferring
the master request for later. Ideally, we would like to then also remove
the slave and run something else rather than have it stall the pipeline.
With load balancing, we should be able to move workload around it, but
there is a similar stall on the master pipeline while it may wait for
the slave to be executed. At the cost of more latency for the bonded
request, it may be interesting to launch both on their engines in
lockstep. (Bubbles abound.)

Opens: Also what about bonding an engine as its own master? It doesn't
break anything internally, so allow the silliness.
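
As a hedged userspace sketch (the chaining boilerplate and the choice of
master engine are illustrative; only the i915_context_engines_bond layout
and I915_CONTEXT_ENGINES_EXT_BOND come from this patch), restricting the
virtual engine in slot 0 to pair siblings[0] with a video engine master
might look like:

	struct i915_context_engines_bond bond = {
		.base = { .name = I915_CONTEXT_ENGINES_EXT_BOND },
		.virtual_index = 0,		/* virtual engine in ctx->engines[] */
		.master_class = I915_ENGINE_CLASS_VIDEO,
		.master_instance = 0,
		.sibling_mask = 1ull << 0,	/* only siblings[0] may be bonded */
	};

	/* Chained off i915_context_param_engines.extensions and applied
	 * through the I915_CONTEXT_PARAM_ENGINES setparam.
	 */
	engines.extensions = (uintptr_t)&bond;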

v2: Emancipate the bonds
v3: Couple in delayed scheduling for the selftests
v4: Handle invalid mutually exclusive bonding
v5: Mention what the uapi does
v6: s/nbond/num_bonds/

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c       |  50 +++++
 drivers/gpu/drm/i915/i915_request.c           |   1 +
 drivers/gpu/drm/i915/i915_request.h           |   3 +
 drivers/gpu/drm/i915/intel_engine_types.h     |   7 +
 drivers/gpu/drm/i915/intel_lrc.c              | 152 ++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.h              |   4 +
 drivers/gpu/drm/i915/selftests/intel_lrc.c    | 185 ++++++++++++++++++
 drivers/gpu/drm/i915/selftests/lib_sw_fence.c |   3 +
 include/uapi/drm/i915_drm.h                   |  33 ++++
 9 files changed, 438 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 8d8fcc8c7a86..20deee296d04 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -1519,8 +1519,58 @@ set_engines__load_balance(struct i915_user_extension __user *base, void *data)
 	return 0;
 }
 
+static int
+set_engines__bond(struct i915_user_extension __user *base, void *data)
+{
+	struct i915_context_engines_bond __user *ext =
+		container_of_user(base, typeof(*ext), base);
+	const struct set_engines *set = data;
+	unsigned int idx, class, instance;
+	struct intel_engine_cs *master;
+	u64 siblings;
+	int err;
+
+	if (get_user(idx, &ext->virtual_index))
+		return -EFAULT;
+
+	if (idx >= set->num_engines)
+		return -EINVAL;
+
+	idx = array_index_nospec(idx, set->num_engines);
+	if (!set->engines[idx])
+		return -EINVAL;
+
+	/*
+	 * A non-virtual engine has 0 siblings to choose between, and the
+	 * submit fence will always be directed to the one engine.
+	 */
+	if (!intel_engine_is_virtual(set->engines[idx]))
+		return 0;
+
+	err = check_user_mbz(&ext->mbz);
+	if (err)
+		return err;
+
+	if (get_user(class, &ext->master_class))
+		return -EFAULT;
+
+	if (get_user(instance, &ext->master_instance))
+		return -EFAULT;
+
+	master = intel_engine_lookup_user(set->ctx->i915, class, instance);
+	if (!master)
+		return -EINVAL;
+
+	if (get_user(siblings, &ext->sibling_mask))
+		return -EFAULT;
+
+	return intel_virtual_engine_attach_bond(set->engines[idx],
+						master, siblings);
+}
+
 static const i915_user_extension_fn set_engines__extensions[] = {
 	[I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE] = set_engines__load_balance,
+	[I915_CONTEXT_ENGINES_EXT_BOND] = set_engines__bond,
 };
 
 static int
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 3faf06d2a9b0..06e5c2a50080 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -742,6 +742,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	rq->batch = NULL;
 	rq->capture_list = NULL;
 	rq->waitboost = false;
+	rq->execution_mask = ALL_ENGINES;
 
 	/*
 	 * Reserve space in the ring buffer for all the commands required to
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index d4f6b2940130..5bdab6881b13 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -32,6 +32,8 @@
 #include "i915_selftest.h"
 #include "i915_sw_fence.h"
 
+#include "intel_engine_types.h"
+
 #include <uapi/drm/i915_drm.h>
 
 struct drm_file;
@@ -145,6 +147,7 @@ struct i915_request {
 	 */
 	struct i915_sched_node sched;
 	struct i915_dependency dep;
+	intel_engine_mask_t execution_mask;
 
 	/*
 	 * A convenience pointer to the current breadcrumb value stored in
diff --git a/drivers/gpu/drm/i915/intel_engine_types.h b/drivers/gpu/drm/i915/intel_engine_types.h
index 4d526767c371..4c0f8ea1c546 100644
--- a/drivers/gpu/drm/i915/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/intel_engine_types.h
@@ -390,6 +390,13 @@ struct intel_engine_cs {
 	 */
 	void		(*submit_request)(struct i915_request *rq);
 
+	/*
+	 * Called on signaling of a SUBMIT_FENCE, passing along the signaling
+	 * request down to the bonded pairs.
+	 */
+	void            (*bond_execute)(struct i915_request *rq,
+					struct dma_fence *signal);
+
 	/*
 	 * Call when the priority on a request has changed and it and its
 	 * dependencies may need rescheduling. Note the request itself may
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index a7f0235f19c5..dd94002752c5 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -191,6 +191,18 @@ struct virtual_engine {
 		int prio;
 	} nodes[I915_NUM_ENGINES];
 
+	/*
+	 * Keep track of bonded pairs -- restrictions upon our selection
+	 * of physical engines any particular request may be submitted to.
+	 * If we receive a submit-fence from a master engine, we will only
+	 * use one of the sibling_mask physical engines.
+	 */
+	struct ve_bond {
+		struct intel_engine_cs *master;
+		intel_engine_mask_t sibling_mask;
+	} *bonds;
+	unsigned int num_bonds;
+
 	/* And finally, which physical engines this virtual engine maps onto. */
 	unsigned int count;
 	struct intel_engine_cs *siblings[0];
@@ -818,6 +830,12 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 			continue;
 		}
 
+		if (!(rq->execution_mask & engine->mask)) {
+			/* We peeked too soon! */
+			rb = rb_next(rb);
+			continue;
+		}
+
 		/*
 		 * We track when the HW has completed saving the context image
 		 * (i.e. when we have seen the final CS event switching out of
@@ -912,6 +930,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 			rb = rb_first_cached(&execlists->virtual);
 			continue;
 		}
+		GEM_BUG_ON(!(rq->execution_mask & engine->mask));
 
 		if (rq_prio(rq) >= queue_prio(execlists)) {
 			if (last && !can_merge_rq(last, rq)) {
@@ -3154,6 +3173,8 @@ static void virtual_context_destroy(struct kref *kref)
 	if (ve->context.state)
 		__execlists_context_fini(&ve->context);
 
+	kfree(ve->bonds);
+
 	i915_timeline_fini(&ve->base.timeline);
 	kfree(ve);
 }
@@ -3205,9 +3226,30 @@ static const struct intel_context_ops virtual_context_ops = {
 	.destroy = virtual_context_destroy,
 };
 
+static unsigned long virtual_submission_mask(struct virtual_engine *ve)
+{
+	struct i915_request *rq;
+	unsigned long mask;
+
+	rq = READ_ONCE(ve->request);
+	if (!rq)
+		return 0;
+
+	/* The rq is ready for submission; rq->execution_mask is now stable. */
+	mask = rq->execution_mask;
+	if (unlikely(!mask)) {
+		/* Invalid selection, submit to a random engine in error */
+		i915_request_skip(rq, -ENODEV);
+		mask = ve->siblings[0]->mask;
+	}
+
+	return mask;
+}
+
 static void virtual_submission_tasklet(unsigned long data)
 {
 	struct virtual_engine * const ve = (struct virtual_engine *)data;
+	unsigned long mask;
 	unsigned int n;
 	int prio;
 
@@ -3215,6 +3257,12 @@ static void virtual_submission_tasklet(unsigned long data)
 	if (prio == INT_MIN)
 		return;
 
+	rcu_read_lock();
+	mask = virtual_submission_mask(ve);
+	rcu_read_unlock();
+	if (unlikely(!mask))
+		return;
+
 	local_irq_disable();
 	for (n = 0; READ_ONCE(ve->request) && n < ve->count; n++) {
 		struct intel_engine_cs *sibling = ve->siblings[n];
@@ -3222,6 +3270,17 @@ static void virtual_submission_tasklet(unsigned long data)
 		struct rb_node **parent, *rb;
 		bool first;
 
+		if (unlikely(!(mask & sibling->mask))) {
+			if (!RB_EMPTY_NODE(&node->rb)) {
+				spin_lock(&sibling->timeline.lock);
+				rb_erase_cached(&node->rb,
+						&sibling->execlists.virtual);
+				RB_CLEAR_NODE(&node->rb);
+				spin_unlock(&sibling->timeline.lock);
+			}
+			continue;
+		}
+
 		spin_lock(&sibling->timeline.lock);
 
 		if (!RB_EMPTY_NODE(&node->rb)) {
@@ -3284,6 +3343,37 @@ static void virtual_submit_request(struct i915_request *request)
 	tasklet_schedule(&ve->base.execlists.tasklet);
 }
 
+static struct ve_bond *
+virtual_find_bond(struct virtual_engine *ve, struct intel_engine_cs *master)
+{
+	int i;
+
+	for (i = 0; i < ve->num_bonds; i++) {
+		if (ve->bonds[i].master == master)
+			return &ve->bonds[i];
+	}
+
+	return NULL;
+}
+
+static void
+virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal)
+{
+	struct virtual_engine *ve = to_virtual_engine(rq->engine);
+	struct ve_bond *bond;
+
+	bond = virtual_find_bond(ve, to_request(signal)->engine);
+	if (bond) {
+		intel_engine_mask_t old, new, cmp;
+
+		cmp = READ_ONCE(rq->execution_mask);
+		do {
+			old = cmp;
+			new = cmp & bond->sibling_mask;
+		} while ((cmp = cmpxchg(&rq->execution_mask, old, new)) != old);
+	}
+}
+
 struct intel_engine_cs *
 intel_execlists_create_virtual(struct i915_gem_context *ctx,
 			       struct intel_engine_cs **siblings,
@@ -3319,6 +3409,7 @@ intel_execlists_create_virtual(struct i915_gem_context *ctx,
 
 	ve->base.schedule = i915_schedule;
 	ve->base.submit_request = virtual_submit_request;
+	ve->base.bond_execute = virtual_bond_execute;
 
 	ve->base.execlists.queue_priority_hint = INT_MIN;
 	tasklet_init(&ve->base.execlists.tasklet,
@@ -3409,9 +3500,70 @@ intel_execlists_clone_virtual(struct i915_gem_context *ctx,
 	if (IS_ERR(dst))
 		return dst;
 
+	if (se->num_bonds) {
+		struct virtual_engine *de = to_virtual_engine(dst);
+
+		de->bonds = kmemdup(se->bonds,
+				    sizeof(*se->bonds) * se->num_bonds,
+				    GFP_KERNEL);
+		if (!de->bonds) {
+			intel_virtual_engine_destroy(dst);
+			return ERR_PTR(-ENOMEM);
+		}
+
+		de->num_bonds = se->num_bonds;
+	}
+
 	return dst;
 }
 
+static unsigned long
+virtual_sibling_mask(struct virtual_engine *ve, unsigned long mask)
+{
+	unsigned long emask = 0;
+	int bit;
+
+	for_each_set_bit(bit, &mask, ve->count)
+		emask |= ve->siblings[bit]->mask;
+
+	return emask;
+}
+
+int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine,
+				     struct intel_engine_cs *master,
+				     unsigned long mask)
+{
+	struct virtual_engine *ve = to_virtual_engine(engine);
+	struct ve_bond *bond;
+
+	if (mask >> ve->count)
+		return -EINVAL;
+
+	mask = virtual_sibling_mask(ve, mask);
+	if (!mask)
+		return -EINVAL;
+
+	bond = virtual_find_bond(ve, master);
+	if (bond) {
+		bond->sibling_mask |= mask;
+		return 0;
+	}
+
+	bond = krealloc(ve->bonds,
+			sizeof(*bond) * (ve->num_bonds + 1),
+			GFP_KERNEL);
+	if (!bond)
+		return -ENOMEM;
+
+	bond[ve->num_bonds].master = master;
+	bond[ve->num_bonds].sibling_mask = mask;
+
+	ve->bonds = bond;
+	ve->num_bonds++;
+
+	return 0;
+}
+
 void intel_virtual_engine_destroy(struct intel_engine_cs *engine)
 {
 	struct virtual_engine *ve = to_virtual_engine(engine);
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 9d90dc68e02b..77b85648045a 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -121,6 +121,10 @@ struct intel_engine_cs *
 intel_execlists_clone_virtual(struct i915_gem_context *ctx,
 			      struct intel_engine_cs *src);
 
+int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine,
+				     struct intel_engine_cs *master,
+				     unsigned long mask);
+
 void intel_virtual_engine_destroy(struct intel_engine_cs *engine);
 
 u32 gen8_make_rpcs(struct drm_i915_private *i915, struct intel_sseu *ctx_sseu);
diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
index 6df033960350..bc8e13f80fb5 100644
--- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
+++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
@@ -13,6 +13,7 @@
 #include "igt_live_test.h"
 #include "igt_spinner.h"
 #include "i915_random.h"
+#include "lib_sw_fence.h"
 
 #include "mock_context.h"
 
@@ -1221,6 +1222,189 @@ static int live_virtual_engine(void *arg)
 	return err;
 }
 
+static int bond_virtual_engine(struct drm_i915_private *i915,
+			       unsigned int class,
+			       struct intel_engine_cs **siblings,
+			       unsigned int nsibling,
+			       unsigned int flags)
+#define BOND_SCHEDULE BIT(0)
+{
+	struct intel_engine_cs *master;
+	struct i915_gem_context *ctx;
+	struct i915_request *rq[16];
+	enum intel_engine_id id;
+	unsigned long n;
+	int err;
+
+	GEM_BUG_ON(nsibling >= ARRAY_SIZE(rq) - 1);
+
+	ctx = kernel_context(i915);
+	if (!ctx)
+		return -ENOMEM;
+
+	err = 0;
+	rq[0] = ERR_PTR(-ENOMEM);
+	for_each_engine(master, i915, id) {
+		struct i915_sw_fence fence = {};
+
+		if (master->class == class)
+			continue;
+
+		memset_p((void *)rq, ERR_PTR(-EINVAL), ARRAY_SIZE(rq));
+
+		rq[0] = i915_request_alloc(master, ctx);
+		if (IS_ERR(rq[0])) {
+			err = PTR_ERR(rq[0]);
+			goto out;
+		}
+		i915_request_get(rq[0]);
+
+		if (flags & BOND_SCHEDULE) {
+			onstack_fence_init(&fence);
+			err = i915_sw_fence_await_sw_fence_gfp(&rq[0]->submit,
+							       &fence,
+							       GFP_KERNEL);
+		}
+		i915_request_add(rq[0]);
+		if (err < 0)
+			goto out;
+
+		for (n = 0; n < nsibling; n++) {
+			struct intel_engine_cs *engine;
+
+			engine = intel_execlists_create_virtual(ctx,
+								siblings,
+								nsibling);
+			if (IS_ERR(engine)) {
+				err = PTR_ERR(engine);
+				onstack_fence_fini(&fence);
+				goto out;
+			}
+
+			err = intel_virtual_engine_attach_bond(engine,
+							       master,
+							       BIT(n));
+			if (err) {
+				intel_virtual_engine_destroy(engine);
+				onstack_fence_fini(&fence);
+				goto out;
+			}
+
+			rq[n + 1] = i915_request_alloc(engine, ctx);
+			if (IS_ERR(rq[n + 1])) {
+				err = PTR_ERR(rq[n + 1]);
+				intel_virtual_engine_destroy(engine);
+				onstack_fence_fini(&fence);
+				goto out;
+			}
+			i915_request_get(rq[n + 1]);
+
+			err = i915_request_await_execution(rq[n + 1],
+							   &rq[0]->fence,
+							   engine->bond_execute);
+			i915_request_add(rq[n + 1]);
+			intel_virtual_engine_destroy(engine);
+			if (err < 0) {
+				onstack_fence_fini(&fence);
+				goto out;
+			}
+		}
+		onstack_fence_fini(&fence);
+
+		if (i915_request_wait(rq[0],
+				      I915_WAIT_LOCKED,
+				      HZ / 10) < 0) {
+			pr_err("Master request did not execute (on %s)!\n",
+			       rq[0]->engine->name);
+			err = -EIO;
+			goto out;
+		}
+
+		for (n = 0; n < nsibling; n++) {
+			if (i915_request_wait(rq[n + 1],
+					      I915_WAIT_LOCKED,
+					      MAX_SCHEDULE_TIMEOUT) < 0) {
+				err = -EIO;
+				goto out;
+			}
+
+			if (rq[n + 1]->engine != siblings[n]) {
+				pr_err("Bonded request did not execute on target engine: expected %s, used %s; master was %s\n",
+				       siblings[n]->name,
+				       rq[n + 1]->engine->name,
+				       rq[0]->engine->name);
+				err = -EINVAL;
+				goto out;
+			}
+		}
+
+		for (n = 0; !IS_ERR(rq[n]); n++)
+			i915_request_put(rq[n]);
+		rq[0] = ERR_PTR(-ENOMEM);
+	}
+
+out:
+	for (n = 0; !IS_ERR(rq[n]); n++)
+		i915_request_put(rq[n]);
+	if (igt_flush_test(i915, I915_WAIT_LOCKED))
+		err = -EIO;
+
+	kernel_context_close(ctx);
+	return err;
+}
+
+static int live_virtual_bond(void *arg)
+{
+	static const struct phase {
+		const char *name;
+		unsigned int flags;
+	} phases[] = {
+		{ "", 0 },
+		{ "schedule", BOND_SCHEDULE },
+		{ },
+	};
+	struct drm_i915_private *i915 = arg;
+	struct intel_engine_cs *siblings[MAX_ENGINE_INSTANCE + 1];
+	unsigned int class, inst;
+	int err = 0;
+
+	if (USES_GUC_SUBMISSION(i915))
+		return 0;
+
+	mutex_lock(&i915->drm.struct_mutex);
+
+	for (class = 0; class <= MAX_ENGINE_CLASS; class++) {
+		const struct phase *p;
+		int nsibling;
+
+		nsibling = 0;
+		for (inst = 0; inst <= MAX_ENGINE_INSTANCE; inst++) {
+			if (!i915->engine_class[class][inst])
+				break;
+
+			GEM_BUG_ON(nsibling == ARRAY_SIZE(siblings));
+			siblings[nsibling++] = i915->engine_class[class][inst];
+		}
+		if (nsibling < 2)
+			continue;
+
+		for (p = phases; p->name; p++) {
+			err = bond_virtual_engine(i915,
+						  class, siblings, nsibling,
+						  p->flags);
+			if (err) {
+				pr_err("%s(%s): failed class=%d, nsibling=%d, err=%d\n",
+				       __func__, p->name, class, nsibling, err);
+				goto out_unlock;
+			}
+		}
+	}
+
+out_unlock:
+	mutex_unlock(&i915->drm.struct_mutex);
+	return err;
+}
+
 int intel_execlists_live_selftests(struct drm_i915_private *i915)
 {
 	static const struct i915_subtest tests[] = {
@@ -1233,6 +1417,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_preempt_hang),
 		SUBTEST(live_preempt_smoke),
 		SUBTEST(live_virtual_engine),
+		SUBTEST(live_virtual_bond),
 	};
 
 	if (!HAS_EXECLISTS(i915))
diff --git a/drivers/gpu/drm/i915/selftests/lib_sw_fence.c b/drivers/gpu/drm/i915/selftests/lib_sw_fence.c
index 2bfa72c1654b..b976c12817c5 100644
--- a/drivers/gpu/drm/i915/selftests/lib_sw_fence.c
+++ b/drivers/gpu/drm/i915/selftests/lib_sw_fence.c
@@ -45,6 +45,9 @@ void __onstack_fence_init(struct i915_sw_fence *fence,
 
 void onstack_fence_fini(struct i915_sw_fence *fence)
 {
+	if (!fence->flags)
+		return;
+
 	i915_sw_fence_commit(fence);
 	i915_sw_fence_fini(fence);
 }
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 9c94c037d13b..0d9ca4fb9edb 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1532,6 +1532,10 @@ struct drm_i915_gem_context_param {
  * sized argument, will revert back to default settings.
  *
  * See struct i915_context_param_engines.
+ *
+ * Extensions:
+ *   i915_context_engines_load_balance (I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE)
+ *   i915_context_engines_bond (I915_CONTEXT_ENGINES_EXT_BOND)
  */
 #define I915_CONTEXT_PARAM_ENGINES	0xa
 /* Must be kept compact -- no holes and well documented */
@@ -1627,9 +1631,38 @@ struct i915_context_engines_load_balance {
 	__u64 mbz64[4]; /* reserved for future use; must be zero */
 };
 
+/*
+ * i915_context_engines_bond:
+ *
+ * Constructed bonded pairs for execution within a virtual engine.
+ *
+ * All engines are equal, but some are more equal than others. Given
+ * the distribution of resources in the HW, it may be preferable to run
+ * a request on a given subset of engines in parallel to a request on a
+ * specific engine. We enable this selection of engines within a virtual
+ * engine by specifying bonding pairs; for any given master engine, we will
+ * only execute on one of the corresponding siblings within the virtual engine.
+ *
+ * To execute a request in parallel on the master engine and a sibling requires
+ * coordination with a I915_EXEC_FENCE_SUBMIT.
+ */
+struct i915_context_engines_bond {
+	struct i915_user_extension base;
+
+	__u16 virtual_index; /* index of virtual engine in ctx->engines[] */
+	__u16 mbz;
+
+	__u16 master_class;
+	__u16 master_instance;
+
+	__u64 sibling_mask; /* bitmask of BIT(sibling_index) wrt the v.engine */
+	__u64 flags; /* all undefined flags must be zero */
+};
+
 struct i915_context_param_engines {
 	__u64 extensions; /* linked chain of extension blocks, 0 terminates */
 #define I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE 0
+#define I915_CONTEXT_ENGINES_EXT_BOND 1
 
 	struct {
 		__u16 engine_class; /* see enum drm_i915_gem_engine_class */
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH 18/18] drm/i915: Allow specification of parallel execbuf
  2019-03-19 11:57 [PATCH 01/18] drm/i915/selftests: Provide stub reset functions Chris Wilson
                   ` (15 preceding siblings ...)
  2019-03-19 11:57 ` [PATCH 17/18] drm/i915/execlists: Virtual engine bonding Chris Wilson
@ 2019-03-19 11:57 ` Chris Wilson
  2019-03-19 12:09 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/18] drm/i915/selftests: Provide stub reset functions Patchwork
                   ` (6 subsequent siblings)
  23 siblings, 0 replies; 47+ messages in thread
From: Chris Wilson @ 2019-03-19 11:57 UTC (permalink / raw)
  To: intel-gfx

There is a desire to split a task onto two engines and have them run at
the same time, e.g. scanline interleaving to spread the workload evenly.
Through the use of the out-fence from the first execbuf, we can
coordinate secondary execbuf to only become ready simultaneously with
the first, so that with all things idle the second execbufs are executed
in parallel with the first. The key difference here between the new
EXEC_FENCE_SUBMIT and the existing EXEC_FENCE_IN is that the in-fence
waits for the completion of the first request (so that all of its
rendering results are visible to the second execbuf, the more common
userspace fence requirement).

Since we only have a single input fence slot, userspace cannot mix an
in-fence and a submit-fence. It has to use one or the other! This is not
such a harsh requirement, since by virtue of the submit-fence, the
secondary execbuf inherit all of the dependencies from the first
request, and for the application the dependencies should be common
between the primary and secondary execbuf.
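
A hedged userspace sketch (buffer, context and error handling elided; the
submit-fence fd placement in the lower 32 bits of rsvd2 comes from this
patch, while reading the out-fence fd from the upper 32 bits follows the
existing I915_EXEC_FENCE_OUT behaviour):

	struct drm_i915_gem_execbuffer2 master = { /* batch, buffers, ... */ };
	struct drm_i915_gem_execbuffer2 slave = { /* batch, buffers, ... */ };
	int submit_fd;

	master.flags |= I915_EXEC_FENCE_OUT;
	drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2_WR, &master);
	submit_fd = master.rsvd2 >> 32;		/* out-fence fd from the master */

	slave.flags |= I915_EXEC_FENCE_SUBMIT;
	slave.rsvd2 = submit_fd;		/* lower 32 bits: submit-fence fd */
	drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &slave);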

Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Testcase: igt/gem_exec_fence/parallel
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c            |  1 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 25 +++++++++++++++++++++-
 include/uapi/drm/i915_drm.h                | 17 ++++++++++++++-
 3 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 9a0fa3b21e9d..e7fdd9926266 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -421,6 +421,7 @@ static int i915_getparam_ioctl(struct drm_device *dev, void *data,
 	case I915_PARAM_HAS_EXEC_CAPTURE:
 	case I915_PARAM_HAS_EXEC_BATCH_FIRST:
 	case I915_PARAM_HAS_EXEC_FENCE_ARRAY:
+	case I915_PARAM_HAS_EXEC_SUBMIT_FENCE:
 		/* For the time being all of these are always true;
 		 * if some supported hardware does not have one of these
 		 * features this value needs to be provided from
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 66b3921cc8bd..3e9a6892a7a9 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -2281,6 +2281,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 {
 	struct i915_execbuffer eb;
 	struct dma_fence *in_fence = NULL;
+	struct dma_fence *exec_fence = NULL;
 	struct sync_file *out_fence = NULL;
 	intel_wakeref_t wakeref;
 	int out_fence_fd = -1;
@@ -2324,11 +2325,24 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 			return -EINVAL;
 	}
 
+	if (args->flags & I915_EXEC_FENCE_SUBMIT) {
+		if (in_fence) {
+			err = -EINVAL;
+			goto err_in_fence;
+		}
+
+		exec_fence = sync_file_get_fence(lower_32_bits(args->rsvd2));
+		if (!exec_fence) {
+			err = -EINVAL;
+			goto err_in_fence;
+		}
+	}
+
 	if (args->flags & I915_EXEC_FENCE_OUT) {
 		out_fence_fd = get_unused_fd_flags(O_CLOEXEC);
 		if (out_fence_fd < 0) {
 			err = out_fence_fd;
-			goto err_in_fence;
+			goto err_exec_fence;
 		}
 	}
 
@@ -2460,6 +2474,13 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 			goto err_request;
 	}
 
+	if (exec_fence) {
+		err = i915_request_await_execution(eb.request, exec_fence,
+						   eb.engine->bond_execute);
+		if (err < 0)
+			goto err_request;
+	}
+
 	if (fences) {
 		err = await_fence_array(&eb, fences);
 		if (err)
@@ -2520,6 +2541,8 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 err_out_fence:
 	if (out_fence_fd != -1)
 		put_unused_fd(out_fence_fd);
+err_exec_fence:
+	dma_fence_put(exec_fence);
 err_in_fence:
 	dma_fence_put(in_fence);
 	return err;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 0d9ca4fb9edb..08f680dd2b1c 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -593,6 +593,12 @@ typedef struct drm_i915_irq_wait {
  */
 #define I915_PARAM_MMAP_GTT_COHERENT	52
 
+/*
+ * Query whether DRM_I915_GEM_EXECBUFFER2 supports coordination of parallel
+ * execution through use of explicit fence support.
+ * See I915_EXEC_FENCE_OUT and I915_EXEC_FENCE_SUBMIT.
+ */
+#define I915_PARAM_HAS_EXEC_SUBMIT_FENCE 53
 /* Must be kept compact -- no holes and well documented */
 
 typedef struct drm_i915_getparam {
@@ -1115,7 +1121,16 @@ struct drm_i915_gem_execbuffer2 {
  */
 #define I915_EXEC_FENCE_ARRAY   (1<<19)
 
-#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_FENCE_ARRAY<<1))
+/*
+ * Setting I915_EXEC_FENCE_SUBMIT implies that lower_32_bits(rsvd2) represent
+ * a sync_file fd to wait upon (in a nonblocking manner) prior to executing
+ * the batch.
+ *
+ * Returns -EINVAL if the sync_file fd cannot be found.
+ */
+#define I915_EXEC_FENCE_SUBMIT		(1 << 20)
+
+#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_FENCE_SUBMIT << 1))
 
 #define I915_EXEC_CONTEXT_ID_MASK	(0xffffffff)
 #define i915_execbuffer2_set_context_id(eb2, context) \
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/18] drm/i915/selftests: Provide stub reset functions
  2019-03-19 11:57 [PATCH 01/18] drm/i915/selftests: Provide stub reset functions Chris Wilson
                   ` (16 preceding siblings ...)
  2019-03-19 11:57 ` [PATCH 18/18] drm/i915: Allow specification of parallel execbuf Chris Wilson
@ 2019-03-19 12:09 ` Patchwork
  2019-03-19 12:18 ` ✗ Fi.CI.SPARSE: " Patchwork
                   ` (5 subsequent siblings)
  23 siblings, 0 replies; 47+ messages in thread
From: Patchwork @ 2019-03-19 12:09 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/18] drm/i915/selftests: Provide stub reset functions
URL   : https://patchwork.freedesktop.org/series/58179/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
4af79bb9ca20 drm/i915/selftests: Provide stub reset functions
383e05f979e8 drm/i915: Flush pages on acquisition
b283b7c85d62 drm/i915: Move intel_engine_mask_t around for use by i915_request_types.h
-:652: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#652: 
new file mode 100644

-:657: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#657: FILE: drivers/gpu/drm/i915/i915_scheduler_types.h:1:
+/*

-:658: WARNING:SPDX_LICENSE_TAG: Misplaced SPDX-License-Identifier tag - use line 1 instead
#658: FILE: drivers/gpu/drm/i915/i915_scheduler_types.h:2:
+ * SPDX-License-Identifier: MIT

-:913: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#913: FILE: drivers/gpu/drm/i915/test_i915_scheduler_types_standalone.c:1:
+/*

-:914: WARNING:SPDX_LICENSE_TAG: Misplaced SPDX-License-Identifier tag - use line 1 instead
#914: FILE: drivers/gpu/drm/i915/test_i915_scheduler_types_standalone.c:2:
+ * SPDX-License-Identifier: MIT

total: 0 errors, 5 warnings, 0 checks, 730 lines checked
d0f146918e12 drm/i915: Separate GEM context construction and registration to userspace
0c91e73a8852 drm/i915: Introduce a mutex for file_priv->context_idr
82bce97b4b34 drm/i915: Stop storing ctx->user_handle
6d809b9cfe5a drm/i915: Stop storing the context name as the timeline name
b2401f795713 drm/i915: Introduce the i915_user_extension_method
-:72: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#72: 
new file mode 100644

-:77: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#77: FILE: drivers/gpu/drm/i915/i915_user_extensions.c:1:
+/*

-:78: WARNING:SPDX_LICENSE_TAG: Misplaced SPDX-License-Identifier tag - use line 1 instead
#78: FILE: drivers/gpu/drm/i915/i915_user_extensions.c:2:
+ * SPDX-License-Identifier: MIT

-:144: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#144: FILE: drivers/gpu/drm/i915/i915_user_extensions.h:1:
+/*

-:145: WARNING:SPDX_LICENSE_TAG: Misplaced SPDX-License-Identifier tag - use line 1 instead
#145: FILE: drivers/gpu/drm/i915/i915_user_extensions.h:2:
+ * SPDX-License-Identifier: MIT

-:178: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'ptr' - possible side-effects?
#178: FILE: drivers/gpu/drm/i915/i915_utils.h:114:
+#define container_of_user(ptr, type, member) ({				\
+	void __user *__mptr = (void __user *)(ptr);			\
+	BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) &&	\
+			 !__same_type(*(ptr), void),			\
+			 "pointer type mismatch in container_of()");	\
+	((type __user *)(__mptr - offsetof(type, member))); })

-:178: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'member' - possible side-effects?
#178: FILE: drivers/gpu/drm/i915/i915_utils.h:114:
+#define container_of_user(ptr, type, member) ({				\
+	void __user *__mptr = (void __user *)(ptr);			\
+	BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) &&	\
+			 !__same_type(*(ptr), void),			\
+			 "pointer type mismatch in container_of()");	\
+	((type __user *)(__mptr - offsetof(type, member))); })

-:178: CHECK:MACRO_ARG_PRECEDENCE: Macro argument 'member' may be better as '(member)' to avoid precedence issues
#178: FILE: drivers/gpu/drm/i915/i915_utils.h:114:
+#define container_of_user(ptr, type, member) ({				\
+	void __user *__mptr = (void __user *)(ptr);			\
+	BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) &&	\
+			 !__same_type(*(ptr), void),			\
+			 "pointer type mismatch in container_of()");	\
+	((type __user *)(__mptr - offsetof(type, member))); })

-:198: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'U' - possible side-effects?
#198: FILE: drivers/gpu/drm/i915/i915_utils.h:134:
+#define check_user_mbz(U) ({						\
+	typeof(*(U)) mbz__;						\
+	get_user(mbz__, (U)) ? -EFAULT : mbz__ ? -EINVAL : 0;		\
+})

total: 0 errors, 5 warnings, 4 checks, 153 lines checked
704b8c113249 drm/i915: Create/destroy VM (ppGTT) for use with contexts
-:693: WARNING:LINE_SPACING: Missing a blank line after declarations
#693: FILE: drivers/gpu/drm/i915/selftests/i915_gem_context.c:504:
+		struct drm_file *file;
+		IGT_TIMEOUT(end_time);

-:755: CHECK:MULTIPLE_ASSIGNMENTS: multiple assignments should be avoided
#755: FILE: drivers/gpu/drm/i915/selftests/i915_gem_context.c:566:
+		ncontexts = dw = 0;

-:829: WARNING:LINE_SPACING: Missing a blank line after declarations
#829: FILE: drivers/gpu/drm/i915/selftests/i915_gem_context.c:633:
+		struct drm_i915_gem_object *obj = NULL;
+		IGT_TIMEOUT(end_time);

-:901: CHECK:MULTIPLE_ASSIGNMENTS: multiple assignments should be avoided
#901: FILE: drivers/gpu/drm/i915/selftests/i915_gem_context.c:688:
+		ncontexts = dw = 0;

-:1056: WARNING:LONG_LINE: line over 100 characters
#1056: FILE: include/uapi/drm/i915_drm.h:407:
+#define DRM_IOCTL_I915_GEM_VM_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_CREATE, struct drm_i915_gem_vm_control)

-:1057: WARNING:LONG_LINE: line over 100 characters
#1057: FILE: include/uapi/drm/i915_drm.h:408:
+#define DRM_IOCTL_I915_GEM_VM_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_VM_DESTROY, struct drm_i915_gem_vm_control)

-:1057: WARNING:SPACING: space prohibited between function name and open parenthesis '('
#1057: FILE: include/uapi/drm/i915_drm.h:408:
+#define DRM_IOCTL_I915_GEM_VM_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_VM_DESTROY, struct drm_i915_gem_vm_control)

-:1057: ERROR:COMPLEX_MACRO: Macros with complex values should be enclosed in parentheses
#1057: FILE: include/uapi/drm/i915_drm.h:408:
+#define DRM_IOCTL_I915_GEM_VM_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_VM_DESTROY, struct drm_i915_gem_vm_control)

total: 1 errors, 5 warnings, 2 checks, 998 lines checked
581f8d5da17c drm/i915: Extend CONTEXT_CREATE to set parameters upon construction
-:28: WARNING:LONG_LINE: line over 100 characters
#28: FILE: drivers/gpu/drm/i915/i915_drv.c:3113:
+	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_CREATE_EXT, i915_gem_context_create_ioctl, DRM_RENDER_ALLOW),

-:541: WARNING:LONG_LINE: line over 100 characters
#541: FILE: include/uapi/drm/i915_drm.h:397:
+#define DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE, struct drm_i915_gem_context_create_ext)

-:541: WARNING:SPACING: space prohibited between function name and open parenthesis '('
#541: FILE: include/uapi/drm/i915_drm.h:397:
+#define DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE, struct drm_i915_gem_context_create_ext)

-:541: ERROR:COMPLEX_MACRO: Macros with complex values should be enclosed in parentheses
#541: FILE: include/uapi/drm/i915_drm.h:397:
+#define DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE, struct drm_i915_gem_context_create_ext)

total: 1 errors, 3 warnings, 0 checks, 701 lines checked
2480a6af6f15 drm/i915: Allow contexts to share a single timeline across all engines
74dfc612950c drm/i915: Allow userspace to clone contexts on creation
-:132: ERROR:BRACKET_SPACE: space prohibited before open square bracket '['
#132: FILE: drivers/gpu/drm/i915/i915_gem_context.c:1610:
+#define MAP(x, y) [ilog2(I915_CONTEXT_CLONE_##x)] = y

total: 1 errors, 0 warnings, 0 checks, 182 lines checked
3589b034c51f drm/i915: Allow a context to define its set of engines
-:482: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'p' - possible side-effects?
#482: FILE: drivers/gpu/drm/i915/i915_utils.h:107:
+#define check_struct_size(p, member, n, sz) \
+	likely(__check_struct_size(sizeof(*(p)), \
+				   sizeof(*(p)->member) + __must_be_array((p)->member), \
+				   n, sz))

-:482: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'member' - possible side-effects?
#482: FILE: drivers/gpu/drm/i915/i915_utils.h:107:
+#define check_struct_size(p, member, n, sz) \
+	likely(__check_struct_size(sizeof(*(p)), \
+				   sizeof(*(p)->member) + __must_be_array((p)->member), \
+				   n, sz))

-:482: CHECK:MACRO_ARG_PRECEDENCE: Macro argument 'member' may be better as '(member)' to avoid precedence issues
#482: FILE: drivers/gpu/drm/i915/i915_utils.h:107:
+#define check_struct_size(p, member, n, sz) \
+	likely(__check_struct_size(sizeof(*(p)), \
+				   sizeof(*(p)->member) + __must_be_array((p)->member), \
+				   n, sz))

total: 0 errors, 0 warnings, 3 checks, 490 lines checked
7ad2fb05d6cc drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[]
8c6e4c0af118 drm/i915: Load balancing across a virtual engine
-:955: WARNING:LINE_SPACING: Missing a blank line after declarations
#955: FILE: drivers/gpu/drm/i915/intel_lrc.c:3387:
+		struct intel_engine_cs *actual = ve->siblings[0];
+		intel_context_put(&ve->context);

total: 0 errors, 1 warnings, 0 checks, 1156 lines checked
93784e421678 drm/i915: Extend execution fence to support a callback
0c635289b98f drm/i915/execlists: Virtual engine bonding
230b3043a7e4 drm/i915: Allow specification of parallel execbuf

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* ✗ Fi.CI.SPARSE: warning for series starting with [01/18] drm/i915/selftests: Provide stub reset functions
  2019-03-19 11:57 [PATCH 01/18] drm/i915/selftests: Provide stub reset functions Chris Wilson
                   ` (17 preceding siblings ...)
  2019-03-19 12:09 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/18] drm/i915/selftests: Provide stub reset functions Patchwork
@ 2019-03-19 12:18 ` Patchwork
  2019-03-19 12:40 ` ✗ Fi.CI.BAT: failure " Patchwork
                   ` (4 subsequent siblings)
  23 siblings, 0 replies; 47+ messages in thread
From: Patchwork @ 2019-03-19 12:18 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/18] drm/i915/selftests: Provide stub reset functions
URL   : https://patchwork.freedesktop.org/series/58179/
State : warning

== Summary ==

$ dim sparse origin/drm-tip
Sparse version: v0.5.2
Commit: drm/i915/selftests: Provide stub reset functions
Okay!

Commit: drm/i915: Flush pages on acquisition
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3558:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3566:16: warning: expression using sizeof(void)

Commit: drm/i915: Move intel_engine_mask_t around for use by i915_request_types.h
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3566:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3565:16: warning: expression using sizeof(void)

Commit: drm/i915: Separate GEM context construction and registration to userspace
Okay!

Commit: drm/i915: Introduce a mutex for file_priv->context_idr
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3565:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3567:16: warning: expression using sizeof(void)

Commit: drm/i915: Stop storing ctx->user_handle
Okay!

Commit: drm/i915: Stop storing the context name as the timeline name
Okay!

Commit: drm/i915: Introduce the i915_user_extension_method
Okay!

Commit: drm/i915: Create/destroy VM (ppGTT) for use with contexts
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3567:16: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/selftests/i915_gem_context.c:1132:25: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/selftests/i915_gem_context.c:1132:25: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3570:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_gem_context.c:1260:25: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_gem_context.c:1260:25: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/selftests/i915_gem_context.c:565:25: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/selftests/i915_gem_context.c:565:25: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_gem_context.c:569:33: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_gem_context.c:569:33: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_gem_context.c:691:33: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_gem_context.c:691:33: warning: expression using sizeof(void)

Commit: drm/i915: Extend CONTEXT_CREATE to set parameters upon construction
Okay!

Commit: drm/i915: Allow contexts to share a single timeline across all engines
Okay!

Commit: drm/i915: Allow userspace to clone contexts on creation
+drivers/gpu/drm/i915/i915_gem_context.c:1611:17: error: bad integer constant expression
+drivers/gpu/drm/i915/i915_gem_context.c:1612:17: error: bad integer constant expression
+drivers/gpu/drm/i915/i915_gem_context.c:1613:17: error: bad integer constant expression
+drivers/gpu/drm/i915/i915_gem_context.c:1614:17: error: bad integer constant expression
+drivers/gpu/drm/i915/i915_gem_context.c:1615:17: error: bad integer constant expression
-drivers/gpu/drm/i915/selftests/i915_gem_context.c:1260:25: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/selftests/i915_gem_context.c:1260:25: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/selftests/i915_gem_context.c:452:16: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/selftests/i915_gem_context.c:569:33: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/selftests/i915_gem_context.c:569:33: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/selftests/i915_gem_context.c:691:33: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/selftests/i915_gem_context.c:691:33: warning: expression using sizeof(void)
-./include/linux/slab.h:664:13: warning: call with no type!

Commit: drm/i915: Allow a context to define its set of engines
-O:drivers/gpu/drm/i915/i915_gem_context.c:1611:17: error: bad integer constant expression
-O:drivers/gpu/drm/i915/i915_gem_context.c:1612:17: error: bad integer constant expression
-O:drivers/gpu/drm/i915/i915_gem_context.c:1613:17: error: bad integer constant expression
+drivers/gpu/drm/i915/i915_gem_context.c:1820:17: error: bad integer constant expression
+drivers/gpu/drm/i915/i915_gem_context.c:1821:17: error: bad integer constant expression
+drivers/gpu/drm/i915/i915_gem_context.c:1822:17: error: bad integer constant expression
+drivers/gpu/drm/i915/i915_gem_context.c:1823:17: error: bad integer constant expression
+drivers/gpu/drm/i915/i915_utils.h:84:13: error: incorrect type in conditional
+drivers/gpu/drm/i915/i915_utils.h:84:13: error: undefined identifier '__builtin_mul_overflow'
+drivers/gpu/drm/i915/i915_utils.h:84:13:    got void
+drivers/gpu/drm/i915/i915_utils.h:87:13: error: incorrect type in conditional
+drivers/gpu/drm/i915/i915_utils.h:87:13: error: undefined identifier '__builtin_add_overflow'
+drivers/gpu/drm/i915/i915_utils.h:87:13:    got void
+./include/linux/slab.h:664:13: error: not a function <noident>

Commit: drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[]
Okay!

Commit: drm/i915: Load balancing across a virtual engine
+./include/linux/overflow.h:285:13: error: incorrect type in conditional
+./include/linux/overflow.h:285:13: error: undefined identifier '__builtin_mul_overflow'
+./include/linux/overflow.h:285:13:    got void
+./include/linux/overflow.h:285:13: warning: call with no type!
+./include/linux/overflow.h:287:13: error: incorrect type in conditional
+./include/linux/overflow.h:287:13: error: undefined identifier '__builtin_add_overflow'
+./include/linux/overflow.h:287:13:    got void
+./include/linux/overflow.h:287:13: warning: call with no type!

Commit: drm/i915: Extend execution fence to support a callback
Okay!

Commit: drm/i915/execlists: Virtual engine bonding
Okay!

Commit: drm/i915: Allow specification of parallel execbuf
Okay!

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* ✗ Fi.CI.BAT: failure for series starting with [01/18] drm/i915/selftests: Provide stub reset functions
  2019-03-19 11:57 [PATCH 01/18] drm/i915/selftests: Provide stub reset functions Chris Wilson
                   ` (18 preceding siblings ...)
  2019-03-19 12:18 ` ✗ Fi.CI.SPARSE: " Patchwork
@ 2019-03-19 12:40 ` Patchwork
  2019-03-19 13:12 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/18] drm/i915/selftests: Provide stub reset functions (rev2) Patchwork
                   ` (3 subsequent siblings)
  23 siblings, 0 replies; 47+ messages in thread
From: Patchwork @ 2019-03-19 12:40 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/18] drm/i915/selftests: Provide stub reset functions
URL   : https://patchwork.freedesktop.org/series/58179/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_5772 -> Patchwork_12511
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_12511 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_12511, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/58179/revisions/1/mbox/

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_12511:

### IGT changes ###

#### Possible regressions ####

  * igt@i915_selftest@live_requests:
    - fi-hsw-peppy:       NOTRUN -> INCOMPLETE
    - fi-ivb-3770:        PASS -> INCOMPLETE
    - fi-blb-e6850:       PASS -> INCOMPLETE
    - fi-gdg-551:         NOTRUN -> INCOMPLETE
    - fi-ilk-650:         PASS -> INCOMPLETE
    - fi-pnv-d510:        NOTRUN -> INCOMPLETE
    - fi-snb-2520m:       PASS -> INCOMPLETE
    - fi-bwr-2160:        PASS -> INCOMPLETE

  * igt@i915_selftest@live_workarounds:
    - fi-bsw-n3050:       NOTRUN -> INCOMPLETE
    - fi-cfl-guc:         PASS -> INCOMPLETE
    - fi-bsw-kefka:       NOTRUN -> INCOMPLETE
    - fi-hsw-4770r:       PASS -> INCOMPLETE
    - fi-cfl-8109u:       PASS -> INCOMPLETE
    - fi-kbl-guc:         PASS -> INCOMPLETE
    - fi-skl-6600u:       PASS -> INCOMPLETE
    - fi-bdw-5557u:       PASS -> INCOMPLETE
    - fi-kbl-7567u:       PASS -> INCOMPLETE
    - fi-whl-u:           PASS -> INCOMPLETE
    - fi-skl-iommu:       PASS -> INCOMPLETE
    - fi-skl-6770hq:      PASS -> INCOMPLETE
    - fi-kbl-x1275:       PASS -> INCOMPLETE
    - fi-icl-u3:          PASS -> INCOMPLETE
    - fi-skl-6260u:       PASS -> INCOMPLETE
    - fi-kbl-r:           PASS -> INCOMPLETE
    - fi-kbl-8809g:       PASS -> INCOMPLETE
    - fi-skl-guc:         PASS -> INCOMPLETE
    - fi-hsw-4770:        PASS -> INCOMPLETE
    - fi-cfl-8700k:       PASS -> INCOMPLETE

  * igt@runner@aborted:
    - fi-ilk-650:         NOTRUN -> FAIL
    - fi-pnv-d510:        NOTRUN -> FAIL
    - fi-cfl-8109u:       NOTRUN -> FAIL
    - fi-hsw-peppy:       NOTRUN -> FAIL
    - fi-gdg-551:         NOTRUN -> FAIL
    - fi-snb-2520m:       NOTRUN -> FAIL
    - fi-bxt-j4205:       NOTRUN -> FAIL
    - fi-whl-u:           NOTRUN -> FAIL
    - fi-icl-u3:          NOTRUN -> FAIL
    - fi-ivb-3770:        NOTRUN -> FAIL
    - fi-cfl-guc:         NOTRUN -> FAIL
    - fi-kbl-7567u:       NOTRUN -> FAIL
    - fi-blb-e6850:       NOTRUN -> FAIL
    - fi-kbl-x1275:       NOTRUN -> FAIL
    - fi-cfl-8700k:       NOTRUN -> FAIL
    - fi-kbl-8809g:       NOTRUN -> FAIL
    - fi-apl-guc:         NOTRUN -> FAIL
    - fi-kbl-r:           NOTRUN -> FAIL
    - fi-bwr-2160:        NOTRUN -> FAIL
    - fi-kbl-guc:         NOTRUN -> FAIL
    - fi-snb-2600:        NOTRUN -> FAIL
    - fi-elk-e7500:       NOTRUN -> FAIL

  
#### Suppressed ####

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * igt@i915_selftest@live_workarounds:
    - {fi-icl-y}:         PASS -> INCOMPLETE
    - {fi-skl-lmem}:      PASS -> INCOMPLETE

  * igt@runner@aborted:
    - {fi-icl-y}:         NOTRUN -> FAIL

  
Known issues
------------

  Here are the changes found in Patchwork_12511 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@amdgpu/amd_basic@query-info:
    - fi-bsw-kefka:       NOTRUN -> SKIP [fdo#109271] +55

  * igt@gem_ctx_create@basic-files:
    - fi-gdg-551:         NOTRUN -> SKIP [fdo#109271] +106

  * igt@gem_exec_basic@readonly-bsd2:
    - fi-pnv-d510:        NOTRUN -> SKIP [fdo#109271] +76

  * igt@i915_selftest@live_requests:
    - fi-elk-e7500:       PASS -> INCOMPLETE [fdo#103989]
    - fi-snb-2600:        PASS -> INCOMPLETE [fdo#105411]

  * igt@i915_selftest@live_workarounds:
    - fi-byt-clapper:     PASS -> INCOMPLETE [fdo#102657]
    - fi-apl-guc:         PASS -> INCOMPLETE [fdo#103927]
    - fi-byt-n2820:       PASS -> INCOMPLETE [fdo#102657]
    - fi-skl-6700k2:      NOTRUN -> INCOMPLETE [fdo#104108]
    - fi-byt-j1900:       PASS -> INCOMPLETE [fdo#102657]
    - fi-bxt-j4205:       PASS -> INCOMPLETE [fdo#103927]
    - fi-skl-gvtdvm:      PASS -> INCOMPLETE [fdo#105600]
    - fi-bdw-gvtdvm:      PASS -> INCOMPLETE [fdo#105600]

  * igt@kms_busy@basic-flip-a:
    - fi-gdg-551:         NOTRUN -> FAIL [fdo#103182] +1
    - fi-bsw-n3050:       NOTRUN -> SKIP [fdo#109271] / [fdo#109278] +1

  * igt@kms_busy@basic-flip-c:
    - fi-gdg-551:         NOTRUN -> SKIP [fdo#109271] / [fdo#109278]
    - fi-bsw-kefka:       NOTRUN -> SKIP [fdo#109271] / [fdo#109278]
    - fi-pnv-d510:        NOTRUN -> SKIP [fdo#109271] / [fdo#109278]

  * igt@kms_chamelium@dp-hpd-fast:
    - fi-skl-6700k2:      NOTRUN -> SKIP [fdo#109271] +41

  * igt@kms_chamelium@hdmi-crc-fast:
    - fi-bsw-n3050:       NOTRUN -> SKIP [fdo#109271] +62

  * igt@kms_chamelium@hdmi-edid-read:
    - fi-hsw-peppy:       NOTRUN -> SKIP [fdo#109271] +46

  * igt@kms_frontbuffer_tracking@basic:
    - fi-hsw-peppy:       NOTRUN -> DMESG-FAIL [fdo#102614] / [fdo#107814]

  * igt@kms_pipe_crc_basic@read-crc-pipe-a-frame-sequence:
    - fi-byt-clapper:     PASS -> FAIL [fdo#103191] / [fdo#107362]

  * igt@runner@aborted:
    - fi-skl-iommu:       NOTRUN -> FAIL [fdo#104108]
    - fi-skl-guc:         NOTRUN -> FAIL [fdo#104108]
    - fi-skl-6700k2:      NOTRUN -> FAIL [fdo#104108]
    - fi-skl-6600u:       NOTRUN -> FAIL [fdo#104108]
    - fi-skl-6770hq:      NOTRUN -> FAIL [fdo#104108]
    - fi-skl-gvtdvm:      NOTRUN -> FAIL [fdo#104108]
    - fi-skl-6260u:       NOTRUN -> FAIL [fdo#104108]

  
#### Possible fixes ####

  * igt@gem_exec_suspend@basic-s4-devices:
    - {fi-icl-y}:         DMESG-WARN [fdo#109638] -> PASS

  * igt@kms_pipe_crc_basic@nonblocking-crc-pipe-a:
    - fi-byt-clapper:     FAIL [fdo#107362] -> PASS

  * igt@kms_pipe_crc_basic@nonblocking-crc-pipe-a-frame-sequence:
    - fi-byt-clapper:     FAIL [fdo#103191] / [fdo#107362] -> PASS

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#102614]: https://bugs.freedesktop.org/show_bug.cgi?id=102614
  [fdo#102657]: https://bugs.freedesktop.org/show_bug.cgi?id=102657
  [fdo#103182]: https://bugs.freedesktop.org/show_bug.cgi?id=103182
  [fdo#103191]: https://bugs.freedesktop.org/show_bug.cgi?id=103191
  [fdo#103927]: https://bugs.freedesktop.org/show_bug.cgi?id=103927
  [fdo#103989]: https://bugs.freedesktop.org/show_bug.cgi?id=103989
  [fdo#104108]: https://bugs.freedesktop.org/show_bug.cgi?id=104108
  [fdo#105411]: https://bugs.freedesktop.org/show_bug.cgi?id=105411
  [fdo#105600]: https://bugs.freedesktop.org/show_bug.cgi?id=105600
  [fdo#107362]: https://bugs.freedesktop.org/show_bug.cgi?id=107362
  [fdo#107814]: https://bugs.freedesktop.org/show_bug.cgi?id=107814
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109278]: https://bugs.freedesktop.org/show_bug.cgi?id=109278
  [fdo#109638]: https://bugs.freedesktop.org/show_bug.cgi?id=109638


Participating hosts (41 -> 42)
------------------------------

  Additional (6): fi-bsw-n3050 fi-hsw-peppy fi-gdg-551 fi-pnv-d510 fi-bsw-kefka fi-skl-6700k2 
  Missing    (5): fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-ctg-p8600 fi-bdw-samus 


Build changes
-------------

    * Linux: CI_DRM_5772 -> Patchwork_12511

  CI_DRM_5772: 16930b29faa6d6fe08f44affe7753c85db95258f @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4889: e3faf0fd49b7e3a763bf89e11fb4fdce81839da2 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_12511: 230b3043a7e4ba3c701134d88c14f33db0338fbb @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

230b3043a7e4 drm/i915: Allow specification of parallel execbuf
0c635289b98f drm/i915/execlists: Virtual engine bonding
93784e421678 drm/i915: Extend execution fence to support a callback
8c6e4c0af118 drm/i915: Load balancing across a virtual engine
7ad2fb05d6cc drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[]
3589b034c51f drm/i915: Allow a context to define its set of engines
74dfc612950c drm/i915: Allow userspace to clone contexts on creation
2480a6af6f15 drm/i915: Allow contexts to share a single timeline across all engines
581f8d5da17c drm/i915: Extend CONTEXT_CREATE to set parameters upon construction
704b8c113249 drm/i915: Create/destroy VM (ppGTT) for use with contexts
b2401f795713 drm/i915: Introduce the i915_user_extension_method
6d809b9cfe5a drm/i915: Stop storing the context name as the timeline name
82bce97b4b34 drm/i915: Stop storing ctx->user_handle
0c91e73a8852 drm/i915: Introduce a mutex for file_priv->context_idr
d0f146918e12 drm/i915: Separate GEM context construction and registration to userspace
b283b7c85d62 drm/i915: Move intel_engine_mask_t around for use by i915_request_types.h
383e05f979e8 drm/i915: Flush pages on acquisition
4af79bb9ca20 drm/i915/selftests: Provide stub reset functions

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12511/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH v2] drm/i915: Stop storing ctx->user_handle
  2019-03-19 11:57 ` [PATCH 06/18] drm/i915: Stop storing ctx->user_handle Chris Wilson
@ 2019-03-19 12:58   ` Chris Wilson
  2019-03-20 10:43     ` Tvrtko Ursulin
  0 siblings, 1 reply; 47+ messages in thread
From: Chris Wilson @ 2019-03-19 12:58 UTC (permalink / raw)
  To: intel-gfx

The user_handle need only be known by userspace for it to look up the
context via the idr; internally we have no use for it.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_debugfs.c           |  5 ++--
 drivers/gpu/drm/i915/i915_gem_context.c       | 23 ++++++++-----------
 drivers/gpu/drm/i915/i915_gem_context.h       |  5 ----
 drivers/gpu/drm/i915/i915_gem_context_types.h |  9 --------
 drivers/gpu/drm/i915/i915_gpu_error.c         | 11 ++++-----
 drivers/gpu/drm/i915/i915_gpu_error.h         |  1 -
 drivers/gpu/drm/i915/selftests/mock_context.c |  2 +-
 7 files changed, 16 insertions(+), 40 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index f4a07190a0e8..7970770f23a9 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -409,9 +409,8 @@ static void print_context_stats(struct seq_file *m,
 
 			rcu_read_lock();
 			task = pid_task(ctx->pid ?: file->pid, PIDTYPE_PID);
-			snprintf(name, sizeof(name), "%s/%d",
-				 task ? task->comm : "<unknown>",
-				 ctx->user_handle);
+			snprintf(name, sizeof(name), "%s",
+				 task ? task->comm : "<unknown>");
 			rcu_read_unlock();
 
 			print_file_stats(m, name, stats);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 799684d05704..95c5103e15a5 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -602,20 +602,15 @@ static int gem_context_register(struct i915_gem_context *ctx,
 
 	/* And finally expose ourselves to userspace via the idr */
 	mutex_lock(&fpriv->context_idr_lock);
-	ret = idr_alloc(&fpriv->context_idr, ctx,
-			DEFAULT_CONTEXT_HANDLE, 0, GFP_KERNEL);
-	if (ret >= 0)
-		ctx->user_handle = ret;
+	ret = idr_alloc(&fpriv->context_idr, ctx, 0, 0, GFP_KERNEL);
 	mutex_unlock(&fpriv->context_idr_lock);
-	if (ret < 0)
-		goto err_name;
-
-	return 0;
+	if (ret >= 0)
+		goto out;
 
-err_name:
 	kfree(fetch_and_zero(&ctx->name));
 err_pid:
 	put_pid(fetch_and_zero(&ctx->pid));
+out:
 	return ret;
 }
 
@@ -638,11 +633,11 @@ int i915_gem_context_open(struct drm_i915_private *i915,
 	}
 
 	err = gem_context_register(ctx, file_priv);
-	if (err)
+	if (err < 0)
 		goto err_ctx;
 
-	GEM_BUG_ON(ctx->user_handle != DEFAULT_CONTEXT_HANDLE);
 	GEM_BUG_ON(i915_gem_context_is_kernel(ctx));
+	GEM_BUG_ON(err > 0);
 
 	return 0;
 
@@ -852,10 +847,10 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
 		return PTR_ERR(ctx);
 
 	ret = gem_context_register(ctx, file_priv);
-	if (ret)
+	if (ret < 0)
 		goto err_ctx;
 
-	args->ctx_id = ctx->user_handle;
+	args->ctx_id = ret;
 	DRM_DEBUG("HW context %d created\n", args->ctx_id);
 
 	return 0;
@@ -877,7 +872,7 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
 	if (args->pad != 0)
 		return -EINVAL;
 
-	if (args->ctx_id == DEFAULT_CONTEXT_HANDLE)
+	if (!args->ctx_id)
 		return -ENOENT;
 
 	if (mutex_lock_interruptible(&file_priv->context_idr_lock))
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index 8a1377691d6d..849b2a83c1ec 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -126,11 +126,6 @@ static inline void i915_gem_context_unpin_hw_id(struct i915_gem_context *ctx)
 	atomic_dec(&ctx->hw_id_pin_count);
 }
 
-static inline bool i915_gem_context_is_default(const struct i915_gem_context *c)
-{
-	return c->user_handle == DEFAULT_CONTEXT_HANDLE;
-}
-
 static inline bool i915_gem_context_is_kernel(struct i915_gem_context *ctx)
 {
 	return !ctx->file_priv;
diff --git a/drivers/gpu/drm/i915/i915_gem_context_types.h b/drivers/gpu/drm/i915/i915_gem_context_types.h
index 2bf19730eaa9..63ae8eb21939 100644
--- a/drivers/gpu/drm/i915/i915_gem_context_types.h
+++ b/drivers/gpu/drm/i915/i915_gem_context_types.h
@@ -129,15 +129,6 @@ struct i915_gem_context {
 	struct list_head active_engines;
 	struct mutex mutex;
 
-	/**
-	 * @user_handle: userspace identifier
-	 *
-	 * A unique per-file identifier is generated from
-	 * &drm_i915_file_private.contexts.
-	 */
-	u32 user_handle;
-#define DEFAULT_CONTEXT_HANDLE 0
-
 	struct i915_sched_attr sched;
 
 	/** hw_contexts: per-engine logical HW state */
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
index e8674347f589..b101f037b61f 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.c
+++ b/drivers/gpu/drm/i915/i915_gpu_error.c
@@ -454,8 +454,8 @@ static void error_print_context(struct drm_i915_error_state_buf *m,
 				const char *header,
 				const struct drm_i915_error_context *ctx)
 {
-	err_printf(m, "%s%s[%d] user_handle %d hw_id %d, prio %d, guilty %d active %d\n",
-		   header, ctx->comm, ctx->pid, ctx->handle, ctx->hw_id,
+	err_printf(m, "%s%s[%d] hw_id %d, prio %d, guilty %d active %d\n",
+		   header, ctx->comm, ctx->pid, ctx->hw_id,
 		   ctx->sched_attr.priority, ctx->guilty, ctx->active);
 }
 
@@ -758,11 +758,9 @@ static void __err_print_to_sgl(struct drm_i915_error_state_buf *m,
 		if (obj) {
 			err_puts(m, m->i915->engine[i]->name);
 			if (ee->context.pid)
-				err_printf(m, " (submitted by %s [%d], ctx %d [%d])",
+				err_printf(m, " (submitted by %s [%d])",
 					   ee->context.comm,
-					   ee->context.pid,
-					   ee->context.handle,
-					   ee->context.hw_id);
+					   ee->context.pid);
 			err_printf(m, " --- gtt_offset = 0x%08x %08x\n",
 				   upper_32_bits(obj->gtt_offset),
 				   lower_32_bits(obj->gtt_offset));
@@ -1330,7 +1328,6 @@ static void record_context(struct drm_i915_error_context *e,
 		rcu_read_unlock();
 	}
 
-	e->handle = ctx->user_handle;
 	e->hw_id = ctx->hw_id;
 	e->sched_attr = ctx->sched;
 	e->guilty = atomic_read(&ctx->guilty_count);
diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
index d011cb90bee1..5dc761e85d9d 100644
--- a/drivers/gpu/drm/i915/i915_gpu_error.h
+++ b/drivers/gpu/drm/i915/i915_gpu_error.h
@@ -116,7 +116,6 @@ struct i915_gpu_state {
 		struct drm_i915_error_context {
 			char comm[TASK_COMM_LEN];
 			pid_t pid;
-			u32 handle;
 			u32 hw_id;
 			int active;
 			int guilty;
diff --git a/drivers/gpu/drm/i915/selftests/mock_context.c b/drivers/gpu/drm/i915/selftests/mock_context.c
index 1cc8be732435..d63025dc1d83 100644
--- a/drivers/gpu/drm/i915/selftests/mock_context.c
+++ b/drivers/gpu/drm/i915/selftests/mock_context.c
@@ -98,7 +98,7 @@ live_context(struct drm_i915_private *i915, struct drm_file *file)
 		return ctx;
 
 	err = gem_context_register(ctx, file->driver_priv);
-	if (err)
+	if (err < 0)
 		goto err_ctx;
 
 	return ctx;
-- 
2.20.1
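
With the stored handle gone, the registration helper reduces to the common
idr idiom sketched below. This is a hedged, generic illustration with made-up
names rather than the driver function itself: the id is handed straight back
to the caller instead of being written onto the object.

#include <linux/gfp.h>
#include <linux/idr.h>
#include <linux/mutex.h>

/* Publish obj to userspace; nothing is stored on the object itself. */
static int publish_handle(struct idr *idr, struct mutex *lock, void *obj)
{
	int id;

	mutex_lock(lock);
	id = idr_alloc(idr, obj, 0, 0, GFP_KERNEL);
	mutex_unlock(lock);

	return id; /* >= 0: handle for userspace, < 0: error code */
}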

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/18] drm/i915/selftests: Provide stub reset functions (rev2)
  2019-03-19 11:57 [PATCH 01/18] drm/i915/selftests: Provide stub reset functions Chris Wilson
                   ` (19 preceding siblings ...)
  2019-03-19 12:40 ` ✗ Fi.CI.BAT: failure " Patchwork
@ 2019-03-19 13:12 ` Patchwork
  2019-03-19 13:21 ` ✗ Fi.CI.SPARSE: " Patchwork
                   ` (2 subsequent siblings)
  23 siblings, 0 replies; 47+ messages in thread
From: Patchwork @ 2019-03-19 13:12 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/18] drm/i915/selftests: Provide stub reset functions (rev2)
URL   : https://patchwork.freedesktop.org/series/58179/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
58fb48773f00 drm/i915/selftests: Provide stub reset functions
8ad2cbde4c11 drm/i915: Flush pages on acquisition
f4793bc16874 drm/i915: Move intel_engine_mask_t around for use by i915_request_types.h
-:652: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#652: 
new file mode 100644

-:657: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#657: FILE: drivers/gpu/drm/i915/i915_scheduler_types.h:1:
+/*

-:658: WARNING:SPDX_LICENSE_TAG: Misplaced SPDX-License-Identifier tag - use line 1 instead
#658: FILE: drivers/gpu/drm/i915/i915_scheduler_types.h:2:
+ * SPDX-License-Identifier: MIT

-:913: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#913: FILE: drivers/gpu/drm/i915/test_i915_scheduler_types_standalone.c:1:
+/*

-:914: WARNING:SPDX_LICENSE_TAG: Misplaced SPDX-License-Identifier tag - use line 1 instead
#914: FILE: drivers/gpu/drm/i915/test_i915_scheduler_types_standalone.c:2:
+ * SPDX-License-Identifier: MIT

total: 0 errors, 5 warnings, 0 checks, 730 lines checked
1b198564eec8 drm/i915: Separate GEM context construction and registration to userspace
e56b4575afb8 drm/i915: Introduce a mutex for file_priv->context_idr
9a2c41a4a900 drm/i915: Stop storing ctx->user_handle
ad0bb57b1067 drm/i915: Stop storing the context name as the timeline name
49a44c5b96d9 drm/i915: Introduce the i915_user_extension_method
-:72: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#72: 
new file mode 100644

-:77: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#77: FILE: drivers/gpu/drm/i915/i915_user_extensions.c:1:
+/*

-:78: WARNING:SPDX_LICENSE_TAG: Misplaced SPDX-License-Identifier tag - use line 1 instead
#78: FILE: drivers/gpu/drm/i915/i915_user_extensions.c:2:
+ * SPDX-License-Identifier: MIT

-:144: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#144: FILE: drivers/gpu/drm/i915/i915_user_extensions.h:1:
+/*

-:145: WARNING:SPDX_LICENSE_TAG: Misplaced SPDX-License-Identifier tag - use line 1 instead
#145: FILE: drivers/gpu/drm/i915/i915_user_extensions.h:2:
+ * SPDX-License-Identifier: MIT

-:178: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'ptr' - possible side-effects?
#178: FILE: drivers/gpu/drm/i915/i915_utils.h:114:
+#define container_of_user(ptr, type, member) ({				\
+	void __user *__mptr = (void __user *)(ptr);			\
+	BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) &&	\
+			 !__same_type(*(ptr), void),			\
+			 "pointer type mismatch in container_of()");	\
+	((type __user *)(__mptr - offsetof(type, member))); })

-:178: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'member' - possible side-effects?
#178: FILE: drivers/gpu/drm/i915/i915_utils.h:114:
+#define container_of_user(ptr, type, member) ({				\
+	void __user *__mptr = (void __user *)(ptr);			\
+	BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) &&	\
+			 !__same_type(*(ptr), void),			\
+			 "pointer type mismatch in container_of()");	\
+	((type __user *)(__mptr - offsetof(type, member))); })

-:178: CHECK:MACRO_ARG_PRECEDENCE: Macro argument 'member' may be better as '(member)' to avoid precedence issues
#178: FILE: drivers/gpu/drm/i915/i915_utils.h:114:
+#define container_of_user(ptr, type, member) ({				\
+	void __user *__mptr = (void __user *)(ptr);			\
+	BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) &&	\
+			 !__same_type(*(ptr), void),			\
+			 "pointer type mismatch in container_of()");	\
+	((type __user *)(__mptr - offsetof(type, member))); })

-:198: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'U' - possible side-effects?
#198: FILE: drivers/gpu/drm/i915/i915_utils.h:134:
+#define check_user_mbz(U) ({						\
+	typeof(*(U)) mbz__;						\
+	get_user(mbz__, (U)) ? -EFAULT : mbz__ ? -EINVAL : 0;		\
+})

total: 0 errors, 5 warnings, 4 checks, 153 lines checked
da478a2b2e84 drm/i915: Create/destroy VM (ppGTT) for use with contexts
-:693: WARNING:LINE_SPACING: Missing a blank line after declarations
#693: FILE: drivers/gpu/drm/i915/selftests/i915_gem_context.c:504:
+		struct drm_file *file;
+		IGT_TIMEOUT(end_time);

-:755: CHECK:MULTIPLE_ASSIGNMENTS: multiple assignments should be avoided
#755: FILE: drivers/gpu/drm/i915/selftests/i915_gem_context.c:566:
+		ncontexts = dw = 0;

-:829: WARNING:LINE_SPACING: Missing a blank line after declarations
#829: FILE: drivers/gpu/drm/i915/selftests/i915_gem_context.c:633:
+		struct drm_i915_gem_object *obj = NULL;
+		IGT_TIMEOUT(end_time);

-:901: CHECK:MULTIPLE_ASSIGNMENTS: multiple assignments should be avoided
#901: FILE: drivers/gpu/drm/i915/selftests/i915_gem_context.c:688:
+		ncontexts = dw = 0;

-:1056: WARNING:LONG_LINE: line over 100 characters
#1056: FILE: include/uapi/drm/i915_drm.h:407:
+#define DRM_IOCTL_I915_GEM_VM_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_CREATE, struct drm_i915_gem_vm_control)

-:1057: WARNING:LONG_LINE: line over 100 characters
#1057: FILE: include/uapi/drm/i915_drm.h:408:
+#define DRM_IOCTL_I915_GEM_VM_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_VM_DESTROY, struct drm_i915_gem_vm_control)

-:1057: WARNING:SPACING: space prohibited between function name and open parenthesis '('
#1057: FILE: include/uapi/drm/i915_drm.h:408:
+#define DRM_IOCTL_I915_GEM_VM_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_VM_DESTROY, struct drm_i915_gem_vm_control)

-:1057: ERROR:COMPLEX_MACRO: Macros with complex values should be enclosed in parentheses
#1057: FILE: include/uapi/drm/i915_drm.h:408:
+#define DRM_IOCTL_I915_GEM_VM_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_VM_DESTROY, struct drm_i915_gem_vm_control)

total: 1 errors, 5 warnings, 2 checks, 998 lines checked
ab79a55630ef drm/i915: Extend CONTEXT_CREATE to set parameters upon construction
-:28: WARNING:LONG_LINE: line over 100 characters
#28: FILE: drivers/gpu/drm/i915/i915_drv.c:3113:
+	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_CREATE_EXT, i915_gem_context_create_ioctl, DRM_RENDER_ALLOW),

-:541: WARNING:LONG_LINE: line over 100 characters
#541: FILE: include/uapi/drm/i915_drm.h:397:
+#define DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE, struct drm_i915_gem_context_create_ext)

-:541: WARNING:SPACING: space prohibited between function name and open parenthesis '('
#541: FILE: include/uapi/drm/i915_drm.h:397:
+#define DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE, struct drm_i915_gem_context_create_ext)

-:541: ERROR:COMPLEX_MACRO: Macros with complex values should be enclosed in parentheses
#541: FILE: include/uapi/drm/i915_drm.h:397:
+#define DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE, struct drm_i915_gem_context_create_ext)

total: 1 errors, 3 warnings, 0 checks, 701 lines checked
32f18f1740ab drm/i915: Allow contexts to share a single timeline across all engines
281831a71ef7 drm/i915: Allow userspace to clone contexts on creation
-:132: ERROR:BRACKET_SPACE: space prohibited before open square bracket '['
#132: FILE: drivers/gpu/drm/i915/i915_gem_context.c:1610:
+#define MAP(x, y) [ilog2(I915_CONTEXT_CLONE_##x)] = y

total: 1 errors, 0 warnings, 0 checks, 182 lines checked
a9295a3f81ed drm/i915: Allow a context to define its set of engines
-:482: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'p' - possible side-effects?
#482: FILE: drivers/gpu/drm/i915/i915_utils.h:107:
+#define check_struct_size(p, member, n, sz) \
+	likely(__check_struct_size(sizeof(*(p)), \
+				   sizeof(*(p)->member) + __must_be_array((p)->member), \
+				   n, sz))

-:482: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'member' - possible side-effects?
#482: FILE: drivers/gpu/drm/i915/i915_utils.h:107:
+#define check_struct_size(p, member, n, sz) \
+	likely(__check_struct_size(sizeof(*(p)), \
+				   sizeof(*(p)->member) + __must_be_array((p)->member), \
+				   n, sz))

-:482: CHECK:MACRO_ARG_PRECEDENCE: Macro argument 'member' may be better as '(member)' to avoid precedence issues
#482: FILE: drivers/gpu/drm/i915/i915_utils.h:107:
+#define check_struct_size(p, member, n, sz) \
+	likely(__check_struct_size(sizeof(*(p)), \
+				   sizeof(*(p)->member) + __must_be_array((p)->member), \
+				   n, sz))

total: 0 errors, 0 warnings, 3 checks, 490 lines checked
d64f177b1ad9 drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[]
8df62463e41e drm/i915: Load balancing across a virtual engine
-:955: WARNING:LINE_SPACING: Missing a blank line after declarations
#955: FILE: drivers/gpu/drm/i915/intel_lrc.c:3387:
+		struct intel_engine_cs *actual = ve->siblings[0];
+		intel_context_put(&ve->context);

total: 0 errors, 1 warnings, 0 checks, 1156 lines checked
9a7b936e9c73 drm/i915: Extend execution fence to support a callback
c584d1697f6f drm/i915/execlists: Virtual engine bonding
d1d3ac6c6916 drm/i915: Allow specification of parallel execbuf

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* ✗ Fi.CI.SPARSE: warning for series starting with [01/18] drm/i915/selftests: Provide stub reset functions (rev2)
  2019-03-19 11:57 [PATCH 01/18] drm/i915/selftests: Provide stub reset functions Chris Wilson
                   ` (20 preceding siblings ...)
  2019-03-19 13:12 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/18] drm/i915/selftests: Provide stub reset functions (rev2) Patchwork
@ 2019-03-19 13:21 ` Patchwork
  2019-03-19 13:32 ` ✓ Fi.CI.BAT: success " Patchwork
  2019-03-19 21:14 ` ✗ Fi.CI.IGT: failure " Patchwork
  23 siblings, 0 replies; 47+ messages in thread
From: Patchwork @ 2019-03-19 13:21 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/18] drm/i915/selftests: Provide stub reset functions (rev2)
URL   : https://patchwork.freedesktop.org/series/58179/
State : warning

== Summary ==

$ dim sparse origin/drm-tip
Sparse version: v0.5.2
Commit: drm/i915/selftests: Provide stub reset functions
Okay!

Commit: drm/i915: Flush pages on acquisition
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3558:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3566:16: warning: expression using sizeof(void)

Commit: drm/i915: Move intel_engine_mask_t around for use by i915_request_types.h
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3566:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3565:16: warning: expression using sizeof(void)

Commit: drm/i915: Separate GEM context construction and registration to userspace
Okay!

Commit: drm/i915: Introduce a mutex for file_priv->context_idr
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3565:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3567:16: warning: expression using sizeof(void)

Commit: drm/i915: Stop storing ctx->user_handle
Okay!

Commit: drm/i915: Stop storing the context name as the timeline name
Okay!

Commit: drm/i915: Introduce the i915_user_extension_method
Okay!

Commit: drm/i915: Create/destroy VM (ppGTT) for use with contexts
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3567:16: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/selftests/i915_gem_context.c:1132:25: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/selftests/i915_gem_context.c:1132:25: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3570:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_gem_context.c:1260:25: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_gem_context.c:1260:25: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/selftests/i915_gem_context.c:565:25: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/selftests/i915_gem_context.c:565:25: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_gem_context.c:569:33: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_gem_context.c:569:33: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_gem_context.c:691:33: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_gem_context.c:691:33: warning: expression using sizeof(void)

Commit: drm/i915: Extend CONTEXT_CREATE to set parameters upon construction
Okay!

Commit: drm/i915: Allow contexts to share a single timeline across all engines
Okay!

Commit: drm/i915: Allow userspace to clone contexts on creation
+drivers/gpu/drm/i915/i915_gem_context.c:1611:17: error: bad integer constant expression
+drivers/gpu/drm/i915/i915_gem_context.c:1612:17: error: bad integer constant expression
+drivers/gpu/drm/i915/i915_gem_context.c:1613:17: error: bad integer constant expression
+drivers/gpu/drm/i915/i915_gem_context.c:1614:17: error: bad integer constant expression
+drivers/gpu/drm/i915/i915_gem_context.c:1615:17: error: bad integer constant expression
-drivers/gpu/drm/i915/selftests/i915_gem_context.c:1260:25: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/selftests/i915_gem_context.c:1260:25: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/selftests/i915_gem_context.c:452:16: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/selftests/i915_gem_context.c:569:33: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/selftests/i915_gem_context.c:569:33: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/selftests/i915_gem_context.c:691:33: warning: expression using sizeof(void)
-drivers/gpu/drm/i915/selftests/i915_gem_context.c:691:33: warning: expression using sizeof(void)
-./include/linux/slab.h:664:13: warning: call with no type!

Commit: drm/i915: Allow a context to define its set of engines
-O:drivers/gpu/drm/i915/i915_gem_context.c:1611:17: error: bad integer constant expression
-O:drivers/gpu/drm/i915/i915_gem_context.c:1612:17: error: bad integer constant expression
-O:drivers/gpu/drm/i915/i915_gem_context.c:1613:17: error: bad integer constant expression
+drivers/gpu/drm/i915/i915_gem_context.c:1820:17: error: bad integer constant expression
+drivers/gpu/drm/i915/i915_gem_context.c:1821:17: error: bad integer constant expression
+drivers/gpu/drm/i915/i915_gem_context.c:1822:17: error: bad integer constant expression
+drivers/gpu/drm/i915/i915_gem_context.c:1823:17: error: bad integer constant expression
+drivers/gpu/drm/i915/i915_utils.h:84:13: error: incorrect type in conditional
+drivers/gpu/drm/i915/i915_utils.h:84:13: error: undefined identifier '__builtin_mul_overflow'
+drivers/gpu/drm/i915/i915_utils.h:84:13:    got void
+drivers/gpu/drm/i915/i915_utils.h:87:13: error: incorrect type in conditional
+drivers/gpu/drm/i915/i915_utils.h:87:13: error: undefined identifier '__builtin_add_overflow'
+drivers/gpu/drm/i915/i915_utils.h:87:13:    got void
+./include/linux/slab.h:664:13: error: not a function <noident>

Commit: drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[]
Okay!

Commit: drm/i915: Load balancing across a virtual engine
+./include/linux/overflow.h:285:13: error: incorrect type in conditional
+./include/linux/overflow.h:285:13: error: undefined identifier '__builtin_mul_overflow'
+./include/linux/overflow.h:285:13:    got void
+./include/linux/overflow.h:285:13: warning: call with no type!
+./include/linux/overflow.h:287:13: error: incorrect type in conditional
+./include/linux/overflow.h:287:13: error: undefined identifier '__builtin_add_overflow'
+./include/linux/overflow.h:287:13:    got void
+./include/linux/overflow.h:287:13: warning: call with no type!

Commit: drm/i915: Extend execution fence to support a callback
Okay!

Commit: drm/i915/execlists: Virtual engine bonding
Okay!

Commit: drm/i915: Allow specification of parallel execbuf
Okay!

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* ✓ Fi.CI.BAT: success for series starting with [01/18] drm/i915/selftests: Provide stub reset functions (rev2)
  2019-03-19 11:57 [PATCH 01/18] drm/i915/selftests: Provide stub reset functions Chris Wilson
                   ` (21 preceding siblings ...)
  2019-03-19 13:21 ` ✗ Fi.CI.SPARSE: " Patchwork
@ 2019-03-19 13:32 ` Patchwork
  2019-03-19 21:14 ` ✗ Fi.CI.IGT: failure " Patchwork
  23 siblings, 0 replies; 47+ messages in thread
From: Patchwork @ 2019-03-19 13:32 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/18] drm/i915/selftests: Provide stub reset functions (rev2)
URL   : https://patchwork.freedesktop.org/series/58179/
State : success

== Summary ==

CI Bug Log - changes from CI_DRM_5772 -> Patchwork_12513
====================================================

Summary
-------

  **SUCCESS**

  No regressions found.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/58179/revisions/2/mbox/

Known issues
------------

  Here are the changes found in Patchwork_12513 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@amdgpu/amd_basic@query-info:
    - fi-bsw-kefka:       NOTRUN -> SKIP [fdo#109271] +55

  * igt@gem_ctx_create@basic-files:
    - fi-gdg-551:         NOTRUN -> SKIP [fdo#109271] +106

  * igt@gem_exec_basic@readonly-bsd2:
    - fi-pnv-d510:        NOTRUN -> SKIP [fdo#109271] +76

  * igt@kms_busy@basic-flip-a:
    - fi-gdg-551:         NOTRUN -> FAIL [fdo#103182]
    - fi-bsw-n3050:       NOTRUN -> SKIP [fdo#109271] / [fdo#109278] +1

  * igt@kms_busy@basic-flip-c:
    - fi-gdg-551:         NOTRUN -> SKIP [fdo#109271] / [fdo#109278]
    - fi-bsw-kefka:       NOTRUN -> SKIP [fdo#109271] / [fdo#109278]
    - fi-pnv-d510:        NOTRUN -> SKIP [fdo#109271] / [fdo#109278]

  * igt@kms_chamelium@dp-hpd-fast:
    - fi-skl-6700k2:      NOTRUN -> SKIP [fdo#109271] +41

  * igt@kms_chamelium@hdmi-crc-fast:
    - fi-bsw-n3050:       NOTRUN -> SKIP [fdo#109271] +62

  * igt@kms_chamelium@hdmi-edid-read:
    - fi-hsw-peppy:       NOTRUN -> SKIP [fdo#109271] +46

  * igt@kms_frontbuffer_tracking@basic:
    - fi-hsw-peppy:       NOTRUN -> DMESG-FAIL [fdo#102614] / [fdo#107814]

  * igt@kms_pipe_crc_basic@read-crc-pipe-a-frame-sequence:
    - fi-byt-clapper:     PASS -> FAIL [fdo#103191] / [fdo#107362]

  
#### Possible fixes ####

  * igt@gem_exec_suspend@basic-s4-devices:
    - {fi-icl-y}:         DMESG-WARN [fdo#109638] -> PASS

  * igt@kms_pipe_crc_basic@nonblocking-crc-pipe-a:
    - fi-byt-clapper:     FAIL [fdo#107362] -> PASS

  * igt@kms_pipe_crc_basic@nonblocking-crc-pipe-a-frame-sequence:
    - fi-byt-clapper:     FAIL [fdo#103191] / [fdo#107362] -> PASS

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#102614]: https://bugs.freedesktop.org/show_bug.cgi?id=102614
  [fdo#103182]: https://bugs.freedesktop.org/show_bug.cgi?id=103182
  [fdo#103191]: https://bugs.freedesktop.org/show_bug.cgi?id=103191
  [fdo#107362]: https://bugs.freedesktop.org/show_bug.cgi?id=107362
  [fdo#107814]: https://bugs.freedesktop.org/show_bug.cgi?id=107814
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109278]: https://bugs.freedesktop.org/show_bug.cgi?id=109278
  [fdo#109638]: https://bugs.freedesktop.org/show_bug.cgi?id=109638


Participating hosts (41 -> 42)
------------------------------

  Additional (6): fi-bsw-n3050 fi-hsw-peppy fi-gdg-551 fi-pnv-d510 fi-bsw-kefka fi-skl-6700k2 
  Missing    (5): fi-ilk-m540 fi-hsw-4200u fi-byt-squawks fi-ctg-p8600 fi-bdw-samus 


Build changes
-------------

    * Linux: CI_DRM_5772 -> Patchwork_12513

  CI_DRM_5772: 16930b29faa6d6fe08f44affe7753c85db95258f @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4889: e3faf0fd49b7e3a763bf89e11fb4fdce81839da2 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_12513: d1d3ac6c6916aef024319e1a68735327b326391f @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

d1d3ac6c6916 drm/i915: Allow specification of parallel execbuf
c584d1697f6f drm/i915/execlists: Virtual engine bonding
9a7b936e9c73 drm/i915: Extend execution fence to support a callback
8df62463e41e drm/i915: Load balancing across a virtual engine
d64f177b1ad9 drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[]
a9295a3f81ed drm/i915: Allow a context to define its set of engines
281831a71ef7 drm/i915: Allow userspace to clone contexts on creation
32f18f1740ab drm/i915: Allow contexts to share a single timeline across all engines
ab79a55630ef drm/i915: Extend CONTEXT_CREATE to set parameters upon construction
da478a2b2e84 drm/i915: Create/destroy VM (ppGTT) for use with contexts
49a44c5b96d9 drm/i915: Introduce the i915_user_extension_method
ad0bb57b1067 drm/i915: Stop storing the context name as the timeline name
9a2c41a4a900 drm/i915: Stop storing ctx->user_handle
e56b4575afb8 drm/i915: Introduce a mutex for file_priv->context_idr
1b198564eec8 drm/i915: Separate GEM context construction and registration to userspace
f4793bc16874 drm/i915: Move intel_engine_mask_t around for use by i915_request_types.h
8ad2cbde4c11 drm/i915: Flush pages on acquisition
58fb48773f00 drm/i915/selftests: Provide stub reset functions

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12513/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 04/18] drm/i915: Separate GEM context construction and registration to userspace
  2019-03-19 11:57 ` [PATCH 04/18] drm/i915: Separate GEM context construction and registration to userspace Chris Wilson
@ 2019-03-19 13:41   ` Tvrtko Ursulin
  2019-03-19 13:56     ` Chris Wilson
  0 siblings, 1 reply; 47+ messages in thread
From: Tvrtko Ursulin @ 2019-03-19 13:41 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 19/03/2019 11:57, Chris Wilson wrote:
> In later patches, it became apparent that userspace can see a partially
> constructed GEM context and begin using it before it was ready, to much
> hilarity. Close this window of opportunity by lifting the registration of
> the context with userspace (the insertion of the context into the filp's
> idr) to the very end of the CONTEXT_CREATE ioctl.
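
The ordering the commit message describes amounts to the sketch below
(hypothetical helper names, not the actual driver code): construct the
context completely first and publish it to the idr last, so a concurrent
lookup can never observe a half-built context.

/* Sketch only: create_ctx()/publish_ctx() are illustrative stand-ins. */
ctx = create_ctx(i915);			/* fully construct; not yet visible */
if (IS_ERR(ctx))
	return PTR_ERR(ctx);

id = publish_ctx(file_priv, ctx);	/* idr insertion is the last step */
if (id < 0) {
	context_close(ctx);
	return id;
}
args->ctx_id = id;			/* userspace sees a complete context */
return 0;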

Thanks, persistent absence of change logs really helps me.

> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem_context.c       | 138 ++++++++++--------
>   drivers/gpu/drm/i915/i915_gem_gtt.c           |   7 +-
>   drivers/gpu/drm/i915/i915_gem_gtt.h           |   8 +-
>   drivers/gpu/drm/i915/selftests/huge_pages.c   |   2 +-
>   .../gpu/drm/i915/selftests/i915_gem_context.c |  12 +-
>   drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |   2 +-
>   drivers/gpu/drm/i915/selftests/mock_context.c |  17 ++-
>   7 files changed, 111 insertions(+), 75 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 2fa24326307a..dff4220df911 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -337,15 +337,13 @@ static u32 default_desc_template(const struct drm_i915_private *i915,
>   }
>   
>   static struct i915_gem_context *
> -__create_hw_context(struct drm_i915_private *dev_priv,
> -		    struct drm_i915_file_private *file_priv)
> +__create_context(struct drm_i915_private *dev_priv)
>   {
>   	struct i915_gem_context *ctx;
> -	int ret;
>   	int i;
>   
>   	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
> -	if (ctx == NULL)
> +	if (!ctx)
>   		return ERR_PTR(-ENOMEM);
>   
>   	kref_init(&ctx->ref);
> @@ -362,29 +360,6 @@ __create_hw_context(struct drm_i915_private *dev_priv,
>   	INIT_LIST_HEAD(&ctx->handles_list);
>   	INIT_LIST_HEAD(&ctx->hw_id_link);
>   
> -	/* Default context will never have a file_priv */
> -	ret = DEFAULT_CONTEXT_HANDLE;
> -	if (file_priv) {
> -		ret = idr_alloc(&file_priv->context_idr, ctx,
> -				DEFAULT_CONTEXT_HANDLE, 0, GFP_KERNEL);
> -		if (ret < 0)
> -			goto err_lut;
> -	}
> -	ctx->user_handle = ret;
> -
> -	ctx->file_priv = file_priv;
> -	if (file_priv) {
> -		ctx->pid = get_task_pid(current, PIDTYPE_PID);
> -		ctx->name = kasprintf(GFP_KERNEL, "%s[%d]/%x",
> -				      current->comm,
> -				      pid_nr(ctx->pid),
> -				      ctx->user_handle);
> -		if (!ctx->name) {
> -			ret = -ENOMEM;
> -			goto err_pid;
> -		}
> -	}
> -
>   	/* NB: Mark all slices as needing a remap so that when the context first
>   	 * loads it will restore whatever remap state already exists. If there
>   	 * is no remap info, it will be a NOP. */
> @@ -401,25 +376,10 @@ __create_hw_context(struct drm_i915_private *dev_priv,
>   		ctx->hang_timestamp[i] = jiffies - CONTEXT_FAST_HANG_JIFFIES;
>   
>   	return ctx;
> -
> -err_pid:
> -	put_pid(ctx->pid);
> -	idr_remove(&file_priv->context_idr, ctx->user_handle);
> -err_lut:
> -	context_close(ctx);
> -	return ERR_PTR(ret);
> -}
> -
> -static void __destroy_hw_context(struct i915_gem_context *ctx,
> -				 struct drm_i915_file_private *file_priv)
> -{
> -	idr_remove(&file_priv->context_idr, ctx->user_handle);
> -	context_close(ctx);
>   }
>   
>   static struct i915_gem_context *
> -i915_gem_create_context(struct drm_i915_private *dev_priv,
> -			struct drm_i915_file_private *file_priv)
> +i915_gem_create_context(struct drm_i915_private *dev_priv)
>   {
>   	struct i915_gem_context *ctx;
>   
> @@ -428,18 +388,18 @@ i915_gem_create_context(struct drm_i915_private *dev_priv,
>   	/* Reap the most stale context */
>   	contexts_free_first(dev_priv);
>   
> -	ctx = __create_hw_context(dev_priv, file_priv);
> +	ctx = __create_context(dev_priv);
>   	if (IS_ERR(ctx))
>   		return ctx;
>   
>   	if (HAS_FULL_PPGTT(dev_priv)) {
>   		struct i915_hw_ppgtt *ppgtt;
>   
> -		ppgtt = i915_ppgtt_create(dev_priv, file_priv);
> +		ppgtt = i915_ppgtt_create(dev_priv);
>   		if (IS_ERR(ppgtt)) {
>   			DRM_DEBUG_DRIVER("PPGTT setup failed (%ld)\n",
>   					 PTR_ERR(ppgtt));
> -			__destroy_hw_context(ctx, file_priv);
> +			context_close(ctx);
>   			return ERR_CAST(ppgtt);
>   		}
>   
> @@ -475,7 +435,7 @@ i915_gem_context_create_gvt(struct drm_device *dev)
>   	if (ret)
>   		return ERR_PTR(ret);
>   
> -	ctx = i915_gem_create_context(to_i915(dev), NULL);
> +	ctx = i915_gem_create_context(to_i915(dev));
>   	if (IS_ERR(ctx))
>   		goto out;
>   
> @@ -511,7 +471,7 @@ i915_gem_context_create_kernel(struct drm_i915_private *i915, int prio)
>   	struct i915_gem_context *ctx;
>   	int err;
>   
> -	ctx = i915_gem_create_context(i915, NULL);
> +	ctx = i915_gem_create_context(i915);
>   	if (IS_ERR(ctx))
>   		return ctx;
>   
> @@ -625,25 +585,74 @@ static int context_idr_cleanup(int id, void *p, void *data)
>   	return 0;
>   }
>   
> +static int gem_context_register(struct i915_gem_context *ctx,
> +				struct drm_i915_file_private *fpriv)
> +{
> +	int ret;
> +
> +	ctx->file_priv = fpriv;
> +	if (ctx->ppgtt)
> +		ctx->ppgtt->vm.file = fpriv;
> +
> +	ctx->pid = get_task_pid(current, PIDTYPE_PID);
> +	ctx->name = kasprintf(GFP_KERNEL, "%s[%d]",
> +			      current->comm, pid_nr(ctx->pid));
> +	if (!ctx->name) {
> +		ret = -ENOMEM;
> +		goto err_pid;
> +	}
> +
> +	/* And finally expose ourselves to userspace via the idr */
> +	ret = idr_alloc(&fpriv->context_idr, ctx,
> +			DEFAULT_CONTEXT_HANDLE, 0, GFP_KERNEL);
> +	if (ret < 0)
> +		goto err_name;
> +
> +	ctx->user_handle = ret;
> +
> +	return 0;
> +
> +err_name:
> +	kfree(fetch_and_zero(&ctx->name));
> +err_pid:
> +	put_pid(fetch_and_zero(&ctx->pid));
> +	return ret;
> +}
> +
>   int i915_gem_context_open(struct drm_i915_private *i915,
>   			  struct drm_file *file)
>   {
>   	struct drm_i915_file_private *file_priv = file->driver_priv;
>   	struct i915_gem_context *ctx;
> +	int err;
>   
>   	idr_init(&file_priv->context_idr);
>   
>   	mutex_lock(&i915->drm.struct_mutex);
> -	ctx = i915_gem_create_context(i915, file_priv);
> -	mutex_unlock(&i915->drm.struct_mutex);
> +
> +	ctx = i915_gem_create_context(i915);
>   	if (IS_ERR(ctx)) {
> -		idr_destroy(&file_priv->context_idr);
> -		return PTR_ERR(ctx);
> +		err = PTR_ERR(ctx);
> +		goto err;
>   	}
>   
> +	err = gem_context_register(ctx, file_priv);
> +	if (err)
> +		goto err_ctx;
> +
> +	GEM_BUG_ON(ctx->user_handle != DEFAULT_CONTEXT_HANDLE);
>   	GEM_BUG_ON(i915_gem_context_is_kernel(ctx));
>   
> +	mutex_unlock(&i915->drm.struct_mutex);
> +
>   	return 0;
> +
> +err_ctx:
> +	context_close(ctx);
> +err:
> +	mutex_unlock(&i915->drm.struct_mutex);
> +	idr_destroy(&file_priv->context_idr);
> +	return PTR_ERR(ctx);
>   }
>   
>   void i915_gem_context_close(struct drm_file *file)
> @@ -835,17 +844,28 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
>   	if (ret)
>   		return ret;
>   
> -	ctx = i915_gem_create_context(i915, file_priv);
> -	mutex_unlock(&dev->struct_mutex);
> -	if (IS_ERR(ctx))
> -		return PTR_ERR(ctx);
> +	ctx = i915_gem_create_context(i915);
> +	if (IS_ERR(ctx)) {
> +		ret = PTR_ERR(ctx);
> +		goto err_unlock;
> +	}
>   
> -	GEM_BUG_ON(i915_gem_context_is_kernel(ctx));
> +	ret = gem_context_register(ctx, file_priv);
> +	if (ret)
> +		goto err_ctx;
> +
> +	mutex_unlock(&dev->struct_mutex);
>   
>   	args->ctx_id = ctx->user_handle;
>   	DRM_DEBUG("HW context %d created\n", args->ctx_id);
>   
>   	return 0;
> +
> +err_ctx:
> +	context_close(ctx);
> +err_unlock:
> +	mutex_unlock(&dev->struct_mutex);
> +	return ret;
>   }
>   
>   int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
> @@ -870,7 +890,9 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
>   	if (ret)
>   		goto out;
>   
> -	__destroy_hw_context(ctx, file_priv);
> +	idr_remove(&file_priv->context_idr, ctx->user_handle);
> +	context_close(ctx);
> +
>   	mutex_unlock(&dev->struct_mutex);
>   
>   out:
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index b8055c8d4e71..b9e0e3a00223 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -2069,8 +2069,7 @@ __hw_ppgtt_create(struct drm_i915_private *i915)
>   }
>   
>   struct i915_hw_ppgtt *
> -i915_ppgtt_create(struct drm_i915_private *i915,
> -		  struct drm_i915_file_private *fpriv)
> +i915_ppgtt_create(struct drm_i915_private *i915)
>   {
>   	struct i915_hw_ppgtt *ppgtt;
>   
> @@ -2078,8 +2077,6 @@ i915_ppgtt_create(struct drm_i915_private *i915,
>   	if (IS_ERR(ppgtt))
>   		return ppgtt;
>   
> -	ppgtt->vm.file = fpriv;
> -
>   	trace_i915_ppgtt_create(&ppgtt->vm);
>   
>   	return ppgtt;
> @@ -2657,7 +2654,7 @@ int i915_gem_init_aliasing_ppgtt(struct drm_i915_private *i915)
>   	struct i915_hw_ppgtt *ppgtt;
>   	int err;
>   
> -	ppgtt = i915_ppgtt_create(i915, ERR_PTR(-EPERM));
> +	ppgtt = i915_ppgtt_create(i915);
>   	if (IS_ERR(ppgtt))
>   		return PTR_ERR(ppgtt);
>   
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index 47a54fbd30bf..8fe03067e698 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -603,15 +603,17 @@ int i915_gem_init_ggtt(struct drm_i915_private *dev_priv);
>   void i915_ggtt_cleanup_hw(struct drm_i915_private *dev_priv);
>   
>   int i915_ppgtt_init_hw(struct drm_i915_private *dev_priv);
> -void i915_ppgtt_release(struct kref *kref);
> -struct i915_hw_ppgtt *i915_ppgtt_create(struct drm_i915_private *dev_priv,
> -					struct drm_i915_file_private *fpriv);
> +
> +struct i915_hw_ppgtt *i915_ppgtt_create(struct drm_i915_private *dev_priv);
>   void i915_ppgtt_close(struct i915_address_space *vm);
> +void i915_ppgtt_release(struct kref *kref);
> +
>   static inline void i915_ppgtt_get(struct i915_hw_ppgtt *ppgtt)
>   {
>   	if (ppgtt)
>   		kref_get(&ppgtt->ref);
>   }
> +
>   static inline void i915_ppgtt_put(struct i915_hw_ppgtt *ppgtt)
>   {
>   	if (ppgtt)
> diff --git a/drivers/gpu/drm/i915/selftests/huge_pages.c b/drivers/gpu/drm/i915/selftests/huge_pages.c
> index 218cfc361de3..c5c8ba6c059f 100644
> --- a/drivers/gpu/drm/i915/selftests/huge_pages.c
> +++ b/drivers/gpu/drm/i915/selftests/huge_pages.c
> @@ -1710,7 +1710,7 @@ int i915_gem_huge_page_mock_selftests(void)
>   	mkwrite_device_info(dev_priv)->ppgtt_size = 48;
>   
>   	mutex_lock(&dev_priv->drm.struct_mutex);
> -	ppgtt = i915_ppgtt_create(dev_priv, ERR_PTR(-ENODEV));
> +	ppgtt = i915_ppgtt_create(dev_priv);
>   	if (IS_ERR(ppgtt)) {
>   		err = PTR_ERR(ppgtt);
>   		goto out_unlock;
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
> index 028bdbb5f3a7..ed72400f2395 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
> @@ -76,7 +76,7 @@ static int live_nop_switch(void *arg)
>   	}
>   
>   	for (n = 0; n < nctx; n++) {
> -		ctx[n] = i915_gem_create_context(i915, file->driver_priv);
> +		ctx[n] = live_context(i915, file);
>   		if (IS_ERR(ctx[n])) {
>   			err = PTR_ERR(ctx[n]);
>   			goto out_unlock;
> @@ -514,7 +514,7 @@ static int igt_ctx_exec(void *arg)
>   		struct i915_gem_context *ctx;
>   		unsigned int id;
>   
> -		ctx = i915_gem_create_context(i915, file->driver_priv);
> +		ctx = live_context(i915, file);
>   		if (IS_ERR(ctx)) {
>   			err = PTR_ERR(ctx);
>   			goto out_unlock;
> @@ -960,7 +960,7 @@ __igt_ctx_sseu(struct drm_i915_private *i915,
>   
>   	mutex_lock(&i915->drm.struct_mutex);
>   
> -	ctx = i915_gem_create_context(i915, file->driver_priv);
> +	ctx = live_context(i915, file);
>   	if (IS_ERR(ctx)) {
>   		ret = PTR_ERR(ctx);
>   		goto out_unlock;
> @@ -1070,7 +1070,7 @@ static int igt_ctx_readonly(void *arg)
>   	if (err)
>   		goto out_unlock;
>   
> -	ctx = i915_gem_create_context(i915, file->driver_priv);
> +	ctx = live_context(i915, file);
>   	if (IS_ERR(ctx)) {
>   		err = PTR_ERR(ctx);
>   		goto out_unlock;
> @@ -1390,13 +1390,13 @@ static int igt_vm_isolation(void *arg)
>   	if (err)
>   		goto out_unlock;
>   
> -	ctx_a = i915_gem_create_context(i915, file->driver_priv);
> +	ctx_a = live_context(i915, file);
>   	if (IS_ERR(ctx_a)) {
>   		err = PTR_ERR(ctx_a);
>   		goto out_unlock;
>   	}
>   
> -	ctx_b = i915_gem_create_context(i915, file->driver_priv);
> +	ctx_b = live_context(i915, file);
>   	if (IS_ERR(ctx_b)) {
>   		err = PTR_ERR(ctx_b);
>   		goto out_unlock;
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> index 826fd51c331e..01084f6b4fb7 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> @@ -1010,7 +1010,7 @@ static int exercise_ppgtt(struct drm_i915_private *dev_priv,
>   		return PTR_ERR(file);
>   
>   	mutex_lock(&dev_priv->drm.struct_mutex);
> -	ppgtt = i915_ppgtt_create(dev_priv, file->driver_priv);
> +	ppgtt = i915_ppgtt_create(dev_priv);
>   	if (IS_ERR(ppgtt)) {
>   		err = PTR_ERR(ppgtt);
>   		goto out_unlock;
> diff --git a/drivers/gpu/drm/i915/selftests/mock_context.c b/drivers/gpu/drm/i915/selftests/mock_context.c
> index 8efa6892c6cd..1cc8be732435 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_context.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_context.c
> @@ -88,9 +88,24 @@ void mock_init_contexts(struct drm_i915_private *i915)
>   struct i915_gem_context *
>   live_context(struct drm_i915_private *i915, struct drm_file *file)
>   {
> +	struct i915_gem_context *ctx;
> +	int err;
> +
>   	lockdep_assert_held(&i915->drm.struct_mutex);
>   
> -	return i915_gem_create_context(i915, file->driver_priv);
> +	ctx = i915_gem_create_context(i915);
> +	if (IS_ERR(ctx))
> +		return ctx;
> +
> +	err = gem_context_register(ctx, file->driver_priv);
> +	if (err)
> +		goto err_ctx;
> +
> +	return ctx;
> +
> +err_ctx:
> +	i915_gem_context_put(ctx);
> +	return ERR_PTR(err);
>   }
>   
>   struct i915_gem_context *
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 04/18] drm/i915: Separate GEM context construction and registration to userspace
  2019-03-19 13:41   ` Tvrtko Ursulin
@ 2019-03-19 13:56     ` Chris Wilson
  0 siblings, 0 replies; 47+ messages in thread
From: Chris Wilson @ 2019-03-19 13:56 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-03-19 13:41:37)
> 
> On 19/03/2019 11:57, Chris Wilson wrote:
> > In later patches, it became apparent that userspace can see a partially
> > constructed GEM context and begin using it before it was ready, to much
> > hilarity. Close this window of opportunity by lifting the registration of
> > the context with userspace (the insertion of the context into the filp's
> > idr) to the very end of the CONTEXT_CREATE ioctl.
> 
> Thanks, persistent absence of change logs really helps me.

And nothing of significance changed, that was the point.

The debug name of the context was changed to avoid the self-referential
problem of needing the ctx-id to set ctx->name before the context (and
hence its id) has been inserted into the idr; that could have been
pushed under the idr_lock, but I chose instead to remove the extra
user_handle, as evidenced by the next few patches.
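
Concretely, the name construction changes from needing the handle to
not needing it (same lines as in the diff above):

	/* before: the name wants the handle that idr_alloc() has yet to return */
	ctx->name = kasprintf(GFP_KERNEL, "%s[%d]/%x",
			      current->comm, pid_nr(ctx->pid), ctx->user_handle);

	/* after: no handle in the debug name, so it can be set before registration */
	ctx->name = kasprintf(GFP_KERNEL, "%s[%d]",
			      current->comm, pid_nr(ctx->pid));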
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* ✗ Fi.CI.IGT: failure for series starting with [01/18] drm/i915/selftests: Provide stub reset functions (rev2)
  2019-03-19 11:57 [PATCH 01/18] drm/i915/selftests: Provide stub reset functions Chris Wilson
                   ` (22 preceding siblings ...)
  2019-03-19 13:32 ` ✓ Fi.CI.BAT: success " Patchwork
@ 2019-03-19 21:14 ` Patchwork
  23 siblings, 0 replies; 47+ messages in thread
From: Patchwork @ 2019-03-19 21:14 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/18] drm/i915/selftests: Provide stub reset functions (rev2)
URL   : https://patchwork.freedesktop.org/series/58179/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_5772_full -> Patchwork_12513_full
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_12513_full absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_12513_full, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_12513_full:

### IGT changes ###

#### Possible regressions ####

  * igt@gem_wait@write-wait-default:
    - shard-iclb:         PASS -> INCOMPLETE

  * igt@kms_busy@extended-modeset-hang-newfb-render-b:
    - shard-snb:          NOTRUN -> DMESG-WARN

  * igt@kms_busy@extended-modeset-hang-newfb-with-reset-render-a:
    - shard-hsw:          PASS -> DMESG-WARN

  * igt@kms_busy@extended-modeset-hang-newfb-with-reset-render-b:
    - shard-iclb:         PASS -> DMESG-WARN

  * igt@kms_busy@extended-pageflip-modeset-hang-oldfb-render-b:
    - shard-glk:          PASS -> DMESG-WARN

  
#### Warnings ####

  * igt@kms_psr@psr2_primary_blt:
    - shard-iclb:         SKIP [fdo#109441] -> FAIL

  
#### Suppressed ####

  The following results come from untrusted machines, tests, or statuses.
  They do not affect the overall result.

  * {igt@kms_plane@pixel-format-pipe-b-planes-source-clamping}:
    - shard-iclb:         PASS -> FAIL

  
Known issues
------------

  Here are the changes found in Patchwork_12513_full that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_exec_store@pages-bsd2:
    - shard-hsw:          NOTRUN -> SKIP [fdo#109271] +5

  * igt@gem_tiled_fence_blits@normal:
    - shard-iclb:         PASS -> TIMEOUT [fdo#109673]

  * igt@i915_pm_rpm@i2c:
    - shard-iclb:         PASS -> DMESG-WARN [fdo#109982]

  * igt@kms_cursor_crc@cursor-256x85-random:
    - shard-apl:          PASS -> FAIL [fdo#103232] +2

  * igt@kms_cursor_crc@cursor-size-change:
    - shard-glk:          PASS -> FAIL [fdo#103232]

  * igt@kms_cursor_legacy@cursor-vs-flip-legacy:
    - shard-iclb:         PASS -> FAIL [fdo#103355]

  * igt@kms_cursor_legacy@pipe-c-torture-bo:
    - shard-kbl:          PASS -> DMESG-WARN [fdo#107122]

  * igt@kms_draw_crc@draw-method-xrgb2101010-mmap-gtt-untiled:
    - shard-skl:          PASS -> FAIL [fdo#103184] / [fdo#108472]

  * igt@kms_draw_crc@draw-method-xrgb8888-mmap-cpu-untiled:
    - shard-skl:          NOTRUN -> FAIL [fdo#108472]

  * igt@kms_fbcon_fbt@fbc-suspend:
    - shard-skl:          PASS -> INCOMPLETE [fdo#104108]

  * igt@kms_fbcon_fbt@psr-suspend:
    - shard-skl:          NOTRUN -> FAIL [fdo#103833] +1

  * igt@kms_flip@flip-vs-expired-vblank-interruptible:
    - shard-skl:          PASS -> FAIL [fdo#105363]

  * igt@kms_frontbuffer_tracking@fbc-1p-offscren-pri-indfb-draw-pwrite:
    - shard-iclb:         PASS -> FAIL [fdo#103167] +4

  * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-onoff:
    - shard-glk:          PASS -> FAIL [fdo#103167] +1

  * igt@kms_frontbuffer_tracking@fbc-2p-scndscrn-cur-indfb-draw-blt:
    - shard-kbl:          NOTRUN -> SKIP [fdo#109271] +7

  * igt@kms_frontbuffer_tracking@fbc-stridechange:
    - shard-skl:          NOTRUN -> FAIL [fdo#105683]

  * igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-shrfb-msflip-blt:
    - shard-skl:          NOTRUN -> FAIL [fdo#105682]

  * igt@kms_frontbuffer_tracking@fbcpsr-rgb101010-draw-mmap-cpu:
    - shard-iclb:         PASS -> FAIL [fdo#105682] / [fdo#109247]

  * igt@kms_frontbuffer_tracking@psr-1p-primscrn-pri-indfb-draw-mmap-cpu:
    - shard-iclb:         PASS -> FAIL [fdo#109247] +27

  * igt@kms_pipe_crc_basic@nonblocking-crc-pipe-e:
    - shard-snb:          NOTRUN -> SKIP [fdo#109271] / [fdo#109278] +11

  * igt@kms_pipe_crc_basic@suspend-read-crc-pipe-a:
    - shard-hsw:          PASS -> INCOMPLETE [fdo#103540]

  * igt@kms_pipe_crc_basic@suspend-read-crc-pipe-f:
    - shard-skl:          NOTRUN -> SKIP [fdo#109271] / [fdo#109278] +7

  * igt@kms_plane_alpha_blend@pipe-a-alpha-basic:
    - shard-hsw:          NOTRUN -> SKIP [fdo#109271] / [fdo#109278] +1

  * igt@kms_plane_alpha_blend@pipe-a-coverage-7efc:
    - shard-skl:          NOTRUN -> FAIL [fdo#107815] / [fdo#108145]

  * igt@kms_plane_alpha_blend@pipe-b-alpha-transparant-fb:
    - shard-kbl:          NOTRUN -> FAIL [fdo#108145]
    - shard-skl:          NOTRUN -> FAIL [fdo#108145] +1

  * igt@kms_plane_scaling@pipe-a-scaler-with-clipping-clamping:
    - shard-glk:          PASS -> SKIP [fdo#109271] / [fdo#109278]

  * igt@kms_psr@psr2_primary_page_flip:
    - shard-iclb:         PASS -> SKIP [fdo#109441] +2

  * igt@kms_psr@sprite_blt:
    - shard-iclb:         PASS -> FAIL [fdo#107383] +3

  * igt@kms_universal_plane@disable-primary-vs-flip-pipe-e:
    - shard-kbl:          NOTRUN -> SKIP [fdo#109271] / [fdo#109278]

  * igt@kms_vblank@pipe-c-ts-continuation-dpms-suspend:
    - shard-apl:          PASS -> FAIL [fdo#104894] +1

  * igt@perf_pmu@busy-accuracy-50-vcs1:
    - shard-skl:          NOTRUN -> SKIP [fdo#109271] +87

  * igt@perf_pmu@busy-start-vcs1:
    - shard-snb:          NOTRUN -> SKIP [fdo#109271] +144

  
#### Possible fixes ####

  * igt@gem_softpin@noreloc-s3:
    - shard-skl:          INCOMPLETE [fdo#104108] / [fdo#107773] -> PASS

  * igt@i915_selftest@live_workarounds:
    - shard-iclb:         DMESG-FAIL [fdo#108954] -> PASS

  * igt@kms_ccs@pipe-a-crc-sprite-planes-basic:
    - shard-glk:          FAIL [fdo#108145] -> PASS

  * igt@kms_chv_cursor_fail@pipe-c-256x256-left-edge:
    - shard-skl:          FAIL [fdo#104671] -> PASS

  * igt@kms_cursor_crc@cursor-128x128-suspend:
    - shard-apl:          FAIL [fdo#103191] / [fdo#103232] -> PASS

  * igt@kms_cursor_crc@cursor-128x42-sliding:
    - shard-apl:          FAIL [fdo#103232] -> PASS

  * igt@kms_cursor_crc@cursor-64x21-offscreen:
    - shard-skl:          FAIL [fdo#103232] -> PASS

  * igt@kms_cursor_crc@cursor-alpha-opaque:
    - shard-apl:          FAIL [fdo#109350] -> PASS

  * igt@kms_cursor_legacy@2x-long-cursor-vs-flip-legacy:
    - shard-hsw:          FAIL [fdo#105767] -> PASS

  * igt@kms_flip@2x-flip-vs-expired-vblank:
    - shard-glk:          FAIL [fdo#105363] -> PASS

  * igt@kms_flip@flip-vs-expired-vblank:
    - shard-skl:          FAIL [fdo#105363] -> PASS

  * igt@kms_flip@flip-vs-suspend-interruptible:
    - shard-skl:          INCOMPLETE [fdo#109507] -> PASS

  * igt@kms_flip_tiling@flip-yf-tiled:
    - shard-skl:          FAIL [fdo#108145] -> PASS

  * igt@kms_frontbuffer_tracking@fbc-1p-offscren-pri-indfb-draw-mmap-gtt:
    - shard-skl:          FAIL [fdo#105682] -> PASS +3

  * igt@kms_frontbuffer_tracking@fbc-1p-primscrn-spr-indfb-onoff:
    - shard-apl:          FAIL [fdo#103167] -> PASS +1

  * igt@kms_frontbuffer_tracking@fbc-1p-rte:
    - shard-apl:          FAIL [fdo#103167] / [fdo#105682] -> PASS

  * igt@kms_frontbuffer_tracking@fbc-2p-primscrn-spr-indfb-draw-mmap-cpu:
    - shard-glk:          FAIL [fdo#103167] -> PASS +3

  * igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-pri-indfb-draw-blt:
    - shard-iclb:         FAIL [fdo#109247] -> PASS +10

  * igt@kms_frontbuffer_tracking@fbcpsr-1p-primscrn-pri-shrfb-draw-blt:
    - shard-iclb:         FAIL [fdo#103167] -> PASS +4

  * {igt@kms_plane@pixel-format-pipe-c-planes-source-clamping}:
    - shard-iclb:         FAIL -> PASS

  * {igt@kms_plane@plane-position-covered-pipe-b-planes}:
    - shard-apl:          FAIL [fdo#110038] -> PASS

  * {igt@kms_plane@plane-position-covered-pipe-c-planes}:
    - shard-iclb:         FAIL [fdo#110038] -> PASS

  * igt@kms_plane_alpha_blend@pipe-c-coverage-7efc:
    - shard-skl:          FAIL [fdo#107815] -> PASS +1

  * {igt@kms_plane_multiple@atomic-pipe-a-tiling-y}:
    - shard-iclb:         FAIL [fdo#110037] -> PASS +1

  * {igt@kms_plane_multiple@atomic-pipe-a-tiling-yf}:
    - shard-apl:          FAIL [fdo#110037] -> PASS

  * {igt@kms_plane_multiple@atomic-pipe-b-tiling-none}:
    - shard-glk:          FAIL [fdo#110037] -> PASS +2

  * igt@kms_psr@cursor_mmap_gtt:
    - shard-iclb:         FAIL [fdo#107383] -> PASS +5

  * igt@kms_psr@psr2_cursor_plane_onoff:
    - shard-iclb:         SKIP [fdo#109441] -> PASS +2

  * igt@kms_rotation_crc@multiplane-rotation:
    - shard-kbl:          INCOMPLETE [fdo#103665] -> PASS

  * igt@kms_rotation_crc@multiplane-rotation-cropping-bottom:
    - shard-glk:          DMESG-FAIL [fdo#105763] / [fdo#106538] -> PASS
    - shard-kbl:          DMESG-FAIL [fdo#105763] -> PASS

  * igt@kms_rotation_crc@sprite-rotation-180:
    - shard-hsw:          INCOMPLETE [fdo#103540] -> PASS

  * igt@kms_setmode@basic:
    - shard-apl:          FAIL [fdo#99912] -> PASS
    - shard-kbl:          FAIL [fdo#99912] -> PASS

  * igt@kms_vblank@pipe-c-ts-continuation-suspend:
    - shard-iclb:         FAIL [fdo#104894] -> PASS +1

  * igt@perf@polling:
    - shard-iclb:         FAIL [fdo#108587] -> PASS

  
#### Warnings ####

  * igt@kms_dp_dsc@basic-dsc-enable-edp:
    - shard-iclb:         SKIP [fdo#109349] -> FAIL [fdo#109358]

  * igt@kms_plane_scaling@pipe-c-scaler-with-rotation:
    - shard-glk:          FAIL [fdo#110098] -> SKIP [fdo#109271] / [fdo#109278]

  
  {name}: This element is suppressed. This means it is ignored when computing
          the status of the difference (SUCCESS, WARNING, or FAILURE).

  [fdo#103167]: https://bugs.freedesktop.org/show_bug.cgi?id=103167
  [fdo#103184]: https://bugs.freedesktop.org/show_bug.cgi?id=103184
  [fdo#103191]: https://bugs.freedesktop.org/show_bug.cgi?id=103191
  [fdo#103232]: https://bugs.freedesktop.org/show_bug.cgi?id=103232
  [fdo#103355]: https://bugs.freedesktop.org/show_bug.cgi?id=103355
  [fdo#103540]: https://bugs.freedesktop.org/show_bug.cgi?id=103540
  [fdo#103665]: https://bugs.freedesktop.org/show_bug.cgi?id=103665
  [fdo#103833]: https://bugs.freedesktop.org/show_bug.cgi?id=103833
  [fdo#104108]: https://bugs.freedesktop.org/show_bug.cgi?id=104108
  [fdo#104671]: https://bugs.freedesktop.org/show_bug.cgi?id=104671
  [fdo#104894]: https://bugs.freedesktop.org/show_bug.cgi?id=104894
  [fdo#105363]: https://bugs.freedesktop.org/show_bug.cgi?id=105363
  [fdo#105682]: https://bugs.freedesktop.org/show_bug.cgi?id=105682
  [fdo#105683]: https://bugs.freedesktop.org/show_bug.cgi?id=105683
  [fdo#105763]: https://bugs.freedesktop.org/show_bug.cgi?id=105763
  [fdo#105767]: https://bugs.freedesktop.org/show_bug.cgi?id=105767
  [fdo#106538]: https://bugs.freedesktop.org/show_bug.cgi?id=106538
  [fdo#107122]: https://bugs.freedesktop.org/show_bug.cgi?id=107122
  [fdo#107383]: https://bugs.freedesktop.org/show_bug.cgi?id=107383
  [fdo#107773]: https://bugs.freedesktop.org/show_bug.cgi?id=107773
  [fdo#107815]: https://bugs.freedesktop.org/show_bug.cgi?id=107815
  [fdo#108145]: https://bugs.freedesktop.org/show_bug.cgi?id=108145
  [fdo#108472]: https://bugs.freedesktop.org/show_bug.cgi?id=108472
  [fdo#108587]: https://bugs.freedesktop.org/show_bug.cgi?id=108587
  [fdo#108954]: https://bugs.freedesktop.org/show_bug.cgi?id=108954
  [fdo#109247]: https://bugs.freedesktop.org/show_bug.cgi?id=109247
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109278]: https://bugs.freedesktop.org/show_bug.cgi?id=109278
  [fdo#109349]: https://bugs.freedesktop.org/show_bug.cgi?id=109349
  [fdo#109350]: https://bugs.freedesktop.org/show_bug.cgi?id=109350
  [fdo#109358]: https://bugs.freedesktop.org/show_bug.cgi?id=109358
  [fdo#109441]: https://bugs.freedesktop.org/show_bug.cgi?id=109441
  [fdo#109507]: https://bugs.freedesktop.org/show_bug.cgi?id=109507
  [fdo#109673]: https://bugs.freedesktop.org/show_bug.cgi?id=109673
  [fdo#109982]: https://bugs.freedesktop.org/show_bug.cgi?id=109982
  [fdo#110037]: https://bugs.freedesktop.org/show_bug.cgi?id=110037
  [fdo#110038]: https://bugs.freedesktop.org/show_bug.cgi?id=110038
  [fdo#110098]: https://bugs.freedesktop.org/show_bug.cgi?id=110098
  [fdo#99912]: https://bugs.freedesktop.org/show_bug.cgi?id=99912


Participating hosts (10 -> 10)
------------------------------

  No changes in participating hosts


Build changes
-------------

    * Linux: CI_DRM_5772 -> Patchwork_12513

  CI_DRM_5772: 16930b29faa6d6fe08f44affe7753c85db95258f @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4889: e3faf0fd49b7e3a763bf89e11fb4fdce81839da2 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_12513: d1d3ac6c6916aef024319e1a68735327b326391f @ git://anongit.freedesktop.org/gfx-ci/linux
  piglit_4509: fdc5a4ca11124ab8413c7988896eec4c97336694 @ git://anongit.freedesktop.org/piglit

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12513/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 05/18] drm/i915: Introduce a mutex for file_priv->context_idr
  2019-03-19 11:57 ` [PATCH 05/18] drm/i915: Introduce a mutex for file_priv->context_idr Chris Wilson
@ 2019-03-20 10:36   ` Tvrtko Ursulin
  0 siblings, 0 replies; 47+ messages in thread
From: Tvrtko Ursulin @ 2019-03-20 10:36 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 19/03/2019 11:57, Chris Wilson wrote:
> Define a mutex for the exclusive use of interacting with the per-file
> context-idr, that was previously guarded by struct_mutex. This allows us
> to reduce the coverage of struct_mutex, with a view to removing the last
> bits coordinating GEM context later. (In the short term, we avoid taking
> struct_mutex while using the extended constructor functions, preventing
> some nasty recursion.)
> 
> v2: s/context_lock/context_idr_lock/
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_drv.h         |  2 ++
>   drivers/gpu/drm/i915/i915_gem_context.c | 47 +++++++++++--------------
>   2 files changed, 23 insertions(+), 26 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 86080a6e0f45..219348121897 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -216,7 +216,9 @@ struct drm_i915_file_private {
>    */
>   #define DRM_I915_THROTTLE_JIFFIES msecs_to_jiffies(20)
>   	} mm;
> +
>   	struct idr context_idr;
> +	struct mutex context_idr_lock; /* guards context_idr */
>   
>   	unsigned int bsd_engine;
>   
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index dff4220df911..799684d05704 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -579,9 +579,7 @@ void i915_gem_contexts_fini(struct drm_i915_private *i915)
>   
>   static int context_idr_cleanup(int id, void *p, void *data)
>   {
> -	struct i915_gem_context *ctx = p;
> -
> -	context_close(ctx);
> +	context_close(p);
>   	return 0;
>   }
>   
> @@ -603,13 +601,15 @@ static int gem_context_register(struct i915_gem_context *ctx,
>   	}
>   
>   	/* And finally expose ourselves to userspace via the idr */
> +	mutex_lock(&fpriv->context_idr_lock);
>   	ret = idr_alloc(&fpriv->context_idr, ctx,
>   			DEFAULT_CONTEXT_HANDLE, 0, GFP_KERNEL);
> +	if (ret >= 0)
> +		ctx->user_handle = ret;
> +	mutex_unlock(&fpriv->context_idr_lock);
>   	if (ret < 0)
>   		goto err_name;
>   
> -	ctx->user_handle = ret;
> -
>   	return 0;
>   
>   err_name:
> @@ -627,10 +627,11 @@ int i915_gem_context_open(struct drm_i915_private *i915,
>   	int err;
>   
>   	idr_init(&file_priv->context_idr);
> +	mutex_init(&file_priv->context_idr_lock);
>   
>   	mutex_lock(&i915->drm.struct_mutex);
> -
>   	ctx = i915_gem_create_context(i915);
> +	mutex_unlock(&i915->drm.struct_mutex);
>   	if (IS_ERR(ctx)) {
>   		err = PTR_ERR(ctx);
>   		goto err;
> @@ -643,14 +644,14 @@ int i915_gem_context_open(struct drm_i915_private *i915,
>   	GEM_BUG_ON(ctx->user_handle != DEFAULT_CONTEXT_HANDLE);
>   	GEM_BUG_ON(i915_gem_context_is_kernel(ctx));
>   
> -	mutex_unlock(&i915->drm.struct_mutex);
> -
>   	return 0;
>   
>   err_ctx:
> +	mutex_lock(&i915->drm.struct_mutex);
>   	context_close(ctx);
> -err:
>   	mutex_unlock(&i915->drm.struct_mutex);
> +err:
> +	mutex_destroy(&file_priv->context_idr_lock);
>   	idr_destroy(&file_priv->context_idr);
>   	return PTR_ERR(ctx);
>   }
> @@ -663,6 +664,7 @@ void i915_gem_context_close(struct drm_file *file)
>   
>   	idr_for_each(&file_priv->context_idr, context_idr_cleanup, NULL);
>   	idr_destroy(&file_priv->context_idr);
> +	mutex_destroy(&file_priv->context_idr_lock);
>   }
>   
>   static struct i915_request *
> @@ -845,25 +847,22 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
>   		return ret;
>   
>   	ctx = i915_gem_create_context(i915);
> -	if (IS_ERR(ctx)) {
> -		ret = PTR_ERR(ctx);
> -		goto err_unlock;
> -	}
> +	mutex_unlock(&dev->struct_mutex);
> +	if (IS_ERR(ctx))
> +		return PTR_ERR(ctx);
>   
>   	ret = gem_context_register(ctx, file_priv);
>   	if (ret)
>   		goto err_ctx;
>   
> -	mutex_unlock(&dev->struct_mutex);
> -
>   	args->ctx_id = ctx->user_handle;
>   	DRM_DEBUG("HW context %d created\n", args->ctx_id);
>   
>   	return 0;
>   
>   err_ctx:
> +	mutex_lock(&dev->struct_mutex);
>   	context_close(ctx);
> -err_unlock:
>   	mutex_unlock(&dev->struct_mutex);
>   	return ret;
>   }
> @@ -874,7 +873,6 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
>   	struct drm_i915_gem_context_destroy *args = data;
>   	struct drm_i915_file_private *file_priv = file->driver_priv;
>   	struct i915_gem_context *ctx;
> -	int ret;
>   
>   	if (args->pad != 0)
>   		return -EINVAL;
> @@ -882,21 +880,18 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
>   	if (args->ctx_id == DEFAULT_CONTEXT_HANDLE)
>   		return -ENOENT;
>   
> -	ctx = i915_gem_context_lookup(file_priv, args->ctx_id);
> +	if (mutex_lock_interruptible(&file_priv->context_idr_lock))
> +		return -EINTR;
> +
> +	ctx = idr_remove(&file_priv->context_idr, args->ctx_id);
> +	mutex_unlock(&file_priv->context_idr_lock);
>   	if (!ctx)
>   		return -ENOENT;
>   
> -	ret = mutex_lock_interruptible(&dev->struct_mutex);
> -	if (ret)
> -		goto out;
> -
> -	idr_remove(&file_priv->context_idr, ctx->user_handle);
> +	mutex_lock(&dev->struct_mutex);
>   	context_close(ctx);
> -
>   	mutex_unlock(&dev->struct_mutex);
>   
> -out:
> -	i915_gem_context_put(ctx);
>   	return 0;
>   }
>   
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH v2] drm/i915: Stop storing ctx->user_handle
  2019-03-19 12:58   ` [PATCH v2] " Chris Wilson
@ 2019-03-20 10:43     ` Tvrtko Ursulin
  0 siblings, 0 replies; 47+ messages in thread
From: Tvrtko Ursulin @ 2019-03-20 10:43 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 19/03/2019 12:58, Chris Wilson wrote:
> The user_handle need only be known by userspace for it to lookup the
> context via the idr; internally we have no use for it.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_debugfs.c           |  5 ++--
>   drivers/gpu/drm/i915/i915_gem_context.c       | 23 ++++++++-----------
>   drivers/gpu/drm/i915/i915_gem_context.h       |  5 ----
>   drivers/gpu/drm/i915/i915_gem_context_types.h |  9 --------
>   drivers/gpu/drm/i915/i915_gpu_error.c         | 11 ++++-----
>   drivers/gpu/drm/i915/i915_gpu_error.h         |  1 -
>   drivers/gpu/drm/i915/selftests/mock_context.c |  2 +-
>   7 files changed, 16 insertions(+), 40 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index f4a07190a0e8..7970770f23a9 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -409,9 +409,8 @@ static void print_context_stats(struct seq_file *m,
>   
>   			rcu_read_lock();
>   			task = pid_task(ctx->pid ?: file->pid, PIDTYPE_PID);
> -			snprintf(name, sizeof(name), "%s/%d",
> -				 task ? task->comm : "<unknown>",
> -				 ctx->user_handle);
> +			snprintf(name, sizeof(name), "%s",
> +				 task ? task->comm : "<unknown>");
>   			rcu_read_unlock();
>   
>   			print_file_stats(m, name, stats);
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 799684d05704..95c5103e15a5 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -602,20 +602,15 @@ static int gem_context_register(struct i915_gem_context *ctx,
>   
>   	/* And finally expose ourselves to userspace via the idr */
>   	mutex_lock(&fpriv->context_idr_lock);
> -	ret = idr_alloc(&fpriv->context_idr, ctx,
> -			DEFAULT_CONTEXT_HANDLE, 0, GFP_KERNEL);
> -	if (ret >= 0)
> -		ctx->user_handle = ret;
> +	ret = idr_alloc(&fpriv->context_idr, ctx, 0, 0, GFP_KERNEL);
>   	mutex_unlock(&fpriv->context_idr_lock);
> -	if (ret < 0)
> -		goto err_name;
> -
> -	return 0;
> +	if (ret >= 0)
> +		goto out;
>   
> -err_name:
>   	kfree(fetch_and_zero(&ctx->name));
>   err_pid:
>   	put_pid(fetch_and_zero(&ctx->pid));
> +out:
>   	return ret;
>   }
>   
> @@ -638,11 +633,11 @@ int i915_gem_context_open(struct drm_i915_private *i915,
>   	}
>   
>   	err = gem_context_register(ctx, file_priv);
> -	if (err)
> +	if (err < 0)
>   		goto err_ctx;
>   
> -	GEM_BUG_ON(ctx->user_handle != DEFAULT_CONTEXT_HANDLE);
>   	GEM_BUG_ON(i915_gem_context_is_kernel(ctx));
> +	GEM_BUG_ON(err > 0);
>   
>   	return 0;
>   
> @@ -852,10 +847,10 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
>   		return PTR_ERR(ctx);
>   
>   	ret = gem_context_register(ctx, file_priv);
> -	if (ret)
> +	if (ret < 0)
>   		goto err_ctx;
>   
> -	args->ctx_id = ctx->user_handle;
> +	args->ctx_id = ret;
>   	DRM_DEBUG("HW context %d created\n", args->ctx_id);
>   
>   	return 0;
> @@ -877,7 +872,7 @@ int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
>   	if (args->pad != 0)
>   		return -EINVAL;
>   
> -	if (args->ctx_id == DEFAULT_CONTEXT_HANDLE)
> +	if (!args->ctx_id)
>   		return -ENOENT;
>   
>   	if (mutex_lock_interruptible(&file_priv->context_idr_lock))
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
> index 8a1377691d6d..849b2a83c1ec 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.h
> +++ b/drivers/gpu/drm/i915/i915_gem_context.h
> @@ -126,11 +126,6 @@ static inline void i915_gem_context_unpin_hw_id(struct i915_gem_context *ctx)
>   	atomic_dec(&ctx->hw_id_pin_count);
>   }
>   
> -static inline bool i915_gem_context_is_default(const struct i915_gem_context *c)
> -{
> -	return c->user_handle == DEFAULT_CONTEXT_HANDLE;
> -}
> -
>   static inline bool i915_gem_context_is_kernel(struct i915_gem_context *ctx)
>   {
>   	return !ctx->file_priv;
> diff --git a/drivers/gpu/drm/i915/i915_gem_context_types.h b/drivers/gpu/drm/i915/i915_gem_context_types.h
> index 2bf19730eaa9..63ae8eb21939 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context_types.h
> +++ b/drivers/gpu/drm/i915/i915_gem_context_types.h
> @@ -129,15 +129,6 @@ struct i915_gem_context {
>   	struct list_head active_engines;
>   	struct mutex mutex;
>   
> -	/**
> -	 * @user_handle: userspace identifier
> -	 *
> -	 * A unique per-file identifier is generated from
> -	 * &drm_i915_file_private.contexts.
> -	 */
> -	u32 user_handle;
> -#define DEFAULT_CONTEXT_HANDLE 0
> -
>   	struct i915_sched_attr sched;
>   
>   	/** hw_contexts: per-engine logical HW state */
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.c b/drivers/gpu/drm/i915/i915_gpu_error.c
> index e8674347f589..b101f037b61f 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.c
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.c
> @@ -454,8 +454,8 @@ static void error_print_context(struct drm_i915_error_state_buf *m,
>   				const char *header,
>   				const struct drm_i915_error_context *ctx)
>   {
> -	err_printf(m, "%s%s[%d] user_handle %d hw_id %d, prio %d, guilty %d active %d\n",
> -		   header, ctx->comm, ctx->pid, ctx->handle, ctx->hw_id,
> +	err_printf(m, "%s%s[%d] hw_id %d, prio %d, guilty %d active %d\n",
> +		   header, ctx->comm, ctx->pid, ctx->hw_id,
>   		   ctx->sched_attr.priority, ctx->guilty, ctx->active);
>   }
>   
> @@ -758,11 +758,9 @@ static void __err_print_to_sgl(struct drm_i915_error_state_buf *m,
>   		if (obj) {
>   			err_puts(m, m->i915->engine[i]->name);
>   			if (ee->context.pid)
> -				err_printf(m, " (submitted by %s [%d], ctx %d [%d])",
> +				err_printf(m, " (submitted by %s [%d])",
>   					   ee->context.comm,
> -					   ee->context.pid,
> -					   ee->context.handle,
> -					   ee->context.hw_id);
> +					   ee->context.pid);
>   			err_printf(m, " --- gtt_offset = 0x%08x %08x\n",
>   				   upper_32_bits(obj->gtt_offset),
>   				   lower_32_bits(obj->gtt_offset));
> @@ -1330,7 +1328,6 @@ static void record_context(struct drm_i915_error_context *e,
>   		rcu_read_unlock();
>   	}
>   
> -	e->handle = ctx->user_handle;
>   	e->hw_id = ctx->hw_id;
>   	e->sched_attr = ctx->sched;
>   	e->guilty = atomic_read(&ctx->guilty_count);
> diff --git a/drivers/gpu/drm/i915/i915_gpu_error.h b/drivers/gpu/drm/i915/i915_gpu_error.h
> index d011cb90bee1..5dc761e85d9d 100644
> --- a/drivers/gpu/drm/i915/i915_gpu_error.h
> +++ b/drivers/gpu/drm/i915/i915_gpu_error.h
> @@ -116,7 +116,6 @@ struct i915_gpu_state {
>   		struct drm_i915_error_context {
>   			char comm[TASK_COMM_LEN];
>   			pid_t pid;
> -			u32 handle;
>   			u32 hw_id;
>   			int active;
>   			int guilty;
> diff --git a/drivers/gpu/drm/i915/selftests/mock_context.c b/drivers/gpu/drm/i915/selftests/mock_context.c
> index 1cc8be732435..d63025dc1d83 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_context.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_context.c
> @@ -98,7 +98,7 @@ live_context(struct drm_i915_private *i915, struct drm_file *file)
>   		return ctx;
>   
>   	err = gem_context_register(ctx, file->driver_priv);
> -	if (err)
> +	if (err < 0)
>   		goto err_ctx;
>   
>   	return ctx;
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 02/18] drm/i915: Flush pages on acquisition
  2019-03-19 11:57 ` [PATCH 02/18] drm/i915: Flush pages on acquisition Chris Wilson
@ 2019-03-20 11:41   ` Matthew Auld
  2019-03-20 11:48     ` Chris Wilson
  0 siblings, 1 reply; 47+ messages in thread
From: Matthew Auld @ 2019-03-20 11:41 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Intel Graphics Development

On Tue, 19 Mar 2019 at 11:58, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>
> When we return pages to the system, we ensure that they are marked as
> being in the CPU domain since any external access is uncontrolled and we
> must assume the worst. This means that we need to always flush the pages
> on acquisition if we need to use them on the GPU, and from the beginning
> have used set-domain. Set-domain is overkill for the purpose as it is a
> general synchronisation barrier, but our intent is to only flush the
> pages being swapped in. If we move that flush into the pages acquisition
> phase, we know then that when we have obj->mm.pages, they are coherent
> with the GPU and need only maintain that status without resorting to
> heavy handed use of set-domain.
>
> The principal knock-on effect for userspace is through mmap-gtt
> pagefaulting. Our uAPI has always implied that the GTT mmap was async
> (especially as when any pagefault occurs is unpredictable to userspace)
> and so userspace had to apply explicit domain control itself
> (set-domain). However, swapping is transparent to the kernel, and so on
> first fault we need to acquire the pages and make them coherent for
> access through the GTT. Our use of set-domain here leaks into the uABI
> that the first pagefault was synchronous. This is unintentional and
> barring a few igt should go unnoticed; nevertheless we bump the uABI
> version for mmap-gtt to reflect the change in behaviour.
>
> Another implication of the change is that gem_create() is presumed to
> create an object that is coherent with the CPU and is in the CPU write
> domain, so a set-domain(CPU) following a gem_create() would be a minor
> operation that merely checked whether we could allocate all pages for
> the object. On applying this change, a set-domain(CPU) causes a clflush
> as we acquire the pages. This will have a small impact on mesa as we move
> the clflush here on !llc from execbuf time to create, but that should
> have minimal performance impact as the same clflush exists but is now
> done early and because of the clflush issue, userspace recycles bo and
> so should resist allocating fresh objects.
>
> Internally, the presumption that objects are created in the CPU
> write-domain and remain so through writes to obj->mm.mapping is more
> prevalent than I expect; but easy enough to catch and apply a manual
> flush.
>
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Matthew Auld <matthew.william.auld@gmail.com>
> Cc: Daniele Ceraolo Spurio <daniele.ceraolospurio@intel.com>
> Cc: Antonio Argenziano <antonio.argenziano@intel.com>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h               |  8 +++
>  drivers/gpu/drm/i915/i915_gem.c               | 57 ++++++++++++-----
>  drivers/gpu/drm/i915/i915_gem_dmabuf.c        |  1 +
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c    |  7 +--
>  drivers/gpu/drm/i915/i915_gem_render_state.c  |  2 +-
>  drivers/gpu/drm/i915/i915_perf.c              |  4 +-
>  drivers/gpu/drm/i915/intel_engine_cs.c        |  4 +-
>  drivers/gpu/drm/i915/intel_lrc.c              | 63 +++++++++----------
>  drivers/gpu/drm/i915/intel_ringbuffer.c       | 62 +++++++-----------
>  drivers/gpu/drm/i915/selftests/huge_pages.c   |  5 +-
>  .../gpu/drm/i915/selftests/i915_gem_context.c | 17 ++---
>  .../gpu/drm/i915/selftests/i915_gem_dmabuf.c  |  1 +
>  .../gpu/drm/i915/selftests/i915_gem_evict.c   |  4 +-
>  drivers/gpu/drm/i915/selftests/i915_request.c | 14 ++---
>  drivers/gpu/drm/i915/selftests/igt_spinner.c  |  2 +-
>  .../gpu/drm/i915/selftests/intel_hangcheck.c  |  2 +-
>  drivers/gpu/drm/i915/selftests/intel_lrc.c    |  5 +-
>  .../drm/i915/selftests/intel_workarounds.c    |  3 +
>  18 files changed, 127 insertions(+), 134 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index c65c2e6649df..395aa9d5ba02 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2959,6 +2959,14 @@ i915_coherent_map_type(struct drm_i915_private *i915)
>  void *__must_check i915_gem_object_pin_map(struct drm_i915_gem_object *obj,
>                                            enum i915_map_type type);
>
> +void __i915_gem_object_flush_map(struct drm_i915_gem_object *obj,
> +                                unsigned long offset,
> +                                unsigned long size);
> +static inline void i915_gem_object_flush_map(struct drm_i915_gem_object *obj)
> +{
> +       __i915_gem_object_flush_map(obj, 0, obj->base.size);
> +}
> +
>  /**
>   * i915_gem_object_unpin_map - releases an earlier mapping
>   * @obj: the object to unmap
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index b7086c8d4726..41d96414ef18 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1713,6 +1713,9 @@ static unsigned int tile_row_pages(const struct drm_i915_gem_object *obj)
>   * 2 - Recognise WC as a separate cache domain so that we can flush the
>   *     delayed writes via GTT before performing direct access via WC.
>   *
> + * 3 - Remove implicit set-domain(GTT) and synchronisation on initial
> + *     pagefault; swapin remains transparent.
> + *
>   * Restrictions:
>   *
>   *  * snoopable objects cannot be accessed via the GTT. It can cause machine
> @@ -1740,7 +1743,7 @@ static unsigned int tile_row_pages(const struct drm_i915_gem_object *obj)
>   */
>  int i915_gem_mmap_gtt_version(void)
>  {
> -       return 2;
> +       return 3;
>  }
>
>  static inline struct i915_ggtt_view
> @@ -1808,17 +1811,6 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
>
>         trace_i915_gem_object_fault(obj, page_offset, true, write);
>
> -       /* Try to flush the object off the GPU first without holding the lock.
> -        * Upon acquiring the lock, we will perform our sanity checks and then
> -        * repeat the flush holding the lock in the normal manner to catch cases
> -        * where we are gazumped.
> -        */
> -       ret = i915_gem_object_wait(obj,
> -                                  I915_WAIT_INTERRUPTIBLE,
> -                                  MAX_SCHEDULE_TIMEOUT);
> -       if (ret)
> -               goto err;
> -
>         ret = i915_gem_object_pin_pages(obj);
>         if (ret)
>                 goto err;
> @@ -1874,10 +1866,6 @@ vm_fault_t i915_gem_fault(struct vm_fault *vmf)
>                 goto err_unlock;
>         }
>
> -       ret = i915_gem_object_set_to_gtt_domain(obj, write);
> -       if (ret)
> -               goto err_unpin;
> -
>         ret = i915_vma_pin_fence(vma);
>         if (ret)
>                 goto err_unpin;
> @@ -2534,6 +2522,14 @@ void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj,
>
>         lockdep_assert_held(&obj->mm.lock);
>
> +       /* Make the pages coherent with the GPU (flushing any swapin). */
> +       if (obj->cache_dirty) {
> +               obj->write_domain = 0;
> +               if (i915_gem_object_has_struct_page(obj))
> +                       drm_clflush_sg(pages);
> +               obj->cache_dirty = false;
> +       }

Is it worth adding some special casing here for volatile objects, so
that we avoid doing the clflush_sg every time we do set_pages for
!llc?

if (obj->cache_dirty && obj->mm.madvise == WILLNEED)

Or is that meh?
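
Concretely, a rough sketch only -- assuming the field is obj->mm.madv
and the constant I915_MADV_WILLNEED, as used elsewhere in i915, rather
than the shorthand above:

	/* Skip the flush for objects already marked as reapable/volatile. */
	if (obj->cache_dirty && obj->mm.madv == I915_MADV_WILLNEED) {
		obj->write_domain = 0;
		if (i915_gem_object_has_struct_page(obj))
			drm_clflush_sg(pages);
		obj->cache_dirty = false;
	}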
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 02/18] drm/i915: Flush pages on acquisition
  2019-03-20 11:41   ` Matthew Auld
@ 2019-03-20 11:48     ` Chris Wilson
  2019-03-20 12:26       ` Matthew Auld
  0 siblings, 1 reply; 47+ messages in thread
From: Chris Wilson @ 2019-03-20 11:48 UTC (permalink / raw)
  To: Matthew Auld; +Cc: Intel Graphics Development

Quoting Matthew Auld (2019-03-20 11:41:52)
> On Tue, 19 Mar 2019 at 11:58, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > @@ -2534,6 +2522,14 @@ void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj,
> >
> >         lockdep_assert_held(&obj->mm.lock);
> >
> > +       /* Make the pages coherent with the GPU (flushing any swapin). */
> > +       if (obj->cache_dirty) {
> > +               obj->write_domain = 0;
> > +               if (i915_gem_object_has_struct_page(obj))
> > +                       drm_clflush_sg(pages);
> > +               obj->cache_dirty = false;
> > +       }
> 
> Is it worth adding some special casing here for volatile objects, so
> that we avoid doing the clflush_sg every time we do set_pages for
> !llc?
> 
> if (obj->cache_dirty && obj->mm.madvise == WILLNEED)
> 
> Or is that meh?

No, even for volatile objects we have to be careful with what remains in
the CPU cache as that may obscure updates to the underlying page. We see
the very same problem with speculative cacheline loading.

A DONTNEED object should fail before it gets allocated pages :)
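
Something along these lines at page acquisition, say (the function name
and error code here are illustrative, not lifted from this series):

	static int __object_get_pages(struct drm_i915_gem_object *obj)
	{
		/* Purgeable (DONTNEED) objects must not acquire fresh pages. */
		if (unlikely(obj->mm.madv != I915_MADV_WILLNEED)) {
			DRM_DEBUG("Attempting to obtain a purgeable object\n");
			return -EFAULT;
		}

		return obj->ops->get_pages(obj);
	}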

If it becomes DONTNEED in flight? Haven't considered that case, but I
think it is best we keep the pages around as we have people waiting for
them, so we should consider them in use and so only reap them after this
period of activity.

One agenda I have for local-memory is the per-fd private page pool,
where we can stuff cache flushed pages for reuse in !llc. However, all
the testing many, many years ago said that if userspace is doing the
right thing, such a cache is fruitless.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 02/18] drm/i915: Flush pages on acquisition
  2019-03-20 11:48     ` Chris Wilson
@ 2019-03-20 12:26       ` Matthew Auld
  2019-03-20 12:35         ` Chris Wilson
  0 siblings, 1 reply; 47+ messages in thread
From: Matthew Auld @ 2019-03-20 12:26 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Intel Graphics Development

On Wed, 20 Mar 2019 at 11:48, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>
> Quoting Matthew Auld (2019-03-20 11:41:52)
> > On Tue, 19 Mar 2019 at 11:58, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > > @@ -2534,6 +2522,14 @@ void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj,
> > >
> > >         lockdep_assert_held(&obj->mm.lock);
> > >
> > > +       /* Make the pages coherent with the GPU (flushing any swapin). */
> > > +       if (obj->cache_dirty) {
> > > +               obj->write_domain = 0;
> > > +               if (i915_gem_object_has_struct_page(obj))
> > > +                       drm_clflush_sg(pages);
> > > +               obj->cache_dirty = false;
> > > +       }
> >
> > Is it worth adding some special casing here for volatile objects, so
> > that we avoid doing the clflush_sg every time we do set_pages for
> > !llc?
> >
> > if (obj->cache_dirty && obj->mm.madvise == WILLNEED)
> >
> > Or is that meh?
>
> No, even for volatile objects we have to be careful with what remains in
> the CPU cache as that may obscure updates to the underlying page. We see
> the very same problem with speculative cacheline loading.
>
> A DONTNEED object should fail before it gets allocated pages :)

I was talking about kernel internal objects, which are marked as
DONTNEED just before we call set_pages(), and for that case it's
surely up to the caller to flush things before they even think of
doing the unpin (since it's volatile).
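
i.e. the ordering I mean is roughly (local variable names assumed):

	/* internal/volatile object: marked reapable while still pinned... */
	obj->mm.madv = I915_MADV_DONTNEED;
	/* ...and with this patch the clflush now happens on every set_pages() */
	__i915_gem_object_set_pages(obj, st, sg_page_sizes);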

>
> If it becomes DONTNEED in flight? Haven't considered that case, but I
> think it is best we keep the pages around as we have people waiting for
> them, so we should consider them in use and so only reap them after this
> period of activity.
>
> One agenda I have for local-memory is the per-fd private page pool,
> where we can stuff cache flushed pages for reuse in !llc. However, all
> the testing many, many years ago said that if userspace is doing the
> right thing, such a cache is fruitless.
> -Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 02/18] drm/i915: Flush pages on acquisition
  2019-03-20 12:26       ` Matthew Auld
@ 2019-03-20 12:35         ` Chris Wilson
  2019-03-20 14:24           ` Matthew Auld
  2019-03-21  0:16           ` Chris Wilson
  0 siblings, 2 replies; 47+ messages in thread
From: Chris Wilson @ 2019-03-20 12:35 UTC (permalink / raw)
  To: Matthew Auld; +Cc: Intel Graphics Development

Quoting Matthew Auld (2019-03-20 12:26:00)
> On Wed, 20 Mar 2019 at 11:48, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> >
> > Quoting Matthew Auld (2019-03-20 11:41:52)
> > > On Tue, 19 Mar 2019 at 11:58, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > > > @@ -2534,6 +2522,14 @@ void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj,
> > > >
> > > >         lockdep_assert_held(&obj->mm.lock);
> > > >
> > > > +       /* Make the pages coherent with the GPU (flushing any swapin). */
> > > > +       if (obj->cache_dirty) {
> > > > +               obj->write_domain = 0;
> > > > +               if (i915_gem_object_has_struct_page(obj))
> > > > +                       drm_clflush_sg(pages);
> > > > +               obj->cache_dirty = false;
> > > > +       }
> > >
> > > Is it worth adding some special casing here for volatile objects, so
> > > that we avoid doing the clflush_sg every time we do set_pages for
> > > !llc?
> > >
> > > if (obj->cache_dirty && obj->mm.madvise == WILLNEED)
> > >
> > > Or is that meh?
> >
> > No, even for volatile objects we have to be careful with what remains in
> > the CPU cache as that may obscure updates to the underlying page. We see
> > the very same problem with speculative cacheline loading.
> >
> > A DONTNEED object should fail before it gets allocated pages :)
> 
> I was talking about kernel internal objects, which are marked as
> DONTNEED just before we call set_pages(), and for that case it's
> surely up to the caller to flush things before they even think of
> doing the unpin (since it's volatile).

But those objects also become WILLNEED at that point, and may still need
to be flushed.

The cost of the extra flushes is a worry, but not enough for me to be
concerned about. I think the convention that get_pages == coherent on
gpu improves quite a bit of our internal rummaging around and prevents
the ABI nightmare of mmap_gtt/mmap_offset. Will this flush remain inside
set_pages()? No, I don't think it will as pushing it into the callers
outside of the mm.lock itself makes sense, but I didn't think that was
of paramount importance compared to the uABI and can be done later.
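
Sketch of where that could end up -- a helper the acquiring caller runs
outside obj->mm.lock (the helper name is made up for illustration):

	static void flush_acquired_pages(struct drm_i915_gem_object *obj,
					 struct sg_table *pages)
	{
		if (!obj->cache_dirty)
			return;

		/* Same flush as in set_pages(), just outside the mm.lock. */
		obj->write_domain = 0;
		if (i915_gem_object_has_struct_page(obj))
			drm_clflush_sg(pages);
		obj->cache_dirty = false;
	}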
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 07/18] drm/i915: Stop storing the context name as the timeline name
  2019-03-19 11:57 ` [PATCH 07/18] drm/i915: Stop storing the context name as the timeline name Chris Wilson
@ 2019-03-20 12:46   ` Tvrtko Ursulin
  0 siblings, 0 replies; 47+ messages in thread
From: Tvrtko Ursulin @ 2019-03-20 12:46 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 19/03/2019 11:57, Chris Wilson wrote:
> The timeline->name is only used for convenience in pretty printing the
> i915_request.fence->ops->get_timeline_name() and it is just as
> convenient to pull it from the gem_context directly. The few instances
> of its use inside GEM_TRACE() have proven more of a nuisance than
> helpful, so not worth saving imo.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_gem_context.c        | 5 ++---
>   drivers/gpu/drm/i915/i915_request.c            | 7 ++-----
>   drivers/gpu/drm/i915/i915_timeline.c           | 5 +----
>   drivers/gpu/drm/i915/i915_timeline.h           | 2 --
>   drivers/gpu/drm/i915/i915_timeline_types.h     | 1 -
>   drivers/gpu/drm/i915/intel_engine_cs.c         | 3 +--
>   drivers/gpu/drm/i915/intel_lrc.c               | 2 +-
>   drivers/gpu/drm/i915/intel_ringbuffer.c        | 4 +---
>   drivers/gpu/drm/i915/selftests/i915_timeline.c | 6 +++---
>   drivers/gpu/drm/i915/selftests/mock_engine.c   | 9 ++-------
>   10 files changed, 13 insertions(+), 31 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 95c5103e15a5..196982f38a28 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -673,9 +673,8 @@ last_request_on_engine(struct i915_timeline *timeline,
>   	rq = i915_active_request_raw(&timeline->last_request,
>   				     &engine->i915->drm.struct_mutex);
>   	if (rq && rq->engine == engine) {
> -		GEM_TRACE("last request for %s on engine %s: %llx:%llu\n",
> -			  timeline->name, engine->name,
> -			  rq->fence.context, rq->fence.seqno);
> +		GEM_TRACE("last request on engine %s: %llx:%llu\n",
> +			  engine->name, rq->fence.context, rq->fence.seqno);
>   		GEM_BUG_ON(rq->timeline != timeline);
>   		return rq;
>   	}
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 0a3d94517d0a..1529824d7c61 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -66,7 +66,7 @@ static const char *i915_fence_get_timeline_name(struct dma_fence *fence)
>   	if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>   		return "signaled";
>   
> -	return to_request(fence)->timeline->name;
> +	return to_request(fence)->gem_context->name ?: "[i915]";
>   }
>   
>   static bool i915_fence_signaled(struct dma_fence *fence)
> @@ -167,7 +167,6 @@ static void advance_ring(struct i915_request *request)
>   		 * is just about to be. Either works, if we miss the last two
>   		 * noops - they are safe to be replayed on a reset.
>   		 */
> -		GEM_TRACE("marking %s as inactive\n", ring->timeline->name);
>   		tail = READ_ONCE(request->tail);
>   		list_del(&ring->active_link);
>   	} else {
> @@ -1064,10 +1063,8 @@ void i915_request_add(struct i915_request *request)
>   	__i915_active_request_set(&timeline->last_request, request);
>   
>   	list_add_tail(&request->ring_link, &ring->request_list);
> -	if (list_is_first(&request->ring_link, &ring->request_list)) {
> -		GEM_TRACE("marking %s as active\n", ring->timeline->name);
> +	if (list_is_first(&request->ring_link, &ring->request_list))
>   		list_add(&ring->active_link, &request->i915->gt.active_rings);
> -	}
>   	request->i915->gt.active_engines |= request->engine->mask;
>   	request->emitted_jiffies = jiffies;
>   
> diff --git a/drivers/gpu/drm/i915/i915_timeline.c b/drivers/gpu/drm/i915/i915_timeline.c
> index 8484ba6e51d1..2f4907364920 100644
> --- a/drivers/gpu/drm/i915/i915_timeline.c
> +++ b/drivers/gpu/drm/i915/i915_timeline.c
> @@ -197,7 +197,6 @@ static void cacheline_free(struct i915_timeline_cacheline *cl)
>   
>   int i915_timeline_init(struct drm_i915_private *i915,
>   		       struct i915_timeline *timeline,
> -		       const char *name,
>   		       struct i915_vma *hwsp)
>   {
>   	void *vaddr;
> @@ -213,7 +212,6 @@ int i915_timeline_init(struct drm_i915_private *i915,
>   	BUILD_BUG_ON(KSYNCMAP < I915_NUM_ENGINES);
>   
>   	timeline->i915 = i915;
> -	timeline->name = name;
>   	timeline->pin_count = 0;
>   	timeline->has_initial_breadcrumb = !hwsp;
>   	timeline->hwsp_cacheline = NULL;
> @@ -342,7 +340,6 @@ void i915_timeline_fini(struct i915_timeline *timeline)
>   
>   struct i915_timeline *
>   i915_timeline_create(struct drm_i915_private *i915,
> -		     const char *name,
>   		     struct i915_vma *global_hwsp)
>   {
>   	struct i915_timeline *timeline;
> @@ -352,7 +349,7 @@ i915_timeline_create(struct drm_i915_private *i915,
>   	if (!timeline)
>   		return ERR_PTR(-ENOMEM);
>   
> -	err = i915_timeline_init(i915, timeline, name, global_hwsp);
> +	err = i915_timeline_init(i915, timeline, global_hwsp);
>   	if (err) {
>   		kfree(timeline);
>   		return ERR_PTR(err);
> diff --git a/drivers/gpu/drm/i915/i915_timeline.h b/drivers/gpu/drm/i915/i915_timeline.h
> index 454aa72aee18..4ca7f80bdf6d 100644
> --- a/drivers/gpu/drm/i915/i915_timeline.h
> +++ b/drivers/gpu/drm/i915/i915_timeline.h
> @@ -33,7 +33,6 @@
>   
>   int i915_timeline_init(struct drm_i915_private *i915,
>   		       struct i915_timeline *tl,
> -		       const char *name,
>   		       struct i915_vma *hwsp);
>   void i915_timeline_fini(struct i915_timeline *tl);
>   
> @@ -58,7 +57,6 @@ i915_timeline_set_subclass(struct i915_timeline *timeline,
>   
>   struct i915_timeline *
>   i915_timeline_create(struct drm_i915_private *i915,
> -		     const char *name,
>   		     struct i915_vma *global_hwsp);
>   
>   static inline struct i915_timeline *
> diff --git a/drivers/gpu/drm/i915/i915_timeline_types.h b/drivers/gpu/drm/i915/i915_timeline_types.h
> index d42053544d7c..1f5b55d9ffb5 100644
> --- a/drivers/gpu/drm/i915/i915_timeline_types.h
> +++ b/drivers/gpu/drm/i915/i915_timeline_types.h
> @@ -72,7 +72,6 @@ struct i915_timeline {
>   	struct i915_active_request barrier;
>   
>   	struct list_head link;
> -	const char *name;
>   	struct drm_i915_private *i915;
>   
>   	struct kref kref;
> diff --git a/drivers/gpu/drm/i915/intel_engine_cs.c b/drivers/gpu/drm/i915/intel_engine_cs.c
> index 314b86b6f88d..d2a051c53c4a 100644
> --- a/drivers/gpu/drm/i915/intel_engine_cs.c
> +++ b/drivers/gpu/drm/i915/intel_engine_cs.c
> @@ -579,7 +579,6 @@ int intel_engine_setup_common(struct intel_engine_cs *engine)
>   
>   	err = i915_timeline_init(engine->i915,
>   				 &engine->timeline,
> -				 engine->name,
>   				 engine->status_page.vma);
>   	if (err)
>   		goto err_hwsp;
> @@ -658,7 +657,7 @@ static int measure_breadcrumb_dw(struct intel_engine_cs *engine)
>   		return -ENOMEM;
>   
>   	if (i915_timeline_init(engine->i915,
> -			       &frame->timeline, "measure",
> +			       &frame->timeline,
>   			       engine->status_page.vma))
>   		goto out_frame;
>   
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 7e0c20a2d733..b3009086b50e 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -2802,7 +2802,7 @@ populate_lr_context(struct intel_context *ce,
>   
>   static struct i915_timeline *get_timeline(struct i915_gem_context *ctx)
>   {
> -	return i915_timeline_create(ctx->i915, ctx->name, NULL);
> +	return i915_timeline_create(ctx->i915, NULL);
>   }
>   
>   static int execlists_context_deferred_alloc(struct intel_context *ce,
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 0310d5d53bf9..4405ac1b32f3 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1533,9 +1533,7 @@ static int intel_init_ring_buffer(struct intel_engine_cs *engine)
>   	if (err)
>   		return err;
>   
> -	timeline = i915_timeline_create(engine->i915,
> -					engine->name,
> -					engine->status_page.vma);
> +	timeline = i915_timeline_create(engine->i915, engine->status_page.vma);
>   	if (IS_ERR(timeline)) {
>   		err = PTR_ERR(timeline);
>   		goto err;
> diff --git a/drivers/gpu/drm/i915/selftests/i915_timeline.c b/drivers/gpu/drm/i915/selftests/i915_timeline.c
> index 844701759ffc..8e7bcaa1eb66 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_timeline.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_timeline.c
> @@ -64,7 +64,7 @@ static int __mock_hwsp_timeline(struct mock_hwsp_freelist *state,
>   		unsigned long cacheline;
>   		int err;
>   
> -		tl = i915_timeline_create(state->i915, "mock", NULL);
> +		tl = i915_timeline_create(state->i915, NULL);
>   		if (IS_ERR(tl))
>   			return PTR_ERR(tl);
>   
> @@ -476,7 +476,7 @@ checked_i915_timeline_create(struct drm_i915_private *i915)
>   {
>   	struct i915_timeline *tl;
>   
> -	tl = i915_timeline_create(i915, "live", NULL);
> +	tl = i915_timeline_create(i915, NULL);
>   	if (IS_ERR(tl))
>   		return tl;
>   
> @@ -658,7 +658,7 @@ static int live_hwsp_wrap(void *arg)
>   	mutex_lock(&i915->drm.struct_mutex);
>   	wakeref = intel_runtime_pm_get(i915);
>   
> -	tl = i915_timeline_create(i915, __func__, NULL);
> +	tl = i915_timeline_create(i915, NULL);
>   	if (IS_ERR(tl)) {
>   		err = PTR_ERR(tl);
>   		goto out_rpm;
> diff --git a/drivers/gpu/drm/i915/selftests/mock_engine.c b/drivers/gpu/drm/i915/selftests/mock_engine.c
> index 61744819172b..61a8206ed677 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_engine.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_engine.c
> @@ -50,9 +50,7 @@ static struct intel_ring *mock_ring(struct intel_engine_cs *engine)
>   	if (!ring)
>   		return NULL;
>   
> -	if (i915_timeline_init(engine->i915,
> -			       &ring->timeline, engine->name,
> -			       NULL)) {
> +	if (i915_timeline_init(engine->i915, &ring->timeline, NULL)) {
>   		kfree(ring);
>   		return NULL;
>   	}
> @@ -259,10 +257,7 @@ struct intel_engine_cs *mock_engine(struct drm_i915_private *i915,
>   	engine->base.reset.finish = mock_reset_finish;
>   	engine->base.cancel_requests = mock_cancel_requests;
>   
> -	if (i915_timeline_init(i915,
> -			       &engine->base.timeline,
> -			       engine->base.name,
> -			       NULL))
> +	if (i915_timeline_init(i915, &engine->base.timeline, NULL))
>   		goto err_free;
>   	i915_timeline_set_subclass(&engine->base.timeline, TIMELINE_ENGINE);
>   
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 09/18] drm/i915: Create/destroy VM (ppGTT) for use with contexts
  2019-03-19 11:57 ` [PATCH 09/18] drm/i915: Create/destroy VM (ppGTT) for use with contexts Chris Wilson
@ 2019-03-20 13:00   ` Tvrtko Ursulin
  0 siblings, 0 replies; 47+ messages in thread
From: Tvrtko Ursulin @ 2019-03-20 13:00 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 19/03/2019 11:57, Chris Wilson wrote:
> In preparation for making the ppGTT binding for a context explicit (to
> facilitate reusing the same ppGTT between different contexts), allow the
> user to create and destroy named ppGTT.
> 
> v2: Replace global barrier for swapping over the ppgtt and tlbs with a
> local context barrier (Tvrtko)
> v3: serialise with struct_mutex; it's lazy but required dammit
> v4: Rewrite igt_ctx_shared_exec to be more different (aimed to be more
> similar, turned out different!)
> 
> v5: Fix up test unwind for aliasing-ppgtt (snb)
> v6: Tighten language for uapi struct drm_i915_gem_vm_control.
> v7: Patch the context image for runtime ppgtt switching!
> 
> Testcase: igt/gem_vm_create
> Testcase: igt/gem_ctx_param/vm
> Testcase: igt/gem_ctx_clone/vm
> Testcase: igt/gem_ctx_shared
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_drv.c               |   2 +
>   drivers/gpu/drm/i915/i915_drv.h               |   3 +
>   drivers/gpu/drm/i915/i915_gem_context.c       | 331 +++++++++++++++++-
>   drivers/gpu/drm/i915/i915_gem_context.h       |   5 +
>   drivers/gpu/drm/i915/i915_gem_gtt.c           |  19 +-
>   drivers/gpu/drm/i915/i915_gem_gtt.h           |  10 +-
>   drivers/gpu/drm/i915/selftests/huge_pages.c   |   1 -
>   .../gpu/drm/i915/selftests/i915_gem_context.c | 238 ++++++++++---
>   drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |   1 -
>   drivers/gpu/drm/i915/selftests/mock_context.c |   8 +-
>   include/uapi/drm/i915_drm.h                   |  43 +++
>   11 files changed, 580 insertions(+), 81 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index a3b00ecc58c9..fa991144e0f2 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -3121,6 +3121,8 @@ static const struct drm_ioctl_desc i915_ioctls[] = {
>   	DRM_IOCTL_DEF_DRV(I915_PERF_ADD_CONFIG, i915_perf_add_config_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
>   	DRM_IOCTL_DEF_DRV(I915_PERF_REMOVE_CONFIG, i915_perf_remove_config_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
>   	DRM_IOCTL_DEF_DRV(I915_QUERY, i915_query_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(I915_GEM_VM_CREATE, i915_gem_vm_create_ioctl, DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(I915_GEM_VM_DESTROY, i915_gem_vm_destroy_ioctl, DRM_RENDER_ALLOW),
>   };
>   
>   static struct drm_driver driver = {
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 219348121897..87ef2e031b2e 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -220,6 +220,9 @@ struct drm_i915_file_private {
>   	struct idr context_idr;
>   	struct mutex context_idr_lock; /* guards context_idr */
>   
> +	struct idr vm_idr;
> +	struct mutex vm_idr_lock; /* guards vm_idr */
> +
>   	unsigned int bsd_engine;
>   
>   /*
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 196982f38a28..966fbbc154d3 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -90,6 +90,7 @@
>   #include "i915_drv.h"
>   #include "i915_globals.h"
>   #include "i915_trace.h"
> +#include "i915_user_extensions.h"
>   #include "intel_lrc_reg.h"
>   #include "intel_workarounds.h"
>   
> @@ -120,12 +121,15 @@ static void lut_close(struct i915_gem_context *ctx)
>   		list_del(&lut->obj_link);
>   		i915_lut_handle_free(lut);
>   	}
> +	INIT_LIST_HEAD(&ctx->handles_list);
>   
>   	rcu_read_lock();
>   	radix_tree_for_each_slot(slot, &ctx->handles_vma, &iter, 0) {
>   		struct i915_vma *vma = rcu_dereference_raw(*slot);
>   
>   		radix_tree_iter_delete(&ctx->handles_vma, &iter, slot);
> +
> +		vma->open_count--;
>   		__i915_gem_object_release_unless_active(vma->obj);
>   	}
>   	rcu_read_unlock();
> @@ -305,8 +309,6 @@ static void context_close(struct i915_gem_context *ctx)
>   	 * the ppgtt).
>   	 */
>   	lut_close(ctx);
> -	if (ctx->ppgtt)
> -		i915_ppgtt_close(&ctx->ppgtt->vm);
>   
>   	ctx->file_priv = ERR_PTR(-EBADF);
>   	i915_gem_context_put(ctx);
> @@ -378,6 +380,28 @@ __create_context(struct drm_i915_private *dev_priv)
>   	return ctx;
>   }
>   
> +static struct i915_hw_ppgtt *
> +__set_ppgtt(struct i915_gem_context *ctx, struct i915_hw_ppgtt *ppgtt)
> +{
> +	struct i915_hw_ppgtt *old = ctx->ppgtt;
> +
> +	ctx->ppgtt = i915_ppgtt_get(ppgtt);
> +	ctx->desc_template = default_desc_template(ctx->i915, ppgtt);
> +
> +	return old;
> +}
> +
> +static void __assign_ppgtt(struct i915_gem_context *ctx,
> +			   struct i915_hw_ppgtt *ppgtt)
> +{
> +	if (ppgtt == ctx->ppgtt)
> +		return;
> +
> +	ppgtt = __set_ppgtt(ctx, ppgtt);
> +	if (ppgtt)
> +		i915_ppgtt_put(ppgtt);
> +}
> +
>   static struct i915_gem_context *
>   i915_gem_create_context(struct drm_i915_private *dev_priv)
>   {
> @@ -403,8 +427,8 @@ i915_gem_create_context(struct drm_i915_private *dev_priv)
>   			return ERR_CAST(ppgtt);
>   		}
>   
> -		ctx->ppgtt = ppgtt;
> -		ctx->desc_template = default_desc_template(dev_priv, ppgtt);
> +		__assign_ppgtt(ctx, ppgtt);
> +		i915_ppgtt_put(ppgtt);
>   	}
>   
>   	trace_i915_context_create(ctx);
> @@ -583,6 +607,12 @@ static int context_idr_cleanup(int id, void *p, void *data)
>   	return 0;
>   }
>   
> +static int vm_idr_cleanup(int id, void *p, void *data)
> +{
> +	i915_ppgtt_put(p);
> +	return 0;
> +}
> +
>   static int gem_context_register(struct i915_gem_context *ctx,
>   				struct drm_i915_file_private *fpriv)
>   {
> @@ -621,8 +651,11 @@ int i915_gem_context_open(struct drm_i915_private *i915,
>   	struct i915_gem_context *ctx;
>   	int err;
>   
> -	idr_init(&file_priv->context_idr);
>   	mutex_init(&file_priv->context_idr_lock);
> +	mutex_init(&file_priv->vm_idr_lock);
> +
> +	idr_init(&file_priv->context_idr);
> +	idr_init_base(&file_priv->vm_idr, 1);
>   
>   	mutex_lock(&i915->drm.struct_mutex);
>   	ctx = i915_gem_create_context(i915);
> @@ -646,8 +679,10 @@ int i915_gem_context_open(struct drm_i915_private *i915,
>   	context_close(ctx);
>   	mutex_unlock(&i915->drm.struct_mutex);
>   err:
> -	mutex_destroy(&file_priv->context_idr_lock);
> +	idr_destroy(&file_priv->vm_idr);
>   	idr_destroy(&file_priv->context_idr);
> +	mutex_destroy(&file_priv->vm_idr_lock);
> +	mutex_destroy(&file_priv->context_idr_lock);
>   	return PTR_ERR(ctx);
>   }
>   
> @@ -660,6 +695,99 @@ void i915_gem_context_close(struct drm_file *file)
>   	idr_for_each(&file_priv->context_idr, context_idr_cleanup, NULL);
>   	idr_destroy(&file_priv->context_idr);
>   	mutex_destroy(&file_priv->context_idr_lock);
> +
> +	idr_for_each(&file_priv->vm_idr, vm_idr_cleanup, NULL);
> +	idr_destroy(&file_priv->vm_idr);
> +	mutex_destroy(&file_priv->vm_idr_lock);
> +}
> +
> +int i915_gem_vm_create_ioctl(struct drm_device *dev, void *data,
> +			     struct drm_file *file)
> +{
> +	struct drm_i915_private *i915 = to_i915(dev);
> +	struct drm_i915_gem_vm_control *args = data;
> +	struct drm_i915_file_private *file_priv = file->driver_priv;
> +	struct i915_hw_ppgtt *ppgtt;
> +	int err;
> +
> +	if (!HAS_FULL_PPGTT(i915))
> +		return -ENODEV;
> +
> +	if (args->flags)
> +		return -EINVAL;
> +
> +	ppgtt = i915_ppgtt_create(i915);
> +	if (IS_ERR(ppgtt))
> +		return PTR_ERR(ppgtt);
> +
> +	ppgtt->vm.file = file_priv;
> +
> +	if (args->extensions) {
> +		err = i915_user_extensions(u64_to_user_ptr(args->extensions),
> +					   NULL, 0,
> +					   ppgtt);
> +		if (err)
> +			goto err_put;
> +	}
> +
> +	err = mutex_lock_interruptible(&file_priv->vm_idr_lock);
> +	if (err)
> +		goto err_put;
> +
> +	err = idr_alloc(&file_priv->vm_idr, ppgtt, 0, 0, GFP_KERNEL);
> +	if (err < 0)
> +		goto err_unlock;
> +
> +	GEM_BUG_ON(err == 0); /* reserved for default/unassigned ppgtt */
> +	ppgtt->user_handle = err;
> +
> +	mutex_unlock(&file_priv->vm_idr_lock);
> +
> +	args->vm_id = err;
> +	return 0;
> +
> +err_unlock:
> +	mutex_unlock(&file_priv->vm_idr_lock);
> +err_put:
> +	i915_ppgtt_put(ppgtt);
> +	return err;
> +}
> +
> +int i915_gem_vm_destroy_ioctl(struct drm_device *dev, void *data,
> +			      struct drm_file *file)
> +{
> +	struct drm_i915_file_private *file_priv = file->driver_priv;
> +	struct drm_i915_gem_vm_control *args = data;
> +	struct i915_hw_ppgtt *ppgtt;
> +	int err;
> +	u32 id;
> +
> +	if (args->flags)
> +		return -EINVAL;
> +
> +	if (args->extensions)
> +		return -EINVAL;
> +
> +	id = args->vm_id;
> +	if (!id)
> +		return -ENOENT;
> +
> +	err = mutex_lock_interruptible(&file_priv->vm_idr_lock);
> +	if (err)
> +		return err;
> +
> +	ppgtt = idr_remove(&file_priv->vm_idr, id);
> +	if (ppgtt) {
> +		GEM_BUG_ON(ppgtt->user_handle != id);
> +		ppgtt->user_handle = 0;
> +	}
> +
> +	mutex_unlock(&file_priv->vm_idr_lock);
> +	if (!ppgtt)
> +		return -ENOENT;
> +
> +	i915_ppgtt_put(ppgtt);
> +	return 0;
>   }
>   
>   static struct i915_request *
> @@ -702,12 +830,13 @@ static void cb_retire(struct i915_active *base)
>   I915_SELFTEST_DECLARE(static intel_engine_mask_t context_barrier_inject_fault);
>   static int context_barrier_task(struct i915_gem_context *ctx,
>   				intel_engine_mask_t engines,
> +				int (*emit)(struct i915_request *rq, void *data),
>   				void (*task)(void *data),
>   				void *data)
>   {
>   	struct drm_i915_private *i915 = ctx->i915;
>   	struct context_barrier_task *cb;
> -	struct intel_context *ce;
> +	struct intel_context *ce, *next;
>   	intel_wakeref_t wakeref;
>   	int err = 0;
>   
> @@ -722,11 +851,11 @@ static int context_barrier_task(struct i915_gem_context *ctx,
>   	i915_active_acquire(&cb->base);
>   
>   	wakeref = intel_runtime_pm_get(i915);
> -	list_for_each_entry(ce, &ctx->active_engines, active_link) {
> +	rbtree_postorder_for_each_entry_safe(ce, next, &ctx->hw_contexts, node) {
>   		struct intel_engine_cs *engine = ce->engine;
>   		struct i915_request *rq;
>   
> -		if (!(ce->engine->mask & engines))
> +		if (!(engine->mask & engines))
>   			continue;
>   
>   		if (I915_SELFTEST_ONLY(context_barrier_inject_fault &
> @@ -741,7 +870,12 @@ static int context_barrier_task(struct i915_gem_context *ctx,
>   			break;
>   		}
>   
> -		err = i915_active_ref(&cb->base, rq->fence.context, rq);
> +		err = 0;
> +		if (emit)
> +			err = emit(rq, data);
> +		if (err == 0)
> +			err = i915_active_ref(&cb->base, rq->fence.context, rq);
> +
>   		i915_request_add(rq);
>   		if (err)
>   			break;
> @@ -804,6 +938,170 @@ int i915_gem_switch_to_kernel_context(struct drm_i915_private *i915,
>   	return 0;
>   }
>   
> +static int get_ppgtt(struct i915_gem_context *ctx,
> +		     struct drm_i915_gem_context_param *args)
> +{
> +	struct drm_i915_file_private *file_priv = ctx->file_priv;
> +	struct i915_hw_ppgtt *ppgtt;
> +	int ret;
> +
> +	if (!ctx->ppgtt)
> +		return -ENODEV;
> +
> +	/* XXX rcu acquire? */
> +	ret = mutex_lock_interruptible(&ctx->i915->drm.struct_mutex);
> +	if (ret)
> +		return ret;
> +
> +	ppgtt = i915_ppgtt_get(ctx->ppgtt);
> +	mutex_unlock(&ctx->i915->drm.struct_mutex);
> +
> +	ret = mutex_lock_interruptible(&file_priv->vm_idr_lock);
> +	if (ret)
> +		goto err_put;
> +
> +	if (!ppgtt->user_handle) {
> +		ret = idr_alloc(&file_priv->vm_idr, ppgtt, 0, 0, GFP_KERNEL);
> +		GEM_BUG_ON(!ret);
> +		if (ret < 0)
> +			goto err_unlock;
> +
> +		ppgtt->user_handle = ret;
> +		i915_ppgtt_get(ppgtt);
> +	}
> +
> +	args->size = 0;
> +	args->value = ppgtt->user_handle;
> +
> +	ret = 0;
> +err_unlock:
> +	mutex_unlock(&file_priv->vm_idr_lock);
> +err_put:
> +	i915_ppgtt_put(ppgtt);
> +	return ret;
> +}
> +
> +static void set_ppgtt_barrier(void *data)
> +{
> +	struct i915_hw_ppgtt *old = data;
> +
> +	if (INTEL_GEN(old->vm.i915) < 8)
> +		gen6_ppgtt_unpin_all(old);
> +
> +	i915_ppgtt_put(old);
> +}
> +
> +static int emit_ppgtt_update(struct i915_request *rq, void *data)
> +{
> +	struct i915_hw_ppgtt *ppgtt = rq->gem_context->ppgtt;
> +	struct intel_engine_cs *engine = rq->engine;
> +	u32 *cs;
> +	int i;
> +
> +	if (i915_vm_is_4lvl(&ppgtt->vm)) {
> +		const dma_addr_t pd_daddr = px_dma(&ppgtt->pml4);
> +
> +		cs = intel_ring_begin(rq, 6);
> +		if (IS_ERR(cs))
> +			return PTR_ERR(cs);
> +
> +		*cs++ = MI_LOAD_REGISTER_IMM(2);
> +
> +		*cs++ = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(engine, 0));
> +		*cs++ = upper_32_bits(pd_daddr);
> +		*cs++ = i915_mmio_reg_offset(GEN8_RING_PDP_LDW(engine, 0));
> +		*cs++ = lower_32_bits(pd_daddr);
> +
> +		*cs++ = MI_NOOP;
> +		intel_ring_advance(rq, cs);
> +	} else if (HAS_LOGICAL_RING_CONTEXTS(engine->i915)) {
> +		cs = intel_ring_begin(rq, 4 * GEN8_3LVL_PDPES + 2);
> +		if (IS_ERR(cs))
> +			return PTR_ERR(cs);
> +
> +		*cs++ = MI_LOAD_REGISTER_IMM(2 * GEN8_3LVL_PDPES);
> +		for (i = GEN8_3LVL_PDPES; i--; ) {
> +			const dma_addr_t pd_daddr = i915_page_dir_dma_addr(ppgtt, i);
> +
> +			*cs++ = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(engine, i));
> +			*cs++ = upper_32_bits(pd_daddr);
> +			*cs++ = i915_mmio_reg_offset(GEN8_RING_PDP_LDW(engine, i));
> +			*cs++ = lower_32_bits(pd_daddr);
> +		}
> +		*cs++ = MI_NOOP;
> +		intel_ring_advance(rq, cs);
> +	} else {
> +		/* ppGTT is not part of the legacy context image */
> +		gen6_ppgtt_pin(ppgtt);
> +	}
> +
> +	return 0;
> +}
> +
> +static int set_ppgtt(struct i915_gem_context *ctx,
> +		     struct drm_i915_gem_context_param *args)
> +{
> +	struct drm_i915_file_private *file_priv = ctx->file_priv;
> +	struct i915_hw_ppgtt *ppgtt, *old;
> +	int err;
> +
> +	if (args->size)
> +		return -EINVAL;
> +
> +	if (!ctx->ppgtt)
> +		return -ENODEV;
> +
> +	if (upper_32_bits(args->value))
> +		return -ENOENT;
> +
> +	err = mutex_lock_interruptible(&file_priv->vm_idr_lock);
> +	if (err)
> +		return err;
> +
> +	ppgtt = idr_find(&file_priv->vm_idr, args->value);
> +	if (ppgtt) {
> +		GEM_BUG_ON(ppgtt->user_handle != args->value);
> +		i915_ppgtt_get(ppgtt);
> +	}
> +	mutex_unlock(&file_priv->vm_idr_lock);
> +	if (!ppgtt)
> +		return -ENOENT;
> +
> +	err = mutex_lock_interruptible(&ctx->i915->drm.struct_mutex);
> +	if (err)
> +		goto out;
> +
> +	if (ppgtt == ctx->ppgtt)
> +		goto unlock;
> +
> +	/* Teardown the existing obj:vma cache, it will have to be rebuilt. */
> +	lut_close(ctx);
> +
> +	old = __set_ppgtt(ctx, ppgtt);
> +
> +	/*
> +	 * We need to flush any requests using the current ppgtt before
> +	 * we release it as the requests do not hold a reference themselves,
> +	 * only indirectly through the context.
> +	 */
> +	err = context_barrier_task(ctx, ALL_ENGINES,
> +				   emit_ppgtt_update,
> +				   set_ppgtt_barrier,
> +				   old);
> +	if (err) {
> +		ctx->ppgtt = old;
> +		ctx->desc_template = default_desc_template(ctx->i915, old);
> +		i915_ppgtt_put(ppgtt);
> +	}
> +
> +unlock:
> +	mutex_unlock(&ctx->i915->drm.struct_mutex);
> +
> +out:
> +	i915_ppgtt_put(ppgtt);
> +	return err;
> +}
> +
>   static bool client_is_banned(struct drm_i915_file_private *file_priv)
>   {
>   	return atomic_read(&file_priv->ban_score) >= I915_CLIENT_SCORE_BANNED;
> @@ -984,6 +1282,9 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
>   	case I915_CONTEXT_PARAM_SSEU:
>   		ret = get_sseu(ctx, args);
>   		break;
> +	case I915_CONTEXT_PARAM_VM:
> +		ret = get_ppgtt(ctx, args);
> +		break;
>   	default:
>   		ret = -EINVAL;
>   		break;
> @@ -1285,9 +1586,6 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>   		return -ENOENT;
>   
>   	switch (args->param) {
> -	case I915_CONTEXT_PARAM_BAN_PERIOD:
> -		ret = -EINVAL;
> -		break;
>   	case I915_CONTEXT_PARAM_NO_ZEROMAP:
>   		if (args->size)
>   			ret = -EINVAL;
> @@ -1343,9 +1641,16 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>   					I915_USER_PRIORITY(priority);
>   		}
>   		break;
> +
>   	case I915_CONTEXT_PARAM_SSEU:
>   		ret = set_sseu(ctx, args);
>   		break;
> +
> +	case I915_CONTEXT_PARAM_VM:
> +		ret = set_ppgtt(ctx, args);
> +		break;
> +
> +	case I915_CONTEXT_PARAM_BAN_PERIOD:
>   	default:
>   		ret = -EINVAL;
>   		break;
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
> index 849b2a83c1ec..23dcb01bfd82 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.h
> +++ b/drivers/gpu/drm/i915/i915_gem_context.h
> @@ -148,6 +148,11 @@ void i915_gem_context_release(struct kref *ctx_ref);
>   struct i915_gem_context *
>   i915_gem_context_create_gvt(struct drm_device *dev);
>   
> +int i915_gem_vm_create_ioctl(struct drm_device *dev, void *data,
> +			     struct drm_file *file);
> +int i915_gem_vm_destroy_ioctl(struct drm_device *dev, void *data,
> +			      struct drm_file *file);
> +
>   int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
>   				  struct drm_file *file);
>   int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index b9e0e3a00223..736c845eb77f 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -1937,6 +1937,8 @@ int gen6_ppgtt_pin(struct i915_hw_ppgtt *base)
>   	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(base);
>   	int err;
>   
> +	GEM_BUG_ON(ppgtt->base.vm.closed);
> +
>   	/*
>   	 * Workaround the limited maximum vma->pin_count and the aliasing_ppgtt
>   	 * which will be pinned into every active context.
> @@ -1975,6 +1977,17 @@ void gen6_ppgtt_unpin(struct i915_hw_ppgtt *base)
>   	i915_vma_unpin(ppgtt->vma);
>   }
>   
> +void gen6_ppgtt_unpin_all(struct i915_hw_ppgtt *base)
> +{
> +	struct gen6_hw_ppgtt *ppgtt = to_gen6_ppgtt(base);
> +
> +	if (!ppgtt->pin_count)
> +		return;
> +
> +	ppgtt->pin_count = 0;
> +	i915_vma_unpin(ppgtt->vma);
> +}
> +
>   static struct i915_hw_ppgtt *gen6_ppgtt_create(struct drm_i915_private *i915)
>   {
>   	struct i915_ggtt * const ggtt = &i915->ggtt;
> @@ -2082,12 +2095,6 @@ i915_ppgtt_create(struct drm_i915_private *i915)
>   	return ppgtt;
>   }
>   
> -void i915_ppgtt_close(struct i915_address_space *vm)
> -{
> -	GEM_BUG_ON(vm->closed);
> -	vm->closed = true;
> -}
> -
>   static void ppgtt_destroy_vma(struct i915_address_space *vm)
>   {
>   	struct list_head *phases[] = {
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index 8fe03067e698..f597f35b109b 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -396,6 +396,8 @@ struct i915_hw_ppgtt {
>   		struct i915_page_directory_pointer pdp;	/* GEN8+ */
>   		struct i915_page_directory pd;		/* GEN6-7 */
>   	};
> +
> +	u32 user_handle;
>   };
>   
>   struct gen6_hw_ppgtt {
> @@ -605,13 +607,12 @@ void i915_ggtt_cleanup_hw(struct drm_i915_private *dev_priv);
>   int i915_ppgtt_init_hw(struct drm_i915_private *dev_priv);
>   
>   struct i915_hw_ppgtt *i915_ppgtt_create(struct drm_i915_private *dev_priv);
> -void i915_ppgtt_close(struct i915_address_space *vm);
>   void i915_ppgtt_release(struct kref *kref);
>   
> -static inline void i915_ppgtt_get(struct i915_hw_ppgtt *ppgtt)
> +static inline struct i915_hw_ppgtt *i915_ppgtt_get(struct i915_hw_ppgtt *ppgtt)
>   {
> -	if (ppgtt)
> -		kref_get(&ppgtt->ref);
> +	kref_get(&ppgtt->ref);
> +	return ppgtt;
>   }
>   
>   static inline void i915_ppgtt_put(struct i915_hw_ppgtt *ppgtt)
> @@ -622,6 +623,7 @@ static inline void i915_ppgtt_put(struct i915_hw_ppgtt *ppgtt)
>   
>   int gen6_ppgtt_pin(struct i915_hw_ppgtt *base);
>   void gen6_ppgtt_unpin(struct i915_hw_ppgtt *base);
> +void gen6_ppgtt_unpin_all(struct i915_hw_ppgtt *base);
>   
>   void i915_check_and_clear_faults(struct drm_i915_private *dev_priv);
>   void i915_gem_suspend_gtt_mappings(struct drm_i915_private *dev_priv);
> diff --git a/drivers/gpu/drm/i915/selftests/huge_pages.c b/drivers/gpu/drm/i915/selftests/huge_pages.c
> index c5c8ba6c059f..90721b54e7ae 100644
> --- a/drivers/gpu/drm/i915/selftests/huge_pages.c
> +++ b/drivers/gpu/drm/i915/selftests/huge_pages.c
> @@ -1732,7 +1732,6 @@ int i915_gem_huge_page_mock_selftests(void)
>   	err = i915_subtests(tests, ppgtt);
>   
>   out_close:
> -	i915_ppgtt_close(&ppgtt->vm);
>   	i915_ppgtt_put(ppgtt);
>   
>   out_unlock:
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
> index ed72400f2395..5e7e2a9193fe 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
> @@ -373,7 +373,8 @@ static int cpu_fill(struct drm_i915_gem_object *obj, u32 value)
>   	return 0;
>   }
>   
> -static int cpu_check(struct drm_i915_gem_object *obj, unsigned int max)
> +static noinline int cpu_check(struct drm_i915_gem_object *obj,
> +			      unsigned int idx, unsigned int max)
>   {
>   	unsigned int n, m, needs_flush;
>   	int err;
> @@ -391,8 +392,10 @@ static int cpu_check(struct drm_i915_gem_object *obj, unsigned int max)
>   
>   		for (m = 0; m < max; m++) {
>   			if (map[m] != m) {
> -				pr_err("Invalid value at page %d, offset %d: found %x expected %x\n",
> -				       n, m, map[m], m);
> +				pr_err("%pS: Invalid value at object %d page %d/%ld, offset %d/%d: found %x expected %x\n",
> +				       __builtin_return_address(0), idx,
> +				       n, real_page_count(obj), m, max,
> +				       map[m], m);
>   				err = -EINVAL;
>   				goto out_unmap;
>   			}
> @@ -400,8 +403,9 @@ static int cpu_check(struct drm_i915_gem_object *obj, unsigned int max)
>   
>   		for (; m < DW_PER_PAGE; m++) {
>   			if (map[m] != STACK_MAGIC) {
> -				pr_err("Invalid value at page %d, offset %d: found %x expected %x\n",
> -				       n, m, map[m], STACK_MAGIC);
> +				pr_err("%pS: Invalid value at object %d page %d, offset %d: found %x expected %x (uninitialised)\n",
> +				       __builtin_return_address(0), idx, n, m,
> +				       map[m], STACK_MAGIC);
>   				err = -EINVAL;
>   				goto out_unmap;
>   			}
> @@ -479,12 +483,8 @@ static unsigned long max_dwords(struct drm_i915_gem_object *obj)
>   static int igt_ctx_exec(void *arg)
>   {
>   	struct drm_i915_private *i915 = arg;
> -	struct drm_i915_gem_object *obj = NULL;
> -	unsigned long ncontexts, ndwords, dw;
> -	struct igt_live_test t;
> -	struct drm_file *file;
> -	IGT_TIMEOUT(end_time);
> -	LIST_HEAD(objects);
> +	struct intel_engine_cs *engine;
> +	enum intel_engine_id id;
>   	int err = -ENODEV;
>   
>   	/*
> @@ -496,44 +496,167 @@ static int igt_ctx_exec(void *arg)
>   	if (!DRIVER_CAPS(i915)->has_logical_contexts)
>   		return 0;
>   
> +	for_each_engine(engine, i915, id) {
> +		struct drm_i915_gem_object *obj = NULL;
> +		unsigned long ncontexts, ndwords, dw;
> +		struct igt_live_test t;
> +		struct drm_file *file;
> +		IGT_TIMEOUT(end_time);
> +		LIST_HEAD(objects);
> +
> +		if (!intel_engine_can_store_dword(engine))
> +			continue;
> +
> +		if (!engine->context_size)
> +			continue; /* No logical context support in HW */
> +
> +		file = mock_file(i915);
> +		if (IS_ERR(file))
> +			return PTR_ERR(file);
> +
> +		mutex_lock(&i915->drm.struct_mutex);
> +
> +		err = igt_live_test_begin(&t, i915, __func__, engine->name);
> +		if (err)
> +			goto out_unlock;
> +
> +		ncontexts = 0;
> +		ndwords = 0;
> +		dw = 0;
> +		while (!time_after(jiffies, end_time)) {
> +			struct i915_gem_context *ctx;
> +			intel_wakeref_t wakeref;
> +
> +			ctx = live_context(i915, file);
> +			if (IS_ERR(ctx)) {
> +				err = PTR_ERR(ctx);
> +				goto out_unlock;
> +			}
> +
> +			if (!obj) {
> +				obj = create_test_object(ctx, file, &objects);
> +				if (IS_ERR(obj)) {
> +					err = PTR_ERR(obj);
> +					goto out_unlock;
> +				}
> +			}
> +
> +			with_intel_runtime_pm(i915, wakeref)
> +				err = gpu_fill(obj, ctx, engine, dw);
> +			if (err) {
> +				pr_err("Failed to fill dword %lu [%lu/%lu] with gpu (%s) in ctx %u [full-ppgtt? %s], err=%d\n",
> +				       ndwords, dw, max_dwords(obj),
> +				       engine->name, ctx->hw_id,
> +				       yesno(!!ctx->ppgtt), err);
> +				goto out_unlock;
> +			}
> +
> +			if (++dw == max_dwords(obj)) {
> +				obj = NULL;
> +				dw = 0;
> +			}
> +
> +			ndwords++;
> +			ncontexts++;
> +		}
> +
> +		pr_info("Submitted %lu contexts to %s, filling %lu dwords\n",
> +			ncontexts, engine->name, ndwords);
> +
> +		ncontexts = dw = 0;
> +		list_for_each_entry(obj, &objects, st_link) {
> +			unsigned int rem =
> +				min_t(unsigned int, ndwords - dw, max_dwords(obj));
> +
> +			err = cpu_check(obj, ncontexts++, rem);
> +			if (err)
> +				break;
> +
> +			dw += rem;
> +		}
> +
> +out_unlock:
> +		if (igt_live_test_end(&t))
> +			err = -EIO;
> +		mutex_unlock(&i915->drm.struct_mutex);
> +
> +		mock_file_free(i915, file);
> +		if (err)
> +			return err;
> +	}
> +
> +	return 0;
> +}
> +
> +static int igt_shared_ctx_exec(void *arg)
> +{
> +	struct drm_i915_private *i915 = arg;
> +	struct i915_gem_context *parent;
> +	struct intel_engine_cs *engine;
> +	enum intel_engine_id id;
> +	struct igt_live_test t;
> +	struct drm_file *file;
> +	int err = 0;
> +
> +	/*
> +	 * Create a few different contexts with the same mm and write
> +	 * through each ctx using the GPU making sure those writes end
> +	 * up in the expected pages of our obj.
> +	 */
> +	if (!DRIVER_CAPS(i915)->has_logical_contexts)
> +		return 0;
> +
>   	file = mock_file(i915);
>   	if (IS_ERR(file))
>   		return PTR_ERR(file);
>   
>   	mutex_lock(&i915->drm.struct_mutex);
>   
> +	parent = live_context(i915, file);
> +	if (IS_ERR(parent)) {
> +		err = PTR_ERR(parent);
> +		goto out_unlock;
> +	}
> +
> +	if (!parent->ppgtt) { /* not full-ppgtt; nothing to share */
> +		err = 0;
> +		goto out_unlock;
> +	}
> +
>   	err = igt_live_test_begin(&t, i915, __func__, "");
>   	if (err)
>   		goto out_unlock;
>   
> -	ncontexts = 0;
> -	ndwords = 0;
> -	dw = 0;
> -	while (!time_after(jiffies, end_time)) {
> -		struct intel_engine_cs *engine;
> -		struct i915_gem_context *ctx;
> -		unsigned int id;
> +	for_each_engine(engine, i915, id) {
> +		unsigned long ncontexts, ndwords, dw;
> +		struct drm_i915_gem_object *obj = NULL;
> +		IGT_TIMEOUT(end_time);
> +		LIST_HEAD(objects);
>   
> -		ctx = live_context(i915, file);
> -		if (IS_ERR(ctx)) {
> -			err = PTR_ERR(ctx);
> -			goto out_unlock;
> -		}
> +		if (!intel_engine_can_store_dword(engine))
> +			continue;
>   
> -		for_each_engine(engine, i915, id) {
> +		dw = 0;
> +		ndwords = 0;
> +		ncontexts = 0;
> +		while (!time_after(jiffies, end_time)) {
> +			struct i915_gem_context *ctx;
>   			intel_wakeref_t wakeref;
>   
> -			if (!engine->context_size)
> -				continue; /* No logical context support in HW */
> +			ctx = kernel_context(i915);
> +			if (IS_ERR(ctx)) {
> +				err = PTR_ERR(ctx);
> +				goto out_test;
> +			}
>   
> -			if (!intel_engine_can_store_dword(engine))
> -				continue;
> +			__assign_ppgtt(ctx, parent->ppgtt);
>   
>   			if (!obj) {
> -				obj = create_test_object(ctx, file, &objects);
> +				obj = create_test_object(parent, file, &objects);
>   				if (IS_ERR(obj)) {
>   					err = PTR_ERR(obj);
> -					goto out_unlock;
> +					kernel_context_close(ctx);
> +					goto out_test;
>   				}
>   			}
>   
> @@ -545,35 +668,39 @@ static int igt_ctx_exec(void *arg)
>   				       ndwords, dw, max_dwords(obj),
>   				       engine->name, ctx->hw_id,
>   				       yesno(!!ctx->ppgtt), err);
> -				goto out_unlock;
> +				kernel_context_close(ctx);
> +				goto out_test;
>   			}
>   
>   			if (++dw == max_dwords(obj)) {
>   				obj = NULL;
>   				dw = 0;
>   			}
> +
>   			ndwords++;
> +			ncontexts++;
> +
> +			kernel_context_close(ctx);
>   		}
> -		ncontexts++;
> -	}
> -	pr_info("Submitted %lu contexts (across %u engines), filling %lu dwords\n",
> -		ncontexts, RUNTIME_INFO(i915)->num_engines, ndwords);
> +		pr_info("Submitted %lu contexts to %s, filling %lu dwords\n",
> +			ncontexts, engine->name, ndwords);
>   
> -	dw = 0;
> -	list_for_each_entry(obj, &objects, st_link) {
> -		unsigned int rem =
> -			min_t(unsigned int, ndwords - dw, max_dwords(obj));
> +		ncontexts = dw = 0;
> +		list_for_each_entry(obj, &objects, st_link) {
> +			unsigned int rem =
> +				min_t(unsigned int, ndwords - dw, max_dwords(obj));
>   
> -		err = cpu_check(obj, rem);
> -		if (err)
> -			break;
> +			err = cpu_check(obj, ncontexts++, rem);
> +			if (err)
> +				goto out_test;
>   
> -		dw += rem;
> +			dw += rem;
> +		}
>   	}
> -
> -out_unlock:
> +out_test:
>   	if (igt_live_test_end(&t))
>   		err = -EIO;
> +out_unlock:
>   	mutex_unlock(&i915->drm.struct_mutex);
>   
>   	mock_file_free(i915, file);
> @@ -1046,7 +1173,7 @@ static int igt_ctx_readonly(void *arg)
>   	struct drm_i915_gem_object *obj = NULL;
>   	struct i915_gem_context *ctx;
>   	struct i915_hw_ppgtt *ppgtt;
> -	unsigned long ndwords, dw;
> +	unsigned long idx, ndwords, dw;
>   	struct igt_live_test t;
>   	struct drm_file *file;
>   	I915_RND_STATE(prng);
> @@ -1127,6 +1254,7 @@ static int igt_ctx_readonly(void *arg)
>   		ndwords, RUNTIME_INFO(i915)->num_engines);
>   
>   	dw = 0;
> +	idx = 0;
>   	list_for_each_entry(obj, &objects, st_link) {
>   		unsigned int rem =
>   			min_t(unsigned int, ndwords - dw, max_dwords(obj));
> @@ -1136,7 +1264,7 @@ static int igt_ctx_readonly(void *arg)
>   		if (i915_gem_object_is_readonly(obj))
>   			num_writes = 0;
>   
> -		err = cpu_check(obj, num_writes);
> +		err = cpu_check(obj, idx++, num_writes);
>   		if (err)
>   			break;
>   
> @@ -1619,7 +1747,8 @@ static int mock_context_barrier(void *arg)
>   	}
>   
>   	counter = 0;
> -	err = context_barrier_task(ctx, 0, mock_barrier_task, &counter);
> +	err = context_barrier_task(ctx, 0,
> +				   NULL, mock_barrier_task, &counter);
>   	if (err) {
>   		pr_err("Failed at line %d, err=%d\n", __LINE__, err);
>   		goto out;
> @@ -1631,8 +1760,8 @@ static int mock_context_barrier(void *arg)
>   	}
>   
>   	counter = 0;
> -	err = context_barrier_task(ctx,
> -				   ALL_ENGINES, mock_barrier_task, &counter);
> +	err = context_barrier_task(ctx, ALL_ENGINES,
> +				   NULL, mock_barrier_task, &counter);
>   	if (err) {
>   		pr_err("Failed at line %d, err=%d\n", __LINE__, err);
>   		goto out;
> @@ -1655,8 +1784,8 @@ static int mock_context_barrier(void *arg)
>   
>   	counter = 0;
>   	context_barrier_inject_fault = BIT(RCS0);
> -	err = context_barrier_task(ctx,
> -				   ALL_ENGINES, mock_barrier_task, &counter);
> +	err = context_barrier_task(ctx, ALL_ENGINES,
> +				   NULL, mock_barrier_task, &counter);
>   	context_barrier_inject_fault = 0;
>   	if (err == -ENXIO)
>   		err = 0;
> @@ -1670,8 +1799,8 @@ static int mock_context_barrier(void *arg)
>   		goto out;
>   
>   	counter = 0;
> -	err = context_barrier_task(ctx,
> -				   ALL_ENGINES, mock_barrier_task, &counter);
> +	err = context_barrier_task(ctx, ALL_ENGINES,
> +				   NULL, mock_barrier_task, &counter);
>   	if (err) {
>   		pr_err("Failed at line %d, err=%d\n", __LINE__, err);
>   		goto out;
> @@ -1719,6 +1848,7 @@ int i915_gem_context_live_selftests(struct drm_i915_private *dev_priv)
>   		SUBTEST(igt_ctx_exec),
>   		SUBTEST(igt_ctx_readonly),
>   		SUBTEST(igt_ctx_sseu),
> +		SUBTEST(igt_shared_ctx_exec),
>   		SUBTEST(igt_vm_isolation),
>   	};
>   
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> index 01084f6b4fb7..9cca66e4420a 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> @@ -1020,7 +1020,6 @@ static int exercise_ppgtt(struct drm_i915_private *dev_priv,
>   
>   	err = func(dev_priv, &ppgtt->vm, 0, ppgtt->vm.total, end_time);
>   
> -	i915_ppgtt_close(&ppgtt->vm);
>   	i915_ppgtt_put(ppgtt);
>   out_unlock:
>   	mutex_unlock(&dev_priv->drm.struct_mutex);
> diff --git a/drivers/gpu/drm/i915/selftests/mock_context.c b/drivers/gpu/drm/i915/selftests/mock_context.c
> index 1cc8be732435..cfc9012c8e49 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_context.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_context.c
> @@ -54,13 +54,17 @@ mock_context(struct drm_i915_private *i915,
>   		goto err_handles;
>   
>   	if (name) {
> +		struct i915_hw_ppgtt *ppgtt;
> +
>   		ctx->name = kstrdup(name, GFP_KERNEL);
>   		if (!ctx->name)
>   			goto err_put;
>   
> -		ctx->ppgtt = mock_ppgtt(i915, name);
> -		if (!ctx->ppgtt)
> +		ppgtt = mock_ppgtt(i915, name);
> +		if (!ppgtt)
>   			goto err_put;
> +
> +		__set_ppgtt(ctx, ppgtt);
>   	}
>   
>   	return ctx;
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 1c69ed16a923..9af7a8e6a46e 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -343,6 +343,8 @@ typedef struct _drm_i915_sarea {
>   #define DRM_I915_PERF_ADD_CONFIG	0x37
>   #define DRM_I915_PERF_REMOVE_CONFIG	0x38
>   #define DRM_I915_QUERY			0x39
> +#define DRM_I915_GEM_VM_CREATE		0x3a
> +#define DRM_I915_GEM_VM_DESTROY		0x3b
>   /* Must be kept compact -- no holes */
>   
>   #define DRM_IOCTL_I915_INIT		DRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT, drm_i915_init_t)
> @@ -402,6 +404,8 @@ typedef struct _drm_i915_sarea {
>   #define DRM_IOCTL_I915_PERF_ADD_CONFIG	DRM_IOW(DRM_COMMAND_BASE + DRM_I915_PERF_ADD_CONFIG, struct drm_i915_perf_oa_config)
>   #define DRM_IOCTL_I915_PERF_REMOVE_CONFIG	DRM_IOW(DRM_COMMAND_BASE + DRM_I915_PERF_REMOVE_CONFIG, __u64)
>   #define DRM_IOCTL_I915_QUERY			DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_QUERY, struct drm_i915_query)
> +#define DRM_IOCTL_I915_GEM_VM_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_CREATE, struct drm_i915_gem_vm_control)
> +#define DRM_IOCTL_I915_GEM_VM_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_VM_DESTROY, struct drm_i915_gem_vm_control)
>   
>   /* Allow drivers to submit batchbuffers directly to hardware, relying
>    * on the security mechanisms provided by hardware.
> @@ -1453,6 +1457,33 @@ struct drm_i915_gem_context_destroy {
>   	__u32 pad;
>   };
>   
> +/*
> + * DRM_I915_GEM_VM_CREATE -
> + *
> + * Create a new virtual memory address space (ppGTT) for use within a context
> + * on the same file. Extensions can be provided to configure exactly how the
> + * address space is setup upon creation.
> + *
> + * The id of new VM (bound to the fd) for use with I915_CONTEXT_PARAM_VM is
> + * returned in the outparam @vm_id.
> + *
> + * No flags are defined, with all bits reserved and must be zero.
> + *
> + * An extension chain maybe provided, starting with @extensions, and terminated
> + * by the @next_extension being 0. Currently, no extensions are defined.
> + *
> + * DRM_I915_GEM_VM_DESTROY -
> + *
> + * Destroys a previously created VM id, specified in @vm_id.
> + *
> + * No extensions or flags are allowed currently, and so must be zero.
> + */
> +struct drm_i915_gem_vm_control {
> +	__u64 extensions;
> +	__u32 flags;
> +	__u32 vm_id;
> +};
> +
>   struct drm_i915_reg_read {
>   	/*
>   	 * Register offset.
> @@ -1542,7 +1573,19 @@ struct drm_i915_gem_context_param {
>    * On creation, all new contexts are marked as recoverable.
>    */
>   #define I915_CONTEXT_PARAM_RECOVERABLE	0x8
> +
> +	/*
> +	 * The id of the associated virtual memory address space (ppGTT) of
> +	 * this context. Can be retrieved and passed to another context
> +	 * (on the same fd) for both to use the same ppGTT and so share
> +	 * address layouts, and avoid reloading the page tables on context
> +	 * switches between themselves.
> +	 *
> +	 * See DRM_I915_GEM_VM_CREATE and DRM_I915_GEM_VM_DESTROY.
> +	 */
> +#define I915_CONTEXT_PARAM_VM		0x9
>   /* Must be kept compact -- no holes and well documented */
> +
>   	__u64 value;
>   };
>   
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 12/18] drm/i915: Allow userspace to clone contexts on creation
  2019-03-19 11:57 ` [PATCH 12/18] drm/i915: Allow userspace to clone contexts on creation Chris Wilson
@ 2019-03-20 13:13   ` Tvrtko Ursulin
  2019-03-21 14:38     ` Chris Wilson
  0 siblings, 1 reply; 47+ messages in thread
From: Tvrtko Ursulin @ 2019-03-20 13:13 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 19/03/2019 11:57, Chris Wilson wrote:
> A usecase arose out of handling context recovery in mesa, whereby they
> wish to recreate a context with fresh logical state but preserving all
> other details of the original. Currently, they create a new context and
> iterate over which bits they want to copy across, but it would be much
> more convenient if they were able to just pass in a target context to clone
> during creation. This essentially extends the setparam during creation
> to pull the details from a target context instead of the user supplied
> parameters.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_gem_context.c | 154 ++++++++++++++++++++++++
>   include/uapi/drm/i915_drm.h             |  14 +++
>   2 files changed, 168 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index fc1f64e19507..f36648329074 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -1500,8 +1500,162 @@ static int create_setparam(struct i915_user_extension __user *ext, void *data)
>   	return ctx_setparam(arg->ctx, &local.param);
>   }
>   
> +static int clone_flags(struct i915_gem_context *dst,
> +		       struct i915_gem_context *src)
> +{
> +	dst->user_flags = src->user_flags;
> +	return 0;
> +}
> +
> +static int clone_schedattr(struct i915_gem_context *dst,
> +			   struct i915_gem_context *src)
> +{
> +	dst->sched = src->sched;
> +	return 0;
> +}
> +
> +static int clone_sseu(struct i915_gem_context *dst,
> +		      struct i915_gem_context *src)
> +{
> +	const struct intel_sseu default_sseu =
> +		intel_device_default_sseu(dst->i915);
> +	struct intel_engine_cs *engine;
> +	enum intel_engine_id id;
> +
> +	for_each_engine(engine, dst->i915, id) {

Hm, in the load balancing patch this needs to be extended so that the
veng ce is also handled here.

Possibly even when adding the engine map, the loop needs to iterate the
map rather than use for_each_engine?
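
i.e. something like this rough sketch, assuming the ctx->engines[] /
ctx->num_engines map added later in the series:

	for (n = 0; n < src->num_engines; n++) {
		struct intel_engine_cs *engine = src->engines[n];

		if (!engine) /* hole in the user supplied map */
			continue;

		/* then look up / copy the sseu for this engine as below */
	}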

> +		struct intel_context *ce;
> +		struct intel_sseu sseu;
> +
> +		ce = intel_context_lookup(src, engine);
> +		if (!ce)
> +			continue;
> +
> +		sseu = ce->sseu;
> +		if (!memcmp(&sseu, &default_sseu, sizeof(sseu)))

Could you memcmp against &ce->sseu directly and keep src_ce and dst_ce,
so you can copy over without a temporary copy on the stack?
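
i.e. roughly (untested sketch):

	src_ce = intel_context_lookup(src, engine);
	if (!src_ce)
		continue;

	if (!memcmp(&src_ce->sseu, &default_sseu, sizeof(default_sseu)))
		continue;

	dst_ce = intel_context_pin_lock(dst, engine);
	if (IS_ERR(dst_ce))
		return PTR_ERR(dst_ce);

	dst_ce->sseu = src_ce->sseu;
	intel_context_pin_unlock(dst_ce);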

> +			continue;
> +
> +		ce = intel_context_pin_lock(dst, engine);
> +		if (IS_ERR(ce))
> +			return PTR_ERR(ce);
> +
> +		ce->sseu = sseu;
> +		intel_context_pin_unlock(ce);
> +	}
> +
> +	return 0;
> +}
> +
> +static int clone_timeline(struct i915_gem_context *dst,
> +			  struct i915_gem_context *src)
> +{
> +	if (src->timeline) {
> +		GEM_BUG_ON(src->timeline == dst->timeline);
> +
> +		if (dst->timeline)
> +			i915_timeline_put(dst->timeline);
> +		dst->timeline = i915_timeline_get(src->timeline);
> +	}
> +
> +	return 0;
> +}
> +
> +static int clone_vm(struct i915_gem_context *dst,
> +		    struct i915_gem_context *src)
> +{
> +	struct i915_hw_ppgtt *ppgtt;
> +
> +	rcu_read_lock();
> +	do {
> +		ppgtt = READ_ONCE(src->ppgtt);
> +		if (!ppgtt)
> +			break;
> +
> +		if (!kref_get_unless_zero(&ppgtt->ref))
> +			continue;
> +
> +		/*
> +		 * This ppgtt may have be reallocated between
> +		 * the read and the kref, and reassigned to a third
> +		 * context. In order to avoid inadvertent sharing
> +		 * of this ppgtt with that third context (and not
> +		 * src), we have to confirm that we have the same
> +		 * ppgtt after passing through the strong memory
> +		 * barrier implied by a successful
> +		 * kref_get_unless_zero().
> +		 *
> +		 * Once we have acquired the current ppgtt of src,
> +		 * we no longer care if it is released from src, as
> +		 * it cannot be reallocated elsewhere.
> +		 */
> +
> +		if (ppgtt == READ_ONCE(src->ppgtt))
> +			break;
> +
> +		i915_ppgtt_put(ppgtt);
> +	} while (1);
> +	rcu_read_unlock();

I still have the same problem. What if you added here:

GEM_BUG_ON(ppgtt != READ_ONCE(src->ppgtt));

Could it trigger? If so, what is the point of the last check in the
loop above?

> +
> +	if (ppgtt) {
> +		__assign_ppgtt(dst, ppgtt);
> +		i915_ppgtt_put(ppgtt);
> +	}
> +
> +	return 0;
> +}
> +
> +static int create_clone(struct i915_user_extension __user *ext, void *data)
> +{
> +	static int (* const fn[])(struct i915_gem_context *dst,
> +				  struct i915_gem_context *src) = {
> +#define MAP(x, y) [ilog2(I915_CONTEXT_CLONE_##x)] = y
> +		MAP(FLAGS, clone_flags),
> +		MAP(SCHEDATTR, clone_schedattr),
> +		MAP(SSEU, clone_sseu),
> +		MAP(TIMELINE, clone_timeline),
> +		MAP(VM, clone_vm),
> +#undef MAP
> +	};
> +	struct drm_i915_gem_context_create_ext_clone local;
> +	const struct create_ext *arg = data;
> +	struct i915_gem_context *dst = arg->ctx;
> +	struct i915_gem_context *src;
> +	int err, bit;
> +
> +	if (copy_from_user(&local, ext, sizeof(local)))
> +		return -EFAULT;
> +
> +	BUILD_BUG_ON(GENMASK(BITS_PER_TYPE(local.flags) - 1, ARRAY_SIZE(fn)) !=
> +		     I915_CONTEXT_CLONE_UNKNOWN);
> +
> +	if (local.flags & I915_CONTEXT_CLONE_UNKNOWN)
> +		return -EINVAL;
> +
> +	if (local.rsvd)
> +		return -EINVAL;
> +
> +	rcu_read_lock();
> +	src = __i915_gem_context_lookup_rcu(arg->fpriv, local.clone_id);
> +	rcu_read_unlock();
> +	if (!src)
> +		return -ENOENT;
> +
> +	GEM_BUG_ON(src == dst);
> +
> +	for (bit = 0; bit < ARRAY_SIZE(fn); bit++) {
> +		if (!(local.flags & BIT(bit)))
> +			continue;
> +
> +		err = fn[bit](dst, src);
> +		if (err)
> +			return err;
> +	}
> +
> +	return 0;
> +}
> +
>   static const i915_user_extension_fn create_extensions[] = {
>   	[I915_CONTEXT_CREATE_EXT_SETPARAM] = create_setparam,
> +	[I915_CONTEXT_CREATE_EXT_CLONE] = create_clone,
>   };
>   
>   static bool client_is_banned(struct drm_i915_file_private *file_priv)
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 9999f7d6a5a9..a5bdb86858f6 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1581,6 +1581,20 @@ struct drm_i915_gem_context_create_ext_setparam {
>   	struct drm_i915_gem_context_param param;
>   };
>   
> +struct drm_i915_gem_context_create_ext_clone {
> +#define I915_CONTEXT_CREATE_EXT_CLONE 1
> +	struct i915_user_extension base;
> +	__u32 clone_id;
> +	__u32 flags;
> +#define I915_CONTEXT_CLONE_FLAGS	(1u << 0)
> +#define I915_CONTEXT_CLONE_SCHEDATTR	(1u << 1)
> +#define I915_CONTEXT_CLONE_SSEU		(1u << 2)
> +#define I915_CONTEXT_CLONE_TIMELINE	(1u << 3)
> +#define I915_CONTEXT_CLONE_VM		(1u << 4)
> +#define I915_CONTEXT_CLONE_UNKNOWN -(I915_CONTEXT_CLONE_VM << 1)
> +	__u64 rsvd;
> +};
> +
>   struct drm_i915_gem_context_destroy {
>   	__u32 ctx_id;
>   	__u32 pad;
> 

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 13/18] drm/i915: Allow a context to define its set of engines
  2019-03-19 11:57 ` [PATCH 13/18] drm/i915: Allow a context to define its set of engines Chris Wilson
@ 2019-03-20 13:20   ` Tvrtko Ursulin
  0 siblings, 0 replies; 47+ messages in thread
From: Tvrtko Ursulin @ 2019-03-20 13:20 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 19/03/2019 11:57, Chris Wilson wrote:
> Over the last few years, we have debated how to extend the user API to
> support an increase in the number of engines, that may be sparse and
> even be heterogeneous within a class (not all video decoders created
> equal). We settled on using (class, instance) tuples to identify a
> specific engine, with an API for the user to construct a map of engines
> to capabilities. Into this picture, we then add a challenge of virtual
> engines; one user engine that maps behind the scenes to any number of
> physical engines. To keep it general, we want the user to have full
> control over that mapping. To that end, we allow the user to constrain a
> context to define the set of engines that it can access, order fully
> controlled by the user via (class, instance). With such precise control
> in context setup, we can continue to use the existing execbuf uABI of
> specifying a single index; only now it doesn't automagically map onto
> the engines, it uses the user defined engine map from the context.
> 
> The I915_EXEC_DEFAULT slot is left empty, and invalid for use by
> execbuf. Its use will be revealed in the next patch.
> 
> v2: Fixup freeing of local on success of get_engines()
> v3: Allow empty engines[]
> v4: s/nengine/num_engines/
> 
> Testcase: igt/gem_ctx_engines
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem_context.c       | 226 +++++++++++++++++-
>   drivers/gpu/drm/i915/i915_gem_context_types.h |  21 ++
>   drivers/gpu/drm/i915/i915_gem_execbuffer.c    |  19 +-
>   drivers/gpu/drm/i915/i915_utils.h             |  36 +++
>   include/uapi/drm/i915_drm.h                   |  42 +++-
>   5 files changed, 331 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index f36648329074..f038c15e73d8 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -86,7 +86,9 @@
>    */
>   
>   #include <linux/log2.h>
> +
>   #include <drm/i915_drm.h>
> +
>   #include "i915_drv.h"
>   #include "i915_globals.h"
>   #include "i915_trace.h"
> @@ -101,6 +103,21 @@ static struct i915_global_gem_context {
>   	struct kmem_cache *slab_luts;
>   } global;
>   
> +static struct intel_engine_cs *
> +lookup_user_engine(struct i915_gem_context *ctx,
> +		   unsigned long flags, u16 class, u16 instance)
> +#define LOOKUP_USER_INDEX BIT(0)
> +{
> +	if (flags & LOOKUP_USER_INDEX) {
> +		if (instance >= ctx->num_engines)
> +			return NULL;
> +
> +		return ctx->engines[instance];
> +	}
> +
> +	return intel_engine_lookup_user(ctx->i915, class, instance);
> +}
> +
>   struct i915_lut_handle *i915_lut_handle_alloc(void)
>   {
>   	return kmem_cache_alloc(global.slab_luts, GFP_KERNEL);
> @@ -235,6 +252,8 @@ static void i915_gem_context_free(struct i915_gem_context *ctx)
>   	release_hw_id(ctx);
>   	i915_ppgtt_put(ctx->ppgtt);
>   
> +	kfree(ctx->engines);
> +
>   	rbtree_postorder_for_each_entry_safe(it, n, &ctx->hw_contexts, node)
>   		intel_context_put(it);
>   
> @@ -1377,9 +1396,9 @@ static int set_sseu(struct i915_gem_context *ctx,
>   	if (user_sseu.flags || user_sseu.rsvd)
>   		return -EINVAL;
>   
> -	engine = intel_engine_lookup_user(i915,
> -					  user_sseu.engine_class,
> -					  user_sseu.engine_instance);
> +	engine = lookup_user_engine(ctx, 0,
> +				    user_sseu.engine_class,
> +				    user_sseu.engine_instance);
>   	if (!engine)
>   		return -EINVAL;
>   
> @@ -1397,9 +1416,166 @@ static int set_sseu(struct i915_gem_context *ctx,
>   
>   	args->size = sizeof(user_sseu);
>   
> +	return 0;
> +};
> +
> +struct set_engines {
> +	struct i915_gem_context *ctx;
> +	struct intel_engine_cs **engines;
> +	unsigned int num_engines;
> +};
> +
> +static const i915_user_extension_fn set_engines__extensions[] = {
> +};
> +
> +static int
> +set_engines(struct i915_gem_context *ctx,
> +	    const struct drm_i915_gem_context_param *args)
> +{
> +	struct i915_context_param_engines __user *user;
> +	struct set_engines set = { .ctx = ctx };
> +	u64 size, extensions;
> +	unsigned int n;
> +	int err;
> +
> +	user = u64_to_user_ptr(args->value);
> +	size = args->size;
> +	if (!size)
> +		goto out;
> +
> +	BUILD_BUG_ON(!IS_ALIGNED(sizeof(*user), sizeof(*user->class_instance)));
> +	if (size < sizeof(*user) ||
> +	    !IS_ALIGNED(size, sizeof(*user->class_instance)))
> +		return -EINVAL;
> +
> +	/* Internal limitation of u64 bitmaps + a few bits of u64 in the uABI */
> +	set.num_engines =
> +		(size - sizeof(*user)) / sizeof(*user->class_instance);
> +	if (set.num_engines > I915_EXEC_RING_MASK + 1)
> +		return -EINVAL;
> +
> +	set.engines = kmalloc_array(set.num_engines,
> +				    sizeof(*set.engines),
> +				    GFP_KERNEL);
> +	if (!set.engines)
> +		return -ENOMEM;
> +
> +	for (n = 0; n < set.num_engines; n++) {
> +		u16 class, inst;
> +
> +		if (get_user(class, &user->class_instance[n].engine_class) ||
> +		    get_user(inst, &user->class_instance[n].engine_instance)) {
> +			kfree(set.engines);
> +			return -EFAULT;
> +		}
> +
> +		if (class == (u16)I915_ENGINE_CLASS_INVALID &&
> +		    inst == (u16)I915_ENGINE_CLASS_INVALID_NONE) {
> +			set.engines[n] = NULL;
> +			continue;
> +		}
> +
> +		set.engines[n] = lookup_user_engine(ctx, 0, class, inst);
> +		if (!set.engines[n]) {
> +			kfree(set.engines);
> +			return -ENOENT;
> +		}
> +	}
> +
> +	err = -EFAULT;
> +	if (!get_user(extensions, &user->extensions))
> +		err = i915_user_extensions(u64_to_user_ptr(extensions),
> +					   set_engines__extensions,
> +					   ARRAY_SIZE(set_engines__extensions),
> +					   &set);
> +	if (err) {
> +		kfree(set.engines);
> +		return err;
> +	}
> +
> +out:
> +	mutex_lock(&ctx->i915->drm.struct_mutex);
> +	kfree(ctx->engines);
> +	ctx->engines = set.engines;
> +	ctx->num_engines = set.num_engines;
> +	mutex_unlock(&ctx->i915->drm.struct_mutex);
> +
>   	return 0;
>   }
>   
> +static int
> +get_engines(struct i915_gem_context *ctx,
> +	    struct drm_i915_gem_context_param *args)
> +{
> +	struct i915_context_param_engines *local;
> +	size_t n, count, size;
> +	int err = 0;
> +
> +restart:
> +	if (!READ_ONCE(ctx->engines)) {
> +		args->size = 0;
> +		return 0;
> +	}
> +
> +	count = READ_ONCE(ctx->num_engines);
> +
> +	/* Be paranoid in case we have an impedance mismatch */
> +	if (!check_struct_size(local, class_instance, count, &size))
> +		return -ENOMEM;
> +	if (unlikely(overflows_type(size, args->size)))
> +		return -ENOMEM;
> +
> +	if (!args->size) {
> +		args->size = size;
> +		return 0;
> +	}
> +
> +	if (args->size < size)
> +		return -EINVAL;
> +
> +	local = kmalloc(size, GFP_KERNEL);
> +	if (!local)
> +		return -ENOMEM;
> +
> +	if (mutex_lock_interruptible(&ctx->i915->drm.struct_mutex)) {
> +		err = -EINTR;
> +		goto out;
> +	}
> +
> +	if (!ctx->engines || ctx->num_engines != count) {
> +		mutex_unlock(&ctx->i915->drm.struct_mutex);
> +		kfree(local);
> +		goto restart;
> +	}
> +
> +	local->extensions = 0;
> +	for (n = 0; n < count; n++) {
> +		if (ctx->engines[n]) {
> +			local->class_instance[n].engine_class =
> +				ctx->engines[n]->uabi_class;
> +			local->class_instance[n].engine_instance =
> +				ctx->engines[n]->instance;
> +		} else {
> +			local->class_instance[n].engine_class =
> +				I915_ENGINE_CLASS_INVALID;
> +			local->class_instance[n].engine_instance =
> +				I915_ENGINE_CLASS_INVALID_NONE;
> +		}
> +	}
> +
> +	mutex_unlock(&ctx->i915->drm.struct_mutex);
> +
> +	if (copy_to_user(u64_to_user_ptr(args->value), local, size)) {
> +		err = -EFAULT;
> +		goto out;
> +	}
> +	args->size = size;
> +
> +out:
> +	kfree(local);
> +	return err;
> +}
> +
>   static int ctx_setparam(struct i915_gem_context *ctx,
>   			struct drm_i915_gem_context_param *args)
>   {
> @@ -1472,6 +1648,10 @@ static int ctx_setparam(struct i915_gem_context *ctx,
>   		ret = set_ppgtt(ctx, args);
>   		break;
>   
> +	case I915_CONTEXT_PARAM_ENGINES:
> +		ret = set_engines(ctx, args);
> +		break;
> +
>   	case I915_CONTEXT_PARAM_BAN_PERIOD:
>   	default:
>   		ret = -EINVAL;
> @@ -1500,6 +1680,35 @@ static int create_setparam(struct i915_user_extension __user *ext, void *data)
>   	return ctx_setparam(arg->ctx, &local.param);
>   }
>   
> +static int clone_engines(struct i915_gem_context *dst,
> +			 struct i915_gem_context *src)
> +{
> +	struct intel_engine_cs **engines;
> +	unsigned int num_engines;
> +
> +	mutex_lock(&src->i915->drm.struct_mutex); /* serialise src->engine[] */
> +
> +	/* handle ZERO_SIZE_PTR on behalf of kmemdup */
> +	num_engines = src->num_engines;
> +	engines = src->engines;
> +	if (!ZERO_OR_NULL_PTR(engines)) {
> +		engines = kmemdup(engines,
> +				  sizeof(*engines) * num_engines,
> +				  GFP_KERNEL);
> +		if (!engines) {
> +			mutex_unlock(&src->i915->drm.struct_mutex);
> +			return -ENOMEM;
> +		}
> +	}
> +
> +	mutex_unlock(&src->i915->drm.struct_mutex);
> +
> +	kfree(dst->engines);
> +	dst->engines = engines;
> +	dst->num_engines = num_engines;
> +	return 0;
> +}
> +
>   static int clone_flags(struct i915_gem_context *dst,
>   		       struct i915_gem_context *src)
>   {
> @@ -1608,6 +1817,7 @@ static int create_clone(struct i915_user_extension __user *ext, void *data)
>   	static int (* const fn[])(struct i915_gem_context *dst,
>   				  struct i915_gem_context *src) = {
>   #define MAP(x, y) [ilog2(I915_CONTEXT_CLONE_##x)] = y
> +		MAP(ENGINES, clone_engines),
>   		MAP(FLAGS, clone_flags),
>   		MAP(SCHEDATTR, clone_schedattr),
>   		MAP(SSEU, clone_sseu),
> @@ -1770,9 +1980,9 @@ static int get_sseu(struct i915_gem_context *ctx,
>   	if (user_sseu.flags || user_sseu.rsvd)
>   		return -EINVAL;
>   
> -	engine = intel_engine_lookup_user(ctx->i915,
> -					  user_sseu.engine_class,
> -					  user_sseu.engine_instance);
> +	engine = lookup_user_engine(ctx, 0,
> +				    user_sseu.engine_class,
> +				    user_sseu.engine_instance);
>   	if (!engine)
>   		return -EINVAL;
>   
> @@ -1853,6 +2063,10 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
>   		ret = get_ppgtt(ctx, args);
>   		break;
>   
> +	case I915_CONTEXT_PARAM_ENGINES:
> +		ret = get_engines(ctx, args);
> +		break;
> +
>   	case I915_CONTEXT_PARAM_BAN_PERIOD:
>   	default:
>   		ret = -EINVAL;
> diff --git a/drivers/gpu/drm/i915/i915_gem_context_types.h b/drivers/gpu/drm/i915/i915_gem_context_types.h
> index e2ec58b10fb2..46b6080b2240 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context_types.h
> +++ b/drivers/gpu/drm/i915/i915_gem_context_types.h
> @@ -41,6 +41,20 @@ struct i915_gem_context {
>   	/** file_priv: owning file descriptor */
>   	struct drm_i915_file_private *file_priv;
>   
> +	/**
> +	 * @engines: User defined engines for this context
> +	 *
> +	 * NULL means to use legacy definitions (including random meaning of
> +	 * I915_EXEC_BSD with I915_EXEC_BSD_SELECTOR overrides).
> +	 *
> +	 * If defined, execbuf uses the I915_EXEC_RING_MASK as an index into
> +	 * this array, and various other uAPIs have the ability to look up an
> +	 * index from this array to select an engine to operate on.
> +	 *
> +	 * User defined by I915_CONTEXT_PARAM_ENGINES.
> +	 */
> +	struct intel_engine_cs **engines;
> +
>   	struct i915_timeline *timeline;
>   
>   	/**
> @@ -110,6 +124,13 @@ struct i915_gem_context {
>   #define CONTEXT_CLOSED			1
>   #define CONTEXT_FORCE_SINGLE_SUBMISSION	2
>   
> +	/**
> +	 * @num_engines: Number of user defined engines for this context
> +	 *
> +	 * See @engines for the elements.
> +	 */
> +	unsigned int num_engines;
> +
>   	/**
>   	 * @hw_id: - unique identifier for the context
>   	 *
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 3d672c9edb94..66b3921cc8bd 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -2089,13 +2089,20 @@ static const enum intel_engine_id user_ring_map[I915_USER_RINGS + 1] = {
>   };
>   
>   static struct intel_engine_cs *
> -eb_select_engine(struct drm_i915_private *dev_priv,
> +eb_select_engine(struct i915_execbuffer *eb,
>   		 struct drm_file *file,
>   		 struct drm_i915_gem_execbuffer2 *args)
>   {
>   	unsigned int user_ring_id = args->flags & I915_EXEC_RING_MASK;
>   	struct intel_engine_cs *engine;
>   
> +	if (eb->ctx->engines) {
> +		if (user_ring_id >= eb->ctx->num_engines)
> +			return NULL;
> +
> +		return eb->ctx->engines[user_ring_id];
> +	}
> +
>   	if (user_ring_id > I915_USER_RINGS) {
>   		DRM_DEBUG("execbuf with unknown ring: %u\n", user_ring_id);
>   		return NULL;
> @@ -2108,11 +2115,11 @@ eb_select_engine(struct drm_i915_private *dev_priv,
>   		return NULL;
>   	}
>   
> -	if (user_ring_id == I915_EXEC_BSD && HAS_ENGINE(dev_priv, VCS1)) {
> +	if (user_ring_id == I915_EXEC_BSD && HAS_ENGINE(eb->i915, VCS1)) {
>   		unsigned int bsd_idx = args->flags & I915_EXEC_BSD_MASK;
>   
>   		if (bsd_idx == I915_EXEC_BSD_DEFAULT) {
> -			bsd_idx = gen8_dispatch_bsd_engine(dev_priv, file);
> +			bsd_idx = gen8_dispatch_bsd_engine(eb->i915, file);
>   		} else if (bsd_idx >= I915_EXEC_BSD_RING1 &&
>   			   bsd_idx <= I915_EXEC_BSD_RING2) {
>   			bsd_idx >>= I915_EXEC_BSD_SHIFT;
> @@ -2123,9 +2130,9 @@ eb_select_engine(struct drm_i915_private *dev_priv,
>   			return NULL;
>   		}
>   
> -		engine = dev_priv->engine[_VCS(bsd_idx)];
> +		engine = eb->i915->engine[_VCS(bsd_idx)];
>   	} else {
> -		engine = dev_priv->engine[user_ring_map[user_ring_id]];
> +		engine = eb->i915->engine[user_ring_map[user_ring_id]];
>   	}
>   
>   	if (!engine) {
> @@ -2335,7 +2342,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
>   	if (unlikely(err))
>   		goto err_destroy;
>   
> -	eb.engine = eb_select_engine(eb.i915, file, args);
> +	eb.engine = eb_select_engine(&eb, file, args);
>   	if (!eb.engine) {
>   		err = -EINVAL;
>   		goto err_engine;
> diff --git a/drivers/gpu/drm/i915/i915_utils.h b/drivers/gpu/drm/i915/i915_utils.h
> index 2dbe8933b50a..1436fe2fb5f8 100644
> --- a/drivers/gpu/drm/i915/i915_utils.h
> +++ b/drivers/gpu/drm/i915/i915_utils.h
> @@ -25,6 +25,9 @@
>   #ifndef __I915_UTILS_H
>   #define __I915_UTILS_H
>   
> +#include <linux/kernel.h>
> +#include <linux/overflow.h>
> +
>   #undef WARN_ON
>   /* Many gcc seem to no see through this and fall over :( */
>   #if 0
> @@ -73,6 +76,39 @@
>   #define overflows_type(x, T) \
>   	(sizeof(x) > sizeof(T) && (x) >> BITS_PER_TYPE(T))
>   
> +static inline bool
> +__check_struct_size(size_t base, size_t arr, size_t count, size_t *size)
> +{
> +	size_t sz;
> +
> +	if (check_mul_overflow(count, arr, &sz))
> +		return false;
> +
> +	if (check_add_overflow(sz, base, &sz))
> +		return false;
> +
> +	*size = sz;
> +	return true;
> +}
> +
> +/**
> + * check_struct_size() - Calculate size of structure with trailing array.
> + * @p: Pointer to the structure.
> + * @member: Name of the array member.
> + * @n: Number of elements in the array.
> + * @sz: Total size of structure and array
> + *
> + * Calculates size of memory needed for structure @p followed by an
> + * array of @n @member elements, like struct_size() but reports
> + * whether it overflowed, and the resultant size in @sz
> + *
> + * Return: false if the calculation overflowed.
> + */
> +#define check_struct_size(p, member, n, sz) \
> +	likely(__check_struct_size(sizeof(*(p)), \
> +				   sizeof(*(p)->member) + __must_be_array((p)->member), \
> +				   n, sz))
> +
>   #define ptr_mask_bits(ptr, n) ({					\
>   	unsigned long __v = (unsigned long)(ptr);			\
>   	(typeof(ptr))(__v & -BIT(n));					\
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index a5bdb86858f6..4e67c2395b46 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -126,6 +126,8 @@ enum drm_i915_gem_engine_class {
>   	I915_ENGINE_CLASS_INVALID	= -1
>   };
>   
> +#define I915_ENGINE_CLASS_INVALID_NONE -1
> +
>   /**
>    * DOC: perf_events exposed by i915 through /sys/bus/event_sources/drivers/i915
>    *
> @@ -1511,6 +1513,26 @@ struct drm_i915_gem_context_param {
>   	 * See DRM_I915_GEM_VM_CREATE and DRM_I915_GEM_VM_DESTROY.
>   	 */
>   #define I915_CONTEXT_PARAM_VM		0x9
> +
> +/*
> + * I915_CONTEXT_PARAM_ENGINES:
> + *
> + * Bind this context to operate on this subset of available engines. Henceforth,
> + * the I915_EXEC_RING selector for DRM_IOCTL_I915_GEM_EXECBUFFER2 operates as
> + * an index into this array of engines; I915_EXEC_DEFAULT selecting engine[0]
> + * and upwards. Slots 0...N are filled in using the specified (class, instance).
> + * Use
> + *	engine_class: I915_ENGINE_CLASS_INVALID,
> + *	engine_instance: I915_ENGINE_CLASS_INVALID_NONE
> + * to specify a gap in the array that can be filled in later, e.g. by a
> + * virtual engine used for load balancing.
> + *
> + * Setting the number of engines bound to the context to 0, by passing a zero
> + * sized argument, will revert to the default settings.
> + *
> + * See struct i915_context_param_engines.
> + */
> +#define I915_CONTEXT_PARAM_ENGINES	0xa
>   /* Must be kept compact -- no holes and well documented */
>   
>   	__u64 value;
> @@ -1575,6 +1597,23 @@ struct drm_i915_gem_context_param_sseu {
>   	__u32 rsvd;
>   };
>   
> +struct i915_context_param_engines {
> +	__u64 extensions; /* linked chain of extension blocks, 0 terminates */
> +
> +	struct {
> +		__u16 engine_class; /* see enum drm_i915_gem_engine_class */
> +		__u16 engine_instance;
> +	} class_instance[0];
> +} __attribute__((packed));
> +
> +#define I915_DEFINE_CONTEXT_PARAM_ENGINES(name__, N__) struct { \
> +	__u64 extensions; \
> +	struct { \
> +		__u16 engine_class; \
> +		__u16 engine_instance; \
> +	} class_instance[N__]; \
> +} __attribute__((packed)) name__
> +
>   struct drm_i915_gem_context_create_ext_setparam {
>   #define I915_CONTEXT_CREATE_EXT_SETPARAM 0
>   	struct i915_user_extension base;
> @@ -1591,7 +1630,8 @@ struct drm_i915_gem_context_create_ext_clone {
>   #define I915_CONTEXT_CLONE_SSEU		(1u << 2)
>   #define I915_CONTEXT_CLONE_TIMELINE	(1u << 3)
>   #define I915_CONTEXT_CLONE_VM		(1u << 4)
> -#define I915_CONTEXT_CLONE_UNKNOWN -(I915_CONTEXT_CLONE_VM << 1)
> +#define I915_CONTEXT_CLONE_ENGINES	(1u << 5)
> +#define I915_CONTEXT_CLONE_UNKNOWN -(I915_CONTEXT_CLONE_ENGINES << 1)
>   	__u64 rsvd;
>   };
>   
> 
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 14/18] drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[]
  2019-03-19 11:57 ` [PATCH 14/18] drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[] Chris Wilson
@ 2019-03-20 13:22   ` Tvrtko Ursulin
  0 siblings, 0 replies; 47+ messages in thread
From: Tvrtko Ursulin @ 2019-03-20 13:22 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 19/03/2019 11:57, Chris Wilson wrote:
> Allow the user to specify a local engine index (as opposed to
> class:index) that they can use to refer to a preset engine inside the
> ctx->engine[] array defined by an earlier I915_CONTEXT_PARAM_ENGINES.
> This will be useful for setting SSEU parameters on virtual engines that
> are local to the context and do not have a valid global class:instance
> lookup.
> 
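
A rough sketch of how userspace might use the new flag (assuming an open
i915 fd, a context that has already defined its ctx->engines[] map via
I915_CONTEXT_PARAM_ENGINES, and libdrm's drmIoctl(); the slice/subslice
values below are purely illustrative):

	#include <stdint.h>
	#include <xf86drm.h>
	#include <drm/i915_drm.h>

	static int set_sseu_by_index(int fd, uint32_t ctx_id, uint16_t idx,
				     uint64_t slice_mask, uint64_t subslice_mask)
	{
		struct drm_i915_gem_context_param_sseu sseu = {
			/* with the flag set, engine_instance is the ctx->engines[] slot */
			.engine_instance = idx,
			.flags = I915_CONTEXT_SSEU_FLAG_ENGINE_INDEX,
			.slice_mask = slice_mask,
			.subslice_mask = subslice_mask,
			.min_eus_per_subslice = 1,	/* illustrative only */
			.max_eus_per_subslice = 8,	/* illustrative only */
		};
		struct drm_i915_gem_context_param p = {
			.ctx_id = ctx_id,
			.param = I915_CONTEXT_PARAM_SSEU,
			.size = sizeof(sseu),
			.value = (uint64_t)(uintptr_t)&sseu,
		};

		return drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &p);
	}
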
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem_context.c | 24 ++++++++++++++++++++----
>   include/uapi/drm/i915_drm.h             |  3 ++-
>   2 files changed, 22 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index f038c15e73d8..cbd76ef95115 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -1381,6 +1381,7 @@ static int set_sseu(struct i915_gem_context *ctx,
>   	struct drm_i915_gem_context_param_sseu user_sseu;
>   	struct intel_engine_cs *engine;
>   	struct intel_sseu sseu;
> +	unsigned long lookup;
>   	int ret;
>   
>   	if (args->size < sizeof(user_sseu))
> @@ -1393,10 +1394,17 @@ static int set_sseu(struct i915_gem_context *ctx,
>   			   sizeof(user_sseu)))
>   		return -EFAULT;
>   
> -	if (user_sseu.flags || user_sseu.rsvd)
> +	if (user_sseu.rsvd)
>   		return -EINVAL;
>   
> -	engine = lookup_user_engine(ctx, 0,
> +	if (user_sseu.flags & ~(I915_CONTEXT_SSEU_FLAG_ENGINE_INDEX))
> +		return -EINVAL;
> +
> +	lookup = 0;
> +	if (user_sseu.flags & I915_CONTEXT_SSEU_FLAG_ENGINE_INDEX)
> +		lookup |= LOOKUP_USER_INDEX;
> +
> +	engine = lookup_user_engine(ctx, lookup,
>   				    user_sseu.engine_class,
>   				    user_sseu.engine_instance);
>   	if (!engine)
> @@ -1967,6 +1975,7 @@ static int get_sseu(struct i915_gem_context *ctx,
>   	struct drm_i915_gem_context_param_sseu user_sseu;
>   	struct intel_engine_cs *engine;
>   	struct intel_context *ce;
> +	unsigned long lookup;
>   
>   	if (args->size == 0)
>   		goto out;
> @@ -1977,10 +1986,17 @@ static int get_sseu(struct i915_gem_context *ctx,
>   			   sizeof(user_sseu)))
>   		return -EFAULT;
>   
> -	if (user_sseu.flags || user_sseu.rsvd)
> +	if (user_sseu.rsvd)
>   		return -EINVAL;
>   
> -	engine = lookup_user_engine(ctx, 0,
> +	if (user_sseu.flags & ~(I915_CONTEXT_SSEU_FLAG_ENGINE_INDEX))
> +		return -EINVAL;
> +
> +	lookup = 0;
> +	if (user_sseu.flags & I915_CONTEXT_SSEU_FLAG_ENGINE_INDEX)
> +		lookup |= LOOKUP_USER_INDEX;
> +
> +	engine = lookup_user_engine(ctx, lookup,
>   				    user_sseu.engine_class,
>   				    user_sseu.engine_instance);
>   	if (!engine)
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 4e67c2395b46..8ef6d60929c6 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1567,9 +1567,10 @@ struct drm_i915_gem_context_param_sseu {
>   	__u16 engine_instance;
>   
>   	/*
> -	 * Unused for now. Must be cleared to zero.
> +	 * Unknown flags must be cleared to zero.
>   	 */
>   	__u32 flags;
> +#define I915_CONTEXT_SSEU_FLAG_ENGINE_INDEX (1u << 0)
>   
>   	/*
>   	 * Mask of slices to enable for the context. Valid values are a subset
> 

Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 02/18] drm/i915: Flush pages on acquisition
  2019-03-20 12:35         ` Chris Wilson
@ 2019-03-20 14:24           ` Matthew Auld
  2019-03-21  0:16           ` Chris Wilson
  1 sibling, 0 replies; 47+ messages in thread
From: Matthew Auld @ 2019-03-20 14:24 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Intel Graphics Development

On Wed, 20 Mar 2019 at 12:36, Chris Wilson <chris@chris-wilson.co.uk> wrote:
>
> Quoting Matthew Auld (2019-03-20 12:26:00)
> > On Wed, 20 Mar 2019 at 11:48, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > >
> > > Quoting Matthew Auld (2019-03-20 11:41:52)
> > > > On Tue, 19 Mar 2019 at 11:58, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> > > > > @@ -2534,6 +2522,14 @@ void __i915_gem_object_set_pages(struct drm_i915_gem_object *obj,
> > > > >
> > > > >         lockdep_assert_held(&obj->mm.lock);
> > > > >
> > > > > +       /* Make the pages coherent with the GPU (flushing any swapin). */
> > > > > +       if (obj->cache_dirty) {
> > > > > +               obj->write_domain = 0;
> > > > > +               if (i915_gem_object_has_struct_page(obj))
> > > > > +                       drm_clflush_sg(pages);
> > > > > +               obj->cache_dirty = false;
> > > > > +       }
> > > >
> > > > Is it worth adding some special casing here for volatile objects, so
> > > > that we avoid doing the clflush_sg every time we do set_pages for
> > > > !llc?
> > > >
> > > > if (obj->cache_dirty && obj->mm.madvise == WILLNEED)
> > > >
> > > > Or is that meh?
> > >
> > > No, even for volatile objects we have to be careful with what remains in
> > > the CPU cache as that may obscure updates to the underlying page. We see
> > > the very same problem with speculative cacheline loading.
> > >
> > > A DONTNEED object should fail before it gets allocated pages :)
> >
> > I was talking about kernel internal objects, which are marked as
> > DONTNEED just before we call set_pages(), and for that case it's
> > surely up to the caller to flush things before they even think of
> > doing the unpin(since it's volatile).
>
> But those objects also become WILLNEED at that point, and may still need
> to be flushed.
>
> The cost of the extra flushes is a worry, but not enough for me to be
> concerned about. I think the convention that get_pages == coherent on
> gpu improves quite a bit of our internal rummaging around and prevents
> the ABI nightmare of mmap_gtt/mmap_offset. Will this flush remain inside
> set_pages()? No, I don't think it will as pushing it into the callers
> outside of the mm.lock itself makes sense, but I didn't think that was
> of paramount importance compared to the uABI and can be done later.

Fair enough,
Reviewed-by: Matthew Auld <matthew.william.auld@gmail.com>

> -Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 15/18] drm/i915: Load balancing across a virtual engine
  2019-03-19 11:57 ` [PATCH 15/18] drm/i915: Load balancing across a virtual engine Chris Wilson
@ 2019-03-20 15:59   ` Tvrtko Ursulin
  2019-03-21 15:00     ` Chris Wilson
  0 siblings, 1 reply; 47+ messages in thread
From: Tvrtko Ursulin @ 2019-03-20 15:59 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 19/03/2019 11:57, Chris Wilson wrote:
> Having allowed the user to define a set of engines that they will want
> to only use, we go one step further and allow them to bind those engines
> into a single virtual instance. Submitting a batch to the virtual engine
> will then forward it to any one of the set in a manner as best to
> distribute load.  The virtual engine has a single timeline across all
> engines (it operates as a single queue), so it is not able to concurrently
> run batches across multiple engines by itself; that is left up to the user
> to submit multiple concurrent batches to multiple queues. Multiple users
> will be load balanced across the system.
> 
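
A rough userspace sketch of such a setup (the full
i915_context_engines_load_balance layout is defined in the i915_drm.h hunk
of this patch, which is not quoted here; the member names below follow what
the kernel side reads, fd/ctx_id are assumed to exist, and the engine
selection is an arbitrary example):

	/* ctx->engines[]: slot 0 is the virtual engine, slots 1-2 are vcs0/vcs1 */
	struct i915_context_engines_load_balance balance = {
		.base.name = I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE,
		.engine_index = 0,		/* fill the hole left at slot 0 */
		.engines_mask = (1ull << 1) | (1ull << 2), /* balance over slots 1, 2 */
	};
	I915_DEFINE_CONTEXT_PARAM_ENGINES(engines, 3) = {
		.extensions = (uint64_t)(uintptr_t)&balance,
	};
	struct drm_i915_gem_context_param p = {
		.ctx_id = ctx_id,
		.param = I915_CONTEXT_PARAM_ENGINES,
		.size = sizeof(engines),
		.value = (uint64_t)(uintptr_t)&engines,
	};

	/* slot 0 is left invalid so the load-balance extension can claim it */
	engines.class_instance[0].engine_class = I915_ENGINE_CLASS_INVALID;
	engines.class_instance[0].engine_instance = I915_ENGINE_CLASS_INVALID_NONE;
	engines.class_instance[1].engine_class = I915_ENGINE_CLASS_VIDEO;
	engines.class_instance[1].engine_instance = 0;
	engines.class_instance[2].engine_class = I915_ENGINE_CLASS_VIDEO;
	engines.class_instance[2].engine_instance = 1;

	drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &p);
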
> The mechanism used for load balancing in this patch is a late greedy
> balancer. When a request is ready for execution, it is added to each
> engine's queue, and when an engine is ready for its next request it
> claims it from the virtual engine. The first engine to do so, wins, i.e.
> the request is executed at the earliest opportunity (idle moment) in the
> system.
> 
> As not all HW is created equal, the user is still able to skip the
> virtual engine and execute the batch on a specific engine, all within the
> same queue. It will then be executed in order on the correct engine,
> with execution on other virtual engines being moved away due to the load
> detection.
> 
> A couple of areas for potential improvement left!
> 
> - The virtual engine always take priority over equal-priority tasks.
> Mostly broken up by applying FQ_CODEL rules for prioritising new clients,
> and hopefully the virtual and real engines are not then congested (i.e.
> all work is via virtual engines, or all work is to the real engine).
> 
> - We require the breadcrumb irq around every virtual engine request. For
> normal engines, we eliminate the need for the slow round trip via
> interrupt by using the submit fence and queueing in order. For virtual
> engines, we have to allow any job to transfer to a new ring, and cannot
> coalesce the submissions, so require the completion fence instead,
> forcing the persistent use of interrupts.
> 
> - We only drip feed single requests through each virtual engine and onto
> the physical engines, even if there was enough work to fill all ELSP,
> leaving small stalls with an idle CS event at the end of every request.
> Could we be greedy and fill both slots? Being lazy is virtuous for load
> distribution on less-than-full workloads though.
> 
> Other areas of improvement are more general, such as reducing lock
> contention, reducing dispatch overhead, looking at direct submission
> rather than bouncing around tasklets etc.
> 
> sseu: Lift the restriction to allow sseu to be reconfigured on virtual
> engines composed of RENDER_CLASS (rcs).
> 
> v2: macroize check_user_mbz()
> v3: Cancel virtual engines on wedging
> v4: Commence commenting
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem.h            |   5 +
>   drivers/gpu/drm/i915/i915_gem_context.c    | 126 ++++-
>   drivers/gpu/drm/i915/i915_scheduler.c      |  18 +-
>   drivers/gpu/drm/i915/i915_timeline_types.h |   1 +
>   drivers/gpu/drm/i915/intel_engine_types.h  |   8 +
>   drivers/gpu/drm/i915/intel_lrc.c           | 567 ++++++++++++++++++++-
>   drivers/gpu/drm/i915/intel_lrc.h           |  11 +
>   drivers/gpu/drm/i915/selftests/intel_lrc.c | 165 ++++++
>   include/uapi/drm/i915_drm.h                |  30 ++
>   9 files changed, 912 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.h b/drivers/gpu/drm/i915/i915_gem.h
> index 5c073fe73664..3ca855505715 100644
> --- a/drivers/gpu/drm/i915/i915_gem.h
> +++ b/drivers/gpu/drm/i915/i915_gem.h
> @@ -96,4 +96,9 @@ static inline bool __tasklet_enable(struct tasklet_struct *t)
>   	return atomic_dec_and_test(&t->count);
>   }
>   
> +static inline bool __tasklet_is_scheduled(struct tasklet_struct *t)
> +{
> +	return test_bit(TASKLET_STATE_SCHED, &t->state);
> +}
> +
>   #endif /* __I915_GEM_H__ */
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index cbd76ef95115..8d8fcc8c7a86 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -86,6 +86,7 @@
>    */
>   
>   #include <linux/log2.h>
> +#include <linux/nospec.h>
>   
>   #include <drm/i915_drm.h>
>   
> @@ -94,6 +95,7 @@
>   #include "i915_trace.h"
>   #include "i915_user_extensions.h"
>   #include "intel_lrc_reg.h"
> +#include "intel_lrc.h"
>   #include "intel_workarounds.h"
>   
>   #define ALL_L3_SLICES(dev) (1 << NUM_L3_SLICES(dev)) - 1
> @@ -241,6 +243,20 @@ static void release_hw_id(struct i915_gem_context *ctx)
>   	mutex_unlock(&i915->contexts.mutex);
>   }
>   
> +static void free_engines(struct intel_engine_cs **engines, int count)
> +{
> +	int i;
> +
> +	if (ZERO_OR_NULL_PTR(engines))
> +		return;
> +
> +	/* We own the veng we created; regular engines are ignored */
> +	for (i = 0; i < count; i++)
> +		intel_virtual_engine_destroy(engines[i]);
> +
> +	kfree(engines);
> +}
> +
>   static void i915_gem_context_free(struct i915_gem_context *ctx)
>   {
>   	struct intel_context *it, *n;
> @@ -251,8 +267,7 @@ static void i915_gem_context_free(struct i915_gem_context *ctx)
>   
>   	release_hw_id(ctx);
>   	i915_ppgtt_put(ctx->ppgtt);
> -
> -	kfree(ctx->engines);
> +	free_engines(ctx->engines, ctx->num_engines);
>   
>   	rbtree_postorder_for_each_entry_safe(it, n, &ctx->hw_contexts, node)
>   		intel_context_put(it);
> @@ -1239,7 +1254,6 @@ __i915_gem_context_reconfigure_sseu(struct i915_gem_context *ctx,
>   	int ret = 0;
>   
>   	GEM_BUG_ON(INTEL_GEN(ctx->i915) < 8);
> -	GEM_BUG_ON(engine->id != RCS0);
>   
>   	ce = intel_context_pin_lock(ctx, engine);
>   	if (IS_ERR(ce))
> @@ -1433,7 +1447,80 @@ struct set_engines {
>   	unsigned int num_engines;
>   };
>   
> +static int
> +set_engines__load_balance(struct i915_user_extension __user *base, void *data)
> +{
> +	struct i915_context_engines_load_balance __user *ext =
> +		container_of_user(base, typeof(*ext), base);
> +	const struct set_engines *set = data;
> +	struct intel_engine_cs *ve;
> +	unsigned int n;
> +	u64 mask;
> +	u16 idx;
> +	int err;
> +
> +	if (!HAS_EXECLISTS(set->ctx->i915))
> +		return -ENODEV;
> +
> +	if (USES_GUC_SUBMISSION(set->ctx->i915))
> +		return -ENODEV; /* not implemented yet */
> +
> +	if (get_user(idx, &ext->engine_index))
> +		return -EFAULT;
> +
> +	if (idx >= set->num_engines)
> +		return -EINVAL;
> +
> +	idx = array_index_nospec(idx, set->num_engines);
> +	if (set->engines[idx])
> +		return -EEXIST;
> +
> +	err = check_user_mbz(&ext->mbz16);
> +	if (err)
> +		return err;
> +
> +	err = check_user_mbz(&ext->flags);
> +	if (err)
> +		return err;
> +
> +	for (n = 0; n < ARRAY_SIZE(ext->mbz64); n++) {
> +		err = check_user_mbz(&ext->mbz64[n]);
> +		if (err)
> +			return err;
> +	}
> +
> +	if (get_user(mask, &ext->engines_mask))
> +		return -EFAULT;
> +
> +	mask &= GENMASK_ULL(set->num_engines - 1, 0) & ~BIT_ULL(idx);
> +	if (!mask)
> +		return -EINVAL;
> +
> +	if (is_power_of_2(mask)) {
> +		ve = set->engines[__ffs64(mask)];
> +	} else {
> +		struct intel_engine_cs *stack[64];
> +		int bit;
> +
> +		n = 0;
> +		for_each_set_bit(bit, (unsigned long *)&mask, set->num_engines)
> +			stack[n++] = set->engines[bit];
> +
> +		ve = intel_execlists_create_virtual(set->ctx, stack, n);
> +	}
> +	if (IS_ERR(ve))
> +		return PTR_ERR(ve);
> +
> +	if (cmpxchg(&set->engines[idx], NULL, ve)) {
> +		intel_virtual_engine_destroy(ve);
> +		return -EEXIST;
> +	}
> +
> +	return 0;
> +}
> +
>   static const i915_user_extension_fn set_engines__extensions[] = {
> +	[I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE] = set_engines__load_balance,
>   };
>   
>   static int
> @@ -1497,13 +1584,13 @@ set_engines(struct i915_gem_context *ctx,
>   					   ARRAY_SIZE(set_engines__extensions),
>   					   &set);
>   	if (err) {
> -		kfree(set.engines);
> +		free_engines(set.engines, set.num_engines);
>   		return err;
>   	}
>   
>   out:
>   	mutex_lock(&ctx->i915->drm.struct_mutex);
> -	kfree(ctx->engines);
> +	free_engines(ctx->engines, ctx->num_engines);
>   	ctx->engines = set.engines;
>   	ctx->num_engines = set.num_engines;
>   	mutex_unlock(&ctx->i915->drm.struct_mutex);
> @@ -1692,7 +1779,7 @@ static int clone_engines(struct i915_gem_context *dst,
>   			 struct i915_gem_context *src)
>   {
>   	struct intel_engine_cs **engines;
> -	unsigned int num_engines;
> +	unsigned int num_engines, i;
>   
>   	mutex_lock(&src->i915->drm.struct_mutex); /* serialise src->engine[] */
>   
> @@ -1707,11 +1794,36 @@ static int clone_engines(struct i915_gem_context *dst,
>   			mutex_unlock(&src->i915->drm.struct_mutex);
>   			return -ENOMEM;
>   		}
> +
> +		/*
> +		 * Virtual engines are singletons; they can only exist
> +		 * inside a single context, because they embed their
> +		 * HW context... As each virtual context implies a single
> +		 * timeline (each engine can only dequeue a single request
> +		 * at any time), it would be surprising for two contexts
> +		 * to use the same engine. So let's create a copy of
> +		 * the virtual engine instead.
> +		 */
> +		for (i = 0; i < num_engines; i++) {
> +			struct intel_engine_cs *engine = engines[i];
> +
> +			if (!engine || !intel_engine_is_virtual(engine))
> +				continue;
> +
> +			engine = intel_execlists_clone_virtual(dst, engine);
> +			if (IS_ERR(engine)) {
> +				free_engines(engines, i);
> +				mutex_unlock(&src->i915->drm.struct_mutex);
> +				return PTR_ERR(engine);
> +			}
> +
> +			engines[i] = engine;
> +		}
>   	}
>   
>   	mutex_unlock(&src->i915->drm.struct_mutex);
>   
> -	kfree(dst->engines);
> +	free_engines(dst->engines, dst->num_engines);
>   	dst->engines = engines;
>   	dst->num_engines = num_engines;
>   	return 0;
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> index e0f609d01564..8cff4f6d6158 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.c
> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> @@ -247,17 +247,26 @@ sched_lock_engine(const struct i915_sched_node *node,
>   		  struct intel_engine_cs *locked,
>   		  struct sched_cache *cache)
>   {
> -	struct intel_engine_cs *engine = node_to_request(node)->engine;
> +	const struct i915_request *rq = node_to_request(node);
> +	struct intel_engine_cs *engine;
>   
>   	GEM_BUG_ON(!locked);
>   
> -	if (engine != locked) {
> +	/*
> +	 * Virtual engines complicate acquiring the engine timeline lock,
> +	 * as their rq->engine pointer is not stable until under that
> +	 * engine lock. The simple ploy we use is to take the lock then
> +	 * check that the rq still belongs to the newly locked engine.
> +	 */
> +	while (locked != (engine = READ_ONCE(rq->engine))) {
>   		spin_unlock(&locked->timeline.lock);
>   		memset(cache, 0, sizeof(*cache));
>   		spin_lock(&engine->timeline.lock);
> +		locked = engine;
>   	}
>   
> -	return engine;
> +	GEM_BUG_ON(locked != engine);
> +	return locked;
>   }
>   
>   static bool inflight(const struct i915_request *rq,
> @@ -370,8 +379,11 @@ static void __i915_schedule(struct i915_request *rq,
>   		if (prio <= node->attr.priority || node_signaled(node))
>   			continue;
>   
> +		GEM_BUG_ON(node_to_request(node)->engine != engine);
> +
>   		node->attr.priority = prio;
>   		if (!list_empty(&node->link)) {
> +			GEM_BUG_ON(intel_engine_is_virtual(engine));
>   			if (!cache.priolist)
>   				cache.priolist =
>   					i915_sched_lookup_priolist(engine,
> diff --git a/drivers/gpu/drm/i915/i915_timeline_types.h b/drivers/gpu/drm/i915/i915_timeline_types.h
> index 1f5b55d9ffb5..57c79830bb8c 100644
> --- a/drivers/gpu/drm/i915/i915_timeline_types.h
> +++ b/drivers/gpu/drm/i915/i915_timeline_types.h
> @@ -26,6 +26,7 @@ struct i915_timeline {
>   	spinlock_t lock;
>   #define TIMELINE_CLIENT 0 /* default subclass */
>   #define TIMELINE_ENGINE 1
> +#define TIMELINE_VIRTUAL 2
>   	struct mutex mutex; /* protects the flow of requests */
>   
>   	unsigned int pin_count;
> diff --git a/drivers/gpu/drm/i915/intel_engine_types.h b/drivers/gpu/drm/i915/intel_engine_types.h
> index cef1e71a7401..4d526767c371 100644
> --- a/drivers/gpu/drm/i915/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/intel_engine_types.h
> @@ -224,6 +224,7 @@ struct intel_engine_execlists {
>   	 * @queue: queue of requests, in priority lists
>   	 */
>   	struct rb_root_cached queue;
> +	struct rb_root_cached virtual;
>   
>   	/**
>   	 * @csb_write: control register for Context Switch buffer
> @@ -429,6 +430,7 @@ struct intel_engine_cs {
>   #define I915_ENGINE_SUPPORTS_STATS   BIT(1)
>   #define I915_ENGINE_HAS_PREEMPTION   BIT(2)
>   #define I915_ENGINE_HAS_SEMAPHORES   BIT(3)
> +#define I915_ENGINE_IS_VIRTUAL       BIT(4)
>   	unsigned int flags;
>   
>   	/*
> @@ -512,6 +514,12 @@ intel_engine_has_semaphores(const struct intel_engine_cs *engine)
>   	return engine->flags & I915_ENGINE_HAS_SEMAPHORES;
>   }
>   
> +static inline bool
> +intel_engine_is_virtual(const struct intel_engine_cs *engine)
> +{
> +	return engine->flags & I915_ENGINE_IS_VIRTUAL;
> +}
> +
>   #define instdone_slice_mask(dev_priv__) \
>   	(IS_GEN(dev_priv__, 7) ? \
>   	 1 : RUNTIME_INFO(dev_priv__)->sseu.slice_mask)
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 0e64317ddcbf..a7f0235f19c5 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -166,6 +166,41 @@
>   
>   #define ACTIVE_PRIORITY (I915_PRIORITY_NEWCLIENT | I915_PRIORITY_NOSEMAPHORE)
>   
> +struct virtual_engine {
> +	struct intel_engine_cs base;
> +	struct intel_context context;
> +
> +	/*
> +	 * We allow only a single request through the virtual engine at a time
> +	 * (each request in the timeline waits for the completion fence of
> +	 * the previous one before being submitted). By restricting ourselves to
> +	 * only submitting a single request, each request is placed onto a
> +	 * physical engine to maximise load spreading (by virtue of the late greedy
> +	 * scheduling -- each real engine takes the next available request
> +	 * upon idling).
> +	 */
> +	struct i915_request *request;
> +
> +	/*
> +	 * We keep a rbtree of available virtual engines inside each physical
> +	 * engine, sorted by priority. Here we preallocate the nodes we need
> +	 * for the virtual engine, indexed by physical_engine->id.
> +	 */
> +	struct ve_node {
> +		struct rb_node rb;
> +		int prio;
> +	} nodes[I915_NUM_ENGINES];
> +
> +	/* And finally, which physical engines this virtual engine maps onto. */
> +	unsigned int count;
> +	struct intel_engine_cs *siblings[0];
> +};
> +
> +static struct virtual_engine *to_virtual_engine(struct intel_engine_cs *engine)
> +{
> +	return container_of(engine, struct virtual_engine, base);
> +}
> +
>   static int execlists_context_deferred_alloc(struct intel_context *ce,
>   					    struct intel_engine_cs *engine);
>   static void execlists_init_reg_state(u32 *reg_state,
> @@ -229,7 +264,8 @@ static int queue_prio(const struct intel_engine_execlists *execlists)
>   }
>   
>   static inline bool need_preempt(const struct intel_engine_cs *engine,
> -				const struct i915_request *rq)
> +				const struct i915_request *rq,
> +				struct rb_node *rb)
>   {
>   	int last_prio;
>   
> @@ -264,6 +300,22 @@ static inline bool need_preempt(const struct intel_engine_cs *engine,
>   	    rq_prio(list_next_entry(rq, link)) > last_prio)
>   		return true;
>   
> +	if (rb) { /* XXX virtual precedence */
> +		struct virtual_engine *ve =
> +			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
> +		bool preempt = false;
> +
> +		if (engine == ve->siblings[0]) { /* only preempt one sibling */
> +			spin_lock(&ve->base.timeline.lock);
> +			if (ve->request)
> +				preempt = rq_prio(ve->request) > last_prio;
> +			spin_unlock(&ve->base.timeline.lock);
> +		}
> +
> +		if (preempt)
> +			return preempt;
> +	}
> +
>   	/*
>   	 * If the inflight context did not trigger the preemption, then maybe
>   	 * it was the set of queued requests? Pick the highest priority in
> @@ -382,6 +434,8 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
>   	list_for_each_entry_safe_reverse(rq, rn,
>   					 &engine->timeline.requests,
>   					 link) {
> +		struct intel_engine_cs *owner;
> +
>   		if (i915_request_completed(rq))
>   			break;
>   
> @@ -390,14 +444,30 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
>   
>   		GEM_BUG_ON(rq->hw_context->active);
>   
> -		GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
> -		if (rq_prio(rq) != prio) {
> -			prio = rq_prio(rq);
> -			pl = i915_sched_lookup_priolist(engine, prio);
> -		}
> -		GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
> +		/*
> +		 * Push the request back into the queue for later resubmission.
> +		 * If this request is not native to this physical engine (i.e.
> +		 * it came from a virtual source), push it back onto the virtual
> +		 * engine so that it can be moved across onto another physical
> +		 * engine as load dictates.
> +		 */
> +		owner = rq->hw_context->engine;
> +		if (likely(owner == engine)) {
> +			GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
> +			if (rq_prio(rq) != prio) {
> +				prio = rq_prio(rq);
> +				pl = i915_sched_lookup_priolist(engine, prio);
> +			}
> +			GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
>   
> -		list_add(&rq->sched.link, pl);
> +			list_add(&rq->sched.link, pl);
> +		} else {
> +			if (__i915_request_has_started(rq))
> +				rq->sched.attr.priority |= ACTIVE_PRIORITY;
> +
> +			rq->engine = owner;
> +			owner->submit_request(rq);
> +		}
>   
>   		active = rq;
>   	}
> @@ -659,6 +729,50 @@ static void complete_preempt_context(struct intel_engine_execlists *execlists)
>   						  execlists));
>   }
>   
> +static void virtual_update_register_offsets(u32 *regs,
> +					    struct intel_engine_cs *engine)
> +{
> +	u32 base = engine->mmio_base;
> +
> +	regs[CTX_CONTEXT_CONTROL] =
> +		i915_mmio_reg_offset(RING_CONTEXT_CONTROL(engine));
> +	regs[CTX_RING_HEAD] = i915_mmio_reg_offset(RING_HEAD(base));
> +	regs[CTX_RING_TAIL] = i915_mmio_reg_offset(RING_TAIL(base));
> +	regs[CTX_RING_BUFFER_START] = i915_mmio_reg_offset(RING_START(base));
> +	regs[CTX_RING_BUFFER_CONTROL] = i915_mmio_reg_offset(RING_CTL(base));
> +
> +	regs[CTX_BB_HEAD_U] = i915_mmio_reg_offset(RING_BBADDR_UDW(base));
> +	regs[CTX_BB_HEAD_L] = i915_mmio_reg_offset(RING_BBADDR(base));
> +	regs[CTX_BB_STATE] = i915_mmio_reg_offset(RING_BBSTATE(base));
> +	regs[CTX_SECOND_BB_HEAD_U] =
> +		i915_mmio_reg_offset(RING_SBBADDR_UDW(base));
> +	regs[CTX_SECOND_BB_HEAD_L] = i915_mmio_reg_offset(RING_SBBADDR(base));
> +	regs[CTX_SECOND_BB_STATE] = i915_mmio_reg_offset(RING_SBBSTATE(base));
> +
> +	regs[CTX_CTX_TIMESTAMP] =
> +		i915_mmio_reg_offset(RING_CTX_TIMESTAMP(base));
> +	regs[CTX_PDP3_UDW] = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(engine, 3));
> +	regs[CTX_PDP3_LDW] = i915_mmio_reg_offset(GEN8_RING_PDP_LDW(engine, 3));
> +	regs[CTX_PDP2_UDW] = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(engine, 2));
> +	regs[CTX_PDP2_LDW] = i915_mmio_reg_offset(GEN8_RING_PDP_LDW(engine, 2));
> +	regs[CTX_PDP1_UDW] = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(engine, 1));
> +	regs[CTX_PDP1_LDW] = i915_mmio_reg_offset(GEN8_RING_PDP_LDW(engine, 1));
> +	regs[CTX_PDP0_UDW] = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(engine, 0));
> +	regs[CTX_PDP0_LDW] = i915_mmio_reg_offset(GEN8_RING_PDP_LDW(engine, 0));
> +
> +	if (engine->class == RENDER_CLASS) {
> +		regs[CTX_RCS_INDIRECT_CTX] =
> +			i915_mmio_reg_offset(RING_INDIRECT_CTX(base));
> +		regs[CTX_RCS_INDIRECT_CTX_OFFSET] =
> +			i915_mmio_reg_offset(RING_INDIRECT_CTX_OFFSET(base));
> +		regs[CTX_BB_PER_CTX_PTR] =
> +			i915_mmio_reg_offset(RING_BB_PER_CTX_PTR(base));
> +
> +		regs[CTX_R_PWR_CLK_STATE] =
> +			i915_mmio_reg_offset(GEN8_R_PWR_CLK_STATE);
> +	}
> +}
> +
>   static void execlists_dequeue(struct intel_engine_cs *engine)
>   {
>   	struct intel_engine_execlists * const execlists = &engine->execlists;
> @@ -691,6 +805,37 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   	 * and context switches) submission.
>   	 */
>   
> +	for (rb = rb_first_cached(&execlists->virtual); rb; ) {
> +		struct virtual_engine *ve =
> +			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
> +		struct i915_request *rq = READ_ONCE(ve->request);
> +		struct intel_engine_cs *active;
> +
> +		if (!rq) { /* lazily cleanup after another engine handled rq */
> +			rb_erase_cached(rb, &execlists->virtual);
> +			RB_CLEAR_NODE(rb);
> +			rb = rb_first_cached(&execlists->virtual);
> +			continue;
> +		}
> +
> +		/*
> +		 * We track when the HW has completed saving the context image
> +		 * (i.e. when we have seen the final CS event switching out of
> +		 * the context) and must not overwrite the context image before
> +		 * then. This restricts us to only using the active engine
> +		 * while the previous virtualized request is inflight (so
> +		 * we reuse the register offsets). This is a very small
> +		 * hysteresis on the greedy selection algorithm.
> +		 */
> +		active = READ_ONCE(ve->context.active);
> +		if (active && active != engine) {
> +			rb = rb_next(rb);
> +			continue;
> +		}
> +
> +		break;
> +	}
> +
>   	if (last) {
>   		/*
>   		 * Don't resubmit or switch until all outstanding
> @@ -712,7 +857,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   		if (!execlists_is_active(execlists, EXECLISTS_ACTIVE_HWACK))
>   			return;
>   
> -		if (need_preempt(engine, last)) {
> +		if (need_preempt(engine, last, rb)) {
>   			inject_preempt_context(engine);
>   			return;
>   		}
> @@ -752,6 +897,72 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   		last->tail = last->wa_tail;
>   	}
>   
> +	while (rb) { /* XXX virtual is always taking precedence */
> +		struct virtual_engine *ve =
> +			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
> +		struct i915_request *rq;
> +
> +		spin_lock(&ve->base.timeline.lock);
> +
> +		rq = ve->request;
> +		if (unlikely(!rq)) { /* lost the race to a sibling */
> +			spin_unlock(&ve->base.timeline.lock);
> +			rb_erase_cached(rb, &execlists->virtual);
> +			RB_CLEAR_NODE(rb);
> +			rb = rb_first_cached(&execlists->virtual);
> +			continue;
> +		}
> +
> +		if (rq_prio(rq) >= queue_prio(execlists)) {
> +			if (last && !can_merge_rq(last, rq)) {
> +				spin_unlock(&ve->base.timeline.lock);
> +				return; /* leave this rq for another engine */

Should this engine attempt to dequeue something else? Maybe something 
from the non-virtual queue for instance could be submitted/appended.

> +			}
> +
> +			GEM_BUG_ON(rq->engine != &ve->base);
> +			ve->request = NULL;
> +			ve->base.execlists.queue_priority_hint = INT_MIN;
> +			rb_erase_cached(rb, &execlists->virtual);
> +			RB_CLEAR_NODE(rb);
> +
> +			GEM_BUG_ON(rq->hw_context != &ve->context);
> +			rq->engine = engine;
> +
> +			if (engine != ve->siblings[0]) {
> +				u32 *regs = ve->context.lrc_reg_state;
> +				unsigned int n;
> +
> +				GEM_BUG_ON(READ_ONCE(ve->context.active));
> +				virtual_update_register_offsets(regs, engine);
> +
> +				/*
> +				 * Move the bound engine to the top of the list
> +				 * for future execution. We then kick this
> +				 * tasklet first before checking others, so that
> +				 * we preferentially reuse this set of bound
> +				 * registers.
> +				 */
> +				for (n = 1; n < ve->count; n++) {
> +					if (ve->siblings[n] == engine) {
> +						swap(ve->siblings[n],
> +						     ve->siblings[0]);
> +						break;
> +					}
> +				}
> +
> +				GEM_BUG_ON(ve->siblings[0] != engine);
> +			}
> +
> +			__i915_request_submit(rq);
> +			trace_i915_request_in(rq, port_index(port, execlists));
> +			submit = true;
> +			last = rq;
> +		}
> +
> +		spin_unlock(&ve->base.timeline.lock);
> +		break;
> +	}
> +
>   	while ((rb = rb_first_cached(&execlists->queue))) {
>   		struct i915_priolist *p = to_priolist(rb);
>   		struct i915_request *rq, *rn;
> @@ -971,6 +1182,24 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
>   		i915_priolist_free(p);
>   	}
>   
> +	/* Cancel all attached virtual engines */
> +	while ((rb = rb_first_cached(&execlists->virtual))) {
> +		struct virtual_engine *ve =
> +			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
> +
> +		rb_erase_cached(rb, &execlists->virtual);
> +		RB_CLEAR_NODE(rb);
> +
> +		spin_lock(&ve->base.timeline.lock);
> +		if (ve->request) {
> +			__i915_request_submit(ve->request);
> +			dma_fence_set_error(&ve->request->fence, -EIO);
> +			i915_request_mark_complete(ve->request);
> +			ve->request = NULL;
> +		}
> +		spin_unlock(&ve->base.timeline.lock);
> +	}
> +
>   	/* Remaining _unready_ requests will be nop'ed when submitted */
>   
>   	execlists->queue_priority_hint = INT_MIN;
> @@ -2897,6 +3126,303 @@ void intel_lr_context_resume(struct drm_i915_private *i915)
>   	}
>   }
>   
> +static void virtual_context_destroy(struct kref *kref)
> +{
> +	struct virtual_engine *ve =
> +		container_of(kref, typeof(*ve), context.ref);
> +	unsigned int n;
> +
> +	GEM_BUG_ON(ve->request);
> +	GEM_BUG_ON(ve->context.active);
> +
> +	for (n = 0; n < ve->count; n++) {
> +		struct intel_engine_cs *sibling = ve->siblings[n];
> +		struct rb_node *node = &ve->nodes[sibling->id].rb;
> +
> +		if (RB_EMPTY_NODE(node))
> +			continue;
> +
> +		spin_lock_irq(&sibling->timeline.lock);
> +
> +		if (!RB_EMPTY_NODE(node))
> +			rb_erase_cached(node, &sibling->execlists.virtual);
> +
> +		spin_unlock_irq(&sibling->timeline.lock);
> +	}
> +	GEM_BUG_ON(__tasklet_is_scheduled(&ve->base.execlists.tasklet));
> +
> +	if (ve->context.state)
> +		__execlists_context_fini(&ve->context);
> +
> +	i915_timeline_fini(&ve->base.timeline);
> +	kfree(ve);
> +}
> +
> +static void virtual_engine_initial_hint(struct virtual_engine *ve)
> +{
> +	int swp;
> +
> +	/*
> +	 * Pick a random sibling on starting to help spread the load around.
> +	 *
> +	 * New contexts are typically created with exactly the same order
> +	 * of siblings, and often started in batches. Due to the way we iterate
> +	 * the array of siblings when submitting requests, siblings[0] is
> +	 * prioritised for dequeuing. If we make sure that siblings[0] is fairly
> +	 * randomised across the system, we also help spread the load by the
> +	 * first engine we inspect being different each time.
> +	 *
> +	 * NB This does not force us to execute on this engine, it will just
> +	 * typically be the first we inspect for submission.
> +	 */
> +	swp = prandom_u32_max(ve->count);
> +	if (!swp)
> +		return;
> +
> +	swap(ve->siblings[swp], ve->siblings[0]);
> +	virtual_update_register_offsets(ve->context.lrc_reg_state,
> +					ve->siblings[0]);
> +}
> +
> +static int virtual_context_pin(struct intel_context *ce)
> +{
> +	struct virtual_engine *ve = container_of(ce, typeof(*ve), context);
> +	int err;
> +
> +	/* Note: we must use a real engine class for setting up reg state */
> +	err = __execlists_context_pin(ce, ve->siblings[0]);
> +	if (err)
> +		return err;
> +
> +	virtual_engine_initial_hint(ve);
> +	return 0;
> +}
> +
> +static const struct intel_context_ops virtual_context_ops = {
> +	.pin = virtual_context_pin,
> +	.unpin = execlists_context_unpin,
> +
> +	.destroy = virtual_context_destroy,
> +};
> +
> +static void virtual_submission_tasklet(unsigned long data)
> +{
> +	struct virtual_engine * const ve = (struct virtual_engine *)data;
> +	unsigned int n;
> +	int prio;
> +
> +	prio = READ_ONCE(ve->base.execlists.queue_priority_hint);
> +	if (prio == INT_MIN)
> +		return;

When does it hit this return?

> +
> +	local_irq_disable();
> +	for (n = 0; READ_ONCE(ve->request) && n < ve->count; n++) {
> +		struct intel_engine_cs *sibling = ve->siblings[n];
> +		struct ve_node * const node = &ve->nodes[sibling->id];
> +		struct rb_node **parent, *rb;
> +		bool first;
> +
> +		spin_lock(&sibling->timeline.lock);
> +
> +		if (!RB_EMPTY_NODE(&node->rb)) {
> +			/*
> +			 * Cheat and avoid rebalancing the tree if we can
> +			 * reuse this node in situ.
> +			 */
> +			first = rb_first_cached(&sibling->execlists.virtual) ==
> +				&node->rb;
> +			if (prio == node->prio || (prio > node->prio && first))
> +				goto submit_engine;
> +
> +			rb_erase_cached(&node->rb, &sibling->execlists.virtual);

How does the cheat work exactly? Does it avoid inserting into the tree in
some cases? And how does the real tasklet find this node then?

> +		}
> +
> +		rb = NULL;
> +		first = true;
> +		parent = &sibling->execlists.virtual.rb_root.rb_node;
> +		while (*parent) {
> +			struct ve_node *other;
> +
> +			rb = *parent;
> +			other = rb_entry(rb, typeof(*other), rb);
> +			if (prio > other->prio) {
> +				parent = &rb->rb_left;
> +			} else {
> +				parent = &rb->rb_right;
> +				first = false;
> +			}
> +		}
> +
> +		rb_link_node(&node->rb, rb, parent);
> +		rb_insert_color_cached(&node->rb,
> +				       &sibling->execlists.virtual,
> +				       first);
> +
> +submit_engine:
> +		GEM_BUG_ON(RB_EMPTY_NODE(&node->rb));
> +		node->prio = prio;
> +		if (first && prio > sibling->execlists.queue_priority_hint) {
> +			sibling->execlists.queue_priority_hint = prio;
> +			tasklet_hi_schedule(&sibling->execlists.tasklet);
> +		}
> +
> +		spin_unlock(&sibling->timeline.lock);
> +	}
> +	local_irq_enable();
> +}
> +
> +static void virtual_submit_request(struct i915_request *request)
> +{
> +	struct virtual_engine *ve = to_virtual_engine(request->engine);
> +
> +	GEM_BUG_ON(ve->base.submit_request != virtual_submit_request);
> +
> +	GEM_BUG_ON(ve->request);
> +	ve->base.execlists.queue_priority_hint = rq_prio(request);
> +	WRITE_ONCE(ve->request, request);
> +
> +	tasklet_schedule(&ve->base.execlists.tasklet);
> +}
> +
> +struct intel_engine_cs *
> +intel_execlists_create_virtual(struct i915_gem_context *ctx,
> +			       struct intel_engine_cs **siblings,
> +			       unsigned int count)
> +{
> +	struct virtual_engine *ve;
> +	unsigned int n;
> +	int err;
> +
> +	if (!count)
> +		return ERR_PTR(-EINVAL);
> +
> +	ve = kzalloc(struct_size(ve, siblings, count), GFP_KERNEL);
> +	if (!ve)
> +		return ERR_PTR(-ENOMEM);
> +
> +	ve->base.i915 = ctx->i915;
> +	ve->base.id = -1;
> +	ve->base.class = OTHER_CLASS;
> +	ve->base.uabi_class = I915_ENGINE_CLASS_INVALID;
> +	ve->base.instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
> +	ve->base.flags = I915_ENGINE_IS_VIRTUAL;
> +
> +	snprintf(ve->base.name, sizeof(ve->base.name), "virtual");
> +
> +	err = i915_timeline_init(ctx->i915, &ve->base.timeline, NULL);
> +	if (err)
> +		goto err_put;
> +	i915_timeline_set_subclass(&ve->base.timeline, TIMELINE_VIRTUAL);
> +
> +	ve->base.cops = &virtual_context_ops;
> +	ve->base.request_alloc = execlists_request_alloc;
> +
> +	ve->base.schedule = i915_schedule;
> +	ve->base.submit_request = virtual_submit_request;
> +
> +	ve->base.execlists.queue_priority_hint = INT_MIN;
> +	tasklet_init(&ve->base.execlists.tasklet,
> +		     virtual_submission_tasklet,
> +		     (unsigned long)ve);
> +
> +	intel_context_init(&ve->context, ctx, &ve->base);
> +
> +	for (n = 0; n < count; n++) {
> +		struct intel_engine_cs *sibling = siblings[n];
> +
> +		GEM_BUG_ON(!is_power_of_2(sibling->mask));
> +		if (sibling->mask & ve->base.mask)
> +			continue;

Continuing from the previous round - should we return -EINVAL if two of the
same engine are found in the map? Since we are going to silently drop the
duplicate, perhaps it is better to disallow it.

> +
> +		/*
> +		 * The virtual engine implementation is tightly coupled to
> +		 * the execlists backend -- we push requests directly
> +		 * into a tree inside each physical engine. We could support
> +		 * layering if we handled cloning of the requests and
> +		 * submitting a copy into each backend.
> +		 */
> +		if (sibling->execlists.tasklet.func !=
> +		    execlists_submission_tasklet) {
> +			err = -ENODEV;
> +			goto err_put;
> +		}
> +
> +		GEM_BUG_ON(RB_EMPTY_NODE(&ve->nodes[sibling->id].rb));
> +		RB_CLEAR_NODE(&ve->nodes[sibling->id].rb);
> +
> +		ve->siblings[ve->count++] = sibling;
> +		ve->base.mask |= sibling->mask;
> +
> +		/*
> +		 * All physical engines must be compatible for their emission
> +		 * functions (as we build the instructions during request
> +		 * construction and do not alter them before submission
> +		 * on the physical engine). We use the engine class as a guide
> +		 * here, although that could be refined.
> +		 */
> +		if (ve->base.class != OTHER_CLASS) {
> +			if (ve->base.class != sibling->class) {
> +				err = -EINVAL;
> +				goto err_put;
> +			}
> +			continue;
> +		}
> +
> +		ve->base.class = sibling->class;
> +		snprintf(ve->base.name, sizeof(ve->base.name),
> +			 "v%dx%d", ve->base.class, count);
> +		ve->base.context_size = sibling->context_size;
> +
> +		ve->base.emit_bb_start = sibling->emit_bb_start;
> +		ve->base.emit_flush = sibling->emit_flush;
> +		ve->base.emit_init_breadcrumb = sibling->emit_init_breadcrumb;
> +		ve->base.emit_fini_breadcrumb = sibling->emit_fini_breadcrumb;
> +		ve->base.emit_fini_breadcrumb_dw =
> +			sibling->emit_fini_breadcrumb_dw;
> +	}
> +
> +	/* gracefully replace a degenerate virtual engine */
> +	if (ve->count == 1) {
> +		struct intel_engine_cs *actual = ve->siblings[0];
> +		intel_context_put(&ve->context);
> +		return actual;
> +	}
> +
> +	__intel_context_insert(ctx, &ve->base, &ve->context);
> +	return &ve->base;
> +
> +err_put:
> +	intel_context_put(&ve->context);
> +	return ERR_PTR(err);
> +}
> +
> +struct intel_engine_cs *
> +intel_execlists_clone_virtual(struct i915_gem_context *ctx,
> +			      struct intel_engine_cs *src)
> +{
> +	struct virtual_engine *se = to_virtual_engine(src);
> +	struct intel_engine_cs *dst;
> +
> +	dst = intel_execlists_create_virtual(ctx,
> +					     se->siblings,
> +					     se->count);
> +	if (IS_ERR(dst))
> +		return dst;
> +
> +	return dst;
> +}
> +
> +void intel_virtual_engine_destroy(struct intel_engine_cs *engine)
> +{
> +	struct virtual_engine *ve = to_virtual_engine(engine);
> +
> +	if (!engine || !intel_engine_is_virtual(engine))
> +		return;
> +
> +	__intel_context_remove(&ve->context);
> +	intel_context_put(&ve->context);
> +}
> +
>   void intel_execlists_show_requests(struct intel_engine_cs *engine,
>   				   struct drm_printer *m,
>   				   void (*show_request)(struct drm_printer *m,
> @@ -2954,6 +3480,29 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine,
>   		show_request(m, last, "\t\tQ ");
>   	}
>   
> +	last = NULL;
> +	count = 0;
> +	for (rb = rb_first_cached(&execlists->virtual); rb; rb = rb_next(rb)) {
> +		struct virtual_engine *ve =
> +			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
> +		struct i915_request *rq = READ_ONCE(ve->request);
> +
> +		if (rq) {
> +			if (count++ < max - 1)
> +				show_request(m, rq, "\t\tV ");
> +			else
> +				last = rq;
> +		}
> +	}
> +	if (last) {
> +		if (count > max) {
> +			drm_printf(m,
> +				   "\t\t...skipping %d virtual requests...\n",
> +				   count - max);
> +		}
> +		show_request(m, last, "\t\tV ");
> +	}
> +
>   	spin_unlock_irqrestore(&engine->timeline.lock, flags);
>   }
>   
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> index f1aec8a6986f..9d90dc68e02b 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -112,6 +112,17 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine,
>   							const char *prefix),
>   				   unsigned int max);
>   
> +struct intel_engine_cs *
> +intel_execlists_create_virtual(struct i915_gem_context *ctx,
> +			       struct intel_engine_cs **siblings,
> +			       unsigned int count);
> +
> +struct intel_engine_cs *
> +intel_execlists_clone_virtual(struct i915_gem_context *ctx,
> +			      struct intel_engine_cs *src);
> +
> +void intel_virtual_engine_destroy(struct intel_engine_cs *engine);
> +
>   u32 gen8_make_rpcs(struct drm_i915_private *i915, struct intel_sseu *ctx_sseu);
>   
>   #endif /* _INTEL_LRC_H_ */
> diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
> index 9e871eb0bfb1..6df033960350 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
> @@ -10,6 +10,7 @@
>   
>   #include "../i915_selftest.h"
>   #include "igt_flush_test.h"
> +#include "igt_live_test.h"
>   #include "igt_spinner.h"
>   #include "i915_random.h"
>   
> @@ -1057,6 +1058,169 @@ static int live_preempt_smoke(void *arg)
>   	return err;
>   }
>   
> +static int nop_virtual_engine(struct drm_i915_private *i915,
> +			      struct intel_engine_cs **siblings,
> +			      unsigned int nsibling,
> +			      unsigned int nctx,
> +			      unsigned int flags)
> +#define CHAIN BIT(0)
> +{
> +	IGT_TIMEOUT(end_time);
> +	struct i915_request *request[16];
> +	struct i915_gem_context *ctx[16];
> +	struct intel_engine_cs *ve[16];
> +	unsigned long n, prime, nc;
> +	struct igt_live_test t;
> +	ktime_t times[2] = {};
> +	int err;
> +
> +	GEM_BUG_ON(!nctx || nctx > ARRAY_SIZE(ctx));
> +
> +	for (n = 0; n < nctx; n++) {
> +		ctx[n] = kernel_context(i915);
> +		if (!ctx[n])
> +			return -ENOMEM;
> +
> +		ve[n] = intel_execlists_create_virtual(ctx[n],
> +						       siblings, nsibling);
> +		if (IS_ERR(ve[n]))
> +			return PTR_ERR(ve[n]);
> +	}
> +
> +	err = igt_live_test_begin(&t, i915, __func__, ve[0]->name);
> +	if (err)
> +		goto out;
> +
> +	for_each_prime_number_from(prime, 1, 8192) {
> +		times[1] = ktime_get_raw();
> +
> +		if (flags & CHAIN) {
> +			for (nc = 0; nc < nctx; nc++) {
> +				for (n = 0; n < prime; n++) {
> +					request[nc] =
> +						i915_request_alloc(ve[nc], ctx[nc]);
> +					if (IS_ERR(request[nc])) {
> +						err = PTR_ERR(request[nc]);
> +						goto out;
> +					}
> +
> +					i915_request_add(request[nc]);
> +				}
> +			}
> +		} else {
> +			for (n = 0; n < prime; n++) {
> +				for (nc = 0; nc < nctx; nc++) {
> +					request[nc] =
> +						i915_request_alloc(ve[nc], ctx[nc]);
> +					if (IS_ERR(request[nc])) {
> +						err = PTR_ERR(request[nc]);
> +						goto out;
> +					}
> +
> +					i915_request_add(request[nc]);
> +				}
> +			}
> +		}
> +
> +		for (nc = 0; nc < nctx; nc++) {
> +			if (i915_request_wait(request[nc],
> +					      I915_WAIT_LOCKED,
> +					      HZ / 10) < 0) {
> +				pr_err("%s(%s): wait for %llx:%lld timed out\n",
> +				       __func__, ve[0]->name,
> +				       request[nc]->fence.context,
> +				       request[nc]->fence.seqno);
> +
> +				GEM_TRACE("%s(%s) failed at request %llx:%lld\n",
> +					  __func__, ve[0]->name,
> +					  request[nc]->fence.context,
> +					  request[nc]->fence.seqno);
> +				GEM_TRACE_DUMP();
> +				i915_gem_set_wedged(i915);
> +				break;
> +			}
> +		}
> +
> +		times[1] = ktime_sub(ktime_get_raw(), times[1]);
> +		if (prime == 1)
> +			times[0] = times[1];
> +
> +		if (__igt_timeout(end_time, NULL))
> +			break;
> +	}
> +
> +	err = igt_live_test_end(&t);
> +	if (err)
> +		goto out;
> +
> +	pr_info("Requestx%d latencies on %s: 1 = %lluns, %lu = %lluns\n",
> +		nctx, ve[0]->name, ktime_to_ns(times[0]),
> +		prime, div64_u64(ktime_to_ns(times[1]), prime));
> +
> +out:
> +	if (igt_flush_test(i915, I915_WAIT_LOCKED))
> +		err = -EIO;
> +
> +	for (nc = 0; nc < nctx; nc++) {
> +		intel_virtual_engine_destroy(ve[nc]);
> +		kernel_context_close(ctx[nc]);
> +	}
> +	return err;
> +}
> +
> +static int live_virtual_engine(void *arg)
> +{
> +	struct drm_i915_private *i915 = arg;
> +	struct intel_engine_cs *siblings[MAX_ENGINE_INSTANCE + 1];
> +	struct intel_engine_cs *engine;
> +	enum intel_engine_id id;
> +	unsigned int class, inst;
> +	int err = -ENODEV;
> +
> +	if (USES_GUC_SUBMISSION(i915))
> +		return 0;
> +
> +	mutex_lock(&i915->drm.struct_mutex);
> +
> +	for_each_engine(engine, i915, id) {
> +		err = nop_virtual_engine(i915, &engine, 1, 1, 0);
> +		if (err) {
> +			pr_err("Failed to wrap engine %s: err=%d\n",
> +			       engine->name, err);
> +			goto out_unlock;
> +		}
> +	}
> +
> +	for (class = 0; class <= MAX_ENGINE_CLASS; class++) {
> +		int nsibling, n;
> +
> +		nsibling = 0;
> +		for (inst = 0; inst <= MAX_ENGINE_INSTANCE; inst++) {
> +			if (!i915->engine_class[class][inst])
> +				break;
> +
> +			siblings[nsibling++] = i915->engine_class[class][inst];
> +		}
> +		if (nsibling < 2)
> +			continue;
> +
> +		for (n = 1; n <= nsibling + 1; n++) {
> +			err = nop_virtual_engine(i915, siblings, nsibling,
> +						 n, 0);
> +			if (err)
> +				goto out_unlock;
> +		}
> +
> +		err = nop_virtual_engine(i915, siblings, nsibling, n, CHAIN);
> +		if (err)
> +			goto out_unlock;
> +	}
> +
> +out_unlock:
> +	mutex_unlock(&i915->drm.struct_mutex);
> +	return err;
> +}
> +
>   int intel_execlists_live_selftests(struct drm_i915_private *i915)
>   {
>   	static const struct i915_subtest tests[] = {
> @@ -1068,6 +1232,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
>   		SUBTEST(live_chain_preempt),
>   		SUBTEST(live_preempt_hang),
>   		SUBTEST(live_preempt_smoke),
> +		SUBTEST(live_virtual_engine),
>   	};
>   
>   	if (!HAS_EXECLISTS(i915))
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 8ef6d60929c6..9c94c037d13b 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -127,6 +127,7 @@ enum drm_i915_gem_engine_class {
>   };
>   
>   #define I915_ENGINE_CLASS_INVALID_NONE -1
> +#define I915_ENGINE_CLASS_INVALID_VIRTUAL 0
>   
>   /**
>    * DOC: perf_events exposed by i915 through /sys/bus/event_sources/drivers/i915
> @@ -1598,8 +1599,37 @@ struct drm_i915_gem_context_param_sseu {
>   	__u32 rsvd;
>   };
>   
> +/*
> + * i915_context_engines_load_balance:
> + *
> + * Enable load balancing across this set of engines.
> + *
> + * Into the I915_EXEC_DEFAULT slot [0], a virtual engine is created that when
> + * used will proxy the execbuffer request onto one of the set of engines
> + * in such a way as to distribute the load evenly across the set.
> + *
> + * The set of engines must be compatible (e.g. the same HW class) as they
> + * will share the same logical GPU context and ring.
> + *
> + * To intermix rendering with the virtual engine and direct rendering onto
> + * the backing engines (bypassing the load balancing proxy), the context must
> + * be defined to use a single timeline for all engines.
> + */
> +struct i915_context_engines_load_balance {
> +	struct i915_user_extension base;
> +
> +	__u16 engine_index;
> +	__u16 mbz16; /* reserved for future use; must be zero */
> +	__u32 flags; /* all undefined flags must be zero */
> +
> +	__u64 engines_mask; /* selection mask of engines[] */
> +
> +	__u64 mbz64[4]; /* reserved for future use; must be zero */
> +};
> +
>   struct i915_context_param_engines {
>   	__u64 extensions; /* linked chain of extension blocks, 0 terminates */
> +#define I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE 0
>   
>   	struct {
>   		__u16 engine_class; /* see enum drm_i915_gem_engine_class */
> 

Regards,

Tvrtko
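
For readers following the new uapi hunk above, a rough userspace-side
sketch of how the load-balance extension might be filled in. The
engines[] layout and the setparam plumbing that consume
i915_context_param_engines come from other patches in the series, so
everything outside the quoted structs is an assumption here, not the
posted interface.

#include <drm/i915_drm.h> /* assumes the updated header from this series */

/*
 * Sketch only: ask for a load-balancing virtual engine in slot 0
 * (I915_EXEC_DEFAULT) of the context engine map.  engines_mask selects
 * which entries of the engines[] array become siblings of the virtual
 * engine; all mbz fields must stay zero.
 */
static struct i915_context_engines_load_balance balance = {
	.base.name = I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE,
	.engine_index = 0,
	.engines_mask = (1ull << 0) | (1ull << 1), /* engines[0], engines[1] */
};

/*
 * The extension is then chained through
 * i915_context_param_engines.extensions = (uintptr_t)&balance and
 * applied with the context setparam ioctl (assumed from earlier
 * patches, not shown in the quoted hunk).
 */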
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 02/18] drm/i915: Flush pages on acquisition
  2019-03-20 12:35         ` Chris Wilson
  2019-03-20 14:24           ` Matthew Auld
@ 2019-03-21  0:16           ` Chris Wilson
  1 sibling, 0 replies; 47+ messages in thread
From: Chris Wilson @ 2019-03-21  0:16 UTC (permalink / raw)
  To: Matthew Auld; +Cc: Intel Graphics Development

Quoting Chris Wilson (2019-03-20 12:35:51)
> The cost of the extra flushes is a worry, but not enough for me to be
> concerned about. I think the convention that get_pages == coherent on
> gpu improves quite a bit of our internal rummaging around and prevents
> the ABI nightmare of mmap_gtt/mmap_offset. Will this flush remain inside
> set_pages()? No, I don't think it will as pushing it into the callers
> outside of the mm.lock itself makes sense, but I didn't think that was
> of paramount importance compared to the uABI and can be done later.

Fwiw, the proxy object is one where we do want to move the clflush into
the caller (to avoid it on the proxy object)! So, it turns out to be
something I can't indefinitely postpone. (But that object also
desperately wants to break away from the whole-object-at-once design.)
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 12/18] drm/i915: Allow userspace to clone contexts on creation
  2019-03-20 13:13   ` Tvrtko Ursulin
@ 2019-03-21 14:38     ` Chris Wilson
  2019-03-21 15:19       ` Tvrtko Ursulin
  0 siblings, 1 reply; 47+ messages in thread
From: Chris Wilson @ 2019-03-21 14:38 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-03-20 13:13:45)
> 
> On 19/03/2019 11:57, Chris Wilson wrote:
> > A usecase arose out of handling context recovery in mesa, whereby they
> > wish to recreate a context with fresh logical state but preserving all
> > other details of the original. Currently, they create a new context and
> > iterate over which bits they want to copy across, but it would be much more
> > convenient if they were able to just pass in a target context to clone
> > during creation. This essentially extends the setparam during creation
> > to pull the details from a target context instead of the user supplied
> > parameters.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > ---
> >   drivers/gpu/drm/i915/i915_gem_context.c | 154 ++++++++++++++++++++++++
> >   include/uapi/drm/i915_drm.h             |  14 +++
> >   2 files changed, 168 insertions(+)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> > index fc1f64e19507..f36648329074 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> > @@ -1500,8 +1500,162 @@ static int create_setparam(struct i915_user_extension __user *ext, void *data)
> >       return ctx_setparam(arg->ctx, &local.param);
> >   }
> >   
> > +static int clone_flags(struct i915_gem_context *dst,
> > +                    struct i915_gem_context *src)
> > +{
> > +     dst->user_flags = src->user_flags;
> > +     return 0;
> > +}
> > +
> > +static int clone_schedattr(struct i915_gem_context *dst,
> > +                        struct i915_gem_context *src)
> > +{
> > +     dst->sched = src->sched;
> > +     return 0;
> > +}
> > +
> > +static int clone_sseu(struct i915_gem_context *dst,
> > +                   struct i915_gem_context *src)
> > +{
> > +     const struct intel_sseu default_sseu =
> > +             intel_device_default_sseu(dst->i915);
> > +     struct intel_engine_cs *engine;
> > +     enum intel_engine_id id;
> > +
> > +     for_each_engine(engine, dst->i915, id) {
> 
> Hm in the load balancing patch this needs to be extended so the veng ce 
> is also handled here.
> 
> Possibly even when adding engine map the loop needs to iterate the map 
> and not for_each_engine?

One problem is that it is hard to match a veng in one context with
another context, there may even be several :|

And then in clone_engines, we create a fresh virtual engine. So a nasty
interoperation with clone_engines.

Bleugh.

> > +             struct intel_context *ce;
> > +             struct intel_sseu sseu;
> > +
> > +             ce = intel_context_lookup(src, engine);
> > +             if (!ce)
> > +                     continue;
> > +
> > +             sseu = ce->sseu;
> > +             if (!memcmp(&sseu, &default_sseu, sizeof(sseu)))
> 
> Could memcmp against &ce->sseu directly and keep src_ce and dst_ce so 
> you can copy over without a temporary copy on stack?

At one point, the locking favoured making a local sseu to avoid
overlapping locks. Hmm, sseu = ce->sseu could still tear. Pedantically
that copy should be locked.

> > +                     continue;
> > +
> > +             ce = intel_context_pin_lock(dst, engine);
> > +             if (IS_ERR(ce))
> > +                     return PTR_ERR(ce);
> > +
> > +             ce->sseu = sseu;
> > +             intel_context_pin_unlock(ce);
> > +     }
> > +
> > +     return 0;
> > +}
> > +
> > +static int clone_timeline(struct i915_gem_context *dst,
> > +                       struct i915_gem_context *src)
> > +{
> > +     if (src->timeline) {
> > +             GEM_BUG_ON(src->timeline == dst->timeline);
> > +
> > +             if (dst->timeline)
> > +                     i915_timeline_put(dst->timeline);
> > +             dst->timeline = i915_timeline_get(src->timeline);
> > +     }
> > +
> > +     return 0;
> > +}
> > +
> > +static int clone_vm(struct i915_gem_context *dst,
> > +                 struct i915_gem_context *src)
> > +{
> > +     struct i915_hw_ppgtt *ppgtt;
> > +
> > +     rcu_read_lock();
> > +     do {
> > +             ppgtt = READ_ONCE(src->ppgtt);
> > +             if (!ppgtt)
> > +                     break;
> > +
> > +             if (!kref_get_unless_zero(&ppgtt->ref))
> > +                     continue;
> > +
> > +             /*
> > +              * This ppgtt may have been reallocated between
> > +              * the read and the kref, and reassigned to a third
> > +              * context. In order to avoid inadvertent sharing
> > +              * of this ppgtt with that third context (and not
> > +              * src), we have to confirm that we have the same
> > +              * ppgtt after passing through the strong memory
> > +              * barrier implied by a successful
> > +              * kref_get_unless_zero().
> > +              *
> > +              * Once we have acquired the current ppgtt of src,
> > +              * we no longer care if it is released from src, as
> > +              * it cannot be reallocated elsewhere.
> > +              */
> > +
> > +             if (ppgtt == READ_ONCE(src->ppgtt))
> > +                     break;
> > +
> > +             i915_ppgtt_put(ppgtt);
> > +     } while (1);
> > +     rcu_read_unlock();
> 
> I still have the same problem. What if you added here:
> 
> GEM_BUG_ON(ppgtt != READ_ONCE(src->ppgtt));
> 
> Could it trigger? If so what is the point in the last check in the loop 
> above?

Yes, it can trigger, as there is no outer mutex guarding the assignment
of src->ppgtt with our read. And that is why the check has to exist --
because it can be reassigned during the first read and before we acquire
the kref, and so ppgtt may have been freed and then reallocated and
assigned to a new ctx during that interval. We don't care that
src->ppgtt gets updated after we have taken a copy, we care that ppgtt
may get reused on another ctx entirely.
-Chris
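
To put the ordering in one place, a minimal generic sketch of the
pattern clone_vm() is implementing; the names and types below are
illustrative only, not taken from the patch. Read the published
pointer, take a reference only while it is still live, then re-read to
confirm the object was not freed and handed to a new owner in between.

#include <linux/kref.h>
#include <linux/rcupdate.h>

struct obj {
	struct kref ref;
};

static void obj_release(struct kref *kref); /* frees the object */

/* illustrative only -- the generic shape of the loop in clone_vm() */
static struct obj *get_live_obj(struct obj **slot)
{
	struct obj *obj;

	rcu_read_lock();
	do {
		obj = READ_ONCE(*slot);
		if (!obj)
			break;

		/* refcount already zero => the object is being torn down */
		if (!kref_get_unless_zero(&obj->ref))
			continue;

		/*
		 * Between the READ_ONCE and the kref, the object may have
		 * been freed and reallocated to a different owner.  The
		 * successful kref_get_unless_zero() implies a full memory
		 * barrier, so re-reading the slot tells us whether we still
		 * hold the object published there; if not, drop the
		 * reference and retry.
		 */
		if (obj == READ_ONCE(*slot))
			break;

		kref_put(&obj->ref, obj_release);
	} while (1);
	rcu_read_unlock();

	return obj;
}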
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 15/18] drm/i915: Load balancing across a virtual engine
  2019-03-20 15:59   ` Tvrtko Ursulin
@ 2019-03-21 15:00     ` Chris Wilson
  2019-03-21 15:13       ` Tvrtko Ursulin
  0 siblings, 1 reply; 47+ messages in thread
From: Chris Wilson @ 2019-03-21 15:00 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-03-20 15:59:14)
> 
> On 19/03/2019 11:57, Chris Wilson wrote:
> >   static void execlists_dequeue(struct intel_engine_cs *engine)
> >   {
> >       struct intel_engine_execlists * const execlists = &engine->execlists;
> > @@ -691,6 +805,37 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >        * and context switches) submission.
> >        */
> >   
> > +     for (rb = rb_first_cached(&execlists->virtual); rb; ) {
> > +             struct virtual_engine *ve =
> > +                     rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
> > +             struct i915_request *rq = READ_ONCE(ve->request);
> > +             struct intel_engine_cs *active;
> > +
> > +             if (!rq) { /* lazily cleanup after another engine handled rq */
> > +                     rb_erase_cached(rb, &execlists->virtual);
> > +                     RB_CLEAR_NODE(rb);
> > +                     rb = rb_first_cached(&execlists->virtual);
> > +                     continue;
> > +             }
> > +
> > +             /*
> > +              * We track when the HW has completed saving the context image
> > +              * (i.e. when we have seen the final CS event switching out of
> > +              * the context) and must not overwrite the context image before
> > +              * then. This restricts us to only using the active engine
> > +              * while the previous virtualized request is inflight (so
> > +              * we reuse the register offsets). This is a very small
> > +              * hysteresis on the greedy selection algorithm.
> > +              */
> > +             active = READ_ONCE(ve->context.active);
> > +             if (active && active != engine) {
> > +                     rb = rb_next(rb);
> > +                     continue;
> > +             }
> > +
> > +             break;
> > +     }
> > +
> >       if (last) {
> >               /*
> >                * Don't resubmit or switch until all outstanding
> > @@ -712,7 +857,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >               if (!execlists_is_active(execlists, EXECLISTS_ACTIVE_HWACK))
> >                       return;
> >   
> > -             if (need_preempt(engine, last)) {
> > +             if (need_preempt(engine, last, rb)) {
> >                       inject_preempt_context(engine);
> >                       return;
> >               }
> > @@ -752,6 +897,72 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >               last->tail = last->wa_tail;
> >       }
> >   
> > +     while (rb) { /* XXX virtual is always taking precedence */
> > +             struct virtual_engine *ve =
> > +                     rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
> > +             struct i915_request *rq;
> > +
> > +             spin_lock(&ve->base.timeline.lock);
> > +
> > +             rq = ve->request;
> > +             if (unlikely(!rq)) { /* lost the race to a sibling */
> > +                     spin_unlock(&ve->base.timeline.lock);
> > +                     rb_erase_cached(rb, &execlists->virtual);
> > +                     RB_CLEAR_NODE(rb);
> > +                     rb = rb_first_cached(&execlists->virtual);
> > +                     continue;
> > +             }
> > +
> > +             if (rq_prio(rq) >= queue_prio(execlists)) {
> > +                     if (last && !can_merge_rq(last, rq)) {
> > +                             spin_unlock(&ve->base.timeline.lock);
> > +                             return; /* leave this rq for another engine */
> 
> Should this engine attempt to dequeue something else? Maybe something 
> from the non-virtual queue for instance could be submitted/appended.

We can't pick another virtual request for this engine as we run the risk
of priority inversion (and actually scheduling something that depends on
the request we skip over, although that is excluded at the moment by
virtue of only allowing completion fences between virtual engines, that
is something that we may be able to eliminate. Hmm.). If we give up on
inserting a virtual request at all and skip onto the regular dequeue, we
end up with a bubble and worst case would be we never allow a virtual
request onto this engine (as we keep it busy and last always active).

Can you say "What do we want? A timeslicing scheduler!".

> > +                     }
> > +
> > +                     GEM_BUG_ON(rq->engine != &ve->base);
> > +                     ve->request = NULL;
> > +                     ve->base.execlists.queue_priority_hint = INT_MIN;
> > +                     rb_erase_cached(rb, &execlists->virtual);
> > +                     RB_CLEAR_NODE(rb);
> > +
> > +                     GEM_BUG_ON(rq->hw_context != &ve->context);
> > +                     rq->engine = engine;
> > +
> > +                     if (engine != ve->siblings[0]) {
> > +                             u32 *regs = ve->context.lrc_reg_state;
> > +                             unsigned int n;
> > +
> > +                             GEM_BUG_ON(READ_ONCE(ve->context.active));
> > +                             virtual_update_register_offsets(regs, engine);
> > +
> > +                             /*
> > +                              * Move the bound engine to the top of the list
> > +                              * for future execution. We then kick this
> > +                              * tasklet first before checking others, so that
> > +                              * we preferentially reuse this set of bound
> > +                              * registers.
> > +                              */
> > +                             for (n = 1; n < ve->count; n++) {
> > +                                     if (ve->siblings[n] == engine) {
> > +                                             swap(ve->siblings[n],
> > +                                                  ve->siblings[0]);
> > +                                             break;
> > +                                     }
> > +                             }
> > +
> > +                             GEM_BUG_ON(ve->siblings[0] != engine);
> > +                     }
> > +
> > +                     __i915_request_submit(rq);
> > +                     trace_i915_request_in(rq, port_index(port, execlists));
> > +                     submit = true;
> > +                     last = rq;
> > +             }
> > +
> > +             spin_unlock(&ve->base.timeline.lock);
> > +             break;
> > +     }
> > +
> >       while ((rb = rb_first_cached(&execlists->queue))) {
> >               struct i915_priolist *p = to_priolist(rb);
> >               struct i915_request *rq, *rn;
> > @@ -971,6 +1182,24 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
> >               i915_priolist_free(p);
> >       }
> >   
> > +     /* Cancel all attached virtual engines */
> > +     while ((rb = rb_first_cached(&execlists->virtual))) {
> > +             struct virtual_engine *ve =
> > +                     rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
> > +
> > +             rb_erase_cached(rb, &execlists->virtual);
> > +             RB_CLEAR_NODE(rb);
> > +
> > +             spin_lock(&ve->base.timeline.lock);
> > +             if (ve->request) {
> > +                     __i915_request_submit(ve->request);
> > +                     dma_fence_set_error(&ve->request->fence, -EIO);
> > +                     i915_request_mark_complete(ve->request);
> > +                     ve->request = NULL;
> > +             }
> > +             spin_unlock(&ve->base.timeline.lock);
> > +     }
> > +
> >       /* Remaining _unready_ requests will be nop'ed when submitted */
> >   
> >       execlists->queue_priority_hint = INT_MIN;
> > @@ -2897,6 +3126,303 @@ void intel_lr_context_resume(struct drm_i915_private *i915)
> >       }
> >   }
> >   
> > +static void virtual_context_destroy(struct kref *kref)
> > +{
> > +     struct virtual_engine *ve =
> > +             container_of(kref, typeof(*ve), context.ref);
> > +     unsigned int n;
> > +
> > +     GEM_BUG_ON(ve->request);
> > +     GEM_BUG_ON(ve->context.active);
> > +
> > +     for (n = 0; n < ve->count; n++) {
> > +             struct intel_engine_cs *sibling = ve->siblings[n];
> > +             struct rb_node *node = &ve->nodes[sibling->id].rb;
> > +
> > +             if (RB_EMPTY_NODE(node))
> > +                     continue;
> > +
> > +             spin_lock_irq(&sibling->timeline.lock);
> > +
> > +             if (!RB_EMPTY_NODE(node))
> > +                     rb_erase_cached(node, &sibling->execlists.virtual);
> > +
> > +             spin_unlock_irq(&sibling->timeline.lock);
> > +     }
> > +     GEM_BUG_ON(__tasklet_is_scheduled(&ve->base.execlists.tasklet));
> > +
> > +     if (ve->context.state)
> > +             __execlists_context_fini(&ve->context);
> > +
> > +     i915_timeline_fini(&ve->base.timeline);
> > +     kfree(ve);
> > +}
> > +
> > +static void virtual_engine_initial_hint(struct virtual_engine *ve)
> > +{
> > +     int swp;
> > +
> > +     /*
> > +      * Pick a random sibling on starting to help spread the load around.
> > +      *
> > +      * New contexts are typically created with exactly the same order
> > +      * of siblings, and often started in batches. Due to the way we iterate
> > +      * the array of siblings when submitting requests, sibling[0] is
> > +      * prioritised for dequeuing. If we make sure that sibling[0] is fairly
> > +      * randomised across the system, we also help spread the load by the
> > +      * first engine we inspect being different each time.
> > +      *
> > +      * NB This does not force us to execute on this engine, it will just
> > +      * typically be the first we inspect for submission.
> > +      */
> > +     swp = prandom_u32_max(ve->count);
> > +     if (!swp)
> > +             return;
> > +
> > +     swap(ve->siblings[swp], ve->siblings[0]);
> > +     virtual_update_register_offsets(ve->context.lrc_reg_state,
> > +                                     ve->siblings[0]);
> > +}
> > +
> > +static int virtual_context_pin(struct intel_context *ce)
> > +{
> > +     struct virtual_engine *ve = container_of(ce, typeof(*ve), context);
> > +     int err;
> > +
> > +     /* Note: we must use a real engine class for setting up reg state */
> > +     err = __execlists_context_pin(ce, ve->siblings[0]);
> > +     if (err)
> > +             return err;
> > +
> > +     virtual_engine_initial_hint(ve);
> > +     return 0;
> > +}
> > +
> > +static const struct intel_context_ops virtual_context_ops = {
> > +     .pin = virtual_context_pin,
> > +     .unpin = execlists_context_unpin,
> > +
> > +     .destroy = virtual_context_destroy,
> > +};
> > +
> > +static void virtual_submission_tasklet(unsigned long data)
> > +{
> > +     struct virtual_engine * const ve = (struct virtual_engine *)data;
> > +     unsigned int n;
> > +     int prio;
> > +
> > +     prio = READ_ONCE(ve->base.execlists.queue_priority_hint);
> > +     if (prio == INT_MIN)
> > +             return;
> 
> When does it hit this return?

If the tasklet runs when I don't expect it to. Should be never indeed.
At least with bonding it becomes something a bit more tangible.

> > +     local_irq_disable();
> > +     for (n = 0; READ_ONCE(ve->request) && n < ve->count; n++) {
> > +             struct intel_engine_cs *sibling = ve->siblings[n];
> > +             struct ve_node * const node = &ve->nodes[sibling->id];
> > +             struct rb_node **parent, *rb;
> > +             bool first;
> > +
> > +             spin_lock(&sibling->timeline.lock);
> > +
> > +             if (!RB_EMPTY_NODE(&node->rb)) {
> > +                     /*
> > +                      * Cheat and avoid rebalancing the tree if we can
> > +                      * reuse this node in situ.
> > +                      */
> > +                     first = rb_first_cached(&sibling->execlists.virtual) ==
> > +                             &node->rb;
> > +                     if (prio == node->prio || (prio > node->prio && first))
> > +                             goto submit_engine;
> > +
> > +                     rb_erase_cached(&node->rb, &sibling->execlists.virtual);
> 
> How does the cheat work exactly? Avoids inserting into the tree in some 
> cases? And how does the real tasklet find this node then?

It's already in the sibling->execlists.virtual, so we don't need to
remove the node and reinsert it. So when we kick the sibling's tasklet
it is right there.

> > +             rb = NULL;
> > +             first = true;
> > +             parent = &sibling->execlists.virtual.rb_root.rb_node;
> > +             while (*parent) {
> > +                     struct ve_node *other;
> > +
> > +                     rb = *parent;
> > +                     other = rb_entry(rb, typeof(*other), rb);
> > +                     if (prio > other->prio) {
> > +                             parent = &rb->rb_left;
> > +                     } else {
> > +                             parent = &rb->rb_right;
> > +                             first = false;
> > +                     }
> > +             }
> > +
> > +             rb_link_node(&node->rb, rb, parent);
> > +             rb_insert_color_cached(&node->rb,
> > +                                    &sibling->execlists.virtual,
> > +                                    first);
> > +
> > +submit_engine:
> > +             GEM_BUG_ON(RB_EMPTY_NODE(&node->rb));
> > +             node->prio = prio;
> > +             if (first && prio > sibling->execlists.queue_priority_hint) {
> > +                     sibling->execlists.queue_priority_hint = prio;
> > +                     tasklet_hi_schedule(&sibling->execlists.tasklet);
> > +             }
> > +
> > +             spin_unlock(&sibling->timeline.lock);
> > +     }
> > +     local_irq_enable();
> > +}
> > +
> > +static void virtual_submit_request(struct i915_request *request)
> > +{
> > +     struct virtual_engine *ve = to_virtual_engine(request->engine);
> > +
> > +     GEM_BUG_ON(ve->base.submit_request != virtual_submit_request);
> > +
> > +     GEM_BUG_ON(ve->request);
> > +     ve->base.execlists.queue_priority_hint = rq_prio(request);
> > +     WRITE_ONCE(ve->request, request);
> > +
> > +     tasklet_schedule(&ve->base.execlists.tasklet);
> > +}
> > +
> > +struct intel_engine_cs *
> > +intel_execlists_create_virtual(struct i915_gem_context *ctx,
> > +                            struct intel_engine_cs **siblings,
> > +                            unsigned int count)
> > +{
> > +     struct virtual_engine *ve;
> > +     unsigned int n;
> > +     int err;
> > +
> > +     if (!count)
> > +             return ERR_PTR(-EINVAL);
> > +
> > +     ve = kzalloc(struct_size(ve, siblings, count), GFP_KERNEL);
> > +     if (!ve)
> > +             return ERR_PTR(-ENOMEM);
> > +
> > +     ve->base.i915 = ctx->i915;
> > +     ve->base.id = -1;
> > +     ve->base.class = OTHER_CLASS;
> > +     ve->base.uabi_class = I915_ENGINE_CLASS_INVALID;
> > +     ve->base.instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
> > +     ve->base.flags = I915_ENGINE_IS_VIRTUAL;
> > +
> > +     snprintf(ve->base.name, sizeof(ve->base.name), "virtual");
> > +
> > +     err = i915_timeline_init(ctx->i915, &ve->base.timeline, NULL);
> > +     if (err)
> > +             goto err_put;
> > +     i915_timeline_set_subclass(&ve->base.timeline, TIMELINE_VIRTUAL);
> > +
> > +     ve->base.cops = &virtual_context_ops;
> > +     ve->base.request_alloc = execlists_request_alloc;
> > +
> > +     ve->base.schedule = i915_schedule;
> > +     ve->base.submit_request = virtual_submit_request;
> > +
> > +     ve->base.execlists.queue_priority_hint = INT_MIN;
> > +     tasklet_init(&ve->base.execlists.tasklet,
> > +                  virtual_submission_tasklet,
> > +                  (unsigned long)ve);
> > +
> > +     intel_context_init(&ve->context, ctx, &ve->base);
> > +
> > +     for (n = 0; n < count; n++) {
> > +             struct intel_engine_cs *sibling = siblings[n];
> > +
> > +             GEM_BUG_ON(!is_power_of_2(sibling->mask));
> > +             if (sibling->mask & ve->base.mask)
> > +                     continue;
> 
> Continuing from the previous round - should we -EINVAL if two of the 
> same are found in the map? Since we are going to silently drop it.. 
> perhaps better to disallow.

Could do. I have no strong use case that expects to be able to handle
the user passing in (vcs1, vcs2, vcs2).

The really important question is: do we always create a virtual engine
even when wrapping a single physical engine?

The more I ask myself, the answer is yes. (Primarily so that we can
create multiple instances of the same engine with different logical
contexts and sseus. Oh flip, i915_perf needs updating to find virtual
contexts.). It's just that submitting a stream of nops to a virtual engine
is 3x as expensive as a real engine.

It's just that you mentioned that userspace ended up wrapping everything
inside a virtual engine willy-nilly, that spells trouble.
-Chris
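
For completeness, the stricter behaviour discussed above would only be
a small change to the sibling loop. A sketch (not what the posted patch
does), assuming we prefer -EINVAL over silently dropping a duplicate
such as (vcs1, vcs2, vcs2):

	for (n = 0; n < count; n++) {
		struct intel_engine_cs *sibling = siblings[n];

		GEM_BUG_ON(!is_power_of_2(sibling->mask));
		if (sibling->mask & ve->base.mask) {
			/* duplicate sibling: reject instead of skipping */
			err = -EINVAL;
			goto err_put;
		}

		/* ... rest of the sibling setup as in the patch ... */
	}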
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 15/18] drm/i915: Load balancing across a virtual engine
  2019-03-21 15:00     ` Chris Wilson
@ 2019-03-21 15:13       ` Tvrtko Ursulin
  2019-03-21 15:28         ` Chris Wilson
  0 siblings, 1 reply; 47+ messages in thread
From: Tvrtko Ursulin @ 2019-03-21 15:13 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 21/03/2019 15:00, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-03-20 15:59:14)
>>
>> On 19/03/2019 11:57, Chris Wilson wrote:
>>>    static void execlists_dequeue(struct intel_engine_cs *engine)
>>>    {
>>>        struct intel_engine_execlists * const execlists = &engine->execlists;
>>> @@ -691,6 +805,37 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>>>         * and context switches) submission.
>>>         */
>>>    
>>> +     for (rb = rb_first_cached(&execlists->virtual); rb; ) {
>>> +             struct virtual_engine *ve =
>>> +                     rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
>>> +             struct i915_request *rq = READ_ONCE(ve->request);
>>> +             struct intel_engine_cs *active;
>>> +
>>> +             if (!rq) { /* lazily cleanup after another engine handled rq */
>>> +                     rb_erase_cached(rb, &execlists->virtual);
>>> +                     RB_CLEAR_NODE(rb);
>>> +                     rb = rb_first_cached(&execlists->virtual);
>>> +                     continue;
>>> +             }
>>> +
>>> +             /*
>>> +              * We track when the HW has completed saving the context image
>>> +              * (i.e. when we have seen the final CS event switching out of
>>> +              * the context) and must not overwrite the context image before
>>> +              * then. This restricts us to only using the active engine
>>> +              * while the previous virtualized request is inflight (so
>>> +              * we reuse the register offsets). This is a very small
>>> +              * hysteresis on the greedy selection algorithm.
>>> +              */
>>> +             active = READ_ONCE(ve->context.active);
>>> +             if (active && active != engine) {
>>> +                     rb = rb_next(rb);
>>> +                     continue;
>>> +             }
>>> +
>>> +             break;
>>> +     }
>>> +
>>>        if (last) {
>>>                /*
>>>                 * Don't resubmit or switch until all outstanding
>>> @@ -712,7 +857,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>>>                if (!execlists_is_active(execlists, EXECLISTS_ACTIVE_HWACK))
>>>                        return;
>>>    
>>> -             if (need_preempt(engine, last)) {
>>> +             if (need_preempt(engine, last, rb)) {
>>>                        inject_preempt_context(engine);
>>>                        return;
>>>                }
>>> @@ -752,6 +897,72 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>>>                last->tail = last->wa_tail;
>>>        }
>>>    
>>> +     while (rb) { /* XXX virtual is always taking precedence */
>>> +             struct virtual_engine *ve =
>>> +                     rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
>>> +             struct i915_request *rq;
>>> +
>>> +             spin_lock(&ve->base.timeline.lock);
>>> +
>>> +             rq = ve->request;
>>> +             if (unlikely(!rq)) { /* lost the race to a sibling */
>>> +                     spin_unlock(&ve->base.timeline.lock);
>>> +                     rb_erase_cached(rb, &execlists->virtual);
>>> +                     RB_CLEAR_NODE(rb);
>>> +                     rb = rb_first_cached(&execlists->virtual);
>>> +                     continue;
>>> +             }
>>> +
>>> +             if (rq_prio(rq) >= queue_prio(execlists)) {
>>> +                     if (last && !can_merge_rq(last, rq)) {
>>> +                             spin_unlock(&ve->base.timeline.lock);
>>> +                             return; /* leave this rq for another engine */
>>
>> Should this engine attempt to dequeue something else? Maybe something
>> from the non-virtual queue for instance could be submitted/appended.
> 
> We can't pick another virtual request for this engine as we run the risk
> of priority inversion (and actually scheduling something that depends on
> the request we skip over, although that is excluded at the moment by
> virtue of only allowing completion fences between virtual engines, that
> is something that we may be able to eliminate. Hmm.). If we give up on
> inserting a virtual request at all and skip onto the regular dequeue, we
> end up with a bubble and worst case would be we never allow a virtual
> request onto this engine (as we keep it busy and last always active).
> 
> Can you say "What do we want? A timeslicing scheduler!".

Makes sense.

>>> +                     }
>>> +
>>> +                     GEM_BUG_ON(rq->engine != &ve->base);
>>> +                     ve->request = NULL;
>>> +                     ve->base.execlists.queue_priority_hint = INT_MIN;
>>> +                     rb_erase_cached(rb, &execlists->virtual);
>>> +                     RB_CLEAR_NODE(rb);
>>> +
>>> +                     GEM_BUG_ON(rq->hw_context != &ve->context);
>>> +                     rq->engine = engine;
>>> +
>>> +                     if (engine != ve->siblings[0]) {
>>> +                             u32 *regs = ve->context.lrc_reg_state;
>>> +                             unsigned int n;
>>> +
>>> +                             GEM_BUG_ON(READ_ONCE(ve->context.active));
>>> +                             virtual_update_register_offsets(regs, engine);
>>> +
>>> +                             /*
>>> +                              * Move the bound engine to the top of the list
>>> +                              * for future execution. We then kick this
>>> +                              * tasklet first before checking others, so that
>>> +                              * we preferentially reuse this set of bound
>>> +                              * registers.
>>> +                              */
>>> +                             for (n = 1; n < ve->count; n++) {
>>> +                                     if (ve->siblings[n] == engine) {
>>> +                                             swap(ve->siblings[n],
>>> +                                                  ve->siblings[0]);
>>> +                                             break;
>>> +                                     }
>>> +                             }
>>> +
>>> +                             GEM_BUG_ON(ve->siblings[0] != engine);
>>> +                     }
>>> +
>>> +                     __i915_request_submit(rq);
>>> +                     trace_i915_request_in(rq, port_index(port, execlists));
>>> +                     submit = true;
>>> +                     last = rq;
>>> +             }
>>> +
>>> +             spin_unlock(&ve->base.timeline.lock);
>>> +             break;
>>> +     }
>>> +
>>>        while ((rb = rb_first_cached(&execlists->queue))) {
>>>                struct i915_priolist *p = to_priolist(rb);
>>>                struct i915_request *rq, *rn;
>>> @@ -971,6 +1182,24 @@ static void execlists_cancel_requests(struct intel_engine_cs *engine)
>>>                i915_priolist_free(p);
>>>        }
>>>    
>>> +     /* Cancel all attached virtual engines */
>>> +     while ((rb = rb_first_cached(&execlists->virtual))) {
>>> +             struct virtual_engine *ve =
>>> +                     rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
>>> +
>>> +             rb_erase_cached(rb, &execlists->virtual);
>>> +             RB_CLEAR_NODE(rb);
>>> +
>>> +             spin_lock(&ve->base.timeline.lock);
>>> +             if (ve->request) {
>>> +                     __i915_request_submit(ve->request);
>>> +                     dma_fence_set_error(&ve->request->fence, -EIO);
>>> +                     i915_request_mark_complete(ve->request);
>>> +                     ve->request = NULL;
>>> +             }
>>> +             spin_unlock(&ve->base.timeline.lock);
>>> +     }
>>> +
>>>        /* Remaining _unready_ requests will be nop'ed when submitted */
>>>    
>>>        execlists->queue_priority_hint = INT_MIN;
>>> @@ -2897,6 +3126,303 @@ void intel_lr_context_resume(struct drm_i915_private *i915)
>>>        }
>>>    }
>>>    
>>> +static void virtual_context_destroy(struct kref *kref)
>>> +{
>>> +     struct virtual_engine *ve =
>>> +             container_of(kref, typeof(*ve), context.ref);
>>> +     unsigned int n;
>>> +
>>> +     GEM_BUG_ON(ve->request);
>>> +     GEM_BUG_ON(ve->context.active);
>>> +
>>> +     for (n = 0; n < ve->count; n++) {
>>> +             struct intel_engine_cs *sibling = ve->siblings[n];
>>> +             struct rb_node *node = &ve->nodes[sibling->id].rb;
>>> +
>>> +             if (RB_EMPTY_NODE(node))
>>> +                     continue;
>>> +
>>> +             spin_lock_irq(&sibling->timeline.lock);
>>> +
>>> +             if (!RB_EMPTY_NODE(node))
>>> +                     rb_erase_cached(node, &sibling->execlists.virtual);
>>> +
>>> +             spin_unlock_irq(&sibling->timeline.lock);
>>> +     }
>>> +     GEM_BUG_ON(__tasklet_is_scheduled(&ve->base.execlists.tasklet));
>>> +
>>> +     if (ve->context.state)
>>> +             __execlists_context_fini(&ve->context);
>>> +
>>> +     i915_timeline_fini(&ve->base.timeline);
>>> +     kfree(ve);
>>> +}
>>> +
>>> +static void virtual_engine_initial_hint(struct virtual_engine *ve)
>>> +{
>>> +     int swp;
>>> +
>>> +     /*
>>> +      * Pick a random sibling on starting to help spread the load around.
>>> +      *
>>> +      * New contexts are typically created with exactly the same order
>>> +      * the array of siblings when submitting requests, sibling[0] is
>>> +      * the array of sibling when submitting requests, sibling[0] is
>>> +      * prioritised for dequeuing. If we make sure that sibling[0] is fairly
>>> +      * randomised across the system, we also help spread the load by the
>>> +      * first engine we inspect being different each time.
>>> +      *
>>> +      * NB This does not force us to execute on this engine, it will just
>>> +      * typically be the first we inspect for submission.
>>> +      */
>>> +     swp = prandom_u32_max(ve->count);
>>> +     if (!swp)
>>> +             return;
>>> +
>>> +     swap(ve->siblings[swp], ve->siblings[0]);
>>> +     virtual_update_register_offsets(ve->context.lrc_reg_state,
>>> +                                     ve->siblings[0]);
>>> +}
>>> +
>>> +static int virtual_context_pin(struct intel_context *ce)
>>> +{
>>> +     struct virtual_engine *ve = container_of(ce, typeof(*ve), context);
>>> +     int err;
>>> +
>>> +     /* Note: we must use a real engine class for setting up reg state */
>>> +     err = __execlists_context_pin(ce, ve->siblings[0]);
>>> +     if (err)
>>> +             return err;
>>> +
>>> +     virtual_engine_initial_hint(ve);
>>> +     return 0;
>>> +}
>>> +
>>> +static const struct intel_context_ops virtual_context_ops = {
>>> +     .pin = virtual_context_pin,
>>> +     .unpin = execlists_context_unpin,
>>> +
>>> +     .destroy = virtual_context_destroy,
>>> +};
>>> +
>>> +static void virtual_submission_tasklet(unsigned long data)
>>> +{
>>> +     struct virtual_engine * const ve = (struct virtual_engine *)data;
>>> +     unsigned int n;
>>> +     int prio;
>>> +
>>> +     prio = READ_ONCE(ve->base.execlists.queue_priority_hint);
>>> +     if (prio == INT_MIN)
>>> +             return;
>>
>> When does it hit this return?
> 
> If the tasklet runs when I don't expect it to. Should be never indeed.

GEM_BUG_ON then, or a GEM_WARN_ON to start with?

> At least with bonding it becomes something a bit more tangible.

Hmm how so?

>>> +     local_irq_disable();
>>> +     for (n = 0; READ_ONCE(ve->request) && n < ve->count; n++) {
>>> +             struct intel_engine_cs *sibling = ve->siblings[n];
>>> +             struct ve_node * const node = &ve->nodes[sibling->id];
>>> +             struct rb_node **parent, *rb;
>>> +             bool first;
>>> +
>>> +             spin_lock(&sibling->timeline.lock);
>>> +
>>> +             if (!RB_EMPTY_NODE(&node->rb)) {
>>> +                     /*
>>> +                      * Cheat and avoid rebalancing the tree if we can
>>> +                      * reuse this node in situ.
>>> +                      */
>>> +                     first = rb_first_cached(&sibling->execlists.virtual) ==
>>> +                             &node->rb;
>>> +                     if (prio == node->prio || (prio > node->prio && first))
>>> +                             goto submit_engine;
>>> +
>>> +                     rb_erase_cached(&node->rb, &sibling->execlists.virtual);
>>
>> How does the cheat work exactly? Avoids inserting into the tree in some
>> cases? And how does the real tasklet find this node then?
> 
> It's already in the sibling->execlists.virtual, so we don't need to
> remove the node and reinsert it. So when we kick the sibling's tasklet
> it is right there.

So the cheat bit is just the prio and first check, and the erase on a 
non-empty node is normal operation?

>>> +             rb = NULL;
>>> +             first = true;
>>> +             parent = &sibling->execlists.virtual.rb_root.rb_node;
>>> +             while (*parent) {
>>> +                     struct ve_node *other;
>>> +
>>> +                     rb = *parent;
>>> +                     other = rb_entry(rb, typeof(*other), rb);
>>> +                     if (prio > other->prio) {
>>> +                             parent = &rb->rb_left;
>>> +                     } else {
>>> +                             parent = &rb->rb_right;
>>> +                             first = false;
>>> +                     }
>>> +             }
>>> +
>>> +             rb_link_node(&node->rb, rb, parent);
>>> +             rb_insert_color_cached(&node->rb,
>>> +                                    &sibling->execlists.virtual,
>>> +                                    first);
>>> +
>>> +submit_engine:
>>> +             GEM_BUG_ON(RB_EMPTY_NODE(&node->rb));
>>> +             node->prio = prio;
>>> +             if (first && prio > sibling->execlists.queue_priority_hint) {
>>> +                     sibling->execlists.queue_priority_hint = prio;
>>> +                     tasklet_hi_schedule(&sibling->execlists.tasklet);
>>> +             }
>>> +
>>> +             spin_unlock(&sibling->timeline.lock);
>>> +     }
>>> +     local_irq_enable();
>>> +}
>>> +
>>> +static void virtual_submit_request(struct i915_request *request)
>>> +{
>>> +     struct virtual_engine *ve = to_virtual_engine(request->engine);
>>> +
>>> +     GEM_BUG_ON(ve->base.submit_request != virtual_submit_request);
>>> +
>>> +     GEM_BUG_ON(ve->request);
>>> +     ve->base.execlists.queue_priority_hint = rq_prio(request);
>>> +     WRITE_ONCE(ve->request, request);
>>> +
>>> +     tasklet_schedule(&ve->base.execlists.tasklet);
>>> +}
>>> +
>>> +struct intel_engine_cs *
>>> +intel_execlists_create_virtual(struct i915_gem_context *ctx,
>>> +                            struct intel_engine_cs **siblings,
>>> +                            unsigned int count)
>>> +{
>>> +     struct virtual_engine *ve;
>>> +     unsigned int n;
>>> +     int err;
>>> +
>>> +     if (!count)
>>> +             return ERR_PTR(-EINVAL);
>>> +
>>> +     ve = kzalloc(struct_size(ve, siblings, count), GFP_KERNEL);
>>> +     if (!ve)
>>> +             return ERR_PTR(-ENOMEM);
>>> +
>>> +     ve->base.i915 = ctx->i915;
>>> +     ve->base.id = -1;
>>> +     ve->base.class = OTHER_CLASS;
>>> +     ve->base.uabi_class = I915_ENGINE_CLASS_INVALID;
>>> +     ve->base.instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
>>> +     ve->base.flags = I915_ENGINE_IS_VIRTUAL;
>>> +
>>> +     snprintf(ve->base.name, sizeof(ve->base.name), "virtual");
>>> +
>>> +     err = i915_timeline_init(ctx->i915, &ve->base.timeline, NULL);
>>> +     if (err)
>>> +             goto err_put;
>>> +     i915_timeline_set_subclass(&ve->base.timeline, TIMELINE_VIRTUAL);
>>> +
>>> +     ve->base.cops = &virtual_context_ops;
>>> +     ve->base.request_alloc = execlists_request_alloc;
>>> +
>>> +     ve->base.schedule = i915_schedule;
>>> +     ve->base.submit_request = virtual_submit_request;
>>> +
>>> +     ve->base.execlists.queue_priority_hint = INT_MIN;
>>> +     tasklet_init(&ve->base.execlists.tasklet,
>>> +                  virtual_submission_tasklet,
>>> +                  (unsigned long)ve);
>>> +
>>> +     intel_context_init(&ve->context, ctx, &ve->base);
>>> +
>>> +     for (n = 0; n < count; n++) {
>>> +             struct intel_engine_cs *sibling = siblings[n];
>>> +
>>> +             GEM_BUG_ON(!is_power_of_2(sibling->mask));
>>> +             if (sibling->mask & ve->base.mask)
>>> +                     continue;
>>
>> Continuing from the previous round - should we -EINVAL if two of the
>> same are found in the map? Since we are going to silently drop it..
>> perhaps better to disallow.
> 
> Could do. I have no strong use case that expects to be able to handle
> the user passing in (vcs1, vcs2, vcs2).
> 
> The really important question is: do we always create a virtual engine
> even when wrapping a single physical engine?
> 
> The more I ask myself, the answer is yes. (Primarily so that we can
> create multiple instances of the same engine with different logical
> contexts and sseus. Oh flip, i915_perf needs updating to find virtual
> contexts.). It's just that submitting a stream of nops to a virtual engine
> is 3x as expensive as a real engine.

Hmmm I think we do need to. Or mandate a single timeline before veng can 
be created. Otherwise I think we break the multi-timeline contract of 
each timeline having its own GPU context.

> It's just that you mentioned that userspace ended up wrapping everything
> inside a virtual engine willy-nilly, that spells trouble.

We can always say -EINVAL to that since it hardly makes sense. If they 
want contexts they can create real ones with ppgtt sharing.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 12/18] drm/i915: Allow userspace to clone contexts on creation
  2019-03-21 14:38     ` Chris Wilson
@ 2019-03-21 15:19       ` Tvrtko Ursulin
  0 siblings, 0 replies; 47+ messages in thread
From: Tvrtko Ursulin @ 2019-03-21 15:19 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 21/03/2019 14:38, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-03-20 13:13:45)
>>
>> On 19/03/2019 11:57, Chris Wilson wrote:
>>> A usecase arose out of handling context recovery in mesa, whereby they
>>> wish to recreate a context with fresh logical state but preserving all
>>> other details of the original. Currently, they create a new context and
>>> iterate over which bits they want to copy across, but it would be much more
>>> convenient if they were able to just pass in a target context to clone
>>> during creation. This essentially extends the setparam during creation
>>> to pull the details from a target context instead of the user supplied
>>> parameters.
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> ---
>>>    drivers/gpu/drm/i915/i915_gem_context.c | 154 ++++++++++++++++++++++++
>>>    include/uapi/drm/i915_drm.h             |  14 +++
>>>    2 files changed, 168 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
>>> index fc1f64e19507..f36648329074 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_context.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
>>> @@ -1500,8 +1500,162 @@ static int create_setparam(struct i915_user_extension __user *ext, void *data)
>>>        return ctx_setparam(arg->ctx, &local.param);
>>>    }
>>>    
>>> +static int clone_flags(struct i915_gem_context *dst,
>>> +                    struct i915_gem_context *src)
>>> +{
>>> +     dst->user_flags = src->user_flags;
>>> +     return 0;
>>> +}
>>> +
>>> +static int clone_schedattr(struct i915_gem_context *dst,
>>> +                        struct i915_gem_context *src)
>>> +{
>>> +     dst->sched = src->sched;
>>> +     return 0;
>>> +}
>>> +
>>> +static int clone_sseu(struct i915_gem_context *dst,
>>> +                   struct i915_gem_context *src)
>>> +{
>>> +     const struct intel_sseu default_sseu =
>>> +             intel_device_default_sseu(dst->i915);
>>> +     struct intel_engine_cs *engine;
>>> +     enum intel_engine_id id;
>>> +
>>> +     for_each_engine(engine, dst->i915, id) {
>>
>> Hm in the load balancing patch this needs to be extended so the veng ce
>> is also handled here.
>>
>> Possibly even when adding engine map the loop needs to iterate the map
>> and not for_each_engine?
> 
> One problem is that it is hard to match a veng in one context with
> another context, there may even be several :|
> 
> And then in clone_engines, we create a fresh virtual engine. So a nasty
> interoperation with clone_engines.
> 
> Bleugh.

Hmm indeed..

CLONE_ENGINES + CLONE_SSEU is possible, but order of cloning is 
important, right?

CLONE_SSEU without CLONE_ENGINES is less intuitively well defined. 
Contexts might not be "compatible" as you say. Should the code check if 
the map matches and veng instances match in their masks? Sounds 
questionable since otherwise in the patch you took the approach of 
cloning what can be cloned.

What seemed a simple patch is becoming more complicated.

>>> +             struct intel_context *ce;
>>> +             struct intel_sseu sseu;
>>> +
>>> +             ce = intel_context_lookup(src, engine);
>>> +             if (!ce)
>>> +                     continue;
>>> +
>>> +             sseu = ce->sseu;
>>> +             if (!memcmp(&sseu, &default_sseu, sizeof(sseu)))
>>
>> Could memcmp against &ce->sseu directly and keep src_ce and dst_ce so
>> you can copy over without a temporary copy on the stack?
> 
> At one point, the locking favoured making a local sseu to avoid
> overlapping locks. Hmm, sseu = ce->sseu could still tear. Pedantically
> that copy should be locked.

True!
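
Something along these lines would close that window, I think - a sketch
only, assuming intel_context_pin_lock() may also be taken on the source
context (the patch only uses it on dst), replacing the lookup + copy in
the loop body above:

		struct intel_context *ce;
		struct intel_sseu sseu;

		ce = intel_context_pin_lock(src, engine);
		if (IS_ERR(ce))
			return PTR_ERR(ce);

		/* Copied while the context is locked, so it cannot tear. */
		sseu = ce->sseu;
		intel_context_pin_unlock(ce);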

>>> +                     continue;
>>> +
>>> +             ce = intel_context_pin_lock(dst, engine);
>>> +             if (IS_ERR(ce))
>>> +                     return PTR_ERR(ce);
>>> +
>>> +             ce->sseu = sseu;
>>> +             intel_context_pin_unlock(ce);
>>> +     }
>>> +
>>> +     return 0;
>>> +}
>>> +
>>> +static int clone_timeline(struct i915_gem_context *dst,
>>> +                       struct i915_gem_context *src)
>>> +{
>>> +     if (src->timeline) {
>>> +             GEM_BUG_ON(src->timeline == dst->timeline);
>>> +
>>> +             if (dst->timeline)
>>> +                     i915_timeline_put(dst->timeline);
>>> +             dst->timeline = i915_timeline_get(src->timeline);
>>> +     }
>>> +
>>> +     return 0;
>>> +}
>>> +
>>> +static int clone_vm(struct i915_gem_context *dst,
>>> +                 struct i915_gem_context *src)
>>> +{
>>> +     struct i915_hw_ppgtt *ppgtt;
>>> +
>>> +     rcu_read_lock();
>>> +     do {
>>> +             ppgtt = READ_ONCE(src->ppgtt);
>>> +             if (!ppgtt)
>>> +                     break;
>>> +
>>> +             if (!kref_get_unless_zero(&ppgtt->ref))
>>> +                     continue;
>>> +
>>> +             /*
>>> +              * This ppgtt may have be reallocated between
>>> +              * the read and the kref, and reassigned to a third
>>> +              * context. In order to avoid inadvertent sharing
>>> +              * of this ppgtt with that third context (and not
>>> +              * src), we have to confirm that we have the same
>>> +              * ppgtt after passing through the strong memory
>>> +              * barrier implied by a successful
>>> +              * kref_get_unless_zero().
>>> +              *
>>> +              * Once we have acquired the current ppgtt of src,
>>> +              * we no longer care if it is released from src, as
>>> +              * it cannot be reallocated elsewhere.
>>> +              */
>>> +
>>> +             if (ppgtt == READ_ONCE(src->ppgtt))
>>> +                     break;
>>> +
>>> +             i915_ppgtt_put(ppgtt);
>>> +     } while (1);
>>> +     rcu_read_unlock();
>>
>> I still have the same problem. What if you added here:
>>
>> GEM_BUG_ON(ppgtt != READ_ONCE(src->ppgtt));
>>
>> Could it trigger? If so what is the point in the last check in the loop
>> above?
> 
> Yes, it can trigger, as there is no outer mutex guarding the assignment
> of src->ppgtt with our read. And that is why the check has to exist --
> because it can be reassigned during the first read and before we acquire
> the kref, and so ppgtt may have been freed and then reallocated and
> assigned to a new ctx during that interval. We don't care that
> src->ppgtt gets updated after we have taken a copy, we care that ppgtt
> may get reused on another ctx entirely.

Okay I get it, RCU allocation recycling? Makes sense in that case.
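
For anyone else following the thread, a minimal sketch of that
lookup-then-recheck idiom, using a hypothetical object type rather than
the actual i915 structures, and assuming the object's slab is created with
SLAB_TYPESAFE_BY_RCU so freed memory can be recycled into a new object of
the same type without waiting for a grace period:

	#include <linux/kref.h>
	#include <linux/rcupdate.h>

	struct obj {
		struct kref ref;
		/* ... payload ... */
	};

	/* Returns the object to its SLAB_TYPESAFE_BY_RCU cache. */
	static void obj_release(struct kref *ref);

	/* Acquire a reference to whatever *slot currently points at. */
	static struct obj *obj_get_rcu(struct obj **slot)
	{
		struct obj *o;

		rcu_read_lock();
		do {
			o = READ_ONCE(*slot);
			if (!o)
				break;

			/* The object may die and be recycled here... */
			if (!kref_get_unless_zero(&o->ref))
				continue;

			/*
			 * ...so confirm the slot still points at the object
			 * we pinned. If it was freed and reallocated to a
			 * third owner in between, drop it and retry.
			 */
			if (o == READ_ONCE(*slot))
				break;

			kref_put(&o->ref, obj_release);
		} while (1);
		rcu_read_unlock();

		return o;
	}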

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH 15/18] drm/i915: Load balancing across a virtual engine
  2019-03-21 15:13       ` Tvrtko Ursulin
@ 2019-03-21 15:28         ` Chris Wilson
  0 siblings, 0 replies; 47+ messages in thread
From: Chris Wilson @ 2019-03-21 15:28 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-03-21 15:13:59)
> 
> On 21/03/2019 15:00, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2019-03-20 15:59:14)
> >>
> >> On 19/03/2019 11:57, Chris Wilson wrote:
> >>> +static void virtual_submission_tasklet(unsigned long data)
> >>> +{
> >>> +     struct virtual_engine * const ve = (struct virtual_engine *)data;
> >>> +     unsigned int n;
> >>> +     int prio;
> >>> +
> >>> +     prio = READ_ONCE(ve->base.execlists.queue_priority_hint);
> >>> +     if (prio == INT_MIN)
> >>> +             return;
> >>
> >> When does it hit this return?
> > 
> > If the tasklet runs when I don't expect it to. Should be never indeed.
> 
> GEM_BUG_ON then, or a GEM_WARN_ON to start with?
> 
> > At least with bonding it becomes something a bit more tangible.
> 
> Hmm how so?

Instead of checking prio == INT_MIN, we compute the execution mask, which
can end up being 0 due to user carelessness.

> >>> +     local_irq_disable();
> >>> +     for (n = 0; READ_ONCE(ve->request) && n < ve->count; n++) {
> >>> +             struct intel_engine_cs *sibling = ve->siblings[n];
> >>> +             struct ve_node * const node = &ve->nodes[sibling->id];
> >>> +             struct rb_node **parent, *rb;
> >>> +             bool first;
> >>> +
> >>> +             spin_lock(&sibling->timeline.lock);
> >>> +
> >>> +             if (!RB_EMPTY_NODE(&node->rb)) {
> >>> +                     /*
> >>> +                      * Cheat and avoid rebalancing the tree if we can
> >>> +                      * reuse this node in situ.
> >>> +                      */
> >>> +                     first = rb_first_cached(&sibling->execlists.virtual) ==
> >>> +                             &node->rb;
> >>> +                     if (prio == node->prio || (prio > node->prio && first))
> >>> +                             goto submit_engine;
> >>> +
> >>> +                     rb_erase_cached(&node->rb, &sibling->execlists.virtual);
> >>
> >> How does the cheat work exactly? Does it avoid inserting into the tree
> >> in some cases? And how does the real tasklet find this node then?
> > 
> > It's already in the sibling->execlists.virtual, so we don't need to
> > remove the node and reinsert it. So when we kick the sibling's tasklet
> > it is right there.
> 
> So the cheat bit is just the prio and first check, and the erase on a
> non-empty node is normal operation?

Yes, as we have to update the priority, which means rebalancing the
tree. Given this is an rbtree, that means remove & insert. (If the node
isn't already in the tree, we skip straight to the insert.)
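
To make the in-situ reuse concrete, a rough sketch of the same re-key
pattern on a generic rb_root_cached, with a hypothetical node type rather
than the actual ve_node/execlists structures (higher key sorts first, like
priority, and node->rb is assumed to be RB_CLEAR_NODE()'d whenever the
node is not in a tree):

	#include <linux/rbtree.h>

	struct example_node {
		struct rb_node rb;
		int key;
	};

	static void example_update_key(struct rb_root_cached *root,
				       struct example_node *node, int new_key)
	{
		struct rb_node **p, *parent = NULL;
		bool leftmost = true;

		if (!RB_EMPTY_NODE(&node->rb)) {
			bool first = rb_first_cached(root) == &node->rb;

			/*
			 * Cheat: if the key is unchanged, or only grows while
			 * the node is already leftmost, the tree order is
			 * untouched and the remove + reinsert can be skipped.
			 */
			if (new_key == node->key ||
			    (new_key > node->key && first))
				goto done;

			rb_erase_cached(&node->rb, root);
		}

		/* Descending insert: the largest key stays leftmost. */
		p = &root->rb_root.rb_node;
		while (*p) {
			struct example_node *it =
				rb_entry(*p, struct example_node, rb);

			parent = *p;
			if (new_key > it->key) {
				p = &parent->rb_left;
			} else {
				p = &parent->rb_right;
				leftmost = false;
			}
		}
		rb_link_node(&node->rb, parent, p);
		rb_insert_color_cached(&node->rb, root, leftmost);
	done:
		node->key = new_key;
	}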

> >>> +     intel_context_init(&ve->context, ctx, &ve->base);
> >>> +
> >>> +     for (n = 0; n < count; n++) {
> >>> +             struct intel_engine_cs *sibling = siblings[n];
> >>> +
> >>> +             GEM_BUG_ON(!is_power_of_2(sibling->mask));
> >>> +             if (sibling->mask & ve->base.mask)
> >>> +                     continue;
> >>
> >> Continuing from the previous round - should we return -EINVAL if two of
> >> the same engine are found in the map? Since we are going to silently
> >> drop it, perhaps it is better to disallow.
> > 
> > Could do. I have no strong use case that expects to be able to handle
> > the user passing in (vcs1, vcs2, vcs2).
> > 
> > The really important question is: do we always create a virtual engine
> > even when wrapping a single physical engine?
> > 
> > The more I ask myself, the more the answer is yes. (Primarily so that we
> > can create multiple instances of the same engine with different logical
> > contexts and sseus. Oh flip, i915_perf needs updating to find virtual
> > contexts.) It's just that submitting a stream of nops to a virtual engine
> > is 3x as expensive as submitting to a real engine.
> 
> Hmmm, I think we do need to. Or mandate a single timeline before a veng
> can be created. Otherwise I think we break the contract of multi-timeline,
> with each timeline having its own GPU context.

Yeah, following that to the logical conclusion, each entry in
ctx->engines[] should be an independent instance. That makes sense,
and is what I would expect (if I put 2 rcs0 entries into engines[], then I
want 2 rcs contexts!)

> > It's just that you mentioned that userspace ended up wrapping everything
> > inside a virtual engine willy-nilly; that spells trouble.
> 
> We can always say -EINVAL to that since it hardly makes sense. If they 
> want contexts they can create real ones with ppgtt sharing.

I have a plan for lightweight logical engines.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2019-03-21 15:29 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-03-19 11:57 [PATCH 01/18] drm/i915/selftests: Provide stub reset functions Chris Wilson
2019-03-19 11:57 ` [PATCH 02/18] drm/i915: Flush pages on acquisition Chris Wilson
2019-03-20 11:41   ` Matthew Auld
2019-03-20 11:48     ` Chris Wilson
2019-03-20 12:26       ` Matthew Auld
2019-03-20 12:35         ` Chris Wilson
2019-03-20 14:24           ` Matthew Auld
2019-03-21  0:16           ` Chris Wilson
2019-03-19 11:57 ` [PATCH 03/18] drm/i915: Move intel_engine_mask_t around for use by i915_request_types.h Chris Wilson
2019-03-19 11:57 ` [PATCH 04/18] drm/i915: Separate GEM context construction and registration to userspace Chris Wilson
2019-03-19 13:41   ` Tvrtko Ursulin
2019-03-19 13:56     ` Chris Wilson
2019-03-19 11:57 ` [PATCH 05/18] drm/i915: Introduce a mutex for file_priv->context_idr Chris Wilson
2019-03-20 10:36   ` Tvrtko Ursulin
2019-03-19 11:57 ` [PATCH 06/18] drm/i915: Stop storing ctx->user_handle Chris Wilson
2019-03-19 12:58   ` [PATCH v2] " Chris Wilson
2019-03-20 10:43     ` Tvrtko Ursulin
2019-03-19 11:57 ` [PATCH 07/18] drm/i915: Stop storing the context name as the timeline name Chris Wilson
2019-03-20 12:46   ` Tvrtko Ursulin
2019-03-19 11:57 ` [PATCH 08/18] drm/i915: Introduce the i915_user_extension_method Chris Wilson
2019-03-19 11:57 ` [PATCH 09/18] drm/i915: Create/destroy VM (ppGTT) for use with contexts Chris Wilson
2019-03-20 13:00   ` Tvrtko Ursulin
2019-03-19 11:57 ` [PATCH 10/18] drm/i915: Extend CONTEXT_CREATE to set parameters upon construction Chris Wilson
2019-03-19 11:57 ` [PATCH 11/18] drm/i915: Allow contexts to share a single timeline across all engines Chris Wilson
2019-03-19 11:57 ` [PATCH 12/18] drm/i915: Allow userspace to clone contexts on creation Chris Wilson
2019-03-20 13:13   ` Tvrtko Ursulin
2019-03-21 14:38     ` Chris Wilson
2019-03-21 15:19       ` Tvrtko Ursulin
2019-03-19 11:57 ` [PATCH 13/18] drm/i915: Allow a context to define its set of engines Chris Wilson
2019-03-20 13:20   ` Tvrtko Ursulin
2019-03-19 11:57 ` [PATCH 14/18] drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[] Chris Wilson
2019-03-20 13:22   ` Tvrtko Ursulin
2019-03-19 11:57 ` [PATCH 15/18] drm/i915: Load balancing across a virtual engine Chris Wilson
2019-03-20 15:59   ` Tvrtko Ursulin
2019-03-21 15:00     ` Chris Wilson
2019-03-21 15:13       ` Tvrtko Ursulin
2019-03-21 15:28         ` Chris Wilson
2019-03-19 11:57 ` [PATCH 16/18] drm/i915: Extend execution fence to support a callback Chris Wilson
2019-03-19 11:57 ` [PATCH 17/18] drm/i915/execlists: Virtual engine bonding Chris Wilson
2019-03-19 11:57 ` [PATCH 18/18] drm/i915: Allow specification of parallel execbuf Chris Wilson
2019-03-19 12:09 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/18] drm/i915/selftests: Provide stub reset functions Patchwork
2019-03-19 12:18 ` ✗ Fi.CI.SPARSE: " Patchwork
2019-03-19 12:40 ` ✗ Fi.CI.BAT: failure " Patchwork
2019-03-19 13:12 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/18] drm/i915/selftests: Provide stub reset functions (rev2) Patchwork
2019-03-19 13:21 ` ✗ Fi.CI.SPARSE: " Patchwork
2019-03-19 13:32 ` ✓ Fi.CI.BAT: success " Patchwork
2019-03-19 21:14 ` ✗ Fi.CI.IGT: failure " Patchwork
