* Home straight for veng, the uAPI wars
@ 2019-03-08 14:12 Chris Wilson
  2019-03-08 14:12 ` [PATCH 01/13] drm/i915: Suppress the "Failed to idle" warning for gem_eio Chris Wilson
                   ` (16 more replies)
  0 siblings, 17 replies; 58+ messages in thread
From: Chris Wilson @ 2019-03-08 14:12 UTC (permalink / raw)
  To: intel-gfx

All the prep work is done, and all that is left is to modify the context
uAPI to allow ourselves to construct and use these magical virtual
engines.
-Chris



* [PATCH 01/13] drm/i915: Suppress the "Failed to idle" warning for gem_eio
  2019-03-08 14:12 Home straight for veng, the uAPI wars Chris Wilson
@ 2019-03-08 14:12 ` Chris Wilson
  2019-03-08 14:12 ` [PATCH 02/13] drm/i915: Introduce the i915_user_extension_method Chris Wilson
                   ` (15 subsequent siblings)
  16 siblings, 0 replies; 58+ messages in thread
From: Chris Wilson @ 2019-03-08 14:12 UTC (permalink / raw)
  To: intel-gfx

It is debatable whether having an error message on suspend for forcibly
cancelling outstanding work is worthwhile. We want to know if it occurs
in the wild (as we will then have to reconsider the approach!), but
equally it is not fatal across suspend, as upon resume we automatically
clear the wedged status.

However, CI does trigger this scenario with gem_eio/suspend, as there we
are intentionally wedging the device upon suspend. The dilemma is how
not to trigger a failure report for the dmesg spam, for which the
quickest response is to suppress the warning in the kernel. I'd rather
mark it as accepted in gem_eio, but for now detecting when gem_eio is
playing games and cancelling the warning for that case seems a barely
acceptable hack.

Testcase: igt/gem_eio/suspend
Reference: 5861b013e2c7 ("drm/i915: Do a synchronous switch-to-kernel-context on idling")
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index e7e8c236bc8e..8e0833b01ddc 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2848,10 +2848,13 @@ static bool switch_to_kernel_context_sync(struct drm_i915_private *i915,
 		result = false;
 
 	if (!result) {
+		if (i915_modparams.reset) { /* hide the warning for gem_eio */
+			dev_err(i915->drm.dev,
+				"Failed to idle engines, declaring wedged!\n");
+			GEM_TRACE_DUMP();
+		}
+
 		/* Forcibly cancel outstanding work and leave the gpu quiet. */
-		dev_err(i915->drm.dev,
-			"Failed to idle engines, declaring wedged!\n");
-		GEM_TRACE_DUMP();
 		i915_gem_set_wedged(i915);
 	}
 
-- 
2.20.1


* [PATCH 02/13] drm/i915: Introduce the i915_user_extension_method
  2019-03-08 14:12 Home straight for veng, the uAPI wars Chris Wilson
  2019-03-08 14:12 ` [PATCH 01/13] drm/i915: Suppress the "Failed to idle" warning for gem_eio Chris Wilson
@ 2019-03-08 14:12 ` Chris Wilson
  2019-03-08 14:33   ` Tvrtko Ursulin
  2019-03-08 14:12 ` [PATCH 03/13] drm/i915: Introduce a context barrier callback Chris Wilson
                   ` (14 subsequent siblings)
  16 siblings, 1 reply; 58+ messages in thread
From: Chris Wilson @ 2019-03-08 14:12 UTC (permalink / raw)
  To: intel-gfx

An idea for extending uABI inspired by Vulkan's extension chains.
Instead of expanding the data struct for each ioctl every time we need
to add a new feature, define an extension chain instead. As we add
optional interfaces to control the ioctl, we define a new extension
struct that can be linked into the ioctl data only when required by the
user. The key advantage is being able to ignore large control structs for
optional interfaces/extensions, while still being able to process them in
a consistent manner.

In comparison to other extensible ioctls, the key difference is the
use of a linked chain of extension structs vs an array of tagged
pointers. For example,

struct drm_amdgpu_cs_chunk {
        __u32           chunk_id;
        __u32           length_dw;
        __u64           chunk_data;
};

struct drm_amdgpu_cs_in {
        __u32           ctx_id;
        __u32           bo_list_handle;
        __u32           num_chunks;
        __u32           _pad;
        __u64           chunks;
};

allows userspace to pass in an array of pointers to extension structs, but
must therefore keep constructing that array alongside the command stream.
In dynamic situations like that, a linked list is preferred and does not
suffer from extra cacheline misses, as the extension structs themselves
must still be loaded separately from the chunks array.
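
Roughly, building such a chain from userspace would look like the sketch
below. EXT_FROB and struct local_frob are hypothetical placeholders; only
struct i915_user_extension is introduced by this patch:

struct local_frob {
	struct i915_user_extension base; /* { next_extension, name } */
	__u64 frob_value;
};

struct local_frob second = {
	.base = { .name = EXT_FROB },
	.frob_value = 2,
};
struct local_frob first = {
	.base = {
		.name = EXT_FROB,
		/* pointers cross the __user boundary encapsulated in u64 */
		.next_extension = (uintptr_t)&second,
	},
	.frob_value = 1,
};

args.extensions = (uintptr_t)&first; /* head of the chain */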

v2: Apply the tail call optimisation directly to nip the worry of stack
overflow in the bud.
v3: Defend against recursion.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/Makefile               |  1 +
 drivers/gpu/drm/i915/i915_user_extensions.c | 43 +++++++++++++++++++++
 drivers/gpu/drm/i915/i915_user_extensions.h | 20 ++++++++++
 drivers/gpu/drm/i915/i915_utils.h           |  7 ++++
 include/uapi/drm/i915_drm.h                 | 20 ++++++++++
 5 files changed, 91 insertions(+)
 create mode 100644 drivers/gpu/drm/i915/i915_user_extensions.c
 create mode 100644 drivers/gpu/drm/i915/i915_user_extensions.h

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 68fecf355471..60de05f3fa60 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -46,6 +46,7 @@ i915-y := i915_drv.o \
 	  i915_sw_fence.o \
 	  i915_syncmap.o \
 	  i915_sysfs.o \
+	  i915_user_extensions.o \
 	  intel_csr.o \
 	  intel_device_info.o \
 	  intel_pm.o \
diff --git a/drivers/gpu/drm/i915/i915_user_extensions.c b/drivers/gpu/drm/i915/i915_user_extensions.c
new file mode 100644
index 000000000000..879b4094b2d7
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_user_extensions.c
@@ -0,0 +1,43 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2018 Intel Corporation
+ */
+
+#include <linux/sched/signal.h>
+#include <linux/uaccess.h>
+#include <uapi/drm/i915_drm.h>
+
+#include "i915_user_extensions.h"
+
+int i915_user_extensions(struct i915_user_extension __user *ext,
+			 const i915_user_extension_fn *tbl,
+			 unsigned long count,
+			 void *data)
+{
+	unsigned int stackdepth = 512;
+
+	while (ext) {
+		int err;
+		u64 x;
+
+		if (!stackdepth--) /* recursion vs useful flexibility */
+			return -EINVAL;
+
+		if (get_user(x, &ext->name))
+			return -EFAULT;
+
+		err = -EINVAL;
+		if (x < count && tbl[x])
+			err = tbl[x](ext, data);
+		if (err)
+			return err;
+
+		if (get_user(x, &ext->next_extension))
+			return -EFAULT;
+
+		ext = u64_to_user_ptr(x);
+	}
+
+	return 0;
+}
diff --git a/drivers/gpu/drm/i915/i915_user_extensions.h b/drivers/gpu/drm/i915/i915_user_extensions.h
new file mode 100644
index 000000000000..313a510b068a
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_user_extensions.h
@@ -0,0 +1,20 @@
+/*
+ * SPDX-License-Identifier: MIT
+ *
+ * Copyright © 2018 Intel Corporation
+ */
+
+#ifndef I915_USER_EXTENSIONS_H
+#define I915_USER_EXTENSIONS_H
+
+struct i915_user_extension;
+
+typedef int (*i915_user_extension_fn)(struct i915_user_extension __user *ext,
+				      void *data);
+
+int i915_user_extensions(struct i915_user_extension __user *ext,
+			 const i915_user_extension_fn *tbl,
+			 unsigned long count,
+			 void *data);
+
+#endif /* I915_USER_EXTENSIONS_H */
diff --git a/drivers/gpu/drm/i915/i915_utils.h b/drivers/gpu/drm/i915/i915_utils.h
index 9726df37c4c4..fcc751aa1ea8 100644
--- a/drivers/gpu/drm/i915/i915_utils.h
+++ b/drivers/gpu/drm/i915/i915_utils.h
@@ -105,6 +105,13 @@
 	__T;								\
 })
 
+#define container_of_user(ptr, type, member) ({				\
+	void __user *__mptr = (void __user *)(ptr);			\
+	BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) &&	\
+			 !__same_type(*(ptr), void),			\
+			 "pointer type mismatch in container_of()");	\
+	((type __user *)(__mptr - offsetof(type, member))); })
+
 static inline u64 ptr_to_u64(const void *ptr)
 {
 	return (uintptr_t)ptr;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index aa2d4c73a97d..39835793722b 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -62,6 +62,26 @@ extern "C" {
 #define I915_ERROR_UEVENT		"ERROR"
 #define I915_RESET_UEVENT		"RESET"
 
+/*
+ * i915_user_extension: Base class for defining a chain of extensions
+ *
+ * Many interfaces need to grow over time. In most cases we can simply
+ * extend the struct and have userspace pass in more data. Another option,
+ * as demonstrated by Vulkan's approach to providing extensions for forward
+ * and backward compatibility, is to use a list of optional structs to
+ * provide those extra details.
+ *
+ * The key advantage to using an extension chain is that it allows us to
+ * redefine the interface more easily than an ever growing struct of
+ * increasing complexity, and for large parts of that interface to be
+ * entirely optional. The downside is more pointer chasing; chasing across
+ * the __user boundary with pointers encapsulated inside u64.
+ */
+struct i915_user_extension {
+	__u64 next_extension;
+	__u64 name;
+};
+
 /*
  * MOCS indexes used for GPU surfaces, defining the cacheability of the
  * surface data and the coherency for this data wrt. CPU vs. GPU accesses.
-- 
2.20.1
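
For reference, consuming the helper from an ioctl handler is intended to
look something like the following sketch; FROB_EXT, ext_frob() and the
args/ctx names are illustrative placeholders, not part of this patch:

static int ext_frob(struct i915_user_extension __user *ext, void *data)
{
	/* copy_from_user() the full extension struct and apply it to data */
	return 0;
}

static const i915_user_extension_fn frob_extensions[] = {
	[FROB_EXT] = ext_frob,
};

	/* in the ioctl handler, walk the user's chain of extensions */
	err = i915_user_extensions(u64_to_user_ptr(args->extensions),
				   frob_extensions,
				   ARRAY_SIZE(frob_extensions),
				   ctx);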


* [PATCH 03/13] drm/i915: Introduce a context barrier callback
  2019-03-08 14:12 Home straight for veng, the uAPI wars Chris Wilson
  2019-03-08 14:12 ` [PATCH 01/13] drm/i915: Suppress the "Failed to idle" warning for gem_eio Chris Wilson
  2019-03-08 14:12 ` [PATCH 02/13] drm/i915: Introduce the i915_user_extension_method Chris Wilson
@ 2019-03-08 14:12 ` Chris Wilson
  2019-03-08 14:12 ` [PATCH 04/13] drm/i915: Create/destroy VM (ppGTT) for use with contexts Chris Wilson
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 58+ messages in thread
From: Chris Wilson @ 2019-03-08 14:12 UTC (permalink / raw)
  To: intel-gfx

In the next patch, we will want to update live state within a context.
As this state may be in use by the GPU and we haven't been explicitly
tracking its activity, we instead attach it to a request: we send a
request down the context setup with its new state, and on retiring that
request we clean up the old state, as we then know that it is no longer
live.
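
A minimal sketch of the intended usage pattern (free_old_state() and
old_state are hypothetical; the first real consumer is set_ppgtt() in a
later patch):

static void free_old_state(void *data)
{
	kfree(data); /* the barrier request has retired; state is idle */
}

	/* after swapping in the new state, defer cleanup of the old */
	err = context_barrier_task(ctx, ALL_ENGINES,
				   free_old_state, old_state);
	if (err)
		return err; /* callback was dropped; caller must unwind */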

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c       |  74 +++++++++++++
 .../gpu/drm/i915/selftests/i915_gem_context.c | 103 ++++++++++++++++++
 2 files changed, 177 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index f9a21a891aa4..b6370225dcb5 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -677,6 +677,80 @@ last_request_on_engine(struct i915_timeline *timeline,
 	return NULL;
 }
 
+struct context_barrier_task {
+	struct i915_active base;
+	void (*task)(void *data);
+	void *data;
+};
+
+static void cb_retire(struct i915_active *base)
+{
+	struct context_barrier_task *cb = container_of(base, typeof(*cb), base);
+
+	if (cb->task)
+		cb->task(cb->data);
+
+	i915_active_fini(&cb->base);
+	kfree(cb);
+}
+
+I915_SELFTEST_DECLARE(static unsigned long context_barrier_inject_fault);
+static int context_barrier_task(struct i915_gem_context *ctx,
+				unsigned long engines,
+				void (*task)(void *data),
+				void *data)
+{
+	struct drm_i915_private *i915 = ctx->i915;
+	struct context_barrier_task *cb;
+	struct intel_context *ce;
+	intel_wakeref_t wakeref;
+	int err = 0;
+
+	lockdep_assert_held(&i915->drm.struct_mutex);
+	GEM_BUG_ON(!task);
+
+	cb = kmalloc(sizeof(*cb), GFP_KERNEL);
+	if (!cb)
+		return -ENOMEM;
+
+	i915_active_init(i915, &cb->base, cb_retire);
+	i915_active_acquire(&cb->base);
+
+	wakeref = intel_runtime_pm_get(i915);
+	list_for_each_entry(ce, &ctx->active_engines, active_link) {
+		struct intel_engine_cs *engine = ce->engine;
+		struct i915_request *rq;
+
+		if (!(ce->engine->mask & engines))
+			continue;
+
+		if (I915_SELFTEST_ONLY(context_barrier_inject_fault &
+				       engine->mask)) {
+			err = -ENXIO;
+			break;
+		}
+
+		rq = i915_request_alloc(engine, ctx);
+		if (IS_ERR(rq)) {
+			err = PTR_ERR(rq);
+			break;
+		}
+
+		err = i915_active_ref(&cb->base, rq->fence.context, rq);
+		i915_request_add(rq);
+		if (err)
+			break;
+	}
+	intel_runtime_pm_put(i915, wakeref);
+
+	cb->task = err ? NULL : task; /* caller needs to unwind instead */
+	cb->data = data;
+
+	i915_active_release(&cb->base);
+
+	return err;
+}
+
 int i915_gem_switch_to_kernel_context(struct drm_i915_private *i915,
 				      unsigned long mask)
 {
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
index 5b8614b2fbe4..664ae1428ecc 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
@@ -1594,10 +1594,113 @@ static int igt_switch_to_kernel_context(void *arg)
 	return err;
 }
 
+static void mock_barrier_task(void *data)
+{
+	unsigned int *counter = data;
+
+	++*counter;
+}
+
+static int mock_context_barrier(void *arg)
+{
+#undef pr_fmt
+#define pr_fmt(x) "context_barrier_task():" # x
+	struct drm_i915_private *i915 = arg;
+	struct i915_gem_context *ctx;
+	struct i915_request *rq;
+	intel_wakeref_t wakeref;
+	unsigned int counter;
+	int err;
+
+	/*
+	 * The context barrier provides us with a callback after it emits
+	 * a request; useful for retiring old state after loading new.
+	 */
+
+	mutex_lock(&i915->drm.struct_mutex);
+
+	ctx = mock_context(i915, "mock");
+	if (IS_ERR(ctx)) {
+		err = PTR_ERR(ctx);
+		goto unlock;
+	}
+
+	counter = 0;
+	err = context_barrier_task(ctx, 0, mock_barrier_task, &counter);
+	if (err) {
+		pr_err("Failed at line %d, err=%d\n", __LINE__, err);
+		goto out;
+	}
+	if (counter == 0) {
+		pr_err("Did not retire immediately with 0 engines\n");
+		err = -EINVAL;
+		goto out;
+	}
+
+	counter = 0;
+	err = context_barrier_task(ctx, -1, mock_barrier_task, &counter);
+	if (err) {
+		pr_err("Failed at line %d, err=%d\n", __LINE__, err);
+		goto out;
+	}
+	if (counter == 0) {
+		pr_err("Did not retire immediately for all inactive engines\n");
+		err = -EINVAL;
+		goto out;
+	}
+
+	rq = ERR_PTR(-ENODEV);
+	with_intel_runtime_pm(i915, wakeref)
+		rq = i915_request_alloc(i915->engine[RCS0], ctx);
+	if (IS_ERR(rq)) {
+		pr_err("Request allocation failed!\n");
+		goto out;
+	}
+	i915_request_add(rq);
+	GEM_BUG_ON(list_empty(&ctx->active_engines));
+
+	counter = 0;
+	context_barrier_inject_fault = BIT(RCS0);
+	err = context_barrier_task(ctx, -1, mock_barrier_task, &counter);
+	context_barrier_inject_fault = 0;
+	if (err == -ENXIO)
+		err = 0;
+	else
+		pr_err("Did not hit fault injection!\n");
+	if (counter != 0) {
+		pr_err("Invoked callback on error!\n");
+		err = -EIO;
+	}
+	if (err)
+		goto out;
+
+	counter = 0;
+	err = context_barrier_task(ctx, -1, mock_barrier_task, &counter);
+	if (err) {
+		pr_err("Failed at line %d, err=%d\n", __LINE__, err);
+		goto out;
+	}
+	mock_device_flush(i915);
+	if (counter == 0) {
+		pr_err("Did not retire on each active engines\n");
+		err = -EINVAL;
+		goto out;
+	}
+
+out:
+	mock_context_close(ctx);
+unlock:
+	mutex_unlock(&i915->drm.struct_mutex);
+	return err;
+#undef pr_fmt
+#define pr_fmt(x) x
+}
+
 int i915_gem_context_mock_selftests(void)
 {
 	static const struct i915_subtest tests[] = {
 		SUBTEST(igt_switch_to_kernel_context),
+		SUBTEST(mock_context_barrier),
 	};
 	struct drm_i915_private *i915;
 	int err;
-- 
2.20.1


* [PATCH 04/13] drm/i915: Create/destroy VM (ppGTT) for use with contexts
  2019-03-08 14:12 Home straight for veng, the uAPI wars Chris Wilson
                   ` (2 preceding siblings ...)
  2019-03-08 14:12 ` [PATCH 03/13] drm/i915: Introduce a context barrier callback Chris Wilson
@ 2019-03-08 14:12 ` Chris Wilson
  2019-03-08 15:03   ` Tvrtko Ursulin
  2019-03-08 15:41   ` [PATCH v2] " Chris Wilson
  2019-03-08 14:12 ` [PATCH 05/13] drm/i915: Extend CONTEXT_CREATE to set parameters upon construction Chris Wilson
                   ` (12 subsequent siblings)
  16 siblings, 2 replies; 58+ messages in thread
From: Chris Wilson @ 2019-03-08 14:12 UTC (permalink / raw)
  To: intel-gfx

In preparation to making the ppGTT binding for a context explicit (to
facilitate reusing the same ppGTT between different contexts), allow the
user to create and destroy named ppGTT.
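
As a userspace sketch (using libdrm's drmIoctl(); fd, ctx_a and ctx_b are
assumed to exist, and error handling is elided), sharing one ppGTT between
two contexts would look roughly like:

struct drm_i915_gem_vm_control vm = { };
struct drm_i915_gem_context_param p = {
	.param = I915_CONTEXT_PARAM_VM,
};

drmIoctl(fd, DRM_IOCTL_I915_GEM_VM_CREATE, &vm);

p.ctx_id = ctx_a;
p.value = vm.id;
drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &p);

p.ctx_id = ctx_b;
drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &p);

/* the handle can then be released; the contexts keep their references */
drmIoctl(fd, DRM_IOCTL_I915_GEM_VM_DESTROY, &vm);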

v2: Replace global barrier for swapping over the ppgtt and tlbs with a
local context barrier (Tvrtko)
v3: serialise with struct_mutex; it's lazy but required dammit
v4: Rewrite igt_ctx_shared_exec to be more different (aimed to be more
similar, turned out different!)

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c               |   2 +
 drivers/gpu/drm/i915/i915_drv.h               |   3 +
 drivers/gpu/drm/i915/i915_gem_context.c       | 254 +++++++++++++++++-
 drivers/gpu/drm/i915/i915_gem_context.h       |   5 +
 drivers/gpu/drm/i915/i915_gem_gtt.c           |  17 +-
 drivers/gpu/drm/i915/i915_gem_gtt.h           |  16 +-
 drivers/gpu/drm/i915/selftests/huge_pages.c   |   1 -
 .../gpu/drm/i915/selftests/i915_gem_context.c | 215 ++++++++++++---
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |   1 -
 drivers/gpu/drm/i915/selftests/mock_context.c |   8 +-
 include/uapi/drm/i915_drm.h                   |  36 +++
 11 files changed, 497 insertions(+), 61 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 0d743907e7bc..5d53efc4c5d9 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -3121,6 +3121,8 @@ static const struct drm_ioctl_desc i915_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(I915_PERF_ADD_CONFIG, i915_perf_add_config_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_PERF_REMOVE_CONFIG, i915_perf_remove_config_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_QUERY, i915_query_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(I915_GEM_VM_CREATE, i915_gem_vm_create_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(I915_GEM_VM_DESTROY, i915_gem_vm_destroy_ioctl, DRM_RENDER_ALLOW),
 };
 
 static struct drm_driver driver = {
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index c4ffe19ec698..8c4eb302cc0b 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -218,6 +218,9 @@ struct drm_i915_file_private {
 	} mm;
 	struct idr context_idr;
 
+	struct mutex vm_lock;
+	struct idr vm_idr;
+
 	unsigned int bsd_engine;
 
 /*
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index b6370225dcb5..fb2aba06f693 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -126,6 +126,8 @@ static void lut_close(struct i915_gem_context *ctx)
 		struct i915_vma *vma = rcu_dereference_raw(*slot);
 
 		radix_tree_iter_delete(&ctx->handles_vma, &iter, slot);
+
+		vma->open_count--;
 		__i915_gem_object_release_unless_active(vma->obj);
 	}
 	rcu_read_unlock();
@@ -306,7 +308,7 @@ static void context_close(struct i915_gem_context *ctx)
 	 */
 	lut_close(ctx);
 	if (ctx->ppgtt)
-		i915_ppgtt_close(&ctx->ppgtt->vm);
+		i915_ppgtt_close(ctx->ppgtt);
 
 	ctx->file_priv = ERR_PTR(-EBADF);
 	i915_gem_context_put(ctx);
@@ -417,6 +419,32 @@ static void __destroy_hw_context(struct i915_gem_context *ctx,
 	context_close(ctx);
 }
 
+static struct i915_hw_ppgtt *
+__set_ppgtt(struct i915_gem_context *ctx, struct i915_hw_ppgtt *ppgtt)
+{
+	struct i915_hw_ppgtt *old = ctx->ppgtt;
+
+	i915_ppgtt_open(ppgtt);
+	ctx->ppgtt = i915_ppgtt_get(ppgtt);
+
+	ctx->desc_template = default_desc_template(ctx->i915, ppgtt);
+
+	return old;
+}
+
+static void __assign_ppgtt(struct i915_gem_context *ctx,
+			   struct i915_hw_ppgtt *ppgtt)
+{
+	if (ppgtt == ctx->ppgtt)
+		return;
+
+	ppgtt = __set_ppgtt(ctx, ppgtt);
+	if (ppgtt) {
+		i915_ppgtt_close(ppgtt);
+		i915_ppgtt_put(ppgtt);
+	}
+}
+
 static struct i915_gem_context *
 i915_gem_create_context(struct drm_i915_private *dev_priv,
 			struct drm_i915_file_private *file_priv)
@@ -443,8 +471,8 @@ i915_gem_create_context(struct drm_i915_private *dev_priv,
 			return ERR_CAST(ppgtt);
 		}
 
-		ctx->ppgtt = ppgtt;
-		ctx->desc_template = default_desc_template(dev_priv, ppgtt);
+		__assign_ppgtt(ctx, ppgtt);
+		i915_ppgtt_put(ppgtt);
 	}
 
 	trace_i915_context_create(ctx);
@@ -625,19 +653,29 @@ static int context_idr_cleanup(int id, void *p, void *data)
 	return 0;
 }
 
+static int vm_idr_cleanup(int id, void *p, void *data)
+{
+	i915_ppgtt_put(p);
+	return 0;
+}
+
 int i915_gem_context_open(struct drm_i915_private *i915,
 			  struct drm_file *file)
 {
 	struct drm_i915_file_private *file_priv = file->driver_priv;
 	struct i915_gem_context *ctx;
 
+	mutex_init(&file_priv->vm_lock);
+
 	idr_init(&file_priv->context_idr);
+	idr_init_base(&file_priv->vm_idr, 1);
 
 	mutex_lock(&i915->drm.struct_mutex);
 	ctx = i915_gem_create_context(i915, file_priv);
 	mutex_unlock(&i915->drm.struct_mutex);
 	if (IS_ERR(ctx)) {
 		idr_destroy(&file_priv->context_idr);
+		idr_destroy(&file_priv->vm_idr);
 		return PTR_ERR(ctx);
 	}
 
@@ -654,6 +692,89 @@ void i915_gem_context_close(struct drm_file *file)
 
 	idr_for_each(&file_priv->context_idr, context_idr_cleanup, NULL);
 	idr_destroy(&file_priv->context_idr);
+
+	idr_for_each(&file_priv->vm_idr, vm_idr_cleanup, NULL);
+	idr_destroy(&file_priv->vm_idr);
+
+	mutex_destroy(&file_priv->vm_lock);
+}
+
+int i915_gem_vm_create_ioctl(struct drm_device *dev, void *data,
+			     struct drm_file *file)
+{
+	struct drm_i915_private *i915 = to_i915(dev);
+	struct drm_i915_gem_vm_control *args = data;
+	struct drm_i915_file_private *file_priv = file->driver_priv;
+	struct i915_hw_ppgtt *ppgtt;
+	int err;
+
+	if (!HAS_FULL_PPGTT(i915))
+		return -ENODEV;
+
+	if (args->flags)
+		return -EINVAL;
+
+	if (args->extensions)
+		return -EINVAL;
+
+	ppgtt = i915_ppgtt_create(i915, file_priv);
+	if (IS_ERR(ppgtt))
+		return PTR_ERR(ppgtt);
+
+	err = mutex_lock_interruptible(&file_priv->vm_lock);
+	if (err)
+		goto err_put;
+
+	err = idr_alloc(&file_priv->vm_idr, ppgtt, 0, 0, GFP_KERNEL);
+	mutex_unlock(&file_priv->vm_lock);
+	if (err < 0)
+		goto err_put;
+
+	GEM_BUG_ON(err == 0); /* reserved for default/unassigned ppgtt */
+	ppgtt->user_handle = err;
+	args->id = err;
+	return 0;
+
+err_put:
+	i915_ppgtt_put(ppgtt);
+	return err;
+}
+
+int i915_gem_vm_destroy_ioctl(struct drm_device *dev, void *data,
+			      struct drm_file *file)
+{
+	struct drm_i915_file_private *file_priv = file->driver_priv;
+	struct drm_i915_gem_vm_control *args = data;
+	struct i915_hw_ppgtt *ppgtt;
+	int err;
+	u32 id;
+
+	if (args->flags)
+		return -EINVAL;
+
+	if (args->extensions)
+		return -EINVAL;
+
+	id = args->id;
+	if (!id)
+		return -ENOENT;
+
+	err = mutex_lock_interruptible(&file_priv->vm_lock);
+	if (err)
+		return err;
+
+	ppgtt = idr_remove(&file_priv->vm_idr, id);
+	if (ppgtt) {
+		GEM_BUG_ON(!ppgtt->user_handle);
+		ppgtt->user_handle = 0;
+	}
+
+	mutex_unlock(&file_priv->vm_lock);
+	if (!ppgtt)
+		return -ENOENT;
+
+	i915_ppgtt_put(ppgtt);
+	return 0;
 }
 
 static struct i915_request *
@@ -799,6 +920,120 @@ int i915_gem_switch_to_kernel_context(struct drm_i915_private *i915,
 	return 0;
 }
 
+static int get_ppgtt(struct i915_gem_context *ctx,
+		     struct drm_i915_gem_context_param *args)
+{
+	struct drm_i915_file_private *file_priv = ctx->file_priv;
+	struct i915_hw_ppgtt *ppgtt;
+	int ret;
+
+	if (!ctx->ppgtt)
+		return -ENODEV;
+
+	/* XXX rcu acquire? */
+	ret = mutex_lock_interruptible(&ctx->i915->drm.struct_mutex);
+	if (ret)
+		return ret;
+
+	ppgtt = i915_ppgtt_get(ctx->ppgtt);
+	mutex_unlock(&ctx->i915->drm.struct_mutex);
+
+	ret = mutex_lock_interruptible(&file_priv->vm_lock);
+	if (ret)
+		goto err_put;
+
+	if (!ppgtt->user_handle) {
+		ret = idr_alloc(&file_priv->vm_idr, ppgtt, 0, 0, GFP_KERNEL);
+		GEM_BUG_ON(!ret);
+		if (ret < 0)
+			goto err_unlock;
+
+		ppgtt->user_handle = ret;
+		i915_ppgtt_get(ppgtt);
+	}
+
+	args->size = 0;
+	args->value = ppgtt->user_handle;
+
+	ret = 0;
+err_unlock:
+	mutex_unlock(&file_priv->vm_lock);
+err_put:
+	i915_ppgtt_put(ppgtt);
+	return ret;
+}
+
+static void set_ppgtt_barrier(void *data)
+{
+	struct i915_hw_ppgtt *old = data;
+
+	i915_ppgtt_close(old);
+	i915_ppgtt_put(old);
+}
+
+static int set_ppgtt(struct i915_gem_context *ctx,
+		     struct drm_i915_gem_context_param *args)
+{
+	struct drm_i915_file_private *file_priv = ctx->file_priv;
+	struct i915_hw_ppgtt *ppgtt, *old;
+	int err;
+
+	if (args->size)
+		return -EINVAL;
+
+	if (upper_32_bits(args->value))
+		return -EINVAL;
+
+	if (!ctx->ppgtt)
+		return -ENODEV;
+
+	err = mutex_lock_interruptible(&file_priv->vm_lock);
+	if (err)
+		return err;
+
+	ppgtt = idr_find(&file_priv->vm_idr, args->value);
+	if (ppgtt) {
+		GEM_BUG_ON(ppgtt->user_handle != args->value);
+		i915_ppgtt_get(ppgtt);
+	}
+	mutex_unlock(&file_priv->vm_lock);
+	if (!ppgtt)
+		return -ENOENT;
+
+	err = mutex_lock_interruptible(&ctx->i915->drm.struct_mutex);
+	if (err)
+		goto out;
+
+	if (ppgtt == ctx->ppgtt)
+		goto unlock;
+
+	/* Teardown the existing obj:vma cache, it will have to be rebuilt. */
+	lut_close(ctx);
+
+	old = __set_ppgtt(ctx, ppgtt);
+
+	/*
+	 * We need to flush any requests using the current ppgtt before
+	 * we release it as the requests do not hold a reference themselves,
+	 * only indirectly through the context.
+	 */
+	err = context_barrier_task(ctx, ALL_ENGINES, set_ppgtt_barrier, old);
+	if (err) {
+		ctx->ppgtt = old;
+		ctx->desc_template = default_desc_template(ctx->i915, old);
+
+		i915_ppgtt_close(ppgtt);
+		i915_ppgtt_put(ppgtt);
+	}
+
+unlock:
+	mutex_unlock(&ctx->i915->drm.struct_mutex);
+
+out:
+	i915_ppgtt_put(ppgtt);
+	return err;
+}
+
 static bool client_is_banned(struct drm_i915_file_private *file_priv)
 {
 	return atomic_read(&file_priv->ban_score) >= I915_CLIENT_SCORE_BANNED;
@@ -973,6 +1208,9 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 	case I915_CONTEXT_PARAM_SSEU:
 		ret = get_sseu(ctx, args);
 		break;
+	case I915_CONTEXT_PARAM_VM:
+		ret = get_ppgtt(ctx, args);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
@@ -1274,9 +1512,6 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 		return -ENOENT;
 
 	switch (args->param) {
-	case I915_CONTEXT_PARAM_BAN_PERIOD:
-		ret = -EINVAL;
-		break;
 	case I915_CONTEXT_PARAM_NO_ZEROMAP:
 		if (args->size)
 			ret = -EINVAL;
@@ -1332,9 +1567,16 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 					I915_USER_PRIORITY(priority);
 		}
 		break;
+
 	case I915_CONTEXT_PARAM_SSEU:
 		ret = set_sseu(ctx, args);
 		break;
+
+	case I915_CONTEXT_PARAM_VM:
+		ret = set_ppgtt(ctx, args);
+		break;
+
+	case I915_CONTEXT_PARAM_BAN_PERIOD:
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index 5a32c4b4816f..1e670372892c 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -153,6 +153,11 @@ void i915_gem_context_release(struct kref *ctx_ref);
 struct i915_gem_context *
 i915_gem_context_create_gvt(struct drm_device *dev);
 
+int i915_gem_vm_create_ioctl(struct drm_device *dev, void *data,
+			     struct drm_file *file);
+int i915_gem_vm_destroy_ioctl(struct drm_device *dev, void *data,
+			      struct drm_file *file);
+
 int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
 				  struct drm_file *file);
 int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index dac08d9c3fab..d717952cc430 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2099,10 +2099,21 @@ i915_ppgtt_create(struct drm_i915_private *i915,
 	return ppgtt;
 }
 
-void i915_ppgtt_close(struct i915_address_space *vm)
+void i915_ppgtt_open(struct i915_hw_ppgtt *ppgtt)
 {
-	GEM_BUG_ON(vm->closed);
-	vm->closed = true;
+	GEM_BUG_ON(ppgtt->vm.closed);
+
+	ppgtt->open_count++;
+}
+
+void i915_ppgtt_close(struct i915_hw_ppgtt *ppgtt)
+{
+	GEM_BUG_ON(!ppgtt->open_count);
+	if (--ppgtt->open_count)
+		return;
+
+	GEM_BUG_ON(ppgtt->vm.closed);
+	ppgtt->vm.closed = true;
 }
 
 static void ppgtt_destroy_vma(struct i915_address_space *vm)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index a47e11e6fc1b..25d5f7682bda 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -391,11 +391,15 @@ struct i915_hw_ppgtt {
 	struct kref ref;
 
 	unsigned long pd_dirty_engines;
+	unsigned int open_count;
+
 	union {
 		struct i915_pml4 pml4;		/* GEN8+ & 48b PPGTT */
 		struct i915_page_directory_pointer pdp;	/* GEN8+ */
 		struct i915_page_directory pd;		/* GEN6-7 */
 	};
+
+	u32 user_handle;
 };
 
 struct gen6_hw_ppgtt {
@@ -606,12 +610,16 @@ int i915_ppgtt_init_hw(struct drm_i915_private *dev_priv);
 void i915_ppgtt_release(struct kref *kref);
 struct i915_hw_ppgtt *i915_ppgtt_create(struct drm_i915_private *dev_priv,
 					struct drm_i915_file_private *fpriv);
-void i915_ppgtt_close(struct i915_address_space *vm);
-static inline void i915_ppgtt_get(struct i915_hw_ppgtt *ppgtt)
+
+void i915_ppgtt_open(struct i915_hw_ppgtt *ppgtt);
+void i915_ppgtt_close(struct i915_hw_ppgtt *ppgtt);
+
+static inline struct i915_hw_ppgtt *i915_ppgtt_get(struct i915_hw_ppgtt *ppgtt)
 {
-	if (ppgtt)
-		kref_get(&ppgtt->ref);
+	kref_get(&ppgtt->ref);
+	return ppgtt;
 }
+
 static inline void i915_ppgtt_put(struct i915_hw_ppgtt *ppgtt)
 {
 	if (ppgtt)
diff --git a/drivers/gpu/drm/i915/selftests/huge_pages.c b/drivers/gpu/drm/i915/selftests/huge_pages.c
index 1e66cff985f8..0b7740dc18cb 100644
--- a/drivers/gpu/drm/i915/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/selftests/huge_pages.c
@@ -1734,7 +1734,6 @@ int i915_gem_huge_page_mock_selftests(void)
 	err = i915_subtests(tests, ppgtt);
 
 out_close:
-	i915_ppgtt_close(&ppgtt->vm);
 	i915_ppgtt_put(ppgtt);
 
 out_unlock:
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
index 664ae1428ecc..c4a5cf26992e 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
@@ -372,7 +372,8 @@ static int cpu_fill(struct drm_i915_gem_object *obj, u32 value)
 	return 0;
 }
 
-static int cpu_check(struct drm_i915_gem_object *obj, unsigned int max)
+static noinline int cpu_check(struct drm_i915_gem_object *obj,
+			      unsigned int idx, unsigned int max)
 {
 	unsigned int n, m, needs_flush;
 	int err;
@@ -390,8 +391,10 @@ static int cpu_check(struct drm_i915_gem_object *obj, unsigned int max)
 
 		for (m = 0; m < max; m++) {
 			if (map[m] != m) {
-				pr_err("Invalid value at page %d, offset %d: found %x expected %x\n",
-				       n, m, map[m], m);
+				pr_err("%pS: Invalid value at object %d page %d/%ld, offset %d/%d: found %x expected %x\n",
+				       __builtin_return_address(0), idx,
+				       n, real_page_count(obj), m, max,
+				       map[m], m);
 				err = -EINVAL;
 				goto out_unmap;
 			}
@@ -399,8 +402,9 @@ static int cpu_check(struct drm_i915_gem_object *obj, unsigned int max)
 
 		for (; m < DW_PER_PAGE; m++) {
 			if (map[m] != STACK_MAGIC) {
-				pr_err("Invalid value at page %d, offset %d: found %x expected %x\n",
-				       n, m, map[m], STACK_MAGIC);
+				pr_err("%pS: Invalid value at object %d page %d, offset %d: found %x expected %x (uninitialised)\n",
+				       __builtin_return_address(0), idx, n, m,
+				       map[m], STACK_MAGIC);
 				err = -EINVAL;
 				goto out_unmap;
 			}
@@ -478,12 +482,8 @@ static unsigned long max_dwords(struct drm_i915_gem_object *obj)
 static int igt_ctx_exec(void *arg)
 {
 	struct drm_i915_private *i915 = arg;
-	struct drm_i915_gem_object *obj = NULL;
-	unsigned long ncontexts, ndwords, dw;
-	struct igt_live_test t;
-	struct drm_file *file;
-	IGT_TIMEOUT(end_time);
-	LIST_HEAD(objects);
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
 	int err = -ENODEV;
 
 	/*
@@ -495,41 +495,166 @@ static int igt_ctx_exec(void *arg)
 	if (!DRIVER_CAPS(i915)->has_logical_contexts)
 		return 0;
 
+	for_each_engine(engine, i915, id) {
+		struct drm_i915_gem_object *obj = NULL;
+		unsigned long ncontexts, ndwords, dw;
+		struct igt_live_test t;
+		struct drm_file *file;
+		IGT_TIMEOUT(end_time);
+		LIST_HEAD(objects);
+
+		if (!intel_engine_can_store_dword(engine))
+			continue;
+
+		if (!engine->context_size)
+			continue; /* No logical context support in HW */
+
+		file = mock_file(i915);
+		if (IS_ERR(file))
+			return PTR_ERR(file);
+
+		mutex_lock(&i915->drm.struct_mutex);
+
+		err = igt_live_test_begin(&t, i915, __func__, engine->name);
+		if (err)
+			goto out_unlock;
+
+		ncontexts = 0;
+		ndwords = 0;
+		dw = 0;
+		while (!time_after(jiffies, end_time)) {
+			struct i915_gem_context *ctx;
+			intel_wakeref_t wakeref;
+
+			ctx = i915_gem_create_context(i915, file->driver_priv);
+			if (IS_ERR(ctx)) {
+				err = PTR_ERR(ctx);
+				goto out_unlock;
+			}
+
+			if (!obj) {
+				obj = create_test_object(ctx, file, &objects);
+				if (IS_ERR(obj)) {
+					err = PTR_ERR(obj);
+					goto out_unlock;
+				}
+			}
+
+			with_intel_runtime_pm(i915, wakeref)
+				err = gpu_fill(obj, ctx, engine, dw);
+			if (err) {
+				pr_err("Failed to fill dword %lu [%lu/%lu] with gpu (%s) in ctx %u [full-ppgtt? %s], err=%d\n",
+				       ndwords, dw, max_dwords(obj),
+				       engine->name, ctx->hw_id,
+				       yesno(!!ctx->ppgtt), err);
+				goto out_unlock;
+			}
+
+			if (++dw == max_dwords(obj)) {
+				obj = NULL;
+				dw = 0;
+			}
+
+			ndwords++;
+			ncontexts++;
+		}
+
+		pr_info("Submitted %lu contexts to %s, filling %lu dwords\n",
+			ncontexts, engine->name, ndwords);
+
+		ncontexts = dw = 0;
+		list_for_each_entry(obj, &objects, st_link) {
+			unsigned int rem =
+				min_t(unsigned int, ndwords - dw, max_dwords(obj));
+
+			err = cpu_check(obj, ncontexts++, rem);
+			if (err)
+				break;
+
+			dw += rem;
+		}
+
+out_unlock:
+		if (igt_live_test_end(&t))
+			err = -EIO;
+		mutex_unlock(&i915->drm.struct_mutex);
+
+		mock_file_free(i915, file);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+static int igt_shared_ctx_exec(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct i915_gem_context *parent;
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	struct igt_live_test t;
+	struct drm_file *file;
+	int err = 0;
+
+	/*
+	 * Create a few different contexts with the same mm and write
+	 * through each ctx using the GPU making sure those writes end
+	 * up in the expected pages of our obj.
+	 */
+	if (!DRIVER_CAPS(i915)->has_logical_contexts)
+		return 0;
+
 	file = mock_file(i915);
 	if (IS_ERR(file))
 		return PTR_ERR(file);
 
 	mutex_lock(&i915->drm.struct_mutex);
 
+	parent = i915_gem_create_context(i915, file->driver_priv);
+	if (IS_ERR(parent)) {
+		err = PTR_ERR(parent);
+		goto out_unlock;
+	}
+
+	if (!parent->ppgtt) {
+		err = -ENODEV;
+		goto out_unlock;
+	}
+
 	err = igt_live_test_begin(&t, i915, __func__, "");
 	if (err)
 		goto out_unlock;
 
-	ncontexts = 0;
-	ndwords = 0;
-	dw = 0;
-	while (!time_after(jiffies, end_time)) {
-		struct intel_engine_cs *engine;
-		struct i915_gem_context *ctx;
-		unsigned int id;
+	for_each_engine(engine, i915, id) {
+		unsigned long ncontexts, ndwords, dw;
+		struct drm_i915_gem_object *obj = NULL;
+		struct i915_gem_context *ctx = NULL;
+		IGT_TIMEOUT(end_time);
+		LIST_HEAD(objects);
 
-		ctx = i915_gem_create_context(i915, file->driver_priv);
-		if (IS_ERR(ctx)) {
-			err = PTR_ERR(ctx);
-			goto out_unlock;
-		}
+		if (!intel_engine_can_store_dword(engine))
+			continue;
 
-		for_each_engine(engine, i915, id) {
+		dw = 0;
+		ndwords = 0;
+		ncontexts = 0;
+		while (!time_after(jiffies, end_time)) {
 			intel_wakeref_t wakeref;
 
-			if (!engine->context_size)
-				continue; /* No logical context support in HW */
+			if (ctx)
+				__destroy_hw_context(ctx, file->driver_priv);
 
-			if (!intel_engine_can_store_dword(engine))
-				continue;
+			ctx = i915_gem_create_context(i915, file->driver_priv);
+			if (IS_ERR(ctx)) {
+				err = PTR_ERR(ctx);
+				goto out_unlock;
+			}
+
+			__assign_ppgtt(ctx, parent->ppgtt);
 
 			if (!obj) {
-				obj = create_test_object(ctx, file, &objects);
+				obj = create_test_object(parent, file, &objects);
 				if (IS_ERR(obj)) {
 					err = PTR_ERR(obj);
 					goto out_unlock;
@@ -551,25 +676,25 @@ static int igt_ctx_exec(void *arg)
 				obj = NULL;
 				dw = 0;
 			}
+
 			ndwords++;
+			ncontexts++;
 		}
-		ncontexts++;
-	}
-	pr_info("Submitted %lu contexts (across %u engines), filling %lu dwords\n",
-		ncontexts, RUNTIME_INFO(i915)->num_engines, ndwords);
+		pr_info("Submitted %lu contexts to %s, filling %lu dwords\n",
+			ncontexts, engine->name, ndwords);
 
-	dw = 0;
-	list_for_each_entry(obj, &objects, st_link) {
-		unsigned int rem =
-			min_t(unsigned int, ndwords - dw, max_dwords(obj));
+		ncontexts = dw = 0;
+		list_for_each_entry(obj, &objects, st_link) {
+			unsigned int rem =
+				min_t(unsigned int, ndwords - dw, max_dwords(obj));
 
-		err = cpu_check(obj, rem);
-		if (err)
-			break;
+			err = cpu_check(obj, ncontexts++, rem);
+			if (err)
+				goto out_unlock;
 
-		dw += rem;
+			dw += rem;
+		}
 	}
-
 out_unlock:
 	if (igt_live_test_end(&t))
 		err = -EIO;
@@ -1048,7 +1173,7 @@ static int igt_ctx_readonly(void *arg)
 	struct drm_i915_gem_object *obj = NULL;
 	struct i915_gem_context *ctx;
 	struct i915_hw_ppgtt *ppgtt;
-	unsigned long ndwords, dw;
+	unsigned long idx, ndwords, dw;
 	struct igt_live_test t;
 	struct drm_file *file;
 	I915_RND_STATE(prng);
@@ -1129,6 +1254,7 @@ static int igt_ctx_readonly(void *arg)
 		ndwords, RUNTIME_INFO(i915)->num_engines);
 
 	dw = 0;
+	idx = 0;
 	list_for_each_entry(obj, &objects, st_link) {
 		unsigned int rem =
 			min_t(unsigned int, ndwords - dw, max_dwords(obj));
@@ -1138,7 +1264,7 @@ static int igt_ctx_readonly(void *arg)
 		if (i915_gem_object_is_readonly(obj))
 			num_writes = 0;
 
-		err = cpu_check(obj, num_writes);
+		err = cpu_check(obj, idx++, num_writes);
 		if (err)
 			break;
 
@@ -1723,6 +1849,7 @@ int i915_gem_context_live_selftests(struct drm_i915_private *dev_priv)
 		SUBTEST(igt_ctx_exec),
 		SUBTEST(igt_ctx_readonly),
 		SUBTEST(igt_ctx_sseu),
+		SUBTEST(igt_shared_ctx_exec),
 		SUBTEST(igt_vm_isolation),
 	};
 
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index 826fd51c331e..57b3d9867070 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -1020,7 +1020,6 @@ static int exercise_ppgtt(struct drm_i915_private *dev_priv,
 
 	err = func(dev_priv, &ppgtt->vm, 0, ppgtt->vm.total, end_time);
 
-	i915_ppgtt_close(&ppgtt->vm);
 	i915_ppgtt_put(ppgtt);
 out_unlock:
 	mutex_unlock(&dev_priv->drm.struct_mutex);
diff --git a/drivers/gpu/drm/i915/selftests/mock_context.c b/drivers/gpu/drm/i915/selftests/mock_context.c
index 8efa6892c6cd..f90328b21763 100644
--- a/drivers/gpu/drm/i915/selftests/mock_context.c
+++ b/drivers/gpu/drm/i915/selftests/mock_context.c
@@ -54,13 +54,17 @@ mock_context(struct drm_i915_private *i915,
 		goto err_handles;
 
 	if (name) {
+		struct i915_hw_ppgtt *ppgtt;
+
 		ctx->name = kstrdup(name, GFP_KERNEL);
 		if (!ctx->name)
 			goto err_put;
 
-		ctx->ppgtt = mock_ppgtt(i915, name);
-		if (!ctx->ppgtt)
+		ppgtt = mock_ppgtt(i915, name);
+		if (!ppgtt)
 			goto err_put;
+
+		__set_ppgtt(ctx, ppgtt);
 	}
 
 	return ctx;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 39835793722b..6575470755d0 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -341,6 +341,8 @@ typedef struct _drm_i915_sarea {
 #define DRM_I915_PERF_ADD_CONFIG	0x37
 #define DRM_I915_PERF_REMOVE_CONFIG	0x38
 #define DRM_I915_QUERY			0x39
+#define DRM_I915_GEM_VM_CREATE		0x3a
+#define DRM_I915_GEM_VM_DESTROY		0x3b
 /* Must be kept compact -- no holes */
 
 #define DRM_IOCTL_I915_INIT		DRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT, drm_i915_init_t)
@@ -400,6 +402,8 @@ typedef struct _drm_i915_sarea {
 #define DRM_IOCTL_I915_PERF_ADD_CONFIG	DRM_IOW(DRM_COMMAND_BASE + DRM_I915_PERF_ADD_CONFIG, struct drm_i915_perf_oa_config)
 #define DRM_IOCTL_I915_PERF_REMOVE_CONFIG	DRM_IOW(DRM_COMMAND_BASE + DRM_I915_PERF_REMOVE_CONFIG, __u64)
 #define DRM_IOCTL_I915_QUERY			DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_QUERY, struct drm_i915_query)
+#define DRM_IOCTL_I915_GEM_VM_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_CREATE, struct drm_i915_gem_vm_control)
+#define DRM_IOCTL_I915_GEM_VM_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_VM_DESTROY, struct drm_i915_gem_vm_control)
 
 /* Allow drivers to submit batchbuffers directly to hardware, relying
  * on the security mechanisms provided by hardware.
@@ -1451,6 +1455,26 @@ struct drm_i915_gem_context_destroy {
 	__u32 pad;
 };
 
+/*
+ * DRM_I915_GEM_VM_CREATE -
+ *
+ * Create a new virtual memory address space (ppGTT) for use within a context
+ * on the same file. Extensions can be provided to configure exactly how the
+ * address space is set up upon creation.
+ *
+ * The id of new VM (bound to the fd) for use with I915_CONTEXT_PARAM_VM is
+ * returned.
+ *
+ * DRM_I915_GEM_VM_DESTROY -
+ *
+ * Destroys a previously created VM id.
+ */
+struct drm_i915_gem_vm_control {
+	__u64 extensions;
+	__u32 flags;
+	__u32 id;
+};
+
 struct drm_i915_reg_read {
 	/*
 	 * Register offset.
@@ -1540,7 +1564,19 @@ struct drm_i915_gem_context_param {
  * On creation, all new contexts are marked as recoverable.
  */
 #define I915_CONTEXT_PARAM_RECOVERABLE	0x8
+
+	/*
+	 * The id of the associated virtual memory address space (ppGTT) of
+	 * this context. Can be retrieved and passed to another context
+	 * (on the same fd) for both to use the same ppGTT and so share
+	 * address layouts, and avoid reloading the page tables on context
+	 * switches between themselves.
+	 *
+	 * See DRM_I915_GEM_VM_CREATE and DRM_I915_GEM_VM_DESTROY.
+	 */
+#define I915_CONTEXT_PARAM_VM		0x9
 /* Must be kept compact -- no holes and well documented */
+
 	__u64 value;
 };
 
-- 
2.20.1


* [PATCH 05/13] drm/i915: Extend CONTEXT_CREATE to set parameters upon construction
  2019-03-08 14:12 Home straight for veng, the uAPI wars Chris Wilson
                   ` (3 preceding siblings ...)
  2019-03-08 14:12 ` [PATCH 04/13] drm/i915: Create/destroy VM (ppGTT) for use with contexts Chris Wilson
@ 2019-03-08 14:12 ` Chris Wilson
  2019-03-08 14:12 ` [PATCH 06/13] drm/i915: Allow contexts to share a single timeline across all engines Chris Wilson
                   ` (11 subsequent siblings)
  16 siblings, 0 replies; 58+ messages in thread
From: Chris Wilson @ 2019-03-08 14:12 UTC (permalink / raw)
  To: intel-gfx

It can be useful to have a single ioctl to create a context with all
the initial parameters instead of a series of create + setparam + setparam
ioctls. This extension to context create allows any of those parameters
to be passed in as a linked list of extensions, applied to the newly
constructed context.
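
For example, creating a context at maximum priority in one call would look
roughly like the sketch below (assuming the setparam extension struct pairs
an i915_user_extension base with the setparam payload, as create_setparam()
in the patch implies; fd is assumed to exist):

struct drm_i915_gem_context_create_ext_setparam ext = {
	.base = { .name = I915_CONTEXT_CREATE_EXT_SETPARAM },
	.setparam = {
		/* ctx_id is left zero, as required for create-time setparam */
		.param = I915_CONTEXT_PARAM_PRIORITY,
		.value = I915_CONTEXT_MAX_USER_PRIORITY,
	},
};
struct drm_i915_gem_context_create_ext create = {
	.flags = I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS,
	.extensions = (uintptr_t)&ext,
};

drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT, &create);
/* create.ctx_id now names the newly constructed context */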

v2: Make a local copy of user setparam (Tvrtko)
v3: Use flags to detect availability of extension interface

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c         |   2 +-
 drivers/gpu/drm/i915/i915_gem_context.c | 439 +++++++++++++-----------
 include/uapi/drm/i915_drm.h             | 166 +++++----
 3 files changed, 335 insertions(+), 272 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 5d53efc4c5d9..93e41c937d96 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -3110,7 +3110,7 @@ static const struct drm_ioctl_desc i915_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(I915_SET_SPRITE_COLORKEY, intel_sprite_set_colorkey_ioctl, DRM_MASTER),
 	DRM_IOCTL_DEF_DRV(I915_GET_SPRITE_COLORKEY, drm_noop, DRM_MASTER),
 	DRM_IOCTL_DEF_DRV(I915_GEM_WAIT, i915_gem_wait_ioctl, DRM_AUTH|DRM_RENDER_ALLOW),
-	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_CREATE, i915_gem_context_create_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_CREATE_EXT, i915_gem_context_create_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_DESTROY, i915_gem_context_destroy_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_REG_READ, i915_reg_read_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GET_RESET_STATS, i915_gem_context_reset_stats_ioctl, DRM_RENDER_ALLOW),
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index fb2aba06f693..b41b09f60edd 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -90,6 +90,7 @@
 #include "i915_drv.h"
 #include "i915_globals.h"
 #include "i915_trace.h"
+#include "i915_user_extensions.h"
 #include "intel_lrc_reg.h"
 #include "intel_workarounds.h"
 
@@ -1034,192 +1035,6 @@ static int set_ppgtt(struct i915_gem_context *ctx,
 	return err;
 }
 
-static bool client_is_banned(struct drm_i915_file_private *file_priv)
-{
-	return atomic_read(&file_priv->ban_score) >= I915_CLIENT_SCORE_BANNED;
-}
-
-int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
-				  struct drm_file *file)
-{
-	struct drm_i915_private *i915 = to_i915(dev);
-	struct drm_i915_gem_context_create *args = data;
-	struct drm_i915_file_private *file_priv = file->driver_priv;
-	struct i915_gem_context *ctx;
-	int ret;
-
-	if (!DRIVER_CAPS(i915)->has_logical_contexts)
-		return -ENODEV;
-
-	if (args->pad != 0)
-		return -EINVAL;
-
-	ret = i915_terminally_wedged(i915);
-	if (ret)
-		return ret;
-
-	if (client_is_banned(file_priv)) {
-		DRM_DEBUG("client %s[%d] banned from creating ctx\n",
-			  current->comm,
-			  pid_nr(get_task_pid(current, PIDTYPE_PID)));
-
-		return -EIO;
-	}
-
-	ret = i915_mutex_lock_interruptible(dev);
-	if (ret)
-		return ret;
-
-	ctx = i915_gem_create_context(i915, file_priv);
-	mutex_unlock(&dev->struct_mutex);
-	if (IS_ERR(ctx))
-		return PTR_ERR(ctx);
-
-	GEM_BUG_ON(i915_gem_context_is_kernel(ctx));
-
-	args->ctx_id = ctx->user_handle;
-	DRM_DEBUG("HW context %d created\n", args->ctx_id);
-
-	return 0;
-}
-
-int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
-				   struct drm_file *file)
-{
-	struct drm_i915_gem_context_destroy *args = data;
-	struct drm_i915_file_private *file_priv = file->driver_priv;
-	struct i915_gem_context *ctx;
-	int ret;
-
-	if (args->pad != 0)
-		return -EINVAL;
-
-	if (args->ctx_id == DEFAULT_CONTEXT_HANDLE)
-		return -ENOENT;
-
-	ctx = i915_gem_context_lookup(file_priv, args->ctx_id);
-	if (!ctx)
-		return -ENOENT;
-
-	ret = mutex_lock_interruptible(&dev->struct_mutex);
-	if (ret)
-		goto out;
-
-	__destroy_hw_context(ctx, file_priv);
-	mutex_unlock(&dev->struct_mutex);
-
-out:
-	i915_gem_context_put(ctx);
-	return 0;
-}
-
-static int get_sseu(struct i915_gem_context *ctx,
-		    struct drm_i915_gem_context_param *args)
-{
-	struct drm_i915_gem_context_param_sseu user_sseu;
-	struct intel_engine_cs *engine;
-	struct intel_context *ce;
-
-	if (args->size == 0)
-		goto out;
-	else if (args->size < sizeof(user_sseu))
-		return -EINVAL;
-
-	if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value),
-			   sizeof(user_sseu)))
-		return -EFAULT;
-
-	if (user_sseu.flags || user_sseu.rsvd)
-		return -EINVAL;
-
-	engine = intel_engine_lookup_user(ctx->i915,
-					  user_sseu.engine_class,
-					  user_sseu.engine_instance);
-	if (!engine)
-		return -EINVAL;
-
-	ce = intel_context_pin_lock(ctx, engine); /* serialises with set_sseu */
-	if (IS_ERR(ce))
-		return PTR_ERR(ce);
-
-	user_sseu.slice_mask = ce->sseu.slice_mask;
-	user_sseu.subslice_mask = ce->sseu.subslice_mask;
-	user_sseu.min_eus_per_subslice = ce->sseu.min_eus_per_subslice;
-	user_sseu.max_eus_per_subslice = ce->sseu.max_eus_per_subslice;
-
-	intel_context_pin_unlock(ce);
-
-	if (copy_to_user(u64_to_user_ptr(args->value), &user_sseu,
-			 sizeof(user_sseu)))
-		return -EFAULT;
-
-out:
-	args->size = sizeof(user_sseu);
-
-	return 0;
-}
-
-int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
-				    struct drm_file *file)
-{
-	struct drm_i915_file_private *file_priv = file->driver_priv;
-	struct drm_i915_gem_context_param *args = data;
-	struct i915_gem_context *ctx;
-	int ret = 0;
-
-	ctx = i915_gem_context_lookup(file_priv, args->ctx_id);
-	if (!ctx)
-		return -ENOENT;
-
-	switch (args->param) {
-	case I915_CONTEXT_PARAM_BAN_PERIOD:
-		ret = -EINVAL;
-		break;
-	case I915_CONTEXT_PARAM_NO_ZEROMAP:
-		args->size = 0;
-		args->value = test_bit(UCONTEXT_NO_ZEROMAP, &ctx->user_flags);
-		break;
-	case I915_CONTEXT_PARAM_GTT_SIZE:
-		args->size = 0;
-
-		if (ctx->ppgtt)
-			args->value = ctx->ppgtt->vm.total;
-		else if (to_i915(dev)->mm.aliasing_ppgtt)
-			args->value = to_i915(dev)->mm.aliasing_ppgtt->vm.total;
-		else
-			args->value = to_i915(dev)->ggtt.vm.total;
-		break;
-	case I915_CONTEXT_PARAM_NO_ERROR_CAPTURE:
-		args->size = 0;
-		args->value = i915_gem_context_no_error_capture(ctx);
-		break;
-	case I915_CONTEXT_PARAM_BANNABLE:
-		args->size = 0;
-		args->value = i915_gem_context_is_bannable(ctx);
-		break;
-	case I915_CONTEXT_PARAM_RECOVERABLE:
-		args->size = 0;
-		args->value = i915_gem_context_is_recoverable(ctx);
-		break;
-	case I915_CONTEXT_PARAM_PRIORITY:
-		args->size = 0;
-		args->value = ctx->sched.priority >> I915_USER_PRIORITY_SHIFT;
-		break;
-	case I915_CONTEXT_PARAM_SSEU:
-		ret = get_sseu(ctx, args);
-		break;
-	case I915_CONTEXT_PARAM_VM:
-		ret = get_ppgtt(ctx, args);
-		break;
-	default:
-		ret = -EINVAL;
-		break;
-	}
-
-	i915_gem_context_put(ctx);
-	return ret;
-}
-
 static int gen8_emit_rpcs_config(struct i915_request *rq,
 				 struct intel_context *ce,
 				 struct intel_sseu sseu)
@@ -1499,18 +1314,11 @@ static int set_sseu(struct i915_gem_context *ctx,
 	return 0;
 }
 
-int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
-				    struct drm_file *file)
+static int ctx_setparam(struct i915_gem_context *ctx,
+			struct drm_i915_gem_context_param *args)
 {
-	struct drm_i915_file_private *file_priv = file->driver_priv;
-	struct drm_i915_gem_context_param *args = data;
-	struct i915_gem_context *ctx;
 	int ret = 0;
 
-	ctx = i915_gem_context_lookup(file_priv, args->ctx_id);
-	if (!ctx)
-		return -ENOENT;
-
 	switch (args->param) {
 	case I915_CONTEXT_PARAM_NO_ZEROMAP:
 		if (args->size)
@@ -1520,6 +1328,7 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 		else
 			clear_bit(UCONTEXT_NO_ZEROMAP, &ctx->user_flags);
 		break;
+
 	case I915_CONTEXT_PARAM_NO_ERROR_CAPTURE:
 		if (args->size)
 			ret = -EINVAL;
@@ -1528,6 +1337,7 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 		else
 			i915_gem_context_clear_no_error_capture(ctx);
 		break;
+
 	case I915_CONTEXT_PARAM_BANNABLE:
 		if (args->size)
 			ret = -EINVAL;
@@ -1554,7 +1364,7 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 
 			if (args->size)
 				ret = -EINVAL;
-			else if (!(to_i915(dev)->caps.scheduler & I915_SCHEDULER_CAP_PRIORITY))
+			else if (!(ctx->i915->caps.scheduler & I915_SCHEDULER_CAP_PRIORITY))
 				ret = -ENODEV;
 			else if (priority > I915_CONTEXT_MAX_USER_PRIORITY ||
 				 priority < I915_CONTEXT_MIN_USER_PRIORITY)
@@ -1582,6 +1392,243 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 		break;
 	}
 
+	return ret;
+}
+
+static int create_setparam(struct i915_user_extension __user *ext, void *data)
+{
+	struct drm_i915_gem_context_create_ext_setparam local;
+
+	if (copy_from_user(&local, ext, sizeof(local)))
+		return -EFAULT;
+
+	if (local.setparam.ctx_id)
+		return -EINVAL;
+
+	return ctx_setparam(data, &local.setparam);
+}
+
+static const i915_user_extension_fn create_extensions[] = {
+	[I915_CONTEXT_CREATE_EXT_SETPARAM] = create_setparam,
+};
+
+static bool client_is_banned(struct drm_i915_file_private *file_priv)
+{
+	return atomic_read(&file_priv->ban_score) >= I915_CLIENT_SCORE_BANNED;
+}
+
+int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
+				  struct drm_file *file)
+{
+	struct drm_i915_private *i915 = to_i915(dev);
+	struct drm_i915_gem_context_create_ext *args = data;
+	struct drm_i915_file_private *file_priv = file->driver_priv;
+	struct i915_gem_context *ctx;
+	int ret;
+
+	if (!DRIVER_CAPS(i915)->has_logical_contexts)
+		return -ENODEV;
+
+	if (args->flags & I915_CONTEXT_CREATE_FLAGS_UNKNOWN)
+		return -EINVAL;
+
+	ret = i915_terminally_wedged(i915);
+	if (ret)
+		return ret;
+
+	if (client_is_banned(file_priv)) {
+		DRM_DEBUG("client %s[%d] banned from creating ctx\n",
+			  current->comm,
+			  pid_nr(get_task_pid(current, PIDTYPE_PID)));
+
+		return -EIO;
+	}
+
+	ret = i915_mutex_lock_interruptible(dev);
+	if (ret)
+		return ret;
+
+	ctx = i915_gem_create_context(i915, file_priv);
+	mutex_unlock(&dev->struct_mutex);
+	if (IS_ERR(ctx))
+		return PTR_ERR(ctx);
+
+	GEM_BUG_ON(i915_gem_context_is_kernel(ctx));
+
+	if (args->flags & I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS) {
+		ret = i915_user_extensions(u64_to_user_ptr(args->extensions),
+					   create_extensions,
+					   ARRAY_SIZE(create_extensions),
+					   ctx);
+		if (ret) {
+			idr_remove(&file_priv->context_idr, ctx->user_handle);
+			context_close(ctx);
+			return ret;
+		}
+	}
+
+	args->ctx_id = ctx->user_handle;
+	DRM_DEBUG("HW context %d created\n", args->ctx_id);
+
+	return 0;
+}
+
+int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
+				   struct drm_file *file)
+{
+	struct drm_i915_gem_context_destroy *args = data;
+	struct drm_i915_file_private *file_priv = file->driver_priv;
+	struct i915_gem_context *ctx;
+	int ret;
+
+	if (args->pad != 0)
+		return -EINVAL;
+
+	if (args->ctx_id == DEFAULT_CONTEXT_HANDLE)
+		return -ENOENT;
+
+	ctx = i915_gem_context_lookup(file_priv, args->ctx_id);
+	if (!ctx)
+		return -ENOENT;
+
+	ret = mutex_lock_interruptible(&dev->struct_mutex);
+	if (ret)
+		goto out;
+
+	__destroy_hw_context(ctx, file_priv);
+	mutex_unlock(&dev->struct_mutex);
+
+out:
+	i915_gem_context_put(ctx);
+	return 0;
+}
+
+static int get_sseu(struct i915_gem_context *ctx,
+		    struct drm_i915_gem_context_param *args)
+{
+	struct drm_i915_gem_context_param_sseu user_sseu;
+	struct intel_engine_cs *engine;
+	struct intel_context *ce;
+
+	if (args->size == 0)
+		goto out;
+	else if (args->size < sizeof(user_sseu))
+		return -EINVAL;
+
+	if (copy_from_user(&user_sseu, u64_to_user_ptr(args->value),
+			   sizeof(user_sseu)))
+		return -EFAULT;
+
+	if (user_sseu.flags || user_sseu.rsvd)
+		return -EINVAL;
+
+	engine = intel_engine_lookup_user(ctx->i915,
+					  user_sseu.engine_class,
+					  user_sseu.engine_instance);
+	if (!engine)
+		return -EINVAL;
+
+	ce = intel_context_pin_lock(ctx, engine); /* serialises with set_sseu */
+	if (IS_ERR(ce))
+		return PTR_ERR(ce);
+
+	user_sseu.slice_mask = ce->sseu.slice_mask;
+	user_sseu.subslice_mask = ce->sseu.subslice_mask;
+	user_sseu.min_eus_per_subslice = ce->sseu.min_eus_per_subslice;
+	user_sseu.max_eus_per_subslice = ce->sseu.max_eus_per_subslice;
+
+	intel_context_pin_unlock(ce);
+
+	if (copy_to_user(u64_to_user_ptr(args->value), &user_sseu,
+			 sizeof(user_sseu)))
+		return -EFAULT;
+
+out:
+	args->size = sizeof(user_sseu);
+
+	return 0;
+}
+
+int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
+				    struct drm_file *file)
+{
+	struct drm_i915_file_private *file_priv = file->driver_priv;
+	struct drm_i915_gem_context_param *args = data;
+	struct i915_gem_context *ctx;
+	int ret = 0;
+
+	ctx = i915_gem_context_lookup(file_priv, args->ctx_id);
+	if (!ctx)
+		return -ENOENT;
+
+	switch (args->param) {
+	case I915_CONTEXT_PARAM_NO_ZEROMAP:
+		args->size = 0;
+		args->value = test_bit(UCONTEXT_NO_ZEROMAP, &ctx->user_flags);
+		break;
+
+	case I915_CONTEXT_PARAM_GTT_SIZE:
+		args->size = 0;
+		if (ctx->ppgtt)
+			args->value = ctx->ppgtt->vm.total;
+		else if (to_i915(dev)->mm.aliasing_ppgtt)
+			args->value = to_i915(dev)->mm.aliasing_ppgtt->vm.total;
+		else
+			args->value = to_i915(dev)->ggtt.vm.total;
+		break;
+
+	case I915_CONTEXT_PARAM_NO_ERROR_CAPTURE:
+		args->size = 0;
+		args->value = i915_gem_context_no_error_capture(ctx);
+		break;
+
+	case I915_CONTEXT_PARAM_BANNABLE:
+		args->size = 0;
+		args->value = i915_gem_context_is_bannable(ctx);
+		break;
+
+	case I915_CONTEXT_PARAM_RECOVERABLE:
+		args->size = 0;
+		args->value = i915_gem_context_is_recoverable(ctx);
+		break;
+
+	case I915_CONTEXT_PARAM_PRIORITY:
+		args->size = 0;
+		args->value = ctx->sched.priority >> I915_USER_PRIORITY_SHIFT;
+		break;
+
+	case I915_CONTEXT_PARAM_SSEU:
+		ret = get_sseu(ctx, args);
+		break;
+
+	case I915_CONTEXT_PARAM_VM:
+		ret = get_ppgtt(ctx, args);
+		break;
+
+	case I915_CONTEXT_PARAM_BAN_PERIOD:
+	default:
+		ret = -EINVAL;
+		break;
+	}
+
+	i915_gem_context_put(ctx);
+	return ret;
+}
+
+int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
+				    struct drm_file *file)
+{
+	struct drm_i915_file_private *file_priv = file->driver_priv;
+	struct drm_i915_gem_context_param *args = data;
+	struct i915_gem_context *ctx;
+	int ret;
+
+	ctx = i915_gem_context_lookup(file_priv, args->ctx_id);
+	if (!ctx)
+		return -ENOENT;
+
+	ret = ctx_setparam(ctx, args);
+
 	i915_gem_context_put(ctx);
 	return ret;
 }
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 6575470755d0..0db92a4153c8 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -392,6 +392,7 @@ typedef struct _drm_i915_sarea {
 #define DRM_IOCTL_I915_GET_SPRITE_COLORKEY DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GET_SPRITE_COLORKEY, struct drm_intel_sprite_colorkey)
 #define DRM_IOCTL_I915_GEM_WAIT		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_WAIT, struct drm_i915_gem_wait)
 #define DRM_IOCTL_I915_GEM_CONTEXT_CREATE	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE, struct drm_i915_gem_context_create)
+#define DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE, struct drm_i915_gem_context_create_ext)
 #define DRM_IOCTL_I915_GEM_CONTEXT_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_DESTROY, struct drm_i915_gem_context_destroy)
 #define DRM_IOCTL_I915_REG_READ			DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_REG_READ, struct drm_i915_reg_read)
 #define DRM_IOCTL_I915_GET_RESET_STATS		DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GET_RESET_STATS, struct drm_i915_reset_stats)
@@ -1445,85 +1446,17 @@ struct drm_i915_gem_wait {
 };
 
 struct drm_i915_gem_context_create {
-	/*  output: id of new context*/
-	__u32 ctx_id;
-	__u32 pad;
-};
-
-struct drm_i915_gem_context_destroy {
-	__u32 ctx_id;
+	__u32 ctx_id; /* output: id of new context*/
 	__u32 pad;
 };
 
-/*
- * DRM_I915_GEM_VM_CREATE -
- *
- * Create a new virtual memory address space (ppGTT) for use within a context
- * on the same file. Extensions can be provided to configure exactly how the
- * address space is setup upon creation.
- *
- * The id of new VM (bound to the fd) for use with I915_CONTEXT_PARAM_VM is
- * returned.
- *
- * DRM_I915_GEM_VM_DESTROY -
- *
- * Destroys a previously created VM id.
- */
-struct drm_i915_gem_vm_control {
-	__u64 extensions;
+struct drm_i915_gem_context_create_ext {
+	__u32 ctx_id; /* output: id of new context*/
 	__u32 flags;
-	__u32 id;
-};
-
-struct drm_i915_reg_read {
-	/*
-	 * Register offset.
-	 * For 64bit wide registers where the upper 32bits don't immediately
-	 * follow the lower 32bits, the offset of the lower 32bits must
-	 * be specified
-	 */
-	__u64 offset;
-#define I915_REG_READ_8B_WA (1ul << 0)
-
-	__u64 val; /* Return value */
-};
-/* Known registers:
- *
- * Render engine timestamp - 0x2358 + 64bit - gen7+
- * - Note this register returns an invalid value if using the default
- *   single instruction 8byte read, in order to workaround that pass
- *   flag I915_REG_READ_8B_WA in offset field.
- *
- */
-
-struct drm_i915_reset_stats {
-	__u32 ctx_id;
-	__u32 flags;
-
-	/* All resets since boot/module reload, for all contexts */
-	__u32 reset_count;
-
-	/* Number of batches lost when active in GPU, for this context */
-	__u32 batch_active;
-
-	/* Number of batches lost pending for execution, for this context */
-	__u32 batch_pending;
-
-	__u32 pad;
-};
-
-struct drm_i915_gem_userptr {
-	__u64 user_ptr;
-	__u64 user_size;
-	__u32 flags;
-#define I915_USERPTR_READ_ONLY 0x1
-#define I915_USERPTR_UNSYNCHRONIZED 0x80000000
-	/**
-	 * Returned handle for the object.
-	 *
-	 * Object handles are nonzero.
-	 */
-	__u32 handle;
+#define I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS	(1u << 0)
+#define I915_CONTEXT_CREATE_FLAGS_UNKNOWN \
+	(-(I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS << 1))
+	__u64 extensions;
 };
 
 struct drm_i915_gem_context_param {
@@ -1639,6 +1572,89 @@ struct drm_i915_gem_context_param_sseu {
 	__u32 rsvd;
 };
 
+struct drm_i915_gem_context_create_ext_setparam {
+#define I915_CONTEXT_CREATE_EXT_SETPARAM 0
+	struct i915_user_extension base;
+	struct drm_i915_gem_context_param setparam;
+};
+
+struct drm_i915_gem_context_destroy {
+	__u32 ctx_id;
+	__u32 pad;
+};
+
+/*
+ * DRM_I915_GEM_VM_CREATE -
+ *
+ * Create a new virtual memory address space (ppGTT) for use within a context
+ * on the same file. Extensions can be provided to configure exactly how the
+ * address space is setup upon creation.
+ *
+ * The id of new VM (bound to the fd) for use with I915_CONTEXT_PARAM_VM is
+ * returned.
+ *
+ * DRM_I915_GEM_VM_DESTROY -
+ *
+ * Destroys a previously created VM id.
+ */
+struct drm_i915_gem_vm_control {
+	__u64 extensions;
+	__u32 flags;
+	__u32 id;
+};
+
+struct drm_i915_reg_read {
+	/*
+	 * Register offset.
+	 * For 64bit wide registers where the upper 32bits don't immediately
+	 * follow the lower 32bits, the offset of the lower 32bits must
+	 * be specified
+	 */
+	__u64 offset;
+#define I915_REG_READ_8B_WA (1ul << 0)
+
+	__u64 val; /* Return value */
+};
+
+/* Known registers:
+ *
+ * Render engine timestamp - 0x2358 + 64bit - gen7+
+ * - Note this register returns an invalid value if using the default
+ *   single instruction 8byte read, in order to workaround that pass
+ *   flag I915_REG_READ_8B_WA in offset field.
+ *
+ */
+
+struct drm_i915_reset_stats {
+	__u32 ctx_id;
+	__u32 flags;
+
+	/* All resets since boot/module reload, for all contexts */
+	__u32 reset_count;
+
+	/* Number of batches lost when active in GPU, for this context */
+	__u32 batch_active;
+
+	/* Number of batches lost pending for execution, for this context */
+	__u32 batch_pending;
+
+	__u32 pad;
+};
+
+struct drm_i915_gem_userptr {
+	__u64 user_ptr;
+	__u64 user_size;
+	__u32 flags;
+#define I915_USERPTR_READ_ONLY 0x1
+#define I915_USERPTR_UNSYNCHRONIZED 0x80000000
+	/**
+	 * Returned handle for the object.
+	 *
+	 * Object handles are nonzero.
+	 */
+	__u32 handle;
+};
+
 enum drm_i915_oa_format {
 	I915_OA_FORMAT_A13 = 1,	    /* HSW only */
 	I915_OA_FORMAT_A29,	    /* HSW only */
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 06/13] drm/i915: Allow contexts to share a single timeline across all engines
  2019-03-08 14:12 Home straight for veng, the uAPI wars Chris Wilson
                   ` (4 preceding siblings ...)
  2019-03-08 14:12 ` [PATCH 05/13] drm/i915: Extend CONTEXT_CREATE to set parameters upon construction Chris Wilson
@ 2019-03-08 14:12 ` Chris Wilson
  2019-03-08 15:56   ` Tvrtko Ursulin
  2019-03-08 14:12 ` [PATCH 07/13] drm/i915: Allow userspace to clone contexts on creation Chris Wilson
                   ` (10 subsequent siblings)
  16 siblings, 1 reply; 58+ messages in thread
From: Chris Wilson @ 2019-03-08 14:12 UTC (permalink / raw)
  To: intel-gfx

Previously, our view has always been to run the engines independently
within a context. (Multiple engines happened before we had contexts and
timelines, so they always operated independently and that behaviour
persisted into contexts.) However, at the user level the context often
represents a single timeline (e.g. GL contexts) and userspace must
ensure that the individual engines are serialised to present that
ordering to the client (or forgets about this detail entirely and hopes no
one notices - a fair ploy if the client can only directly control one
engine themselves ;)

In the next patch, we will want to construct a set of engines that
operate as one, that have a single timeline interwoven between them, to
present a single virtual engine to the user. (They submit to the virtual
engine, then we decide which engine to execute on based on load.)

To that end, we want to be able to create contexts which have a single
timeline (fence context) shared between all engines, rather than multiple
timelines.
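
As a rough illustration (not part of this patch), userspace opts in at
context creation; a minimal sketch assuming an open i915 fd and the uAPI
flag added below, with error handling elided:

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

static int create_single_timeline_ctx(int fd, __u32 *ctx_id)
{
	struct drm_i915_gem_context_create_ext arg;

	/* Request one fence timeline shared by all engines in the ctx. */
	memset(&arg, 0, sizeof(arg));
	arg.flags = I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE;

	if (ioctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT, &arg))
		return -1;

	*ctx_id = arg.ctx_id;
	return 0;
}

Requests submitted to any engine of such a context are then implicitly
ordered behind one another, as if queued on a single engine.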

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_context.c       | 32 ++++++--
 drivers/gpu/drm/i915/i915_gem_context_types.h |  2 +
 drivers/gpu/drm/i915/i915_request.c           | 80 +++++++++++++------
 drivers/gpu/drm/i915/i915_request.h           |  5 +-
 drivers/gpu/drm/i915/i915_sw_fence.c          | 39 +++++++--
 drivers/gpu/drm/i915/i915_sw_fence.h          | 13 ++-
 drivers/gpu/drm/i915/intel_lrc.c              |  5 +-
 .../gpu/drm/i915/selftests/i915_gem_context.c | 18 +++--
 drivers/gpu/drm/i915/selftests/mock_context.c |  2 +-
 include/uapi/drm/i915_drm.h                   |  3 +-
 10 files changed, 149 insertions(+), 50 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index b41b09f60edd..310892b42b68 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -237,6 +237,9 @@ static void i915_gem_context_free(struct i915_gem_context *ctx)
 	rbtree_postorder_for_each_entry_safe(it, n, &ctx->hw_contexts, node)
 		it->ops->destroy(it);
 
+	if (ctx->timeline)
+		i915_timeline_put(ctx->timeline);
+
 	kfree(ctx->name);
 	put_pid(ctx->pid);
 
@@ -448,12 +451,17 @@ static void __assign_ppgtt(struct i915_gem_context *ctx,
 
 static struct i915_gem_context *
 i915_gem_create_context(struct drm_i915_private *dev_priv,
-			struct drm_i915_file_private *file_priv)
+			struct drm_i915_file_private *file_priv,
+			unsigned int flags)
 {
 	struct i915_gem_context *ctx;
 
 	lockdep_assert_held(&dev_priv->drm.struct_mutex);
 
+	if (flags & I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE &&
+	    !HAS_EXECLISTS(dev_priv))
+		return ERR_PTR(-EINVAL);
+
 	/* Reap the most stale context */
 	contexts_free_first(dev_priv);
 
@@ -476,6 +484,18 @@ i915_gem_create_context(struct drm_i915_private *dev_priv,
 		i915_ppgtt_put(ppgtt);
 	}
 
+	if (flags & I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE) {
+		struct i915_timeline *timeline;
+
+		timeline = i915_timeline_create(dev_priv, ctx->name, NULL);
+		if (IS_ERR(timeline)) {
+			__destroy_hw_context(ctx, file_priv);
+			return ERR_CAST(timeline);
+		}
+
+		ctx->timeline = timeline;
+	}
+
 	trace_i915_context_create(ctx);
 
 	return ctx;
@@ -504,7 +524,7 @@ i915_gem_context_create_gvt(struct drm_device *dev)
 	if (ret)
 		return ERR_PTR(ret);
 
-	ctx = i915_gem_create_context(to_i915(dev), NULL);
+	ctx = i915_gem_create_context(to_i915(dev), NULL, 0);
 	if (IS_ERR(ctx))
 		goto out;
 
@@ -540,7 +560,7 @@ i915_gem_context_create_kernel(struct drm_i915_private *i915, int prio)
 	struct i915_gem_context *ctx;
 	int err;
 
-	ctx = i915_gem_create_context(i915, NULL);
+	ctx = i915_gem_create_context(i915, NULL, 0);
 	if (IS_ERR(ctx))
 		return ctx;
 
@@ -672,7 +692,7 @@ int i915_gem_context_open(struct drm_i915_private *i915,
 	idr_init_base(&file_priv->vm_idr, 1);
 
 	mutex_lock(&i915->drm.struct_mutex);
-	ctx = i915_gem_create_context(i915, file_priv);
+	ctx = i915_gem_create_context(i915, file_priv, 0);
 	mutex_unlock(&i915->drm.struct_mutex);
 	if (IS_ERR(ctx)) {
 		idr_destroy(&file_priv->context_idr);
@@ -788,7 +808,7 @@ last_request_on_engine(struct i915_timeline *timeline,
 
 	rq = i915_active_request_raw(&timeline->last_request,
 				     &engine->i915->drm.struct_mutex);
-	if (rq && rq->engine == engine) {
+	if (rq && rq->engine->mask & engine->mask) {
 		GEM_TRACE("last request for %s on engine %s: %llx:%llu\n",
 			  timeline->name, engine->name,
 			  rq->fence.context, rq->fence.seqno);
@@ -1448,7 +1468,7 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
 	if (ret)
 		return ret;
 
-	ctx = i915_gem_create_context(i915, file_priv);
+	ctx = i915_gem_create_context(i915, file_priv, args->flags);
 	mutex_unlock(&dev->struct_mutex);
 	if (IS_ERR(ctx))
 		return PTR_ERR(ctx);
diff --git a/drivers/gpu/drm/i915/i915_gem_context_types.h b/drivers/gpu/drm/i915/i915_gem_context_types.h
index 2bf19730eaa9..f8f6e6c960a7 100644
--- a/drivers/gpu/drm/i915/i915_gem_context_types.h
+++ b/drivers/gpu/drm/i915/i915_gem_context_types.h
@@ -41,6 +41,8 @@ struct i915_gem_context {
 	/** file_priv: owning file descriptor */
 	struct drm_i915_file_private *file_priv;
 
+	struct i915_timeline *timeline;
+
 	/**
 	 * @ppgtt: unique address space (GTT)
 	 *
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 9533a85cb0b3..09046a15d218 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -993,6 +993,60 @@ void i915_request_skip(struct i915_request *rq, int error)
 	memset(vaddr + head, 0, rq->postfix - head);
 }
 
+static struct i915_request *
+__i915_request_await_timeline(struct i915_request *rq)
+{
+	struct i915_timeline *timeline = rq->timeline;
+	struct i915_request *prev;
+
+	/*
+	 * Dependency tracking and request ordering along the timeline
+	 * is special cased so that we can eliminate redundant ordering
+	 * operations while building the request (we know that the timeline
+	 * itself is ordered, and here we guarantee it).
+	 *
+	 * As we know we will need to emit tracking along the timeline,
+	 * we embed the hooks into our request struct -- at the cost of
+	 * having to have specialised no-allocation interfaces (which will
+	 * be beneficial elsewhere).
+	 *
+	 * A second benefit to open-coding i915_request_await_request is
+	 * that we can apply a slight variant of the rules specialised
+	 * for timelines that jump between engines (such as virtual engines).
+	 * If we consider the case of virtual engine, we must emit a dma-fence
+	 * to prevent scheduling of the second request until the first is
+	 * complete (to maximise our greedy late load balancing) and this
+	 * precludes optimising to use semaphores for serialisation of a single
+	 * timeline across engines.
+	 */
+	prev = i915_active_request_raw(&timeline->last_request,
+				       &rq->i915->drm.struct_mutex);
+	if (prev && !i915_request_completed(prev)) {
+		if (is_power_of_2(prev->engine->mask | rq->engine->mask))
+			i915_sw_fence_await_sw_fence(&rq->submit,
+						     &prev->submit,
+						     &rq->submitq);
+		else
+			__i915_sw_fence_await_dma_fence(&rq->submit,
+							&prev->fence,
+							&rq->dmaq);
+		if (rq->engine->schedule)
+			__i915_sched_node_add_dependency(&rq->sched,
+							 &prev->sched,
+							 &rq->dep,
+							 0);
+	}
+
+	spin_lock_irq(&timeline->lock);
+	list_add_tail(&rq->link, &timeline->requests);
+	spin_unlock_irq(&timeline->lock);
+
+	GEM_BUG_ON(timeline->seqno != rq->fence.seqno);
+	__i915_active_request_set(&timeline->last_request, rq);
+
+	return prev;
+}
+
 /*
 * NB: This function is not allowed to fail. Doing so would mean that the
  * request is not being tracked for completion but the work itself is
@@ -1037,31 +1091,7 @@ void i915_request_add(struct i915_request *request)
 	GEM_BUG_ON(IS_ERR(cs));
 	request->postfix = intel_ring_offset(request, cs);
 
-	/*
-	 * Seal the request and mark it as pending execution. Note that
-	 * we may inspect this state, without holding any locks, during
-	 * hangcheck. Hence we apply the barrier to ensure that we do not
-	 * see a more recent value in the hws than we are tracking.
-	 */
-
-	prev = i915_active_request_raw(&timeline->last_request,
-				       &request->i915->drm.struct_mutex);
-	if (prev && !i915_request_completed(prev)) {
-		i915_sw_fence_await_sw_fence(&request->submit, &prev->submit,
-					     &request->submitq);
-		if (engine->schedule)
-			__i915_sched_node_add_dependency(&request->sched,
-							 &prev->sched,
-							 &request->dep,
-							 0);
-	}
-
-	spin_lock_irq(&timeline->lock);
-	list_add_tail(&request->link, &timeline->requests);
-	spin_unlock_irq(&timeline->lock);
-
-	GEM_BUG_ON(timeline->seqno != request->fence.seqno);
-	__i915_active_request_set(&timeline->last_request, request);
+	prev = __i915_request_await_timeline(request);
 
 	list_add_tail(&request->ring_link, &ring->request_list);
 	if (list_is_first(&request->ring_link, &ring->request_list)) {
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index 8c8fa5010644..cd6c130964cd 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -128,7 +128,10 @@ struct i915_request {
 	 * It is used by the driver to then queue the request for execution.
 	 */
 	struct i915_sw_fence submit;
-	wait_queue_entry_t submitq;
+	union {
+		wait_queue_entry_t submitq;
+		struct i915_sw_dma_fence_cb dmaq;
+	};
 	struct list_head execute_cb;
 
 	/*
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c
index 8d1400d378d7..5387aafd3424 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.c
+++ b/drivers/gpu/drm/i915/i915_sw_fence.c
@@ -359,11 +359,6 @@ int i915_sw_fence_await_sw_fence_gfp(struct i915_sw_fence *fence,
 	return __i915_sw_fence_await_sw_fence(fence, signaler, NULL, gfp);
 }
 
-struct i915_sw_dma_fence_cb {
-	struct dma_fence_cb base;
-	struct i915_sw_fence *fence;
-};
-
 struct i915_sw_dma_fence_cb_timer {
 	struct i915_sw_dma_fence_cb base;
 	struct dma_fence *dma;
@@ -480,6 +475,40 @@ int i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
 	return ret;
 }
 
+static void __dma_i915_sw_fence_wake(struct dma_fence *dma,
+				     struct dma_fence_cb *data)
+{
+	struct i915_sw_dma_fence_cb *cb = container_of(data, typeof(*cb), base);
+
+	i915_sw_fence_complete(cb->fence);
+}
+
+int __i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
+				    struct dma_fence *dma,
+				    struct i915_sw_dma_fence_cb *cb)
+{
+	int ret;
+
+	debug_fence_assert(fence);
+
+	if (dma_fence_is_signaled(dma))
+		return 0;
+
+	cb->fence = fence;
+	i915_sw_fence_await(fence);
+
+	ret = dma_fence_add_callback(dma, &cb->base, __dma_i915_sw_fence_wake);
+	if (ret == 0) {
+		ret = 1;
+	} else {
+		i915_sw_fence_complete(fence);
+		if (ret == -ENOENT) /* fence already signaled */
+			ret = 0;
+	}
+
+	return ret;
+}
+
 int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
 				    struct reservation_object *resv,
 				    const struct dma_fence_ops *exclude,
diff --git a/drivers/gpu/drm/i915/i915_sw_fence.h b/drivers/gpu/drm/i915/i915_sw_fence.h
index 6dec9e1d1102..9cb5c3b307a6 100644
--- a/drivers/gpu/drm/i915/i915_sw_fence.h
+++ b/drivers/gpu/drm/i915/i915_sw_fence.h
@@ -9,14 +9,13 @@
 #ifndef _I915_SW_FENCE_H_
 #define _I915_SW_FENCE_H_
 
+#include <linux/dma-fence.h>
 #include <linux/gfp.h>
 #include <linux/kref.h>
 #include <linux/notifier.h> /* for NOTIFY_DONE */
 #include <linux/wait.h>
 
 struct completion;
-struct dma_fence;
-struct dma_fence_ops;
 struct reservation_object;
 
 struct i915_sw_fence {
@@ -68,10 +67,20 @@ int i915_sw_fence_await_sw_fence(struct i915_sw_fence *fence,
 int i915_sw_fence_await_sw_fence_gfp(struct i915_sw_fence *fence,
 				     struct i915_sw_fence *after,
 				     gfp_t gfp);
+
+struct i915_sw_dma_fence_cb {
+	struct dma_fence_cb base;
+	struct i915_sw_fence *fence;
+};
+
+int __i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
+				    struct dma_fence *dma,
+				    struct i915_sw_dma_fence_cb *cb);
 int i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
 				  struct dma_fence *dma,
 				  unsigned long timeout,
 				  gfp_t gfp);
+
 int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
 				    struct reservation_object *resv,
 				    const struct dma_fence_ops *exclude,
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 748352d513d6..7b938eaff9c5 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2809,7 +2809,10 @@ populate_lr_context(struct intel_context *ce,
 
 static struct i915_timeline *get_timeline(struct i915_gem_context *ctx)
 {
-	return i915_timeline_create(ctx->i915, ctx->name, NULL);
+	if (ctx->timeline)
+		return i915_timeline_get(ctx->timeline);
+	else
+		return i915_timeline_create(ctx->i915, ctx->name, NULL);
 }
 
 static int execlists_context_deferred_alloc(struct intel_context *ce,
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
index c4a5cf26992e..3e5e384d00d5 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
@@ -76,7 +76,7 @@ static int live_nop_switch(void *arg)
 	}
 
 	for (n = 0; n < nctx; n++) {
-		ctx[n] = i915_gem_create_context(i915, file->driver_priv);
+		ctx[n] = i915_gem_create_context(i915, file->driver_priv, 0);
 		if (IS_ERR(ctx[n])) {
 			err = PTR_ERR(ctx[n]);
 			goto out_unlock;
@@ -526,7 +526,8 @@ static int igt_ctx_exec(void *arg)
 			struct i915_gem_context *ctx;
 			intel_wakeref_t wakeref;
 
-			ctx = i915_gem_create_context(i915, file->driver_priv);
+			ctx = i915_gem_create_context(i915,
+						      file->driver_priv, 0);
 			if (IS_ERR(ctx)) {
 				err = PTR_ERR(ctx);
 				goto out_unlock;
@@ -611,7 +612,7 @@ static int igt_shared_ctx_exec(void *arg)
 
 	mutex_lock(&i915->drm.struct_mutex);
 
-	parent = i915_gem_create_context(i915, file->driver_priv);
+	parent = i915_gem_create_context(i915, file->driver_priv, 0);
 	if (IS_ERR(parent)) {
 		err = PTR_ERR(parent);
 		goto out_unlock;
@@ -645,7 +646,8 @@ static int igt_shared_ctx_exec(void *arg)
 			if (ctx)
 				__destroy_hw_context(ctx, file->driver_priv);
 
-			ctx = i915_gem_create_context(i915, file->driver_priv);
+			ctx = i915_gem_create_context(i915,
+						      file->driver_priv, 0);
 			if (IS_ERR(ctx)) {
 				err = PTR_ERR(ctx);
 				goto out_unlock;
@@ -1087,7 +1089,7 @@ __igt_ctx_sseu(struct drm_i915_private *i915,
 
 	mutex_lock(&i915->drm.struct_mutex);
 
-	ctx = i915_gem_create_context(i915, file->driver_priv);
+	ctx = i915_gem_create_context(i915, file->driver_priv, 0);
 	if (IS_ERR(ctx)) {
 		ret = PTR_ERR(ctx);
 		goto out_unlock;
@@ -1197,7 +1199,7 @@ static int igt_ctx_readonly(void *arg)
 	if (err)
 		goto out_unlock;
 
-	ctx = i915_gem_create_context(i915, file->driver_priv);
+	ctx = i915_gem_create_context(i915, file->driver_priv, 0);
 	if (IS_ERR(ctx)) {
 		err = PTR_ERR(ctx);
 		goto out_unlock;
@@ -1523,13 +1525,13 @@ static int igt_vm_isolation(void *arg)
 	if (err)
 		goto out_unlock;
 
-	ctx_a = i915_gem_create_context(i915, file->driver_priv);
+	ctx_a = i915_gem_create_context(i915, file->driver_priv, 0);
 	if (IS_ERR(ctx_a)) {
 		err = PTR_ERR(ctx_a);
 		goto out_unlock;
 	}
 
-	ctx_b = i915_gem_create_context(i915, file->driver_priv);
+	ctx_b = i915_gem_create_context(i915, file->driver_priv, 0);
 	if (IS_ERR(ctx_b)) {
 		err = PTR_ERR(ctx_b);
 		goto out_unlock;
diff --git a/drivers/gpu/drm/i915/selftests/mock_context.c b/drivers/gpu/drm/i915/selftests/mock_context.c
index f90328b21763..1d6dc2fe36ab 100644
--- a/drivers/gpu/drm/i915/selftests/mock_context.c
+++ b/drivers/gpu/drm/i915/selftests/mock_context.c
@@ -94,7 +94,7 @@ live_context(struct drm_i915_private *i915, struct drm_file *file)
 {
 	lockdep_assert_held(&i915->drm.struct_mutex);
 
-	return i915_gem_create_context(i915, file->driver_priv);
+	return i915_gem_create_context(i915, file->driver_priv, 0);
 }
 
 struct i915_gem_context *
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 0db92a4153c8..007d77ff7295 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1454,8 +1454,9 @@ struct drm_i915_gem_context_create_ext {
 	__u32 ctx_id; /* output: id of new context*/
 	__u32 flags;
 #define I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS	(1u << 0)
+#define I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE	(1u << 1)
 #define I915_CONTEXT_CREATE_FLAGS_UNKNOWN \
-	(-(I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS << 1))
+	(-(I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE << 1))
 	__u64 extensions;
 };
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 07/13] drm/i915: Allow userspace to clone contexts on creation
  2019-03-08 14:12 Home straight for veng, the uAPI wars Chris Wilson
                   ` (5 preceding siblings ...)
  2019-03-08 14:12 ` [PATCH 06/13] drm/i915: Allow contexts to share a single timeline across all engines Chris Wilson
@ 2019-03-08 14:12 ` Chris Wilson
  2019-03-08 16:13   ` Tvrtko Ursulin
  2019-03-08 14:12 ` [PATCH 08/13] drm/i915: Allow a context to define its set of engines Chris Wilson
                   ` (9 subsequent siblings)
  16 siblings, 1 reply; 58+ messages in thread
From: Chris Wilson @ 2019-03-08 14:12 UTC (permalink / raw)
  To: intel-gfx

A use case arose out of handling context recovery in mesa, whereby they
wish to recreate a context with fresh logical state but preserving all
other details of the original. Currently, they create a new context and
iterate over which bits they want to copy across, but it would be much
more convenient if they were able to just pass in a target context to clone
during creation. This essentially extends the setparam during creation
to pull the details from a target context instead of the user supplied
parameters.
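
To illustrate (not part of this patch), recreating a context that keeps
the VM and scheduling attributes of the original might look like the
sketch below, assuming the extension added here and eliding error
handling:

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

static int recreate_ctx(int fd, __u32 old_id, __u32 *new_id)
{
	struct drm_i915_gem_context_create_ext_clone clone;
	struct drm_i915_gem_context_create_ext arg;

	/* Copy selected attributes from old_id; logical state is fresh. */
	memset(&clone, 0, sizeof(clone));
	clone.base.name = I915_CONTEXT_CREATE_EXT_CLONE;
	clone.clone = old_id;
	clone.flags = I915_CONTEXT_CLONE_VM |
		      I915_CONTEXT_CLONE_SCHED |
		      I915_CONTEXT_CLONE_FLAGS;

	memset(&arg, 0, sizeof(arg));
	arg.flags = I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS;
	arg.extensions = (__u64)(uintptr_t)&clone;

	if (ioctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT, &arg))
		return -1;

	*new_id = arg.ctx_id;
	return 0;
}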

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 90 +++++++++++++++++++++++++
 include/uapi/drm/i915_drm.h             | 14 ++++
 2 files changed, 104 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 310892b42b68..2cfc68b66944 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -1428,8 +1428,98 @@ static int create_setparam(struct i915_user_extension __user *ext, void *data)
 	return ctx_setparam(data, &local.setparam);
 }
 
+static int clone_sseu(struct i915_gem_context *dst,
+		      struct i915_gem_context *src)
+{
+	const struct intel_sseu default_sseu =
+		intel_device_default_sseu(dst->i915);
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+
+	for_each_engine(engine, dst->i915, id) {
+		struct intel_context *ce;
+		struct intel_sseu sseu;
+
+		ce = intel_context_lookup(src, engine);
+		if (!ce)
+			continue;
+
+		sseu = ce->sseu;
+		if (!memcmp(&sseu, &default_sseu, sizeof(sseu)))
+			continue;
+
+		ce = intel_context_pin_lock(dst, engine);
+		if (IS_ERR(ce))
+			return PTR_ERR(ce);
+
+		ce->sseu = sseu;
+		intel_context_pin_unlock(ce);
+	}
+
+	return 0;
+}
+
+static int create_clone(struct i915_user_extension __user *ext, void *data)
+{
+	struct drm_i915_gem_context_create_ext_clone local;
+	struct i915_gem_context *dst = data;
+	struct i915_gem_context *src;
+	int err;
+
+	if (copy_from_user(&local, ext, sizeof(local)))
+		return -EFAULT;
+
+	if (local.flags & I915_CONTEXT_CLONE_UNKNOWN)
+		return -EINVAL;
+
+	if (local.rsvd)
+		return -EINVAL;
+
+	if (local.clone == dst->user_handle) /* good guess! denied. */
+		return -ENOENT;
+
+	rcu_read_lock();
+	src = __i915_gem_context_lookup_rcu(dst->file_priv, local.clone);
+	rcu_read_unlock();
+	if (!src)
+		return -ENOENT;
+
+	GEM_BUG_ON(src == dst);
+
+	if (local.flags & I915_CONTEXT_CLONE_FLAGS)
+		dst->user_flags = src->user_flags;
+
+	if (local.flags & I915_CONTEXT_CLONE_SCHED)
+		dst->sched = src->sched;
+
+	if (local.flags & I915_CONTEXT_CLONE_SSEU) {
+		err = clone_sseu(dst, src);
+		if (err)
+			return err;
+	}
+
+	if (local.flags & I915_CONTEXT_CLONE_TIMELINE && src->timeline) {
+		if (dst->timeline)
+			i915_timeline_put(dst->timeline);
+		dst->timeline = i915_timeline_get(src->timeline);
+	}
+
+	if (local.flags & I915_CONTEXT_CLONE_VM && src->ppgtt) {
+		GEM_BUG_ON(dst->ppgtt == src->ppgtt);
+
+		if (dst->ppgtt)
+			i915_ppgtt_put(dst->ppgtt);
+
+		dst->ppgtt = i915_ppgtt_get(src->ppgtt);
+		i915_ppgtt_open(dst->ppgtt);
+	}
+
+	return 0;
+}
+
 static const i915_user_extension_fn create_extensions[] = {
 	[I915_CONTEXT_CREATE_EXT_SETPARAM] = create_setparam,
+	[I915_CONTEXT_CREATE_EXT_CLONE] = create_clone,
 };
 
 static bool client_is_banned(struct drm_i915_file_private *file_priv)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 007d77ff7295..50d154954d5f 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1579,6 +1579,20 @@ struct drm_i915_gem_context_create_ext_setparam {
 	struct drm_i915_gem_context_param setparam;
 };
 
+struct drm_i915_gem_context_create_ext_clone {
+#define I915_CONTEXT_CREATE_EXT_CLONE 1
+	struct i915_user_extension base;
+	__u32 clone;
+	__u32 flags;
+#define I915_CONTEXT_CLONE_FLAGS	(1u << 0)
+#define I915_CONTEXT_CLONE_SCHED	(1u << 1)
+#define I915_CONTEXT_CLONE_SSEU		(1u << 2)
+#define I915_CONTEXT_CLONE_TIMELINE	(1u << 3)
+#define I915_CONTEXT_CLONE_VM		(1u << 4)
+#define I915_CONTEXT_CLONE_UNKNOWN -(I915_CONTEXT_CLONE_VM << 1)
+	__u64 rsvd;
+};
+
 struct drm_i915_gem_context_destroy {
 	__u32 ctx_id;
 	__u32 pad;
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 08/13] drm/i915: Allow a context to define its set of engines
  2019-03-08 14:12 Home straight for veng, the uAPI wars Chris Wilson
                   ` (6 preceding siblings ...)
  2019-03-08 14:12 ` [PATCH 07/13] drm/i915: Allow userspace to clone contexts on creation Chris Wilson
@ 2019-03-08 14:12 ` Chris Wilson
  2019-03-08 16:27   ` Tvrtko Ursulin
  2019-03-08 14:12 ` [PATCH 09/13] drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[] Chris Wilson
                   ` (8 subsequent siblings)
  16 siblings, 1 reply; 58+ messages in thread
From: Chris Wilson @ 2019-03-08 14:12 UTC (permalink / raw)
  To: intel-gfx

Over the last few years, we have debated how to extend the user API to
support an increase in the number of engines, which may be sparse and
even be heterogeneous within a class (not all video decoders created
equal). We settled on using (class, instance) tuples to identify a
specific engine, with an API for the user to construct a map of engines
to capabilities. Into this picture, we then add a challenge of virtual
engines; one user engine that maps behind the scenes to any number of
physical engines. To keep it general, we want the user to have full
control over that mapping. To that end, we allow the user to constrain a
context to define the set of engines that it can access, order fully
controlled by the user via (class, instance). With such precise control
in context setup, we can continue to use the existing execbuf uABI of
specifying a single index; only now it doesn't automagically map onto
the engines, it uses the user defined engine map from the context.

The I915_EXEC_DEFAULT slot is left empty, and invalid for use by
execbuf. Its use will be revealed in the next patch.
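
As an illustrative sketch (not part of this patch), a client limiting
itself to the render engine plus the first video decoder would pass a
two-slot map, after which execbuf ring 0 selects rcs0 and ring 1 selects
vcs0 (error handling elided):

#include <stdint.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

static int set_engine_map(int fd, __u32 ctx_id)
{
	I915_DEFINE_CONTEXT_PARAM_ENGINES(engines, 2) = {
		.extensions = 0,
		.class_instance = {
			{ I915_ENGINE_CLASS_RENDER, 0 },
			{ I915_ENGINE_CLASS_VIDEO, 0 },
		},
	};
	struct drm_i915_gem_context_param arg = {
		.ctx_id = ctx_id,
		.param = I915_CONTEXT_PARAM_ENGINES,
		.size = sizeof(engines),
		.value = (__u64)(uintptr_t)&engines,
	};

	return ioctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
}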

v2: Fixup freeing of local on success of get_engines()

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c       | 204 +++++++++++++++++-
 drivers/gpu/drm/i915/i915_gem_context_types.h |   4 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c    |  22 +-
 include/uapi/drm/i915_drm.h                   |  42 +++-
 4 files changed, 259 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 2cfc68b66944..86d9bea6f275 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -101,6 +101,21 @@ static struct i915_global_gem_context {
 	struct kmem_cache *slab_luts;
 } global;
 
+static struct intel_engine_cs *
+lookup_user_engine(struct i915_gem_context *ctx,
+		   unsigned long flags, u16 class, u16 instance)
+#define LOOKUP_USER_INDEX BIT(0)
+{
+	if (flags & LOOKUP_USER_INDEX) {
+		if (instance >= ctx->nengine)
+			return NULL;
+
+		return ctx->engines[instance];
+	}
+
+	return intel_engine_lookup_user(ctx->i915, class, instance);
+}
+
 struct i915_lut_handle *i915_lut_handle_alloc(void)
 {
 	return kmem_cache_alloc(global.slab_luts, GFP_KERNEL);
@@ -234,6 +249,8 @@ static void i915_gem_context_free(struct i915_gem_context *ctx)
 	release_hw_id(ctx);
 	i915_ppgtt_put(ctx->ppgtt);
 
+	kfree(ctx->engines);
+
 	rbtree_postorder_for_each_entry_safe(it, n, &ctx->hw_contexts, node)
 		it->ops->destroy(it);
 
@@ -1311,9 +1328,9 @@ static int set_sseu(struct i915_gem_context *ctx,
 	if (user_sseu.flags || user_sseu.rsvd)
 		return -EINVAL;
 
-	engine = intel_engine_lookup_user(i915,
-					  user_sseu.engine_class,
-					  user_sseu.engine_instance);
+	engine = lookup_user_engine(ctx, 0,
+				    user_sseu.engine_class,
+				    user_sseu.engine_instance);
 	if (!engine)
 		return -EINVAL;
 
@@ -1331,9 +1348,154 @@ static int set_sseu(struct i915_gem_context *ctx,
 
 	args->size = sizeof(user_sseu);
 
+	return 0;
+};
+
+struct set_engines {
+	struct i915_gem_context *ctx;
+	struct intel_engine_cs **engines;
+	unsigned int nengine;
+};
+
+static const i915_user_extension_fn set_engines__extensions[] = {
+};
+
+static int
+set_engines(struct i915_gem_context *ctx,
+	    const struct drm_i915_gem_context_param *args)
+{
+	struct i915_context_param_engines __user *user;
+	struct set_engines set = { .ctx = ctx };
+	u64 size, extensions;
+	unsigned int n;
+	int err;
+
+	user = u64_to_user_ptr(args->value);
+	size = args->size;
+	if (!size)
+		goto out;
+
+	BUILD_BUG_ON(!IS_ALIGNED(sizeof(*user), sizeof(*user->class_instance)));
+	if (size < sizeof(*user) || size % sizeof(*user->class_instance))
+		return -EINVAL;
+
+	set.nengine = (size - sizeof(*user)) / sizeof(*user->class_instance);
+	if (set.nengine == 0 || set.nengine > I915_EXEC_RING_MASK + 1)
+		return -EINVAL;
+
+	set.engines = kmalloc_array(set.nengine,
+				    sizeof(*set.engines),
+				    GFP_KERNEL);
+	if (!set.engines)
+		return -ENOMEM;
+
+	for (n = 0; n < set.nengine; n++) {
+		u16 class, inst;
+
+		if (get_user(class, &user->class_instance[n].engine_class) ||
+		    get_user(inst, &user->class_instance[n].engine_instance)) {
+			kfree(set.engines);
+			return -EFAULT;
+		}
+
+		if (class == (u16)I915_ENGINE_CLASS_INVALID &&
+		    inst == (u16)I915_ENGINE_CLASS_INVALID_NONE) {
+			set.engines[n] = NULL;
+			continue;
+		}
+
+		set.engines[n] = lookup_user_engine(ctx, 0, class, inst);
+		if (!set.engines[n]) {
+			kfree(set.engines);
+			return -ENOENT;
+		}
+	}
+
+	err = -EFAULT;
+	if (!get_user(extensions, &user->extensions))
+		err = i915_user_extensions(u64_to_user_ptr(extensions),
+					   set_engines__extensions,
+					   ARRAY_SIZE(set_engines__extensions),
+					   &set);
+	if (err) {
+		kfree(set.engines);
+		return err;
+	}
+
+out:
+	mutex_lock(&ctx->i915->drm.struct_mutex);
+	kfree(ctx->engines);
+	ctx->engines = set.engines;
+	ctx->nengine = set.nengine;
+	mutex_unlock(&ctx->i915->drm.struct_mutex);
+
 	return 0;
 }
 
+static int
+get_engines(struct i915_gem_context *ctx,
+	    struct drm_i915_gem_context_param *args)
+{
+	struct i915_context_param_engines *local;
+	unsigned int n, count, size;
+	int err = 0;
+
+restart:
+	count = READ_ONCE(ctx->nengine);
+	if (count > (INT_MAX - sizeof(*local)) / sizeof(*local->class_instance))
+		return -ENOMEM; /* unrepresentable! */
+
+	size = sizeof(*local) + count * sizeof(*local->class_instance);
+	if (!args->size) {
+		args->size = size;
+		return 0;
+	}
+	if (args->size < size)
+		return -EINVAL;
+
+	local = kmalloc(size, GFP_KERNEL);
+	if (!local)
+		return -ENOMEM;
+
+	if (mutex_lock_interruptible(&ctx->i915->drm.struct_mutex)) {
+		err = -EINTR;
+		goto out;
+	}
+
+	if (READ_ONCE(ctx->nengine) != count) {
+		mutex_unlock(&ctx->i915->drm.struct_mutex);
+		kfree(local);
+		goto restart;
+	}
+
+	local->extensions = 0;
+	for (n = 0; n < count; n++) {
+		if (ctx->engines[n]) {
+			local->class_instance[n].engine_class =
+				ctx->engines[n]->uabi_class;
+			local->class_instance[n].engine_instance =
+				ctx->engines[n]->instance;
+		} else {
+			local->class_instance[n].engine_class =
+				I915_ENGINE_CLASS_INVALID;
+			local->class_instance[n].engine_instance =
+				I915_ENGINE_CLASS_INVALID_NONE;
+		}
+	}
+
+	mutex_unlock(&ctx->i915->drm.struct_mutex);
+
+	if (copy_to_user(u64_to_user_ptr(args->value), local, size)) {
+		err = -EFAULT;
+		goto out;
+	}
+	args->size = size;
+
+out:
+	kfree(local);
+	return err;
+}
+
 static int ctx_setparam(struct i915_gem_context *ctx,
 			struct drm_i915_gem_context_param *args)
 {
@@ -1406,6 +1568,10 @@ static int ctx_setparam(struct i915_gem_context *ctx,
 		ret = set_ppgtt(ctx, args);
 		break;
 
+	case I915_CONTEXT_PARAM_ENGINES:
+		ret = set_engines(ctx, args);
+		break;
+
 	case I915_CONTEXT_PARAM_BAN_PERIOD:
 	default:
 		ret = -EINVAL;
@@ -1459,6 +1625,22 @@ static int clone_sseu(struct i915_gem_context *dst,
 	return 0;
 }
 
+static int clone_engines(struct i915_gem_context *dst,
+			 struct i915_gem_context *src)
+{
+	struct intel_engine_cs **engines;
+
+	engines = kmemdup(src->engines,
+			  sizeof(*src->engines) * src->nengine,
+			  GFP_KERNEL);
+	if (!engines)
+		return -ENOMEM;
+
+	dst->engines = engines;
+	dst->nengine = src->nengine;
+	return 0;
+}
+
 static int create_clone(struct i915_user_extension __user *ext, void *data)
 {
 	struct drm_i915_gem_context_create_ext_clone local;
@@ -1514,6 +1696,12 @@ static int create_clone(struct i915_user_extension __user *ext, void *data)
 		i915_ppgtt_open(dst->ppgtt);
 	}
 
+	if (local.flags & I915_CONTEXT_CLONE_ENGINES && src->nengine) {
+		err = clone_engines(dst, src);
+		if (err)
+			return err;
+	}
+
 	return 0;
 }
 
@@ -1632,9 +1820,9 @@ static int get_sseu(struct i915_gem_context *ctx,
 	if (user_sseu.flags || user_sseu.rsvd)
 		return -EINVAL;
 
-	engine = intel_engine_lookup_user(ctx->i915,
-					  user_sseu.engine_class,
-					  user_sseu.engine_instance);
+	engine = lookup_user_engine(ctx, 0,
+				    user_sseu.engine_class,
+				    user_sseu.engine_instance);
 	if (!engine)
 		return -EINVAL;
 
@@ -1715,6 +1903,10 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 		ret = get_ppgtt(ctx, args);
 		break;
 
+	case I915_CONTEXT_PARAM_ENGINES:
+		ret = get_engines(ctx, args);
+		break;
+
 	case I915_CONTEXT_PARAM_BAN_PERIOD:
 	default:
 		ret = -EINVAL;
diff --git a/drivers/gpu/drm/i915/i915_gem_context_types.h b/drivers/gpu/drm/i915/i915_gem_context_types.h
index f8f6e6c960a7..8a89f3053f73 100644
--- a/drivers/gpu/drm/i915/i915_gem_context_types.h
+++ b/drivers/gpu/drm/i915/i915_gem_context_types.h
@@ -41,6 +41,8 @@ struct i915_gem_context {
 	/** file_priv: owning file descriptor */
 	struct drm_i915_file_private *file_priv;
 
+	struct intel_engine_cs **engines;
+
 	struct i915_timeline *timeline;
 
 	/**
@@ -110,6 +112,8 @@ struct i915_gem_context {
 #define CONTEXT_CLOSED			1
 #define CONTEXT_FORCE_SINGLE_SUBMISSION	2
 
+	unsigned int nengine;
+
 	/**
 	 * @hw_id: - unique identifier for the context
 	 *
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index ee6d301a9627..67e4a0c2ebff 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -2090,13 +2090,23 @@ static const enum intel_engine_id user_ring_map[I915_USER_RINGS + 1] = {
 };
 
 static struct intel_engine_cs *
-eb_select_engine(struct drm_i915_private *dev_priv,
+eb_select_engine(struct i915_execbuffer *eb,
 		 struct drm_file *file,
 		 struct drm_i915_gem_execbuffer2 *args)
 {
 	unsigned int user_ring_id = args->flags & I915_EXEC_RING_MASK;
 	struct intel_engine_cs *engine;
 
+	if (eb->ctx->engines) {
+		if (user_ring_id >= eb->ctx->nengine) {
+			DRM_DEBUG("execbuf with unknown ring: %u\n",
+				  user_ring_id);
+			return NULL;
+		}
+
+		return eb->ctx->engines[user_ring_id];
+	}
+
 	if (user_ring_id > I915_USER_RINGS) {
 		DRM_DEBUG("execbuf with unknown ring: %u\n", user_ring_id);
 		return NULL;
@@ -2109,11 +2119,11 @@ eb_select_engine(struct drm_i915_private *dev_priv,
 		return NULL;
 	}
 
-	if (user_ring_id == I915_EXEC_BSD && HAS_ENGINE(dev_priv, VCS1)) {
+	if (user_ring_id == I915_EXEC_BSD && HAS_ENGINE(eb->i915, VCS1)) {
 		unsigned int bsd_idx = args->flags & I915_EXEC_BSD_MASK;
 
 		if (bsd_idx == I915_EXEC_BSD_DEFAULT) {
-			bsd_idx = gen8_dispatch_bsd_engine(dev_priv, file);
+			bsd_idx = gen8_dispatch_bsd_engine(eb->i915, file);
 		} else if (bsd_idx >= I915_EXEC_BSD_RING1 &&
 			   bsd_idx <= I915_EXEC_BSD_RING2) {
 			bsd_idx >>= I915_EXEC_BSD_SHIFT;
@@ -2124,9 +2134,9 @@ eb_select_engine(struct drm_i915_private *dev_priv,
 			return NULL;
 		}
 
-		engine = dev_priv->engine[_VCS(bsd_idx)];
+		engine = eb->i915->engine[_VCS(bsd_idx)];
 	} else {
-		engine = dev_priv->engine[user_ring_map[user_ring_id]];
+		engine = eb->i915->engine[user_ring_map[user_ring_id]];
 	}
 
 	if (!engine) {
@@ -2336,7 +2346,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 	if (unlikely(err))
 		goto err_destroy;
 
-	eb.engine = eb_select_engine(eb.i915, file, args);
+	eb.engine = eb_select_engine(&eb, file, args);
 	if (!eb.engine) {
 		err = -EINVAL;
 		goto err_engine;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 50d154954d5f..00147b990e63 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -124,6 +124,8 @@ enum drm_i915_gem_engine_class {
 	I915_ENGINE_CLASS_INVALID	= -1
 };
 
+#define I915_ENGINE_CLASS_INVALID_NONE -1
+
 /**
  * DOC: perf_events exposed by i915 through /sys/bus/event_sources/drivers/i915
  *
@@ -1509,6 +1511,26 @@ struct drm_i915_gem_context_param {
 	 * See DRM_I915_GEM_VM_CREATE and DRM_I915_GEM_VM_DESTROY.
 	 */
 #define I915_CONTEXT_PARAM_VM		0x9
+
+/*
+ * I915_CONTEXT_PARAM_ENGINES:
+ *
+ * Bind this context to operate on this subset of available engines. Henceforth,
+ * the I915_EXEC_RING selector for DRM_IOCTL_I915_GEM_EXECBUFFER2 operates as
+ * an index into this array of engines; I915_EXEC_DEFAULT selecting engine[0]
+ * and upwards. Slots 0...N are filled in using the specified (class, instance).
+ * Use
+ *	engine_class: I915_ENGINE_CLASS_INVALID,
+ *	engine_instance: I915_ENGINE_CLASS_INVALID_NONE
+ * to specify a gap in the array that can be filled in later, e.g. by a
+ * virtual engine used for load balancing.
+ *
+ * Setting the number of engines bound to the context to 0, by passing a
+ * zero-sized argument, will revert to the default settings.
+ *
+ * See struct i915_context_param_engines.
+ */
+#define I915_CONTEXT_PARAM_ENGINES	0xa
 /* Must be kept compact -- no holes and well documented */
 
 	__u64 value;
@@ -1573,6 +1595,23 @@ struct drm_i915_gem_context_param_sseu {
 	__u32 rsvd;
 };
 
+struct i915_context_param_engines {
+	__u64 extensions; /* linked chain of extension blocks, 0 terminates */
+
+	struct {
+		__u16 engine_class; /* see enum drm_i915_gem_engine_class */
+		__u16 engine_instance;
+	} class_instance[0];
+} __attribute__((packed));
+
+#define I915_DEFINE_CONTEXT_PARAM_ENGINES(name__, N__) struct { \
+	__u64 extensions; \
+	struct { \
+		__u16 engine_class; \
+		__u16 engine_instance; \
+	} class_instance[N__]; \
+} __attribute__((packed)) name__
+
 struct drm_i915_gem_context_create_ext_setparam {
 #define I915_CONTEXT_CREATE_EXT_SETPARAM 0
 	struct i915_user_extension base;
@@ -1589,7 +1628,8 @@ struct drm_i915_gem_context_create_ext_clone {
 #define I915_CONTEXT_CLONE_SSEU		(1u << 2)
 #define I915_CONTEXT_CLONE_TIMELINE	(1u << 3)
 #define I915_CONTEXT_CLONE_VM		(1u << 4)
-#define I915_CONTEXT_CLONE_UNKNOWN -(I915_CONTEXT_CLONE_VM << 1)
+#define I915_CONTEXT_CLONE_ENGINES	(1u << 5)
+#define I915_CONTEXT_CLONE_UNKNOWN -(I915_CONTEXT_CLONE_ENGINES << 1)
 	__u64 rsvd;
 };
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 09/13] drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[]
  2019-03-08 14:12 Home straight for veng, the uAPI wars Chris Wilson
                   ` (7 preceding siblings ...)
  2019-03-08 14:12 ` [PATCH 08/13] drm/i915: Allow a context to define its set of engines Chris Wilson
@ 2019-03-08 14:12 ` Chris Wilson
  2019-03-08 16:31   ` Tvrtko Ursulin
  2019-03-08 14:12 ` [PATCH 10/13] drm/i915: Load balancing across a virtual engine Chris Wilson
                   ` (7 subsequent siblings)
  16 siblings, 1 reply; 58+ messages in thread
From: Chris Wilson @ 2019-03-08 14:12 UTC (permalink / raw)
  To: intel-gfx

Allow the user to specify a local engine index (as opposed to
class:instance) that they can use to refer to a preset engine inside the
ctx->engine[] array defined by an earlier I915_CONTEXT_PARAM_ENGINES.
This will be useful for setting SSEU parameters on virtual engines that
are local to the context and do not have a valid global class:instance
lookup.
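
For illustration (not part of this patch), querying the SSEU
configuration of the engine occupying slot idx of the context engine
map; a sketch assuming the flag added below, error handling elided:

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

static int get_sseu_by_index(int fd, __u32 ctx_id, __u16 idx,
			     struct drm_i915_gem_context_param_sseu *sseu)
{
	struct drm_i915_gem_context_param arg;

	memset(sseu, 0, sizeof(*sseu));
	sseu->flags = I915_CONTEXT_SSEU_FLAG_ENGINE_INDEX;
	sseu->engine_instance = idx; /* index into the context engine map */

	memset(&arg, 0, sizeof(arg));
	arg.ctx_id = ctx_id;
	arg.param = I915_CONTEXT_PARAM_SSEU;
	arg.size = sizeof(*sseu);
	arg.value = (__u64)(uintptr_t)sseu;

	return ioctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg);
}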

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 24 ++++++++++++++++++++----
 include/uapi/drm/i915_drm.h             |  3 ++-
 2 files changed, 22 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 86d9bea6f275..a581c01ffff1 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -1313,6 +1313,7 @@ static int set_sseu(struct i915_gem_context *ctx,
 	struct drm_i915_gem_context_param_sseu user_sseu;
 	struct intel_engine_cs *engine;
 	struct intel_sseu sseu;
+	unsigned long lookup;
 	int ret;
 
 	if (args->size < sizeof(user_sseu))
@@ -1325,10 +1326,17 @@ static int set_sseu(struct i915_gem_context *ctx,
 			   sizeof(user_sseu)))
 		return -EFAULT;
 
-	if (user_sseu.flags || user_sseu.rsvd)
+	if (user_sseu.rsvd)
 		return -EINVAL;
 
-	engine = lookup_user_engine(ctx, 0,
+	if (user_sseu.flags & ~(I915_CONTEXT_SSEU_FLAG_ENGINE_INDEX))
+		return -EINVAL;
+
+	lookup = 0;
+	if (user_sseu.flags & I915_CONTEXT_SSEU_FLAG_ENGINE_INDEX)
+		lookup |= LOOKUP_USER_INDEX;
+
+	engine = lookup_user_engine(ctx, lookup,
 				    user_sseu.engine_class,
 				    user_sseu.engine_instance);
 	if (!engine)
@@ -1807,6 +1815,7 @@ static int get_sseu(struct i915_gem_context *ctx,
 	struct drm_i915_gem_context_param_sseu user_sseu;
 	struct intel_engine_cs *engine;
 	struct intel_context *ce;
+	unsigned long lookup;
 
 	if (args->size == 0)
 		goto out;
@@ -1817,10 +1826,17 @@ static int get_sseu(struct i915_gem_context *ctx,
 			   sizeof(user_sseu)))
 		return -EFAULT;
 
-	if (user_sseu.flags || user_sseu.rsvd)
+	if (user_sseu.rsvd)
 		return -EINVAL;
 
-	engine = lookup_user_engine(ctx, 0,
+	if (user_sseu.flags & ~(I915_CONTEXT_SSEU_FLAG_ENGINE_INDEX))
+		return -EINVAL;
+
+	lookup = 0;
+	if (user_sseu.flags & I915_CONTEXT_SSEU_FLAG_ENGINE_INDEX)
+		lookup |= LOOKUP_USER_INDEX;
+
+	engine = lookup_user_engine(ctx, lookup,
 				    user_sseu.engine_class,
 				    user_sseu.engine_instance);
 	if (!engine)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 00147b990e63..a609619610f2 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1565,9 +1565,10 @@ struct drm_i915_gem_context_param_sseu {
 	__u16 engine_instance;
 
 	/*
-	 * Unused for now. Must be cleared to zero.
+	 * Unknown flags must be cleared to zero.
 	 */
 	__u32 flags;
+#define I915_CONTEXT_SSEU_FLAG_ENGINE_INDEX (1u << 0)
 
 	/*
 	 * Mask of slices to enable for the context. Valid values are a subset
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 10/13] drm/i915: Load balancing across a virtual engine
  2019-03-08 14:12 Home straight for veng, the uAPI wars Chris Wilson
                   ` (8 preceding siblings ...)
  2019-03-08 14:12 ` [PATCH 09/13] drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[] Chris Wilson
@ 2019-03-08 14:12 ` Chris Wilson
  2019-03-11 12:47   ` Tvrtko Ursulin
  2019-03-12  7:52   ` Tvrtko Ursulin
  2019-03-08 14:12 ` [PATCH 11/13] drm/i915: Extend execution fence to support a callback Chris Wilson
                   ` (6 subsequent siblings)
  16 siblings, 2 replies; 58+ messages in thread
From: Chris Wilson @ 2019-03-08 14:12 UTC (permalink / raw)
  To: intel-gfx

Having allowed the user to define the set of engines that they want to
use, we go one step further and allow them to bind those engines
into a single virtual instance. Submitting a batch to the virtual engine
will then forward it to any one of the set in a manner as best to
distribute load.  The virtual engine has a single timeline across all
engines (it operates as a single queue), so it is not able to concurrently
run batches across multiple engines by itself; it is left to the user
to submit multiple concurrent batches to multiple queues. Multiple users
will be load balanced across the system.

The mechanism used for load balancing in this patch is a late greedy
balancer. When a request is ready for execution, it is added to each
engine's queue, and when an engine is ready for its next request it
claims it from the virtual engine. The first engine to do so wins, i.e.
the request is executed at the earliest opportunity (idle moment) in the
system.

As not all HW is created equal, the user is still able to skip the
virtual engine and execute the batch on a specific engine, all within the
same queue. It will then be executed in order on the correct engine,
with execution on other virtual engines being moved away due to the load
detection.
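
To sketch the intended uAPI usage (illustrative only; the extension
layout is inferred from the kernel-side validation in this patch, and
error handling is elided), userspace reserves a slot in the engine map
and fills it with a virtual engine balancing over the other slots:

#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <drm/i915_drm.h>

static int set_balanced_engines(int fd, __u32 ctx_id)
{
	/* Field names as implied by set_engines__load_balance() below. */
	struct i915_context_engines_load_balance balance;
	/* Slot 0 is a placeholder for the virtual engine; slots 1 and 2
	 * keep vcs0 and vcs1 available for direct submission.
	 */
	I915_DEFINE_CONTEXT_PARAM_ENGINES(engines, 3) = {
		.class_instance = {
			{ I915_ENGINE_CLASS_INVALID,
			  I915_ENGINE_CLASS_INVALID_NONE },
			{ I915_ENGINE_CLASS_VIDEO, 0 },
			{ I915_ENGINE_CLASS_VIDEO, 1 },
		},
	};
	struct drm_i915_gem_context_param arg;

	memset(&balance, 0, sizeof(balance));
	balance.base.name = I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE;
	balance.engine_index = 0; /* fill the placeholder slot */
	balance.engines_mask = (1ull << 1) | (1ull << 2); /* slots 1, 2 */

	engines.extensions = (__u64)(uintptr_t)&balance;

	memset(&arg, 0, sizeof(arg));
	arg.ctx_id = ctx_id;
	arg.param = I915_CONTEXT_PARAM_ENGINES;
	arg.size = sizeof(engines);
	arg.value = (__u64)(uintptr_t)&engines;

	return ioctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
}

Execbuf ring 0 then targets the virtual engine, while rings 1 and 2
still address vcs0 and vcs1 directly within the same queue.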

A couple of areas for potential improvement left!

- The virtual engine always takes priority over equal-priority tasks.
This is mostly mitigated by applying FQ_CODEL rules for prioritising new
clients, and hopefully the virtual and real engines are not then
congested (i.e. all work is via virtual engines, or all work is to the
real engine).

- We require the breadcrumb irq around every virtual engine request. For
normal engines, we eliminate the need for the slow round trip via
interrupt by using the submit fence and queueing in order. For virtual
engines, we have to allow any job to transfer to a new ring, and cannot
coalesce the submissions, so require the completion fence instead,
forcing the persistent use of interrupts.

- We only drip feed single requests through each virtual engine and onto
the physical engines, even if there was enough work to fill all ELSP,
leaving small stalls with an idle CS event at the end of every request.
Could we be greedy and fill both slots? Being lazy is virtuous for load
distribution on less-than-full workloads though.

Other areas of improvement are more general, such as reducing lock
contention, reducing dispatch overhead, looking at direct submission
rather than bouncing around tasklets etc.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.h            |   5 +
 drivers/gpu/drm/i915/i915_gem_context.c    | 153 +++++-
 drivers/gpu/drm/i915/i915_scheduler.c      |  17 +-
 drivers/gpu/drm/i915/i915_timeline_types.h |   1 +
 drivers/gpu/drm/i915/intel_engine_types.h  |   8 +
 drivers/gpu/drm/i915/intel_lrc.c           | 521 ++++++++++++++++++++-
 drivers/gpu/drm/i915/intel_lrc.h           |  11 +
 drivers/gpu/drm/i915/selftests/intel_lrc.c | 165 +++++++
 include/uapi/drm/i915_drm.h                |  30 ++
 9 files changed, 895 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.h b/drivers/gpu/drm/i915/i915_gem.h
index 74a2ddc1b52f..dbcea6e29d48 100644
--- a/drivers/gpu/drm/i915/i915_gem.h
+++ b/drivers/gpu/drm/i915/i915_gem.h
@@ -91,4 +91,9 @@ static inline bool __tasklet_is_enabled(const struct tasklet_struct *t)
 	return !atomic_read(&t->count);
 }
 
+static inline bool __tasklet_is_scheduled(struct tasklet_struct *t)
+{
+	return test_bit(TASKLET_STATE_SCHED, &t->state);
+}
+
 #endif /* __I915_GEM_H__ */
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index a581c01ffff1..13b79980f7f3 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -86,12 +86,16 @@
  */
 
 #include <linux/log2.h>
+#include <linux/nospec.h>
+
 #include <drm/i915_drm.h>
+
 #include "i915_drv.h"
 #include "i915_globals.h"
 #include "i915_trace.h"
 #include "i915_user_extensions.h"
 #include "intel_lrc_reg.h"
+#include "intel_lrc.h"
 #include "intel_workarounds.h"
 
 #define ALL_L3_SLICES(dev) (1 << NUM_L3_SLICES(dev)) - 1
@@ -238,6 +242,20 @@ static void release_hw_id(struct i915_gem_context *ctx)
 	mutex_unlock(&i915->contexts.mutex);
 }
 
+static void free_engines(struct intel_engine_cs **engines, int count)
+{
+	int i;
+
+	if (!engines)
+		return;
+
+	/* We own the veng we created; regular engines are ignored */
+	for (i = 0; i < count; i++)
+		intel_virtual_engine_destroy(engines[i]);
+
+	kfree(engines);
+}
+
 static void i915_gem_context_free(struct i915_gem_context *ctx)
 {
 	struct intel_context *it, *n;
@@ -248,8 +266,7 @@ static void i915_gem_context_free(struct i915_gem_context *ctx)
 
 	release_hw_id(ctx);
 	i915_ppgtt_put(ctx->ppgtt);
-
-	kfree(ctx->engines);
+	free_engines(ctx->engines, ctx->nengine);
 
 	rbtree_postorder_for_each_entry_safe(it, n, &ctx->hw_contexts, node)
 		it->ops->destroy(it);
@@ -1359,13 +1376,116 @@ static int set_sseu(struct i915_gem_context *ctx,
 	return 0;
 };
 
+static int check_user_mbz16(u16 __user *user)
+{
+	u16 mbz;
+
+	if (get_user(mbz, user))
+		return -EFAULT;
+
+	return mbz ? -EINVAL : 0;
+}
+
+static int check_user_mbz32(u32 __user *user)
+{
+	u32 mbz;
+
+	if (get_user(mbz, user))
+		return -EFAULT;
+
+	return mbz ? -EINVAL : 0;
+}
+
+static int check_user_mbz64(u64 __user *user)
+{
+	u64 mbz;
+
+	if (get_user(mbz, user))
+		return -EFAULT;
+
+	return mbz ? -EINVAL : 0;
+}
+
 struct set_engines {
 	struct i915_gem_context *ctx;
 	struct intel_engine_cs **engines;
 	unsigned int nengine;
 };
 
+static int
+set_engines__load_balance(struct i915_user_extension __user *base, void *data)
+{
+	struct i915_context_engines_load_balance __user *ext =
+		container_of_user(base, typeof(*ext), base);
+	const struct set_engines *set = data;
+	struct intel_engine_cs *ve;
+	unsigned int n;
+	u64 mask;
+	u16 idx;
+	int err;
+
+	if (!HAS_EXECLISTS(set->ctx->i915))
+		return -ENODEV;
+
+	if (USES_GUC_SUBMISSION(set->ctx->i915))
+		return -ENODEV; /* not implemented yet */
+
+	if (get_user(idx, &ext->engine_index))
+		return -EFAULT;
+
+	if (idx >= set->nengine)
+		return -EINVAL;
+
+	idx = array_index_nospec(idx, set->nengine);
+	if (set->engines[idx])
+		return -EEXIST;
+
+	err = check_user_mbz16(&ext->mbz16);
+	if (err)
+		return err;
+
+	err = check_user_mbz32(&ext->flags);
+	if (err)
+		return err;
+
+	for (n = 0; n < ARRAY_SIZE(ext->mbz64); n++) {
+		err = check_user_mbz64(&ext->mbz64[n]);
+		if (err)
+			return err;
+	}
+
+	if (get_user(mask, &ext->engines_mask))
+		return -EFAULT;
+
+	mask &= GENMASK_ULL(set->nengine - 1, 0) & ~BIT_ULL(idx);
+	if (!mask)
+		return -EINVAL;
+
+	if (is_power_of_2(mask)) {
+		ve = set->engines[__ffs64(mask)];
+	} else {
+		struct intel_engine_cs *stack[64];
+		int bit;
+
+		n = 0;
+		for_each_set_bit(bit, (unsigned long *)&mask, set->nengine)
+			stack[n++] = set->engines[bit];
+
+		ve = intel_execlists_create_virtual(set->ctx, stack, n);
+	}
+	if (IS_ERR(ve))
+		return PTR_ERR(ve);
+
+	if (cmpxchg(&set->engines[idx], NULL, ve)) {
+		intel_virtual_engine_destroy(ve);
+		return -EEXIST;
+	}
+
+	return 0;
+}
+
 static const i915_user_extension_fn set_engines__extensions[] = {
+	[I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE] = set_engines__load_balance,
 };
 
 static int
@@ -1426,13 +1546,13 @@ set_engines(struct i915_gem_context *ctx,
 					   ARRAY_SIZE(set_engines__extensions),
 					   &set);
 	if (err) {
-		kfree(set.engines);
+		free_engines(set.engines, set.nengine);
 		return err;
 	}
 
 out:
 	mutex_lock(&ctx->i915->drm.struct_mutex);
-	kfree(ctx->engines);
+	free_engines(ctx->engines, ctx->nengine);
 	ctx->engines = set.engines;
 	ctx->nengine = set.nengine;
 	mutex_unlock(&ctx->i915->drm.struct_mutex);
@@ -1637,6 +1757,7 @@ static int clone_engines(struct i915_gem_context *dst,
 			 struct i915_gem_context *src)
 {
 	struct intel_engine_cs **engines;
+	int i;
 
 	engines = kmemdup(src->engines,
 			  sizeof(*src->engines) * src->nengine,
@@ -1644,6 +1765,30 @@ static int clone_engines(struct i915_gem_context *dst,
 	if (!engines)
 		return -ENOMEM;
 
+	/*
+	 * Virtual engines are singletons; they can only exist
+	 * inside a single context, because they embed their
+	 * HW context... As each virtual context implies a single
+	 * timeline (each engine can only dequeue a single request
+	 * at any time), it would be surprising for two contexts
+	 * to use the same engine. So let's create a copy of
+	 * the virtual engine instead.
+	 */
+	for (i = 0; i < src->nengine; i++) {
+		struct intel_engine_cs *engine = engines[i];
+
+		if (!intel_engine_is_virtual(engine))
+			continue;
+
+		engine = intel_execlists_clone_virtual(dst, engine);
+		if (IS_ERR(engine)) {
+			free_engines(engines, i);
+			return PTR_ERR(engine);
+		}
+
+		engines[i] = engine;
+	}
+
 	dst->engines = engines;
 	dst->nengine = src->nengine;
 	return 0;
diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
index e0f609d01564..bb9819dbe313 100644
--- a/drivers/gpu/drm/i915/i915_scheduler.c
+++ b/drivers/gpu/drm/i915/i915_scheduler.c
@@ -247,17 +247,25 @@ sched_lock_engine(const struct i915_sched_node *node,
 		  struct intel_engine_cs *locked,
 		  struct sched_cache *cache)
 {
-	struct intel_engine_cs *engine = node_to_request(node)->engine;
+	const struct i915_request *rq = node_to_request(node);
+	struct intel_engine_cs *engine;
 
 	GEM_BUG_ON(!locked);
 
-	if (engine != locked) {
+	/*
+	 * Virtual engines complicate acquiring the engine timeline lock,
+	 * as their rq->engine pointer is not stable until under that
+	 * engine lock. The simple ploy we use is to take the lock then
+	 * check that the rq still belongs to the newly locked engine.
+	 */
+	while (locked != (engine = READ_ONCE(rq->engine))) {
 		spin_unlock(&locked->timeline.lock);
 		memset(cache, 0, sizeof(*cache));
 		spin_lock(&engine->timeline.lock);
+		locked = engine;
 	}
 
-	return engine;
+	return locked;
 }
 
 static bool inflight(const struct i915_request *rq,
@@ -370,8 +378,11 @@ static void __i915_schedule(struct i915_request *rq,
 		if (prio <= node->attr.priority || node_signaled(node))
 			continue;
 
+		GEM_BUG_ON(node_to_request(node)->engine != engine);
+
 		node->attr.priority = prio;
 		if (!list_empty(&node->link)) {
+			GEM_BUG_ON(intel_engine_is_virtual(engine));
 			if (!cache.priolist)
 				cache.priolist =
 					i915_sched_lookup_priolist(engine,
diff --git a/drivers/gpu/drm/i915/i915_timeline_types.h b/drivers/gpu/drm/i915/i915_timeline_types.h
index 8ff146dc05ba..5e445f145eb1 100644
--- a/drivers/gpu/drm/i915/i915_timeline_types.h
+++ b/drivers/gpu/drm/i915/i915_timeline_types.h
@@ -25,6 +25,7 @@ struct i915_timeline {
 	spinlock_t lock;
 #define TIMELINE_CLIENT 0 /* default subclass */
 #define TIMELINE_ENGINE 1
+#define TIMELINE_VIRTUAL 2
 	struct mutex mutex; /* protects the flow of requests */
 
 	unsigned int pin_count;
diff --git a/drivers/gpu/drm/i915/intel_engine_types.h b/drivers/gpu/drm/i915/intel_engine_types.h
index b0aa1f0d4e47..d54d2a1840cc 100644
--- a/drivers/gpu/drm/i915/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/intel_engine_types.h
@@ -216,6 +216,7 @@ struct intel_engine_execlists {
 	 * @queue: queue of requests, in priority lists
 	 */
 	struct rb_root_cached queue;
+	struct rb_root_cached virtual;
 
 	/**
 	 * @csb_write: control register for Context Switch buffer
@@ -421,6 +422,7 @@ struct intel_engine_cs {
 #define I915_ENGINE_SUPPORTS_STATS   BIT(1)
 #define I915_ENGINE_HAS_PREEMPTION   BIT(2)
 #define I915_ENGINE_HAS_SEMAPHORES   BIT(3)
+#define I915_ENGINE_IS_VIRTUAL       BIT(4)
 	unsigned int flags;
 
 	/*
@@ -504,6 +506,12 @@ intel_engine_has_semaphores(const struct intel_engine_cs *engine)
 	return engine->flags & I915_ENGINE_HAS_SEMAPHORES;
 }
 
+static inline bool
+intel_engine_is_virtual(const struct intel_engine_cs *engine)
+{
+	return engine->flags & I915_ENGINE_IS_VIRTUAL;
+}
+
 #define instdone_slice_mask(dev_priv__) \
 	(IS_GEN(dev_priv__, 7) ? \
 	 1 : RUNTIME_INFO(dev_priv__)->sseu.slice_mask)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 7b938eaff9c5..0c97e8f30223 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -166,6 +166,28 @@
 
 #define ACTIVE_PRIORITY (I915_PRIORITY_NEWCLIENT | I915_PRIORITY_NOSEMAPHORE)
 
+struct virtual_engine {
+	struct intel_engine_cs base;
+
+	struct intel_context context;
+	struct kref kref;
+	struct rcu_head rcu;
+
+	struct i915_request *request;
+	struct ve_node {
+		struct rb_node rb;
+		int prio;
+	} nodes[I915_NUM_ENGINES];
+
+	unsigned int count;
+	struct intel_engine_cs *siblings[0];
+};
+
+static struct virtual_engine *to_virtual_engine(struct intel_engine_cs *engine)
+{
+	return container_of(engine, struct virtual_engine, base);
+}
+
 static int execlists_context_deferred_alloc(struct intel_context *ce,
 					    struct intel_engine_cs *engine);
 static void execlists_init_reg_state(u32 *reg_state,
@@ -235,7 +257,8 @@ static int queue_prio(const struct intel_engine_execlists *execlists)
 }
 
 static inline bool need_preempt(const struct intel_engine_cs *engine,
-				const struct i915_request *rq)
+				const struct i915_request *rq,
+				struct rb_node *rb)
 {
 	int last_prio;
 
@@ -270,6 +293,22 @@ static inline bool need_preempt(const struct intel_engine_cs *engine,
 	    rq_prio(list_next_entry(rq, link)) > last_prio)
 		return true;
 
+	if (rb) { /* XXX virtual precedence */
+		struct virtual_engine *ve =
+			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
+		bool preempt = false;
+
+		if (engine == ve->siblings[0]) { /* only preempt one sibling */
+			spin_lock(&ve->base.timeline.lock);
+			if (ve->request)
+				preempt = rq_prio(ve->request) > last_prio;
+			spin_unlock(&ve->base.timeline.lock);
+		}
+
+		if (preempt)
+			return preempt;
+	}
+
 	/*
 	 * If the inflight context did not trigger the preemption, then maybe
 	 * it was the set of queued requests? Pick the highest priority in
@@ -388,6 +427,8 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
 	list_for_each_entry_safe_reverse(rq, rn,
 					 &engine->timeline.requests,
 					 link) {
+		struct intel_engine_cs *owner;
+
 		if (i915_request_completed(rq))
 			break;
 
@@ -396,14 +437,23 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
 
 		GEM_BUG_ON(rq->hw_context->active);
 
-		GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
-		if (rq_prio(rq) != prio) {
-			prio = rq_prio(rq);
-			pl = i915_sched_lookup_priolist(engine, prio);
-		}
-		GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
+		owner = rq->hw_context->engine;
+		if (likely(owner == engine)) {
+			GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
+			if (rq_prio(rq) != prio) {
+				prio = rq_prio(rq);
+				pl = i915_sched_lookup_priolist(engine, prio);
+			}
+			GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
+
+			list_add(&rq->sched.link, pl);
+		} else {
+			if (__i915_request_has_started(rq))
+				rq->sched.attr.priority |= ACTIVE_PRIORITY;
 
-		list_add(&rq->sched.link, pl);
+			rq->engine = owner;
+			owner->submit_request(rq);
+		}
 
 		active = rq;
 	}
@@ -665,6 +715,50 @@ static void complete_preempt_context(struct intel_engine_execlists *execlists)
 						  execlists));
 }
 
+static void virtual_update_register_offsets(u32 *regs,
+					    struct intel_engine_cs *engine)
+{
+	u32 base = engine->mmio_base;
+
+	regs[CTX_CONTEXT_CONTROL] =
+		i915_mmio_reg_offset(RING_CONTEXT_CONTROL(engine));
+	regs[CTX_RING_HEAD] = i915_mmio_reg_offset(RING_HEAD(base));
+	regs[CTX_RING_TAIL] = i915_mmio_reg_offset(RING_TAIL(base));
+	regs[CTX_RING_BUFFER_START] = i915_mmio_reg_offset(RING_START(base));
+	regs[CTX_RING_BUFFER_CONTROL] = i915_mmio_reg_offset(RING_CTL(base));
+
+	regs[CTX_BB_HEAD_U] = i915_mmio_reg_offset(RING_BBADDR_UDW(base));
+	regs[CTX_BB_HEAD_L] = i915_mmio_reg_offset(RING_BBADDR(base));
+	regs[CTX_BB_STATE] = i915_mmio_reg_offset(RING_BBSTATE(base));
+	regs[CTX_SECOND_BB_HEAD_U] =
+		i915_mmio_reg_offset(RING_SBBADDR_UDW(base));
+	regs[CTX_SECOND_BB_HEAD_L] = i915_mmio_reg_offset(RING_SBBADDR(base));
+	regs[CTX_SECOND_BB_STATE] = i915_mmio_reg_offset(RING_SBBSTATE(base));
+
+	regs[CTX_CTX_TIMESTAMP] =
+		i915_mmio_reg_offset(RING_CTX_TIMESTAMP(base));
+	regs[CTX_PDP3_UDW] = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(engine, 3));
+	regs[CTX_PDP3_LDW] = i915_mmio_reg_offset(GEN8_RING_PDP_LDW(engine, 3));
+	regs[CTX_PDP2_UDW] = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(engine, 2));
+	regs[CTX_PDP2_LDW] = i915_mmio_reg_offset(GEN8_RING_PDP_LDW(engine, 2));
+	regs[CTX_PDP1_UDW] = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(engine, 1));
+	regs[CTX_PDP1_LDW] = i915_mmio_reg_offset(GEN8_RING_PDP_LDW(engine, 1));
+	regs[CTX_PDP0_UDW] = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(engine, 0));
+	regs[CTX_PDP0_LDW] = i915_mmio_reg_offset(GEN8_RING_PDP_LDW(engine, 0));
+
+	if (engine->class == RENDER_CLASS) {
+		regs[CTX_RCS_INDIRECT_CTX] =
+			i915_mmio_reg_offset(RING_INDIRECT_CTX(base));
+		regs[CTX_RCS_INDIRECT_CTX_OFFSET] =
+			i915_mmio_reg_offset(RING_INDIRECT_CTX_OFFSET(base));
+		regs[CTX_BB_PER_CTX_PTR] =
+			i915_mmio_reg_offset(RING_BB_PER_CTX_PTR(base));
+
+		regs[CTX_R_PWR_CLK_STATE] =
+			i915_mmio_reg_offset(GEN8_R_PWR_CLK_STATE);
+	}
+}
+
 static void execlists_dequeue(struct intel_engine_cs *engine)
 {
 	struct intel_engine_execlists * const execlists = &engine->execlists;
@@ -697,6 +791,28 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 	 * and context switches) submission.
 	 */
 
+	for (rb = rb_first_cached(&execlists->virtual); rb; ) {
+		struct virtual_engine *ve =
+			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
+		struct i915_request *rq = READ_ONCE(ve->request);
+		struct intel_engine_cs *active;
+
+		if (!rq) {
+			rb_erase_cached(rb, &execlists->virtual);
+			RB_CLEAR_NODE(rb);
+			rb = rb_first_cached(&execlists->virtual);
+			continue;
+		}
+
+		active = READ_ONCE(ve->context.active);
+		if (active && active != engine) {
+			rb = rb_next(rb);
+			continue;
+		}
+
+		break;
+	}
+
 	if (last) {
 		/*
 		 * Don't resubmit or switch until all outstanding
@@ -718,7 +834,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		if (!execlists_is_active(execlists, EXECLISTS_ACTIVE_HWACK))
 			return;
 
-		if (need_preempt(engine, last)) {
+		if (need_preempt(engine, last, rb)) {
 			inject_preempt_context(engine);
 			return;
 		}
@@ -758,6 +874,72 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
 		last->tail = last->wa_tail;
 	}
 
+	while (rb) { /* XXX virtual always takes precedence */
+		struct virtual_engine *ve =
+			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
+		struct i915_request *rq;
+
+		spin_lock(&ve->base.timeline.lock);
+
+		rq = ve->request;
+		if (unlikely(!rq)) { /* lost the race to a sibling */
+			spin_unlock(&ve->base.timeline.lock);
+			rb_erase_cached(rb, &execlists->virtual);
+			RB_CLEAR_NODE(rb);
+			rb = rb_first_cached(&execlists->virtual);
+			continue;
+		}
+
+		if (rq_prio(rq) >= queue_prio(execlists)) {
+			if (last && !can_merge_rq(last, rq)) {
+				spin_unlock(&ve->base.timeline.lock);
+				return; /* leave this rq for another engine */
+			}
+
+			GEM_BUG_ON(rq->engine != &ve->base);
+			ve->request = NULL;
+			ve->base.execlists.queue_priority_hint = INT_MIN;
+			rb_erase_cached(rb, &execlists->virtual);
+			RB_CLEAR_NODE(rb);
+
+			GEM_BUG_ON(rq->hw_context != &ve->context);
+			rq->engine = engine;
+
+			if (engine != ve->siblings[0]) {
+				u32 *regs = ve->context.lrc_reg_state;
+				unsigned int n;
+
+				GEM_BUG_ON(READ_ONCE(ve->context.active));
+				virtual_update_register_offsets(regs, engine);
+
+				/*
+				 * Move the bound engine to the top of the list
+				 * for future execution. We then kick this
+				 * tasklet first before checking others, so that
+				 * we preferentially reuse this set of bound
+				 * registers.
+				 */
+				for (n = 1; n < ve->count; n++) {
+					if (ve->siblings[n] == engine) {
+						swap(ve->siblings[n],
+						     ve->siblings[0]);
+						break;
+					}
+				}
+
+				GEM_BUG_ON(ve->siblings[0] != engine);
+			}
+
+			__i915_request_submit(rq);
+			trace_i915_request_in(rq, port_index(port, execlists));
+			submit = true;
+			last = rq;
+		}
+
+		spin_unlock(&ve->base.timeline.lock);
+		break;
+	}
+
 	while ((rb = rb_first_cached(&execlists->queue))) {
 		struct i915_priolist *p = to_priolist(rb);
 		struct i915_request *rq, *rn;
@@ -2904,6 +3086,304 @@ void intel_lr_context_resume(struct drm_i915_private *i915)
 	}
 }
 
+static void __virtual_engine_free(struct rcu_head *rcu)
+{
+	struct virtual_engine *ve = container_of(rcu, typeof(*ve), rcu);
+
+	kfree(ve);
+}
+
+static void virtual_engine_free(struct kref *kref)
+{
+	struct virtual_engine *ve = container_of(kref, typeof(*ve), kref);
+	unsigned int n;
+
+	GEM_BUG_ON(ve->request);
+	GEM_BUG_ON(ve->context.active);
+
+	for (n = 0; n < ve->count; n++) {
+		struct intel_engine_cs *sibling = ve->siblings[n];
+		struct rb_node *node = &ve->nodes[sibling->id].rb;
+
+		if (RB_EMPTY_NODE(node))
+			continue;
+
+		spin_lock_irq(&sibling->timeline.lock);
+
+		if (!RB_EMPTY_NODE(node))
+			rb_erase_cached(node, &sibling->execlists.virtual);
+
+		spin_unlock_irq(&sibling->timeline.lock);
+	}
+	GEM_BUG_ON(__tasklet_is_scheduled(&ve->base.execlists.tasklet));
+
+	if (ve->context.state)
+		__execlists_context_fini(&ve->context);
+
+	i915_timeline_fini(&ve->base.timeline);
+	call_rcu(&ve->rcu, __virtual_engine_free);
+}
+
+static void virtual_context_unpin(struct intel_context *ce)
+{
+	struct virtual_engine *ve = container_of(ce, typeof(*ve), context);
+
+	execlists_context_unpin(ce);
+
+	kref_put(&ve->kref, virtual_engine_free);
+}
+
+static void virtual_engine_initial_hint(struct virtual_engine *ve)
+{
+	int swp;
+
+	/*
+	 * Pick a random sibling on starting to help spread the load around.
+	 *
+	 * New contexts are typically created with exactly the same order
+	 * of siblings, and often started in batches. Due to the way we iterate
+	 * the array of siblings when submitting requests, sibling[0] is
+	 * prioritised for dequeuing. If we make sure that sibling[0] is fairly
+	 * randomised across the system, we also help spread the load by the
+	 * first engine we inspect being different each time.
+	 *
+	 * NB This does not force us to execute on this engine, it will just
+	 * typically be the first we inspect for submission.
+	 */
+	swp = prandom_u32_max(ve->count);
+	if (!swp)
+		return;
+
+	swap(ve->siblings[swp], ve->siblings[0]);
+	virtual_update_register_offsets(ve->context.lrc_reg_state,
+					ve->siblings[0]);
+}
+
+static int virtual_context_pin(struct intel_context *ce)
+{
+	struct virtual_engine *ve = container_of(ce, typeof(*ve), context);
+	int err;
+
+	/* Note: we must use a real engine class for setting up reg state */
+	err = __execlists_context_pin(ce, ve->siblings[0]);
+	if (err)
+		return err;
+
+	virtual_engine_initial_hint(ve);
+
+	kref_get(&ve->kref);
+	return 0;
+}
+
+static const struct intel_context_ops virtual_context_ops = {
+	.pin = virtual_context_pin,
+	.unpin = virtual_context_unpin,
+};
+
+static void virtual_submission_tasklet(unsigned long data)
+{
+	struct virtual_engine * const ve = (struct virtual_engine *)data;
+	unsigned int n;
+	int prio;
+
+	prio = READ_ONCE(ve->base.execlists.queue_priority_hint);
+	if (prio == INT_MIN)
+		return;
+
+	local_irq_disable();
+	for (n = 0; READ_ONCE(ve->request) && n < ve->count; n++) {
+		struct intel_engine_cs *sibling = ve->siblings[n];
+		struct ve_node * const node = &ve->nodes[sibling->id];
+		struct rb_node **parent, *rb;
+		bool first;
+
+		spin_lock(&sibling->timeline.lock);
+
+		if (!RB_EMPTY_NODE(&node->rb)) {
+			first = rb_first_cached(&sibling->execlists.virtual) == &node->rb;
+			if (prio == node->prio || (prio > node->prio && first))
+				goto submit_engine;
+
+			rb_erase_cached(&node->rb, &sibling->execlists.virtual);
+		}
+
+		rb = NULL;
+		first = true;
+		parent = &sibling->execlists.virtual.rb_root.rb_node;
+		while (*parent) {
+			struct ve_node *other;
+
+			rb = *parent;
+			other = rb_entry(rb, typeof(*other), rb);
+			if (prio > other->prio) {
+				parent = &rb->rb_left;
+			} else {
+				parent = &rb->rb_right;
+				first = false;
+			}
+		}
+
+		rb_link_node(&node->rb, rb, parent);
+		rb_insert_color_cached(&node->rb,
+				       &sibling->execlists.virtual,
+				       first);
+
+submit_engine:
+		GEM_BUG_ON(RB_EMPTY_NODE(&node->rb));
+		node->prio = prio;
+		if (first && prio > sibling->execlists.queue_priority_hint) {
+			sibling->execlists.queue_priority_hint = prio;
+			tasklet_hi_schedule(&sibling->execlists.tasklet);
+		}
+
+		spin_unlock(&sibling->timeline.lock);
+	}
+	local_irq_enable();
+}
+
+static void virtual_submit_request(struct i915_request *request)
+{
+	struct virtual_engine *ve = to_virtual_engine(request->engine);
+
+	GEM_BUG_ON(ve->base.submit_request != virtual_submit_request);
+
+	GEM_BUG_ON(ve->request);
+	ve->base.execlists.queue_priority_hint = rq_prio(request);
+	WRITE_ONCE(ve->request, request);
+
+	tasklet_schedule(&ve->base.execlists.tasklet);
+}
+
+struct intel_engine_cs *
+intel_execlists_create_virtual(struct i915_gem_context *ctx,
+			       struct intel_engine_cs **siblings,
+			       unsigned int count)
+{
+	struct virtual_engine *ve;
+	unsigned int n;
+	int err;
+
+	if (!count)
+		return ERR_PTR(-EINVAL);
+
+	ve = kzalloc(struct_size(ve, siblings, count), GFP_KERNEL);
+	if (!ve)
+		return ERR_PTR(-ENOMEM);
+
+	kref_init(&ve->kref);
+	rcu_head_init(&ve->rcu);
+	ve->base.i915 = ctx->i915;
+	ve->base.id = -1;
+	ve->base.class = OTHER_CLASS;
+	ve->base.uabi_class = I915_ENGINE_CLASS_INVALID;
+	ve->base.instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
+	ve->base.flags = I915_ENGINE_IS_VIRTUAL;
+
+	snprintf(ve->base.name, sizeof(ve->base.name), "virtual");
+
+	err = i915_timeline_init(ctx->i915,
+				 &ve->base.timeline,
+				 ve->base.name,
+				 NULL);
+	if (err)
+		goto err_put;
+	i915_timeline_set_subclass(&ve->base.timeline, TIMELINE_VIRTUAL);
+
+	ve->base.cops = &virtual_context_ops;
+	ve->base.request_alloc = execlists_request_alloc;
+
+	ve->base.schedule = i915_schedule;
+	ve->base.submit_request = virtual_submit_request;
+
+	ve->base.execlists.queue_priority_hint = INT_MIN;
+	tasklet_init(&ve->base.execlists.tasklet,
+		     virtual_submission_tasklet,
+		     (unsigned long)ve);
+
+	intel_context_init(&ve->context, ctx, &ve->base);
+
+	for (n = 0; n < count; n++) {
+		struct intel_engine_cs *sibling = siblings[n];
+
+		GEM_BUG_ON(!is_power_of_2(sibling->mask));
+		if (sibling->mask & ve->base.mask)
+			continue;
+
+		if (sibling->execlists.tasklet.func != execlists_submission_tasklet) {
+			err = -ENODEV;
+			goto err_put;
+		}
+
+		GEM_BUG_ON(RB_EMPTY_NODE(&ve->nodes[sibling->id].rb));
+		RB_CLEAR_NODE(&ve->nodes[sibling->id].rb);
+
+		ve->siblings[ve->count++] = sibling;
+		ve->base.mask |= sibling->mask;
+
+		if (ve->base.class != OTHER_CLASS) {
+			if (ve->base.class != sibling->class) {
+				err = -EINVAL;
+				goto err_put;
+			}
+			continue;
+		}
+
+		ve->base.class = sibling->class;
+		snprintf(ve->base.name, sizeof(ve->base.name),
+			 "v%dx%d", ve->base.class, count);
+		ve->base.context_size = sibling->context_size;
+
+		ve->base.emit_bb_start = sibling->emit_bb_start;
+		ve->base.emit_flush = sibling->emit_flush;
+		ve->base.emit_init_breadcrumb = sibling->emit_init_breadcrumb;
+		ve->base.emit_fini_breadcrumb = sibling->emit_fini_breadcrumb;
+		ve->base.emit_fini_breadcrumb_dw =
+			sibling->emit_fini_breadcrumb_dw;
+	}
+
+	/* gracefully replace a degenerate virtual engine */
+	if (is_power_of_2(ve->base.mask)) {
+		struct intel_engine_cs *actual = ve->siblings[0];
+		virtual_engine_free(&ve->kref);
+		return actual;
+	}
+
+	__intel_context_insert(ctx, &ve->base, &ve->context);
+	return &ve->base;
+
+err_put:
+	virtual_engine_free(&ve->kref);
+	return ERR_PTR(err);
+}
+
+struct intel_engine_cs *
+intel_execlists_clone_virtual(struct i915_gem_context *ctx,
+			      struct intel_engine_cs *src)
+{
+	struct virtual_engine *se = to_virtual_engine(src);
+	struct intel_engine_cs *dst;
+
+	dst = intel_execlists_create_virtual(ctx,
+					     se->siblings,
+					     se->count);
+	if (IS_ERR(dst))
+		return dst;
+
+	return dst;
+}
+
+void intel_virtual_engine_destroy(struct intel_engine_cs *engine)
+{
+	struct virtual_engine *ve;
+
+	if (!engine || !intel_engine_is_virtual(engine))
+		return;
+
+	ve = to_virtual_engine(engine);
+	__intel_context_remove(&ve->context);
+	kref_put(&ve->kref, virtual_engine_free);
+}
+
 void intel_execlists_show_requests(struct intel_engine_cs *engine,
 				   struct drm_printer *m,
 				   void (*show_request)(struct drm_printer *m,
@@ -2961,6 +3441,29 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine,
 		show_request(m, last, "\t\tQ ");
 	}
 
+	last = NULL;
+	count = 0;
+	for (rb = rb_first_cached(&execlists->virtual); rb; rb = rb_next(rb)) {
+		struct virtual_engine *ve =
+			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
+		struct i915_request *rq = READ_ONCE(ve->request);
+
+		if (rq) {
+			if (count++ < max - 1)
+				show_request(m, rq, "\t\tV ");
+			else
+				last = rq;
+		}
+	}
+	if (last) {
+		if (count > max) {
+			drm_printf(m,
+				   "\t\t...skipping %d virtual requests...\n",
+				   count - max);
+		}
+		show_request(m, last, "\t\tV ");
+	}
+
 	spin_unlock_irqrestore(&engine->timeline.lock, flags);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index f1aec8a6986f..9d90dc68e02b 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -112,6 +112,17 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine,
 							const char *prefix),
 				   unsigned int max);
 
+struct intel_engine_cs *
+intel_execlists_create_virtual(struct i915_gem_context *ctx,
+			       struct intel_engine_cs **siblings,
+			       unsigned int count);
+
+struct intel_engine_cs *
+intel_execlists_clone_virtual(struct i915_gem_context *ctx,
+			      struct intel_engine_cs *src);
+
+void intel_virtual_engine_destroy(struct intel_engine_cs *engine);
+
 u32 gen8_make_rpcs(struct drm_i915_private *i915, struct intel_sseu *ctx_sseu);
 
 #endif /* _INTEL_LRC_H_ */
diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
index d61520ea03c1..4b8a339529d1 100644
--- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
+++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
@@ -10,6 +10,7 @@
 
 #include "../i915_selftest.h"
 #include "igt_flush_test.h"
+#include "igt_live_test.h"
 #include "igt_spinner.h"
 #include "i915_random.h"
 
@@ -1060,6 +1061,169 @@ static int live_preempt_smoke(void *arg)
 	return err;
 }
 
+static int nop_virtual_engine(struct drm_i915_private *i915,
+			      struct intel_engine_cs **siblings,
+			      unsigned int nsibling,
+			      unsigned int nctx,
+			      unsigned int flags)
+#define CHAIN BIT(0)
+{
+	IGT_TIMEOUT(end_time);
+	struct i915_request *request[16];
+	struct i915_gem_context *ctx[16];
+	struct intel_engine_cs *ve[16];
+	unsigned long n, prime, nc;
+	struct igt_live_test t;
+	ktime_t times[2] = {};
+	int err;
+
+	GEM_BUG_ON(!nctx || nctx > ARRAY_SIZE(ctx));
+
+	for (n = 0; n < nctx; n++) {
+		ctx[n] = kernel_context(i915);
+		if (!ctx[n])
+			return -ENOMEM;
+
+		ve[n] = intel_execlists_create_virtual(ctx[n],
+						       siblings, nsibling);
+		if (IS_ERR(ve[n]))
+			return PTR_ERR(ve[n]);
+	}
+
+	err = igt_live_test_begin(&t, i915, __func__, ve[0]->name);
+	if (err)
+		goto out;
+
+	for_each_prime_number_from(prime, 1, 8192) {
+		times[1] = ktime_get_raw();
+
+		if (flags & CHAIN) {
+			for (nc = 0; nc < nctx; nc++) {
+				for (n = 0; n < prime; n++) {
+					request[nc] =
+						i915_request_alloc(ve[nc], ctx[nc]);
+					if (IS_ERR(request[nc])) {
+						err = PTR_ERR(request[nc]);
+						goto out;
+					}
+
+					i915_request_add(request[nc]);
+				}
+			}
+		} else {
+			for (n = 0; n < prime; n++) {
+				for (nc = 0; nc < nctx; nc++) {
+					request[nc] =
+						i915_request_alloc(ve[nc], ctx[nc]);
+					if (IS_ERR(request[nc])) {
+						err = PTR_ERR(request[nc]);
+						goto out;
+					}
+
+					i915_request_add(request[nc]);
+				}
+			}
+		}
+
+		for (nc = 0; nc < nctx; nc++) {
+			if (i915_request_wait(request[nc],
+					      I915_WAIT_LOCKED,
+					      HZ / 10) < 0) {
+				pr_err("%s(%s): wait for %llx:%lld timed out\n",
+				       __func__, ve[0]->name,
+				       request[nc]->fence.context,
+				       request[nc]->fence.seqno);
+
+				GEM_TRACE("%s(%s) failed at request %llx:%lld\n",
+					  __func__, ve[0]->name,
+					  request[nc]->fence.context,
+					  request[nc]->fence.seqno);
+				GEM_TRACE_DUMP();
+				i915_gem_set_wedged(i915);
+				break;
+			}
+		}
+
+		times[1] = ktime_sub(ktime_get_raw(), times[1]);
+		if (prime == 1)
+			times[0] = times[1];
+
+		if (__igt_timeout(end_time, NULL))
+			break;
+	}
+
+	err = igt_live_test_end(&t);
+	if (err)
+		goto out;
+
+	pr_info("Requestx%d latencies on %s: 1 = %lluns, %lu = %lluns\n",
+		nctx, ve[0]->name, ktime_to_ns(times[0]),
+		prime, div64_u64(ktime_to_ns(times[1]), prime));
+
+out:
+	if (igt_flush_test(i915, I915_WAIT_LOCKED))
+		err = -EIO;
+
+	for (nc = 0; nc < nctx; nc++) {
+		intel_virtual_engine_destroy(ve[nc]);
+		kernel_context_close(ctx[nc]);
+	}
+	return err;
+}
+
+static int live_virtual_engine(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct intel_engine_cs *siblings[MAX_ENGINE_INSTANCE + 1];
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	unsigned int class, inst;
+	int err = -ENODEV;
+
+	if (USES_GUC_SUBMISSION(i915))
+		return 0;
+
+	mutex_lock(&i915->drm.struct_mutex);
+
+	for_each_engine(engine, i915, id) {
+		err = nop_virtual_engine(i915, &engine, 1, 1, 0);
+		if (err) {
+			pr_err("Failed to wrap engine %s: err=%d\n",
+			       engine->name, err);
+			goto out_unlock;
+		}
+	}
+
+	for (class = 0; class <= MAX_ENGINE_CLASS; class++) {
+		int nsibling, n;
+
+		nsibling = 0;
+		for (inst = 0; inst <= MAX_ENGINE_INSTANCE; inst++) {
+			if (!i915->engine_class[class][inst])
+				break;
+
+			siblings[nsibling++] = i915->engine_class[class][inst];
+		}
+		if (nsibling < 2)
+			continue;
+
+		for (n = 1; n <= nsibling + 1; n++) {
+			err = nop_virtual_engine(i915, siblings, nsibling,
+						 n, 0);
+			if (err)
+				goto out_unlock;
+		}
+
+		err = nop_virtual_engine(i915, siblings, nsibling, n, CHAIN);
+		if (err)
+			goto out_unlock;
+	}
+
+out_unlock:
+	mutex_unlock(&i915->drm.struct_mutex);
+	return err;
+}
+
 int intel_execlists_live_selftests(struct drm_i915_private *i915)
 {
 	static const struct i915_subtest tests[] = {
@@ -1071,6 +1235,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_chain_preempt),
 		SUBTEST(live_preempt_hang),
 		SUBTEST(live_preempt_smoke),
+		SUBTEST(live_virtual_engine),
 	};
 
 	if (!HAS_EXECLISTS(i915))
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index a609619610f2..592b02676044 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -125,6 +125,7 @@ enum drm_i915_gem_engine_class {
 };
 
 #define I915_ENGINE_CLASS_INVALID_NONE -1
+#define I915_ENGINE_CLASS_INVALID_VIRTUAL 0
 
 /**
  * DOC: perf_events exposed by i915 through /sys/bus/event_sources/drivers/i915
@@ -1596,8 +1597,37 @@ struct drm_i915_gem_context_param_sseu {
 	__u32 rsvd;
 };
 
+/*
+ * i915_context_engines_load_balance:
+ *
+ * Enable load balancing across this set of engines.
+ *
+ * Into the I915_EXEC_DEFAULT slot [0], a virtual engine is created that,
+ * when used, will proxy the execbuffer request onto one of the set of
+ * engines in such a way as to distribute the load evenly across the set.
+ *
+ * The set of engines must be compatible (e.g. the same HW class) as they
+ * will share the same logical GPU context and ring.
+ *
+ * To intermix rendering with the virtual engine and direct rendering onto
+ * the backing engines (bypassing the load balancing proxy), the context must
+ * be defined to use a single timeline for all engines.
+ */
+struct i915_context_engines_load_balance {
+	struct i915_user_extension base;
+
+	__u16 engine_index;
+	__u16 mbz16; /* reserved for future use; must be zero */
+	__u32 flags; /* all undefined flags must be zero */
+
+	__u64 engines_mask; /* selection mask of engines[] */
+
+	__u64 mbz64[4]; /* reserved for future use; must be zero */
+};
+
 struct i915_context_param_engines {
 	__u64 extensions; /* linked chain of extension blocks, 0 terminates */
+#define I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE 0
 
 	struct {
 		__u16 engine_class; /* see enum drm_i915_gem_engine_class */
-- 
2.20.1

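As a sketch of the userspace side (illustrative only: the {class, instance}
layout of the engine map entries is assumed from the truncated struct above,
error handling is omitted, and fd/ctx are an open DRM fd and context id):

	#include <stdint.h>

	#include <xf86drm.h>
	#include <drm/i915_drm.h>

	static void create_balancer(int fd, uint32_t ctx)
	{
		/* Balance across the two video engines in map slots 1 and 2 */
		struct i915_context_engines_load_balance balance = {
			.base = { .name = I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE },
			.engine_index = 0,	/* replace map slot 0 */
			.engines_mask = 0x6,	/* select map slots 1 and 2 */
		};
		struct {
			struct i915_context_param_engines base;
			struct { uint16_t class, instance; } engines[3];
		} map = {
			.base = { .extensions = (uintptr_t)&balance },
			.engines = {
				{ I915_ENGINE_CLASS_INVALID,
				  I915_ENGINE_CLASS_INVALID_VIRTUAL },
				{ I915_ENGINE_CLASS_VIDEO, 0 },
				{ I915_ENGINE_CLASS_VIDEO, 1 },
			},
		};
		struct drm_i915_gem_context_param arg = {
			.ctx_id = ctx,
			.param = I915_CONTEXT_PARAM_ENGINES,
			.size = sizeof(map),
			.value = (uintptr_t)&map,
		};

		drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);
	}
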
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 11/13] drm/i915: Extend execution fence to support a callback
  2019-03-08 14:12 Home straight for veng, the uAPI wars Chris Wilson
                   ` (9 preceding siblings ...)
  2019-03-08 14:12 ` [PATCH 10/13] drm/i915: Load balancing across a virtual engine Chris Wilson
@ 2019-03-08 14:12 ` Chris Wilson
  2019-03-11 13:09   ` Tvrtko Ursulin
  2019-03-08 14:12 ` [PATCH 12/13] drm/i915/execlists: Virtual engine bonding Chris Wilson
                   ` (5 subsequent siblings)
  16 siblings, 1 reply; 58+ messages in thread
From: Chris Wilson @ 2019-03-08 14:12 UTC (permalink / raw)
  To: intel-gfx

In the next patch, we will want to configure the slave request
depending on which physical engine the master request is executed on.
For this, we introduce a callback from the execute fence to convey this
information.
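
The hook is invoked as the signaling request is submitted to the HW, just
before the waiter itself is allowed to execute. For illustration only (the
real consumer arrives with virtual engine bonding in the next patch), a
hypothetical hook could peek at the physical engine the master landed on:

	static void note_master_engine(struct i915_request *rq,
				       struct dma_fence *signal)
	{
		/* Called in irq context as the master enters the HW queue */
		pr_debug("%s: master submitted to %s\n",
			 rq->engine->name,
			 to_request(signal)->engine->name);
	}

	/* ... */
	err = i915_request_await_execution(rq, &master->fence,
					   note_master_engine);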

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_request.c | 84 +++++++++++++++++++++++++++--
 drivers/gpu/drm/i915/i915_request.h |  4 ++
 2 files changed, 83 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 09046a15d218..5527ab22dbf2 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -38,6 +38,8 @@ struct execute_cb {
 	struct list_head link;
 	struct irq_work work;
 	struct i915_sw_fence *fence;
+	void (*hook)(struct i915_request *rq, struct dma_fence *signal);
+	struct i915_request *signal;
 };
 
 static struct i915_global_request {
@@ -343,6 +345,17 @@ static void irq_execute_cb(struct irq_work *wrk)
 	kmem_cache_free(global.slab_execute_cbs, cb);
 }
 
+static void irq_execute_cb_hook(struct irq_work *wrk)
+{
+	struct execute_cb *cb = container_of(wrk, typeof(*cb), work);
+
+	cb->hook(container_of(cb->fence, struct i915_request, submit),
+		 &cb->signal->fence);
+	i915_request_put(cb->signal);
+
+	irq_execute_cb(wrk);
+}
+
 static void __notify_execute_cb(struct i915_request *rq)
 {
 	struct execute_cb *cb;
@@ -369,14 +382,19 @@ static void __notify_execute_cb(struct i915_request *rq)
 }
 
 static int
-i915_request_await_execution(struct i915_request *rq,
-			     struct i915_request *signal,
-			     gfp_t gfp)
+__i915_request_await_execution(struct i915_request *rq,
+			       struct i915_request *signal,
+			       void (*hook)(struct i915_request *rq,
+					    struct dma_fence *signal),
+			       gfp_t gfp)
 {
 	struct execute_cb *cb;
 
-	if (i915_request_is_active(signal))
+	if (i915_request_is_active(signal)) {
+		if (hook)
+			hook(rq, &signal->fence);
 		return 0;
+	}
 
 	cb = kmem_cache_alloc(global.slab_execute_cbs, gfp);
 	if (!cb)
@@ -386,8 +404,18 @@ i915_request_await_execution(struct i915_request *rq,
 	i915_sw_fence_await(cb->fence);
 	init_irq_work(&cb->work, irq_execute_cb);
 
+	if (hook) {
+		cb->hook = hook;
+		cb->signal = i915_request_get(signal);
+		cb->work.func = irq_execute_cb_hook;
+	}
+
 	spin_lock_irq(&signal->lock);
 	if (i915_request_is_active(signal)) {
+		if (hook) {
+			hook(rq, &signal->fence);
+			i915_request_put(signal);
+		}
 		i915_sw_fence_complete(cb->fence);
 		kmem_cache_free(global.slab_execute_cbs, cb);
 	} else {
@@ -790,7 +818,7 @@ emit_semaphore_wait(struct i915_request *to,
 		return err;
 
 	/* Only submit our spinner after the signaler is running! */
-	err = i915_request_await_execution(to, from, gfp);
+	err = __i915_request_await_execution(to, from, NULL, gfp);
 	if (err)
 		return err;
 
@@ -910,6 +938,52 @@ i915_request_await_dma_fence(struct i915_request *rq, struct dma_fence *fence)
 	return 0;
 }
 
+int
+i915_request_await_execution(struct i915_request *rq,
+			     struct dma_fence *fence,
+			     void (*hook)(struct i915_request *rq,
+					  struct dma_fence *signal))
+{
+	struct dma_fence **child = &fence;
+	unsigned int nchild = 1;
+	int ret;
+
+	if (dma_fence_is_array(fence)) {
+		struct dma_fence_array *array = to_dma_fence_array(fence);
+
+		/* XXX Error for signal-on-any fence arrays */
+
+		child = array->fences;
+		nchild = array->num_fences;
+		GEM_BUG_ON(!nchild);
+	}
+
+	do {
+		fence = *child++;
+		if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
+			continue;
+
+		/*
+		 * We don't squash repeated fence dependencies here as we
+		 * want to run our callback in all cases.
+		 */
+
+		if (dma_fence_is_i915(fence))
+			ret = __i915_request_await_execution(rq,
+							     to_request(fence),
+							     hook,
+							     I915_FENCE_GFP);
+		else
+			ret = i915_sw_fence_await_dma_fence(&rq->submit, fence,
+							    I915_FENCE_TIMEOUT,
+							    GFP_KERNEL);
+		if (ret < 0)
+			return ret;
+	} while (--nchild);
+
+	return 0;
+}
+
 /**
  * i915_request_await_object - set this request to (async) wait upon a bo
  * @to: request we are wishing to use
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index cd6c130964cd..d4f6b2940130 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -265,6 +265,10 @@ int i915_request_await_object(struct i915_request *to,
 			      bool write);
 int i915_request_await_dma_fence(struct i915_request *rq,
 				 struct dma_fence *fence);
+int i915_request_await_execution(struct i915_request *rq,
+				 struct dma_fence *fence,
+				 void (*hook)(struct i915_request *rq,
+					      struct dma_fence *signal));
 
 void i915_request_add(struct i915_request *rq);
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 12/13] drm/i915/execlists: Virtual engine bonding
  2019-03-08 14:12 Home straight for veng, the uAPI wars Chris Wilson
                   ` (10 preceding siblings ...)
  2019-03-08 14:12 ` [PATCH 11/13] drm/i915: Extend execution fence to support a callback Chris Wilson
@ 2019-03-08 14:12 ` Chris Wilson
  2019-03-11 13:38   ` Tvrtko Ursulin
  2019-03-08 14:12 ` [PATCH 13/13] drm/i915: Allow specification of parallel execbuf Chris Wilson
                   ` (4 subsequent siblings)
  16 siblings, 1 reply; 58+ messages in thread
From: Chris Wilson @ 2019-03-08 14:12 UTC (permalink / raw)
  To: intel-gfx

Some users require that when a master batch is executed on one particular
engine, a companion batch is run simultaneously on a specific slave
engine. For this purpose, we introduce virtual engine bonding, allowing
maps of master:slaves to be constructed to constrain which physical
engines a virtual engine may select given a fence on a master engine.

For the moment, we continue to ignore the issue of preemption deferring
the master request for later. Ideally, we would like to then also remove
the slave and run something else rather than have it stall the pipeline.
With load balancing, we should be able to move workload around it, but
there is a similar stall on the master pipeline while it may wait for
the slave to be executed. At the cost of more latency for the bonded
request, it may be interesting to launch both on their engines in
lockstep. (Bubbles abound.)

Opens: Also what about bonding an engine as its own master? It doesn't
break anything internally, so allow the silliness.
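
For example (values illustrative; the submit fence that invokes
bond_execute is wired up by the next patch), given an engine map whose
slot 0 is a virtual engine built from two VCS siblings, userspace can
declare that whenever the master batch runs on rcs0, the bonded batch must
be placed on the second sibling:

	struct i915_context_engines_bond bond = {
		.base = { .name = I915_CONTEXT_ENGINES_EXT_BOND },
		.engine_index = 0,	/* the virtual engine in the map */
		.master_class = I915_ENGINE_CLASS_RENDER,
		.master_instance = 0,
		.sibling_mask = 0x2,	/* restrict to siblings[1] */
	};

The extension is chained into I915_CONTEXT_PARAM_ENGINES via
base.next_extension, after the load-balance extension that creates the
virtual engine.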

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem_context.c    |  50 ++++++
 drivers/gpu/drm/i915/i915_request.c        |   1 +
 drivers/gpu/drm/i915/i915_request.h        |   1 +
 drivers/gpu/drm/i915/intel_engine_types.h  |   7 +
 drivers/gpu/drm/i915/intel_lrc.c           | 111 ++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.h           |   4 +
 drivers/gpu/drm/i915/selftests/intel_lrc.c | 167 +++++++++++++++++++++
 include/uapi/drm/i915_drm.h                |  22 +++
 8 files changed, 363 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 13b79980f7f3..0d86306497b8 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -1484,8 +1484,58 @@ set_engines__load_balance(struct i915_user_extension __user *base, void *data)
 	return 0;
 }
 
+static int
+set_engines__bond(struct i915_user_extension __user *base, void *data)
+{
+	struct i915_context_engines_bond __user *ext =
+		container_of_user(base, typeof(*ext), base);
+	const struct set_engines *set = data;
+	struct intel_engine_cs *master;
+	u32 class, instance;
+	u64 siblings;
+	u16 idx;
+	int err;
+
+	if (get_user(idx, &ext->engine_index))
+		return -EFAULT;
+
+	if (idx >= set->nengine)
+		return -EINVAL;
+
+	idx = array_index_nospec(idx, set->nengine);
+	if (!set->engines[idx])
+		return -EINVAL;
+
+	/*
+	 * A non-virtual engine has 0 siblings to choose between; and submit
+	 * fence will always be directed to the one engine.
+	 */
+	if (!intel_engine_is_virtual(set->engines[idx]))
+		return 0;
+
+	err = check_user_mbz16(&ext->mbz);
+	if (err)
+		return err;
+
+	if (get_user(class, &ext->master_class))
+		return -EFAULT;
+
+	if (get_user(instance, &ext->master_instance))
+		return -EFAULT;
+
+	master = intel_engine_lookup_user(set->ctx->i915, class, instance);
+	if (!master)
+		return -EINVAL;
+
+	if (get_user(siblings, &ext->sibling_mask))
+		return -EFAULT;
+
+	return intel_virtual_engine_attach_bond(set->engines[idx],
+						master, siblings);
+}
+
 static const i915_user_extension_fn set_engines__extensions[] = {
 	[I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE] = set_engines__load_balance,
+	[I915_CONTEXT_ENGINES_EXT_BOND] = set_engines__bond,
 };
 
 static int
diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
index 5527ab22dbf2..0caf31de2b98 100644
--- a/drivers/gpu/drm/i915/i915_request.c
+++ b/drivers/gpu/drm/i915/i915_request.c
@@ -743,6 +743,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
 	rq->batch = NULL;
 	rq->capture_list = NULL;
 	rq->waitboost = false;
+	rq->execution_mask = ~0u;
 
 	/*
 	 * Reserve space in the ring buffer for all the commands required to
diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
index d4f6b2940130..862b25930de0 100644
--- a/drivers/gpu/drm/i915/i915_request.h
+++ b/drivers/gpu/drm/i915/i915_request.h
@@ -145,6 +145,7 @@ struct i915_request {
 	 */
 	struct i915_sched_node sched;
 	struct i915_dependency dep;
+	unsigned int execution_mask;
 
 	/*
 	 * A convenience pointer to the current breadcrumb value stored in
diff --git a/drivers/gpu/drm/i915/intel_engine_types.h b/drivers/gpu/drm/i915/intel_engine_types.h
index d54d2a1840cc..6dfcf5cc08c1 100644
--- a/drivers/gpu/drm/i915/intel_engine_types.h
+++ b/drivers/gpu/drm/i915/intel_engine_types.h
@@ -382,6 +382,13 @@ struct intel_engine_cs {
 	 */
 	void		(*submit_request)(struct i915_request *rq);
 
+	/*
+	 * Called on signaling of a SUBMIT_FENCE, passing along the signaling
+	 * request down to the bonded pairs.
+	 */
+	void            (*bond_execute)(struct i915_request *rq,
+					struct dma_fence *signal);
+
 	/*
 	 * Call when the priority on a request has changed and it and its
 	 * dependencies may need rescheduling. Note the request itself may
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 0c97e8f30223..f06312d185af 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -179,6 +179,12 @@ struct virtual_engine {
 		int prio;
 	} nodes[I915_NUM_ENGINES];
 
+	struct ve_bond {
+		struct intel_engine_cs *master;
+		unsigned int sibling_mask;
+	} *bonds;
+	unsigned int nbond;
+
 	unsigned int count;
 	struct intel_engine_cs *siblings[0];
 };
@@ -3183,6 +3189,7 @@ static const struct intel_context_ops virtual_context_ops = {
 static void virtual_submission_tasklet(unsigned long data)
 {
 	struct virtual_engine * const ve = (struct virtual_engine *)data;
+	unsigned int mask;
 	unsigned int n;
 	int prio;
 
@@ -3191,12 +3198,30 @@ static void virtual_submission_tasklet(unsigned long data)
 		return;
 
 	local_irq_disable();
+
+	mask = 0;
+	spin_lock(&ve->base.timeline.lock);
+	if (ve->request)
+		mask = ve->request->execution_mask;
+	spin_unlock(&ve->base.timeline.lock);
+
 	for (n = 0; READ_ONCE(ve->request) && n < ve->count; n++) {
 		struct intel_engine_cs *sibling = ve->siblings[n];
 		struct ve_node * const node = &ve->nodes[sibling->id];
 		struct rb_node **parent, *rb;
 		bool first;
 
+		if (unlikely(!(mask & sibling->mask))) {
+			if (!RB_EMPTY_NODE(&node->rb)) {
+				spin_lock(&sibling->timeline.lock);
+				rb_erase_cached(&node->rb,
+						&sibling->execlists.virtual);
+				RB_CLEAR_NODE(&node->rb);
+				spin_unlock(&sibling->timeline.lock);
+			}
+			continue;
+		}
+
 		spin_lock(&sibling->timeline.lock);
 
 		if (!RB_EMPTY_NODE(&node->rb)) {
@@ -3254,6 +3279,30 @@ static void virtual_submit_request(struct i915_request *request)
 	tasklet_schedule(&ve->base.execlists.tasklet);
 }
 
+static struct ve_bond *
+virtual_find_bond(struct virtual_engine *ve, struct intel_engine_cs *master)
+{
+	int i;
+
+	for (i = 0; i < ve->nbond; i++) {
+		if (ve->bonds[i].master == master)
+			return &ve->bonds[i];
+	}
+
+	return NULL;
+}
+
+static void
+virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal)
+{
+	struct virtual_engine *ve = to_virtual_engine(rq->engine);
+	struct ve_bond *bond;
+
+	bond = virtual_find_bond(ve, to_request(signal)->engine);
+	if (bond) /* XXX serialise with rq->lock? */
+		rq->execution_mask &= bond->sibling_mask;
+}
+
 struct intel_engine_cs *
 intel_execlists_create_virtual(struct i915_gem_context *ctx,
 			       struct intel_engine_cs **siblings,
@@ -3294,6 +3343,7 @@ intel_execlists_create_virtual(struct i915_gem_context *ctx,
 
 	ve->base.schedule = i915_schedule;
 	ve->base.submit_request = virtual_submit_request;
+	ve->base.bond_execute = virtual_bond_execute;
 
 	ve->base.execlists.queue_priority_hint = INT_MIN;
 	tasklet_init(&ve->base.execlists.tasklet,
@@ -3369,9 +3419,70 @@ intel_execlists_clone_virtual(struct i915_gem_context *ctx,
 	if (IS_ERR(dst))
 		return dst;
 
+	if (se->nbond) {
+		struct virtual_engine *de = to_virtual_engine(dst);
+
+		de->bonds = kmemdup(se->bonds,
+				    sizeof(*se->bonds) * se->nbond,
+				    GFP_KERNEL);
+		if (!de->bonds) {
+			intel_virtual_engine_destroy(dst);
+			return ERR_PTR(-ENOMEM);
+		}
+
+		de->nbond = se->nbond;
+	}
+
 	return dst;
 }
 
+static unsigned long
+virtual_execution_mask(struct virtual_engine *ve, unsigned long mask)
+{
+	unsigned long emask = 0;
+	int bit;
+
+	for_each_set_bit(bit, &mask, ve->count)
+		emask |= ve->siblings[bit]->mask;
+
+	return emask;
+}
+
+int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine,
+				     struct intel_engine_cs *master,
+				     unsigned long mask)
+{
+	struct virtual_engine *ve = to_virtual_engine(engine);
+	struct ve_bond *bond;
+
+	if (mask >> ve->count)
+		return -EINVAL;
+
+	mask = virtual_execution_mask(ve, mask);
+	if (!mask)
+		return -EINVAL;
+
+	bond = virtual_find_bond(ve, master);
+	if (bond) {
+		bond->sibling_mask |= mask;
+		return 0;
+	}
+
+	bond = krealloc(ve->bonds,
+			sizeof(*bond) * (ve->nbond + 1),
+			GFP_KERNEL);
+	if (!bond)
+		return -ENOMEM;
+
+	bond[ve->nbond].master = master;
+	bond[ve->nbond].sibling_mask = mask;
+
+	ve->bonds = bond;
+	ve->nbond++;
+
+	return 0;
+}
+
 void intel_virtual_engine_destroy(struct intel_engine_cs *engine)
 {
 	struct virtual_engine *ve = to_virtual_engine(engine);
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 9d90dc68e02b..77b85648045a 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -121,6 +121,10 @@ struct intel_engine_cs *
 intel_execlists_clone_virtual(struct i915_gem_context *ctx,
 			      struct intel_engine_cs *src);
 
+int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine,
+				     struct intel_engine_cs *master,
+				     unsigned long mask);
+
 void intel_virtual_engine_destroy(struct intel_engine_cs *engine);
 
 u32 gen8_make_rpcs(struct drm_i915_private *i915, struct intel_sseu *ctx_sseu);
diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
index 4b8a339529d1..a7de7a8fc24a 100644
--- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
+++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
@@ -13,6 +13,7 @@
 #include "igt_live_test.h"
 #include "igt_spinner.h"
 #include "i915_random.h"
+#include "lib_sw_fence.h"
 
 #include "mock_context.h"
 
@@ -1224,6 +1225,171 @@ static int live_virtual_engine(void *arg)
 	return err;
 }
 
+static int bond_virtual_engine(struct drm_i915_private *i915,
+			       unsigned int class,
+			       struct intel_engine_cs **siblings,
+			       unsigned int nsibling,
+			       unsigned int flags)
+#define BOND_SCHEDULE BIT(0)
+{
+	struct intel_engine_cs *master;
+	struct i915_gem_context *ctx;
+	struct i915_request *rq[16];
+	enum intel_engine_id id;
+	unsigned long n;
+	int err;
+
+	GEM_BUG_ON(nsibling >= ARRAY_SIZE(rq) - 1);
+
+	ctx = kernel_context(i915);
+	if (!ctx)
+		return -ENOMEM;
+
+	err = 0;
+	rq[0] = ERR_PTR(-ENOMEM);
+	for_each_engine(master, i915, id) {
+		struct i915_sw_fence fence;
+
+		if (master->class == class)
+			continue;
+
+		rq[0] = i915_request_alloc(master, ctx);
+		if (IS_ERR(rq[0])) {
+			err = PTR_ERR(rq[0]);
+			goto out;
+		}
+
+		if (flags & BOND_SCHEDULE)
+			onstack_fence_init(&fence);
+
+		i915_request_get(rq[0]);
+		i915_request_add(rq[0]);
+
+		for (n = 0; n < nsibling; n++) {
+			struct intel_engine_cs *engine;
+
+			engine = intel_execlists_create_virtual(ctx,
+								siblings,
+								nsibling);
+			if (IS_ERR(engine)) {
+				err = PTR_ERR(engine);
+				goto out;
+			}
+
+			err = intel_virtual_engine_attach_bond(engine,
+							       master,
+							       BIT(n));
+			if (err) {
+				intel_virtual_engine_destroy(engine);
+				goto out;
+			}
+
+			rq[n + 1] = i915_request_alloc(engine, ctx);
+			if (IS_ERR(rq[n + 1])) {
+				err = PTR_ERR(rq[n + 1]);
+				intel_virtual_engine_destroy(engine);
+				goto out;
+			}
+			i915_request_get(rq[n + 1]);
+
+			err = i915_request_await_execution(rq[n + 1],
+							   &rq[0]->fence,
+							   engine->bond_execute);
+			i915_request_add(rq[n + 1]);
+			intel_virtual_engine_destroy(engine);
+			if (err < 0)
+				goto out;
+		}
+		rq[n + 1] = ERR_PTR(-EINVAL);
+
+		if (flags & BOND_SCHEDULE)
+			onstack_fence_fini(&fence);
+
+		for (n = 0; n < nsibling; n++) {
+			if (i915_request_wait(rq[n + 1],
+					      I915_WAIT_LOCKED,
+					      MAX_SCHEDULE_TIMEOUT) < 0) {
+				err = -EIO;
+				goto out;
+			}
+
+			if (rq[n + 1]->engine != siblings[n]) {
+				pr_err("Bonded request did not execute on target engine: expected %s, used %s; master was %s\n",
+				       siblings[n]->name,
+				       rq[n + 1]->engine->name,
+				       rq[0]->engine->name);
+				err = -EINVAL;
+				goto out;
+			}
+		}
+
+		for (n = 0; !IS_ERR(rq[n]); n++)
+			i915_request_put(rq[n]);
+		rq[0] = ERR_PTR(-ENOMEM);
+	}
+
+out:
+	for (n = 0; !IS_ERR(rq[n]); n++)
+		i915_request_put(rq[n]);
+	if (igt_flush_test(i915, I915_WAIT_LOCKED))
+		err = -EIO;
+
+	kernel_context_close(ctx);
+	return err;
+}
+
+static int live_virtual_bond(void *arg)
+{
+	static const struct phase {
+		const char *name;
+		unsigned int flags;
+	} phases[] = {
+		{ "", 0 },
+		{ "schedule", BOND_SCHEDULE },
+		{ },
+	};
+	struct drm_i915_private *i915 = arg;
+	struct intel_engine_cs *siblings[MAX_ENGINE_INSTANCE + 1];
+	unsigned int class, inst;
+	int err = 0;
+
+	if (USES_GUC_SUBMISSION(i915))
+		return 0;
+
+	mutex_lock(&i915->drm.struct_mutex);
+
+	for (class = 0; class <= MAX_ENGINE_CLASS; class++) {
+		const struct phase *p;
+		int nsibling;
+
+		nsibling = 0;
+		for (inst = 0; inst <= MAX_ENGINE_INSTANCE; inst++) {
+			if (!i915->engine_class[class][inst])
+				break;
+
+			GEM_BUG_ON(nsibling == ARRAY_SIZE(siblings));
+			siblings[nsibling++] = i915->engine_class[class][inst];
+		}
+		if (nsibling < 2)
+			continue;
+
+		for (p = phases; p->name; p++) {
+			err = bond_virtual_engine(i915,
+						  class, siblings, nsibling,
+						  p->flags);
+			if (err) {
+				pr_err("%s(%s): failed class=%d, nsibling=%d, err=%d\n",
+				       __func__, p->name, class, nsibling, err);
+				goto out_unlock;
+			}
+		}
+	}
+
+out_unlock:
+	mutex_unlock(&i915->drm.struct_mutex);
+	return err;
+}
+
 int intel_execlists_live_selftests(struct drm_i915_private *i915)
 {
 	static const struct i915_subtest tests[] = {
@@ -1236,6 +1402,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
 		SUBTEST(live_preempt_hang),
 		SUBTEST(live_preempt_smoke),
 		SUBTEST(live_virtual_engine),
+		SUBTEST(live_virtual_bond),
 	};
 
 	if (!HAS_EXECLISTS(i915))
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 592b02676044..94e72ae954a0 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1530,6 +1530,10 @@ struct drm_i915_gem_context_param {
  * sized argument, will revert back to default settings.
  *
  * See struct i915_context_param_engines.
+ *
+ * Extensions:
+ *   i915_context_engines_load_balance (I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE)
+ *   i915_context_engines_bond (I915_CONTEXT_ENGINES_EXT_BOND)
  */
 #define I915_CONTEXT_PARAM_ENGINES	0xa
 /* Must be kept compact -- no holes and well documented */
@@ -1625,9 +1629,27 @@ struct i915_context_engines_load_balance {
 	__u64 mbz64[4]; /* reserved for future use; must be zero */
 };
 
+/*
+ * i915_context_engines_bond:
+ * Constrain the siblings a virtual engine may select, given a master engine.
+ */
+struct i915_context_engines_bond {
+	struct i915_user_extension base;
+
+	__u16 engine_index;
+	__u16 mbz;
+
+	__u16 master_class;
+	__u16 master_instance;
+
+	__u64 sibling_mask;
+	__u64 flags; /* all undefined flags must be zero */
+};
+
 struct i915_context_param_engines {
 	__u64 extensions; /* linked chain of extension blocks, 0 terminates */
 #define I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE 0
+#define I915_CONTEXT_ENGINES_EXT_BOND 1
 
 	struct {
 		__u16 engine_class; /* see enum drm_i915_gem_engine_class */
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* [PATCH 13/13] drm/i915: Allow specification of parallel execbuf
  2019-03-08 14:12 Home straight for veng, the uAPI wars Chris Wilson
                   ` (11 preceding siblings ...)
  2019-03-08 14:12 ` [PATCH 12/13] drm/i915/execlists: Virtual engine bonding Chris Wilson
@ 2019-03-08 14:12 ` Chris Wilson
  2019-03-11 13:40   ` Tvrtko Ursulin
  2019-03-08 14:58 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/13] drm/i915: Suppress the "Failed to idle" warning for gem_eio Patchwork
                   ` (3 subsequent siblings)
  16 siblings, 1 reply; 58+ messages in thread
From: Chris Wilson @ 2019-03-08 14:12 UTC (permalink / raw)
  To: intel-gfx

There is a desire to split a task onto two engines and have them run at
the same time, e.g. scanline interleaving to spread the workload evenly.
Through the use of the out-fence from the first execbuf, we can
coordinate the secondary execbuf to only become ready simultaneously with
the first, so that with all things idle the second execbufs are executed
in parallel with the first. The key difference here between the new
EXEC_FENCE_SUBMIT and the existing EXEC_FENCE_IN is that the in-fence
waits for the completion of the first request (so that all of its
rendering results are visible to the second execbuf, the more common
userspace fence requirement).

Since we only have a single input fence slot, userspace cannot mix an
in-fence and a submit-fence. It has to use one or the other! This is not
such a harsh requirement, since by virtue of the submit-fence, the
secondary execbuf inherits all of the dependencies from the first
request, and for the application the dependencies should be common
between the primary and secondary execbuf.
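
In IGT-flavoured pseudo-code (gem_execbuf_wr() and the engine selectors
are assumed helpers; the out fence fd is returned in the upper 32 bits of
rsvd2 and the submit fence is read from the lower 32 bits):

	execbuf[0].flags = engine[0] | I915_EXEC_FENCE_OUT;
	gem_execbuf_wr(fd, &execbuf[0]);

	execbuf[1].flags = engine[1] | I915_EXEC_FENCE_SUBMIT;
	execbuf[1].rsvd2 = execbuf[0].rsvd2 >> 32; /* out -> submit fence */
	gem_execbuf(fd, &execbuf[1]);

	close(execbuf[0].rsvd2 >> 32);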

Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Testcase: igt/gem_exec_fence/parallel
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c            |  1 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 25 +++++++++++++++++++++-
 include/uapi/drm/i915_drm.h                | 17 ++++++++++++++-
 3 files changed, 41 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 93e41c937d96..afdfced262e6 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -421,6 +421,7 @@ static int i915_getparam_ioctl(struct drm_device *dev, void *data,
 	case I915_PARAM_HAS_EXEC_CAPTURE:
 	case I915_PARAM_HAS_EXEC_BATCH_FIRST:
 	case I915_PARAM_HAS_EXEC_FENCE_ARRAY:
+	case I915_PARAM_HAS_EXEC_SUBMIT_FENCE:
 		/* For the time being all of these are always true;
 		 * if some supported hardware does not have one of these
 		 * features this value needs to be provided from
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 67e4a0c2ebff..8f14ea41d4e7 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -2285,6 +2285,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 {
 	struct i915_execbuffer eb;
 	struct dma_fence *in_fence = NULL;
+	struct dma_fence *exec_fence = NULL;
 	struct sync_file *out_fence = NULL;
 	intel_wakeref_t wakeref;
 	int out_fence_fd = -1;
@@ -2328,11 +2329,24 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 			return -EINVAL;
 	}
 
+	if (args->flags & I915_EXEC_FENCE_SUBMIT) {
+		if (in_fence) {
+			err = -EINVAL;
+			goto err_in_fence;
+		}
+
+		exec_fence = sync_file_get_fence(lower_32_bits(args->rsvd2));
+		if (!exec_fence) {
+			err = -EINVAL;
+			goto err_in_fence;
+		}
+	}
+
 	if (args->flags & I915_EXEC_FENCE_OUT) {
 		out_fence_fd = get_unused_fd_flags(O_CLOEXEC);
 		if (out_fence_fd < 0) {
 			err = out_fence_fd;
-			goto err_in_fence;
+			goto err_exec_fence;
 		}
 	}
 
@@ -2464,6 +2478,13 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 			goto err_request;
 	}
 
+	if (exec_fence) {
+		err = i915_request_await_execution(eb.request, exec_fence,
+						   eb.engine->bond_execute);
+		if (err < 0)
+			goto err_request;
+	}
+
 	if (fences) {
 		err = await_fence_array(&eb, fences);
 		if (err)
@@ -2524,6 +2545,8 @@ i915_gem_do_execbuffer(struct drm_device *dev,
 err_out_fence:
 	if (out_fence_fd != -1)
 		put_unused_fd(out_fence_fd);
+err_exec_fence:
+	dma_fence_put(exec_fence);
 err_in_fence:
 	dma_fence_put(in_fence);
 	return err;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 94e72ae954a0..a6cfd1232537 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -591,6 +591,12 @@ typedef struct drm_i915_irq_wait {
  */
 #define I915_PARAM_MMAP_GTT_COHERENT	52
 
+/*
+ * Query whether DRM_I915_GEM_EXECBUFFER2 supports coordination of parallel
+ * execution through use of explicit fence support.
+ * See I915_EXEC_FENCE_OUT and I915_EXEC_FENCE_SUBMIT.
+ */
+#define I915_PARAM_HAS_EXEC_SUBMIT_FENCE 53
 /* Must be kept compact -- no holes and well documented */
 
 typedef struct drm_i915_getparam {
@@ -1113,7 +1119,16 @@ struct drm_i915_gem_execbuffer2 {
  */
 #define I915_EXEC_FENCE_ARRAY   (1<<19)
 
-#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_FENCE_ARRAY<<1))
+/*
+ * Setting I915_EXEC_FENCE_SUBMIT implies that lower_32_bits(rsvd2) represent
+ * a sync_file fd to wait upon (in a nonblocking manner) prior to executing
+ * the batch.
+ *
+ * Returns -EINVAL if the sync_file fd cannot be found.
+ */
+#define I915_EXEC_FENCE_SUBMIT		(1 << 20)
+
+#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_FENCE_SUBMIT << 1))
 
 #define I915_EXEC_CONTEXT_ID_MASK	(0xffffffff)
 #define i915_execbuffer2_set_context_id(eb2, context) \
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* Re: [PATCH 02/13] drm/i915: Introduce the i915_user_extension_method
  2019-03-08 14:12 ` [PATCH 02/13] drm/i915: Introduce the i915_user_extension_method Chris Wilson
@ 2019-03-08 14:33   ` Tvrtko Ursulin
  2019-03-13 10:50     ` Chris Wilson
  0 siblings, 1 reply; 58+ messages in thread
From: Tvrtko Ursulin @ 2019-03-08 14:33 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 08/03/2019 14:12, Chris Wilson wrote:
> An idea for extending uABI inspired by Vulkan's extension chains.
> Instead of expanding the data struct for each ioctl every time we need
> to add a new feature, define an extension chain instead. As we add
> optional interfaces to control the ioctl, we define a new extension
> struct that can be linked into the ioctl data only when required by the
> user. The key advantage being able to ignore large control structs for
> optional interfaces/extensions, while being able to process them in a
> consistent manner.
> 
> In comparison to other extensible ioctls, the key difference is the
> use of a linked chain of extension structs vs an array of tagged
> pointers. For example,
> 
> struct drm_amdgpu_cs_chunk {
>          __u32           chunk_id;
>          __u32           length_dw;
>          __u64           chunk_data;
> };
> 
> struct drm_amdgpu_cs_in {
>          __u32           ctx_id;
>          __u32           bo_list_handle;
>          __u32           num_chunks;
>          __u32           _pad;
>          __u64           chunks;
> };
> 
> allows userspace to pass in an array of pointers to extension structs, but
> must therefore keep constructing that array alongside the command stream.
> In dynamic situations like that, a linked list is preferred and does not
> suffer from extra cache line misses, as the extension structs themselves
> must still be loaded separately from the chunks array.
> 
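As an aside for anyone reading along, the chained form is trivial for
userspace to build incrementally; a purely illustrative sketch (the
extension names here are hypothetical):

	struct i915_user_extension tail = {
		.name = EXTENSION_B,			/* hypothetical */
	};
	struct i915_user_extension head = {
		.next_extension = (uintptr_t)&tail,
		.name = EXTENSION_A,			/* hypothetical */
	};

	arg.extensions = (uintptr_t)&head;

Each new option is just another node linked into the chain, with no
side array to reallocate as the command stream grows.
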
> v2: Apply the tail call optimisation directly to nip the worry of stack
> overflow in the bud.
> v3: Defend against recursion.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/Makefile               |  1 +
>   drivers/gpu/drm/i915/i915_user_extensions.c | 43 +++++++++++++++++++++
>   drivers/gpu/drm/i915/i915_user_extensions.h | 20 ++++++++++
>   drivers/gpu/drm/i915/i915_utils.h           |  7 ++++
>   include/uapi/drm/i915_drm.h                 | 20 ++++++++++
>   5 files changed, 91 insertions(+)
>   create mode 100644 drivers/gpu/drm/i915/i915_user_extensions.c
>   create mode 100644 drivers/gpu/drm/i915/i915_user_extensions.h
> 
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index 68fecf355471..60de05f3fa60 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -46,6 +46,7 @@ i915-y := i915_drv.o \
>   	  i915_sw_fence.o \
>   	  i915_syncmap.o \
>   	  i915_sysfs.o \
> +	  i915_user_extensions.o \
>   	  intel_csr.o \
>   	  intel_device_info.o \
>   	  intel_pm.o \
> diff --git a/drivers/gpu/drm/i915/i915_user_extensions.c b/drivers/gpu/drm/i915/i915_user_extensions.c
> new file mode 100644
> index 000000000000..879b4094b2d7
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/i915_user_extensions.c
> @@ -0,0 +1,43 @@
> +/*
> + * SPDX-License-Identifier: MIT
> + *
> + * Copyright © 2018 Intel Corporation
> + */
> +
> +#include <linux/sched/signal.h>
> +#include <linux/uaccess.h>
> +#include <uapi/drm/i915_drm.h>
> +
> +#include "i915_user_extensions.h"
> +
> +int i915_user_extensions(struct i915_user_extension __user *ext,
> +			 const i915_user_extension_fn *tbl,
> +			 unsigned long count,
> +			 void *data)
> +{
> +	unsigned int stackdepth = 512;

I have doubts about the usefulness of trying to impose some limit now,
and also reservations about using the name "stackdepth". But both are
irrelevant implementation details at this stage, so meh.

> +
> +	while (ext) {
> +		int err;
> +		u64 x;
> +
> +		if (!stackdepth--) /* recursion vs useful flexibility */
> +			return -EINVAL;
> +
> +		if (get_user(x, &ext->name))
> +			return -EFAULT;
> +
> +		err = -EINVAL;
> +		if (x < count && tbl[x])
> +			err = tbl[x](ext, data);

How about:

		put_user(err, &ext->result);

And:

struct i915_user_extension {
	__u64 next_extension;
	__u64 name;
	__u32 result;
	__u32 mbz;
};

So we add the ability for each extension to store its exit code, giving
userspace the opportunity to know which one failed.

With this I would be satisfied that usability is future-proof enough.
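
A failing chain could then be pinpointed from userspace along these
lines (sketch only; the extension name and ioctl are hypothetical):

	struct i915_user_extension ext = {
		.name = SOME_EXTENSION,		/* hypothetical */
	};

	arg.extensions = (uintptr_t)&ext;
	if (drmIoctl(fd, SOME_IOCTL, &arg) && ext.result)
		fprintf(stderr, "extension %llu failed: %d\n",
			(unsigned long long)ext.name, (int)ext.result);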

Regards,

Tvrtko

> +		if (err)
> +			return err;
> +
> +		if (get_user(x, &ext->next_extension))
> +			return -EFAULT;
> +
> +		ext = u64_to_user_ptr(x);
> +	}
> +
> +	return 0;
> +}
> diff --git a/drivers/gpu/drm/i915/i915_user_extensions.h b/drivers/gpu/drm/i915/i915_user_extensions.h
> new file mode 100644
> index 000000000000..313a510b068a
> --- /dev/null
> +++ b/drivers/gpu/drm/i915/i915_user_extensions.h
> @@ -0,0 +1,20 @@
> +/*
> + * SPDX-License-Identifier: MIT
> + *
> + * Copyright © 2018 Intel Corporation
> + */
> +
> +#ifndef I915_USER_EXTENSIONS_H
> +#define I915_USER_EXTENSIONS_H
> +
> +struct i915_user_extension;
> +
> +typedef int (*i915_user_extension_fn)(struct i915_user_extension __user *ext,
> +				      void *data);
> +
> +int i915_user_extensions(struct i915_user_extension __user *ext,
> +			 const i915_user_extension_fn *tbl,
> +			 unsigned long count,
> +			 void *data);
> +
> +#endif /* I915_USER_EXTENSIONS_H */
> diff --git a/drivers/gpu/drm/i915/i915_utils.h b/drivers/gpu/drm/i915/i915_utils.h
> index 9726df37c4c4..fcc751aa1ea8 100644
> --- a/drivers/gpu/drm/i915/i915_utils.h
> +++ b/drivers/gpu/drm/i915/i915_utils.h
> @@ -105,6 +105,13 @@
>   	__T;								\
>   })
>   
> +#define container_of_user(ptr, type, member) ({				\
> +	void __user *__mptr = (void __user *)(ptr);			\
> +	BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) &&	\
> +			 !__same_type(*(ptr), void),			\
> +			 "pointer type mismatch in container_of()");	\
> +	((type __user *)(__mptr - offsetof(type, member))); })
> +
>   static inline u64 ptr_to_u64(const void *ptr)
>   {
>   	return (uintptr_t)ptr;
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index aa2d4c73a97d..39835793722b 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -62,6 +62,26 @@ extern "C" {
>   #define I915_ERROR_UEVENT		"ERROR"
>   #define I915_RESET_UEVENT		"RESET"
>   
> +/*
> + * i915_user_extension: Base class for defining a chain of extensions
> + *
> + * Many interfaces need to grow over time. In most cases we can simply
> + * extend the struct and have userspace pass in more data. Another option,
> + * as demonstrated by Vulkan's approach to providing extensions for forward
> + * and backward compatibility, is to use a list of optional structs to
> + * provide those extra details.
> + *
> + * The key advantage to using an extension chain is that it allows us to
> + * redefine the interface more easily than an ever growing struct of
> + * increasing complexity, and for large parts of that interface to be
> + * entirely optional. The downside is more pointer chasing; chasing across
> + * the __user boundary with pointers encapsulated inside u64.
> + */
> +struct i915_user_extension {
> +	__u64 next_extension;
> +	__u64 name;
> +};
> +
>   /*
>    * MOCS indexes used for GPU surfaces, defining the cacheability of the
>    * surface data and the coherency for this data wrt. CPU vs. GPU accesses.
> 
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/13] drm/i915: Suppress the "Failed to idle" warning for gem_eio
  2019-03-08 14:12 Home straight for veng, the uAPI wars Chris Wilson
                   ` (12 preceding siblings ...)
  2019-03-08 14:12 ` [PATCH 13/13] drm/i915: Allow specification of parallel execbuf Chris Wilson
@ 2019-03-08 14:58 ` Patchwork
  2019-03-08 15:05 ` ✗ Fi.CI.SPARSE: " Patchwork
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 58+ messages in thread
From: Patchwork @ 2019-03-08 14:58 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/13] drm/i915: Suppress the "Failed to idle" warning for gem_eio
URL   : https://patchwork.freedesktop.org/series/57742/
State : warning

== Summary ==

$ dim checkpatch origin/drm-tip
4dedd46a945e drm/i915: Suppress the "Failed to idle" warning for gem_eio
-:21: WARNING:COMMIT_LOG_LONG_LINE: Possible unwrapped commit description (prefer a maximum 75 chars per line)
#21: 
Reference: 5861b013e2c7 ("drm/i915: Do a synchronous switch-to-kernel-context on idling")

-:21: ERROR:GIT_COMMIT_ID: Please use git commit description style 'commit <12+ chars of sha1> ("<title line>")' - ie: 'commit 5861b013e2c7 ("drm/i915: Do a synchronous switch-to-kernel-context on idling")'
#21: 
Reference: 5861b013e2c7 ("drm/i915: Do a synchronous switch-to-kernel-context on idling")

total: 1 errors, 1 warnings, 0 checks, 16 lines checked
c04c584b0234 drm/i915: Introduce the i915_user_extension_method
-:58: WARNING:FILE_PATH_CHANGES: added, moved or deleted file(s), does MAINTAINERS need updating?
#58: 
new file mode 100644

-:63: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#63: FILE: drivers/gpu/drm/i915/i915_user_extensions.c:1:
+/*

-:112: WARNING:SPDX_LICENSE_TAG: Missing or malformed SPDX-License-Identifier tag in line 1
#112: FILE: drivers/gpu/drm/i915/i915_user_extensions.h:1:
+/*

-:140: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'ptr' - possible side-effects?
#140: FILE: drivers/gpu/drm/i915/i915_utils.h:108:
+#define container_of_user(ptr, type, member) ({				\
+	void __user *__mptr = (void __user *)(ptr);			\
+	BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) &&	\
+			 !__same_type(*(ptr), void),			\
+			 "pointer type mismatch in container_of()");	\
+	((type __user *)(__mptr - offsetof(type, member))); })

-:140: CHECK:MACRO_ARG_REUSE: Macro argument reuse 'member' - possible side-effects?
#140: FILE: drivers/gpu/drm/i915/i915_utils.h:108:
+#define container_of_user(ptr, type, member) ({				\
+	void __user *__mptr = (void __user *)(ptr);			\
+	BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) &&	\
+			 !__same_type(*(ptr), void),			\
+			 "pointer type mismatch in container_of()");	\
+	((type __user *)(__mptr - offsetof(type, member))); })

-:140: CHECK:MACRO_ARG_PRECEDENCE: Macro argument 'member' may be better as '(member)' to avoid precedence issues
#140: FILE: drivers/gpu/drm/i915/i915_utils.h:108:
+#define container_of_user(ptr, type, member) ({				\
+	void __user *__mptr = (void __user *)(ptr);			\
+	BUILD_BUG_ON_MSG(!__same_type(*(ptr), ((type *)0)->member) &&	\
+			 !__same_type(*(ptr), void),			\
+			 "pointer type mismatch in container_of()");	\
+	((type __user *)(__mptr - offsetof(type, member))); })

total: 0 errors, 3 warnings, 3 checks, 109 lines checked
7b994f43e7c4 drm/i915: Introduce a context barrier callback
b24cfcf439bd drm/i915: Create/destroy VM (ppGTT) for use with contexts
-:40: CHECK:UNCOMMENTED_DEFINITION: struct mutex definition without comment
#40: FILE: drivers/gpu/drm/i915/i915_drv.h:221:
+	struct mutex vm_lock;

-:551: WARNING:LINE_SPACING: Missing a blank line after declarations
#551: FILE: drivers/gpu/drm/i915/selftests/i915_gem_context.c:503:
+		struct drm_file *file;
+		IGT_TIMEOUT(end_time);

-:613: CHECK:MULTIPLE_ASSIGNMENTS: multiple assignments should be avoided
#613: FILE: drivers/gpu/drm/i915/selftests/i915_gem_context.c:565:
+		ncontexts = dw = 0;

-:688: WARNING:LINE_SPACING: Missing a blank line after declarations
#688: FILE: drivers/gpu/drm/i915/selftests/i915_gem_context.c:633:
+		struct i915_gem_context *ctx = NULL;
+		IGT_TIMEOUT(end_time);

-:746: CHECK:MULTIPLE_ASSIGNMENTS: multiple assignments should be avoided
#746: FILE: drivers/gpu/drm/i915/selftests/i915_gem_context.c:686:
+		ncontexts = dw = 0;

-:853: WARNING:LONG_LINE: line over 100 characters
#853: FILE: include/uapi/drm/i915_drm.h:405:
+#define DRM_IOCTL_I915_GEM_VM_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_CREATE, struct drm_i915_gem_vm_control)

-:854: WARNING:LONG_LINE: line over 100 characters
#854: FILE: include/uapi/drm/i915_drm.h:406:
+#define DRM_IOCTL_I915_GEM_VM_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_VM_DESTROY, struct drm_i915_gem_vm_control)

-:854: WARNING:SPACING: space prohibited between function name and open parenthesis '('
#854: FILE: include/uapi/drm/i915_drm.h:406:
+#define DRM_IOCTL_I915_GEM_VM_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_VM_DESTROY, struct drm_i915_gem_vm_control)

-:854: ERROR:COMPLEX_MACRO: Macros with complex values should be enclosed in parentheses
#854: FILE: include/uapi/drm/i915_drm.h:406:
+#define DRM_IOCTL_I915_GEM_VM_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_VM_DESTROY, struct drm_i915_gem_vm_control)

total: 1 errors, 5 warnings, 3 checks, 809 lines checked
821206137812 drm/i915: Extend CONTEXT_CREATE to set parameters upon construction
-:28: WARNING:LONG_LINE: line over 100 characters
#28: FILE: drivers/gpu/drm/i915/i915_drv.c:3113:
+	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_CREATE_EXT, i915_gem_context_create_ioctl, DRM_RENDER_ALLOW),

-:535: WARNING:LONG_LINE: line over 100 characters
#535: FILE: include/uapi/drm/i915_drm.h:395:
+#define DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE, struct drm_i915_gem_context_create_ext)

-:535: WARNING:SPACING: space prohibited between function name and open parenthesis '('
#535: FILE: include/uapi/drm/i915_drm.h:395:
+#define DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE, struct drm_i915_gem_context_create_ext)

-:535: ERROR:COMPLEX_MACRO: Macros with complex values should be enclosed in parentheses
#535: FILE: include/uapi/drm/i915_drm.h:395:
+#define DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE, struct drm_i915_gem_context_create_ext)

total: 1 errors, 3 warnings, 0 checks, 680 lines checked
fa2d02c9d176 drm/i915: Allow contexts to share a single timeline across all engines
045e92a606a6 drm/i915: Allow userspace to clone contexts on creation
a2eea5288054 drm/i915: Allow a context to define its set of engines
cd2cecf7bc25 drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[]
d1b97ccd96c6 drm/i915: Load balancing across a virtual engine
-:914: WARNING:LINE_SPACING: Missing a blank line after declarations
#914: FILE: drivers/gpu/drm/i915/intel_lrc.c:3347:
+		struct intel_engine_cs *actual = ve->siblings[0];
+		virtual_engine_free(&ve->kref);

total: 0 errors, 1 warnings, 0 checks, 1126 lines checked
27b55d0ef7aa drm/i915: Extend execution fence to support a callback
0ece4182687d drm/i915/execlists: Virtual engine bonding
9103b86a1051 drm/i915: Allow specification of parallel execbuf

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 04/13] drm/i915: Create/destroy VM (ppGTT) for use with contexts
  2019-03-08 14:12 ` [PATCH 04/13] drm/i915: Create/destroy VM (ppGTT) for use with contexts Chris Wilson
@ 2019-03-08 15:03   ` Tvrtko Ursulin
  2019-03-08 15:35     ` Chris Wilson
  2019-03-08 15:41   ` [PATCH v2] " Chris Wilson
  1 sibling, 1 reply; 58+ messages in thread
From: Tvrtko Ursulin @ 2019-03-08 15:03 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 08/03/2019 14:12, Chris Wilson wrote:
> In preparation to making the ppGTT binding for a context explicit (to
> facilitate reusing the same ppGTT between different contexts), allow the
> user to create and destroy named ppGTT.
> 
> v2: Replace global barrier for swapping over the ppgtt and tlbs with a
> local context barrier (Tvrtko)
> v3: serialise with struct_mutex; it's lazy but required dammit
> v4: Rewrite igt_ctx_shared_exec to be more different (aimed to be more
> similar, turned out different!)
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_drv.c               |   2 +
>   drivers/gpu/drm/i915/i915_drv.h               |   3 +
>   drivers/gpu/drm/i915/i915_gem_context.c       | 254 +++++++++++++++++-
>   drivers/gpu/drm/i915/i915_gem_context.h       |   5 +
>   drivers/gpu/drm/i915/i915_gem_gtt.c           |  17 +-
>   drivers/gpu/drm/i915/i915_gem_gtt.h           |  16 +-
>   drivers/gpu/drm/i915/selftests/huge_pages.c   |   1 -
>   .../gpu/drm/i915/selftests/i915_gem_context.c | 215 ++++++++++++---
>   drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |   1 -
>   drivers/gpu/drm/i915/selftests/mock_context.c |   8 +-
>   include/uapi/drm/i915_drm.h                   |  36 +++
>   11 files changed, 497 insertions(+), 61 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 0d743907e7bc..5d53efc4c5d9 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -3121,6 +3121,8 @@ static const struct drm_ioctl_desc i915_ioctls[] = {
>   	DRM_IOCTL_DEF_DRV(I915_PERF_ADD_CONFIG, i915_perf_add_config_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
>   	DRM_IOCTL_DEF_DRV(I915_PERF_REMOVE_CONFIG, i915_perf_remove_config_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
>   	DRM_IOCTL_DEF_DRV(I915_QUERY, i915_query_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(I915_GEM_VM_CREATE, i915_gem_vm_create_ioctl, DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(I915_GEM_VM_DESTROY, i915_gem_vm_destroy_ioctl, DRM_RENDER_ALLOW),
>   };
>   
>   static struct drm_driver driver = {
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index c4ffe19ec698..8c4eb302cc0b 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -218,6 +218,9 @@ struct drm_i915_file_private {
>   	} mm;
>   	struct idr context_idr;
>   
> +	struct mutex vm_lock;
> +	struct idr vm_idr;
> +
>   	unsigned int bsd_engine;
>   
>   /*
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index b6370225dcb5..fb2aba06f693 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -126,6 +126,8 @@ static void lut_close(struct i915_gem_context *ctx)
>   		struct i915_vma *vma = rcu_dereference_raw(*slot);
>   
>   		radix_tree_iter_delete(&ctx->handles_vma, &iter, slot);
> +
> +		vma->open_count--;

Okay, figured it out on the N-th attempt... I think. The lut gets
unlinked from the obj lut list before this decrement, so there cannot
be a double decrement.

>   		__i915_gem_object_release_unless_active(vma->obj);
>   	}
>   	rcu_read_unlock();
> @@ -306,7 +308,7 @@ static void context_close(struct i915_gem_context *ctx)
>   	 */
>   	lut_close(ctx);
>   	if (ctx->ppgtt)
> -		i915_ppgtt_close(&ctx->ppgtt->vm);
> +		i915_ppgtt_close(ctx->ppgtt);
>   
>   	ctx->file_priv = ERR_PTR(-EBADF);
>   	i915_gem_context_put(ctx);
> @@ -417,6 +419,32 @@ static void __destroy_hw_context(struct i915_gem_context *ctx,
>   	context_close(ctx);
>   }
>   
> +static struct i915_hw_ppgtt *
> +__set_ppgtt(struct i915_gem_context *ctx, struct i915_hw_ppgtt *ppgtt)
> +{
> +	struct i915_hw_ppgtt *old = ctx->ppgtt;
> +
> +	i915_ppgtt_open(ppgtt);
> +	ctx->ppgtt = i915_ppgtt_get(ppgtt);
> +
> +	ctx->desc_template = default_desc_template(ctx->i915, ppgtt);
> +
> +	return old;
> +}
> +
> +static void __assign_ppgtt(struct i915_gem_context *ctx,
> +			   struct i915_hw_ppgtt *ppgtt)
> +{
> +	if (ppgtt == ctx->ppgtt)
> +		return;
> +
> +	ppgtt = __set_ppgtt(ctx, ppgtt);
> +	if (ppgtt) {
> +		i915_ppgtt_close(ppgtt);
> +		i915_ppgtt_put(ppgtt);
> +	}
> +}
> +
>   static struct i915_gem_context *
>   i915_gem_create_context(struct drm_i915_private *dev_priv,
>   			struct drm_i915_file_private *file_priv)
> @@ -443,8 +471,8 @@ i915_gem_create_context(struct drm_i915_private *dev_priv,
>   			return ERR_CAST(ppgtt);
>   		}
>   
> -		ctx->ppgtt = ppgtt;
> -		ctx->desc_template = default_desc_template(dev_priv, ppgtt);
> +		__assign_ppgtt(ctx, ppgtt);
> +		i915_ppgtt_put(ppgtt);
>   	}
>   
>   	trace_i915_context_create(ctx);
> @@ -625,19 +653,29 @@ static int context_idr_cleanup(int id, void *p, void *data)
>   	return 0;
>   }
>   
> +static int vm_idr_cleanup(int id, void *p, void *data)
> +{
> +	i915_ppgtt_put(p);
> +	return 0;
> +}
> +
>   int i915_gem_context_open(struct drm_i915_private *i915,
>   			  struct drm_file *file)
>   {
>   	struct drm_i915_file_private *file_priv = file->driver_priv;
>   	struct i915_gem_context *ctx;
>   
> +	mutex_init(&file_priv->vm_lock);
> +
>   	idr_init(&file_priv->context_idr);
> +	idr_init_base(&file_priv->vm_idr, 1);
>   
>   	mutex_lock(&i915->drm.struct_mutex);
>   	ctx = i915_gem_create_context(i915, file_priv);
>   	mutex_unlock(&i915->drm.struct_mutex);
>   	if (IS_ERR(ctx)) {
>   		idr_destroy(&file_priv->context_idr);
> +		idr_destroy(&file_priv->vm_idr);
>   		return PTR_ERR(ctx);
>   	}
>   
> @@ -654,6 +692,89 @@ void i915_gem_context_close(struct drm_file *file)
>   
>   	idr_for_each(&file_priv->context_idr, context_idr_cleanup, NULL);
>   	idr_destroy(&file_priv->context_idr);
> +
> +	idr_for_each(&file_priv->vm_idr, vm_idr_cleanup, NULL);
> +	idr_destroy(&file_priv->vm_idr);
> +
> +	mutex_destroy(&file_priv->vm_lock);
> +}
> +
> +int i915_gem_vm_create_ioctl(struct drm_device *dev, void *data,
> +			     struct drm_file *file)
> +{
> +	struct drm_i915_private *i915 = to_i915(dev);
> +	struct drm_i915_gem_vm_control *args = data;
> +	struct drm_i915_file_private *file_priv = file->driver_priv;
> +	struct i915_hw_ppgtt *ppgtt;
> +	int err;
> +
> +	if (!HAS_FULL_PPGTT(i915))
> +		return -ENODEV;
> +
> +	if (args->flags)
> +		return -EINVAL;
> +
> +	if (args->extensions)
> +		return -EINVAL;
> +
> +	ppgtt = i915_ppgtt_create(i915, file_priv);
> +	if (IS_ERR(ppgtt))
> +		return PTR_ERR(ppgtt);
> +
> +	err = mutex_lock_interruptible(&file_priv->vm_lock);
> +	if (err)
> +		goto err_put;
> +
> +	err = idr_alloc(&file_priv->vm_idr, ppgtt, 0, 0, GFP_KERNEL);
> +	mutex_unlock(&file_priv->vm_lock);
> +	if (err < 0)
> +		goto err_put;
> +
> +	GEM_BUG_ON(err == 0); /* reserved for default/unassigned ppgtt */
> +	ppgtt->user_handle = err;
> +	args->id = err;
> +	return 0;
> +
> +err_put:
> +	i915_ppgtt_put(ppgtt);
> +	return err;
> +}
> +
> +int i915_gem_vm_destroy_ioctl(struct drm_device *dev, void *data,
> +			      struct drm_file *file)
> +{
> +	struct drm_i915_file_private *file_priv = file->driver_priv;
> +	struct drm_i915_gem_vm_control *args = data;
> +	struct i915_hw_ppgtt *ppgtt;
> +	int err;
> +	u32 id;
> +
> +	if (args->flags)
> +		return -EINVAL;
> +
> +	if (args->extensions)
> +		return -EINVAL;
> +
> +	id = args->id;
> +	if (!id)
> +		return -ENOENT;
> +
> +	err = mutex_lock_interruptible(&file_priv->vm_lock);
> +	if (err)
> +		return err;
> +
> +	ppgtt = idr_remove(&file_priv->vm_idr, id);
> +	if (ppgtt) {
> +		GEM_BUG_ON(!ppgtt->user_handle);
> +		ppgtt->user_handle = 0;
> +	}
> +
> +	mutex_unlock(&file_priv->vm_lock);
> +	if (!ppgtt)
> +		return -ENOENT;
> +
> +	i915_ppgtt_put(ppgtt);
> +	return 0;
>   }
>   
>   static struct i915_request *
> @@ -799,6 +920,120 @@ int i915_gem_switch_to_kernel_context(struct drm_i915_private *i915,
>   	return 0;
>   }
>   
> +static int get_ppgtt(struct i915_gem_context *ctx,
> +		     struct drm_i915_gem_context_param *args)
> +{
> +	struct drm_i915_file_private *file_priv = ctx->file_priv;
> +	struct i915_hw_ppgtt *ppgtt;
> +	int ret;
> +
> +	if (!ctx->ppgtt)
> +		return -ENODEV;
> +
> +	/* XXX rcu acquire? */
> +	ret = mutex_lock_interruptible(&ctx->i915->drm.struct_mutex);
> +	if (ret)
> +		return ret;
> +
> +	ppgtt = i915_ppgtt_get(ctx->ppgtt);
> +	mutex_unlock(&ctx->i915->drm.struct_mutex);
> +
> +	ret = mutex_lock_interruptible(&file_priv->vm_lock);
> +	if (ret)
> +		goto err_put;
> +
> +	if (!ppgtt->user_handle) {
> +		ret = idr_alloc(&file_priv->vm_idr, ppgtt, 0, 0, GFP_KERNEL);
> +		GEM_BUG_ON(!ret);
> +		if (ret < 0)
> +			goto err_unlock;
> +
> +		ppgtt->user_handle = ret;
> +		i915_ppgtt_get(ppgtt);
> +	}
> +
> +	args->size = 0;
> +	args->value = ppgtt->user_handle;
> +
> +	ret = 0;
> +err_unlock:
> +	mutex_unlock(&file_priv->vm_lock);
> +err_put:
> +	i915_ppgtt_put(ppgtt);
> +	return ret;
> +}
> +
> +static void set_ppgtt_barrier(void *data)
> +{
> +	struct i915_hw_ppgtt *old = data;
> +
> +	i915_ppgtt_close(old);
> +	i915_ppgtt_put(old);
> +}
> +
> +static int set_ppgtt(struct i915_gem_context *ctx,
> +		     struct drm_i915_gem_context_param *args)
> +{
> +	struct drm_i915_file_private *file_priv = ctx->file_priv;
> +	struct i915_hw_ppgtt *ppgtt, *old;
> +	int err;
> +
> +	if (args->size)
> +		return -EINVAL;
> +
> +	if (upper_32_bits(args->value))
> +		return -EINVAL;
> +
> +	if (!ctx->ppgtt)
> +		return -ENODEV;
> +
> +	err = mutex_lock_interruptible(&file_priv->vm_lock);
> +	if (err)
> +		return err;
> +
> +	ppgtt = idr_find(&file_priv->vm_idr, args->value);
> +	if (ppgtt) {
> +		GEM_BUG_ON(ppgtt->user_handle != args->value);
> +		i915_ppgtt_get(ppgtt);
> +	}
> +	mutex_unlock(&file_priv->vm_lock);
> +	if (!ppgtt)
> +		return -ENOENT;
> +
> +	err = mutex_lock_interruptible(&ctx->i915->drm.struct_mutex);
> +	if (err)
> +		goto out;
> +
> +	if (ppgtt == ctx->ppgtt)
> +		goto unlock;
> +
> +	/* Teardown the existing obj:vma cache, it will have to be rebuilt. */
> +	lut_close(ctx);
> +
> +	old = __set_ppgtt(ctx, ppgtt);
> +
> +	/*
> +	 * We need to flush any requests using the current ppgtt before
> +	 * we release it as the requests do not hold a reference themselves,
> +	 * only indirectly through the context.
> +	 */
> +	err = context_barrier_task(ctx, ALL_ENGINES, set_ppgtt_barrier, old);
> +	if (err) {
> +		ctx->ppgtt = old;
> +		ctx->desc_template = default_desc_template(ctx->i915, old);
> +
> +		i915_ppgtt_close(ppgtt);
> +		i915_ppgtt_put(ppgtt);
> +	}
> +
> +unlock:
> +	mutex_unlock(&ctx->i915->drm.struct_mutex);
> +
> +out:
> +	i915_ppgtt_put(ppgtt);
> +	return err;
> +}
> +
>   static bool client_is_banned(struct drm_i915_file_private *file_priv)
>   {
>   	return atomic_read(&file_priv->ban_score) >= I915_CLIENT_SCORE_BANNED;
> @@ -973,6 +1208,9 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
>   	case I915_CONTEXT_PARAM_SSEU:
>   		ret = get_sseu(ctx, args);
>   		break;
> +	case I915_CONTEXT_PARAM_VM:
> +		ret = get_ppgtt(ctx, args);
> +		break;
>   	default:
>   		ret = -EINVAL;
>   		break;
> @@ -1274,9 +1512,6 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>   		return -ENOENT;
>   
>   	switch (args->param) {
> -	case I915_CONTEXT_PARAM_BAN_PERIOD:
> -		ret = -EINVAL;
> -		break;
>   	case I915_CONTEXT_PARAM_NO_ZEROMAP:
>   		if (args->size)
>   			ret = -EINVAL;
> @@ -1332,9 +1567,16 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
>   					I915_USER_PRIORITY(priority);
>   		}
>   		break;
> +
>   	case I915_CONTEXT_PARAM_SSEU:
>   		ret = set_sseu(ctx, args);
>   		break;
> +
> +	case I915_CONTEXT_PARAM_VM:
> +		ret = set_ppgtt(ctx, args);
> +		break;
> +
> +	case I915_CONTEXT_PARAM_BAN_PERIOD:
>   	default:
>   		ret = -EINVAL;
>   		break;
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
> index 5a32c4b4816f..1e670372892c 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.h
> +++ b/drivers/gpu/drm/i915/i915_gem_context.h
> @@ -153,6 +153,11 @@ void i915_gem_context_release(struct kref *ctx_ref);
>   struct i915_gem_context *
>   i915_gem_context_create_gvt(struct drm_device *dev);
>   
> +int i915_gem_vm_create_ioctl(struct drm_device *dev, void *data,
> +			     struct drm_file *file);
> +int i915_gem_vm_destroy_ioctl(struct drm_device *dev, void *data,
> +			      struct drm_file *file);
> +
>   int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
>   				  struct drm_file *file);
>   int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index dac08d9c3fab..d717952cc430 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -2099,10 +2099,21 @@ i915_ppgtt_create(struct drm_i915_private *i915,
>   	return ppgtt;
>   }
>   
> -void i915_ppgtt_close(struct i915_address_space *vm)
> +void i915_ppgtt_open(struct i915_hw_ppgtt *ppgtt)
>   {
> -	GEM_BUG_ON(vm->closed);
> -	vm->closed = true;
> +	GEM_BUG_ON(ppgtt->vm.closed);
> +
> +	ppgtt->open_count++;
> +}
> +
> +void i915_ppgtt_close(struct i915_hw_ppgtt *ppgtt)
> +{
> +	GEM_BUG_ON(!ppgtt->open_count);
> +	if (--ppgtt->open_count)
> +		return;
> +
> +	GEM_BUG_ON(ppgtt->vm.closed);
> +	ppgtt->vm.closed = true;
>   }
>   
>   static void ppgtt_destroy_vma(struct i915_address_space *vm)
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
> index a47e11e6fc1b..25d5f7682bda 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.h
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
> @@ -391,11 +391,15 @@ struct i915_hw_ppgtt {
>   	struct kref ref;
>   
>   	unsigned long pd_dirty_engines;
> +	unsigned int open_count;
> +
>   	union {
>   		struct i915_pml4 pml4;		/* GEN8+ & 48b PPGTT */
>   		struct i915_page_directory_pointer pdp;	/* GEN8+ */
>   		struct i915_page_directory pd;		/* GEN6-7 */
>   	};
> +
> +	u32 user_handle;
>   };
>   
>   struct gen6_hw_ppgtt {
> @@ -606,12 +610,16 @@ int i915_ppgtt_init_hw(struct drm_i915_private *dev_priv);
>   void i915_ppgtt_release(struct kref *kref);
>   struct i915_hw_ppgtt *i915_ppgtt_create(struct drm_i915_private *dev_priv,
>   					struct drm_i915_file_private *fpriv);
> -void i915_ppgtt_close(struct i915_address_space *vm);
> -static inline void i915_ppgtt_get(struct i915_hw_ppgtt *ppgtt)
> +
> +void i915_ppgtt_open(struct i915_hw_ppgtt *ppgtt);
> +void i915_ppgtt_close(struct i915_hw_ppgtt *ppgtt);
> +
> +static inline struct i915_hw_ppgtt *i915_ppgtt_get(struct i915_hw_ppgtt *ppgtt)
>   {
> -	if (ppgtt)
> -		kref_get(&ppgtt->ref);
> +	kref_get(&ppgtt->ref);
> +	return ppgtt;
>   }
> +
>   static inline void i915_ppgtt_put(struct i915_hw_ppgtt *ppgtt)
>   {
>   	if (ppgtt)
> diff --git a/drivers/gpu/drm/i915/selftests/huge_pages.c b/drivers/gpu/drm/i915/selftests/huge_pages.c
> index 1e66cff985f8..0b7740dc18cb 100644
> --- a/drivers/gpu/drm/i915/selftests/huge_pages.c
> +++ b/drivers/gpu/drm/i915/selftests/huge_pages.c
> @@ -1734,7 +1734,6 @@ int i915_gem_huge_page_mock_selftests(void)
>   	err = i915_subtests(tests, ppgtt);
>   
>   out_close:
> -	i915_ppgtt_close(&ppgtt->vm);
>   	i915_ppgtt_put(ppgtt);
>   
>   out_unlock:
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
> index 664ae1428ecc..c4a5cf26992e 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
> @@ -372,7 +372,8 @@ static int cpu_fill(struct drm_i915_gem_object *obj, u32 value)
>   	return 0;
>   }
>   
> -static int cpu_check(struct drm_i915_gem_object *obj, unsigned int max)
> +static noinline int cpu_check(struct drm_i915_gem_object *obj,
> +			      unsigned int idx, unsigned int max)
>   {
>   	unsigned int n, m, needs_flush;
>   	int err;
> @@ -390,8 +391,10 @@ static int cpu_check(struct drm_i915_gem_object *obj, unsigned int max)
>   
>   		for (m = 0; m < max; m++) {
>   			if (map[m] != m) {
> -				pr_err("Invalid value at page %d, offset %d: found %x expected %x\n",
> -				       n, m, map[m], m);
> +				pr_err("%pS: Invalid value at object %d page %d/%ld, offset %d/%d: found %x expected %x\n",
> +				       __builtin_return_address(0), idx,
> +				       n, real_page_count(obj), m, max,
> +				       map[m], m);
>   				err = -EINVAL;
>   				goto out_unmap;
>   			}
> @@ -399,8 +402,9 @@ static int cpu_check(struct drm_i915_gem_object *obj, unsigned int max)
>   
>   		for (; m < DW_PER_PAGE; m++) {
>   			if (map[m] != STACK_MAGIC) {
> -				pr_err("Invalid value at page %d, offset %d: found %x expected %x\n",
> -				       n, m, map[m], STACK_MAGIC);
> +				pr_err("%pS: Invalid value at object %d page %d, offset %d: found %x expected %x (uninitialised)\n",
> +				       __builtin_return_address(0), idx, n, m,
> +				       map[m], STACK_MAGIC);
>   				err = -EINVAL;
>   				goto out_unmap;
>   			}
> @@ -478,12 +482,8 @@ static unsigned long max_dwords(struct drm_i915_gem_object *obj)
>   static int igt_ctx_exec(void *arg)
>   {
>   	struct drm_i915_private *i915 = arg;
> -	struct drm_i915_gem_object *obj = NULL;
> -	unsigned long ncontexts, ndwords, dw;
> -	struct igt_live_test t;
> -	struct drm_file *file;
> -	IGT_TIMEOUT(end_time);
> -	LIST_HEAD(objects);
> +	struct intel_engine_cs *engine;
> +	enum intel_engine_id id;
>   	int err = -ENODEV;
>   
>   	/*
> @@ -495,41 +495,166 @@ static int igt_ctx_exec(void *arg)
>   	if (!DRIVER_CAPS(i915)->has_logical_contexts)
>   		return 0;
>   
> +	for_each_engine(engine, i915, id) {
> +		struct drm_i915_gem_object *obj = NULL;
> +		unsigned long ncontexts, ndwords, dw;
> +		struct igt_live_test t;
> +		struct drm_file *file;
> +		IGT_TIMEOUT(end_time);
> +		LIST_HEAD(objects);
> +
> +		if (!intel_engine_can_store_dword(engine))
> +			continue;
> +
> +		if (!engine->context_size)
> +			continue; /* No logical context support in HW */
> +
> +		file = mock_file(i915);
> +		if (IS_ERR(file))
> +			return PTR_ERR(file);
> +
> +		mutex_lock(&i915->drm.struct_mutex);
> +
> +		err = igt_live_test_begin(&t, i915, __func__, engine->name);
> +		if (err)
> +			goto out_unlock;
> +
> +		ncontexts = 0;
> +		ndwords = 0;
> +		dw = 0;
> +		while (!time_after(jiffies, end_time)) {
> +			struct i915_gem_context *ctx;
> +			intel_wakeref_t wakeref;
> +
> +			ctx = i915_gem_create_context(i915, file->driver_priv);
> +			if (IS_ERR(ctx)) {
> +				err = PTR_ERR(ctx);
> +				goto out_unlock;
> +			}
> +
> +			if (!obj) {
> +				obj = create_test_object(ctx, file, &objects);
> +				if (IS_ERR(obj)) {
> +					err = PTR_ERR(obj);
> +					goto out_unlock;
> +				}
> +			}
> +
> +			with_intel_runtime_pm(i915, wakeref)
> +				err = gpu_fill(obj, ctx, engine, dw);
> +			if (err) {
> +				pr_err("Failed to fill dword %lu [%lu/%lu] with gpu (%s) in ctx %u [full-ppgtt? %s], err=%d\n",
> +				       ndwords, dw, max_dwords(obj),
> +				       engine->name, ctx->hw_id,
> +				       yesno(!!ctx->ppgtt), err);
> +				goto out_unlock;
> +			}
> +
> +			if (++dw == max_dwords(obj)) {
> +				obj = NULL;
> +				dw = 0;
> +			}
> +
> +			ndwords++;
> +			ncontexts++;
> +		}
> +
> +		pr_info("Submitted %lu contexts to %s, filling %lu dwords\n",
> +			ncontexts, engine->name, ndwords);
> +
> +		ncontexts = dw = 0;
> +		list_for_each_entry(obj, &objects, st_link) {
> +			unsigned int rem =
> +				min_t(unsigned int, ndwords - dw, max_dwords(obj));
> +
> +			err = cpu_check(obj, ncontexts++, rem);
> +			if (err)
> +				break;
> +
> +			dw += rem;
> +		}
> +
> +out_unlock:
> +		if (igt_live_test_end(&t))
> +			err = -EIO;
> +		mutex_unlock(&i915->drm.struct_mutex);
> +
> +		mock_file_free(i915, file);
> +		if (err)
> +			return err;
> +	}
> +
> +	return 0;
> +}
> +
> +static int igt_shared_ctx_exec(void *arg)
> +{
> +	struct drm_i915_private *i915 = arg;
> +	struct i915_gem_context *parent;
> +	struct intel_engine_cs *engine;
> +	enum intel_engine_id id;
> +	struct igt_live_test t;
> +	struct drm_file *file;
> +	int err = 0;
> +
> +	/*
> +	 * Create a few different contexts with the same mm and write
> +	 * through each ctx using the GPU making sure those writes end
> +	 * up in the expected pages of our obj.
> +	 */
> +	if (!DRIVER_CAPS(i915)->has_logical_contexts)
> +		return 0;
> +
>   	file = mock_file(i915);
>   	if (IS_ERR(file))
>   		return PTR_ERR(file);
>   
>   	mutex_lock(&i915->drm.struct_mutex);
>   
> +	parent = i915_gem_create_context(i915, file->driver_priv);
> +	if (IS_ERR(parent)) {
> +		err = PTR_ERR(parent);
> +		goto out_unlock;
> +	}
> +
> +	if (!parent->ppgtt) {
> +		err = -ENODEV;
> +		goto out_unlock;
> +	}
> +
>   	err = igt_live_test_begin(&t, i915, __func__, "");
>   	if (err)
>   		goto out_unlock;
>   
> -	ncontexts = 0;
> -	ndwords = 0;
> -	dw = 0;
> -	while (!time_after(jiffies, end_time)) {
> -		struct intel_engine_cs *engine;
> -		struct i915_gem_context *ctx;
> -		unsigned int id;
> +	for_each_engine(engine, i915, id) {
> +		unsigned long ncontexts, ndwords, dw;
> +		struct drm_i915_gem_object *obj = NULL;
> +		struct i915_gem_context *ctx = NULL;
> +		IGT_TIMEOUT(end_time);
> +		LIST_HEAD(objects);
>   
> -		ctx = i915_gem_create_context(i915, file->driver_priv);
> -		if (IS_ERR(ctx)) {
> -			err = PTR_ERR(ctx);
> -			goto out_unlock;
> -		}
> +		if (!intel_engine_can_store_dword(engine))
> +			continue;
>   
> -		for_each_engine(engine, i915, id) {
> +		dw = 0;
> +		ndwords = 0;
> +		ncontexts = 0;
> +		while (!time_after(jiffies, end_time)) {
>   			intel_wakeref_t wakeref;
>   
> -			if (!engine->context_size)
> -				continue; /* No logical context support in HW */
> +			if (ctx)
> +				__destroy_hw_context(ctx, file->driver_priv);
>   
> -			if (!intel_engine_can_store_dword(engine))
> -				continue;
> +			ctx = i915_gem_create_context(i915, file->driver_priv);
> +			if (IS_ERR(ctx)) {
> +				err = PTR_ERR(ctx);
> +				goto out_unlock;
> +			}
> +
> +			__assign_ppgtt(ctx, parent->ppgtt);
>   
>   			if (!obj) {
> -				obj = create_test_object(ctx, file, &objects);
> +				obj = create_test_object(parent, file, &objects);
>   				if (IS_ERR(obj)) {
>   					err = PTR_ERR(obj);
>   					goto out_unlock;
> @@ -551,25 +676,25 @@ static int igt_ctx_exec(void *arg)
>   				obj = NULL;
>   				dw = 0;
>   			}
> +
>   			ndwords++;
> +			ncontexts++;
>   		}
> -		ncontexts++;
> -	}
> -	pr_info("Submitted %lu contexts (across %u engines), filling %lu dwords\n",
> -		ncontexts, RUNTIME_INFO(i915)->num_engines, ndwords);
> +		pr_info("Submitted %lu contexts to %s, filling %lu dwords\n",
> +			ncontexts, engine->name, ndwords);
>   
> -	dw = 0;
> -	list_for_each_entry(obj, &objects, st_link) {
> -		unsigned int rem =
> -			min_t(unsigned int, ndwords - dw, max_dwords(obj));
> +		ncontexts = dw = 0;
> +		list_for_each_entry(obj, &objects, st_link) {
> +			unsigned int rem =
> +				min_t(unsigned int, ndwords - dw, max_dwords(obj));
>   
> -		err = cpu_check(obj, rem);
> -		if (err)
> -			break;
> +			err = cpu_check(obj, ncontexts++, rem);
> +			if (err)
> +				goto out_unlock;
>   
> -		dw += rem;
> +			dw += rem;
> +		}
>   	}
> -
>   out_unlock:
>   	if (igt_live_test_end(&t))
>   		err = -EIO;
> @@ -1048,7 +1173,7 @@ static int igt_ctx_readonly(void *arg)
>   	struct drm_i915_gem_object *obj = NULL;
>   	struct i915_gem_context *ctx;
>   	struct i915_hw_ppgtt *ppgtt;
> -	unsigned long ndwords, dw;
> +	unsigned long idx, ndwords, dw;
>   	struct igt_live_test t;
>   	struct drm_file *file;
>   	I915_RND_STATE(prng);
> @@ -1129,6 +1254,7 @@ static int igt_ctx_readonly(void *arg)
>   		ndwords, RUNTIME_INFO(i915)->num_engines);
>   
>   	dw = 0;
> +	idx = 0;
>   	list_for_each_entry(obj, &objects, st_link) {
>   		unsigned int rem =
>   			min_t(unsigned int, ndwords - dw, max_dwords(obj));
> @@ -1138,7 +1264,7 @@ static int igt_ctx_readonly(void *arg)
>   		if (i915_gem_object_is_readonly(obj))
>   			num_writes = 0;
>   
> -		err = cpu_check(obj, num_writes);
> +		err = cpu_check(obj, idx++, num_writes);
>   		if (err)
>   			break;
>   
> @@ -1723,6 +1849,7 @@ int i915_gem_context_live_selftests(struct drm_i915_private *dev_priv)
>   		SUBTEST(igt_ctx_exec),
>   		SUBTEST(igt_ctx_readonly),
>   		SUBTEST(igt_ctx_sseu),
> +		SUBTEST(igt_shared_ctx_exec),
>   		SUBTEST(igt_vm_isolation),
>   	};
>   
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> index 826fd51c331e..57b3d9867070 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
> @@ -1020,7 +1020,6 @@ static int exercise_ppgtt(struct drm_i915_private *dev_priv,
>   
>   	err = func(dev_priv, &ppgtt->vm, 0, ppgtt->vm.total, end_time);
>   
> -	i915_ppgtt_close(&ppgtt->vm);
>   	i915_ppgtt_put(ppgtt);
>   out_unlock:
>   	mutex_unlock(&dev_priv->drm.struct_mutex);
> diff --git a/drivers/gpu/drm/i915/selftests/mock_context.c b/drivers/gpu/drm/i915/selftests/mock_context.c
> index 8efa6892c6cd..f90328b21763 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_context.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_context.c
> @@ -54,13 +54,17 @@ mock_context(struct drm_i915_private *i915,
>   		goto err_handles;
>   
>   	if (name) {
> +		struct i915_hw_ppgtt *ppgtt;
> +
>   		ctx->name = kstrdup(name, GFP_KERNEL);
>   		if (!ctx->name)
>   			goto err_put;
>   
> -		ctx->ppgtt = mock_ppgtt(i915, name);
> -		if (!ctx->ppgtt)
> +		ppgtt = mock_ppgtt(i915, name);
> +		if (!ppgtt)
>   			goto err_put;
> +
> +		__set_ppgtt(ctx, ppgtt);
>   	}
>   
>   	return ctx;
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 39835793722b..6575470755d0 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -341,6 +341,8 @@ typedef struct _drm_i915_sarea {
>   #define DRM_I915_PERF_ADD_CONFIG	0x37
>   #define DRM_I915_PERF_REMOVE_CONFIG	0x38
>   #define DRM_I915_QUERY			0x39
> +#define DRM_I915_GEM_VM_CREATE		0x3a
> +#define DRM_I915_GEM_VM_DESTROY		0x3b
>   /* Must be kept compact -- no holes */
>   
>   #define DRM_IOCTL_I915_INIT		DRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT, drm_i915_init_t)
> @@ -400,6 +402,8 @@ typedef struct _drm_i915_sarea {
>   #define DRM_IOCTL_I915_PERF_ADD_CONFIG	DRM_IOW(DRM_COMMAND_BASE + DRM_I915_PERF_ADD_CONFIG, struct drm_i915_perf_oa_config)
>   #define DRM_IOCTL_I915_PERF_REMOVE_CONFIG	DRM_IOW(DRM_COMMAND_BASE + DRM_I915_PERF_REMOVE_CONFIG, __u64)
>   #define DRM_IOCTL_I915_QUERY			DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_QUERY, struct drm_i915_query)
> +#define DRM_IOCTL_I915_GEM_VM_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_CREATE, struct drm_i915_gem_vm_control)
> +#define DRM_IOCTL_I915_GEM_VM_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_VM_DESTROY, struct drm_i915_gem_vm_control)
>   
>   /* Allow drivers to submit batchbuffers directly to hardware, relying
>    * on the security mechanisms provided by hardware.
> @@ -1451,6 +1455,26 @@ struct drm_i915_gem_context_destroy {
>   	__u32 pad;
>   };
>   
> +/*
> + * DRM_I915_GEM_VM_CREATE -
> + *
> + * Create a new virtual memory address space (ppGTT) for use within a context
> + * on the same file. Extensions can be provided to configure exactly how the
> + * address space is setup upon creation.
> + *
> + * The id of new VM (bound to the fd) for use with I915_CONTEXT_PARAM_VM is
> + * returned.

returned and stored in id - do we need both? Maybe return zero or an
error to make it simpler?

> + *
> + * DRM_I915_GEM_VM_DESTROY -
> + *
> + * Destroys a previously created VM id.
> + */
> +struct drm_i915_gem_vm_control {
> +	__u64 extensions;
> +	__u32 flags;
> +	__u32 id;
> +};
> +
>   struct drm_i915_reg_read {
>   	/*
>   	 * Register offset.
> @@ -1540,7 +1564,19 @@ struct drm_i915_gem_context_param {
>    * On creation, all new contexts are marked as recoverable.
>    */
>   #define I915_CONTEXT_PARAM_RECOVERABLE	0x8
> +
> +	/*
> +	 * The id of the associated virtual memory address space (ppGTT) of
> +	 * this context. Can be retrieved and passed to another context
> +	 * (on the same fd) for both to use the same ppGTT and so share
> +	 * address layouts, and avoid reloading the page tables on context
> +	 * switches between themselves.
> +	 *
> +	 * See DRM_I915_GEM_VM_CREATE and DRM_I915_GEM_VM_DESTROY.
> +	 */
> +#define I915_CONTEXT_PARAM_VM		0x9
>   /* Must be kept compact -- no holes and well documented */
> +
>   	__u64 value;
>   };
>   
> 

Looks ready to me.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* ✗ Fi.CI.SPARSE: warning for series starting with [01/13] drm/i915: Suppress the "Failed to idle" warning for gem_eio
  2019-03-08 14:12 Home straight for veng, the uAPI wars Chris Wilson
                   ` (13 preceding siblings ...)
  2019-03-08 14:58 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/13] drm/i915: Suppress the "Failed to idle" warning for gem_eio Patchwork
@ 2019-03-08 15:05 ` Patchwork
  2019-03-08 15:19 ` ✗ Fi.CI.BAT: failure " Patchwork
  2019-03-08 16:47 ` ✗ Fi.CI.BAT: failure for series starting with [01/13] drm/i915: Suppress the "Failed to idle" warning for gem_eio (rev2) Patchwork
  16 siblings, 0 replies; 58+ messages in thread
From: Patchwork @ 2019-03-08 15:05 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/13] drm/i915: Suppress the "Failed to idle" warning for gem_eio
URL   : https://patchwork.freedesktop.org/series/57742/
State : warning

== Summary ==

$ dim sparse origin/drm-tip
Sparse version: v0.5.2
Commit: drm/i915: Suppress the "Failed to idle" warning for gem_eio
Okay!

Commit: drm/i915: Introduce the i915_user_extension_method
Okay!

Commit: drm/i915: Introduce a context barrier callback
Okay!

Commit: drm/i915: Create/destroy VM (ppGTT) for use with contexts
-drivers/gpu/drm/i915/selftests/../i915_drv.h:3553:16: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/selftests/i915_gem_context.c:1134:25: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/selftests/i915_gem_context.c:1134:25: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/../i915_drv.h:3556:16: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_gem_context.c:1260:25: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_gem_context.c:1260:25: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/selftests/i915_gem_context.c:564:25: warning: expression using sizeof(void)
-O:drivers/gpu/drm/i915/selftests/i915_gem_context.c:564:25: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_gem_context.c:568:33: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_gem_context.c:568:33: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_gem_context.c:689:33: warning: expression using sizeof(void)
+drivers/gpu/drm/i915/selftests/i915_gem_context.c:689:33: warning: expression using sizeof(void)

Commit: drm/i915: Extend CONTEXT_CREATE to set parameters upon construction
Okay!

Commit: drm/i915: Allow contexts to share a single timeline across all engines
Okay!

Commit: drm/i915: Allow userspace to clone contexts on creation
Okay!

Commit: drm/i915: Allow a context to define its set of engines
+./include/linux/slab.h:664:13: error: not a function <noident>

Commit: drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[]
Okay!

Commit: drm/i915: Load balancing across a virtual engine
+./include/linux/overflow.h:285:13: error: incorrect type in conditional
+./include/linux/overflow.h:285:13: error: undefined identifier '__builtin_mul_overflow'
+./include/linux/overflow.h:285:13:    got void
+./include/linux/overflow.h:285:13: warning: call with no type!
+./include/linux/overflow.h:287:13: error: incorrect type in conditional
+./include/linux/overflow.h:287:13: error: undefined identifier '__builtin_add_overflow'
+./include/linux/overflow.h:287:13:    got void
+./include/linux/overflow.h:287:13: warning: call with no type!

Commit: drm/i915: Extend execution fence to support a callback
Okay!

Commit: drm/i915/execlists: Virtual engine bonding
Okay!

Commit: drm/i915: Allow specification of parallel execbuf
Okay!

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* ✗ Fi.CI.BAT: failure for series starting with [01/13] drm/i915: Suppress the "Failed to idle" warning for gem_eio
  2019-03-08 14:12 Home straight for veng, the uAPI wars Chris Wilson
                   ` (14 preceding siblings ...)
  2019-03-08 15:05 ` ✗ Fi.CI.SPARSE: " Patchwork
@ 2019-03-08 15:19 ` Patchwork
  2019-03-08 16:47 ` ✗ Fi.CI.BAT: failure for series starting with [01/13] drm/i915: Suppress the "Failed to idle" warning for gem_eio (rev2) Patchwork
  16 siblings, 0 replies; 58+ messages in thread
From: Patchwork @ 2019-03-08 15:19 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/13] drm/i915: Suppress the "Failed to idle" warning for gem_eio
URL   : https://patchwork.freedesktop.org/series/57742/
State : failure

== Summary ==

CI Bug Log - changes from CI_DRM_5723 -> Patchwork_12419
====================================================

Summary
-------

  **FAILURE**

  Serious unknown changes coming with Patchwork_12419 absolutely need to be
  verified manually.
  
  If you think the reported changes have nothing to do with the changes
  introduced in Patchwork_12419, please notify your bug team to allow them
  to document this new failure mode, which will reduce false positives in CI.

  External URL: https://patchwork.freedesktop.org/api/1.0/series/57742/revisions/1/mbox/

Possible new issues
-------------------

  Here are the unknown changes that may have been introduced in Patchwork_12419:

### IGT changes ###

#### Possible regressions ####

  * igt@i915_selftest@live_contexts:
    - fi-snb-2520m:       NOTRUN -> INCOMPLETE

  * igt@runner@aborted:
    - fi-snb-2520m:       NOTRUN -> FAIL
    - fi-snb-2600:        NOTRUN -> FAIL

  
Known issues
------------

  Here are the changes found in Patchwork_12419 that come from known issues:

### IGT changes ###

#### Issues hit ####

  * igt@gem_exec_basic@gtt-bsd2:
    - fi-byt-clapper:     NOTRUN -> SKIP [fdo#109271] +57

  * igt@gem_exec_basic@readonly-bsd1:
    - fi-snb-2520m:       NOTRUN -> SKIP [fdo#109271] +57

  * igt@i915_selftest@live_contexts:
    - fi-snb-2600:        PASS -> INCOMPLETE [fdo#105411]

  * igt@i915_selftest@live_execlists:
    - fi-apl-guc:         PASS -> INCOMPLETE [fdo#103927] / [fdo#109720]

  * igt@kms_busy@basic-flip-a:
    - fi-bsw-n3050:       NOTRUN -> SKIP [fdo#109271] / [fdo#109278] +1

  * igt@kms_busy@basic-flip-c:
    - fi-byt-clapper:     NOTRUN -> SKIP [fdo#109271] / [fdo#109278]
    - fi-snb-2520m:       NOTRUN -> SKIP [fdo#109271] / [fdo#109278]

  * igt@kms_chamelium@dp-crc-fast:
    - fi-kbl-7500u:       PASS -> DMESG-WARN [fdo#103841]

  * igt@kms_chamelium@hdmi-crc-fast:
    - fi-bsw-n3050:       NOTRUN -> SKIP [fdo#109271] +62

  * igt@kms_pipe_crc_basic@read-crc-pipe-b-frame-sequence:
    - fi-byt-clapper:     NOTRUN -> FAIL [fdo#103191] / [fdo#107362] +1

  * igt@runner@aborted:
    - fi-kbl-7500u:       NOTRUN -> FAIL [fdo#103841]
    - fi-apl-guc:         NOTRUN -> FAIL [fdo#108622] / [fdo#109720]

  
#### Warnings ####

  * igt@prime_vgem@basic-fence-flip:
    - fi-gdg-551:         FAIL [fdo#103182] -> DMESG-FAIL [fdo#103182]

  
  [fdo#103182]: https://bugs.freedesktop.org/show_bug.cgi?id=103182
  [fdo#103191]: https://bugs.freedesktop.org/show_bug.cgi?id=103191
  [fdo#103841]: https://bugs.freedesktop.org/show_bug.cgi?id=103841
  [fdo#103927]: https://bugs.freedesktop.org/show_bug.cgi?id=103927
  [fdo#105411]: https://bugs.freedesktop.org/show_bug.cgi?id=105411
  [fdo#107362]: https://bugs.freedesktop.org/show_bug.cgi?id=107362
  [fdo#108622]: https://bugs.freedesktop.org/show_bug.cgi?id=108622
  [fdo#109271]: https://bugs.freedesktop.org/show_bug.cgi?id=109271
  [fdo#109278]: https://bugs.freedesktop.org/show_bug.cgi?id=109278
  [fdo#109720]: https://bugs.freedesktop.org/show_bug.cgi?id=109720


Participating hosts (43 -> 42)
------------------------------

  Additional (3): fi-byt-clapper fi-snb-2520m fi-bsw-n3050 
  Missing    (4): fi-kbl-soraka fi-ilk-m540 fi-bsw-cyan fi-hsw-4200u 


Build changes
-------------

    * Linux: CI_DRM_5723 -> Patchwork_12419

  CI_DRM_5723: 3dae54a987335b5f2d9990c7dc5fd21826fd2a05 @ git://anongit.freedesktop.org/gfx-ci/linux
  IGT_4877: d15ad69be07a987d5c2ba408201b287adae8ca59 @ git://anongit.freedesktop.org/xorg/app/intel-gpu-tools
  Patchwork_12419: 9103b86a105149c6b1d046b43ee750cc18423bd1 @ git://anongit.freedesktop.org/gfx-ci/linux


== Linux commits ==

9103b86a1051 drm/i915: Allow specification of parallel execbuf
0ece4182687d drm/i915/execlists: Virtual engine bonding
27b55d0ef7aa drm/i915: Extend execution fence to support a callback
d1b97ccd96c6 drm/i915: Load balancing across a virtual engine
cd2cecf7bc25 drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[]
a2eea5288054 drm/i915: Allow a context to define its set of engines
045e92a606a6 drm/i915: Allow userspace to clone contexts on creation
fa2d02c9d176 drm/i915: Allow contexts to share a single timeline across all engines
821206137812 drm/i915: Extend CONTEXT_CREATE to set parameters upon construction
b24cfcf439bd drm/i915: Create/destroy VM (ppGTT) for use with contexts
7b994f43e7c4 drm/i915: Introduce a context barrier callback
c04c584b0234 drm/i915: Introduce the i915_user_extension_method
4dedd46a945e drm/i915: Suppress the "Failed to idle" warning for gem_eio

== Logs ==

For more details see: https://intel-gfx-ci.01.org/tree/drm-tip/Patchwork_12419/
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 04/13] drm/i915: Create/destroy VM (ppGTT) for use with contexts
  2019-03-08 15:03   ` Tvrtko Ursulin
@ 2019-03-08 15:35     ` Chris Wilson
  0 siblings, 0 replies; 58+ messages in thread
From: Chris Wilson @ 2019-03-08 15:35 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-03-08 15:03:15)
> 
> On 08/03/2019 14:12, Chris Wilson wrote:
> > +/*
> > + * DRM_I915_GEM_VM_CREATE -
> > + *
> > + * Create a new virtual memory address space (ppGTT) for use within a context
> > + * on the same file. Extensions can be provided to configure exactly how the
> > + * address space is setup upon creation.
> > + *
> > + * The id of new VM (bound to the fd) for use with I915_CONTEXT_PARAM_VM is
> > + * returned.
> 
> returned and store in id - do we need both? Maybe return zero or error 
> to make it simpler?

I mean returned in the out parameter. Loose language.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* [PATCH v2] drm/i915: Create/destroy VM (ppGTT) for use with contexts
  2019-03-08 14:12 ` [PATCH 04/13] drm/i915: Create/destroy VM (ppGTT) for use with contexts Chris Wilson
  2019-03-08 15:03   ` Tvrtko Ursulin
@ 2019-03-08 15:41   ` Chris Wilson
  1 sibling, 0 replies; 58+ messages in thread
From: Chris Wilson @ 2019-03-08 15:41 UTC (permalink / raw)
  To: intel-gfx

In preparation to making the ppGTT binding for a context explicit (to
facilitate reusing the same ppGTT between different contexts), allow the
user to create and destroy named ppGTT.

v2: Replace global barrier for swapping over the ppgtt and tlbs with a
local context barrier (Tvrtko)
v3: serialise with struct_mutex; it's lazy but required dammit
v4: Rewrite igt_ctx_shared_exec to be more different (aimed to be more
similar, turned out different!)

v2: Fix up test unwind for aliasing-ppgtt (snb)
v3: Tighten language for uapi struct drm_i915_gem_vm_control.

Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.c               |   2 +
 drivers/gpu/drm/i915/i915_drv.h               |   3 +
 drivers/gpu/drm/i915/i915_gem_context.c       | 254 +++++++++++++++++-
 drivers/gpu/drm/i915/i915_gem_context.h       |   5 +
 drivers/gpu/drm/i915/i915_gem_gtt.c           |  17 +-
 drivers/gpu/drm/i915/i915_gem_gtt.h           |  16 +-
 drivers/gpu/drm/i915/selftests/huge_pages.c   |   1 -
 .../gpu/drm/i915/selftests/i915_gem_context.c | 222 +++++++++++----
 drivers/gpu/drm/i915/selftests/i915_gem_gtt.c |   1 -
 drivers/gpu/drm/i915/selftests/mock_context.c |   8 +-
 include/uapi/drm/i915_drm.h                   |  43 +++
 11 files changed, 508 insertions(+), 64 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 0d743907e7bc..5d53efc4c5d9 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -3121,6 +3121,8 @@ static const struct drm_ioctl_desc i915_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(I915_PERF_ADD_CONFIG, i915_perf_add_config_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_PERF_REMOVE_CONFIG, i915_perf_remove_config_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_QUERY, i915_query_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(I915_GEM_VM_CREATE, i915_gem_vm_create_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(I915_GEM_VM_DESTROY, i915_gem_vm_destroy_ioctl, DRM_RENDER_ALLOW),
 };
 
 static struct drm_driver driver = {
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index c4ffe19ec698..8c4eb302cc0b 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -218,6 +218,9 @@ struct drm_i915_file_private {
 	} mm;
 	struct idr context_idr;
 
+	struct mutex vm_lock;
+	struct idr vm_idr;
+
 	unsigned int bsd_engine;
 
 /*
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index b6370225dcb5..fb2aba06f693 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -126,6 +126,8 @@ static void lut_close(struct i915_gem_context *ctx)
 		struct i915_vma *vma = rcu_dereference_raw(*slot);
 
 		radix_tree_iter_delete(&ctx->handles_vma, &iter, slot);
+
+		vma->open_count--;
 		__i915_gem_object_release_unless_active(vma->obj);
 	}
 	rcu_read_unlock();
@@ -306,7 +308,7 @@ static void context_close(struct i915_gem_context *ctx)
 	 */
 	lut_close(ctx);
 	if (ctx->ppgtt)
-		i915_ppgtt_close(&ctx->ppgtt->vm);
+		i915_ppgtt_close(ctx->ppgtt);
 
 	ctx->file_priv = ERR_PTR(-EBADF);
 	i915_gem_context_put(ctx);
@@ -417,6 +419,32 @@ static void __destroy_hw_context(struct i915_gem_context *ctx,
 	context_close(ctx);
 }
 
+static struct i915_hw_ppgtt *
+__set_ppgtt(struct i915_gem_context *ctx, struct i915_hw_ppgtt *ppgtt)
+{
+	struct i915_hw_ppgtt *old = ctx->ppgtt;
+
+	i915_ppgtt_open(ppgtt);
+	ctx->ppgtt = i915_ppgtt_get(ppgtt);
+
+	ctx->desc_template = default_desc_template(ctx->i915, ppgtt);
+
+	return old;
+}
+
+static void __assign_ppgtt(struct i915_gem_context *ctx,
+			   struct i915_hw_ppgtt *ppgtt)
+{
+	if (ppgtt == ctx->ppgtt)
+		return;
+
+	ppgtt = __set_ppgtt(ctx, ppgtt);
+	if (ppgtt) {
+		i915_ppgtt_close(ppgtt);
+		i915_ppgtt_put(ppgtt);
+	}
+}
+
 static struct i915_gem_context *
 i915_gem_create_context(struct drm_i915_private *dev_priv,
 			struct drm_i915_file_private *file_priv)
@@ -443,8 +471,8 @@ i915_gem_create_context(struct drm_i915_private *dev_priv,
 			return ERR_CAST(ppgtt);
 		}
 
-		ctx->ppgtt = ppgtt;
-		ctx->desc_template = default_desc_template(dev_priv, ppgtt);
+		__assign_ppgtt(ctx, ppgtt);
+		i915_ppgtt_put(ppgtt);
 	}
 
 	trace_i915_context_create(ctx);
@@ -625,19 +653,29 @@ static int context_idr_cleanup(int id, void *p, void *data)
 	return 0;
 }
 
+static int vm_idr_cleanup(int id, void *p, void *data)
+{
+	i915_ppgtt_put(p);
+	return 0;
+}
+
 int i915_gem_context_open(struct drm_i915_private *i915,
 			  struct drm_file *file)
 {
 	struct drm_i915_file_private *file_priv = file->driver_priv;
 	struct i915_gem_context *ctx;
 
+	mutex_init(&file_priv->vm_lock);
+
 	idr_init(&file_priv->context_idr);
+	idr_init_base(&file_priv->vm_idr, 1);
 
 	mutex_lock(&i915->drm.struct_mutex);
 	ctx = i915_gem_create_context(i915, file_priv);
 	mutex_unlock(&i915->drm.struct_mutex);
 	if (IS_ERR(ctx)) {
 		idr_destroy(&file_priv->context_idr);
+		idr_destroy(&file_priv->vm_idr);
 		return PTR_ERR(ctx);
 	}
 
@@ -654,6 +692,89 @@ void i915_gem_context_close(struct drm_file *file)
 
 	idr_for_each(&file_priv->context_idr, context_idr_cleanup, NULL);
 	idr_destroy(&file_priv->context_idr);
+
+	idr_for_each(&file_priv->vm_idr, vm_idr_cleanup, NULL);
+	idr_destroy(&file_priv->vm_idr);
+
+	mutex_destroy(&file_priv->vm_lock);
+}
+
+int i915_gem_vm_create_ioctl(struct drm_device *dev, void *data,
+			     struct drm_file *file)
+{
+	struct drm_i915_private *i915 = to_i915(dev);
+	struct drm_i915_gem_vm_control *args = data;
+	struct drm_i915_file_private *file_priv = file->driver_priv;
+	struct i915_hw_ppgtt *ppgtt;
+	int err;
+
+	if (!HAS_FULL_PPGTT(i915))
+		return -ENODEV;
+
+	if (args->flags)
+		return -EINVAL;
+
+	if (args->extensions)
+		return -EINVAL;
+
+	ppgtt = i915_ppgtt_create(i915, file_priv);
+	if (IS_ERR(ppgtt))
+		return PTR_ERR(ppgtt);
+
+	err = mutex_lock_interruptible(&file_priv->vm_lock);
+	if (err)
+		goto err_put;
+
+	err = idr_alloc(&file_priv->vm_idr, ppgtt, 0, 0, GFP_KERNEL);
+	mutex_unlock(&file_priv->vm_lock);
+	if (err < 0)
+		goto err_put;
+
+	GEM_BUG_ON(err == 0); /* reserved for default/unassigned ppgtt */
+	ppgtt->user_handle = err;
+	args->id = err;
+	return 0;
+
+err_put:
+	i915_ppgtt_put(ppgtt);
+	return err;
+}
+
+int i915_gem_vm_destroy_ioctl(struct drm_device *dev, void *data,
+			      struct drm_file *file)
+{
+	struct drm_i915_file_private *file_priv = file->driver_priv;
+	struct drm_i915_gem_vm_control *args = data;
+	struct i915_hw_ppgtt *ppgtt;
+	int err;
+	u32 id;
+
+	if (args->flags)
+		return -EINVAL;
+
+	if (args->extensions)
+		return -EINVAL;
+
+	id = args->id;
+	if (!id)
+		return -ENOENT;
+
+	err = mutex_lock_interruptible(&file_priv->vm_lock);
+	if (err)
+		return err;
+
+	ppgtt = idr_remove(&file_priv->vm_idr, id);
+	if (ppgtt) {
+		GEM_BUG_ON(!ppgtt->user_handle);
+		ppgtt->user_handle = 0;
+	}
+
+	mutex_unlock(&file_priv->vm_lock);
+	if (!ppgtt)
+		return -ENOENT;
+
+	i915_ppgtt_put(ppgtt);
+	return 0;
 }
 
 static struct i915_request *
@@ -799,6 +920,120 @@ int i915_gem_switch_to_kernel_context(struct drm_i915_private *i915,
 	return 0;
 }
 
+static int get_ppgtt(struct i915_gem_context *ctx,
+		     struct drm_i915_gem_context_param *args)
+{
+	struct drm_i915_file_private *file_priv = ctx->file_priv;
+	struct i915_hw_ppgtt *ppgtt;
+	int ret;
+
+	if (!ctx->ppgtt)
+		return -ENODEV;
+
+	/* XXX rcu acquire? */
+	ret = mutex_lock_interruptible(&ctx->i915->drm.struct_mutex);
+	if (ret)
+		return ret;
+
+	ppgtt = i915_ppgtt_get(ctx->ppgtt);
+	mutex_unlock(&ctx->i915->drm.struct_mutex);
+
+	ret = mutex_lock_interruptible(&file_priv->vm_lock);
+	if (ret)
+		goto err_put;
+
+	if (!ppgtt->user_handle) {
+		ret = idr_alloc(&file_priv->vm_idr, ppgtt, 0, 0, GFP_KERNEL);
+		GEM_BUG_ON(!ret);
+		if (ret < 0)
+			goto err_unlock;
+
+		ppgtt->user_handle = ret;
+		i915_ppgtt_get(ppgtt);
+	}
+
+	args->size = 0;
+	args->value = ppgtt->user_handle;
+
+	ret = 0;
+err_unlock:
+	mutex_unlock(&file_priv->vm_lock);
+err_put:
+	i915_ppgtt_put(ppgtt);
+	return ret;
+}
+
+static void set_ppgtt_barrier(void *data)
+{
+	struct i915_hw_ppgtt *old = data;
+
+	i915_ppgtt_close(old);
+	i915_ppgtt_put(old);
+}
+
+static int set_ppgtt(struct i915_gem_context *ctx,
+		     struct drm_i915_gem_context_param *args)
+{
+	struct drm_i915_file_private *file_priv = ctx->file_priv;
+	struct i915_hw_ppgtt *ppgtt, *old;
+	int err;
+
+	if (args->size)
+		return -EINVAL;
+
+	if (upper_32_bits(args->value))
+		return -EINVAL;
+
+	if (!ctx->ppgtt)
+		return -ENODEV;
+
+	err = mutex_lock_interruptible(&file_priv->vm_lock);
+	if (err)
+		return err;
+
+	ppgtt = idr_find(&file_priv->vm_idr, args->value);
+	if (ppgtt) {
+		GEM_BUG_ON(ppgtt->user_handle != args->value);
+		i915_ppgtt_get(ppgtt);
+	}
+	mutex_unlock(&file_priv->vm_lock);
+	if (!ppgtt)
+		return -ENOENT;
+
+	err = mutex_lock_interruptible(&ctx->i915->drm.struct_mutex);
+	if (err)
+		goto out;
+
+	if (ppgtt == ctx->ppgtt)
+		goto unlock;
+
+	/* Teardown the existing obj:vma cache, it will have to be rebuilt. */
+	lut_close(ctx);
+
+	old = __set_ppgtt(ctx, ppgtt);
+
+	/*
+	 * We need to flush any requests using the current ppgtt before
+	 * we release it as the requests do not hold a reference themselves,
+	 * only indirectly through the context.
+	 */
+	err = context_barrier_task(ctx, ALL_ENGINES, set_ppgtt_barrier, old);
+	if (err) {
+		ctx->ppgtt = old;
+		ctx->desc_template = default_desc_template(ctx->i915, old);
+
+		i915_ppgtt_close(ppgtt);
+		i915_ppgtt_put(ppgtt);
+	}
+
+unlock:
+	mutex_unlock(&ctx->i915->drm.struct_mutex);
+
+out:
+	i915_ppgtt_put(ppgtt);
+	return err;
+}
+
 static bool client_is_banned(struct drm_i915_file_private *file_priv)
 {
 	return atomic_read(&file_priv->ban_score) >= I915_CLIENT_SCORE_BANNED;
@@ -973,6 +1208,9 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 	case I915_CONTEXT_PARAM_SSEU:
 		ret = get_sseu(ctx, args);
 		break;
+	case I915_CONTEXT_PARAM_VM:
+		ret = get_ppgtt(ctx, args);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
@@ -1274,9 +1512,6 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 		return -ENOENT;
 
 	switch (args->param) {
-	case I915_CONTEXT_PARAM_BAN_PERIOD:
-		ret = -EINVAL;
-		break;
 	case I915_CONTEXT_PARAM_NO_ZEROMAP:
 		if (args->size)
 			ret = -EINVAL;
@@ -1332,9 +1567,16 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 					I915_USER_PRIORITY(priority);
 		}
 		break;
+
 	case I915_CONTEXT_PARAM_SSEU:
 		ret = set_sseu(ctx, args);
 		break;
+
+	case I915_CONTEXT_PARAM_VM:
+		ret = set_ppgtt(ctx, args);
+		break;
+
+	case I915_CONTEXT_PARAM_BAN_PERIOD:
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index 5a32c4b4816f..1e670372892c 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -153,6 +153,11 @@ void i915_gem_context_release(struct kref *ctx_ref);
 struct i915_gem_context *
 i915_gem_context_create_gvt(struct drm_device *dev);
 
+int i915_gem_vm_create_ioctl(struct drm_device *dev, void *data,
+			     struct drm_file *file);
+int i915_gem_vm_destroy_ioctl(struct drm_device *dev, void *data,
+			      struct drm_file *file);
+
 int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
 				  struct drm_file *file);
 int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index dac08d9c3fab..d717952cc430 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -2099,10 +2099,21 @@ i915_ppgtt_create(struct drm_i915_private *i915,
 	return ppgtt;
 }
 
-void i915_ppgtt_close(struct i915_address_space *vm)
+void i915_ppgtt_open(struct i915_hw_ppgtt *ppgtt)
 {
-	GEM_BUG_ON(vm->closed);
-	vm->closed = true;
+	GEM_BUG_ON(ppgtt->vm.closed);
+
+	ppgtt->open_count++;
+}
+
+void i915_ppgtt_close(struct i915_hw_ppgtt *ppgtt)
+{
+	GEM_BUG_ON(!ppgtt->open_count);
+	if (--ppgtt->open_count)
+		return;
+
+	GEM_BUG_ON(ppgtt->vm.closed);
+	ppgtt->vm.closed = true;
 }
 
 static void ppgtt_destroy_vma(struct i915_address_space *vm)
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index a47e11e6fc1b..25d5f7682bda 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -391,11 +391,15 @@ struct i915_hw_ppgtt {
 	struct kref ref;
 
 	unsigned long pd_dirty_engines;
+	unsigned int open_count;
+
 	union {
 		struct i915_pml4 pml4;		/* GEN8+ & 48b PPGTT */
 		struct i915_page_directory_pointer pdp;	/* GEN8+ */
 		struct i915_page_directory pd;		/* GEN6-7 */
 	};
+
+	u32 user_handle;
 };
 
 struct gen6_hw_ppgtt {
@@ -606,12 +610,16 @@ int i915_ppgtt_init_hw(struct drm_i915_private *dev_priv);
 void i915_ppgtt_release(struct kref *kref);
 struct i915_hw_ppgtt *i915_ppgtt_create(struct drm_i915_private *dev_priv,
 					struct drm_i915_file_private *fpriv);
-void i915_ppgtt_close(struct i915_address_space *vm);
-static inline void i915_ppgtt_get(struct i915_hw_ppgtt *ppgtt)
+
+void i915_ppgtt_open(struct i915_hw_ppgtt *ppgtt);
+void i915_ppgtt_close(struct i915_hw_ppgtt *ppgtt);
+
+static inline struct i915_hw_ppgtt *i915_ppgtt_get(struct i915_hw_ppgtt *ppgtt)
 {
-	if (ppgtt)
-		kref_get(&ppgtt->ref);
+	kref_get(&ppgtt->ref);
+	return ppgtt;
 }
+
 static inline void i915_ppgtt_put(struct i915_hw_ppgtt *ppgtt)
 {
 	if (ppgtt)
diff --git a/drivers/gpu/drm/i915/selftests/huge_pages.c b/drivers/gpu/drm/i915/selftests/huge_pages.c
index 1e66cff985f8..0b7740dc18cb 100644
--- a/drivers/gpu/drm/i915/selftests/huge_pages.c
+++ b/drivers/gpu/drm/i915/selftests/huge_pages.c
@@ -1734,7 +1734,6 @@ int i915_gem_huge_page_mock_selftests(void)
 	err = i915_subtests(tests, ppgtt);
 
 out_close:
-	i915_ppgtt_close(&ppgtt->vm);
 	i915_ppgtt_put(ppgtt);
 
 out_unlock:
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
index 664ae1428ecc..ddb88c19c499 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
@@ -372,7 +372,8 @@ static int cpu_fill(struct drm_i915_gem_object *obj, u32 value)
 	return 0;
 }
 
-static int cpu_check(struct drm_i915_gem_object *obj, unsigned int max)
+static noinline int cpu_check(struct drm_i915_gem_object *obj,
+			      unsigned int idx, unsigned int max)
 {
 	unsigned int n, m, needs_flush;
 	int err;
@@ -390,8 +391,10 @@ static int cpu_check(struct drm_i915_gem_object *obj, unsigned int max)
 
 		for (m = 0; m < max; m++) {
 			if (map[m] != m) {
-				pr_err("Invalid value at page %d, offset %d: found %x expected %x\n",
-				       n, m, map[m], m);
+				pr_err("%pS: Invalid value at object %d page %d/%ld, offset %d/%d: found %x expected %x\n",
+				       __builtin_return_address(0), idx,
+				       n, real_page_count(obj), m, max,
+				       map[m], m);
 				err = -EINVAL;
 				goto out_unmap;
 			}
@@ -399,8 +402,9 @@ static int cpu_check(struct drm_i915_gem_object *obj, unsigned int max)
 
 		for (; m < DW_PER_PAGE; m++) {
 			if (map[m] != STACK_MAGIC) {
-				pr_err("Invalid value at page %d, offset %d: found %x expected %x\n",
-				       n, m, map[m], STACK_MAGIC);
+				pr_err("%pS: Invalid value at object %d page %d, offset %d: found %x expected %x (uninitialised)\n",
+				       __builtin_return_address(0), idx, n, m,
+				       map[m], STACK_MAGIC);
 				err = -EINVAL;
 				goto out_unmap;
 			}
@@ -478,12 +482,8 @@ static unsigned long max_dwords(struct drm_i915_gem_object *obj)
 static int igt_ctx_exec(void *arg)
 {
 	struct drm_i915_private *i915 = arg;
-	struct drm_i915_gem_object *obj = NULL;
-	unsigned long ncontexts, ndwords, dw;
-	struct igt_live_test t;
-	struct drm_file *file;
-	IGT_TIMEOUT(end_time);
-	LIST_HEAD(objects);
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
 	int err = -ENODEV;
 
 	/*
@@ -495,44 +495,169 @@ static int igt_ctx_exec(void *arg)
 	if (!DRIVER_CAPS(i915)->has_logical_contexts)
 		return 0;
 
+	for_each_engine(engine, i915, id) {
+		struct drm_i915_gem_object *obj = NULL;
+		unsigned long ncontexts, ndwords, dw;
+		struct igt_live_test t;
+		struct drm_file *file;
+		IGT_TIMEOUT(end_time);
+		LIST_HEAD(objects);
+
+		if (!intel_engine_can_store_dword(engine))
+			continue;
+
+		if (!engine->context_size)
+			continue; /* No logical context support in HW */
+
+		file = mock_file(i915);
+		if (IS_ERR(file))
+			return PTR_ERR(file);
+
+		mutex_lock(&i915->drm.struct_mutex);
+
+		err = igt_live_test_begin(&t, i915, __func__, engine->name);
+		if (err)
+			goto out_unlock;
+
+		ncontexts = 0;
+		ndwords = 0;
+		dw = 0;
+		while (!time_after(jiffies, end_time)) {
+			struct i915_gem_context *ctx;
+			intel_wakeref_t wakeref;
+
+			ctx = i915_gem_create_context(i915, file->driver_priv);
+			if (IS_ERR(ctx)) {
+				err = PTR_ERR(ctx);
+				goto out_unlock;
+			}
+
+			if (!obj) {
+				obj = create_test_object(ctx, file, &objects);
+				if (IS_ERR(obj)) {
+					err = PTR_ERR(obj);
+					goto out_unlock;
+				}
+			}
+
+			with_intel_runtime_pm(i915, wakeref)
+				err = gpu_fill(obj, ctx, engine, dw);
+			if (err) {
+				pr_err("Failed to fill dword %lu [%lu/%lu] with gpu (%s) in ctx %u [full-ppgtt? %s], err=%d\n",
+				       ndwords, dw, max_dwords(obj),
+				       engine->name, ctx->hw_id,
+				       yesno(!!ctx->ppgtt), err);
+				goto out_unlock;
+			}
+
+			if (++dw == max_dwords(obj)) {
+				obj = NULL;
+				dw = 0;
+			}
+
+			ndwords++;
+			ncontexts++;
+		}
+
+		pr_info("Submitted %lu contexts to %s, filling %lu dwords\n",
+			ncontexts, engine->name, ndwords);
+
+		ncontexts = dw = 0;
+		list_for_each_entry(obj, &objects, st_link) {
+			unsigned int rem =
+				min_t(unsigned int, ndwords - dw, max_dwords(obj));
+
+			err = cpu_check(obj, ncontexts++, rem);
+			if (err)
+				break;
+
+			dw += rem;
+		}
+
+out_unlock:
+		if (igt_live_test_end(&t))
+			err = -EIO;
+		mutex_unlock(&i915->drm.struct_mutex);
+
+		mock_file_free(i915, file);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+static int igt_shared_ctx_exec(void *arg)
+{
+	struct drm_i915_private *i915 = arg;
+	struct i915_gem_context *parent;
+	struct intel_engine_cs *engine;
+	enum intel_engine_id id;
+	struct igt_live_test t;
+	struct drm_file *file;
+	int err = 0;
+
+	/*
+	 * Create a few different contexts with the same mm and write
+	 * through each ctx using the GPU making sure those writes end
+	 * up in the expected pages of our obj.
+	 */
+	if (!DRIVER_CAPS(i915)->has_logical_contexts)
+		return 0;
+
 	file = mock_file(i915);
 	if (IS_ERR(file))
 		return PTR_ERR(file);
 
 	mutex_lock(&i915->drm.struct_mutex);
 
+	parent = i915_gem_create_context(i915, file->driver_priv);
+	if (IS_ERR(parent)) {
+		err = PTR_ERR(parent);
+		goto out_unlock;
+	}
+
+	if (!parent->ppgtt) { /* not full-ppgtt; nothing to share */
+		err = -ENODEV;
+		goto out_unlock;
+	}
+
 	err = igt_live_test_begin(&t, i915, __func__, "");
 	if (err)
 		goto out_unlock;
 
-	ncontexts = 0;
-	ndwords = 0;
-	dw = 0;
-	while (!time_after(jiffies, end_time)) {
-		struct intel_engine_cs *engine;
-		struct i915_gem_context *ctx;
-		unsigned int id;
+	for_each_engine(engine, i915, id) {
+		unsigned long ncontexts, ndwords, dw;
+		struct drm_i915_gem_object *obj = NULL;
+		struct i915_gem_context *ctx = NULL;
+		IGT_TIMEOUT(end_time);
+		LIST_HEAD(objects);
 
-		ctx = i915_gem_create_context(i915, file->driver_priv);
-		if (IS_ERR(ctx)) {
-			err = PTR_ERR(ctx);
-			goto out_unlock;
-		}
+		if (!intel_engine_can_store_dword(engine))
+			continue;
 
-		for_each_engine(engine, i915, id) {
+		dw = 0;
+		ndwords = 0;
+		ncontexts = 0;
+		while (!time_after(jiffies, end_time)) {
 			intel_wakeref_t wakeref;
 
-			if (!engine->context_size)
-				continue; /* No logical context support in HW */
+			if (ctx)
+				__destroy_hw_context(ctx, file->driver_priv);
 
-			if (!intel_engine_can_store_dword(engine))
-				continue;
+			ctx = i915_gem_create_context(i915, file->driver_priv);
+			if (IS_ERR(ctx)) {
+				err = PTR_ERR(ctx);
+				goto out_test;
+			}
+
+			__assign_ppgtt(ctx, parent->ppgtt);
 
 			if (!obj) {
-				obj = create_test_object(ctx, file, &objects);
+				obj = create_test_object(parent, file, &objects);
 				if (IS_ERR(obj)) {
 					err = PTR_ERR(obj);
-					goto out_unlock;
+					goto out_test;
 				}
 			}
 
@@ -544,35 +669,36 @@ static int igt_ctx_exec(void *arg)
 				       ndwords, dw, max_dwords(obj),
 				       engine->name, ctx->hw_id,
 				       yesno(!!ctx->ppgtt), err);
-				goto out_unlock;
+				goto out_test;
 			}
 
 			if (++dw == max_dwords(obj)) {
 				obj = NULL;
 				dw = 0;
 			}
+
 			ndwords++;
+			ncontexts++;
 		}
-		ncontexts++;
-	}
-	pr_info("Submitted %lu contexts (across %u engines), filling %lu dwords\n",
-		ncontexts, RUNTIME_INFO(i915)->num_engines, ndwords);
+		pr_info("Submitted %lu contexts to %s, filling %lu dwords\n",
+			ncontexts, engine->name, ndwords);
 
-	dw = 0;
-	list_for_each_entry(obj, &objects, st_link) {
-		unsigned int rem =
-			min_t(unsigned int, ndwords - dw, max_dwords(obj));
+		ncontexts = dw = 0;
+		list_for_each_entry(obj, &objects, st_link) {
+			unsigned int rem =
+				min_t(unsigned int, ndwords - dw, max_dwords(obj));
 
-		err = cpu_check(obj, rem);
-		if (err)
-			break;
+			err = cpu_check(obj, ncontexts++, rem);
+			if (err)
+				goto out_test;
 
-		dw += rem;
+			dw += rem;
+		}
 	}
-
-out_unlock:
+out_test:
 	if (igt_live_test_end(&t))
 		err = -EIO;
+out_unlock:
 	mutex_unlock(&i915->drm.struct_mutex);
 
 	mock_file_free(i915, file);
@@ -1048,7 +1174,7 @@ static int igt_ctx_readonly(void *arg)
 	struct drm_i915_gem_object *obj = NULL;
 	struct i915_gem_context *ctx;
 	struct i915_hw_ppgtt *ppgtt;
-	unsigned long ndwords, dw;
+	unsigned long idx, ndwords, dw;
 	struct igt_live_test t;
 	struct drm_file *file;
 	I915_RND_STATE(prng);
@@ -1129,6 +1255,7 @@ static int igt_ctx_readonly(void *arg)
 		ndwords, RUNTIME_INFO(i915)->num_engines);
 
 	dw = 0;
+	idx = 0;
 	list_for_each_entry(obj, &objects, st_link) {
 		unsigned int rem =
 			min_t(unsigned int, ndwords - dw, max_dwords(obj));
@@ -1138,7 +1265,7 @@ static int igt_ctx_readonly(void *arg)
 		if (i915_gem_object_is_readonly(obj))
 			num_writes = 0;
 
-		err = cpu_check(obj, num_writes);
+		err = cpu_check(obj, idx++, num_writes);
 		if (err)
 			break;
 
@@ -1723,6 +1850,7 @@ int i915_gem_context_live_selftests(struct drm_i915_private *dev_priv)
 		SUBTEST(igt_ctx_exec),
 		SUBTEST(igt_ctx_readonly),
 		SUBTEST(igt_ctx_sseu),
+		SUBTEST(igt_shared_ctx_exec),
 		SUBTEST(igt_vm_isolation),
 	};
 
diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
index 826fd51c331e..57b3d9867070 100644
--- a/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/selftests/i915_gem_gtt.c
@@ -1020,7 +1020,6 @@ static int exercise_ppgtt(struct drm_i915_private *dev_priv,
 
 	err = func(dev_priv, &ppgtt->vm, 0, ppgtt->vm.total, end_time);
 
-	i915_ppgtt_close(&ppgtt->vm);
 	i915_ppgtt_put(ppgtt);
 out_unlock:
 	mutex_unlock(&dev_priv->drm.struct_mutex);
diff --git a/drivers/gpu/drm/i915/selftests/mock_context.c b/drivers/gpu/drm/i915/selftests/mock_context.c
index 8efa6892c6cd..f90328b21763 100644
--- a/drivers/gpu/drm/i915/selftests/mock_context.c
+++ b/drivers/gpu/drm/i915/selftests/mock_context.c
@@ -54,13 +54,17 @@ mock_context(struct drm_i915_private *i915,
 		goto err_handles;
 
 	if (name) {
+		struct i915_hw_ppgtt *ppgtt;
+
 		ctx->name = kstrdup(name, GFP_KERNEL);
 		if (!ctx->name)
 			goto err_put;
 
-		ctx->ppgtt = mock_ppgtt(i915, name);
-		if (!ctx->ppgtt)
+		ppgtt = mock_ppgtt(i915, name);
+		if (!ppgtt)
 			goto err_put;
+
+		__set_ppgtt(ctx, ppgtt);
 	}
 
 	return ctx;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 39835793722b..0882ccb25ff5 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -341,6 +341,8 @@ typedef struct _drm_i915_sarea {
 #define DRM_I915_PERF_ADD_CONFIG	0x37
 #define DRM_I915_PERF_REMOVE_CONFIG	0x38
 #define DRM_I915_QUERY			0x39
+#define DRM_I915_GEM_VM_CREATE		0x3a
+#define DRM_I915_GEM_VM_DESTROY		0x3b
 /* Must be kept compact -- no holes */
 
 #define DRM_IOCTL_I915_INIT		DRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT, drm_i915_init_t)
@@ -400,6 +402,8 @@ typedef struct _drm_i915_sarea {
 #define DRM_IOCTL_I915_PERF_ADD_CONFIG	DRM_IOW(DRM_COMMAND_BASE + DRM_I915_PERF_ADD_CONFIG, struct drm_i915_perf_oa_config)
 #define DRM_IOCTL_I915_PERF_REMOVE_CONFIG	DRM_IOW(DRM_COMMAND_BASE + DRM_I915_PERF_REMOVE_CONFIG, __u64)
 #define DRM_IOCTL_I915_QUERY			DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_QUERY, struct drm_i915_query)
+#define DRM_IOCTL_I915_GEM_VM_CREATE	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_VM_CREATE, struct drm_i915_gem_vm_control)
+#define DRM_IOCTL_I915_GEM_VM_DESTROY	DRM_IOW (DRM_COMMAND_BASE + DRM_I915_GEM_VM_DESTROY, struct drm_i915_gem_vm_control)
 
 /* Allow drivers to submit batchbuffers directly to hardware, relying
  * on the security mechanisms provided by hardware.
@@ -1451,6 +1455,33 @@ struct drm_i915_gem_context_destroy {
 	__u32 pad;
 };
 
+/*
+ * DRM_I915_GEM_VM_CREATE -
+ *
+ * Create a new virtual memory address space (ppGTT) for use within a context
+ * on the same file. Extensions can be provided to configure exactly how the
+ * address space is setup upon creation.
+ *
+ * The id of the new VM (bound to the fd) for use with I915_CONTEXT_PARAM_VM
+ * is returned in the outparam @id.
+ *
+ * No flags are currently defined; all bits are reserved and must be zero.
+ *
+ * An extension chain may be provided, starting with @extensions, and
+ * terminated by the @next_extension being 0. Currently, no extensions are defined.
+ *
+ * DRM_I915_GEM_VM_DESTROY -
+ *
+ * Destroys a previously created VM id, specified in @id.
+ *
+ * No extensions or flags are currently allowed, so both must be zero.
+ */
+struct drm_i915_gem_vm_control {
+	__u64 extensions;
+	__u32 flags;
+	__u32 id;
+};
+
 struct drm_i915_reg_read {
 	/*
 	 * Register offset.
@@ -1540,7 +1571,19 @@ struct drm_i915_gem_context_param {
  * On creation, all new contexts are marked as recoverable.
  */
 #define I915_CONTEXT_PARAM_RECOVERABLE	0x8
+
+	/*
+	 * The id of the associated virtual memory address space (ppGTT) of
+	 * this context. Can be retrieved and passed to another context
+	 * (on the same fd) for both to use the same ppGTT and so share
+	 * address layouts, and avoid reloading the page tables on context
+	 * switches between themselves.
+	 *
+	 * See DRM_I915_GEM_VM_CREATE and DRM_I915_GEM_VM_DESTROY.
+	 */
+#define I915_CONTEXT_PARAM_VM		0x9
 /* Must be kept compact -- no holes and well documented */
+
 	__u64 value;
 };
 
-- 
2.20.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 58+ messages in thread
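
To make the intended use of the new uapi concrete, a sketch of how two
contexts on the same fd might come to share a single ppGTT via
I915_CONTEXT_PARAM_VM. The getparam/setparam ioctls already exist; ctx_a and
ctx_b are hypothetical context ids, the includes follow the earlier sketch,
and error handling is elided.

struct drm_i915_gem_context_param arg = {
	.ctx_id = ctx_a,
	.param = I915_CONTEXT_PARAM_VM,
};

/* export ctx_a's ppGTT as a VM id on this fd (returned in arg.value) */
drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM, &arg);

/* point ctx_b at the same address space; arg.size stays 0 as required */
arg.ctx_id = ctx_b;
drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM, &arg);

After the setparam, both contexts translate through the same page tables, so
the address layout is shared and no page-table reload is needed when
switching between them.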

* Re: [PATCH 06/13] drm/i915: Allow contexts to share a single timeline across all engines
  2019-03-08 14:12 ` [PATCH 06/13] drm/i915: Allow contexts to share a single timeline across all engines Chris Wilson
@ 2019-03-08 15:56   ` Tvrtko Ursulin
  0 siblings, 0 replies; 58+ messages in thread
From: Tvrtko Ursulin @ 2019-03-08 15:56 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 08/03/2019 14:12, Chris Wilson wrote:
> Previously, our view has been always to run the engines independently
> within a context. (Multiple engines happened before we had contexts and
> timelines, so they always operated independently and that behaviour
> persisted into contexts.) However, at the user level the context often
> represents a single timeline (e.g. GL contexts) and userspace must
> ensure that the individual engines are serialised to present that
> ordering to the client (or forget about this detail entirely and hope no
> one notices - a fair ploy if the client can only directly control one
> engine themselves ;)
> 
> In the next patch, we will want to construct a set of engines that
> operate as one, that have a single timeline interwoven between them, to
> present a single virtual engine to the user. (They submit to the virtual
> engine, then we decide which engine to execute on based.)
> 
> To that end, we want to be able to create contexts which have a single
> timeline (fence context) shared between all engines, rather than multiple
> timelines.

No change log.

> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_gem_context.c       | 32 ++++++--
>   drivers/gpu/drm/i915/i915_gem_context_types.h |  2 +
>   drivers/gpu/drm/i915/i915_request.c           | 80 +++++++++++++------
>   drivers/gpu/drm/i915/i915_request.h           |  5 +-
>   drivers/gpu/drm/i915/i915_sw_fence.c          | 39 +++++++--
>   drivers/gpu/drm/i915/i915_sw_fence.h          | 13 ++-
>   drivers/gpu/drm/i915/intel_lrc.c              |  5 +-
>   .../gpu/drm/i915/selftests/i915_gem_context.c | 18 +++--
>   drivers/gpu/drm/i915/selftests/mock_context.c |  2 +-
>   include/uapi/drm/i915_drm.h                   |  3 +-
>   10 files changed, 149 insertions(+), 50 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index b41b09f60edd..310892b42b68 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -237,6 +237,9 @@ static void i915_gem_context_free(struct i915_gem_context *ctx)
>   	rbtree_postorder_for_each_entry_safe(it, n, &ctx->hw_contexts, node)
>   		it->ops->destroy(it);
>   
> +	if (ctx->timeline)
> +		i915_timeline_put(ctx->timeline);
> +
>   	kfree(ctx->name);
>   	put_pid(ctx->pid);
>   
> @@ -448,12 +451,17 @@ static void __assign_ppgtt(struct i915_gem_context *ctx,
>   
>   static struct i915_gem_context *
>   i915_gem_create_context(struct drm_i915_private *dev_priv,
> -			struct drm_i915_file_private *file_priv)
> +			struct drm_i915_file_private *file_priv,
> +			unsigned int flags)
>   {
>   	struct i915_gem_context *ctx;
>   
>   	lockdep_assert_held(&dev_priv->drm.struct_mutex);
>   
> +	if (flags & I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE &&
> +	    !HAS_EXECLISTS(dev_priv))
> +		return ERR_PTR(-EINVAL);
> +
>   	/* Reap the most stale context */
>   	contexts_free_first(dev_priv);
>   
> @@ -476,6 +484,18 @@ i915_gem_create_context(struct drm_i915_private *dev_priv,
>   		i915_ppgtt_put(ppgtt);
>   	}
>   
> +	if (flags & I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE) {
> +		struct i915_timeline *timeline;
> +
> +		timeline = i915_timeline_create(dev_priv, ctx->name, NULL);
> +		if (IS_ERR(timeline)) {
> +			__destroy_hw_context(ctx, file_priv);
> +			return ERR_CAST(timeline);
> +		}
> +
> +		ctx->timeline = timeline;
> +	}
> +
>   	trace_i915_context_create(ctx);
>   
>   	return ctx;
> @@ -504,7 +524,7 @@ i915_gem_context_create_gvt(struct drm_device *dev)
>   	if (ret)
>   		return ERR_PTR(ret);
>   
> -	ctx = i915_gem_create_context(to_i915(dev), NULL);
> +	ctx = i915_gem_create_context(to_i915(dev), NULL, 0);
>   	if (IS_ERR(ctx))
>   		goto out;
>   
> @@ -540,7 +560,7 @@ i915_gem_context_create_kernel(struct drm_i915_private *i915, int prio)
>   	struct i915_gem_context *ctx;
>   	int err;
>   
> -	ctx = i915_gem_create_context(i915, NULL);
> +	ctx = i915_gem_create_context(i915, NULL, 0);
>   	if (IS_ERR(ctx))
>   		return ctx;
>   
> @@ -672,7 +692,7 @@ int i915_gem_context_open(struct drm_i915_private *i915,
>   	idr_init_base(&file_priv->vm_idr, 1);
>   
>   	mutex_lock(&i915->drm.struct_mutex);
> -	ctx = i915_gem_create_context(i915, file_priv);
> +	ctx = i915_gem_create_context(i915, file_priv, 0);
>   	mutex_unlock(&i915->drm.struct_mutex);
>   	if (IS_ERR(ctx)) {
>   		idr_destroy(&file_priv->context_idr);
> @@ -788,7 +808,7 @@ last_request_on_engine(struct i915_timeline *timeline,
>   
>   	rq = i915_active_request_raw(&timeline->last_request,
>   				     &engine->i915->drm.struct_mutex);
> -	if (rq && rq->engine == engine) {
> +	if (rq && rq->engine->mask & engine->mask) {
>   		GEM_TRACE("last request for %s on engine %s: %llx:%llu\n",
>   			  timeline->name, engine->name,
>   			  rq->fence.context, rq->fence.seqno);
> @@ -1448,7 +1468,7 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
>   	if (ret)
>   		return ret;
>   
> -	ctx = i915_gem_create_context(i915, file_priv);
> +	ctx = i915_gem_create_context(i915, file_priv, args->flags);
>   	mutex_unlock(&dev->struct_mutex);
>   	if (IS_ERR(ctx))
>   		return PTR_ERR(ctx);
> diff --git a/drivers/gpu/drm/i915/i915_gem_context_types.h b/drivers/gpu/drm/i915/i915_gem_context_types.h
> index 2bf19730eaa9..f8f6e6c960a7 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context_types.h
> +++ b/drivers/gpu/drm/i915/i915_gem_context_types.h
> @@ -41,6 +41,8 @@ struct i915_gem_context {
>   	/** file_priv: owning file descriptor */
>   	struct drm_i915_file_private *file_priv;
>   
> +	struct i915_timeline *timeline;
> +
>   	/**
>   	 * @ppgtt: unique address space (GTT)
>   	 *
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 9533a85cb0b3..09046a15d218 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -993,6 +993,60 @@ void i915_request_skip(struct i915_request *rq, int error)
>   	memset(vaddr + head, 0, rq->postfix - head);
>   }
>   
> +static struct i915_request *
> +__i915_request_await_timeline(struct i915_request *rq)
> +{
> +	struct i915_timeline *timeline = rq->timeline;
> +	struct i915_request *prev;
> +
> +	/*
> +	 * Dependency tracking and request ordering along the timeline
> +	 * is special cased so that we can eliminate redundant ordering
> +	 * operations while building the request (we know that the timeline
> +	 * itself is ordered, and here we guarantee it).
> +	 *
> +	 * As we know we will need to emit tracking along the timeline,
> +	 * we embed the hooks into our request struct -- at the cost of
> +	 * having to have specialised no-allocation interfaces (which will
> +	 * be beneficial elsewhere).
> +	 *
> +	 * A second benefit to open-coding i915_request_await_request is
> +	 * that we can apply a slight variant of the rules specialised
> +	 * for timelines that jump between engines (such as virtual engines).
> +	 * If we consider the case of virtual engine, we must emit a dma-fence
> +	 * to prevent scheduling of the second request until the first is
> +	 * complete (to maximise our greedy late load balancing) and this
> +	 * precludes optimising to use semaphore serialisation of a single
> +	 * timeline across engines.
> +	 */
> +	prev = i915_active_request_raw(&timeline->last_request,
> +				       &rq->i915->drm.struct_mutex);
> +	if (prev && !i915_request_completed(prev)) {
> +		if (is_power_of_2(prev->engine->mask | rq->engine->mask))
> +			i915_sw_fence_await_sw_fence(&rq->submit,
> +						     &prev->submit,
> +						     &rq->submitq);
> +		else
> +			__i915_sw_fence_await_dma_fence(&rq->submit,
> +							&prev->fence,
> +							&rq->dmaq);
> +		if (rq->engine->schedule)
> +			__i915_sched_node_add_dependency(&rq->sched,
> +							 &prev->sched,
> +							 &rq->dep,
> +							 0);
> +	}
> +
> +	spin_lock_irq(&timeline->lock);
> +	list_add_tail(&rq->link, &timeline->requests);
> +	spin_unlock_irq(&timeline->lock);
> +
> +	GEM_BUG_ON(timeline->seqno != rq->fence.seqno);
> +	__i915_active_request_set(&timeline->last_request, rq);

Minor but would maybe consider renaming the function since it does more 
than set up awaits. __i915_request_add_to_timeline?

> +
> +	return prev;
> +}
> +
>   /*
>    * NB: This function is not allowed to fail. Doing so would mean that the
>    * request is not being tracked for completion but the work itself is
> @@ -1037,31 +1091,7 @@ void i915_request_add(struct i915_request *request)
>   	GEM_BUG_ON(IS_ERR(cs));
>   	request->postfix = intel_ring_offset(request, cs);
>   
> -	/*
> -	 * Seal the request and mark it as pending execution. Note that
> -	 * we may inspect this state, without holding any locks, during
> -	 * hangcheck. Hence we apply the barrier to ensure that we do not
> -	 * see a more recent value in the hws than we are tracking.
> -	 */
> -
> -	prev = i915_active_request_raw(&timeline->last_request,
> -				       &request->i915->drm.struct_mutex);
> -	if (prev && !i915_request_completed(prev)) {
> -		i915_sw_fence_await_sw_fence(&request->submit, &prev->submit,
> -					     &request->submitq);
> -		if (engine->schedule)
> -			__i915_sched_node_add_dependency(&request->sched,
> -							 &prev->sched,
> -							 &request->dep,
> -							 0);
> -	}
> -
> -	spin_lock_irq(&timeline->lock);
> -	list_add_tail(&request->link, &timeline->requests);
> -	spin_unlock_irq(&timeline->lock);
> -
> -	GEM_BUG_ON(timeline->seqno != request->fence.seqno);
> -	__i915_active_request_set(&timeline->last_request, request);
> +	prev = __i915_request_await_timeline(request);
>   
>   	list_add_tail(&request->ring_link, &ring->request_list);
>   	if (list_is_first(&request->ring_link, &ring->request_list)) {
> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> index 8c8fa5010644..cd6c130964cd 100644
> --- a/drivers/gpu/drm/i915/i915_request.h
> +++ b/drivers/gpu/drm/i915/i915_request.h
> @@ -128,7 +128,10 @@ struct i915_request {
>   	 * It is used by the driver to then queue the request for execution.
>   	 */
>   	struct i915_sw_fence submit;
> -	wait_queue_entry_t submitq;
> +	union {
> +		wait_queue_entry_t submitq;
> +		struct i915_sw_dma_fence_cb dmaq;
> +	};
>   	struct list_head execute_cb;
>   
>   	/*
> diff --git a/drivers/gpu/drm/i915/i915_sw_fence.c b/drivers/gpu/drm/i915/i915_sw_fence.c
> index 8d1400d378d7..5387aafd3424 100644
> --- a/drivers/gpu/drm/i915/i915_sw_fence.c
> +++ b/drivers/gpu/drm/i915/i915_sw_fence.c
> @@ -359,11 +359,6 @@ int i915_sw_fence_await_sw_fence_gfp(struct i915_sw_fence *fence,
>   	return __i915_sw_fence_await_sw_fence(fence, signaler, NULL, gfp);
>   }
>   
> -struct i915_sw_dma_fence_cb {
> -	struct dma_fence_cb base;
> -	struct i915_sw_fence *fence;
> -};
> -
>   struct i915_sw_dma_fence_cb_timer {
>   	struct i915_sw_dma_fence_cb base;
>   	struct dma_fence *dma;
> @@ -480,6 +475,40 @@ int i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
>   	return ret;
>   }
>   
> +static void __dma_i915_sw_fence_wake(struct dma_fence *dma,
> +				     struct dma_fence_cb *data)
> +{
> +	struct i915_sw_dma_fence_cb *cb = container_of(data, typeof(*cb), base);
> +
> +	i915_sw_fence_complete(cb->fence);
> +}
> +
> +int __i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
> +				    struct dma_fence *dma,
> +				    struct i915_sw_dma_fence_cb *cb)
> +{
> +	int ret;
> +
> +	debug_fence_assert(fence);
> +
> +	if (dma_fence_is_signaled(dma))
> +		return 0;
> +
> +	cb->fence = fence;
> +	i915_sw_fence_await(fence);
> +
> +	ret = dma_fence_add_callback(dma, &cb->base, __dma_i915_sw_fence_wake);
> +	if (ret == 0) {
> +		ret = 1;
> +	} else {
> +		i915_sw_fence_complete(fence);
> +		if (ret == -ENOENT) /* fence already signaled */
> +			ret = 0;
> +	}
> +
> +	return ret;
> +}
> +
>   int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
>   				    struct reservation_object *resv,
>   				    const struct dma_fence_ops *exclude,
> diff --git a/drivers/gpu/drm/i915/i915_sw_fence.h b/drivers/gpu/drm/i915/i915_sw_fence.h
> index 6dec9e1d1102..9cb5c3b307a6 100644
> --- a/drivers/gpu/drm/i915/i915_sw_fence.h
> +++ b/drivers/gpu/drm/i915/i915_sw_fence.h
> @@ -9,14 +9,13 @@
>   #ifndef _I915_SW_FENCE_H_
>   #define _I915_SW_FENCE_H_
>   
> +#include <linux/dma-fence.h>
>   #include <linux/gfp.h>
>   #include <linux/kref.h>
>   #include <linux/notifier.h> /* for NOTIFY_DONE */
>   #include <linux/wait.h>
>   
>   struct completion;
> -struct dma_fence;
> -struct dma_fence_ops;
>   struct reservation_object;
>   
>   struct i915_sw_fence {
> @@ -68,10 +67,20 @@ int i915_sw_fence_await_sw_fence(struct i915_sw_fence *fence,
>   int i915_sw_fence_await_sw_fence_gfp(struct i915_sw_fence *fence,
>   				     struct i915_sw_fence *after,
>   				     gfp_t gfp);
> +
> +struct i915_sw_dma_fence_cb {
> +	struct dma_fence_cb base;
> +	struct i915_sw_fence *fence;
> +};
> +
> +int __i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
> +				    struct dma_fence *dma,
> +				    struct i915_sw_dma_fence_cb *cb);
>   int i915_sw_fence_await_dma_fence(struct i915_sw_fence *fence,
>   				  struct dma_fence *dma,
>   				  unsigned long timeout,
>   				  gfp_t gfp);
> +
>   int i915_sw_fence_await_reservation(struct i915_sw_fence *fence,
>   				    struct reservation_object *resv,
>   				    const struct dma_fence_ops *exclude,
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 748352d513d6..7b938eaff9c5 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -2809,7 +2809,10 @@ populate_lr_context(struct intel_context *ce,
>   
>   static struct i915_timeline *get_timeline(struct i915_gem_context *ctx)
>   {
> -	return i915_timeline_create(ctx->i915, ctx->name, NULL);
> +	if (ctx->timeline)
> +		return i915_timeline_get(ctx->timeline);
> +	else
> +		return i915_timeline_create(ctx->i915, ctx->name, NULL);
>   }
>   
>   static int execlists_context_deferred_alloc(struct intel_context *ce,
> diff --git a/drivers/gpu/drm/i915/selftests/i915_gem_context.c b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
> index c4a5cf26992e..3e5e384d00d5 100644
> --- a/drivers/gpu/drm/i915/selftests/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/selftests/i915_gem_context.c
> @@ -76,7 +76,7 @@ static int live_nop_switch(void *arg)
>   	}
>   
>   	for (n = 0; n < nctx; n++) {
> -		ctx[n] = i915_gem_create_context(i915, file->driver_priv);
> +		ctx[n] = i915_gem_create_context(i915, file->driver_priv, 0);
>   		if (IS_ERR(ctx[n])) {
>   			err = PTR_ERR(ctx[n]);
>   			goto out_unlock;
> @@ -526,7 +526,8 @@ static int igt_ctx_exec(void *arg)
>   			struct i915_gem_context *ctx;
>   			intel_wakeref_t wakeref;
>   
> -			ctx = i915_gem_create_context(i915, file->driver_priv);
> +			ctx = i915_gem_create_context(i915,
> +						      file->driver_priv, 0);
>   			if (IS_ERR(ctx)) {
>   				err = PTR_ERR(ctx);
>   				goto out_unlock;
> @@ -611,7 +612,7 @@ static int igt_shared_ctx_exec(void *arg)
>   
>   	mutex_lock(&i915->drm.struct_mutex);
>   
> -	parent = i915_gem_create_context(i915, file->driver_priv);
> +	parent = i915_gem_create_context(i915, file->driver_priv, 0);
>   	if (IS_ERR(parent)) {
>   		err = PTR_ERR(parent);
>   		goto out_unlock;
> @@ -645,7 +646,8 @@ static int igt_shared_ctx_exec(void *arg)
>   			if (ctx)
>   				__destroy_hw_context(ctx, file->driver_priv);
>   
> -			ctx = i915_gem_create_context(i915, file->driver_priv);
> +			ctx = i915_gem_create_context(i915,
> +						      file->driver_priv, 0);
>   			if (IS_ERR(ctx)) {
>   				err = PTR_ERR(ctx);
>   				goto out_unlock;
> @@ -1087,7 +1089,7 @@ __igt_ctx_sseu(struct drm_i915_private *i915,
>   
>   	mutex_lock(&i915->drm.struct_mutex);
>   
> -	ctx = i915_gem_create_context(i915, file->driver_priv);
> +	ctx = i915_gem_create_context(i915, file->driver_priv, 0);
>   	if (IS_ERR(ctx)) {
>   		ret = PTR_ERR(ctx);
>   		goto out_unlock;
> @@ -1197,7 +1199,7 @@ static int igt_ctx_readonly(void *arg)
>   	if (err)
>   		goto out_unlock;
>   
> -	ctx = i915_gem_create_context(i915, file->driver_priv);
> +	ctx = i915_gem_create_context(i915, file->driver_priv, 0);
>   	if (IS_ERR(ctx)) {
>   		err = PTR_ERR(ctx);
>   		goto out_unlock;
> @@ -1523,13 +1525,13 @@ static int igt_vm_isolation(void *arg)
>   	if (err)
>   		goto out_unlock;
>   
> -	ctx_a = i915_gem_create_context(i915, file->driver_priv);
> +	ctx_a = i915_gem_create_context(i915, file->driver_priv, 0);
>   	if (IS_ERR(ctx_a)) {
>   		err = PTR_ERR(ctx_a);
>   		goto out_unlock;
>   	}
>   
> -	ctx_b = i915_gem_create_context(i915, file->driver_priv);
> +	ctx_b = i915_gem_create_context(i915, file->driver_priv, 0);
>   	if (IS_ERR(ctx_b)) {
>   		err = PTR_ERR(ctx_b);
>   		goto out_unlock;
> diff --git a/drivers/gpu/drm/i915/selftests/mock_context.c b/drivers/gpu/drm/i915/selftests/mock_context.c
> index f90328b21763..1d6dc2fe36ab 100644
> --- a/drivers/gpu/drm/i915/selftests/mock_context.c
> +++ b/drivers/gpu/drm/i915/selftests/mock_context.c
> @@ -94,7 +94,7 @@ live_context(struct drm_i915_private *i915, struct drm_file *file)
>   {
>   	lockdep_assert_held(&i915->drm.struct_mutex);
>   
> -	return i915_gem_create_context(i915, file->driver_priv);
> +	return i915_gem_create_context(i915, file->driver_priv, 0);
>   }
>   
>   struct i915_gem_context *
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 0db92a4153c8..007d77ff7295 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1454,8 +1454,9 @@ struct drm_i915_gem_context_create_ext {
>   	__u32 ctx_id; /* output: id of new context*/
>   	__u32 flags;
>   #define I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS	(1u << 0)
> +#define I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE	(1u << 1)
>   #define I915_CONTEXT_CREATE_FLAGS_UNKNOWN \
> -	(-(I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS << 1))
> +	(-(I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE << 1))
>   	__u64 extensions;
>   };
>   
> 

LGTM.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 58+ messages in thread
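
As a usage sketch, requesting a single timeline at context creation might
look like the following. The extended create struct is quoted above; the
ioctl name DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT is an assumption taken from
the CONTEXT_CREATE extension earlier in the series, the includes follow the
earlier sketch, and error handling is elided.

struct drm_i915_gem_context_create_ext create = {
	.flags = I915_CONTEXT_CREATE_FLAGS_SINGLE_TIMELINE,
};

if (drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT, &create))
	return -errno;

/* requests on any engine of create.ctx_id now run in submission order */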

* Re: [PATCH 07/13] drm/i915: Allow userspace to clone contexts on creation
  2019-03-08 14:12 ` [PATCH 07/13] drm/i915: Allow userspace to clone contexts on creation Chris Wilson
@ 2019-03-08 16:13   ` Tvrtko Ursulin
  2019-03-08 16:34     ` Chris Wilson
  0 siblings, 1 reply; 58+ messages in thread
From: Tvrtko Ursulin @ 2019-03-08 16:13 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 08/03/2019 14:12, Chris Wilson wrote:
> A usecase arose out of handling context recovery in mesa, whereby they
> wish to recreate a context with fresh logical state but preserving all
> other details of the original. Currently, they create a new context and
> iterate over which bits they want to copy across, but it would much more
> convenient if they were able to just pass in a target context to clone
> during creation. This essentially extends the setparam during creation
> to pull the details from a target context instead of the user supplied
> parameters.

This one is not used by media so it will likely have to find a separate 
route upstream.

> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_gem_context.c | 90 +++++++++++++++++++++++++
>   include/uapi/drm/i915_drm.h             | 14 ++++
>   2 files changed, 104 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 310892b42b68..2cfc68b66944 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -1428,8 +1428,98 @@ static int create_setparam(struct i915_user_extension __user *ext, void *data)
>   	return ctx_setparam(data, &local.setparam);
>   }
>   
> +static int clone_sseu(struct i915_gem_context *dst,
> +		      struct i915_gem_context *src)
> +{
> +	const struct intel_sseu default_sseu =
> +		intel_device_default_sseu(dst->i915);
> +	struct intel_engine_cs *engine;
> +	enum intel_engine_id id;
> +
> +	for_each_engine(engine, dst->i915, id) {
> +		struct intel_context *ce;
> +		struct intel_sseu sseu;
> +
> +		ce = intel_context_lookup(src, engine);
> +		if (!ce)
> +			continue;
> +
> +		sseu = ce->sseu;
> +		if (!memcmp(&sseu, &default_sseu, sizeof(sseu)))
> +			continue;
> +
> +		ce = intel_context_pin_lock(dst, engine);
> +		if (IS_ERR(ce))
> +			return PTR_ERR(ce);
> +
> +		ce->sseu = sseu;
> +		intel_context_pin_unlock(ce);
> +	}
> +
> +	return 0;
> +}
> +
> +static int create_clone(struct i915_user_extension __user *ext, void *data)
> +{
> +	struct drm_i915_gem_context_create_ext_clone local;
> +	struct i915_gem_context *dst = data;
> +	struct i915_gem_context *src;
> +	int err;
> +
> +	if (copy_from_user(&local, ext, sizeof(local)))
> +		return -EFAULT;
> +
> +	if (local.flags & I915_CONTEXT_CLONE_UNKNOWN)
> +		return -EINVAL;
> +
> +	if (local.rsvd)
> +		return -EINVAL;
> +
> +	if (local.clone == dst->user_handle) /* good guess! denied. */
> +		return -ENOENT;

:) Good one, but put a more obvious comment like "Cannot clone itself!".

> +
> +	rcu_read_lock();
> +	src = __i915_gem_context_lookup_rcu(dst->file_priv, local.clone);
> +	rcu_read_unlock();
> +	if (!src)
> +		return -ENOENT;
> +
> +	GEM_BUG_ON(src == dst);
> +
> +	if (local.flags & I915_CONTEXT_CLONE_FLAGS)
> +		dst->user_flags = src->user_flags;
> +
> +	if (local.flags & I915_CONTEXT_CLONE_SCHED)
> +		dst->sched = src->sched;
> +
> +	if (local.flags & I915_CONTEXT_CLONE_SSEU) {
> +		err = clone_sseu(dst, src);
> +		if (err)
> +			return err;
> +	}
> +
> +	if (local.flags & I915_CONTEXT_CLONE_TIMELINE && src->timeline) {

Do we want to error out if there is no timeline but cloning it was requested?

> +		if (dst->timeline)
> +			i915_timeline_put(dst->timeline);
> +		dst->timeline = i915_timeline_get(src->timeline);

What prevents a different thread from changing either context in 
parallel and making reference counting go bad?

> +	}
> +
> +	if (local.flags & I915_CONTEXT_CLONE_VM && src->ppgtt) {

Also fail if impossible was requested?

> +		GEM_BUG_ON(dst->ppgtt == src->ppgtt);

Hm... what prevents this? Set_vm extension followed by clone could 
trigger it I think.

> +
> +		if (dst->ppgtt)
> +			i915_ppgtt_put(dst->ppgtt);
> +
> +		dst->ppgtt = i915_ppgtt_get(src->ppgtt);
> +		i915_ppgtt_open(dst->ppgtt);

Also some locking is needed I think to make the exchange atomic.

Could use __assign_ppgtt?

> +	}
> +
> +	return 0;
> +}
> +
>   static const i915_user_extension_fn create_extensions[] = {
>   	[I915_CONTEXT_CREATE_EXT_SETPARAM] = create_setparam,
> +	[I915_CONTEXT_CREATE_EXT_CLONE] = create_clone,
>   };
>   
>   static bool client_is_banned(struct drm_i915_file_private *file_priv)
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 007d77ff7295..50d154954d5f 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1579,6 +1579,20 @@ struct drm_i915_gem_context_create_ext_setparam {
>   	struct drm_i915_gem_context_param setparam;
>   };
>   
> +struct drm_i915_gem_context_create_ext_clone {
> +#define I915_CONTEXT_CREATE_EXT_CLONE 1
> +	struct i915_user_extension base;
> +	__u32 clone;

id, clone_id, source_id?

> +	__u32 flags;
> +#define I915_CONTEXT_CLONE_FLAGS	(1u << 0)
> +#define I915_CONTEXT_CLONE_SCHED	(1u << 1)
> +#define I915_CONTEXT_CLONE_SSEU		(1u << 2)
> +#define I915_CONTEXT_CLONE_TIMELINE	(1u << 3)
> +#define I915_CONTEXT_CLONE_VM		(1u << 4)
> +#define I915_CONTEXT_CLONE_UNKNOWN -(I915_CONTEXT_CLONE_VM << 1)
> +	__u64 rsvd;
> +};
> +
>   struct drm_i915_gem_context_destroy {
>   	__u32 ctx_id;
>   	__u32 pad;
> 

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 58+ messages in thread
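
For reference, a hypothetical sketch of the clone extension in use, chaining
it into context creation. The struct and flag names come from the patch; the
create ioctl name and drmIoctl wrapper are assumptions as before, src_ctx_id
is a hypothetical source context, the includes follow the earlier sketches
(plus <stdint.h> for the pointer cast), and error handling is elided.

struct drm_i915_gem_context_create_ext_clone clone = {
	.base = { .name = I915_CONTEXT_CREATE_EXT_CLONE },
	.clone = src_ctx_id,
	.flags = I915_CONTEXT_CLONE_VM | I915_CONTEXT_CLONE_SCHED,
};
struct drm_i915_gem_context_create_ext create = {
	.flags = I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS,
	.extensions = (__u64)(uintptr_t)&clone,
};

drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT, &create);
/* create.ctx_id now shares src's ppGTT and inherits its scheduling attrs */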

* Re: [PATCH 08/13] drm/i915: Allow a context to define its set of engines
  2019-03-08 14:12 ` [PATCH 08/13] drm/i915: Allow a context to define its set of engines Chris Wilson
@ 2019-03-08 16:27   ` Tvrtko Ursulin
  2019-03-08 16:47     ` Chris Wilson
  0 siblings, 1 reply; 58+ messages in thread
From: Tvrtko Ursulin @ 2019-03-08 16:27 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 08/03/2019 14:12, Chris Wilson wrote:
> Over the last few years, we have debated how to extend the user API to
> support an increase in the number of engines, that may be sparse and
> even be heterogeneous within a class (not all video decoders created
> equal). We settled on using (class, instance) tuples to identify a
> specific engine, with an API for the user to construct a map of engines
> to capabilities. Into this picture, we then add a challenge of virtual
> engines; one user engine that maps behind the scenes to any number of
> physical engines. To keep it general, we want the user to have full
> control over that mapping. To that end, we allow the user to constrain a
> context to define the set of engines that it can access, order fully
> controlled by the user via (class, instance). With such precise control
> in context setup, we can continue to use the existing execbuf uABI of
> specifying a single index; only now it doesn't automagically map onto
> the engines, it uses the user defined engine map from the context.
> 
> The I915_EXEC_DEFAULT slot is left empty, and is invalid for use by
> execbuf. Its use will be revealed in the next patch.
> 
> v2: Fixup freeing of local on success of get_engines()
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem_context.c       | 204 +++++++++++++++++-
>   drivers/gpu/drm/i915/i915_gem_context_types.h |   4 +
>   drivers/gpu/drm/i915/i915_gem_execbuffer.c    |  22 +-
>   include/uapi/drm/i915_drm.h                   |  42 +++-
>   4 files changed, 259 insertions(+), 13 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 2cfc68b66944..86d9bea6f275 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -101,6 +101,21 @@ static struct i915_global_gem_context {
>   	struct kmem_cache *slab_luts;
>   } global;
>   
> +static struct intel_engine_cs *
> +lookup_user_engine(struct i915_gem_context *ctx,
> +		   unsigned long flags, u16 class, u16 instance)
> +#define LOOKUP_USER_INDEX BIT(0)
> +{
> +	if (flags & LOOKUP_USER_INDEX) {
> +		if (instance >= ctx->nengine)
> +			return NULL;
> +
> +		return ctx->engines[instance];
> +	}
> +
> +	return intel_engine_lookup_user(ctx->i915, class, instance);
> +}
> +
>   struct i915_lut_handle *i915_lut_handle_alloc(void)
>   {
>   	return kmem_cache_alloc(global.slab_luts, GFP_KERNEL);
> @@ -234,6 +249,8 @@ static void i915_gem_context_free(struct i915_gem_context *ctx)
>   	release_hw_id(ctx);
>   	i915_ppgtt_put(ctx->ppgtt);
>   
> +	kfree(ctx->engines);
> +
>   	rbtree_postorder_for_each_entry_safe(it, n, &ctx->hw_contexts, node)
>   		it->ops->destroy(it);
>   
> @@ -1311,9 +1328,9 @@ static int set_sseu(struct i915_gem_context *ctx,
>   	if (user_sseu.flags || user_sseu.rsvd)
>   		return -EINVAL;
>   
> -	engine = intel_engine_lookup_user(i915,
> -					  user_sseu.engine_class,
> -					  user_sseu.engine_instance);
> +	engine = lookup_user_engine(ctx, 0,
> +				    user_sseu.engine_class,
> +				    user_sseu.engine_instance);
>   	if (!engine)
>   		return -EINVAL;
>   
> @@ -1331,9 +1348,154 @@ static int set_sseu(struct i915_gem_context *ctx,
>   
>   	args->size = sizeof(user_sseu);
>   
> +	return 0;
> +};
> +
> +struct set_engines {
> +	struct i915_gem_context *ctx;
> +	struct intel_engine_cs **engines;
> +	unsigned int nengine;
> +};
> +
> +static const i915_user_extension_fn set_engines__extensions[] = {
> +};
> +
> +static int
> +set_engines(struct i915_gem_context *ctx,
> +	    const struct drm_i915_gem_context_param *args)
> +{
> +	struct i915_context_param_engines __user *user;
> +	struct set_engines set = { .ctx = ctx };
> +	u64 size, extensions;
> +	unsigned int n;
> +	int err;
> +
> +	user = u64_to_user_ptr(args->value);
> +	size = args->size;
> +	if (!size)
> +		goto out;

This prevents a hypothetical extension with empty map data.

> +
> +	BUILD_BUG_ON(!IS_ALIGNED(sizeof(*user), sizeof(*user->class_instance)));
> +	if (size < sizeof(*user) || size % sizeof(*user->class_instance))

IS_ALIGNED for the second condition for consistency with the BUILD_BUG_ON?

> +		return -EINVAL;
> +
> +	set.nengine = (size - sizeof(*user)) / sizeof(*user->class_instance);
> +	if (set.nengine == 0 || set.nengine > I915_EXEC_RING_MASK + 1)

I would prefer we drop the size restriction since it doesn't apply to 
the engine map per se.

> +		return -EINVAL;
> +
> +	set.engines = kmalloc_array(set.nengine,
> +				    sizeof(*set.engines),
> +				    GFP_KERNEL);
> +	if (!set.engines)
> +		return -ENOMEM;
> +
> +	for (n = 0; n < set.nengine; n++) {
> +		u16 class, inst;
> +
> +		if (get_user(class, &user->class_instance[n].engine_class) ||
> +		    get_user(inst, &user->class_instance[n].engine_instance)) {
> +			kfree(set.engines);
> +			return -EFAULT;
> +		}
> +
> +		if (class == (u16)I915_ENGINE_CLASS_INVALID &&
> +		    inst == (u16)I915_ENGINE_CLASS_INVALID_NONE) {
> +			set.engines[n] = NULL;
> +			continue;
> +		}
> +
> +		set.engines[n] = lookup_user_engine(ctx, 0, class, inst);
> +		if (!set.engines[n]) {
> +			kfree(set.engines);
> +			return -ENOENT;
> +		}
> +	}
> +
> +	err = -EFAULT;
> +	if (!get_user(extensions, &user->extensions))
> +		err = i915_user_extensions(u64_to_user_ptr(extensions),
> +					   set_engines__extensions,
> +					   ARRAY_SIZE(set_engines__extensions),
> +					   &set);
> +	if (err) {
> +		kfree(set.engines);
> +		return err;
> +	}
> +
> +out:
> +	mutex_lock(&ctx->i915->drm.struct_mutex);
> +	kfree(ctx->engines);
> +	ctx->engines = set.engines;
> +	ctx->nengine = set.nengine;
> +	mutex_unlock(&ctx->i915->drm.struct_mutex);
> +
>   	return 0;
>   }
>   
> +static int
> +get_engines(struct i915_gem_context *ctx,
> +	    struct drm_i915_gem_context_param *args)
> +{
> +	struct i915_context_param_engines *local;
> +	unsigned int n, count, size;
> +	int err = 0;
> +
> +restart:
> +	count = READ_ONCE(ctx->nengine);
> +	if (count > (INT_MAX - sizeof(*local)) / sizeof(*local->class_instance))
> +		return -ENOMEM; /* unrepresentable! */

Probably overly paranoid since we can't end up with this state set.

> +
> +	size = sizeof(*local) + count * sizeof(*local->class_instance);
> +	if (!args->size) {
> +		args->size = size;
> +		return 0;
> +	}
> +	if (args->size < size)
> +		return -EINVAL;
> +
> +	local = kmalloc(size, GFP_KERNEL);
> +	if (!local)
> +		return -ENOMEM;
> +
> +	if (mutex_lock_interruptible(&ctx->i915->drm.struct_mutex)) {
> +		err = -EINTR;
> +		goto out;
> +	}
> +
> +	if (READ_ONCE(ctx->nengine) != count) {
> +		mutex_unlock(&ctx->i915->drm.struct_mutex);
> +		kfree(local);
> +		goto restart;
> +	}
> +
> +	local->extensions = 0;
> +	for (n = 0; n < count; n++) {
> +		if (ctx->engines[n]) {
> +			local->class_instance[n].engine_class =
> +				ctx->engines[n]->uabi_class;
> +			local->class_instance[n].engine_instance =
> +				ctx->engines[n]->instance;
> +		} else {
> +			local->class_instance[n].engine_class =
> +				I915_ENGINE_CLASS_INVALID;
> +			local->class_instance[n].engine_instance =
> +				I915_ENGINE_CLASS_INVALID_NONE;
> +		}
> +	}
> +
> +	mutex_unlock(&ctx->i915->drm.struct_mutex);
> +
> +	if (copy_to_user(u64_to_user_ptr(args->value), local, size)) {
> +		err = -EFAULT;
> +		goto out;
> +	}
> +	args->size = size;
> +
> +out:
> +	kfree(local);
> +	return err;
> +}
> +
>   static int ctx_setparam(struct i915_gem_context *ctx,
>   			struct drm_i915_gem_context_param *args)
>   {
> @@ -1406,6 +1568,10 @@ static int ctx_setparam(struct i915_gem_context *ctx,
>   		ret = set_ppgtt(ctx, args);
>   		break;
>   
> +	case I915_CONTEXT_PARAM_ENGINES:
> +		ret = set_engines(ctx, args);
> +		break;
> +
>   	case I915_CONTEXT_PARAM_BAN_PERIOD:
>   	default:
>   		ret = -EINVAL;
> @@ -1459,6 +1625,22 @@ static int clone_sseu(struct i915_gem_context *dst,
>   	return 0;
>   }
>   
> +static int clone_engines(struct i915_gem_context *dst,
> +			 struct i915_gem_context *src)
> +{
> +	struct intel_engine_cs **engines;
> +
> +	engines = kmemdup(src->engines,
> +			  sizeof(*src->engines) * src->nengine,
> +			  GFP_KERNEL);
> +	if (!engines)
> +		return -ENOMEM;
> +
> +	dst->engines = engines;
> +	dst->nengine = src->nengine;
> +	return 0;
> +}
> +
>   static int create_clone(struct i915_user_extension __user *ext, void *data)
>   {
>   	struct drm_i915_gem_context_create_ext_clone local;
> @@ -1514,6 +1696,12 @@ static int create_clone(struct i915_user_extension __user *ext, void *data)
>   		i915_ppgtt_open(dst->ppgtt);
>   	}
>   
> +	if (local.flags & I915_CONTEXT_CLONE_ENGINES && src->nengine) {
> +		err = clone_engines(dst, src);
> +		if (err)
> +			return err;
> +	}
> +
>   	return 0;
>   }
>   
> @@ -1632,9 +1820,9 @@ static int get_sseu(struct i915_gem_context *ctx,
>   	if (user_sseu.flags || user_sseu.rsvd)
>   		return -EINVAL;
>   
> -	engine = intel_engine_lookup_user(ctx->i915,
> -					  user_sseu.engine_class,
> -					  user_sseu.engine_instance);
> +	engine = lookup_user_engine(ctx, 0,
> +				    user_sseu.engine_class,
> +				    user_sseu.engine_instance);
>   	if (!engine)
>   		return -EINVAL;
>   
> @@ -1715,6 +1903,10 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
>   		ret = get_ppgtt(ctx, args);
>   		break;
>   
> +	case I915_CONTEXT_PARAM_ENGINES:
> +		ret = get_engines(ctx, args);
> +		break;
> +
>   	case I915_CONTEXT_PARAM_BAN_PERIOD:
>   	default:
>   		ret = -EINVAL;
> diff --git a/drivers/gpu/drm/i915/i915_gem_context_types.h b/drivers/gpu/drm/i915/i915_gem_context_types.h
> index f8f6e6c960a7..8a89f3053f73 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context_types.h
> +++ b/drivers/gpu/drm/i915/i915_gem_context_types.h
> @@ -41,6 +41,8 @@ struct i915_gem_context {
>   	/** file_priv: owning file descriptor */
>   	struct drm_i915_file_private *file_priv;
>   
> +	struct intel_engine_cs **engines;
> +
>   	struct i915_timeline *timeline;
>   
>   	/**
> @@ -110,6 +112,8 @@ struct i915_gem_context {
>   #define CONTEXT_CLOSED			1
>   #define CONTEXT_FORCE_SINGLE_SUBMISSION	2
>   
> +	unsigned int nengine;
> +
>   	/**
>   	 * @hw_id: - unique identifier for the context
>   	 *
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index ee6d301a9627..67e4a0c2ebff 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -2090,13 +2090,23 @@ static const enum intel_engine_id user_ring_map[I915_USER_RINGS + 1] = {
>   };
>   
>   static struct intel_engine_cs *
> -eb_select_engine(struct drm_i915_private *dev_priv,
> +eb_select_engine(struct i915_execbuffer *eb,
>   		 struct drm_file *file,
>   		 struct drm_i915_gem_execbuffer2 *args)
>   {
>   	unsigned int user_ring_id = args->flags & I915_EXEC_RING_MASK;
>   	struct intel_engine_cs *engine;
>   
> +	if (eb->ctx->engines) {
> +		if (user_ring_id >= eb->ctx->nengine) {
> +			DRM_DEBUG("execbuf with unknown ring: %u\n",
> +				  user_ring_id);
> +			return NULL;
> +		}
> +
> +		return eb->ctx->engines[user_ring_id];
> +	}
> +
>   	if (user_ring_id > I915_USER_RINGS) {
>   		DRM_DEBUG("execbuf with unknown ring: %u\n", user_ring_id);
>   		return NULL;
> @@ -2109,11 +2119,11 @@ eb_select_engine(struct drm_i915_private *dev_priv,
>   		return NULL;
>   	}
>   
> -	if (user_ring_id == I915_EXEC_BSD && HAS_ENGINE(dev_priv, VCS1)) {
> +	if (user_ring_id == I915_EXEC_BSD && HAS_ENGINE(eb->i915, VCS1)) {
>   		unsigned int bsd_idx = args->flags & I915_EXEC_BSD_MASK;
>   
>   		if (bsd_idx == I915_EXEC_BSD_DEFAULT) {
> -			bsd_idx = gen8_dispatch_bsd_engine(dev_priv, file);
> +			bsd_idx = gen8_dispatch_bsd_engine(eb->i915, file);
>   		} else if (bsd_idx >= I915_EXEC_BSD_RING1 &&
>   			   bsd_idx <= I915_EXEC_BSD_RING2) {
>   			bsd_idx >>= I915_EXEC_BSD_SHIFT;
> @@ -2124,9 +2134,9 @@ eb_select_engine(struct drm_i915_private *dev_priv,
>   			return NULL;
>   		}
>   
> -		engine = dev_priv->engine[_VCS(bsd_idx)];
> +		engine = eb->i915->engine[_VCS(bsd_idx)];
>   	} else {
> -		engine = dev_priv->engine[user_ring_map[user_ring_id]];
> +		engine = eb->i915->engine[user_ring_map[user_ring_id]];
>   	}
>   
>   	if (!engine) {
> @@ -2336,7 +2346,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
>   	if (unlikely(err))
>   		goto err_destroy;
>   
> -	eb.engine = eb_select_engine(eb.i915, file, args);
> +	eb.engine = eb_select_engine(&eb, file, args);
>   	if (!eb.engine) {
>   		err = -EINVAL;
>   		goto err_engine;
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 50d154954d5f..00147b990e63 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -124,6 +124,8 @@ enum drm_i915_gem_engine_class {
>   	I915_ENGINE_CLASS_INVALID	= -1
>   };
>   
> +#define I915_ENGINE_CLASS_INVALID_NONE -1
> +
>   /**
>    * DOC: perf_events exposed by i915 through /sys/bus/event_sources/drivers/i915
>    *
> @@ -1509,6 +1511,26 @@ struct drm_i915_gem_context_param {
>   	 * See DRM_I915_GEM_VM_CREATE and DRM_I915_GEM_VM_DESTROY.
>   	 */
>   #define I915_CONTEXT_PARAM_VM		0x9
> +
> +/*
> + * I915_CONTEXT_PARAM_ENGINES:
> + *
> + * Bind this context to operate on this subset of available engines. Henceforth,
> + * the I915_EXEC_RING selector for DRM_IOCTL_I915_GEM_EXECBUFFER2 operates as
> + * an index into this array of engines; I915_EXEC_DEFAULT selecting engine[0]
> + * and upwards. Slots 0...N are filled in using the specified (class, instance).
> + * Use
> + *	engine_class: I915_ENGINE_CLASS_INVALID,
> + *	engine_instance: I915_ENGINE_CLASS_INVALID_NONE
> + * to specify a gap in the array that can be filled in later, e.g. by a
> + * virtual engine used for load balancing.
> + *
> + * Setting the number of engines bound to the context to 0, by passing a zero
> + * sized argument, will revert back to default settings.
> + *
> + * See struct i915_context_param_engines.
> + */
> +#define I915_CONTEXT_PARAM_ENGINES	0xa
>   /* Must be kept compact -- no holes and well documented */
>   
>   	__u64 value;
> @@ -1573,6 +1595,23 @@ struct drm_i915_gem_context_param_sseu {
>   	__u32 rsvd;
>   };
>   
> +struct i915_context_param_engines {
> +	__u64 extensions; /* linked chain of extension blocks, 0 terminates */
> +
> +	struct {
> +		__u16 engine_class; /* see enum drm_i915_gem_engine_class */
> +		__u16 engine_instance;
> +	} class_instance[0];
> +} __attribute__((packed));
> +
> +#define I915_DEFINE_CONTEXT_PARAM_ENGINES(name__, N__) struct { \
> +	__u64 extensions; \
> +	struct { \
> +		__u16 engine_class; \
> +		__u16 engine_instance; \
> +	} class_instance[N__]; \
> +} __attribute__((packed)) name__
> +
>   struct drm_i915_gem_context_create_ext_setparam {
>   #define I915_CONTEXT_CREATE_EXT_SETPARAM 0
>   	struct i915_user_extension base;
> @@ -1589,7 +1628,8 @@ struct drm_i915_gem_context_create_ext_clone {
>   #define I915_CONTEXT_CLONE_SSEU		(1u << 2)
>   #define I915_CONTEXT_CLONE_TIMELINE	(1u << 3)
>   #define I915_CONTEXT_CLONE_VM		(1u << 4)
> -#define I915_CONTEXT_CLONE_UNKNOWN -(I915_CONTEXT_CLONE_VM << 1)
> +#define I915_CONTEXT_CLONE_ENGINES	(1u << 5)
> +#define I915_CONTEXT_CLONE_UNKNOWN -(I915_CONTEXT_CLONE_ENGINES << 1)
>   	__u64 rsvd;
>   };
>   
> 

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 09/13] drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[]
  2019-03-08 14:12 ` [PATCH 09/13] drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[] Chris Wilson
@ 2019-03-08 16:31   ` Tvrtko Ursulin
  2019-03-08 16:57     ` Chris Wilson
  2019-03-08 17:11     ` Chris Wilson
  0 siblings, 2 replies; 58+ messages in thread
From: Tvrtko Ursulin @ 2019-03-08 16:31 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 08/03/2019 14:12, Chris Wilson wrote:
> Allow the user to specify a local engine index (as opposed to
> class:index) that they can use to refer to a preset engine inside the
> ctx->engine[] array defined by an earlier I915_CONTEXT_PARAM_ENGINES.
> This will be useful for setting SSEU parameters on virtual engines that
> are local to the context and do not have a valid global class:instance
> lookup.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem_context.c | 24 ++++++++++++++++++++----
>   include/uapi/drm/i915_drm.h             |  3 ++-
>   2 files changed, 22 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 86d9bea6f275..a581c01ffff1 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -1313,6 +1313,7 @@ static int set_sseu(struct i915_gem_context *ctx,
>   	struct drm_i915_gem_context_param_sseu user_sseu;
>   	struct intel_engine_cs *engine;
>   	struct intel_sseu sseu;
> +	unsigned long lookup;

Poor 32-bit builds. ;) And in lookup_user_engine as well.

>   	int ret;
>   
>   	if (args->size < sizeof(user_sseu))
> @@ -1325,10 +1326,17 @@ static int set_sseu(struct i915_gem_context *ctx,
>   			   sizeof(user_sseu)))
>   		return -EFAULT;
>   
> -	if (user_sseu.flags || user_sseu.rsvd)
> +	if (user_sseu.rsvd)
>   		return -EINVAL;
>   
> -	engine = lookup_user_engine(ctx, 0,
> +	if (user_sseu.flags & ~(I915_CONTEXT_SSEU_FLAG_ENGINE_INDEX))
> +		return -EINVAL;
> +
> +	lookup = 0;
> +	if (user_sseu.flags & I915_CONTEXT_SSEU_FLAG_ENGINE_INDEX)
> +		lookup |= LOOKUP_USER_INDEX;
> +
> +	engine = lookup_user_engine(ctx, lookup,
>   				    user_sseu.engine_class,
>   				    user_sseu.engine_instance);
>   	if (!engine)
> @@ -1807,6 +1815,7 @@ static int get_sseu(struct i915_gem_context *ctx,
>   	struct drm_i915_gem_context_param_sseu user_sseu;
>   	struct intel_engine_cs *engine;
>   	struct intel_context *ce;
> +	unsigned long lookup;
>   
>   	if (args->size == 0)
>   		goto out;
> @@ -1817,10 +1826,17 @@ static int get_sseu(struct i915_gem_context *ctx,
>   			   sizeof(user_sseu)))
>   		return -EFAULT;
>   
> -	if (user_sseu.flags || user_sseu.rsvd)
> +	if (user_sseu.rsvd)
>   		return -EINVAL;
>   
> -	engine = lookup_user_engine(ctx, 0,
> +	if (user_sseu.flags & ~(I915_CONTEXT_SSEU_FLAG_ENGINE_INDEX))
> +		return -EINVAL;
> +
> +	lookup = 0;
> +	if (user_sseu.flags & I915_CONTEXT_SSEU_FLAG_ENGINE_INDEX)
> +		lookup |= LOOKUP_USER_INDEX;
> +
> +	engine = lookup_user_engine(ctx, lookup,
>   				    user_sseu.engine_class,
>   				    user_sseu.engine_instance);
>   	if (!engine)
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 00147b990e63..a609619610f2 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1565,9 +1565,10 @@ struct drm_i915_gem_context_param_sseu {
>   	__u16 engine_instance;
>   
>   	/*
> -	 * Unused for now. Must be cleared to zero.
> +	 * Unknown flags must be cleared to zero.
>   	 */
>   	__u32 flags;
> +#define I915_CONTEXT_SSEU_FLAG_ENGINE_INDEX (1u << 0)
>   
>   	/*
>   	 * Mask of slices to enable for the context. Valid values are a subset
> 

Looks okay. But one more thing is needed:

https://cgit.freedesktop.org/~tursulin/drm-intel/commit/?h=media&id=38266bfe99469de9e13774a13fa641c377988c67

If you don't disagree with this patch, feel free to adopt it into your tree.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 07/13] drm/i915: Allow userspace to clone contexts on creation
  2019-03-08 16:13   ` Tvrtko Ursulin
@ 2019-03-08 16:34     ` Chris Wilson
  0 siblings, 0 replies; 58+ messages in thread
From: Chris Wilson @ 2019-03-08 16:34 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-03-08 16:13:56)
> 
> On 08/03/2019 14:12, Chris Wilson wrote:
> > A usecase arose out of handling context recovery in mesa, whereby they
> > wish to recreate a context with fresh logical state but preserving all
> > other details of the original. Currently, they create a new context and
> > iterate over which bits they want to copy across, but it would much more
> > convenient if they were able to just pass in a target context to clone
> > during creation. This essentially extends the setparam during creation
> > to pull the details from a target context instead of the user supplied
> > parameters.
> 
> This one is not used by media so it will likely have to find a separate 
> route upstream.

Eh?

However, I do think there is one quite handy usecase:

i915_query -> engines[]
gem_context_set_param(0 /* default context */, ENGINES, engines[]);

Then whenever you want a context,
ctx = gem_context_clone(0, CLONE_ENGINES);
so that you don't have to store your query results, or build a param for
every create.
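
Something like this from userspace (untested sketch; struct, flag and
ioctl names per the uAPI proposed in this series, using libdrm's
drmIoctl):

	struct drm_i915_gem_context_create_ext_clone clone = {
		.base = { .name = I915_CONTEXT_CREATE_EXT_CLONE },
		.clone = 0, /* the default context */
		.flags = I915_CONTEXT_CLONE_ENGINES,
	};
	struct drm_i915_gem_context_create_ext arg = {
		.flags = I915_CONTEXT_CREATE_FLAGS_USE_EXTENSIONS,
		.extensions = (uintptr_t)&clone,
	};

	drmIoctl(fd, DRM_IOCTL_I915_GEM_CONTEXT_CREATE_EXT, &arg);
	/* arg.ctx_id now carries a copy of the default engine map */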

> > +static int create_clone(struct i915_user_extension __user *ext, void *data)
> > +{
> > +     struct drm_i915_gem_context_create_ext_clone local;
> > +     struct i915_gem_context *dst = data;
> > +     struct i915_gem_context *src;
> > +     int err;
> > +
> > +     if (copy_from_user(&local, ext, sizeof(local)))
> > +             return -EFAULT;
> > +
> > +     if (local.flags & I915_CONTEXT_CLONE_UNKNOWN)
> > +             return -EINVAL;
> > +
> > +     if (local.rsvd)
> > +             return -EINVAL;
> > +
> > +     if (local.clone == dst->user_handle) /* good guess! denied. */
> > +             return -ENOENT;
> 
> :) Good one, but put a more obvious comment like "Cannot clone itself!".
> 
> > +
> > +     rcu_read_lock();
> > +     src = __i915_gem_context_lookup_rcu(dst->file_priv, local.clone);
> > +     rcu_read_unlock();
> > +     if (!src)
> > +             return -ENOENT;
> > +
> > +     GEM_BUG_ON(src == dst);
> > +
> > +     if (local.flags & I915_CONTEXT_CLONE_FLAGS)
> > +             dst->user_flags = src->user_flags;
> > +
> > +     if (local.flags & I915_CONTEXT_CLONE_SCHED)
> > +             dst->sched = src->sched;
> > +
> > +     if (local.flags & I915_CONTEXT_CLONE_SSEU) {
> > +             err = clone_sseu(dst, src);
> > +             if (err)
> > +                     return err;
> > +     }
> > +
> > +     if (local.flags & I915_CONTEXT_CLONE_TIMELINE && src->timeline) {
> 
> Do we want to error out if no timeline and cloning was requested?

No. Because cloning that situation just means no common timeline.
Basically it allows one to ask "give me everything that can be cloned"
without having to worry about what you set up beforehand.

> 
> > +             if (dst->timeline)
> > +                     i915_timeline_put(dst->timeline);
> > +             dst->timeline = i915_timeline_get(src->timeline);
> 
> What prevents a different thread from changing either context in 
> parallel and making reference counting go bad?

It should be that userspace isn't allowed to access dst before the
constructor returns. As hinted earlier, we insert ourselves into the idr
too early. So what should be impossible... isn't quite impossible.

src->timeline is unchangeable so we don't have to worry about
serialisation there (yet, haven't found anyone to pitch first class
timelines to).

> > +     }
> > +
> > +     if (local.flags & I915_CONTEXT_CLONE_VM && src->ppgtt) {
> 
> Also fail if impossible was requested?

Nope. Because !src->ppgtt implies that dst is already using the same
ppgtt (the aliasing ppgtt).

> > +             GEM_BUG_ON(dst->ppgtt == src->ppgtt);
> 
> Hm... what prevents this? Set_vm extension followed by clone could 
> trigger it I think.

True.

> > +
> > +             if (dst->ppgtt)
> > +                     i915_ppgtt_put(dst->ppgtt);
> > +
> > +             dst->ppgtt = i915_ppgtt_get(src->ppgtt);
> > +             i915_ppgtt_open(dst->ppgtt);
> 
> Also some locking is needed I think to make the exchange atomic.

None required for dest, but we should serialise src (well and you'll
love this, since this is under RCU).

> Could use __assign_ppgtt?

Ah. I knew there should be something.

> >   static const i915_user_extension_fn create_extensions[] = {
> >       [I915_CONTEXT_CREATE_EXT_SETPARAM] = create_setparam,
> > +     [I915_CONTEXT_CREATE_EXT_CLONE] = create_clone,
> >   };
> >   
> >   static bool client_is_banned(struct drm_i915_file_private *file_priv)
> > diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> > index 007d77ff7295..50d154954d5f 100644
> > --- a/include/uapi/drm/i915_drm.h
> > +++ b/include/uapi/drm/i915_drm.h
> > @@ -1579,6 +1579,20 @@ struct drm_i915_gem_context_create_ext_setparam {
> >       struct drm_i915_gem_context_param setparam;
> >   };
> >   
> > +struct drm_i915_gem_context_create_ext_clone {
> > +#define I915_CONTEXT_CREATE_EXT_CLONE 1
> > +     struct i915_user_extension base;
> > +     __u32 clone;
> 
> id, clone_id, source_id?

What is with _id? I think ctx_id is an abomination. Of the selection,
source_id. Or share_id. So clone_id.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* ✗ Fi.CI.BAT: failure for series starting with [01/13] drm/i915: Suppress the "Failed to idle" warning for gem_eio (rev2)
  2019-03-08 14:12 Home straight for veng, the uAPI wars Chris Wilson
                   ` (15 preceding siblings ...)
  2019-03-08 15:19 ` ✗ Fi.CI.BAT: failure " Patchwork
@ 2019-03-08 16:47 ` Patchwork
  16 siblings, 0 replies; 58+ messages in thread
From: Patchwork @ 2019-03-08 16:47 UTC (permalink / raw)
  To: Chris Wilson; +Cc: intel-gfx

== Series Details ==

Series: series starting with [01/13] drm/i915: Suppress the "Failed to idle" warning for gem_eio (rev2)
URL   : https://patchwork.freedesktop.org/series/57742/
State : failure

== Summary ==

Applying: drm/i915: Suppress the "Failed to idle" warning for gem_eio
Applying: drm/i915: Introduce the i915_user_extension_method
Applying: drm/i915: Introduce a context barrier callback
Applying: drm/i915: Create/destroy VM (ppGTT) for use with contexts
Applying: drm/i915: Extend CONTEXT_CREATE to set parameters upon construction
Using index info to reconstruct a base tree...
M	include/uapi/drm/i915_drm.h
Falling back to patching base and 3-way merge...
Auto-merging include/uapi/drm/i915_drm.h
CONFLICT (content): Merge conflict in include/uapi/drm/i915_drm.h
error: Failed to merge in the changes.
hint: Use 'git am --show-current-patch' to see the failed patch
Patch failed at 0005 drm/i915: Extend CONTEXT_CREATE to set parameters upon construction
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 08/13] drm/i915: Allow a context to define its set of engines
  2019-03-08 16:27   ` Tvrtko Ursulin
@ 2019-03-08 16:47     ` Chris Wilson
  2019-03-11  9:23       ` Tvrtko Ursulin
  0 siblings, 1 reply; 58+ messages in thread
From: Chris Wilson @ 2019-03-08 16:47 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-03-08 16:27:22)
> 
> On 08/03/2019 14:12, Chris Wilson wrote:
> > Over the last few years, we have debated how to extend the user API to
> > support an increase in the number of engines, that may be sparse and
> > even be heterogeneous within a class (not all video decoders created
> > equal). We settled on using (class, instance) tuples to identify a
> > specific engine, with an API for the user to construct a map of engines
> > to capabilities. Into this picture, we then add a challenge of virtual
> > engines; one user engine that maps behind the scenes to any number of
> > physical engines. To keep it general, we want the user to have full
> > control over that mapping. To that end, we allow the user to constrain a
> > context to define the set of engines that it can access, order fully
> > controlled by the user via (class, instance). With such precise control
> > in context setup, we can continue to use the existing execbuf uABI of
> > specifying a single index; only now it doesn't automagically map onto
> > the engines, it uses the user defined engine map from the context.
> > 
> > The I915_EXEC_DEFAULT slot is left empty, and invalid for use by
> > execbuf. Its use will be revealed in the next patch.
> > 
> > v2: Fixup freeing of local on success of get_engines()
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_gem_context.c       | 204 +++++++++++++++++-
> >   drivers/gpu/drm/i915/i915_gem_context_types.h |   4 +
> >   drivers/gpu/drm/i915/i915_gem_execbuffer.c    |  22 +-
> >   include/uapi/drm/i915_drm.h                   |  42 +++-
> >   4 files changed, 259 insertions(+), 13 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> > index 2cfc68b66944..86d9bea6f275 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> > @@ -101,6 +101,21 @@ static struct i915_global_gem_context {
> >       struct kmem_cache *slab_luts;
> >   } global;
> >   
> > +static struct intel_engine_cs *
> > +lookup_user_engine(struct i915_gem_context *ctx,
> > +                unsigned long flags, u16 class, u16 instance)
> > +#define LOOKUP_USER_INDEX BIT(0)
> > +{
> > +     if (flags & LOOKUP_USER_INDEX) {
> > +             if (instance >= ctx->nengine)
> > +                     return NULL;
> > +
> > +             return ctx->engines[instance];
> > +     }
> > +
> > +     return intel_engine_lookup_user(ctx->i915, class, instance);
> > +}
> > +
> >   struct i915_lut_handle *i915_lut_handle_alloc(void)
> >   {
> >       return kmem_cache_alloc(global.slab_luts, GFP_KERNEL);
> > @@ -234,6 +249,8 @@ static void i915_gem_context_free(struct i915_gem_context *ctx)
> >       release_hw_id(ctx);
> >       i915_ppgtt_put(ctx->ppgtt);
> >   
> > +     kfree(ctx->engines);
> > +
> >       rbtree_postorder_for_each_entry_safe(it, n, &ctx->hw_contexts, node)
> >               it->ops->destroy(it);
> >   
> > @@ -1311,9 +1328,9 @@ static int set_sseu(struct i915_gem_context *ctx,
> >       if (user_sseu.flags || user_sseu.rsvd)
> >               return -EINVAL;
> >   
> > -     engine = intel_engine_lookup_user(i915,
> > -                                       user_sseu.engine_class,
> > -                                       user_sseu.engine_instance);
> > +     engine = lookup_user_engine(ctx, 0,
> > +                                 user_sseu.engine_class,
> > +                                 user_sseu.engine_instance);
> >       if (!engine)
> >               return -EINVAL;
> >   
> > @@ -1331,9 +1348,154 @@ static int set_sseu(struct i915_gem_context *ctx,
> >   
> >       args->size = sizeof(user_sseu);
> >   
> > +     return 0;
> > +};
> > +
> > +struct set_engines {
> > +     struct i915_gem_context *ctx;
> > +     struct intel_engine_cs **engines;
> > +     unsigned int nengine;
> > +};
> > +
> > +static const i915_user_extension_fn set_engines__extensions[] = {
> > +};
> > +
> > +static int
> > +set_engines(struct i915_gem_context *ctx,
> > +         const struct drm_i915_gem_context_param *args)
> > +{
> > +     struct i915_context_param_engines __user *user;
> > +     struct set_engines set = { .ctx = ctx };
> > +     u64 size, extensions;
> > +     unsigned int n;
> > +     int err;
> > +
> > +     user = u64_to_user_ptr(args->value);
> > +     size = args->size;
> > +     if (!size)
> > +             goto out;
> 
> This prevents a hypothetical extension with empty map data.

No... This is required for resetting and I think that's covered in what
little docs there are. It's the set.nengine==0 test later
that you mean to object to. But we can't do that as that's how we
differentiate between modes at the moment.

We could use ctx->nengine = 0 and ctx->engines = ZERO_PTR.
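
A sketch of that idea, with ZERO_SIZE_PTR being the kernel's kmalloc(0)
cookie:

	/* an explicit zero-length map: non-NULL, so distinct from default */
	ctx->engines = ZERO_SIZE_PTR;
	ctx->nengine = 0;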

> > +     BUILD_BUG_ON(!IS_ALIGNED(sizeof(*user), sizeof(*user->class_instance)));
> > +     if (size < sizeof(*user) || size % sizeof(*user->class_instance))
> 
> IS_ALIGNED for the second condition for consistency with the BUILD_BUG_ON?
> 
> > +             return -EINVAL;
> > +
> > +     set.nengine = (size - sizeof(*user)) / sizeof(*user->class_instance);
> > +     if (set.nengine == 0 || set.nengine > I915_EXEC_RING_MASK + 1)
> 
> I would prefer we drop the size restriction since it doesn't apply to 
> the engine map per se.

u64 is a limit that will be non-trivial to lift. Marking the limits of
the kernel doesn't restrict it being lifted later.

> > +             return -EINVAL;
> > +
> > +     set.engines = kmalloc_array(set.nengine,
> > +                                 sizeof(*set.engines),
> > +                                 GFP_KERNEL);
> > +     if (!set.engines)
> > +             return -ENOMEM;
> > +
> > +     for (n = 0; n < set.nengine; n++) {
> > +             u16 class, inst;
> > +
> > +             if (get_user(class, &user->class_instance[n].engine_class) ||
> > +                 get_user(inst, &user->class_instance[n].engine_instance)) {
> > +                     kfree(set.engines);
> > +                     return -EFAULT;
> > +             }
> > +
> > +             if (class == (u16)I915_ENGINE_CLASS_INVALID &&
> > +                 inst == (u16)I915_ENGINE_CLASS_INVALID_NONE) {
> > +                     set.engines[n] = NULL;
> > +                     continue;
> > +             }
> > +
> > +             set.engines[n] = lookup_user_engine(ctx, 0, class, inst);
> > +             if (!set.engines[n]) {
> > +                     kfree(set.engines);
> > +                     return -ENOENT;
> > +             }
> > +     }
> > +
> > +     err = -EFAULT;
> > +     if (!get_user(extensions, &user->extensions))
> > +             err = i915_user_extensions(u64_to_user_ptr(extensions),
> > +                                        set_engines__extensions,
> > +                                        ARRAY_SIZE(set_engines__extensions),
> > +                                        &set);
> > +     if (err) {
> > +             kfree(set.engines);
> > +             return err;
> > +     }
> > +
> > +out:
> > +     mutex_lock(&ctx->i915->drm.struct_mutex);
> > +     kfree(ctx->engines);
> > +     ctx->engines = set.engines;
> > +     ctx->nengine = set.nengine;
> > +     mutex_unlock(&ctx->i915->drm.struct_mutex);
> > +
> >       return 0;
> >   }
> >   
> > +static int
> > +get_engines(struct i915_gem_context *ctx,
> > +         struct drm_i915_gem_context_param *args)
> > +{
> > +     struct i915_context_param_engines *local;
> > +     unsigned int n, count, size;
> > +     int err = 0;
> > +
> > +restart:
> > +     count = READ_ONCE(ctx->nengine);
> > +     if (count > (INT_MAX - sizeof(*local)) / sizeof(*local->class_instance))
> > +             return -ENOMEM; /* unrepresentable! */
> 
> Probably overly paranoid since we can't end up with this state set.

And I thought you wanted many engines! Paranoia around kmalloc/user
overflows is always useful, because you know someone will send a patch
later (and smatch doesn't really care as it only checks the limits of
types and local constraints).
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 09/13] drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[]
  2019-03-08 16:31   ` Tvrtko Ursulin
@ 2019-03-08 16:57     ` Chris Wilson
  2019-03-11  7:14       ` Tvrtko Ursulin
  2019-03-08 17:11     ` Chris Wilson
  1 sibling, 1 reply; 58+ messages in thread
From: Chris Wilson @ 2019-03-08 16:57 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-03-08 16:31:51)
> 
> On 08/03/2019 14:12, Chris Wilson wrote:
> > Allow the user to specify a local engine index (as opposed to
> > class:index) that they can use to refer to a preset engine inside the
> > ctx->engine[] array defined by an earlier I915_CONTEXT_PARAM_ENGINES.
> > This will be useful for setting SSEU parameters on virtual engines that
> > are local to the context and do not have a valid global class:instance
> > lookup.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_gem_context.c | 24 ++++++++++++++++++++----
> >   include/uapi/drm/i915_drm.h             |  3 ++-
> >   2 files changed, 22 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> > index 86d9bea6f275..a581c01ffff1 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> > @@ -1313,6 +1313,7 @@ static int set_sseu(struct i915_gem_context *ctx,
> >       struct drm_i915_gem_context_param_sseu user_sseu;
> >       struct intel_engine_cs *engine;
> >       struct intel_sseu sseu;
> > +     unsigned long lookup;
> 
> Poor 32-bit builds. ;) And in lookup_user_engine as well.

It's an internal flags variable, not a direct part of the uABI. So its
only use is to control lookup_user_engine(), and limiting it to 32b
(natural register width) doesn't seem like a concern for the
foreseeable future? How many different ABI interfacing methods of looking
up an engine do you have planned?
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 09/13] drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[]
  2019-03-08 16:31   ` Tvrtko Ursulin
  2019-03-08 16:57     ` Chris Wilson
@ 2019-03-08 17:11     ` Chris Wilson
  2019-03-11  7:16       ` Tvrtko Ursulin
  1 sibling, 1 reply; 58+ messages in thread
From: Chris Wilson @ 2019-03-08 17:11 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-03-08 16:31:51)
> Looks okay. But one more thing is needed:
> 
> https://cgit.freedesktop.org/~tursulin/drm-intel/commit/?h=media&id=38266bfe99469de9e13774a13fa641c377988c67

drm/i915: Allow SSEU configuration to be set on virtual engine

 	/* Only render engine supports RPCS configuration. */
-	if (engine->class != RENDER_CLASS)
+	if (engine->class != RENDER_CLASS &&
+	    !(engine->flags & I915_ENGINE_IS_VIRTUAL &&
+	      ctx->engines[1]->class == RENDER_CLASS))
 		return -ENODEV;

A virtual engine composed of RCS engines would have
engine->class == RENDER_CLASS.

So it's just the engine->id BUG_ON that needs lifting?
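
i.e. the original test should already cover the virtual case (sketch):

	/* Only render engine supports RPCS configuration. */
	if (engine->class != RENDER_CLASS)
		return -ENODEV;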
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 09/13] drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[]
  2019-03-08 16:57     ` Chris Wilson
@ 2019-03-11  7:14       ` Tvrtko Ursulin
  2019-03-11 10:33         ` Chris Wilson
  0 siblings, 1 reply; 58+ messages in thread
From: Tvrtko Ursulin @ 2019-03-11  7:14 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 08/03/2019 16:57, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-03-08 16:31:51)
>>
>> On 08/03/2019 14:12, Chris Wilson wrote:
>>> Allow the user to specify a local engine index (as opposed to
>>> class:index) that they can use to refer to a preset engine inside the
>>> ctx->engine[] array defined by an earlier I915_CONTEXT_PARAM_ENGINES.
>>> This will be useful for setting SSEU parameters on virtual engines that
>>> are local to the context and do not have a valid global class:instance
>>> lookup.
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/i915_gem_context.c | 24 ++++++++++++++++++++----
>>>    include/uapi/drm/i915_drm.h             |  3 ++-
>>>    2 files changed, 22 insertions(+), 5 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
>>> index 86d9bea6f275..a581c01ffff1 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_context.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
>>> @@ -1313,6 +1313,7 @@ static int set_sseu(struct i915_gem_context *ctx,
>>>        struct drm_i915_gem_context_param_sseu user_sseu;
>>>        struct intel_engine_cs *engine;
>>>        struct intel_sseu sseu;
>>> +     unsigned long lookup;
>>
>> Poor 32-bit builds. ;) And in lookup_user_engine as well.
> 
> It's an internal flags variable, not a direct part of the uABI. So its
> only use is to control lookup_user_engine(), and limiting it to 32b
> (natural register width) doesn't seem like a concern for the
> foreseeable future? How many different ABI interfacing methods of looking
> up an engine do you have planned?

I actually wanted to say we could get away with 32-bits for internal 
representation but said the complete opposite. :)

Regards,

Tvrtko


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 09/13] drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[]
  2019-03-08 17:11     ` Chris Wilson
@ 2019-03-11  7:16       ` Tvrtko Ursulin
  2019-03-11 10:31         ` Chris Wilson
  0 siblings, 1 reply; 58+ messages in thread
From: Tvrtko Ursulin @ 2019-03-11  7:16 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 08/03/2019 17:11, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-03-08 16:31:51)
>> Looks okay. But one more thing is needed:
>>
>> https://cgit.freedesktop.org/~tursulin/drm-intel/commit/?h=media&id=38266bfe99469de9e13774a13fa641c377988c67
> 
> drm/i915: Allow SSEU configuration to be set on virtual engine
> 
>   	/* Only render engine supports RPCS configuration. */
> -	if (engine->class != RENDER_CLASS)
> +	if (engine->class != RENDER_CLASS &&
> +	    !(engine->flags & I915_ENGINE_IS_VIRTUAL &&
> +	      ctx->engines[1]->class == RENDER_CLASS))
>   		return -ENODEV;
> 
> A virtual engine composed of RCS engines would have
> engine->class == RENDER_CLASS.
> 
> So it's just the engine->id BUG_ON that needs lifting?

If so then yes. Must be a recent change since I was sure the class was 
set to other before.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 08/13] drm/i915: Allow a context to define its set of engines
  2019-03-08 16:47     ` Chris Wilson
@ 2019-03-11  9:23       ` Tvrtko Ursulin
  2019-03-11  9:45         ` Chris Wilson
  0 siblings, 1 reply; 58+ messages in thread
From: Tvrtko Ursulin @ 2019-03-11  9:23 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 08/03/2019 16:47, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-03-08 16:27:22)
>>
>> On 08/03/2019 14:12, Chris Wilson wrote:
>>> Over the last few years, we have debated how to extend the user API to
>>> support an increase in the number of engines, that may be sparse and
>>> even be heterogeneous within a class (not all video decoders created
>>> equal). We settled on using (class, instance) tuples to identify a
>>> specific engine, with an API for the user to construct a map of engines
>>> to capabilities. Into this picture, we then add a challenge of virtual
>>> engines; one user engine that maps behind the scenes to any number of
>>> physical engines. To keep it general, we want the user to have full
>>> control over that mapping. To that end, we allow the user to constrain a
>>> context to define the set of engines that it can access, order fully
>>> controlled by the user via (class, instance). With such precise control
>>> in context setup, we can continue to use the existing execbuf uABI of
>>> specifying a single index; only now it doesn't automagically map onto
>>> the engines, it uses the user defined engine map from the context.
>>>
>>> The I915_EXEC_DEFAULT slot is left empty, and invalid for use by
>>> execbuf. Its use will be revealed in the next patch.
>>>
>>> v2: Fixup freeing of local on success of get_engines()
>>>
>>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/i915_gem_context.c       | 204 +++++++++++++++++-
>>>    drivers/gpu/drm/i915/i915_gem_context_types.h |   4 +
>>>    drivers/gpu/drm/i915/i915_gem_execbuffer.c    |  22 +-
>>>    include/uapi/drm/i915_drm.h                   |  42 +++-
>>>    4 files changed, 259 insertions(+), 13 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
>>> index 2cfc68b66944..86d9bea6f275 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem_context.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
>>> @@ -101,6 +101,21 @@ static struct i915_global_gem_context {
>>>        struct kmem_cache *slab_luts;
>>>    } global;
>>>    
>>> +static struct intel_engine_cs *
>>> +lookup_user_engine(struct i915_gem_context *ctx,
>>> +                unsigned long flags, u16 class, u16 instance)
>>> +#define LOOKUP_USER_INDEX BIT(0)
>>> +{
>>> +     if (flags & LOOKUP_USER_INDEX) {
>>> +             if (instance >= ctx->nengine)
>>> +                     return NULL;
>>> +
>>> +             return ctx->engines[instance];
>>> +     }
>>> +
>>> +     return intel_engine_lookup_user(ctx->i915, class, instance);
>>> +}
>>> +
>>>    struct i915_lut_handle *i915_lut_handle_alloc(void)
>>>    {
>>>        return kmem_cache_alloc(global.slab_luts, GFP_KERNEL);
>>> @@ -234,6 +249,8 @@ static void i915_gem_context_free(struct i915_gem_context *ctx)
>>>        release_hw_id(ctx);
>>>        i915_ppgtt_put(ctx->ppgtt);
>>>    
>>> +     kfree(ctx->engines);
>>> +
>>>        rbtree_postorder_for_each_entry_safe(it, n, &ctx->hw_contexts, node)
>>>                it->ops->destroy(it);
>>>    
>>> @@ -1311,9 +1328,9 @@ static int set_sseu(struct i915_gem_context *ctx,
>>>        if (user_sseu.flags || user_sseu.rsvd)
>>>                return -EINVAL;
>>>    
>>> -     engine = intel_engine_lookup_user(i915,
>>> -                                       user_sseu.engine_class,
>>> -                                       user_sseu.engine_instance);
>>> +     engine = lookup_user_engine(ctx, 0,
>>> +                                 user_sseu.engine_class,
>>> +                                 user_sseu.engine_instance);
>>>        if (!engine)
>>>                return -EINVAL;
>>>    
>>> @@ -1331,9 +1348,154 @@ static int set_sseu(struct i915_gem_context *ctx,
>>>    
>>>        args->size = sizeof(user_sseu);
>>>    
>>> +     return 0;
>>> +};
>>> +
>>> +struct set_engines {
>>> +     struct i915_gem_context *ctx;
>>> +     struct intel_engine_cs **engines;
>>> +     unsigned int nengine;
>>> +};
>>> +
>>> +static const i915_user_extension_fn set_engines__extensions[] = {
>>> +};
>>> +
>>> +static int
>>> +set_engines(struct i915_gem_context *ctx,
>>> +         const struct drm_i915_gem_context_param *args)
>>> +{
>>> +     struct i915_context_param_engines __user *user;
>>> +     struct set_engines set = { .ctx = ctx };
>>> +     u64 size, extensions;
>>> +     unsigned int n;
>>> +     int err;
>>> +
>>> +     user = u64_to_user_ptr(args->value);
>>> +     size = args->size;
>>> +     if (!size)
>>> +             goto out;
>>
>> This prevents a hypothetical extension with empty map data.
> 
> No... This is required for resetting and I think that's covered in what
> little docs there are. It's the set.nengine==0 test later
> that you mean to object to. But we can't do that as that's how we
> differentiate between modes at the moment.
> 
> We could use ctx->nengine = 0 and ctx->engines = ZERO_PTR.

size == sizeof(struct i915_context_param_engines) could mean reset - 
meaning no map array provided.

Meaning one could reset the map and still pass in extensions.
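
e.g. a sketch, with ctx being the target context id and the extension
chain purely hypothetical for now:

	I915_DEFINE_CONTEXT_PARAM_ENGINES(engines, 0) = {
		.extensions = 0, /* or a chain of future extensions */
	};
	struct drm_i915_gem_context_param args = {
		.ctx_id = ctx,
		.param = I915_CONTEXT_PARAM_ENGINES,
		.size = sizeof(engines), /* header only => reset */
		.value = (uintptr_t)&engines,
	};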

> 
>>> +     BUILD_BUG_ON(!IS_ALIGNED(sizeof(*user), sizeof(*user->class_instance)));
>>> +     if (size < sizeof(*user) || size % sizeof(*user->class_instance))
>>
>> IS_ALIGNED for the second condition for consistency with the BUILD_BUG_ON?
>>
>>> +             return -EINVAL;
>>> +
>>> +     set.nengine = (size - sizeof(*user)) / sizeof(*user->class_instance);
>>> +     if (set.nengine == 0 || set.nengine > I915_EXEC_RING_MASK + 1)
>>
>> I would prefer we drop the size restriction since it doesn't apply to
>> the engine map per se.
> 
> u64 is a limit that will be non-trivial to lift. Marking the limits of
> the kernel doesn't restrict it being lifted later.

My thinking is that the u64 limit applies to the load balancing extension, 
and the 64 engine limit applies to execbuf. Engine map itself is not 
limited. But I guess it is a theoretical/pointless discussion at this point.

> 
>>> +             return -EINVAL;
>>> +
>>> +     set.engines = kmalloc_array(set.nengine,
>>> +                                 sizeof(*set.engines),
>>> +                                 GFP_KERNEL);
>>> +     if (!set.engines)
>>> +             return -ENOMEM;
>>> +
>>> +     for (n = 0; n < set.nengine; n++) {
>>> +             u16 class, inst;
>>> +
>>> +             if (get_user(class, &user->class_instance[n].engine_class) ||
>>> +                 get_user(inst, &user->class_instance[n].engine_instance)) {
>>> +                     kfree(set.engines);
>>> +                     return -EFAULT;
>>> +             }
>>> +
>>> +             if (class == (u16)I915_ENGINE_CLASS_INVALID &&
>>> +                 inst == (u16)I915_ENGINE_CLASS_INVALID_NONE) {
>>> +                     set.engines[n] = NULL;
>>> +                     continue;
>>> +             }
>>> +
>>> +             set.engines[n] = lookup_user_engine(ctx, 0, class, inst);
>>> +             if (!set.engines[n]) {
>>> +                     kfree(set.engines);
>>> +                     return -ENOENT;
>>> +             }
>>> +     }
>>> +
>>> +     err = -EFAULT;
>>> +     if (!get_user(extensions, &user->extensions))
>>> +             err = i915_user_extensions(u64_to_user_ptr(extensions),
>>> +                                        set_engines__extensions,
>>> +                                        ARRAY_SIZE(set_engines__extensions),
>>> +                                        &set);
>>> +     if (err) {
>>> +             kfree(set.engines);
>>> +             return err;
>>> +     }
>>> +
>>> +out:
>>> +     mutex_lock(&ctx->i915->drm.struct_mutex);
>>> +     kfree(ctx->engines);
>>> +     ctx->engines = set.engines;
>>> +     ctx->nengine = set.nengine;
>>> +     mutex_unlock(&ctx->i915->drm.struct_mutex);
>>> +
>>>        return 0;
>>>    }
>>>    
>>> +static int
>>> +get_engines(struct i915_gem_context *ctx,
>>> +         struct drm_i915_gem_context_param *args)
>>> +{
>>> +     struct i915_context_param_engines *local;
>>> +     unsigned int n, count, size;
>>> +     int err = 0;
>>> +
>>> +restart:
>>> +     count = READ_ONCE(ctx->nengine);
>>> +     if (count > (INT_MAX - sizeof(*local)) / sizeof(*local->class_instance))
>>> +             return -ENOMEM; /* unrepresentable! */
>>
>> Probably overly paranoid since we can't end up with this state set.
> 
> And I thought you wanted many engines! Paranoia around kmalloc/user
> > overflows is always useful, because you know someone will send a patch
> later (and smatch doesn't really care as it only checks the limits of
> types and local constraints).

Put a comment on what it is checking then. Why INT_MAX and not U32_MAX btw?

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 08/13] drm/i915: Allow a context to define its set of engines
  2019-03-11  9:23       ` Tvrtko Ursulin
@ 2019-03-11  9:45         ` Chris Wilson
  2019-03-11 10:12           ` Tvrtko Ursulin
  2019-03-11 14:45           ` Chris Wilson
  0 siblings, 2 replies; 58+ messages in thread
From: Chris Wilson @ 2019-03-11  9:45 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-03-11 09:23:44)
> 
> On 08/03/2019 16:47, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2019-03-08 16:27:22)
> >>
> >> On 08/03/2019 14:12, Chris Wilson wrote:
> >>> +static int
> >>> +set_engines(struct i915_gem_context *ctx,
> >>> +         const struct drm_i915_gem_context_param *args)
> >>> +{
> >>> +     struct i915_context_param_engines __user *user;
> >>> +     struct set_engines set = { .ctx = ctx };
> >>> +     u64 size, extensions;
> >>> +     unsigned int n;
> >>> +     int err;
> >>> +
> >>> +     user = u64_to_user_ptr(args->value);
> >>> +     size = args->size;
> >>> +     if (!size)
> >>> +             goto out;
> >>
> >> This prevents a hypothetical extension with empty map data.
> > 
> > No... This is required for resetting and I think that's covered in what
> > little docs there are. It's the set.nengine==0 test later
> > that you mean to object to. But we can't do that as that's how we
> > differentiate between modes at the moment.
> > 
> > We could use ctx->nengine = 0 and ctx->engines = ZERO_PTR.
> 
> size == sizeof(struct i915_context_param_engines) could mean reset - 
> meaning no map array provided.

Nah, size=sizeof() => 0 [], size=0 => default map.
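
Spelled out, under that scheme the userspace choices would be (sketch):

	args.size = 0;	/* revert to the default map */

	args.size = sizeof(struct i915_context_param_engines);
			/* header only: an explicit empty map, [] */

	args.size = sizeof(struct i915_context_param_engines) +
		    n * 2 * sizeof(__u16);
			/* header + n (class, instance) pairs */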
 
> Meaning one could reset the map and still pass in extensions.

I missed that you were pointing out we didn't follow the extensions on
resetting.

I'm not sure if that makes sense tbh. The extensions are written around
the concept of applying to the new engines[], and if the user has
explicitly removed the engines[] (distinct from defining a zero array),
what extensions can apply? One hopes they end up -EINVAL. As they should
all return -EINVAL, I guess there is no harm done in applying them.

> >>> +     BUILD_BUG_ON(!IS_ALIGNED(sizeof(*user), sizeof(*user->class_instance)));
> >>> +     if (size < sizeof(*user) || size % sizeof(*user->class_instance))
> >>
> >> IS_ALIGNED for the second condition for consistency with the BUILD_BUG_ON?
> >>
> >>> +             return -EINVAL;
> >>> +
> >>> +     set.nengine = (size - sizeof(*user)) / sizeof(*user->class_instance);
> >>> +     if (set.nengine == 0 || set.nengine > I915_EXEC_RING_MASK + 1)
> >>
> >> I would prefer we drop the size restriction since it doesn't apply to
> >> the engine map per se.
> > 
> > u64 is a limit that will be non-trivial to lift. Marking the limits of
> > the kernel doesn't restrict it being lifted later.
> 
> My thinking is that the u64 limit applies to the load balancing extension, 
> and the 64 engine limit applies to execbuf. Engine map itself is not 
> limited. But I guess it is a theoretical/pointless discussion at this point.

I know what you mean, I'm just noting that we use u64 around the
uAPI for masks, and u8/unsigned long internally. So even going beyond
BITS_PER_LONG is problematic.

I'm in two minds. Yes, the limit doesn't apply to engines[] itself, but
for practical reasons there is a limit, and until we can remove those,
lifting the restriction here is immaterial :|

> >>> +static int
> >>> +get_engines(struct i915_gem_context *ctx,
> >>> +         struct drm_i915_gem_context_param *args)
> >>> +{
> >>> +     struct i915_context_param_engines *local;
> >>> +     unsigned int n, count, size;
> >>> +     int err = 0;
> >>> +
> >>> +restart:
> >>> +     count = READ_ONCE(ctx->nengine);
> >>> +     if (count > (INT_MAX - sizeof(*local)) / sizeof(*local->class_instance))
> >>> +             return -ENOMEM; /* unrepresentable! */
> >>
> >> Probably overly paranoid since we can't end up with this state set.
> > 
> > And I thought you wanted many engines! Paranoia around kmalloc/user
> > overflows is always useful, because you know someone will send a patch
> > later (and smatch doesn't really care as it only checks the limits of
> > types and local constraints).
> 
> Put a comment on what it is checking then. Why INT_MAX and not U32_MAX btw?

Vague memories about what gets checked for overflow in drm_malloc_large.
Nowadays, the trend is check_mul_overflow() with size_t.
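
e.g. (sketch, using <linux/overflow.h>):

	size_t size;

	if (check_mul_overflow((size_t)count,
			       sizeof(*local->class_instance), &size) ||
	    check_add_overflow(sizeof(*local), size, &size))
		return -ENOMEM;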
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 08/13] drm/i915: Allow a context to define its set of engines
  2019-03-11  9:45         ` Chris Wilson
@ 2019-03-11 10:12           ` Tvrtko Ursulin
  2019-03-11 14:45           ` Chris Wilson
  1 sibling, 0 replies; 58+ messages in thread
From: Tvrtko Ursulin @ 2019-03-11 10:12 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 11/03/2019 09:45, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-03-11 09:23:44)
>>
>> On 08/03/2019 16:47, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2019-03-08 16:27:22)
>>>>
>>>> On 08/03/2019 14:12, Chris Wilson wrote:
>>>>> +static int
>>>>> +set_engines(struct i915_gem_context *ctx,
>>>>> +         const struct drm_i915_gem_context_param *args)
>>>>> +{
>>>>> +     struct i915_context_param_engines __user *user;
>>>>> +     struct set_engines set = { .ctx = ctx };
>>>>> +     u64 size, extensions;
>>>>> +     unsigned int n;
>>>>> +     int err;
>>>>> +
>>>>> +     user = u64_to_user_ptr(args->value);
>>>>> +     size = args->size;
>>>>> +     if (!size)
>>>>> +             goto out;
>>>>
>>>> This prevents a hypothetical extension with empty map data.
>>>
>>> No... This is required for resetting and I think that's covered in what
>>> little docs there are. It's the set.nengine==0 test later
>>> that you mean to object to. But we can't do that as that's how we
>>> differentiate between modes at the moment.
>>>
>>> We could use ctx->nengine = 0 and ctx->engines = ZERO_PTR.
>>
>> size == sizeof(struct i915_context_param_engines) could mean reset -
>> meaning no map array provided.
> 
> Nah, size=sizeof() => 0 [], size=0 => default map.
>   
>> Meaning one could reset the map and still pass in extensions.
> 
> I missed that you were pointing out we didn't follow the extensions on
> resetting.
> 
> I'm not sure if that makes sense tbh. The extensions are written around
> the concept of applying to the new engines[], and if the user has
> explicitly removed the engines[] (distinct from defining a zero array),
> what extensions can apply? One hopes they end up -EINVAL. As they should
> all return -EINVAL, I guess there is no harm done in applying them.

Yeah, it is hypothetical for now. Just future-proofing in case we add
some engine map extension which makes sense after resetting the map.

>>>>> +     BUILD_BUG_ON(!IS_ALIGNED(sizeof(*user), sizeof(*user->class_instance)));
>>>>> +     if (size < sizeof(*user) || size % sizeof(*user->class_instance))
>>>>
>>>> IS_ALIGNED for the second condition for consistency with the BUILD_BUG_ON?
>>>>
>>>>> +             return -EINVAL;
>>>>> +
>>>>> +     set.nengine = (size - sizeof(*user)) / sizeof(*user->class_instance);
>>>>> +     if (set.nengine == 0 || set.nengine > I915_EXEC_RING_MASK + 1)
>>>>
>>>> I would prefer we drop the size restriction since it doesn't apply to
>>>> the engine map per se.
>>>
>>> u64 is a limit that will be non-trivial to lift. Marking the limits of
>>> the kernel doesn't restrict it being lifted later.
>>
>> My thinking is that the u64 limit applies to the load balancing extension,
>> and the 64 engine limit applies to execbuf. Engine map itself is not
>> limited. But I guess it is a theoretical/pointless discussion at this point.
> 
> I know what you mean, I'm just noting that we use u64 around the
> uAPI for masks, and u8/unsigned long internally. So even going beyond
> BITS_PER_LONG is problematic.
> 
> I'm in two minds. Yes, the limit doesn't apply to engines[] itself, but
> for practical reasons there is a limit, and until we can remove those,
> lifting the restriction here is immaterial :|

Ok.

> 
>>>>> +static int
>>>>> +get_engines(struct i915_gem_context *ctx,
>>>>> +         struct drm_i915_gem_context_param *args)
>>>>> +{
>>>>> +     struct i915_context_param_engines *local;
>>>>> +     unsigned int n, count, size;
>>>>> +     int err = 0;
>>>>> +
>>>>> +restart:
>>>>> +     count = READ_ONCE(ctx->nengine);
>>>>> +     if (count > (INT_MAX - sizeof(*local)) / sizeof(*local->class_instance))
>>>>> +             return -ENOMEM; /* unrepresentable! */
>>>>
>>>> Probably overly paranoid since we can't end up with this state set.
>>>
>>> And I thought you wanted many engines! Paranoia around kmalloc/user
>>> overflows is always useful, because you know someone will send a patch
>>> later (and smatch doesn't really care as it only checks the limits of
>>> types and local constraints).
>>
>> Put a comment on what it is checking then. Why INT_MAX and not U32_MAX btw?
> 
> Vague memories about what gets checked for overflow in drm_malloc_large.
> Nowadays, the in-trend is check_mul_overflow() with size_t.

Oh, that one... I was thinking about the args->size = size assignment
(u32). Both are 32-bit, luckily.
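FWIW, with linux/overflow.h the check would look something like this
(untested sketch, using a size_t intermediate):

	size_t size;

	if (check_mul_overflow((size_t)count,
			       sizeof(*local->class_instance), &size) ||
	    check_add_overflow(size, sizeof(*local), &size))
		return -ENOMEM;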

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 09/13] drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[]
  2019-03-11  7:16       ` Tvrtko Ursulin
@ 2019-03-11 10:31         ` Chris Wilson
  0 siblings, 0 replies; 58+ messages in thread
From: Chris Wilson @ 2019-03-11 10:31 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-03-11 07:16:48)
> 
> On 08/03/2019 17:11, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2019-03-08 16:31:51)
> >> Looks okay. But one more thing is needed:
> >>
> >> https://cgit.freedesktop.org/~tursulin/drm-intel/commit/?h=media&id=38266bfe99469de9e13774a13fa641c377988c67
> > 
> > drm/i915: Allow SSEU configuration to be set on virtual engine
> > 
> >       /* Only render engine supports RPCS configuration. */
> > -     if (engine->class != RENDER_CLASS)
> > +     if (engine->class != RENDER_CLASS &&
> > +         !(engine->flags & I915_ENGINE_IS_VIRTUAL &&
> > +           ctx->engines[1]->class == RENDER_CLASS))
> >               return -ENODEV;
> > 
> > A virtual engine composed of RCS engines would have
> > engine->class == RENDER_CLASS.
> > 
> > So it's just the engine->id BUG_ON that needs lifting?
> 
If so, then yes. It must be a recent change, since I was sure the class
was set to 'other' before.

uabi_class = OTHER
internal_class = ACTUAL and used to restrict physical engines to only
belong to the same class (no load balancing across classes today).
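In code terms, intel_execlists_create_virtual() does:

	ve->base.class = OTHER_CLASS;
	ve->base.uabi_class = I915_ENGINE_CLASS_INVALID;
	...
	/* and then, from the first sibling: */
	ve->base.class = sibling->class;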

I _think_ we had that from the start, at least I remember wanting to do
mixed class load-balancing but realising that I had to filter on class
to avoid engine->emit_* mixups.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 09/13] drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[]
  2019-03-11  7:14       ` Tvrtko Ursulin
@ 2019-03-11 10:33         ` Chris Wilson
  0 siblings, 0 replies; 58+ messages in thread
From: Chris Wilson @ 2019-03-11 10:33 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-03-11 07:14:48)
> 
> On 08/03/2019 16:57, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2019-03-08 16:31:51)
> >>
> >> On 08/03/2019 14:12, Chris Wilson wrote:
> >>> Allow the user to specify a local engine index (as opposed to
> >>> class:index) that they can use to refer to a preset engine inside the
> >>> ctx->engine[] array defined by an earlier I915_CONTEXT_PARAM_ENGINES.
> >>> This will be useful for setting SSEU parameters on virtual engines that
> >>> are local to the context and do not have a valid global class:instance
> >>> lookup.
> >>>
> >>> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> >>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>> ---
> >>>    drivers/gpu/drm/i915/i915_gem_context.c | 24 ++++++++++++++++++++----
> >>>    include/uapi/drm/i915_drm.h             |  3 ++-
> >>>    2 files changed, 22 insertions(+), 5 deletions(-)
> >>>
> >>> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> >>> index 86d9bea6f275..a581c01ffff1 100644
> >>> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> >>> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> >>> @@ -1313,6 +1313,7 @@ static int set_sseu(struct i915_gem_context *ctx,
> >>>        struct drm_i915_gem_context_param_sseu user_sseu;
> >>>        struct intel_engine_cs *engine;
> >>>        struct intel_sseu sseu;
> >>> +     unsigned long lookup;
> >>
> >> Poor 32-bit builds. ;) And in lookup_user_engine as well.
> > 
> > It's an internal flags variable; not a direct part of the uABI. So its
> > only use is to control lookup_user_engine(), so limiting it to 32b
> > (natural register width) doesn't seem like a concern for the
> > foreseeable future? How many different ABI interfacing methods of looking
> > up an engine do you have planned?
> 
> I actually wanted to say we could get away with 32 bits for the
> internal representation, but said the complete opposite. :)

Ah, but we use BIT() so my inertia says to use 'unsigned long' unless you
want to have to check for consistent type usage.
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 10/13] drm/i915: Load balancing across a virtual engine
  2019-03-08 14:12 ` [PATCH 10/13] drm/i915: Load balancing across a virtual engine Chris Wilson
@ 2019-03-11 12:47   ` Tvrtko Ursulin
  2019-03-11 13:43     ` Chris Wilson
  2019-03-12  7:52   ` Tvrtko Ursulin
  1 sibling, 1 reply; 58+ messages in thread
From: Tvrtko Ursulin @ 2019-03-11 12:47 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 08/03/2019 14:12, Chris Wilson wrote:
> Having allowed the user to define a set of engines that they will want
> to only use, we go one step further and allow them to bind those engines
> into a single virtual instance. Submitting a batch to the virtual engine
> will then forward it to any one of the set in a manner as best to
> distribute load.  The virtual engine has a single timeline across all
> engines (it operates as a single queue), so it is not able to concurrently
> run batches across multiple engines by itself; that is left up to the user
> to submit multiple concurrent batches to multiple queues. Multiple users
> will be load balanced across the system.
> 
> The mechanism used for load balancing in this patch is a late greedy
> balancer. When a request is ready for execution, it is added to each
> engine's queue, and when an engine is ready for its next request it
> claims it from the virtual engine. The first engine to do so, wins, i.e.
> the request is executed at the earliest opportunity (idle moment) in the
> system.
> 
> As not all HW is created equal, the user is still able to skip the
> virtual engine and execute the batch on a specific engine, all within the
> same queue. It will then be executed in order on the correct engine,
> with execution on other virtual engines being moved away due to the load
> detection.
> 
> A couple of areas for potential improvement left!
> 
> - The virtual engine always takes priority over equal-priority tasks.
> Mostly broken up by applying FQ_CODEL rules for prioritising new clients,
> and hopefully the virtual and real engines are not then congested (i.e.
> all work is via virtual engines, or all work is to the real engine).
> 
> - We require the breadcrumb irq around every virtual engine request. For
> normal engines, we eliminate the need for the slow round trip via
> interrupt by using the submit fence and queueing in order. For virtual
> engines, we have to allow any job to transfer to a new ring, and cannot
> coalesce the submissions, so require the completion fence instead,
> forcing the persistent use of interrupts.
> 
> - We only drip feed single requests through each virtual engine and onto
> the physical engines, even if there was enough work to fill all ELSP,
> leaving small stalls with an idle CS event at the end of every request.
> Could we be greedy and fill both slots? Being lazy is virtuous for load
> distribution on less-than-full workloads though.
> 
> Other areas of improvement are more general, such as reducing lock
> contention, reducing dispatch overhead, looking at direct submission
> rather than bouncing around tasklets etc.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem.h            |   5 +
>   drivers/gpu/drm/i915/i915_gem_context.c    | 153 +++++-
>   drivers/gpu/drm/i915/i915_scheduler.c      |  17 +-
>   drivers/gpu/drm/i915/i915_timeline_types.h |   1 +
>   drivers/gpu/drm/i915/intel_engine_types.h  |   8 +
>   drivers/gpu/drm/i915/intel_lrc.c           | 521 ++++++++++++++++++++-
>   drivers/gpu/drm/i915/intel_lrc.h           |  11 +
>   drivers/gpu/drm/i915/selftests/intel_lrc.c | 165 +++++++
>   include/uapi/drm/i915_drm.h                |  30 ++
>   9 files changed, 895 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.h b/drivers/gpu/drm/i915/i915_gem.h
> index 74a2ddc1b52f..dbcea6e29d48 100644
> --- a/drivers/gpu/drm/i915/i915_gem.h
> +++ b/drivers/gpu/drm/i915/i915_gem.h
> @@ -91,4 +91,9 @@ static inline bool __tasklet_is_enabled(const struct tasklet_struct *t)
>   	return !atomic_read(&t->count);
>   }
>   
> +static inline bool __tasklet_is_scheduled(struct tasklet_struct *t)
> +{
> +	return test_bit(TASKLET_STATE_SCHED, &t->state);
> +}
> +
>   #endif /* __I915_GEM_H__ */
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index a581c01ffff1..13b79980f7f3 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -86,12 +86,16 @@
>    */
>   
>   #include <linux/log2.h>
> +#include <linux/nospec.h>
> +
>   #include <drm/i915_drm.h>
> +
>   #include "i915_drv.h"
>   #include "i915_globals.h"
>   #include "i915_trace.h"
>   #include "i915_user_extensions.h"
>   #include "intel_lrc_reg.h"
> +#include "intel_lrc.h"
>   #include "intel_workarounds.h"
>   
>   #define ALL_L3_SLICES(dev) (1 << NUM_L3_SLICES(dev)) - 1
> @@ -238,6 +242,20 @@ static void release_hw_id(struct i915_gem_context *ctx)
>   	mutex_unlock(&i915->contexts.mutex);
>   }
>   
> +static void free_engines(struct intel_engine_cs **engines, int count)
> +{
> +	int i;
> +
> +	if (!engines)
> +		return;
> +
> +	/* We own the veng we created; regular engines are ignored */
> +	for (i = 0; i < count; i++)
> +		intel_virtual_engine_destroy(engines[i]);
> +
> +	kfree(engines);
> +}
> +
>   static void i915_gem_context_free(struct i915_gem_context *ctx)
>   {
>   	struct intel_context *it, *n;
> @@ -248,8 +266,7 @@ static void i915_gem_context_free(struct i915_gem_context *ctx)
>   
>   	release_hw_id(ctx);
>   	i915_ppgtt_put(ctx->ppgtt);
> -
> -	kfree(ctx->engines);
> +	free_engines(ctx->engines, ctx->nengine);
>   
>   	rbtree_postorder_for_each_entry_safe(it, n, &ctx->hw_contexts, node)
>   		it->ops->destroy(it);
> @@ -1359,13 +1376,116 @@ static int set_sseu(struct i915_gem_context *ctx,
>   	return 0;
>   };
>   
> +static int check_user_mbz16(u16 __user *user)
> +{
> +	u16 mbz;
> +
> +	if (get_user(mbz, user))
> +		return -EFAULT;
> +
> +	return mbz ? -EINVAL : 0;
> +}
> +
> +static int check_user_mbz32(u32 __user *user)
> +{
> +	u32 mbz;
> +
> +	if (get_user(mbz, user))
> +		return -EFAULT;
> +
> +	return mbz ? -EINVAL : 0;
> +}
> +
> +static int check_user_mbz64(u64 __user *user)
> +{
> +	u64 mbz;
> +
> +	if (get_user(mbz, user))
> +		return -EFAULT;
> +
> +	return mbz ? -EINVAL : 0;
> +}

Could generate the three with a macro, but it would be a marginal
improvement, if any.
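For the record, something like this (untested sketch):

#define CHECK_USER_MBZ(width) \
static int check_user_mbz##width(u##width __user *user) \
{ \
	u##width mbz; \
\
	if (get_user(mbz, user)) \
		return -EFAULT; \
\
	return mbz ? -EINVAL : 0; \
}

CHECK_USER_MBZ(16)
CHECK_USER_MBZ(32)
CHECK_USER_MBZ(64)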

> +
>   struct set_engines {
>   	struct i915_gem_context *ctx;
>   	struct intel_engine_cs **engines;
>   	unsigned int nengine;
>   };
>   
> +static int
> +set_engines__load_balance(struct i915_user_extension __user *base, void *data)
> +{
> +	struct i915_context_engines_load_balance __user *ext =
> +		container_of_user(base, typeof(*ext), base);
> +	const struct set_engines *set = data;
> +	struct intel_engine_cs *ve;
> +	unsigned int n;
> +	u64 mask;
> +	u16 idx;
> +	int err;
> +
> +	if (!HAS_EXECLISTS(set->ctx->i915))
> +		return -ENODEV;
> +
> +	if (USES_GUC_SUBMISSION(set->ctx->i915))
> +		return -ENODEV; /* not implemented yet */
> +
> +	if (get_user(idx, &ext->engine_index))
> +		return -EFAULT;
> +
> +	if (idx >= set->nengine)
> +		return -EINVAL;
> +
> +	idx = array_index_nospec(idx, set->nengine);
> +	if (set->engines[idx])
> +		return -EEXIST;
> +
> +	err = check_user_mbz16(&ext->mbz16);
> +	if (err)
> +		return err;
> +
> +	err = check_user_mbz32(&ext->flags);
> +	if (err)
> +		return err;
> +
> +	for (n = 0; n < ARRAY_SIZE(ext->mbz64); n++) {
> +		err = check_user_mbz64(&ext->mbz64[n]);
> +		if (err)
> +			return err;
> +	}
> +
> +	if (get_user(mask, &ext->engines_mask))
> +		return -EFAULT;
> +
> +	mask &= GENMASK_ULL(set->nengine - 1, 0) & ~BIT_ULL(idx);
> +	if (!mask)
> +		return -EINVAL;
> +
> +	if (is_power_of_2(mask)) {
> +		ve = set->engines[__ffs64(mask)];
> +	} else {
> +		struct intel_engine_cs *stack[64];
> +		int bit;
> +
> +		n = 0;
> +		for_each_set_bit(bit, (unsigned long *)&mask, set->nengine)
> +			stack[n++] = set->engines[bit];
> +
> +		ve = intel_execlists_create_virtual(set->ctx, stack, n);
> +	}
> +	if (IS_ERR(ve))
> +		return PTR_ERR(ve);
> +
> +	if (cmpxchg(&set->engines[idx], NULL, ve)) {
> +		intel_virtual_engine_destroy(ve);
> +		return -EEXIST;
> +	}
> +
> +	return 0;
> +}
> +
>   static const i915_user_extension_fn set_engines__extensions[] = {
> +	[I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE] = set_engines__load_balance,
>   };
>   
>   static int
> @@ -1426,13 +1546,13 @@ set_engines(struct i915_gem_context *ctx,
>   					   ARRAY_SIZE(set_engines__extensions),
>   					   &set);
>   	if (err) {
> -		kfree(set.engines);
> +		free_engines(set.engines, set.nengine);
>   		return err;
>   	}
>   
>   out:
>   	mutex_lock(&ctx->i915->drm.struct_mutex);
> -	kfree(ctx->engines);
> +	free_engines(ctx->engines, ctx->nengine);
>   	ctx->engines = set.engines;
>   	ctx->nengine = set.nengine;
>   	mutex_unlock(&ctx->i915->drm.struct_mutex);
> @@ -1637,6 +1757,7 @@ static int clone_engines(struct i915_gem_context *dst,
>   			 struct i915_gem_context *src)
>   {
>   	struct intel_engine_cs **engines;
> +	int i;
>   
>   	engines = kmemdup(src->engines,
>   			  sizeof(*src->engines) * src->nengine,
> @@ -1644,6 +1765,30 @@ static int clone_engines(struct i915_gem_context *dst,
>   	if (!engines)
>   		return -ENOMEM;
>   
> +	/*
> +	 * Virtual engines are singletons; they can only exist
> +	 * inside a single context, because they embed their
> +	 * HW context... As each virtual context implies a single
> +	 * timeline (each engine can only dequeue a single request
> +	 * at any time), it would be surprising for two contexts
> +	 * to use the same engine. So let's create a copy of
> +	 * the virtual engine instead.
> +	 */
> +	for (i = 0; i < src->nengine; i++) {
> +		struct intel_engine_cs *engine = engines[i];
> +
> +		if (!intel_engine_is_virtual(engine))
> +			continue;
> +
> +		engine = intel_execlists_clone_virtual(dst, engine);
> +		if (IS_ERR(engine)) {
> +			free_engines(engines, i);
> +			return PTR_ERR(engine);
> +		}
> +
> +		engines[i] = engine;
> +	}
> +
>   	dst->engines = engines;
>   	dst->nengine = src->nengine;
>   	return 0;
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> index e0f609d01564..bb9819dbe313 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.c
> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> @@ -247,17 +247,25 @@ sched_lock_engine(const struct i915_sched_node *node,
>   		  struct intel_engine_cs *locked,
>   		  struct sched_cache *cache)
>   {
> -	struct intel_engine_cs *engine = node_to_request(node)->engine;
> +	const struct i915_request *rq = node_to_request(node);
> +	struct intel_engine_cs *engine;
>   
>   	GEM_BUG_ON(!locked);
>   
> -	if (engine != locked) {
> +	/*
> +	 * Virtual engines complicate acquiring the engine timeline lock,
> +	 * as their rq->engine pointer is not stable until under that
> +	 * engine lock. The simple ploy we use is to take the lock then
> +	 * check that the rq still belongs to the newly locked engine.
> +	 */
> +	while (locked != (engine = READ_ONCE(rq->engine))) {
>   		spin_unlock(&locked->timeline.lock);
>   		memset(cache, 0, sizeof(*cache));
>   		spin_lock(&engine->timeline.lock);
> +		locked = engine;
>   	}
>   
> -	return engine;
> +	return locked;

engine == locked at this point, right?

>   }
>   
>   static bool inflight(const struct i915_request *rq,
> @@ -370,8 +378,11 @@ static void __i915_schedule(struct i915_request *rq,
>   		if (prio <= node->attr.priority || node_signaled(node))
>   			continue;
>   
> +		GEM_BUG_ON(node_to_request(node)->engine != engine);
> +
>   		node->attr.priority = prio;
>   		if (!list_empty(&node->link)) {
> +			GEM_BUG_ON(intel_engine_is_virtual(engine));
>   			if (!cache.priolist)
>   				cache.priolist =
>   					i915_sched_lookup_priolist(engine,
> diff --git a/drivers/gpu/drm/i915/i915_timeline_types.h b/drivers/gpu/drm/i915/i915_timeline_types.h
> index 8ff146dc05ba..5e445f145eb1 100644
> --- a/drivers/gpu/drm/i915/i915_timeline_types.h
> +++ b/drivers/gpu/drm/i915/i915_timeline_types.h
> @@ -25,6 +25,7 @@ struct i915_timeline {
>   	spinlock_t lock;
>   #define TIMELINE_CLIENT 0 /* default subclass */
>   #define TIMELINE_ENGINE 1
> +#define TIMELINE_VIRTUAL 2
>   	struct mutex mutex; /* protects the flow of requests */
>   
>   	unsigned int pin_count;
> diff --git a/drivers/gpu/drm/i915/intel_engine_types.h b/drivers/gpu/drm/i915/intel_engine_types.h
> index b0aa1f0d4e47..d54d2a1840cc 100644
> --- a/drivers/gpu/drm/i915/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/intel_engine_types.h
> @@ -216,6 +216,7 @@ struct intel_engine_execlists {
>   	 * @queue: queue of requests, in priority lists
>   	 */
>   	struct rb_root_cached queue;
> +	struct rb_root_cached virtual;
>   
>   	/**
>   	 * @csb_write: control register for Context Switch buffer
> @@ -421,6 +422,7 @@ struct intel_engine_cs {
>   #define I915_ENGINE_SUPPORTS_STATS   BIT(1)
>   #define I915_ENGINE_HAS_PREEMPTION   BIT(2)
>   #define I915_ENGINE_HAS_SEMAPHORES   BIT(3)
> +#define I915_ENGINE_IS_VIRTUAL       BIT(4)
>   	unsigned int flags;
>   
>   	/*
> @@ -504,6 +506,12 @@ intel_engine_has_semaphores(const struct intel_engine_cs *engine)
>   	return engine->flags & I915_ENGINE_HAS_SEMAPHORES;
>   }
>   
> +static inline bool
> +intel_engine_is_virtual(const struct intel_engine_cs *engine)
> +{
> +	return engine->flags & I915_ENGINE_IS_VIRTUAL;
> +}
> +
>   #define instdone_slice_mask(dev_priv__) \
>   	(IS_GEN(dev_priv__, 7) ? \
>   	 1 : RUNTIME_INFO(dev_priv__)->sseu.slice_mask)
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 7b938eaff9c5..0c97e8f30223 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -166,6 +166,28 @@
>   
>   #define ACTIVE_PRIORITY (I915_PRIORITY_NEWCLIENT | I915_PRIORITY_NOSEMAPHORE)
>   
> +struct virtual_engine {
> +	struct intel_engine_cs base;
> +
> +	struct intel_context context;
> +	struct kref kref;
> +	struct rcu_head rcu;
> +
> +	struct i915_request *request;
> +	struct ve_node {
> +		struct rb_node rb;
> +		int prio;
> +	} nodes[I915_NUM_ENGINES];

Please comment the fields, at least in the above block.
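Something along these lines maybe (my guesses, so worth double
checking):

	/* The request currently pending execution, if any. */
	struct i915_request *request;

	/*
	 * Per-sibling node in that engine's execlists.virtual rbtree,
	 * plus the priority the pending request was queued at.
	 */
	struct ve_node {
		struct rb_node rb;
		int prio;
	} nodes[I915_NUM_ENGINES];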

> +
> +	unsigned int count;
> +	struct intel_engine_cs *siblings[0];
> +};
> +
> +static struct virtual_engine *to_virtual_engine(struct intel_engine_cs *engine)
> +{
> +	return container_of(engine, struct virtual_engine, base);
> +}
> +
>   static int execlists_context_deferred_alloc(struct intel_context *ce,
>   					    struct intel_engine_cs *engine);
>   static void execlists_init_reg_state(u32 *reg_state,
> @@ -235,7 +257,8 @@ static int queue_prio(const struct intel_engine_execlists *execlists)
>   }
>   
>   static inline bool need_preempt(const struct intel_engine_cs *engine,
> -				const struct i915_request *rq)
> +				const struct i915_request *rq,
> +				struct rb_node *rb)
>   {
>   	int last_prio;
>   
> @@ -270,6 +293,22 @@ static inline bool need_preempt(const struct intel_engine_cs *engine,
>   	    rq_prio(list_next_entry(rq, link)) > last_prio)
>   		return true;
>   
> +	if (rb) { /* XXX virtual precedence */
> +		struct virtual_engine *ve =
> +			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
> +		bool preempt = false;
> +
> +		if (engine == ve->siblings[0]) { /* only preempt one sibling */

Why always siblings[0]?

> +			spin_lock(&ve->base.timeline.lock);
> +			if (ve->request)
> +				preempt = rq_prio(ve->request) > last_prio;
> +			spin_unlock(&ve->base.timeline.lock);
> +		}
> +
> +		if (preempt)
> +			return preempt;
> +	}
> +
>   	/*
>   	 * If the inflight context did not trigger the preemption, then maybe
>   	 * it was the set of queued requests? Pick the highest priority in
> @@ -388,6 +427,8 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
>   	list_for_each_entry_safe_reverse(rq, rn,
>   					 &engine->timeline.requests,
>   					 link) {
> +		struct intel_engine_cs *owner;
> +
>   		if (i915_request_completed(rq))
>   			break;
>   
> @@ -396,14 +437,23 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
>   
>   		GEM_BUG_ON(rq->hw_context->active);
>   
> -		GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
> -		if (rq_prio(rq) != prio) {
> -			prio = rq_prio(rq);
> -			pl = i915_sched_lookup_priolist(engine, prio);
> -		}
> -		GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
> +		owner = rq->hw_context->engine;
> +		if (likely(owner == engine)) {
> +			GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
> +			if (rq_prio(rq) != prio) {
> +				prio = rq_prio(rq);
> +				pl = i915_sched_lookup_priolist(engine, prio);
> +			}
> +			GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
> +
> +			list_add(&rq->sched.link, pl);
> +		} else {
> +			if (__i915_request_has_started(rq))
> +				rq->sched.attr.priority |= ACTIVE_PRIORITY;
>   
> -		list_add(&rq->sched.link, pl);
> +			rq->engine = owner;
> +			owner->submit_request(rq);
> +		}

What's happening here? Please put a comment in.
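If I read it correctly, something like:

	/*
	 * The request was submitted to a virtual engine; after
	 * unwinding, return it to the virtual queue so that any of
	 * the siblings may pick it up again, instead of leaving it
	 * on this engine's priolist.
	 */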

>   
>   		active = rq;
>   	}
> @@ -665,6 +715,50 @@ static void complete_preempt_context(struct intel_engine_execlists *execlists)
>   						  execlists));
>   }
>   
> +static void virtual_update_register_offsets(u32 *regs,
> +					    struct intel_engine_cs *engine)
> +{
> +	u32 base = engine->mmio_base;
> +
> +	regs[CTX_CONTEXT_CONTROL] =
> +		i915_mmio_reg_offset(RING_CONTEXT_CONTROL(engine));
> +	regs[CTX_RING_HEAD] = i915_mmio_reg_offset(RING_HEAD(base));
> +	regs[CTX_RING_TAIL] = i915_mmio_reg_offset(RING_TAIL(base));
> +	regs[CTX_RING_BUFFER_START] = i915_mmio_reg_offset(RING_START(base));
> +	regs[CTX_RING_BUFFER_CONTROL] = i915_mmio_reg_offset(RING_CTL(base));
> +
> +	regs[CTX_BB_HEAD_U] = i915_mmio_reg_offset(RING_BBADDR_UDW(base));
> +	regs[CTX_BB_HEAD_L] = i915_mmio_reg_offset(RING_BBADDR(base));
> +	regs[CTX_BB_STATE] = i915_mmio_reg_offset(RING_BBSTATE(base));
> +	regs[CTX_SECOND_BB_HEAD_U] =
> +		i915_mmio_reg_offset(RING_SBBADDR_UDW(base));
> +	regs[CTX_SECOND_BB_HEAD_L] = i915_mmio_reg_offset(RING_SBBADDR(base));
> +	regs[CTX_SECOND_BB_STATE] = i915_mmio_reg_offset(RING_SBBSTATE(base));
> +
> +	regs[CTX_CTX_TIMESTAMP] =
> +		i915_mmio_reg_offset(RING_CTX_TIMESTAMP(base));
> +	regs[CTX_PDP3_UDW] = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(engine, 3));
> +	regs[CTX_PDP3_LDW] = i915_mmio_reg_offset(GEN8_RING_PDP_LDW(engine, 3));
> +	regs[CTX_PDP2_UDW] = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(engine, 2));
> +	regs[CTX_PDP2_LDW] = i915_mmio_reg_offset(GEN8_RING_PDP_LDW(engine, 2));
> +	regs[CTX_PDP1_UDW] = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(engine, 1));
> +	regs[CTX_PDP1_LDW] = i915_mmio_reg_offset(GEN8_RING_PDP_LDW(engine, 1));
> +	regs[CTX_PDP0_UDW] = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(engine, 0));
> +	regs[CTX_PDP0_LDW] = i915_mmio_reg_offset(GEN8_RING_PDP_LDW(engine, 0));
> +
> +	if (engine->class == RENDER_CLASS) {
> +		regs[CTX_RCS_INDIRECT_CTX] =
> +			i915_mmio_reg_offset(RING_INDIRECT_CTX(base));
> +		regs[CTX_RCS_INDIRECT_CTX_OFFSET] =
> +			i915_mmio_reg_offset(RING_INDIRECT_CTX_OFFSET(base));
> +		regs[CTX_BB_PER_CTX_PTR] =
> +			i915_mmio_reg_offset(RING_BB_PER_CTX_PTR(base));
> +
> +		regs[CTX_R_PWR_CLK_STATE] =
> +			i915_mmio_reg_offset(GEN8_R_PWR_CLK_STATE);
> +	}
> +}
> +
>   static void execlists_dequeue(struct intel_engine_cs *engine)
>   {
>   	struct intel_engine_execlists * const execlists = &engine->execlists;
> @@ -697,6 +791,28 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   	 * and context switches) submission.
>   	 */
>   
> +	for (rb = rb_first_cached(&execlists->virtual); rb; ) {
> +		struct virtual_engine *ve =
> +			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
> +		struct i915_request *rq = READ_ONCE(ve->request);
> +		struct intel_engine_cs *active;
> +
> +		if (!rq) {
> +			rb_erase_cached(rb, &execlists->virtual);
> +			RB_CLEAR_NODE(rb);
> +			rb = rb_first_cached(&execlists->virtual);
> +			continue;
> +		}

Probably a good place to comment on how each physical engine sees all
veng requests and needs to unlink them if someone else has dequeued
first.

This relies on the setting of ve->request to NULL propagating to all
CPUs as soon as it is cleared, I think. Because all tasklets will be
under different engine->timeline.lock instances, and the clear is under
the VE->timeline.lock, but this read isn't.

If one CPU does not see the clear, it will skip removing the entry from
the rbtree. But then it will take the VE->timeline.lock a bit further
down and fix things up. Okay, I think.

> +
> +		active = READ_ONCE(ve->context.active);
> +		if (active && active != engine) {
> +			rb = rb_next(rb);
> +			continue;
> +		}

What's happening here (a comment please)?
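Presumably something like:

	/*
	 * The virtual context is still active (pinned) on another
	 * sibling, with its register state bound to that engine, so
	 * we cannot dequeue from here; skip to the next virtual
	 * engine in the tree.
	 */

?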

> +
> +		break;
> +	}
> +
>   	if (last) {
>   		/*
>   		 * Don't resubmit or switch until all outstanding
> @@ -718,7 +834,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   		if (!execlists_is_active(execlists, EXECLISTS_ACTIVE_HWACK))
>   			return;
>   
> -		if (need_preempt(engine, last)) {
> +		if (need_preempt(engine, last, rb)) {
>   			inject_preempt_context(engine);
>   			return;
>   		}
> @@ -758,6 +874,72 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   		last->tail = last->wa_tail;
>   	}
>   
> +	while (rb) { /* XXX virtual is always taking precedence */
> +		struct virtual_engine *ve =
> +			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
> +		struct i915_request *rq;
> +
> +		spin_lock(&ve->base.timeline.lock);

This is under the physical engine timeline lock, so isn't a nested
locking annotation needed?
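i.e. does this want, say:

	spin_lock_nested(&ve->base.timeline.lock,
			 SINGLE_DEPTH_NESTING);

or is the TIMELINE_VIRTUAL subclass set at timeline init enough to keep
lockdep happy?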

> +
> +		rq = ve->request;
> +		if (unlikely(!rq)) { /* lost the race to a sibling */
> +			spin_unlock(&ve->base.timeline.lock);
> +			rb_erase_cached(rb, &execlists->virtual);
> +			RB_CLEAR_NODE(rb);
> +			rb = rb_first_cached(&execlists->virtual);
> +			continue;
> +		}
> +
> +		if (rq_prio(rq) >= queue_prio(execlists)) {
> +			if (last && !can_merge_rq(last, rq)) {
> +				spin_unlock(&ve->base.timeline.lock);
> +				return; /* leave this rq for another engine */
> +			}
> +
> +			GEM_BUG_ON(rq->engine != &ve->base);
> +			ve->request = NULL;
> +			ve->base.execlists.queue_priority_hint = INT_MIN;

Why set to INT_MIN? Can't there be queued requests after this one?

> +			rb_erase_cached(rb, &execlists->virtual);
> +			RB_CLEAR_NODE(rb);
> +
> +			GEM_BUG_ON(rq->hw_context != &ve->context);
> +			rq->engine = engine;
> +
> +			if (engine != ve->siblings[0]) {
> +				u32 *regs = ve->context.lrc_reg_state;
> +				unsigned int n;
> +
> +				GEM_BUG_ON(READ_ONCE(ve->context.active));
> +				virtual_update_register_offsets(regs, engine);
> +
> +				/*
> +				 * Move the bound engine to the top of the list
> +				 * for future execution. We then kick this
> +				 * tasklet first before checking others, so that
> +				 * we preferentially reuse this set of bound
> +				 * registers.
> +				 */
> +				for (n = 1; n < ve->count; n++) {
> +					if (ve->siblings[n] == engine) {
> +						swap(ve->siblings[n],
> +						     ve->siblings[0]);
> +						break;
> +					}
> +				}
> +
> +				GEM_BUG_ON(ve->siblings[0] != engine);
> +			}
> +
> +			__i915_request_submit(rq);
> +			trace_i915_request_in(rq, port_index(port, execlists));
> +			submit = true;
> +			last = rq;
> +		}
> +
> +		spin_unlock(&ve->base.timeline.lock);
> +		break;
> +	}
> +
>   	while ((rb = rb_first_cached(&execlists->queue))) {
>   		struct i915_priolist *p = to_priolist(rb);
>   		struct i915_request *rq, *rn;
> @@ -2904,6 +3086,304 @@ void intel_lr_context_resume(struct drm_i915_private *i915)
>   	}
>   }
>   
> +static void __virtual_engine_free(struct rcu_head *rcu)
> +{
> +	struct virtual_engine *ve = container_of(rcu, typeof(*ve), rcu);
> +
> +	kfree(ve);
> +}
> +
> +static void virtual_engine_free(struct kref *kref)
> +{
> +	struct virtual_engine *ve = container_of(kref, typeof(*ve), kref);
> +	unsigned int n;
> +
> +	GEM_BUG_ON(ve->request);
> +	GEM_BUG_ON(ve->context.active);
> +
> +	for (n = 0; n < ve->count; n++) {
> +		struct intel_engine_cs *sibling = ve->siblings[n];
> +		struct rb_node *node = &ve->nodes[sibling->id].rb;
> +
> +		if (RB_EMPTY_NODE(node))
> +			continue;
> +
> +		spin_lock_irq(&sibling->timeline.lock);
> +
> +		if (!RB_EMPTY_NODE(node))
> +			rb_erase_cached(node, &sibling->execlists.virtual);

There can only be one queued request?

> +
> +		spin_unlock_irq(&sibling->timeline.lock);
> +	}
> +	GEM_BUG_ON(__tasklet_is_scheduled(&ve->base.execlists.tasklet));

Why would this not fire since virtual_engine_free can get called from 
set_param at any time?

> +
> +	if (ve->context.state)
> +		__execlists_context_fini(&ve->context);

And here, why can't it be in use?

> +
> +	i915_timeline_fini(&ve->base.timeline);
> +	call_rcu(&ve->rcu, __virtual_engine_free);
> +}
> +
> +static void virtual_context_unpin(struct intel_context *ce)
> +{
> +	struct virtual_engine *ve = container_of(ce, typeof(*ve), context);
> +
> +	execlists_context_unpin(ce);
> +
> +	kref_put(&ve->kref, virtual_engine_free);
> +}
> +
> +static void virtual_engine_initial_hint(struct virtual_engine *ve)
> +{
> +	int swp;
> +
> +	/*
> +	 * Pick a random sibling on starting to help spread the load around.
> +	 *
> +	 * New contexts are typically created with exactly the same order
> +	 * of siblings, and often started in batches. Due to the way we iterate
> +	 * the array of sibling when submitting requests, sibling[0] is
> +	 * prioritised for dequeuing. If we make sure that sibling[0] is fairly
> +	 * randomised across the system, we also help spread the load by the
> +	 * first engine we inspect being different each time.
> +	 *
> +	 * NB This does not force us to execute on this engine, it will just
> +	 * typically be the first we inspect for submission.
> +	 */
> +	swp = prandom_u32_max(ve->count);
> +	if (!swp)
> +		return;

Was random better than round-robin? Although yeah, it is local to each
engine map, so a global rr or random pick would be a more complicated
implementation, best left for later if needed.

> +
> +	swap(ve->siblings[swp], ve->siblings[0]);
> +	virtual_update_register_offsets(ve->context.lrc_reg_state,
> +					ve->siblings[0]);
> +}
> +
> +static int virtual_context_pin(struct intel_context *ce)
> +{
> +	struct virtual_engine *ve = container_of(ce, typeof(*ve), context);
> +	int err;
> +
> +	/* Note: we must use a real engine class for setting up reg state */
> +	err = __execlists_context_pin(ce, ve->siblings[0]);
> +	if (err)
> +		return err;
> +
> +	virtual_engine_initial_hint(ve);
> +
> +	kref_get(&ve->kref);
> +	return 0;
> +}
> +
> +static const struct intel_context_ops virtual_context_ops = {
> +	.pin = virtual_context_pin,
> +	.unpin = virtual_context_unpin,
> +};
> +
> +static void virtual_submission_tasklet(unsigned long data)
> +{
> +	struct virtual_engine * const ve = (struct virtual_engine *)data;
> +	unsigned int n;
> +	int prio;
> +
> +	prio = READ_ONCE(ve->base.execlists.queue_priority_hint);
> +	if (prio == INT_MIN)
> +		return;
> +
> +	local_irq_disable();
> +	for (n = 0; READ_ONCE(ve->request) && n < ve->count; n++) {
> +		struct intel_engine_cs *sibling = ve->siblings[n];
> +		struct ve_node * const node = &ve->nodes[sibling->id];
> +		struct rb_node **parent, *rb;
> +		bool first;
> +
> +		spin_lock(&sibling->timeline.lock);
> +
> +		if (!RB_EMPTY_NODE(&node->rb)) {
> +			first = rb_first_cached(&sibling->execlists.virtual) == &node->rb;
> +			if (prio == node->prio || (prio > node->prio && first))
> +				goto submit_engine;
> +
> +			rb_erase_cached(&node->rb, &sibling->execlists.virtual);
> +		}

What does this block do?
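As far as I can tell:

	/*
	 * The node is already queued on this sibling; if its tree
	 * position is still correct for the new priority (unchanged
	 * priority, or raised priority while already first), skip
	 * the re-sort, otherwise remove it and re-insert below.
	 */

but it took some staring, so please spell it out in the code.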

> +
> +		rb = NULL;
> +		first = true;
> +		parent = &sibling->execlists.virtual.rb_root.rb_node;
> +		while (*parent) {
> +			struct ve_node *other;
> +
> +			rb = *parent;
> +			other = rb_entry(rb, typeof(*other), rb);
> +			if (prio > other->prio) {
> +				parent = &rb->rb_left;
> +			} else {
> +				parent = &rb->rb_right;
> +				first = false;
> +			}
> +		}
> +
> +		rb_link_node(&node->rb, rb, parent);
> +		rb_insert_color_cached(&node->rb,
> +				       &sibling->execlists.virtual,
> +				       first);
> +
> +submit_engine:
> +		GEM_BUG_ON(RB_EMPTY_NODE(&node->rb));
> +		node->prio = prio;
> +		if (first && prio > sibling->execlists.queue_priority_hint) {
> +			sibling->execlists.queue_priority_hint = prio;
> +			tasklet_hi_schedule(&sibling->execlists.tasklet);
> +		}
> +
> +		spin_unlock(&sibling->timeline.lock);
> +	}
> +	local_irq_enable();
> +}
> +
> +static void virtual_submit_request(struct i915_request *request)
> +{
> +	struct virtual_engine *ve = to_virtual_engine(request->engine);
> +
> +	GEM_BUG_ON(ve->base.submit_request != virtual_submit_request);
> +
> +	GEM_BUG_ON(ve->request);
> +	ve->base.execlists.queue_priority_hint = rq_prio(request);
> +	WRITE_ONCE(ve->request, request);
> +
> +	tasklet_schedule(&ve->base.execlists.tasklet);

Not tasklet_hi_schedule like we otherwise do?

> +}
> +
> +struct intel_engine_cs *
> +intel_execlists_create_virtual(struct i915_gem_context *ctx,
> +			       struct intel_engine_cs **siblings,
> +			       unsigned int count)
> +{
> +	struct virtual_engine *ve;
> +	unsigned int n;
> +	int err;
> +
> +	if (!count)
> +		return ERR_PTR(-EINVAL);
> +
> +	ve = kzalloc(struct_size(ve, siblings, count), GFP_KERNEL);
> +	if (!ve)
> +		return ERR_PTR(-ENOMEM);
> +
> +	kref_init(&ve->kref);
> +	rcu_head_init(&ve->rcu);
> +	ve->base.i915 = ctx->i915;
> +	ve->base.id = -1;
> +	ve->base.class = OTHER_CLASS;
> +	ve->base.uabi_class = I915_ENGINE_CLASS_INVALID;
> +	ve->base.instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
> +	ve->base.flags = I915_ENGINE_IS_VIRTUAL;
> +
> +	snprintf(ve->base.name, sizeof(ve->base.name), "virtual");
> +
> +	err = i915_timeline_init(ctx->i915,
> +				 &ve->base.timeline,
> +				 ve->base.name,
> +				 NULL);
> +	if (err)
> +		goto err_put;
> +	i915_timeline_set_subclass(&ve->base.timeline, TIMELINE_VIRTUAL);
> +
> +	ve->base.cops = &virtual_context_ops;
> +	ve->base.request_alloc = execlists_request_alloc;
> +
> +	ve->base.schedule = i915_schedule;
> +	ve->base.submit_request = virtual_submit_request;
> +
> +	ve->base.execlists.queue_priority_hint = INT_MIN;
> +	tasklet_init(&ve->base.execlists.tasklet,
> +		     virtual_submission_tasklet,
> +		     (unsigned long)ve);
> +
> +	intel_context_init(&ve->context, ctx, &ve->base);
> +
> +	for (n = 0; n < count; n++) {
> +		struct intel_engine_cs *sibling = siblings[n];
> +
> +		GEM_BUG_ON(!is_power_of_2(sibling->mask));
> +		if (sibling->mask & ve->base.mask)
> +			continue;

Eliminate duplicates? Need or want?

> +
> +		if (sibling->execlists.tasklet.func != execlists_submission_tasklet) {
> +			err = -ENODEV;
> +			goto err_put;

Is this intended to prevent making virtual engines from virtual engines,
or more than that? Would it make sense to put an explicit is_virtual
check in for clarity? (Since the extension processing doesn't check; or
maybe it would be good to check there early?)
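i.e. an explicit

	if (intel_engine_is_virtual(sibling)) {
		err = -ENODEV; /* no nested virtual engines */
		goto err_put;
	}

would document the intent.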

> +		}
> +
> +		GEM_BUG_ON(RB_EMPTY_NODE(&ve->nodes[sibling->id].rb));
> +		RB_CLEAR_NODE(&ve->nodes[sibling->id].rb);
> +
> +		ve->siblings[ve->count++] = sibling;
> +		ve->base.mask |= sibling->mask;
> +

/* Allow only same engine class. */

> +		if (ve->base.class != OTHER_CLASS) {
> +			if (ve->base.class != sibling->class) {
> +				err = -EINVAL;
> +				goto err_put;
> +			}
> +			continue;
> +		}
> +
> +		ve->base.class = sibling->class;
> +		snprintf(ve->base.name, sizeof(ve->base.name),
> +			 "v%dx%d", ve->base.class, count);

Do we want to go for unique names? Like, instead of count, have a unique
monotonically increasing counter at the end? Or maybe
virt<unique>:<class>:<sibling-count>? Might need to increase the engine
name buffer.
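E.g. something like (id being a new, hypothetical monotonic counter):

	snprintf(ve->base.name, sizeof(ve->base.name),
		 "virt%u:%dx%d", id, ve->base.class, count);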

> +		ve->base.context_size = sibling->context_size;
> +
> +		ve->base.emit_bb_start = sibling->emit_bb_start;
> +		ve->base.emit_flush = sibling->emit_flush;
> +		ve->base.emit_init_breadcrumb = sibling->emit_init_breadcrumb;
> +		ve->base.emit_fini_breadcrumb = sibling->emit_fini_breadcrumb;
> +		ve->base.emit_fini_breadcrumb_dw =
> +			sibling->emit_fini_breadcrumb_dw;
> +	}
> +
> +	/* gracefully replace a degenerate virtual engine */
> +	if (is_power_of_2(ve->base.mask)) {
> +		struct intel_engine_cs *actual = ve->siblings[0];
> +		virtual_engine_free(&ve->kref);
> +		return actual;
> +	}

Why the term degenerate?

Also, is it possible at this stage? Higher level code will avoid 
creating a veng with only one engine.

> +
> +	__intel_context_insert(ctx, &ve->base, &ve->context);
> +	return &ve->base;
> +
> +err_put:
> +	virtual_engine_free(&ve->kref);
> +	return ERR_PTR(err);
> +}
> +
> +struct intel_engine_cs *
> +intel_execlists_clone_virtual(struct i915_gem_context *ctx,
> +			      struct intel_engine_cs *src)
> +{
> +	struct virtual_engine *se = to_virtual_engine(src);
> +	struct intel_engine_cs *dst;
> +
> +	dst = intel_execlists_create_virtual(ctx,
> +					     se->siblings,
> +					     se->count);
> +	if (IS_ERR(dst))
> +		return dst;
> +
> +	return dst;
> +}
> +
> +void intel_virtual_engine_destroy(struct intel_engine_cs *engine)
> +{
> +	struct virtual_engine *ve = to_virtual_engine(engine);
> +
> +	if (!engine || !intel_engine_is_virtual(engine))
> +		return;
> +
> +	__intel_context_remove(&ve->context);
> +
> +	kref_put(&ve->kref, virtual_engine_free);
> +}
> +
>   void intel_execlists_show_requests(struct intel_engine_cs *engine,
>   				   struct drm_printer *m,
>   				   void (*show_request)(struct drm_printer *m,
> @@ -2961,6 +3441,29 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine,
>   		show_request(m, last, "\t\tQ ");
>   	}
>   
> +	last = NULL;
> +	count = 0;
> +	for (rb = rb_first_cached(&execlists->virtual); rb; rb = rb_next(rb)) {
> +		struct virtual_engine *ve =
> +			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
> +		struct i915_request *rq = READ_ONCE(ve->request);
> +
> +		if (rq) {
> +			if (count++ < max - 1)
> +				show_request(m, rq, "\t\tV ");
> +			else
> +				last = rq;
> +		}
> +	}
> +	if (last) {
> +		if (count > max) {
> +			drm_printf(m,
> +				   "\t\t...skipping %d virtual requests...\n",
> +				   count - max);
> +		}
> +		show_request(m, last, "\t\tV ");
> +	}
> +
>   	spin_unlock_irqrestore(&engine->timeline.lock, flags);
>   }
>   
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> index f1aec8a6986f..9d90dc68e02b 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -112,6 +112,17 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine,
>   							const char *prefix),
>   				   unsigned int max);
>   
> +struct intel_engine_cs *
> +intel_execlists_create_virtual(struct i915_gem_context *ctx,
> +			       struct intel_engine_cs **siblings,
> +			       unsigned int count);
> +
> +struct intel_engine_cs *
> +intel_execlists_clone_virtual(struct i915_gem_context *ctx,
> +			      struct intel_engine_cs *src);
> +
> +void intel_virtual_engine_destroy(struct intel_engine_cs *engine);
> +
>   u32 gen8_make_rpcs(struct drm_i915_private *i915, struct intel_sseu *ctx_sseu);
>   
>   #endif /* _INTEL_LRC_H_ */
> diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
> index d61520ea03c1..4b8a339529d1 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
> @@ -10,6 +10,7 @@
>   
>   #include "../i915_selftest.h"
>   #include "igt_flush_test.h"
> +#include "igt_live_test.h"
>   #include "igt_spinner.h"
>   #include "i915_random.h"
>   
> @@ -1060,6 +1061,169 @@ static int live_preempt_smoke(void *arg)
>   	return err;
>   }
>   
> +static int nop_virtual_engine(struct drm_i915_private *i915,
> +			      struct intel_engine_cs **siblings,
> +			      unsigned int nsibling,
> +			      unsigned int nctx,
> +			      unsigned int flags)
> +#define CHAIN BIT(0)
> +{
> +	IGT_TIMEOUT(end_time);
> +	struct i915_request *request[16];
> +	struct i915_gem_context *ctx[16];
> +	struct intel_engine_cs *ve[16];
> +	unsigned long n, prime, nc;
> +	struct igt_live_test t;
> +	ktime_t times[2] = {};
> +	int err;
> +
> +	GEM_BUG_ON(!nctx || nctx > ARRAY_SIZE(ctx));
> +
> +	for (n = 0; n < nctx; n++) {
> +		ctx[n] = kernel_context(i915);
> +		if (!ctx[n])
> +			return -ENOMEM;
> +
> +		ve[n] = intel_execlists_create_virtual(ctx[n],
> +						       siblings, nsibling);
> +		if (IS_ERR(ve[n]))
> +			return PTR_ERR(ve[n]);
> +	}
> +
> +	err = igt_live_test_begin(&t, i915, __func__, ve[0]->name);
> +	if (err)
> +		goto out;
> +
> +	for_each_prime_number_from(prime, 1, 8192) {
> +		times[1] = ktime_get_raw();
> +
> +		if (flags & CHAIN) {
> +			for (nc = 0; nc < nctx; nc++) {
> +				for (n = 0; n < prime; n++) {
> +					request[nc] =
> +						i915_request_alloc(ve[nc], ctx[nc]);
> +					if (IS_ERR(request[nc])) {
> +						err = PTR_ERR(request[nc]);
> +						goto out;
> +					}
> +
> +					i915_request_add(request[nc]);
> +				}
> +			}
> +		} else {
> +			for (n = 0; n < prime; n++) {
> +				for (nc = 0; nc < nctx; nc++) {
> +					request[nc] =
> +						i915_request_alloc(ve[nc], ctx[nc]);
> +					if (IS_ERR(request[nc])) {
> +						err = PTR_ERR(request[nc]);
> +						goto out;
> +					}
> +
> +					i915_request_add(request[nc]);
> +				}
> +			}
> +		}
> +
> +		for (nc = 0; nc < nctx; nc++) {
> +			if (i915_request_wait(request[nc],
> +					      I915_WAIT_LOCKED,
> +					      HZ / 10) < 0) {
> +				pr_err("%s(%s): wait for %llx:%lld timed out\n",
> +				       __func__, ve[0]->name,
> +				       request[nc]->fence.context,
> +				       request[nc]->fence.seqno);
> +
> +				GEM_TRACE("%s(%s) failed at request %llx:%lld\n",
> +					  __func__, ve[0]->name,
> +					  request[nc]->fence.context,
> +					  request[nc]->fence.seqno);
> +				GEM_TRACE_DUMP();
> +				i915_gem_set_wedged(i915);
> +				break;
> +			}
> +		}
> +
> +		times[1] = ktime_sub(ktime_get_raw(), times[1]);
> +		if (prime == 1)
> +			times[0] = times[1];
> +
> +		if (__igt_timeout(end_time, NULL))
> +			break;
> +	}
> +
> +	err = igt_live_test_end(&t);
> +	if (err)
> +		goto out;
> +
> +	pr_info("Requestx%d latencies on %s: 1 = %lluns, %lu = %lluns\n",
> +		nctx, ve[0]->name, ktime_to_ns(times[0]),
> +		prime, div64_u64(ktime_to_ns(times[1]), prime));
> +
> +out:
> +	if (igt_flush_test(i915, I915_WAIT_LOCKED))
> +		err = -EIO;
> +
> +	for (nc = 0; nc < nctx; nc++) {
> +		intel_virtual_engine_destroy(ve[nc]);
> +		kernel_context_close(ctx[nc]);
> +	}
> +	return err;
> +}
> +
> +static int live_virtual_engine(void *arg)
> +{
> +	struct drm_i915_private *i915 = arg;
> +	struct intel_engine_cs *siblings[MAX_ENGINE_INSTANCE + 1];
> +	struct intel_engine_cs *engine;
> +	enum intel_engine_id id;
> +	unsigned int class, inst;
> +	int err = -ENODEV;
> +
> +	if (USES_GUC_SUBMISSION(i915))
> +		return 0;
> +
> +	mutex_lock(&i915->drm.struct_mutex);
> +
> +	for_each_engine(engine, i915, id) {
> +		err = nop_virtual_engine(i915, &engine, 1, 1, 0);
> +		if (err) {
> +			pr_err("Failed to wrap engine %s: err=%d\n",
> +			       engine->name, err);
> +			goto out_unlock;
> +		}
> +	}
> +
> +	for (class = 0; class <= MAX_ENGINE_CLASS; class++) {
> +		int nsibling, n;
> +
> +		nsibling = 0;
> +		for (inst = 0; inst <= MAX_ENGINE_INSTANCE; inst++) {
> +			if (!i915->engine_class[class][inst])
> +				break;
> +
> +			siblings[nsibling++] = i915->engine_class[class][inst];
> +		}
> +		if (nsibling < 2)
> +			continue;
> +
> +		for (n = 1; n <= nsibling + 1; n++) {
> +			err = nop_virtual_engine(i915, siblings, nsibling,
> +						 n, 0);
> +			if (err)
> +				goto out_unlock;
> +		}
> +
> +		err = nop_virtual_engine(i915, siblings, nsibling, n, CHAIN);
> +		if (err)
> +			goto out_unlock;
> +	}
> +
> +out_unlock:
> +	mutex_unlock(&i915->drm.struct_mutex);
> +	return err;
> +}
> +
>   int intel_execlists_live_selftests(struct drm_i915_private *i915)
>   {
>   	static const struct i915_subtest tests[] = {
> @@ -1071,6 +1235,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
>   		SUBTEST(live_chain_preempt),
>   		SUBTEST(live_preempt_hang),
>   		SUBTEST(live_preempt_smoke),
> +		SUBTEST(live_virtual_engine),
>   	};
>   
>   	if (!HAS_EXECLISTS(i915))
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index a609619610f2..592b02676044 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -125,6 +125,7 @@ enum drm_i915_gem_engine_class {
>   };
>   
>   #define I915_ENGINE_CLASS_INVALID_NONE -1
> +#define I915_ENGINE_CLASS_INVALID_VIRTUAL 0
>   
>   /**
>    * DOC: perf_events exposed by i915 through /sys/bus/event_sources/drivers/i915
> @@ -1596,8 +1597,37 @@ struct drm_i915_gem_context_param_sseu {
>   	__u32 rsvd;
>   };
>   
> +/*
> + * i915_context_engines_load_balance:
> + *
> + * Enable load balancing across this set of engines.
> + *
> + * Into the I915_EXEC_DEFAULT slot [0], a virtual engine is created that when
> + * used will proxy the execbuffer request onto one of the set of engines
> + * in such a way as to distribute the load evenly across the set.
> + *
> + * The set of engines must be compatible (e.g. the same HW class) as they
> + * will share the same logical GPU context and ring.
> + *
> + * To intermix rendering with the virtual engine and direct rendering onto
> + * the backing engines (bypassing the load balancing proxy), the context must
> + * be defined to use a single timeline for all engines.
> + */
> +struct i915_context_engines_load_balance {
> +	struct i915_user_extension base;
> +
> +	__u16 engine_index;

Missing doc here.

Any need for mbz16, or could the index be made u32?
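i.e. drop the mbz16 and just have:

	__u32 engine_index; /* index of the virtual engine in engines[] */
	__u32 flags; /* all undefined flags must be zero */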

> +	__u16 mbz16; /* reserved for future use; must be zero */
> +	__u32 flags; /* all undefined flags must be zero */
> +
> +	__u64 engines_mask; /* selection mask of engines[] */
> +
> +	__u64 mbz64[4]; /* reserved for future use; must be zero */
> +};
> +
>   struct i915_context_param_engines {
>   	__u64 extensions; /* linked chain of extension blocks, 0 terminates */
> +#define I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE 0
>   
>   	struct {
>   		__u16 engine_class; /* see enum drm_i915_gem_engine_class */
> 

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 11/13] drm/i915: Extend execution fence to support a callback
  2019-03-08 14:12 ` [PATCH 11/13] drm/i915: Extend execution fence to support a callback Chris Wilson
@ 2019-03-11 13:09   ` Tvrtko Ursulin
  2019-03-11 14:22     ` Chris Wilson
  0 siblings, 1 reply; 58+ messages in thread
From: Tvrtko Ursulin @ 2019-03-11 13:09 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx



On 08/03/2019 14:12, Chris Wilson wrote:
> In the next patch, we will want to configure the slave request
> depending on which physical engine the master request is executed on.
> For this, we introduce a callback from the execute fence to convey this
> information.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_request.c | 84 +++++++++++++++++++++++++++--
>   drivers/gpu/drm/i915/i915_request.h |  4 ++
>   2 files changed, 83 insertions(+), 5 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 09046a15d218..5527ab22dbf2 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -38,6 +38,8 @@ struct execute_cb {
>   	struct list_head link;
>   	struct irq_work work;
>   	struct i915_sw_fence *fence;
> +	void (*hook)(struct i915_request *rq, struct dma_fence *signal);
> +	struct i915_request *signal;
>   };
>   
>   static struct i915_global_request {
> @@ -343,6 +345,17 @@ static void irq_execute_cb(struct irq_work *wrk)
>   	kmem_cache_free(global.slab_execute_cbs, cb);
>   }
>   
> +static void irq_execute_cb_hook(struct irq_work *wrk)
> +{
> +	struct execute_cb *cb = container_of(wrk, typeof(*cb), work);
> +
> +	cb->hook(container_of(cb->fence, struct i915_request, submit),
> +		 &cb->signal->fence);
> +	i915_request_put(cb->signal);
> +
> +	irq_execute_cb(wrk);
> +}
> +
>   static void __notify_execute_cb(struct i915_request *rq)
>   {
>   	struct execute_cb *cb;
> @@ -369,14 +382,19 @@ static void __notify_execute_cb(struct i915_request *rq)
>   }
>   
>   static int
> -i915_request_await_execution(struct i915_request *rq,
> -			     struct i915_request *signal,
> -			     gfp_t gfp)
> +__i915_request_await_execution(struct i915_request *rq,
> +			       struct i915_request *signal,
> +			       void (*hook)(struct i915_request *rq,
> +					    struct dma_fence *signal),
> +			       gfp_t gfp)
>   {
>   	struct execute_cb *cb;
>   
> -	if (i915_request_is_active(signal))
> +	if (i915_request_is_active(signal)) {
> +		if (hook)
> +			hook(rq, &signal->fence);
>   		return 0;
> +	}
>   
>   	cb = kmem_cache_alloc(global.slab_execute_cbs, gfp);
>   	if (!cb)
> @@ -386,8 +404,18 @@ i915_request_await_execution(struct i915_request *rq,
>   	i915_sw_fence_await(cb->fence);
>   	init_irq_work(&cb->work, irq_execute_cb);
>   
> +	if (hook) {
> +		cb->hook = hook;
> +		cb->signal = i915_request_get(signal);
> +		cb->work.func = irq_execute_cb_hook;
> +	}
> +
>   	spin_lock_irq(&signal->lock);
>   	if (i915_request_is_active(signal)) {
> +		if (hook) {
> +			hook(rq, &signal->fence);
> +			i915_request_put(signal);
> +		}
>   		i915_sw_fence_complete(cb->fence);
>   		kmem_cache_free(global.slab_execute_cbs, cb);
>   	} else {
> @@ -790,7 +818,7 @@ emit_semaphore_wait(struct i915_request *to,
>   		return err;
>   
>   	/* Only submit our spinner after the signaler is running! */
> -	err = i915_request_await_execution(to, from, gfp);
> +	err = __i915_request_await_execution(to, from, NULL, gfp);
>   	if (err)
>   		return err;
>   
> @@ -910,6 +938,52 @@ i915_request_await_dma_fence(struct i915_request *rq, struct dma_fence *fence)
>   	return 0;
>   }
>   
> +int
> +i915_request_await_execution(struct i915_request *rq,
> +			     struct dma_fence *fence,
> +			     void (*hook)(struct i915_request *rq,
> +					  struct dma_fence *signal))
> +{
> +	struct dma_fence **child = &fence;
> +	unsigned int nchild = 1;
> +	int ret;
> +
> +	if (dma_fence_is_array(fence)) {
> +		struct dma_fence_array *array = to_dma_fence_array(fence);
> +
> +		/* XXX Error for signal-on-any fence arrays */

Unfinished?

> +
> +		child = array->fences;
> +		nchild = array->num_fences;
> +		GEM_BUG_ON(!nchild);
> +	}
> +
> +	do {
> +		fence = *child++;
> +		if (test_bit(DMA_FENCE_FLAG_SIGNALED_BIT, &fence->flags))
> +			continue;
> +
> +		/*
> +		 * We don't squash repeated fence dependencies here as we
> +		 * want to run our callback in all cases.
> +		 */
> +
> +		if (dma_fence_is_i915(fence))
> +			ret = __i915_request_await_execution(rq,
> +							     to_request(fence),
> +							     hook,
> +							     I915_FENCE_GFP);
> +		else
> +			ret = i915_sw_fence_await_dma_fence(&rq->submit, fence,
> +							    I915_FENCE_TIMEOUT,
> +							    GFP_KERNEL);
> +		if (ret < 0)
> +			return ret;
> +	} while (--nchild);
> +
> +	return 0;
> +}
> +
>   /**
>    * i915_request_await_object - set this request to (async) wait upon a bo
>    * @to: request we are wishing to use
> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> index cd6c130964cd..d4f6b2940130 100644
> --- a/drivers/gpu/drm/i915/i915_request.h
> +++ b/drivers/gpu/drm/i915/i915_request.h
> @@ -265,6 +265,10 @@ int i915_request_await_object(struct i915_request *to,
>   			      bool write);
>   int i915_request_await_dma_fence(struct i915_request *rq,
>   				 struct dma_fence *fence);
> +int i915_request_await_execution(struct i915_request *rq,
> +				 struct dma_fence *fence,
> +				 void (*hook)(struct i915_request *rq,
> +					      struct dma_fence *signal));
>   
>   void i915_request_add(struct i915_request *rq);
>   
> 

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 12/13] drm/i915/execlists: Virtual engine bonding
  2019-03-08 14:12 ` [PATCH 12/13] drm/i915/execlists: Virtual engine bonding Chris Wilson
@ 2019-03-11 13:38   ` Tvrtko Ursulin
  2019-03-11 14:30     ` Chris Wilson
  0 siblings, 1 reply; 58+ messages in thread
From: Tvrtko Ursulin @ 2019-03-11 13:38 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 08/03/2019 14:12, Chris Wilson wrote:
> Some users require that when a master batch is executed on one particular
> engine, a companion batch is run simultaneously on a specific slave
> engine. For this purpose, we introduce virtual engine bonding, allowing
> maps of master:slaves to be constructed to constrain which physical
> engines a virtual engine may select given a fence on a master engine.
> 
> For the moment, we continue to ignore the issue of preemption deferring
> the master request for later. Ideally, we would like to then also remove
> the slave and run something else rather than have it stall the pipeline.
> With load balancing, we should be able to move workload around it, but
> there is a similar stall on the master pipeline while it may wait for
> the slave to be executed. At the cost of more latency for the bonded
> request, it may be interesting to launch both on their engines in
> lockstep. (Bubbles abound.)
> 
> Opens: Also what about bonding an engine as its own master? It doesn't
> break anything internally, so allow the silliness.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>   drivers/gpu/drm/i915/i915_gem_context.c    |  50 ++++++
>   drivers/gpu/drm/i915/i915_request.c        |   1 +
>   drivers/gpu/drm/i915/i915_request.h        |   1 +
>   drivers/gpu/drm/i915/intel_engine_types.h  |   7 +
>   drivers/gpu/drm/i915/intel_lrc.c           | 111 ++++++++++++++
>   drivers/gpu/drm/i915/intel_lrc.h           |   4 +
>   drivers/gpu/drm/i915/selftests/intel_lrc.c | 167 +++++++++++++++++++++
>   include/uapi/drm/i915_drm.h                |  22 +++
>   8 files changed, 363 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 13b79980f7f3..0d86306497b8 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -1484,8 +1484,58 @@ set_engines__load_balance(struct i915_user_extension __user *base, void *data)
>   	return 0;
>   }
>   
> +static int
> +set_engines__bond(struct i915_user_extension __user *base, void *data)
> +{
> +	struct i915_context_engines_bond __user *ext =
> +		container_of_user(base, typeof(*ext), base);
> +	const struct set_engines *set = data;
> +	struct intel_engine_cs *master;
> +	u32 class, instance, siblings;
> +	u16 idx;
> +	int err;
> +
> +	if (get_user(idx, &ext->engine_index))
> +		return -EFAULT;
> +
> +	if (idx >= set->nengine)
> +		return -EINVAL;
> +
> +	idx = array_index_nospec(idx, set->nengine);
> +	if (!set->engines[idx])
> +		return -EINVAL;
> +
> +	/*
> +	 * A non-virtual engine has 0 siblings to choose between; and submit
> +	 * fence will always be directed to the one engine.
> +	 */
> +	if (!intel_engine_is_virtual(set->engines[idx]))
> +		return 0;
> +
> +	err = check_user_mbz16(&ext->mbz);
> +	if (err)
> +		return err;
> +
> +	if (get_user(class, &ext->master_class))
> +		return -EFAULT;
> +
> +	if (get_user(instance, &ext->master_instance))
> +		return -EFAULT;
> +
> +	master = intel_engine_lookup_user(set->ctx->i915, class, instance);
> +	if (!master)
> +		return -EINVAL;
> +
> +	if (get_user(siblings, &ext->sibling_mask))
> +		return -EFAULT;
> +
> +	return intel_virtual_engine_attach_bond(set->engines[idx],
> +						master, siblings);
> +}
> +
>   static const i915_user_extension_fn set_engines__extensions[] = {
>   	[I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE] = set_engines__load_balance,
> +	[I915_CONTEXT_ENGINES_EXT_BOND] = set_engines__bond,
>   };
>   
>   static int
> diff --git a/drivers/gpu/drm/i915/i915_request.c b/drivers/gpu/drm/i915/i915_request.c
> index 5527ab22dbf2..0caf31de2b98 100644
> --- a/drivers/gpu/drm/i915/i915_request.c
> +++ b/drivers/gpu/drm/i915/i915_request.c
> @@ -743,6 +743,7 @@ i915_request_alloc(struct intel_engine_cs *engine, struct i915_gem_context *ctx)
>   	rq->batch = NULL;
>   	rq->capture_list = NULL;
>   	rq->waitboost = false;
> +	rq->execution_mask = ~0u;
>   
>   	/*
>   	 * Reserve space in the ring buffer for all the commands required to
> diff --git a/drivers/gpu/drm/i915/i915_request.h b/drivers/gpu/drm/i915/i915_request.h
> index d4f6b2940130..862b25930de0 100644
> --- a/drivers/gpu/drm/i915/i915_request.h
> +++ b/drivers/gpu/drm/i915/i915_request.h
> @@ -145,6 +145,7 @@ struct i915_request {
>   	 */
>   	struct i915_sched_node sched;
>   	struct i915_dependency dep;
> +	unsigned int execution_mask;
>   
>   	/*
>   	 * A convenience pointer to the current breadcrumb value stored in
> diff --git a/drivers/gpu/drm/i915/intel_engine_types.h b/drivers/gpu/drm/i915/intel_engine_types.h
> index d54d2a1840cc..6dfcf5cc08c1 100644
> --- a/drivers/gpu/drm/i915/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/intel_engine_types.h
> @@ -382,6 +382,13 @@ struct intel_engine_cs {
>   	 */
>   	void		(*submit_request)(struct i915_request *rq);
>   
> +	/*
> +	 * Called on signaling of a SUBMIT_FENCE, passing along the signaling
> +	 * request down to the bonded pairs.
> +	 */
> +	void            (*bond_execute)(struct i915_request *rq,
> +					struct dma_fence *signal);
> +
>   	/*
>   	 * Call when the priority on a request has changed and it and its
>   	 * dependencies may need rescheduling. Note the request itself may
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 0c97e8f30223..f06312d185af 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -179,6 +179,12 @@ struct virtual_engine {
>   		int prio;
>   	} nodes[I915_NUM_ENGINES];
>   
> +	struct ve_bond {
> +		struct intel_engine_cs *master;
> +		unsigned int sibling_mask;
> +	} *bonds;
> +	unsigned int nbond;
> +
>   	unsigned int count;
>   	struct intel_engine_cs *siblings[0];
>   };
> @@ -3183,6 +3189,7 @@ static const struct intel_context_ops virtual_context_ops = {
>   static void virtual_submission_tasklet(unsigned long data)
>   {
>   	struct virtual_engine * const ve = (struct virtual_engine *)data;
> +	unsigned int mask;
>   	unsigned int n;
>   	int prio;
>   
> @@ -3191,12 +3198,30 @@ static void virtual_submission_tasklet(unsigned long data)
>   		return;
>   
>   	local_irq_disable();
> +
> +	mask = 0;
> +	spin_lock(&ve->base.timeline.lock);
> +	if (ve->request)
> +		mask = ve->request->execution_mask;
> +	spin_unlock(&ve->base.timeline.lock);
> +
>   	for (n = 0; READ_ONCE(ve->request) && n < ve->count; n++) {
>   		struct intel_engine_cs *sibling = ve->siblings[n];
>   		struct ve_node * const node = &ve->nodes[sibling->id];
>   		struct rb_node **parent, *rb;
>   		bool first;
> 

/* Check if the request can be executed on this sibling. Because it
cannot be known in advance (at insert-to-queue time)...? */

> +		if (unlikely(!(mask & sibling->mask))) {
> +			if (!RB_EMPTY_NODE(&node->rb)) {
> +				spin_lock(&sibling->timeline.lock);
> +				rb_erase_cached(&node->rb,
> +						&sibling->execlists.virtual);
> +				RB_CLEAR_NODE(&node->rb);
> +				spin_unlock(&sibling->timeline.lock);
> +			}
> +			continue;
> +		}
> +
>   		spin_lock(&sibling->timeline.lock);
>   
>   		if (!RB_EMPTY_NODE(&node->rb)) {
> @@ -3254,6 +3279,30 @@ static void virtual_submit_request(struct i915_request *request)
>   	tasklet_schedule(&ve->base.execlists.tasklet);
>   }
>   
> +static struct ve_bond *
> +virtual_find_bond(struct virtual_engine *ve, struct intel_engine_cs *master)
> +{
> +	int i;
> +
> +	for (i = 0; i < ve->nbond; i++) {
> +		if (ve->bonds[i].master == master)
> +			return &ve->bonds[i];
> +	}
> +
> +	return NULL;
> +}
> +
> +static void
> +virtual_bond_execute(struct i915_request *rq, struct dma_fence *signal)
> +{
> +	struct virtual_engine *ve = to_virtual_engine(rq->engine);
> +	struct ve_bond *bond;
> +
> +	bond = virtual_find_bond(ve, to_request(signal)->engine);
> +	if (bond) /* XXX serialise with rq->lock? */
> +		rq->execution_mask &= bond->sibling_mask;
> +}
> +
>   struct intel_engine_cs *
>   intel_execlists_create_virtual(struct i915_gem_context *ctx,
>   			       struct intel_engine_cs **siblings,
> @@ -3294,6 +3343,7 @@ intel_execlists_create_virtual(struct i915_gem_context *ctx,
>   
>   	ve->base.schedule = i915_schedule;
>   	ve->base.submit_request = virtual_submit_request;
> +	ve->base.bond_execute = virtual_bond_execute;
>   
>   	ve->base.execlists.queue_priority_hint = INT_MIN;
>   	tasklet_init(&ve->base.execlists.tasklet,
> @@ -3369,9 +3419,70 @@ intel_execlists_clone_virtual(struct i915_gem_context *ctx,
>   	if (IS_ERR(dst))
>   		return dst;
>   
> +	if (se->nbond) {
> +		struct virtual_engine *de = to_virtual_engine(dst);
> +
> +		de->bonds = kmemdup(se->bonds,
> +				    sizeof(*se->bonds) * se->nbond,
> +				    GFP_KERNEL);
> +		if (!de->bonds) {
> +			intel_virtual_engine_destroy(dst);
> +			return ERR_PTR(-ENOMEM);
> +		}
> +
> +		de->nbond = se->nbond;
> +	}
> +
>   	return dst;
>   }
>   
> +static unsigned long
> +virtual_execution_mask(struct virtual_engine *ve, unsigned long mask)
> +{
> +	unsigned long emask = 0;
> +	int bit;
> +
> +	for_each_set_bit(bit, &mask, ve->count)
> +		emask |= ve->siblings[bit]->mask;
> +
> +	return emask;
> +}
> +
> +int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine,
> +				     struct intel_engine_cs *master,
> +				     unsigned long mask)
> +{
> +	struct virtual_engine *ve = to_virtual_engine(engine);
> +	struct ve_bond *bond;
> +
> +	if (mask >> ve->count)
> +		return -EINVAL;
> +
> +	mask = virtual_execution_mask(ve, mask);
> +	if (!mask)
> +		return -EINVAL;
> +
> +	bond = virtual_find_bond(ve, master);
> +	if (bond) {
> +		bond->sibling_mask |= mask;
> +		return 0;
> +	}
> +
> +	bond = krealloc(ve->bonds,
> +			sizeof(*bond) * (ve->nbond + 1),
> +			GFP_KERNEL);
> +	if (!bond)
> +		return -ENOMEM;
> +
> +	bond[ve->nbond].master = master;
> +	bond[ve->nbond].sibling_mask = mask;
> +
> +	ve->bonds = bond;
> +	ve->nbond++;
> +
> +	return 0;
> +}
> +
>   void intel_virtual_engine_destroy(struct intel_engine_cs *engine)
>   {

I think there should be a hunk in here to free ve->bonds.
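Something along these lines perhaps (just a sketch; assuming the free
belongs in the final teardown, once the engine is unlinked from all
siblings):

        /* hypothetical hunk for the teardown path */
        kfree(ve->bonds);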

>   	struct virtual_engine *ve = to_virtual_engine(engine);
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> index 9d90dc68e02b..77b85648045a 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -121,6 +121,10 @@ struct intel_engine_cs *
>   intel_execlists_clone_virtual(struct i915_gem_context *ctx,
>   			      struct intel_engine_cs *src);
>   
> +int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine,
> +				     struct intel_engine_cs *master,
> +				     unsigned long mask);
> +
>   void intel_virtual_engine_destroy(struct intel_engine_cs *engine);
>   
>   u32 gen8_make_rpcs(struct drm_i915_private *i915, struct intel_sseu *ctx_sseu);
> diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
> index 4b8a339529d1..a7de7a8fc24a 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
> @@ -13,6 +13,7 @@
>   #include "igt_live_test.h"
>   #include "igt_spinner.h"
>   #include "i915_random.h"
> +#include "lib_sw_fence.h"
>   
>   #include "mock_context.h"
>   
> @@ -1224,6 +1225,171 @@ static int live_virtual_engine(void *arg)
>   	return err;
>   }
>   
> +static int bond_virtual_engine(struct drm_i915_private *i915,
> +			       unsigned int class,
> +			       struct intel_engine_cs **siblings,
> +			       unsigned int nsibling,
> +			       unsigned int flags)
> +#define BOND_SCHEDULE BIT(0)
> +{
> +	struct intel_engine_cs *master;
> +	struct i915_gem_context *ctx;
> +	struct i915_request *rq[16];
> +	enum intel_engine_id id;
> +	unsigned long n;
> +	int err;
> +
> +	GEM_BUG_ON(nsibling >= ARRAY_SIZE(rq) - 1);
> +
> +	ctx = kernel_context(i915);
> +	if (!ctx)
> +		return -ENOMEM;
> +
> +	err = 0;
> +	rq[0] = ERR_PTR(-ENOMEM);
> +	for_each_engine(master, i915, id) {
> +		struct i915_sw_fence fence;
> +
> +		if (master->class == class)
> +			continue;
> +
> +		rq[0] = i915_request_alloc(master, ctx);
> +		if (IS_ERR(rq[0])) {
> +			err = PTR_ERR(rq[0]);
> +			goto out;
> +		}
> +
> +		if (flags & BOND_SCHEDULE)
> +			onstack_fence_init(&fence);
> +
> +		i915_request_get(rq[0]);
> +		i915_request_add(rq[0]);
> +
> +		for (n = 0; n < nsibling; n++) {
> +			struct intel_engine_cs *engine;
> +
> +			engine = intel_execlists_create_virtual(ctx,
> +								siblings,
> +								nsibling);
> +			if (IS_ERR(engine)) {
> +				err = PTR_ERR(engine);
> +				goto out;
> +			}
> +
> +			err = intel_virtual_engine_attach_bond(engine,
> +							       master,
> +							       BIT(n));
> +			if (err) {
> +				intel_virtual_engine_destroy(engine);
> +				goto out;
> +			}
> +
> +			rq[n + 1] = i915_request_alloc(engine, ctx);
> +			if (IS_ERR(rq[n + 1])) {
> +				err = PTR_ERR(rq[n + 1]);
> +				intel_virtual_engine_destroy(engine);
> +				goto out;
> +			}
> +			i915_request_get(rq[n + 1]);
> +
> +			err = i915_request_await_execution(rq[n + 1],
> +							   &rq[0]->fence,
> +							   engine->bond_execute);
> +			i915_request_add(rq[n + 1]);
> +			intel_virtual_engine_destroy(engine);
> +			if (err < 0)
> +				goto out;
> +		}
> +		rq[n + 1] = ERR_PTR(-EINVAL);
> +
> +		if (flags & BOND_SCHEDULE)
> +			onstack_fence_fini(&fence);

The idea of this fence is to delay rq[0]? But I don't see it passed in
anywhere. I wanted to suggest checking, before onstack_fence_fini, that
rq[0] (or any other request) hasn't been submitted/completed yet.
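E.g. I would have expected something like this (just a sketch; error
unwind elided), so the fence actually holds back submission of the
master:

        if (flags & BOND_SCHEDULE) {
                onstack_fence_init(&fence);
                err = i915_sw_fence_await_sw_fence_gfp(&rq[0]->submit,
                                                       &fence,
                                                       GFP_KERNEL);
                if (err < 0)
                        goto out;
        }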

> +
> +		for (n = 0; n < nsibling; n++) {
> +			if (i915_request_wait(rq[n + 1],
> +					      I915_WAIT_LOCKED,
> +					      MAX_SCHEDULE_TIMEOUT) < 0) {
> +				err = -EIO;
> +				goto out;
> +			}
> +
> +			if (rq[n + 1]->engine != siblings[n]) {
> +				pr_err("Bonded request did not execute on target engine: expected %s, used %s; master was %s\n",
> +				       siblings[n]->name,
> +				       rq[n + 1]->engine->name,
> +				       rq[0]->engine->name);
> +				err = -EINVAL;
> +				goto out;
> +			}
> +		}
> +
> +		for (n = 0; !IS_ERR(rq[n]); n++)
> +			i915_request_put(rq[n]);
> +		rq[0] = ERR_PTR(-ENOMEM);
> +	}
> +
> +out:
> +	for (n = 0; !IS_ERR(rq[n]); n++)
> +		i915_request_put(rq[n]);
> +	if (igt_flush_test(i915, I915_WAIT_LOCKED))
> +		err = -EIO;
> +
> +	kernel_context_close(ctx);
> +	return err;
> +}
> +
> +static int live_virtual_bond(void *arg)
> +{
> +	static const struct phase {
> +		const char *name;
> +		unsigned int flags;
> +	} phases[] = {
> +		{ "", 0 },
> +		{ "schedule", BOND_SCHEDULE },
> +		{ },
> +	};
> +	struct drm_i915_private *i915 = arg;
> +	struct intel_engine_cs *siblings[MAX_ENGINE_INSTANCE + 1];
> +	unsigned int class, inst;
> +	int err = 0;
> +
> +	if (USES_GUC_SUBMISSION(i915))
> +		return 0;

One day no one will remember to add the missing coverage, but we don't
have selftest skips... a task for an intern.

> +
> +	mutex_lock(&i915->drm.struct_mutex);
> +
> +	for (class = 0; class <= MAX_ENGINE_CLASS; class++) {
> +		const struct phase *p;
> +		int nsibling;
> +
> +		nsibling = 0;
> +		for (inst = 0; inst <= MAX_ENGINE_INSTANCE; inst++) {
> +			if (!i915->engine_class[class][inst])
> +				break;
> +
> +			GEM_BUG_ON(nsibling == ARRAY_SIZE(siblings));
> +			siblings[nsibling++] = i915->engine_class[class][inst];
> +		}
> +		if (nsibling < 2)
> +			continue;
> +
> +		for (p = phases; p->name; p++) {
> +			err = bond_virtual_engine(i915,
> +						  class, siblings, nsibling,
> +						  p->flags);
> +			if (err) {
> +				pr_err("%s(%s): failed class=%d, nsibling=%d, err=%d\n",
> +				       __func__, p->name, class, nsibling, err);
> +				goto out_unlock;
> +			}
> +		}
> +	}
> +
> +out_unlock:
> +	mutex_unlock(&i915->drm.struct_mutex);
> +	return err;
> +}
> +
>   int intel_execlists_live_selftests(struct drm_i915_private *i915)
>   {
>   	static const struct i915_subtest tests[] = {
> @@ -1236,6 +1402,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
>   		SUBTEST(live_preempt_hang),
>   		SUBTEST(live_preempt_smoke),
>   		SUBTEST(live_virtual_engine),
> +		SUBTEST(live_virtual_bond),
>   	};
>   
>   	if (!HAS_EXECLISTS(i915))
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 592b02676044..94e72ae954a0 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -1530,6 +1530,10 @@ struct drm_i915_gem_context_param {
>    * sized argument, will revert back to default settings.
>    *
>    * See struct i915_context_param_engines.
> + *
> + * Extensions:
> + *   i915_context_engines_load_balance (I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE)
> + *   i915_context_engines_bond (I915_CONTEXT_ENGINES_EXT_BOND)
>    */
>   #define I915_CONTEXT_PARAM_ENGINES	0xa
>   /* Must be kept compact -- no holes and well documented */
> @@ -1625,9 +1629,27 @@ struct i915_context_engines_load_balance {
>   	__u64 mbz64[4]; /* reserved for future use; must be zero */
>   };
>   
> +/*
> + * i915_context_engines_bond:
> + *
> + */
> +struct i915_context_engines_bond {
> +	struct i915_user_extension base;
> +
> +	__u16 engine_index;
> +	__u16 mbz;
> +
> +	__u16 master_class;
> +	__u16 master_instance;
> +
> +	__u64 sibling_mask;
> +	__u64 flags; /* all undefined flags must be zero */
> +};

Need some API doc for it all.

We are not in danger of any compiler-added padding since we don't leave
any holes, right? So no need to define this as packed?
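To check my reading while the doc is pending, usage would be roughly (a
sketch, not lifted from the igt tests):

        struct i915_context_engines_bond bond = {
                .base = { .name = I915_CONTEXT_ENGINES_EXT_BOND },
                .engine_index = 0, /* index of the veng in engines[] */
                .master_class = I915_ENGINE_CLASS_VIDEO,
                .master_instance = 0,
                .sibling_mask = 0x3, /* siblings eligible once bonded */
        };

And the no-padding expectation could be documented with a compile-time
check, e.g.:

        BUILD_BUG_ON(sizeof(struct i915_context_engines_bond) !=
                     sizeof(struct i915_user_extension) +
                     4 * sizeof(__u16) + 2 * sizeof(__u64));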

> +
>   struct i915_context_param_engines {
>   	__u64 extensions; /* linked chain of extension blocks, 0 terminates */
>   #define I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE 0
> +#define I915_CONTEXT_ENGINES_EXT_BOND 1
>   
>   	struct {
>   		__u16 engine_class; /* see enum drm_i915_gem_engine_class */
> 

Regards,

Tvrtko

* Re: [PATCH 13/13] drm/i915: Allow specification of parallel execbuf
  2019-03-08 14:12 ` [PATCH 13/13] drm/i915: Allow specification of parallel execbuf Chris Wilson
@ 2019-03-11 13:40   ` Tvrtko Ursulin
  0 siblings, 0 replies; 58+ messages in thread
From: Tvrtko Ursulin @ 2019-03-11 13:40 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 08/03/2019 14:12, Chris Wilson wrote:
> There is a desire to split a task onto two engines and have them run at
> the same time, e.g. scanline interleaving to spread the workload evenly.
> Through the use of the out-fence from the first execbuf, we can
> coordinate secondary execbuf to only become ready simultaneously with
> the first, so that with all things idle the second execbufs are executed
> in parallel with the first. The key difference here between the new
> EXEC_FENCE_SUBMIT and the existing EXEC_FENCE_IN is that the in-fence
> waits for the completion of the first request (so that all of its
> rendering results are visible to the second execbuf, the more common
> userspace fence requirement).
> 
> Since we only have a single input fence slot, userspace cannot mix an
> in-fence and a submit-fence. It has to use one or the other! This is not
> such a harsh requirement, since by virtue of the submit-fence, the
> secondary execbuf inherit all of the dependencies from the first
> request, and for the application the dependencies should be common
> between the primary and secondary execbuf.
> 
> Suggested-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Testcase: igt/gem_exec_fence/parallel
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_drv.c            |  1 +
>   drivers/gpu/drm/i915/i915_gem_execbuffer.c | 25 +++++++++++++++++++++-
>   include/uapi/drm/i915_drm.h                | 17 ++++++++++++++-
>   3 files changed, 41 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 93e41c937d96..afdfced262e6 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -421,6 +421,7 @@ static int i915_getparam_ioctl(struct drm_device *dev, void *data,
>   	case I915_PARAM_HAS_EXEC_CAPTURE:
>   	case I915_PARAM_HAS_EXEC_BATCH_FIRST:
>   	case I915_PARAM_HAS_EXEC_FENCE_ARRAY:
> +	case I915_PARAM_HAS_EXEC_SUBMIT_FENCE:
>   		/* For the time being all of these are always true;
>   		 * if some supported hardware does not have one of these
>   		 * features this value needs to be provided from
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 67e4a0c2ebff..8f14ea41d4e7 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -2285,6 +2285,7 @@ i915_gem_do_execbuffer(struct drm_device *dev,
>   {
>   	struct i915_execbuffer eb;
>   	struct dma_fence *in_fence = NULL;
> +	struct dma_fence *exec_fence = NULL;
>   	struct sync_file *out_fence = NULL;
>   	intel_wakeref_t wakeref;
>   	int out_fence_fd = -1;
> @@ -2328,11 +2329,24 @@ i915_gem_do_execbuffer(struct drm_device *dev,
>   			return -EINVAL;
>   	}
>   
> +	if (args->flags & I915_EXEC_FENCE_SUBMIT) {
> +		if (in_fence) {
> +			err = -EINVAL;
> +			goto err_in_fence;
> +		}
> +
> +		exec_fence = sync_file_get_fence(lower_32_bits(args->rsvd2));
> +		if (!exec_fence) {
> +			err = -EINVAL;
> +			goto err_in_fence;
> +		}
> +	}
> +
>   	if (args->flags & I915_EXEC_FENCE_OUT) {
>   		out_fence_fd = get_unused_fd_flags(O_CLOEXEC);
>   		if (out_fence_fd < 0) {
>   			err = out_fence_fd;
> -			goto err_in_fence;
> +			goto err_exec_fence;
>   		}
>   	}
>   
> @@ -2464,6 +2478,13 @@ i915_gem_do_execbuffer(struct drm_device *dev,
>   			goto err_request;
>   	}
>   
> +	if (exec_fence) {
> +		err = i915_request_await_execution(eb.request, exec_fence,
> +						   eb.engine->bond_execute);
> +		if (err < 0)
> +			goto err_request;
> +	}
> +
>   	if (fences) {
>   		err = await_fence_array(&eb, fences);
>   		if (err)
> @@ -2524,6 +2545,8 @@ i915_gem_do_execbuffer(struct drm_device *dev,
>   err_out_fence:
>   	if (out_fence_fd != -1)
>   		put_unused_fd(out_fence_fd);
> +err_exec_fence:
> +	dma_fence_put(exec_fence);
>   err_in_fence:
>   	dma_fence_put(in_fence);
>   	return err;
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 94e72ae954a0..a6cfd1232537 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -591,6 +591,12 @@ typedef struct drm_i915_irq_wait {
>    */
>   #define I915_PARAM_MMAP_GTT_COHERENT	52
>   
> +/*
> + * Query whether DRM_I915_GEM_EXECBUFFER2 supports coordination of parallel
> + * execution through use of explicit fence support.
> + * See I915_EXEC_FENCE_OUT and I915_EXEC_FENCE_SUBMIT.
> + */
> +#define I915_PARAM_HAS_EXEC_SUBMIT_FENCE 53
>   /* Must be kept compact -- no holes and well documented */
>   
>   typedef struct drm_i915_getparam {
> @@ -1113,7 +1119,16 @@ struct drm_i915_gem_execbuffer2 {
>    */
>   #define I915_EXEC_FENCE_ARRAY   (1<<19)
>   
> -#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_FENCE_ARRAY<<1))
> +/*
> + * Setting I915_EXEC_FENCE_SUBMIT implies that lower_32_bits(rsvd2) represent
> + * a sync_file fd to wait upon (in a nonblocking manner) prior to executing
> + * the batch.
> + *
> + * Returns -EINVAL if the sync_file fd cannot be found.
> + */
> +#define I915_EXEC_FENCE_SUBMIT		(1 << 20)
> +
> +#define __I915_EXEC_UNKNOWN_FLAGS (-(I915_EXEC_FENCE_SUBMIT << 1))
>   
>   #define I915_EXEC_CONTEXT_ID_MASK	(0xffffffff)
>   #define i915_execbuffer2_set_context_id(eb2, context) \
> 

Simple enough, LGTM.
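
Just to confirm my reading, the userspace flow would be roughly (a
sketch using libdrm's drmIoctl; error handling elided):

        struct drm_i915_gem_execbuffer2 eb_master = { /* ... */ };
        eb_master.flags |= I915_EXEC_FENCE_OUT;
        drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2_WR, &eb_master);
        int submit = eb_master.rsvd2 >> 32; /* out-fence fd */

        struct drm_i915_gem_execbuffer2 eb_slave = { /* ... */ };
        eb_slave.flags |= I915_EXEC_FENCE_SUBMIT;
        eb_slave.rsvd2 = submit; /* lower 32 bits carry the in-fence */
        drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &eb_slave);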

Regards,

Tvrtko

* Re: [PATCH 10/13] drm/i915: Load balancing across a virtual engine
  2019-03-11 12:47   ` Tvrtko Ursulin
@ 2019-03-11 13:43     ` Chris Wilson
  0 siblings, 0 replies; 58+ messages in thread
From: Chris Wilson @ 2019-03-11 13:43 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-03-11 12:47:30)
> 
> On 08/03/2019 14:12, Chris Wilson wrote:
> > Having allowed the user to define a set of engines that they will want
> > to only use, we go one step further and allow them to bind those engines
> > into a single virtual instance. Submitting a batch to the virtual engine
> > will then forward it to any one of the set in a manner as best to
> > distribute load.  The virtual engine has a single timeline across all
> > engines (it operates as a single queue), so it is not able to concurrently
> > run batches across multiple engines by itself; that is left up to the user
> > to submit multiple concurrent batches to multiple queues. Multiple users
> > will be load balanced across the system.
> > 
> > The mechanism used for load balancing in this patch is a late greedy
> > balancer. When a request is ready for execution, it is added to each
> > engine's queue, and when an engine is ready for its next request it
> > claims it from the virtual engine. The first engine to do so, wins, i.e.
> > the request is executed at the earliest opportunity (idle moment) in the
> > system.
> > 
> > As not all HW is created equal, the user is still able to skip the
> > virtual engine and execute the batch on a specific engine, all within the
> > same queue. It will then be executed in order on the correct engine,
> > with execution on other virtual engines being moved away due to the load
> > detection.
> > 
> > A couple of areas for potential improvement left!
> > 
> > - The virtual engine always take priority over equal-priority tasks.
> > Mostly broken up by applying FQ_CODEL rules for prioritising new clients,
> > and hopefully the virtual and real engines are not then congested (i.e.
> > all work is via virtual engines, or all work is to the real engine).
> > 
> > - We require the breadcrumb irq around every virtual engine request. For
> > normal engines, we eliminate the need for the slow round trip via
> > interrupt by using the submit fence and queueing in order. For virtual
> > engines, we have to allow any job to transfer to a new ring, and cannot
> > coalesce the submissions, so require the completion fence instead,
> > forcing the persistent use of interrupts.
> > 
> > - We only drip feed single requests through each virtual engine and onto
> > the physical engines, even if there was enough work to fill all ELSP,
> > leaving small stalls with an idle CS event at the end of every request.
> > Could we be greedy and fill both slots? Being lazy is virtuous for load
> > distribution on less-than-full workloads though.
> > 
> > Other areas of improvement are more general, such as reducing lock
> > contention, reducing dispatch overhead, looking at direct submission
> > rather than bouncing around tasklets etc.
> > 
> > Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> > Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > ---
> >   drivers/gpu/drm/i915/i915_gem.h            |   5 +
> >   drivers/gpu/drm/i915/i915_gem_context.c    | 153 +++++-
> >   drivers/gpu/drm/i915/i915_scheduler.c      |  17 +-
> >   drivers/gpu/drm/i915/i915_timeline_types.h |   1 +
> >   drivers/gpu/drm/i915/intel_engine_types.h  |   8 +
> >   drivers/gpu/drm/i915/intel_lrc.c           | 521 ++++++++++++++++++++-
> >   drivers/gpu/drm/i915/intel_lrc.h           |  11 +
> >   drivers/gpu/drm/i915/selftests/intel_lrc.c | 165 +++++++
> >   include/uapi/drm/i915_drm.h                |  30 ++
> >   9 files changed, 895 insertions(+), 16 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_gem.h b/drivers/gpu/drm/i915/i915_gem.h
> > index 74a2ddc1b52f..dbcea6e29d48 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.h
> > +++ b/drivers/gpu/drm/i915/i915_gem.h
> > @@ -91,4 +91,9 @@ static inline bool __tasklet_is_enabled(const struct tasklet_struct *t)
> >       return !atomic_read(&t->count);
> >   }
> >   
> > +static inline bool __tasklet_is_scheduled(struct tasklet_struct *t)
> > +{
> > +     return test_bit(TASKLET_STATE_SCHED, &t->state);
> > +}
> > +
> >   #endif /* __I915_GEM_H__ */
> > diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> > index a581c01ffff1..13b79980f7f3 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_context.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> > @@ -86,12 +86,16 @@
> >    */
> >   
> >   #include <linux/log2.h>
> > +#include <linux/nospec.h>
> > +
> >   #include <drm/i915_drm.h>
> > +
> >   #include "i915_drv.h"
> >   #include "i915_globals.h"
> >   #include "i915_trace.h"
> >   #include "i915_user_extensions.h"
> >   #include "intel_lrc_reg.h"
> > +#include "intel_lrc.h"
> >   #include "intel_workarounds.h"
> >   
> >   #define ALL_L3_SLICES(dev) (1 << NUM_L3_SLICES(dev)) - 1
> > @@ -238,6 +242,20 @@ static void release_hw_id(struct i915_gem_context *ctx)
> >       mutex_unlock(&i915->contexts.mutex);
> >   }
> >   
> > +static void free_engines(struct intel_engine_cs **engines, int count)
> > +{
> > +     int i;
> > +
> > +     if (!engines)
> > +             return;
> > +
> > +     /* We own the veng we created; regular engines are ignored */
> > +     for (i = 0; i < count; i++)
> > +             intel_virtual_engine_destroy(engines[i]);
> > +
> > +     kfree(engines);
> > +}
> > +
> >   static void i915_gem_context_free(struct i915_gem_context *ctx)
> >   {
> >       struct intel_context *it, *n;
> > @@ -248,8 +266,7 @@ static void i915_gem_context_free(struct i915_gem_context *ctx)
> >   
> >       release_hw_id(ctx);
> >       i915_ppgtt_put(ctx->ppgtt);
> > -
> > -     kfree(ctx->engines);
> > +     free_engines(ctx->engines, ctx->nengine);
> >   
> >       rbtree_postorder_for_each_entry_safe(it, n, &ctx->hw_contexts, node)
> >               it->ops->destroy(it);
> > @@ -1359,13 +1376,116 @@ static int set_sseu(struct i915_gem_context *ctx,
> >       return 0;
> >   };
> >   
> > +static int check_user_mbz16(u16 __user *user)
> > +{
> > +     u16 mbz;
> > +
> > +     if (get_user(mbz, user))
> > +             return -EFAULT;
> > +
> > +     return mbz ? -EINVAL : 0;
> > +}
> > +
> > +static int check_user_mbz32(u32 __user *user)
> > +{
> > +     u32 mbz;
> > +
> > +     if (get_user(mbz, user))
> > +             return -EFAULT;
> > +
> > +     return mbz ? -EINVAL : 0;
> > +}
> > +
> > +static int check_user_mbz64(u64 __user *user)
> > +{
> > +     u64 mbz;
> > +
> > +     if (get_user(mbz, user))
> > +             return -EFAULT;
> > +
> > +     return mbz ? -EINVAL : 0;
> > +}
> 
> Could generate the three with a macro, but it would be a marginal
> improvement, if any.
> 
> > +
> >   struct set_engines {
> >       struct i915_gem_context *ctx;
> >       struct intel_engine_cs **engines;
> >       unsigned int nengine;
> >   };
> >   
> > +static int
> > +set_engines__load_balance(struct i915_user_extension __user *base, void *data)
> > +{
> > +     struct i915_context_engines_load_balance __user *ext =
> > +             container_of_user(base, typeof(*ext), base);
> > +     const struct set_engines *set = data;
> > +     struct intel_engine_cs *ve;
> > +     unsigned int n;
> > +     u64 mask;
> > +     u16 idx;
> > +     int err;
> > +
> > +     if (!HAS_EXECLISTS(set->ctx->i915))
> > +             return -ENODEV;
> > +
> > +     if (USES_GUC_SUBMISSION(set->ctx->i915))
> > +             return -ENODEV; /* not implemented yet */
> > +
> > +     if (get_user(idx, &ext->engine_index))
> > +             return -EFAULT;
> > +
> > +     if (idx >= set->nengine)
> > +             return -EINVAL;
> > +
> > +     idx = array_index_nospec(idx, set->nengine);
> > +     if (set->engines[idx])
> > +             return -EEXIST;
> > +
> > +     err = check_user_mbz16(&ext->mbz16);
> > +     if (err)
> > +             return err;
> > +
> > +     err = check_user_mbz32(&ext->flags);
> > +     if (err)
> > +             return err;
> > +
> > +     for (n = 0; n < ARRAY_SIZE(ext->mbz64); n++) {
> > +             err = check_user_mbz64(&ext->mbz64[n]);
> > +             if (err)
> > +                     return err;
> > +     }
> > +
> > +     if (get_user(mask, &ext->engines_mask))
> > +             return -EFAULT;
> > +
> > +     mask &= GENMASK_ULL(set->nengine - 1, 0) & ~BIT_ULL(idx);
> > +     if (!mask)
> > +             return -EINVAL;
> > +
> > +     if (is_power_of_2(mask)) {
> > +             ve = set->engines[__ffs64(mask)];
> > +     } else {
> > +             struct intel_engine_cs *stack[64];
> > +             int bit;
> > +
> > +             n = 0;
> > +             for_each_set_bit(bit, (unsigned long *)&mask, set->nengine)
> > +                     stack[n++] = set->engines[bit];
> > +
> > +             ve = intel_execlists_create_virtual(set->ctx, stack, n);
> > +     }
> > +     if (IS_ERR(ve))
> > +             return PTR_ERR(ve);
> > +
> > +     if (cmpxchg(&set->engines[idx], NULL, ve)) {
> > +             intel_virtual_engine_destroy(ve);
> > +             return -EEXIST;
> > +     }
> > +
> > +     return 0;
> > +}
> > +
> >   static const i915_user_extension_fn set_engines__extensions[] = {
> > +     [I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE] = set_engines__load_balance,
> >   };
> >   
> >   static int
> > @@ -1426,13 +1546,13 @@ set_engines(struct i915_gem_context *ctx,
> >                                          ARRAY_SIZE(set_engines__extensions),
> >                                          &set);
> >       if (err) {
> > -             kfree(set.engines);
> > +             free_engines(set.engines, set.nengine);
> >               return err;
> >       }
> >   
> >   out:
> >       mutex_lock(&ctx->i915->drm.struct_mutex);
> > -     kfree(ctx->engines);
> > +     free_engines(ctx->engines, ctx->nengine);
> >       ctx->engines = set.engines;
> >       ctx->nengine = set.nengine;
> >       mutex_unlock(&ctx->i915->drm.struct_mutex);
> > @@ -1637,6 +1757,7 @@ static int clone_engines(struct i915_gem_context *dst,
> >                        struct i915_gem_context *src)
> >   {
> >       struct intel_engine_cs **engines;
> > +     int i;
> >   
> >       engines = kmemdup(src->engines,
> >                         sizeof(*src->engines) * src->nengine,
> > @@ -1644,6 +1765,30 @@ static int clone_engines(struct i915_gem_context *dst,
> >       if (!engines)
> >               return -ENOMEM;
> >   
> > +     /*
> > +      * Virtual engines are singletons; they can only exist
> > +      * inside a single context, because they embed their
> > +      * HW context... As each virtual context implies a single
> > +      * timeline (each engine can only dequeue a single request
> > +      * at any time), it would be surprising for two contexts
> > +      * to use the same engine. So let's create a copy of
> > +      * the virtual engine instead.
> > +      */
> > +     for (i = 0; i < src->nengine; i++) {
> > +             struct intel_engine_cs *engine = engines[i];
> > +
> > +             if (!intel_engine_is_virtual(engine))
> > +                     continue;
> > +
> > +             engine = intel_execlists_clone_virtual(dst, engine);
> > +             if (IS_ERR(engine)) {
> > +                     free_engines(engines, i);
> > +                     return PTR_ERR(engine);
> > +             }
> > +
> > +             engines[i] = engine;
> > +     }
> > +
> >       dst->engines = engines;
> >       dst->nengine = src->nengine;
> >       return 0;
> > diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> > index e0f609d01564..bb9819dbe313 100644
> > --- a/drivers/gpu/drm/i915/i915_scheduler.c
> > +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> > @@ -247,17 +247,25 @@ sched_lock_engine(const struct i915_sched_node *node,
> >                 struct intel_engine_cs *locked,
> >                 struct sched_cache *cache)
> >   {
> > -     struct intel_engine_cs *engine = node_to_request(node)->engine;
> > +     const struct i915_request *rq = node_to_request(node);
> > +     struct intel_engine_cs *engine;
> >   
> >       GEM_BUG_ON(!locked);
> >   
> > -     if (engine != locked) {
> > +     /*
> > +      * Virtual engines complicate acquiring the engine timeline lock,
> > +      * as their rq->engine pointer is not stable until under that
> > +      * engine lock. The simple ploy we use is to take the lock then
> > +      * check that the rq still belongs to the newly locked engine.
> > +      */
> > +     while (locked != (engine = READ_ONCE(rq->engine))) {
> >               spin_unlock(&locked->timeline.lock);
> >               memset(cache, 0, sizeof(*cache));
> >               spin_lock(&engine->timeline.lock);
> > +             locked = engine;
> >       }
> >   
> > -     return engine;
> > +     return locked;
> 
> engine == locked at this point, right?

Yess.
> 
> >   }
> >   
> >   static bool inflight(const struct i915_request *rq,
> > @@ -370,8 +378,11 @@ static void __i915_schedule(struct i915_request *rq,
> >               if (prio <= node->attr.priority || node_signaled(node))
> >                       continue;
> >   
> > +             GEM_BUG_ON(node_to_request(node)->engine != engine);
> > +
> >               node->attr.priority = prio;
> >               if (!list_empty(&node->link)) {
> > +                     GEM_BUG_ON(intel_engine_is_virtual(engine));
> >                       if (!cache.priolist)
> >                               cache.priolist =
> >                                       i915_sched_lookup_priolist(engine,
> > diff --git a/drivers/gpu/drm/i915/i915_timeline_types.h b/drivers/gpu/drm/i915/i915_timeline_types.h
> > index 8ff146dc05ba..5e445f145eb1 100644
> > --- a/drivers/gpu/drm/i915/i915_timeline_types.h
> > +++ b/drivers/gpu/drm/i915/i915_timeline_types.h
> > @@ -25,6 +25,7 @@ struct i915_timeline {
> >       spinlock_t lock;
> >   #define TIMELINE_CLIENT 0 /* default subclass */
> >   #define TIMELINE_ENGINE 1
> > +#define TIMELINE_VIRTUAL 2
> >       struct mutex mutex; /* protects the flow of requests */
> >   
> >       unsigned int pin_count;
> > diff --git a/drivers/gpu/drm/i915/intel_engine_types.h b/drivers/gpu/drm/i915/intel_engine_types.h
> > index b0aa1f0d4e47..d54d2a1840cc 100644
> > --- a/drivers/gpu/drm/i915/intel_engine_types.h
> > +++ b/drivers/gpu/drm/i915/intel_engine_types.h
> > @@ -216,6 +216,7 @@ struct intel_engine_execlists {
> >        * @queue: queue of requests, in priority lists
> >        */
> >       struct rb_root_cached queue;
> > +     struct rb_root_cached virtual;
> >   
> >       /**
> >        * @csb_write: control register for Context Switch buffer
> > @@ -421,6 +422,7 @@ struct intel_engine_cs {
> >   #define I915_ENGINE_SUPPORTS_STATS   BIT(1)
> >   #define I915_ENGINE_HAS_PREEMPTION   BIT(2)
> >   #define I915_ENGINE_HAS_SEMAPHORES   BIT(3)
> > +#define I915_ENGINE_IS_VIRTUAL       BIT(4)
> >       unsigned int flags;
> >   
> >       /*
> > @@ -504,6 +506,12 @@ intel_engine_has_semaphores(const struct intel_engine_cs *engine)
> >       return engine->flags & I915_ENGINE_HAS_SEMAPHORES;
> >   }
> >   
> > +static inline bool
> > +intel_engine_is_virtual(const struct intel_engine_cs *engine)
> > +{
> > +     return engine->flags & I915_ENGINE_IS_VIRTUAL;
> > +}
> > +
> >   #define instdone_slice_mask(dev_priv__) \
> >       (IS_GEN(dev_priv__, 7) ? \
> >        1 : RUNTIME_INFO(dev_priv__)->sseu.slice_mask)
> > diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> > index 7b938eaff9c5..0c97e8f30223 100644
> > --- a/drivers/gpu/drm/i915/intel_lrc.c
> > +++ b/drivers/gpu/drm/i915/intel_lrc.c
> > @@ -166,6 +166,28 @@
> >   
> >   #define ACTIVE_PRIORITY (I915_PRIORITY_NEWCLIENT | I915_PRIORITY_NOSEMAPHORE)
> >   
> > +struct virtual_engine {
> > +     struct intel_engine_cs base;
> > +
> > +     struct intel_context context;
> > +     struct kref kref;
> > +     struct rcu_head rcu;
> > +
> > +     struct i915_request *request;
> > +     struct ve_node {
> > +             struct rb_node rb;
> > +             int prio;
> > +     } nodes[I915_NUM_ENGINES];
> 
> Please comment at least the fields in the above block.

The request, and elements of someone else's priority tree? They really are
not that interesting.

> 
> > +
> > +     unsigned int count;
> > +     struct intel_engine_cs *siblings[0];
> > +};
> > +
> > +static struct virtual_engine *to_virtual_engine(struct intel_engine_cs *engine)
> > +{
> > +     return container_of(engine, struct virtual_engine, base);
> > +}
> > +
> >   static int execlists_context_deferred_alloc(struct intel_context *ce,
> >                                           struct intel_engine_cs *engine);
> >   static void execlists_init_reg_state(u32 *reg_state,
> > @@ -235,7 +257,8 @@ static int queue_prio(const struct intel_engine_execlists *execlists)
> >   }
> >   
> >   static inline bool need_preempt(const struct intel_engine_cs *engine,
> > -                             const struct i915_request *rq)
> > +                             const struct i915_request *rq,
> > +                             struct rb_node *rb)
> >   {
> >       int last_prio;
> >   
> > @@ -270,6 +293,22 @@ static inline bool need_preempt(const struct intel_engine_cs *engine,
> >           rq_prio(list_next_entry(rq, link)) > last_prio)
> >               return true;
> >   
> > +     if (rb) { /* XXX virtual precedence */
> > +             struct virtual_engine *ve =
> > +                     rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
> > +             bool preempt = false;
> > +
> > +             if (engine == ve->siblings[0]) { /* only preempt one sibling */
> 
> Why always siblings[0]?

It's the one most likely currently active; why do a preempt-to-idle cycle
on them all? We're explicitly making the engine idle in order to put our
request in the queue. Just feels like a waste -- the marginal cost being
that sibling[0] may have more important work than the veng, so we miss
the opportunity to preempt sibling[1]; meh.

Now, with preempt-to-busy, we may favour the alternate strategy, since
we will consume the preemption before our other siblings can evaluate
this request.

> > @@ -396,14 +437,23 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
> >   
> >               GEM_BUG_ON(rq->hw_context->active);
> >   
> > -             GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
> > -             if (rq_prio(rq) != prio) {
> > -                     prio = rq_prio(rq);
> > -                     pl = i915_sched_lookup_priolist(engine, prio);
> > -             }
> > -             GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
> > +             owner = rq->hw_context->engine;
> > +             if (likely(owner == engine)) {
> > +                     GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
> > +                     if (rq_prio(rq) != prio) {
> > +                             prio = rq_prio(rq);
> > +                             pl = i915_sched_lookup_priolist(engine, prio);
> > +                     }
> > +                     GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
> > +
> > +                     list_add(&rq->sched.link, pl);
> > +             } else {
> > +                     if (__i915_request_has_started(rq))
> > +                             rq->sched.attr.priority |= ACTIVE_PRIORITY;
> >   
> > -             list_add(&rq->sched.link, pl);
> > +                     rq->engine = owner;
> > +                     owner->submit_request(rq);
> > +             }
> 
> What's happening here? Please put a comment in.

We're unwinding the incomplete request. What's confusing? Previously we
open-coded the resubmit but obviously can't do that across a veng if we
want to be able to support switching across siblings.

> > +     for (rb = rb_first_cached(&execlists->virtual); rb; ) {
> > +             struct virtual_engine *ve =
> > +                     rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
> > +             struct i915_request *rq = READ_ONCE(ve->request);
> > +             struct intel_engine_cs *active;
> > +
> > +             if (!rq) { /* already taken by another sibling */
> > +                     rb_erase_cached(rb, &execlists->virtual);
> > +                     RB_CLEAR_NODE(rb);
> > +                     rb = rb_first_cached(&execlists->virtual);
> > +                     continue;
> > +             }
> 
> Probably a good place to comment how each physical engine sees all veng 
> requests and needs to unlink if someone else dequeued.
> 
> This relies on setting ve->request to NULL propagating to all CPUs as
> soon as it is cleared, I think. Because all tasklets will be under
> different engine->timeline.lock, and the clear is under the
> VE->timeline.lock, but this isn't.
> 
> If one CPU does not see the clear, it will skip removing the entry from 
> the rbtree. But then it will take the VE->timeline.lock a bit further
> down and fix up. Okay, I think.

Exactly. Optimistic reads under READ_ONCE; and restart if locked
confirmation fails.

> > +
> > +             active = READ_ONCE(ve->context.active);
> > +             if (active && active != engine) {
> > +                     rb = rb_next(rb);
> > +                     continue;
> > +             }
> 
> What's happening here (a comment please)?

This needs a comment? I thought this would be clear; if the context is
currently still residing on another engine, we can't modify it as it will be
overwritten by the context save.
> 
> > +
> > +             break;
> > +     }
> > +
> >       if (last) {
> >               /*
> >                * Don't resubmit or switch until all outstanding
> > @@ -718,7 +834,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >               if (!execlists_is_active(execlists, EXECLISTS_ACTIVE_HWACK))
> >                       return;
> >   
> > -             if (need_preempt(engine, last)) {
> > +             if (need_preempt(engine, last, rb)) {
> >                       inject_preempt_context(engine);
> >                       return;
> >               }
> > @@ -758,6 +874,72 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
> >               last->tail = last->wa_tail;
> >       }
> >   
> > +     while (rb) { /* XXX virtual is always taking precedence */
> > +             struct virtual_engine *ve =
> > +                     rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
> > +             struct i915_request *rq;
> > +
> > +             spin_lock(&ve->base.timeline.lock);
> 
> This is under the physical engine timeline lock, so isn't a nested
> locking annotation needed?

Hence why ve and engine have different lock subclasses.
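i.e. the TIMELINE_VIRTUAL subclass added earlier in the patch; the
annotation amounts to something like (sketch of where it is applied):

        /* when initialising the veng timeline */
        lockdep_set_subclass(&ve->base.timeline.lock, TIMELINE_VIRTUAL);

so lockdep can tell the two locks apart when one nests inside the other.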

> > +
> > +             rq = ve->request;
> > +             if (unlikely(!rq)) { /* lost the race to a sibling */
> > +                     spin_unlock(&ve->base.timeline.lock);
> > +                     rb_erase_cached(rb, &execlists->virtual);
> > +                     RB_CLEAR_NODE(rb);
> > +                     rb = rb_first_cached(&execlists->virtual);
> > +                     continue;
> > +             }
> > +
> > +             if (rq_prio(rq) >= queue_prio(execlists)) {
> > +                     if (last && !can_merge_rq(last, rq)) {
> > +                             spin_unlock(&ve->base.timeline.lock);
> > +                             return; /* leave this rq for another engine */
> > +                     }
> > +
> > +                     GEM_BUG_ON(rq->engine != &ve->base);
> > +                     ve->request = NULL;
> > +                     ve->base.execlists.queue_priority_hint = INT_MIN;
> 
> Why set to INT_MIN? Can't there be queued requests after this one?

No. We can reduce the overhead of veng by allowing multiple requests in
the pipeline, but it comes at the cost of load spreading and poor
utilisation with multiple clients. So we stick with the late greedy
scheme, and only decide where to put each request at the point it is
ready to run.

> > +             if (!RB_EMPTY_NODE(node))
> > +                     rb_erase_cached(node, &sibling->execlists.virtual);
> 
> There can only be one queued request?

Yes, Highlander.

> > +
> > +             spin_unlock_irq(&sibling->timeline.lock);
> > +     }
> > +     GEM_BUG_ON(__tasklet_is_scheduled(&ve->base.execlists.tasklet));
> 
> Why would this not fire since virtual_engine_free can get called from 
> set_param at any time?
> 
> > +
> > +     if (ve->context.state)
> > +             __execlists_context_fini(&ve->context);
> 
> And here why it can't be in use?

Because we take a ref while active.

> > +static void virtual_engine_initial_hint(struct virtual_engine *ve)
> > +{
> > +     int swp;
> > +
> > +     /*
> > +      * Pick a random sibling on starting to help spread the load around.
> > +      *
> > +      * New contexts are typically created with exactly the same order
> > +      * of siblings, and often started in batches. Due to the way we iterate
> > +      * the array of sibling when submitting requests, sibling[0] is
> > +      * prioritised for dequeuing. If we make sure that sibling[0] is fairly
> > +      * randomised across the system, we also help spread the load by the
> > +      * first engine we inspect being different each time.
> > +      *
> > +      * NB This does not force us to execute on this engine, it will just
> > +      * typically be the first we inspect for submission.
> > +      */
> > +     swp = prandom_u32_max(ve->count);
> > +     if (!swp)
> > +             return;
> 
> Was random better than round robin? Although yeah, it is local to each 
> engine map so global rr or random pick would be a more complicated 
> implementation best left for later if needed.

It makes no difference at all. It just looks cool, and I felt the
comments about there being an implicit bias based on sibling index were
useful, but overall that bias is lost in the noise.
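
(For reference, the trimmed remainder of the function is just
randomising slot 0, roughly:

        swap(ve->siblings[swp], ve->siblings[0]);

so the first sibling we inspect differs from one veng to the next.)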

> > +     local_irq_disable();
> > +     for (n = 0; READ_ONCE(ve->request) && n < ve->count; n++) {
> > +             struct intel_engine_cs *sibling = ve->siblings[n];
> > +             struct ve_node * const node = &ve->nodes[sibling->id];
> > +             struct rb_node **parent, *rb;
> > +             bool first;
> > +
> > +             spin_lock(&sibling->timeline.lock);
> > +
> > +             if (!RB_EMPTY_NODE(&node->rb)) {
> > +                     first = rb_first_cached(&sibling->execlists.virtual) == &node->rb;
> > +                     if (prio == node->prio || (prio > node->prio && first))
> > +                             goto submit_engine;
> > +
> > +                     rb_erase_cached(&node->rb, &sibling->execlists.virtual);
> > +             }
> 
> What does this block do?

Try to avoid rebalancing the tree if we can reuse the slot from last
time as its engine hasn't seen that we've already run the previous
request to completion.
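
That explanation could go in almost verbatim as the comment, e.g.:

        /*
         * Reuse our slot in the tree if the priority is unchanged (or
         * we remain first), avoiding a rebalance when the engine merely
         * hasn't noticed yet that the previous request completed.
         */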

> > +
> > +             rb = NULL;
> > +             first = true;
> > +             parent = &sibling->execlists.virtual.rb_root.rb_node;
> > +             while (*parent) {
> > +                     struct ve_node *other;
> > +
> > +                     rb = *parent;
> > +                     other = rb_entry(rb, typeof(*other), rb);
> > +                     if (prio > other->prio) {
> > +                             parent = &rb->rb_left;
> > +                     } else {
> > +                             parent = &rb->rb_right;
> > +                             first = false;
> > +                     }
> > +             }
> > +
> > +             rb_link_node(&node->rb, rb, parent);
> > +             rb_insert_color_cached(&node->rb,
> > +                                    &sibling->execlists.virtual,
> > +                                    first);
> > +
> > +submit_engine:
> > +             GEM_BUG_ON(RB_EMPTY_NODE(&node->rb));
> > +             node->prio = prio;
> > +             if (first && prio > sibling->execlists.queue_priority_hint) {
> > +                     sibling->execlists.queue_priority_hint = prio;
> > +                     tasklet_hi_schedule(&sibling->execlists.tasklet);
> > +             }
> > +
> > +             spin_unlock(&sibling->timeline.lock);
> > +     }
> > +     local_irq_enable();
> > +}
> > +
> > +static void virtual_submit_request(struct i915_request *request)
> > +{
> > +     struct virtual_engine *ve = to_virtual_engine(request->engine);
> > +
> > +     GEM_BUG_ON(ve->base.submit_request != virtual_submit_request);
> > +
> > +     GEM_BUG_ON(ve->request);
> > +     ve->base.execlists.queue_priority_hint = rq_prio(request);
> > +     WRITE_ONCE(ve->request, request);
> > +
> > +     tasklet_schedule(&ve->base.execlists.tasklet);
> 
> Not tasklet_hi_schedule like we otherwise do?

I didn't feel like this was as high priority. But it will prevent the
interrupt from marking it as high priority, so probably not wise after
all.

> > +     for (n = 0; n < count; n++) {
> > +             struct intel_engine_cs *sibling = siblings[n];
> > +
> > +             GEM_BUG_ON(!is_power_of_2(sibling->mask));
> > +             if (sibling->mask & ve->base.mask)
> > +                     continue;
> 
> Eliminate duplicates? Need or want?

Eliminating duplicates because it has no impact on load balancing
decisions and just adds to the arrays.

Having 2 vcs0 and 1 vcs1 does not make selecting vcs0 twice as likely.
Which engine is used is always on a first come, first served basis
(discounting the same hysteresis for pinned context images).

> > +
> > +             if (sibling->execlists.tasklet.func != execlists_submission_tasklet) {
> > +                     err = -ENODEV;
> > +                     goto err_put;
> 
> Is this intended to prevent making virtual engines from virtual engines 
> or more? Would it make sense to put an explicit is_virtual check for 
> clarity? (Since in the extension processing it doesn't check - or maybe 
> it would be good to check there early?)

It really depends on the lock subclass.

This is the extension processing. This is where we validate that the
combination of engines chosen makes sense...

> > +             }
> > +
> > +             GEM_BUG_ON(RB_EMPTY_NODE(&ve->nodes[sibling->id].rb));
> > +             RB_CLEAR_NODE(&ve->nodes[sibling->id].rb);
> > +
> > +             ve->siblings[ve->count++] = sibling;
> > +             ve->base.mask |= sibling->mask;
> > +
> 
> /* Allow only same engine class. */
> 
> > +             if (ve->base.class != OTHER_CLASS) {
> > +                     if (ve->base.class != sibling->class) {
> > +                             err = -EINVAL;
> > +                             goto err_put;
> > +                     }
> > +                     continue;
> > +             }
> > +
> > +             ve->base.class = sibling->class;
> > +             snprintf(ve->base.name, sizeof(ve->base.name),
> > +                      "v%dx%d", ve->base.class, count);
> 
> Do we want to go for unique names? Like, instead of count, have a unique
> monotonically increasing counter at the end? Or maybe 
> virt<unique>:<class>:<sibling-count>? Might need increasing the engine 
> name buffer.

Not really, as at the moment, the veng name only escapes via the
selftests. Especially not a meaningless uuid, when you have the ctx
name/pid around.

We could add it to debugfs/contexts_info, but there it should be clear
each is limited to a context.

So I haven't a clear use for a better name, and propose to not worry
until we have something to disambiguate.

> > +             ve->base.context_size = sibling->context_size;
> > +
> > +             ve->base.emit_bb_start = sibling->emit_bb_start;
> > +             ve->base.emit_flush = sibling->emit_flush;
> > +             ve->base.emit_init_breadcrumb = sibling->emit_init_breadcrumb;
> > +             ve->base.emit_fini_breadcrumb = sibling->emit_fini_breadcrumb;
> > +             ve->base.emit_fini_breadcrumb_dw =
> > +                     sibling->emit_fini_breadcrumb_dw;
> > +     }
> > +
> > +     /* gracefully replace a degenerate virtual engine */
> > +     if (is_power_of_2(ve->base.mask)) {
> > +             struct intel_engine_cs *actual = ve->siblings[0];
> > +             virtual_engine_free(&ve->kref);
> > +             return actual;
> > +     }
> 
> Why the term degenerate?

"In mathematics, a degenerate case is a limiting case in which an element
of a class of objects is qualitatively different from the rest of the
class and hence belongs to another, usually simpler, class."

The question though is whether such a replacement is truly transparent.
And if it isn't, then we can't. (It would be a pity, as veng adds
substantially to the overhead in request processing.)

Per-ctx-engine settings (such as sseu) probably should allow
differentiation between veng(rcs0) and rcs0. If that is desired, I think
it would be simpler to allow each instance of rcs0 in the engines[] to
have its own logical state (struct intel_context). Hmm.

> Also, is it possible at this stage? Higher level code will avoid 
> creating a veng with only one engine.

We have more complete checks here.

> > +struct i915_context_engines_load_balance {
> > +     struct i915_user_extension base;
> > +
> > +     __u16 engine_index;
> 
> Missing doc here.
> 
> Any need for mbz16 or could make the index u32?

You want how many engines! per veng! Nightmare scaling. And almost
certainly needs numa-awareness, like a true scheduler.

I was thinking the spare bits could come in handy later, and u8 requires
more mbz slots.
-Chris

* Re: [PATCH 11/13] drm/i915: Extend execution fence to support a callback
  2019-03-11 13:09   ` Tvrtko Ursulin
@ 2019-03-11 14:22     ` Chris Wilson
  0 siblings, 0 replies; 58+ messages in thread
From: Chris Wilson @ 2019-03-11 14:22 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-03-11 13:09:11)
> 
> 
> On 08/03/2019 14:12, Chris Wilson wrote:
> > +int
> > +i915_request_await_execution(struct i915_request *rq,
> > +                          struct dma_fence *fence,
> > +                          void (*hook)(struct i915_request *rq,
> > +                                       struct dma_fence *signal))
> > +{
> > +     struct dma_fence **child = &fence;
> > +     unsigned int nchild = 1;
> > +     int ret;
> > +
> > +     if (dma_fence_is_array(fence)) {
> > +             struct dma_fence_array *array = to_dma_fence_array(fence);
> > +
> > +             /* XXX Error for signal-on-any fence arrays */
> 
> Unfinished?

Patches to identify signal-on-any fence arrays are still in the aether.
It doesn't really matter since they can't reach us via sync-file yet.
But they exist in the dma-fence landscape, so I keep on copying this
warning around.
-Chris

* Re: [PATCH 12/13] drm/i915/execlists: Virtual engine bonding
  2019-03-11 13:38   ` Tvrtko Ursulin
@ 2019-03-11 14:30     ` Chris Wilson
  0 siblings, 0 replies; 58+ messages in thread
From: Chris Wilson @ 2019-03-11 14:30 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-03-11 13:38:52)
> 
> On 08/03/2019 14:12, Chris Wilson wrote:
> > @@ -3191,12 +3198,30 @@ static void virtual_submission_tasklet(unsigned long data)
> >               return;
> >   
> >       local_irq_disable();
> > +
> > +     mask = 0;
> > +     spin_lock(&ve->base.timeline.lock);
> > +     if (ve->request)
> > +             mask = ve->request->execution_mask;
> > +     spin_unlock(&ve->base.timeline.lock);
> > +
> >       for (n = 0; READ_ONCE(ve->request) && n < ve->count; n++) {
> >               struct intel_engine_cs *sibling = ve->siblings[n];
> >               struct ve_node * const node = &ve->nodes[sibling->id];
> >               struct rb_node **parent, *rb;
> >               bool first;
> > 
> 
> /* Check if request can be executed on this sibling. Because it cannot 
> be known in advance (at insert to queue time).. ? */

This is insert-to-queue. You mean i915_request_add()? You definitely
don't know which engines all of your partners will be running on at that
point.

> > +int intel_virtual_engine_attach_bond(struct intel_engine_cs *engine,
> > +                                  struct intel_engine_cs *master,
> > +                                  unsigned long mask)
> > +{
> > +     struct virtual_engine *ve = to_virtual_engine(engine);
> > +     struct ve_bond *bond;
> > +
> > +     if (mask >> ve->count)
> > +             return -EINVAL;
> > +
> > +     mask = virtual_execution_mask(ve, mask);
> > +     if (!mask)
> > +             return -EINVAL;
> > +
> > +     bond = virtual_find_bond(ve, master);
> > +     if (bond) {
> > +             bond->sibling_mask |= mask;
> > +             return 0;
> > +     }
> > +
> > +     bond = krealloc(ve->bonds,
> > +                     sizeof(*bond) * (ve->nbond + 1),
> > +                     GFP_KERNEL);
> > +     if (!bond)
> > +             return -ENOMEM;
> > +
> > +     bond[ve->nbond].master = master;
> > +     bond[ve->nbond].sibling_mask = mask;
> > +
> > +     ve->bonds = bond;
> > +     ve->nbond++;
> > +
> > +     return 0;
> > +}
> > +
> >   void intel_virtual_engine_destroy(struct intel_engine_cs *engine)
> >   {
> 
> I think there should be a hunk in here to free ve->bonds.

Might be sensible indeed.
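
i.e. something like this (sketch):

	/* in virtual_engine_free(), alongside the other teardown */
	kfree(ve->bonds);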

> > +     err = 0;
> > +     rq[0] = ERR_PTR(-ENOMEM);
> > +     for_each_engine(master, i915, id) {
> > +             struct i915_sw_fence fence;
> > +
> > +             if (master->class == class)
> > +                     continue;
> > +
> > +             rq[0] = i915_request_alloc(master, ctx);
> > +             if (IS_ERR(rq[0])) {
> > +                     err = PTR_ERR(rq[0]);
> > +                     goto out;
> > +             }
> > +
> > +             if (flags & BOND_SCHEDULE)
> > +                     onstack_fence_init(&fence);
> > +
> > +             i915_request_get(rq[0]);
> > +             i915_request_add(rq[0]);
> > +
> > +             for (n = 0; n < nsibling; n++) {
> > +                     struct intel_engine_cs *engine;
> > +
> > +                     engine = intel_execlists_create_virtual(ctx,
> > +                                                             siblings,
> > +                                                             nsibling);
> > +                     if (IS_ERR(engine)) {
> > +                             err = PTR_ERR(engine);
> > +                             goto out;
> > +                     }
> > +
> > +                     err = intel_virtual_engine_attach_bond(engine,
> > +                                                            master,
> > +                                                            BIT(n));
> > +                     if (err) {
> > +                             intel_virtual_engine_destroy(engine);
> > +                             goto out;
> > +                     }
> > +
> > +                     rq[n + 1] = i915_request_alloc(engine, ctx);
> > +                     if (IS_ERR(rq[n + 1])) {
> > +                             err = PTR_ERR(rq[n + 1]);
> > +                             intel_virtual_engine_destroy(engine);
> > +                             goto out;
> > +                     }
> > +                     i915_request_get(rq[n + 1]);
> > +
> > +                     err = i915_request_await_execution(rq[n + 1],
> > +                                                        &rq[0]->fence,
> > +                                                        engine->bond_execute);
> > +                     i915_request_add(rq[n + 1]);
> > +                     intel_virtual_engine_destroy(engine);
> > +                     if (err < 0)
> > +                             goto out;
> > +             }
> > +             rq[n + 1] = ERR_PTR(-EINVAL);
> > +
> > +             if (flags & BOND_SCHEDULE)
> > +                     onstack_fence_fini(&fence);
> 
> The idea of this fence is to delay rq[0]? But I don't see it passed in 
> anywhere? I wanted to suggest checking, before onstack_fence_fini, that
> rq[0], or any other request, hasn't been submitted/completed yet.

Aye. It's meant to be added just after onstack_fence_init().
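
Something like this (untested sketch), so the master request cannot be
submitted until every bonded pair has been queued:

	if (flags & BOND_SCHEDULE) {
		onstack_fence_init(&fence);
		/* Hold back submission of rq[0] until we release it */
		err = i915_sw_fence_await_sw_fence_gfp(&rq[0]->submit,
						       &fence,
						       GFP_KERNEL);
		if (err < 0)
			goto out;
	}

with the matching onstack_fence_fini() after the sibling loop then
letting rq[0] through.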

> > +/*
> > + * i915_context_engines_bond:
> > + *
> > + */
> > +struct i915_context_engines_bond {
> > +     struct i915_user_extension base;
> > +
> > +     __u16 engine_index;
> > +     __u16 mbz;
> > +
> > +     __u16 master_class;
> > +     __u16 master_instance;
> > +
> > +     __u64 sibling_mask;
> > +     __u64 flags; /* all undefined flags must be zero */
> > +};
> 
> Need some api doc for it all.

I knew there was a patch that I kept skipping.

> We are not in danger of any compiler-added padding due to not leaving
> any holes, right? So no need to define this as packed?

This is naturally aligned to u64, and that's the largest type used, so
no holes and no padding, and no silly games with variable length arrays.
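
One could even assert it at compile time, as a sketch (assuming the
16-byte extension header and the layout above):

	BUILD_BUG_ON(sizeof(struct i915_context_engines_bond) !=
		     sizeof(struct i915_user_extension) + /* 16 */
		     4 * sizeof(__u16) +                  /*  8 */
		     2 * sizeof(__u64));                  /* 16 */
	BUILD_BUG_ON(offsetof(struct i915_context_engines_bond,
			      sibling_mask) & 7);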
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 08/13] drm/i915: Allow a context to define its set of engines
  2019-03-11  9:45         ` Chris Wilson
  2019-03-11 10:12           ` Tvrtko Ursulin
@ 2019-03-11 14:45           ` Chris Wilson
  2019-03-11 16:16             ` Tvrtko Ursulin
  1 sibling, 1 reply; 58+ messages in thread
From: Chris Wilson @ 2019-03-11 14:45 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Chris Wilson (2019-03-11 09:45:17)
> Quoting Tvrtko Ursulin (2019-03-11 09:23:44)
> > 
> > On 08/03/2019 16:47, Chris Wilson wrote:
> > > Quoting Tvrtko Ursulin (2019-03-08 16:27:22)
> > >>
> > >> On 08/03/2019 14:12, Chris Wilson wrote:
> > >>> +static int
> > >>> +set_engines(struct i915_gem_context *ctx,
> > >>> +         const struct drm_i915_gem_context_param *args)
> > >>> +{
> > >>> +     struct i915_context_param_engines __user *user;
> > >>> +     struct set_engines set = { .ctx = ctx };
> > >>> +     u64 size, extensions;
> > >>> +     unsigned int n;
> > >>> +     int err;
> > >>> +
> > >>> +     user = u64_to_user_ptr(args->value);
> > >>> +     size = args->size;
> > >>> +     if (!size)
> > >>> +             goto out;
> > >>
> > >> This prevents a hypothetical extension with empty map data.
> > > 
> > > No... This is required for resetting and I think that's covered in what
> > > little docs there are. It's the set.nengine==0 test later
> > > that you mean to object to. But we can't do that as that's how we
> > > differentiate between modes at the moment.
> > > 
> > > We could use ctx->nengine = 0 and ctx->engines = ZERO_PTR.
> > 
> > size == sizeof(struct i915_context_param_engines) could mean reset - 
> > meaning no map array provided.
> 
> Nah, size=sizeof() => 0 [], size=0 => default map.
>  
> > Meaning one could reset the map and still pass in extensions.
> 
> I missed that you were pointing out we didn't follow the extensions on
> resetting.
> 
> I'm not sure if that makes sense tbh. The extensions are written around
> the concept of applying to the new engines[], and if the user has
> explicitly removed the engines[] (distinct from defining a zero array),
> what extensions can apply? One hopes they end up -EINVAL. As they should
> -EINVAL, I guess it is no harm done to apply them.

Ok, the problem with the size=0 case is that quite obviously there is no
extension chain to follow. (That was silly of me.)

I think
	.size = 0 => reset to default
and
	.size = sizeof(arg) => 0 engines ([])
	.size = sizeof(arg) + N*sizeof(*class_instance) => N engines ([N])
make the most logical sense, which does imply that if you want to apply
extension options to ctx->engines[] you need to specify them.

And that also implies that if we have an extension that may make sense
to the default setup, then either we say creating the engine[] map is
compulsory, or we don't use a set-engines extension for that.
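
In code, the size handling would then look something like this (sketch;
assuming the flexible array member is called class_instance):

	if (!args->size) {
		/* .size = 0 => reset to the legacy ring mask */
		set.engines = NULL;
		set.nengine = 0;
	} else {
		if (args->size < sizeof(*user) ||
		    (args->size - sizeof(*user)) %
		    sizeof(*user->class_instance))
			return -EINVAL;

		/* sizeof(*user) + N * sizeof(class_instance) => [N] */
		set.nengine = (args->size - sizeof(*user)) /
			      sizeof(*user->class_instance);
	}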
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 08/13] drm/i915: Allow a context to define its set of engines
  2019-03-11 14:45           ` Chris Wilson
@ 2019-03-11 16:16             ` Tvrtko Ursulin
  2019-03-11 16:22               ` Chris Wilson
  0 siblings, 1 reply; 58+ messages in thread
From: Tvrtko Ursulin @ 2019-03-11 16:16 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 11/03/2019 14:45, Chris Wilson wrote:
> Quoting Chris Wilson (2019-03-11 09:45:17)
>> Quoting Tvrtko Ursulin (2019-03-11 09:23:44)
>>>
>>> On 08/03/2019 16:47, Chris Wilson wrote:
>>>> Quoting Tvrtko Ursulin (2019-03-08 16:27:22)
>>>>>
>>>>> On 08/03/2019 14:12, Chris Wilson wrote:
>>>>>> +static int
>>>>>> +set_engines(struct i915_gem_context *ctx,
>>>>>> +         const struct drm_i915_gem_context_param *args)
>>>>>> +{
>>>>>> +     struct i915_context_param_engines __user *user;
>>>>>> +     struct set_engines set = { .ctx = ctx };
>>>>>> +     u64 size, extensions;
>>>>>> +     unsigned int n;
>>>>>> +     int err;
>>>>>> +
>>>>>> +     user = u64_to_user_ptr(args->value);
>>>>>> +     size = args->size;
>>>>>> +     if (!size)
>>>>>> +             goto out;
>>>>>
>>>>> This prevents a hypothetical extension with empty map data.
>>>>
>>>> No... This is required for resetting and I think that's covered in what
>>>> little docs there are. It's the set.nengine==0 test later
>>>> that you mean to object to. But we can't do that as that's how we
>>>> differentiate between modes at the moment.
>>>>
>>>> We could use ctx->nengine = 0 and ctx->engines = ZERO_PTR.
>>>
>>> size == sizeof(struct i915_context_param_engines) could mean reset -
>>> meaning no map array provided.
>>
>> Nah, size=sizeof() => 0 [], size=0 => default map.
>>   
>>> Meaning one could reset the map and still pass in extensions.
>>
>> I missed that you were pointing out we didn't follow the extensions on
>> resetting.
>>
>> I'm not sure if that makes sense tbh. The extensions are written around
>> the concept of applying to the new engines[], and if the user has
>> explicitly removed the engines[] (distinct from defining a zero array),
>> what extensions can apply? One hopes they end up -EINVAL. As they should
>> -EINVAL, I guess it is no harm done to apply them.
> 
> Ok, the problem with the size=0 case is that quite obviously there is no
> extension chain to follow. (That was silly of me.)
> 
> I think
> 	.size = 0 => reset to default
> and
> 	.size = sizeof(arg) => 0 engines ([])

What is the difference between these two?

> 	.size = sizeof(arg) + N*sizeof(*class_instance) => N engines ([N])
> make the most logical sense, which does imply that if you want to apply
> extension options to ctx->engines[] you need to specify them.
> 
> And that also implies that if we have an extension that may make sense
> to the default setup, then either we say creating the engine[] map is
> compulsory, or we don't use a set-engines extension for that.

Hm.. load_balance is an extension of set_engines. If we wanted to go crazy 
we could also support it directly from set_param(param=LOAD_BALANCE)?

I am not saying it makes sense, just thinking out loud.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 08/13] drm/i915: Allow a context to define its set of engines
  2019-03-11 16:16             ` Tvrtko Ursulin
@ 2019-03-11 16:22               ` Chris Wilson
  2019-03-11 16:34                 ` Tvrtko Ursulin
  0 siblings, 1 reply; 58+ messages in thread
From: Chris Wilson @ 2019-03-11 16:22 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-03-11 16:16:27)
> 
> On 11/03/2019 14:45, Chris Wilson wrote:
> > Quoting Chris Wilson (2019-03-11 09:45:17)
> >> Quoting Tvrtko Ursulin (2019-03-11 09:23:44)
> >>>
> >>> On 08/03/2019 16:47, Chris Wilson wrote:
> >>>> Quoting Tvrtko Ursulin (2019-03-08 16:27:22)
> >>>>>
> >>>>> On 08/03/2019 14:12, Chris Wilson wrote:
> >>>>>> +static int
> >>>>>> +set_engines(struct i915_gem_context *ctx,
> >>>>>> +         const struct drm_i915_gem_context_param *args)
> >>>>>> +{
> >>>>>> +     struct i915_context_param_engines __user *user;
> >>>>>> +     struct set_engines set = { .ctx = ctx };
> >>>>>> +     u64 size, extensions;
> >>>>>> +     unsigned int n;
> >>>>>> +     int err;
> >>>>>> +
> >>>>>> +     user = u64_to_user_ptr(args->value);
> >>>>>> +     size = args->size;
> >>>>>> +     if (!size)
> >>>>>> +             goto out;
> >>>>>
> >>>>> This prevents a hypothetical extension with empty map data.
> >>>>
> >>>> No... This is required for resetting and I think that's covered in what
> >>>> little docs there are. It's the set.nengine==0 test later
> >>>> that you mean to object to. But we can't do that as that's how we
> >>>> differentiate between modes at the moment.
> >>>>
> >>>> We could use ctx->nengine = 0 and ctx->engines = ZERO_PTR.
> >>>
> >>> size == sizeof(struct i915_context_param_engines) could mean reset -
> >>> meaning no map array provided.
> >>
> >> Nah, size=sizeof() => 0 [], size=0 => default map.
> >>   
> >>> Meaning one could reset the map and still pass in extensions.
> >>
> >> I missed that you were pointing out we didn't follow the extensions on
> >> resetting.
> >>
> >> I'm not sure if that makes sense tbh. The extensions are written around
> >> the concept of applying to the new engines[], and if the user has
> >> explicitly removed the engines[] (distinct from defining a zero array),
> >> what extensions can apply? One hopes they end up -EINVAL. As they should
> >> -EINVAL, I guess it is no harm done to apply them.
> > 
> > Ok, the problem with the size=0 case is that quite obviously there is no
> > extension chain to follow. (That was silly of me.)
> > 
> > I think
> >       .size = 0 => reset to default
> > and
> >       .size = sizeof(arg) => 0 engines ([])
> 
> What is the difference between these two?

One uses the legacy ring mask, and the other is a context with no
engines.

> >       .size = sizeof(arg) + N*sizeof(*class_instance) => N engines ([N])
> > make the most logical sense, which does imply that if you want to apply
> > extension options to ctx->engines[] you need to specify them.
> > 
> > And that also implies that if we have an extension that may make sense
> > to the default setup, then either we say creating the engine[] map is
> > compulsory, or we don't use a set-engines extension for that.
> 
> Hm.. load_balance is an extension of set_engines. If we wanted to go crazy 
> we could also support it directly from set_param(param=LOAD_BALANCE)?

My thinking was that it only made sense as a constructor property. (Back
before we hooked set-parameter for constructor properties). I still like
how set-engines + extensions has turned out. (But haven't coded up the
alternatives, so who knows.)
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 08/13] drm/i915: Allow a context to define its set of engines
  2019-03-11 16:22               ` Chris Wilson
@ 2019-03-11 16:34                 ` Tvrtko Ursulin
  2019-03-11 16:52                   ` Chris Wilson
  0 siblings, 1 reply; 58+ messages in thread
From: Tvrtko Ursulin @ 2019-03-11 16:34 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 11/03/2019 16:22, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-03-11 16:16:27)
>>
>> On 11/03/2019 14:45, Chris Wilson wrote:
>>> Quoting Chris Wilson (2019-03-11 09:45:17)
>>>> Quoting Tvrtko Ursulin (2019-03-11 09:23:44)
>>>>>
>>>>> On 08/03/2019 16:47, Chris Wilson wrote:
>>>>>> Quoting Tvrtko Ursulin (2019-03-08 16:27:22)
>>>>>>>
>>>>>>> On 08/03/2019 14:12, Chris Wilson wrote:
>>>>>>>> +static int
>>>>>>>> +set_engines(struct i915_gem_context *ctx,
>>>>>>>> +         const struct drm_i915_gem_context_param *args)
>>>>>>>> +{
>>>>>>>> +     struct i915_context_param_engines __user *user;
>>>>>>>> +     struct set_engines set = { .ctx = ctx };
>>>>>>>> +     u64 size, extensions;
>>>>>>>> +     unsigned int n;
>>>>>>>> +     int err;
>>>>>>>> +
>>>>>>>> +     user = u64_to_user_ptr(args->value);
>>>>>>>> +     size = args->size;
>>>>>>>> +     if (!size)
>>>>>>>> +             goto out;
>>>>>>>
>>>>>>> This prevents a hypothetical extension with empty map data.
>>>>>>
>>>>>> No... This is required for resetting and I think that's covered in what
>>>>>> little docs there are. It's the set.nengine==0 test later
>>>>>> that you mean to object to. But we can't do that as that's how we
>>>>>> differentiate between modes at the moment.
>>>>>>
>>>>>> We could use ctx->nengine = 0 and ctx->engines = ZERO_PTR.
>>>>>
>>>>> size == sizeof(struct i915_context_param_engines) could mean reset -
>>>>> meaning no map array provided.
>>>>
>>>> Nah, size=sizeof() => 0 [], size=0 => default map.
>>>>    
>>>>> Meaning one could reset the map and still pass in extensions.
>>>>
>>>> I missed that you were pointing out we didn't follow the extensions on
>>>> resetting.
>>>>
>>>> I'm not sure if that makes sense tbh. The extensions are written around
>>>> the concept of applying to the new engines[], and if the user has
>>>> explicitly removed the engines[] (distinct from defining a zero array),
>>>> what extensions can apply? One hopes they end up -EINVAL. As they should
>>>> -EINVAL, I guess it is no harm done to apply them.
>>>
>>> Ok, the problem with the size=0 case is that quite obviously there is no
>>> extension chain to follow. (That was silly of me.)
>>>
>>> I think
>>>        .size = 0 => reset to default
>>> and
>>>        .size = sizeof(arg) => 0 engines ([])
>>
>> What is the difference between these two?
> 
> One uses the legacy ring mask, and the other is a context with no
> engines.

The latter is hypothetical, no? Current patches don't have this ability 
AFAIR. Why would we want this?

>>>        .size = sizeof(arg) + N*sizeof(*class_instance) => N engines ([N])
>>> make the most logical sense, which does imply that if you want to apply
>>> extension options to ctx->engines[] you need to specify them.
>>>
>>> And that also implies that if we have an extension that may make sense
>>> to the default setup, then either we say creating the engine[] map is
>>> compulsory, or we don't use a set-engines extension for that.
>>
>> Hm.. load_balance is an extension of set_engines. If we wanted to go crazy
>> we could also support it directly from set_param(param=LOAD_BALANCE)?
> 
> My thinking was that it only made sense as a constructor property. (Back
> before we hooked set-parameter for constructor properties). I still like
> how set-engines + extensions has turned out. (But haven't coded up the
> alternatives, so who knows.)

Everyone is fine with it, so let's not change it much. :)

Regards,

Tvrtko

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 08/13] drm/i915: Allow a context to define its set of engines
  2019-03-11 16:34                 ` Tvrtko Ursulin
@ 2019-03-11 16:52                   ` Chris Wilson
  0 siblings, 0 replies; 58+ messages in thread
From: Chris Wilson @ 2019-03-11 16:52 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-03-11 16:34:41)
> 
> On 11/03/2019 16:22, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2019-03-11 16:16:27)
> >>
> >> On 11/03/2019 14:45, Chris Wilson wrote:
> >>> Quoting Chris Wilson (2019-03-11 09:45:17)
> >>>> Quoting Tvrtko Ursulin (2019-03-11 09:23:44)
> >>>>>
> >>>>> On 08/03/2019 16:47, Chris Wilson wrote:
> >>>>>> Quoting Tvrtko Ursulin (2019-03-08 16:27:22)
> >>>>>>>
> >>>>>>> On 08/03/2019 14:12, Chris Wilson wrote:
> >>>>>>>> +static int
> >>>>>>>> +set_engines(struct i915_gem_context *ctx,
> >>>>>>>> +         const struct drm_i915_gem_context_param *args)
> >>>>>>>> +{
> >>>>>>>> +     struct i915_context_param_engines __user *user;
> >>>>>>>> +     struct set_engines set = { .ctx = ctx };
> >>>>>>>> +     u64 size, extensions;
> >>>>>>>> +     unsigned int n;
> >>>>>>>> +     int err;
> >>>>>>>> +
> >>>>>>>> +     user = u64_to_user_ptr(args->value);
> >>>>>>>> +     size = args->size;
> >>>>>>>> +     if (!size)
> >>>>>>>> +             goto out;
> >>>>>>>
> >>>>>>> This prevents a hypothetical extension with empty map data.
> >>>>>>
> >>>>>> No... This is required for resetting and I think that's covered in what
> >>>>>> little docs there are. It's the set.nengine==0 test later
> >>>>>> that you mean to object to. But we can't do that as that's how we
> >>>>>> differentiate between modes at the moment.
> >>>>>>
> >>>>>> We could use ctx->nengine = 0 and ctx->engines = ZERO_PTR.
> >>>>>
> >>>>> size == sizeof(struct i915_context_param_engines) could mean reset -
> >>>>> meaning no map array provided.
> >>>>
> >>>> Nah, size=sizeof() => 0 [], size=0 => default map.
> >>>>    
> >>>>> Meaning one could reset the map and still pass in extensions.
> >>>>
> >>>> I missed that you were pointing out we didn't follow the extensions on
> >>>> resetting.
> >>>>
> >>>> I'm not sure if that makes sense tbh. The extensions are written around
> >>>> the concept of applying to the new engines[], and if the user has
> >>>> explicitly removed the engines[] (distinct from defining a zero array),
> >>>> what extensions can apply? One hopes they end up -EINVAL. As they should
> >>>> -EINVAL, I guess it is no harm done to apply them.
> >>>
> >>> Ok, the problem with the size=0 case is that quite obviously there is no
> >>> extension chain to follow. (That was silly of me.)
> >>>
> >>> I think
> >>>        .size = 0 => reset to default
> >>> and
> >>>        .size = sizeof(arg) => 0 engines ([])
> >>
> >> What is the difference between these two?
> > 
> > One uses the legacy ring mask, and the other is a context with no
> > engines.
> 
> The latter is hypothetical, no? Current patches don't have this ability 
> AFAIR. Why would we want this?

They did the moment you asked for it. Do you not recall? :-p
-Chris
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 58+ messages in thread

* Re: [PATCH 10/13] drm/i915: Load balancing across a virtual engine
  2019-03-08 14:12 ` [PATCH 10/13] drm/i915: Load balancing across a virtual engine Chris Wilson
  2019-03-11 12:47   ` Tvrtko Ursulin
@ 2019-03-12  7:52   ` Tvrtko Ursulin
  2019-03-12  8:56     ` Chris Wilson
  1 sibling, 1 reply; 58+ messages in thread
From: Tvrtko Ursulin @ 2019-03-12  7:52 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 08/03/2019 14:12, Chris Wilson wrote:
> Having allowed the user to define a set of engines that they will want
> to only use, we go one step further and allow them to bind those engines
> into a single virtual instance. Submitting a batch to the virtual engine
> will then forward it to any one of the set so as to best distribute
> the load.  The virtual engine has a single timeline across all
> engines (it operates as a single queue), so it is not able to concurrently
> run batches across multiple engines by itself; that is left up to the user
> to submit multiple concurrent batches to multiple queues. Multiple users
> will be load balanced across the system.
> 
> The mechanism used for load balancing in this patch is a late greedy
> balancer. When a request is ready for execution, it is added to each
> engine's queue, and when an engine is ready for its next request it
> claims it from the virtual engine. The first engine to do so, wins, i.e.
> the request is executed at the earliest opportunity (idle moment) in the
> system.
> 
> As not all HW is created equal, the user is still able to skip the
> virtual engine and execute the batch on a specific engine, all within the
> same queue. It will then be executed in order on the correct engine,
> with execution on other virtual engines being moved away due to the load
> detection.
> 
> A couple of areas for potential improvement left!
> 
> - The virtual engine always takes priority over equal-priority tasks.
> Mostly broken up by applying FQ_CODEL rules for prioritising new clients,
> and hopefully the virtual and real engines are not then congested (i.e.
> all work is via virtual engines, or all work is to the real engine).
> 
> - We require the breadcrumb irq around every virtual engine request. For
> normal engines, we eliminate the need for the slow round trip via
> interrupt by using the submit fence and queueing in order. For virtual
> engines, we have to allow any job to transfer to a new ring, and cannot
> coalesce the submissions, so require the completion fence instead,
> forcing the persistent use of interrupts.
> 
> - We only drip feed single requests through each virtual engine and onto
> the physical engines, even if there was enough work to fill all ELSP,
> leaving small stalls with an idle CS event at the end of every request.
> Could we be greedy and fill both slots? Being lazy is virtuous for load
> distribution on less-than-full workloads though.
> 
> Other areas of improvement are more general, such as reducing lock
> contention, reducing dispatch overhead, looking at direct submission
> rather than bouncing around tasklets etc.
> 
> Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_gem.h            |   5 +
>   drivers/gpu/drm/i915/i915_gem_context.c    | 153 +++++-
>   drivers/gpu/drm/i915/i915_scheduler.c      |  17 +-
>   drivers/gpu/drm/i915/i915_timeline_types.h |   1 +
>   drivers/gpu/drm/i915/intel_engine_types.h  |   8 +
>   drivers/gpu/drm/i915/intel_lrc.c           | 521 ++++++++++++++++++++-
>   drivers/gpu/drm/i915/intel_lrc.h           |  11 +
>   drivers/gpu/drm/i915/selftests/intel_lrc.c | 165 +++++++
>   include/uapi/drm/i915_drm.h                |  30 ++
>   9 files changed, 895 insertions(+), 16 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem.h b/drivers/gpu/drm/i915/i915_gem.h
> index 74a2ddc1b52f..dbcea6e29d48 100644
> --- a/drivers/gpu/drm/i915/i915_gem.h
> +++ b/drivers/gpu/drm/i915/i915_gem.h
> @@ -91,4 +91,9 @@ static inline bool __tasklet_is_enabled(const struct tasklet_struct *t)
>   	return !atomic_read(&t->count);
>   }
>   
> +static inline bool __tasklet_is_scheduled(struct tasklet_struct *t)
> +{
> +	return test_bit(TASKLET_STATE_SCHED, &t->state);
> +}
> +
>   #endif /* __I915_GEM_H__ */
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index a581c01ffff1..13b79980f7f3 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -86,12 +86,16 @@
>    */
>   
>   #include <linux/log2.h>
> +#include <linux/nospec.h>
> +
>   #include <drm/i915_drm.h>
> +
>   #include "i915_drv.h"
>   #include "i915_globals.h"
>   #include "i915_trace.h"
>   #include "i915_user_extensions.h"
>   #include "intel_lrc_reg.h"
> +#include "intel_lrc.h"
>   #include "intel_workarounds.h"
>   
>   #define ALL_L3_SLICES(dev) (1 << NUM_L3_SLICES(dev)) - 1
> @@ -238,6 +242,20 @@ static void release_hw_id(struct i915_gem_context *ctx)
>   	mutex_unlock(&i915->contexts.mutex);
>   }
>   
> +static void free_engines(struct intel_engine_cs **engines, int count)
> +{
> +	int i;
> +
> +	if (!engines)
> +		return;
> +
> +	/* We own the veng we created; regular engines are ignored */
> +	for (i = 0; i < count; i++)
> +		intel_virtual_engine_destroy(engines[i]);
> +
> +	kfree(engines);
> +}
> +
>   static void i915_gem_context_free(struct i915_gem_context *ctx)
>   {
>   	struct intel_context *it, *n;
> @@ -248,8 +266,7 @@ static void i915_gem_context_free(struct i915_gem_context *ctx)
>   
>   	release_hw_id(ctx);
>   	i915_ppgtt_put(ctx->ppgtt);
> -
> -	kfree(ctx->engines);
> +	free_engines(ctx->engines, ctx->nengine);
>   
>   	rbtree_postorder_for_each_entry_safe(it, n, &ctx->hw_contexts, node)
>   		it->ops->destroy(it);
> @@ -1359,13 +1376,116 @@ static int set_sseu(struct i915_gem_context *ctx,
>   	return 0;
>   };
>   
> +static int check_user_mbz16(u16 __user *user)
> +{
> +	u16 mbz;
> +
> +	if (get_user(mbz, user))
> +		return -EFAULT;
> +
> +	return mbz ? -EINVAL : 0;
> +}
> +
> +static int check_user_mbz32(u32 __user *user)
> +{
> +	u32 mbz;
> +
> +	if (get_user(mbz, user))
> +		return -EFAULT;
> +
> +	return mbz ? -EINVAL : 0;
> +}
> +
> +static int check_user_mbz64(u64 __user *user)
> +{
> +	u64 mbz;
> +
> +	if (get_user(mbz, user))
> +		return -EFAULT;
> +
> +	return mbz ? -EINVAL : 0;
> +}
> +
>   struct set_engines {
>   	struct i915_gem_context *ctx;
>   	struct intel_engine_cs **engines;
>   	unsigned int nengine;
>   };
>   
> +static int
> +set_engines__load_balance(struct i915_user_extension __user *base, void *data)
> +{
> +	struct i915_context_engines_load_balance __user *ext =
> +		container_of_user(base, typeof(*ext), base);
> +	const struct set_engines *set = data;
> +	struct intel_engine_cs *ve;
> +	unsigned int n;
> +	u64 mask;
> +	u16 idx;
> +	int err;
> +
> +	if (!HAS_EXECLISTS(set->ctx->i915))
> +		return -ENODEV;
> +
> +	if (USES_GUC_SUBMISSION(set->ctx->i915))
> > +             return -ENODEV; /* not implemented yet */

Didn't it use to be that you were checking for the single-timeline flag 
somewhere around here? Now you allow a multi-timeline map with a virtual 
engine slot?

Regards,

Tvrtko

> +
> +	if (get_user(idx, &ext->engine_index))
> +		return -EFAULT;
> +
> +	if (idx >= set->nengine)
> +		return -EINVAL;
> +
> +	idx = array_index_nospec(idx, set->nengine);
> +	if (set->engines[idx])
> +		return -EEXIST;
> +
> +	err = check_user_mbz16(&ext->mbz16);
> +	if (err)
> +		return err;
> +
> +	err = check_user_mbz32(&ext->flags);
> +	if (err)
> +		return err;
> +
> +	for (n = 0; n < ARRAY_SIZE(ext->mbz64); n++) {
> +		err = check_user_mbz64(&ext->mbz64[n]);
> +		if (err)
> +			return err;
> +	}
> +
> +	if (get_user(mask, &ext->engines_mask))
> +		return -EFAULT;
> +
> +	mask &= GENMASK_ULL(set->nengine - 1, 0) & ~BIT_ULL(idx);
> +	if (!mask)
> +		return -EINVAL;
> +
> +	if (is_power_of_2(mask)) {
> +		ve = set->engines[__ffs64(mask)];
> +	} else {
> +		struct intel_engine_cs *stack[64];
> +		int bit;
> +
> +		n = 0;
> +		for_each_set_bit(bit, (unsigned long *)&mask, set->nengine)
> +			stack[n++] = set->engines[bit];
> +
> +		ve = intel_execlists_create_virtual(set->ctx, stack, n);
> +	}
> +	if (IS_ERR(ve))
> +		return PTR_ERR(ve);
> +
> +	if (cmpxchg(&set->engines[idx], NULL, ve)) {
> +		intel_virtual_engine_destroy(ve);
> +		return -EEXIST;
> +	}
> +
> +	return 0;
> +}
> +
>   static const i915_user_extension_fn set_engines__extensions[] = {
> +	[I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE] = set_engines__load_balance,
>   };
>   
>   static int
> @@ -1426,13 +1546,13 @@ set_engines(struct i915_gem_context *ctx,
>   					   ARRAY_SIZE(set_engines__extensions),
>   					   &set);
>   	if (err) {
> -		kfree(set.engines);
> +		free_engines(set.engines, set.nengine);
>   		return err;
>   	}
>   
>   out:
>   	mutex_lock(&ctx->i915->drm.struct_mutex);
> -	kfree(ctx->engines);
> +	free_engines(ctx->engines, ctx->nengine);
>   	ctx->engines = set.engines;
>   	ctx->nengine = set.nengine;
>   	mutex_unlock(&ctx->i915->drm.struct_mutex);
> @@ -1637,6 +1757,7 @@ static int clone_engines(struct i915_gem_context *dst,
>   			 struct i915_gem_context *src)
>   {
>   	struct intel_engine_cs **engines;
> +	int i;
>   
>   	engines = kmemdup(src->engines,
>   			  sizeof(*src->engines) * src->nengine,
> @@ -1644,6 +1765,30 @@ static int clone_engines(struct i915_gem_context *dst,
>   	if (!engines)
>   		return -ENOMEM;
>   
> +	/*
> +	 * Virtual engines are singletons; they can only exist
> +	 * inside a single context, because they embed their
> +	 * HW context... As each virtual context implies a single
> +	 * timeline (each engine can only dequeue a single request
> +	 * at any time), it would be surprising for two contexts
> +	 * to use the same engine. So let's create a copy of
> +	 * the virtual engine instead.
> +	 */
> +	for (i = 0; i < src->nengine; i++) {
> +		struct intel_engine_cs *engine = engines[i];
> +
> +		if (!intel_engine_is_virtual(engine))
> +			continue;
> +
> +		engine = intel_execlists_clone_virtual(dst, engine);
> +		if (IS_ERR(engine)) {
> +			free_engines(engines, i);
> +			return PTR_ERR(engine);
> +		}
> +
> +		engines[i] = engine;
> +	}
> +
>   	dst->engines = engines;
>   	dst->nengine = src->nengine;
>   	return 0;
> diff --git a/drivers/gpu/drm/i915/i915_scheduler.c b/drivers/gpu/drm/i915/i915_scheduler.c
> index e0f609d01564..bb9819dbe313 100644
> --- a/drivers/gpu/drm/i915/i915_scheduler.c
> +++ b/drivers/gpu/drm/i915/i915_scheduler.c
> @@ -247,17 +247,25 @@ sched_lock_engine(const struct i915_sched_node *node,
>   		  struct intel_engine_cs *locked,
>   		  struct sched_cache *cache)
>   {
> -	struct intel_engine_cs *engine = node_to_request(node)->engine;
> +	const struct i915_request *rq = node_to_request(node);
> +	struct intel_engine_cs *engine;
>   
>   	GEM_BUG_ON(!locked);
>   
> -	if (engine != locked) {
> +	/*
> +	 * Virtual engines complicate acquiring the engine timeline lock,
> +	 * as their rq->engine pointer is not stable until under that
> +	 * engine lock. The simple ploy we use is to take the lock then
> +	 * check that the rq still belongs to the newly locked engine.
> +	 */
> +	while (locked != (engine = READ_ONCE(rq->engine))) {
>   		spin_unlock(&locked->timeline.lock);
>   		memset(cache, 0, sizeof(*cache));
>   		spin_lock(&engine->timeline.lock);
> +		locked = engine;
>   	}
>   
> -	return engine;
> +	return locked;
>   }
>   
>   static bool inflight(const struct i915_request *rq,
> @@ -370,8 +378,11 @@ static void __i915_schedule(struct i915_request *rq,
>   		if (prio <= node->attr.priority || node_signaled(node))
>   			continue;
>   
> +		GEM_BUG_ON(node_to_request(node)->engine != engine);
> +
>   		node->attr.priority = prio;
>   		if (!list_empty(&node->link)) {
> +			GEM_BUG_ON(intel_engine_is_virtual(engine));
>   			if (!cache.priolist)
>   				cache.priolist =
>   					i915_sched_lookup_priolist(engine,
> diff --git a/drivers/gpu/drm/i915/i915_timeline_types.h b/drivers/gpu/drm/i915/i915_timeline_types.h
> index 8ff146dc05ba..5e445f145eb1 100644
> --- a/drivers/gpu/drm/i915/i915_timeline_types.h
> +++ b/drivers/gpu/drm/i915/i915_timeline_types.h
> @@ -25,6 +25,7 @@ struct i915_timeline {
>   	spinlock_t lock;
>   #define TIMELINE_CLIENT 0 /* default subclass */
>   #define TIMELINE_ENGINE 1
> +#define TIMELINE_VIRTUAL 2
>   	struct mutex mutex; /* protects the flow of requests */
>   
>   	unsigned int pin_count;
> diff --git a/drivers/gpu/drm/i915/intel_engine_types.h b/drivers/gpu/drm/i915/intel_engine_types.h
> index b0aa1f0d4e47..d54d2a1840cc 100644
> --- a/drivers/gpu/drm/i915/intel_engine_types.h
> +++ b/drivers/gpu/drm/i915/intel_engine_types.h
> @@ -216,6 +216,7 @@ struct intel_engine_execlists {
>   	 * @queue: queue of requests, in priority lists
>   	 */
>   	struct rb_root_cached queue;
> +	struct rb_root_cached virtual;
>   
>   	/**
>   	 * @csb_write: control register for Context Switch buffer
> @@ -421,6 +422,7 @@ struct intel_engine_cs {
>   #define I915_ENGINE_SUPPORTS_STATS   BIT(1)
>   #define I915_ENGINE_HAS_PREEMPTION   BIT(2)
>   #define I915_ENGINE_HAS_SEMAPHORES   BIT(3)
> +#define I915_ENGINE_IS_VIRTUAL       BIT(4)
>   	unsigned int flags;
>   
>   	/*
> @@ -504,6 +506,12 @@ intel_engine_has_semaphores(const struct intel_engine_cs *engine)
>   	return engine->flags & I915_ENGINE_HAS_SEMAPHORES;
>   }
>   
> +static inline bool
> +intel_engine_is_virtual(const struct intel_engine_cs *engine)
> +{
> +	return engine->flags & I915_ENGINE_IS_VIRTUAL;
> +}
> +
>   #define instdone_slice_mask(dev_priv__) \
>   	(IS_GEN(dev_priv__, 7) ? \
>   	 1 : RUNTIME_INFO(dev_priv__)->sseu.slice_mask)
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 7b938eaff9c5..0c97e8f30223 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -166,6 +166,28 @@
>   
>   #define ACTIVE_PRIORITY (I915_PRIORITY_NEWCLIENT | I915_PRIORITY_NOSEMAPHORE)
>   
> +struct virtual_engine {
> +	struct intel_engine_cs base;
> +
> +	struct intel_context context;
> +	struct kref kref;
> +	struct rcu_head rcu;
> +
> +	struct i915_request *request;
> +	struct ve_node {
> +		struct rb_node rb;
> +		int prio;
> +	} nodes[I915_NUM_ENGINES];
> +
> +	unsigned int count;
> +	struct intel_engine_cs *siblings[0];
> +};
> +
> +static struct virtual_engine *to_virtual_engine(struct intel_engine_cs *engine)
> +{
> +	return container_of(engine, struct virtual_engine, base);
> +}
> +
>   static int execlists_context_deferred_alloc(struct intel_context *ce,
>   					    struct intel_engine_cs *engine);
>   static void execlists_init_reg_state(u32 *reg_state,
> @@ -235,7 +257,8 @@ static int queue_prio(const struct intel_engine_execlists *execlists)
>   }
>   
>   static inline bool need_preempt(const struct intel_engine_cs *engine,
> -				const struct i915_request *rq)
> +				const struct i915_request *rq,
> +				struct rb_node *rb)
>   {
>   	int last_prio;
>   
> @@ -270,6 +293,22 @@ static inline bool need_preempt(const struct intel_engine_cs *engine,
>   	    rq_prio(list_next_entry(rq, link)) > last_prio)
>   		return true;
>   
> +	if (rb) { /* XXX virtual precedence */
> +		struct virtual_engine *ve =
> +			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
> +		bool preempt = false;
> +
> +		if (engine == ve->siblings[0]) { /* only preempt one sibling */
> +			spin_lock(&ve->base.timeline.lock);
> +			if (ve->request)
> +				preempt = rq_prio(ve->request) > last_prio;
> +			spin_unlock(&ve->base.timeline.lock);
> +		}
> +
> +		if (preempt)
> +			return preempt;
> +	}
> +
>   	/*
>   	 * If the inflight context did not trigger the preemption, then maybe
>   	 * it was the set of queued requests? Pick the highest priority in
> @@ -388,6 +427,8 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
>   	list_for_each_entry_safe_reverse(rq, rn,
>   					 &engine->timeline.requests,
>   					 link) {
> +		struct intel_engine_cs *owner;
> +
>   		if (i915_request_completed(rq))
>   			break;
>   
> @@ -396,14 +437,23 @@ __unwind_incomplete_requests(struct intel_engine_cs *engine)
>   
>   		GEM_BUG_ON(rq->hw_context->active);
>   
> -		GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
> -		if (rq_prio(rq) != prio) {
> -			prio = rq_prio(rq);
> -			pl = i915_sched_lookup_priolist(engine, prio);
> -		}
> -		GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
> +		owner = rq->hw_context->engine;
> +		if (likely(owner == engine)) {
> +			GEM_BUG_ON(rq_prio(rq) == I915_PRIORITY_INVALID);
> +			if (rq_prio(rq) != prio) {
> +				prio = rq_prio(rq);
> +				pl = i915_sched_lookup_priolist(engine, prio);
> +			}
> +			GEM_BUG_ON(RB_EMPTY_ROOT(&engine->execlists.queue.rb_root));
> +
> +			list_add(&rq->sched.link, pl);
> +		} else {
> +			if (__i915_request_has_started(rq))
> +				rq->sched.attr.priority |= ACTIVE_PRIORITY;
>   
> -		list_add(&rq->sched.link, pl);
> +			rq->engine = owner;
> +			owner->submit_request(rq);
> +		}
>   
>   		active = rq;
>   	}
> @@ -665,6 +715,50 @@ static void complete_preempt_context(struct intel_engine_execlists *execlists)
>   						  execlists));
>   }
>   
> +static void virtual_update_register_offsets(u32 *regs,
> +					    struct intel_engine_cs *engine)
> +{
> +	u32 base = engine->mmio_base;
> +
> +	regs[CTX_CONTEXT_CONTROL] =
> +		i915_mmio_reg_offset(RING_CONTEXT_CONTROL(engine));
> +	regs[CTX_RING_HEAD] = i915_mmio_reg_offset(RING_HEAD(base));
> +	regs[CTX_RING_TAIL] = i915_mmio_reg_offset(RING_TAIL(base));
> +	regs[CTX_RING_BUFFER_START] = i915_mmio_reg_offset(RING_START(base));
> +	regs[CTX_RING_BUFFER_CONTROL] = i915_mmio_reg_offset(RING_CTL(base));
> +
> +	regs[CTX_BB_HEAD_U] = i915_mmio_reg_offset(RING_BBADDR_UDW(base));
> +	regs[CTX_BB_HEAD_L] = i915_mmio_reg_offset(RING_BBADDR(base));
> +	regs[CTX_BB_STATE] = i915_mmio_reg_offset(RING_BBSTATE(base));
> +	regs[CTX_SECOND_BB_HEAD_U] =
> +		i915_mmio_reg_offset(RING_SBBADDR_UDW(base));
> +	regs[CTX_SECOND_BB_HEAD_L] = i915_mmio_reg_offset(RING_SBBADDR(base));
> +	regs[CTX_SECOND_BB_STATE] = i915_mmio_reg_offset(RING_SBBSTATE(base));
> +
> +	regs[CTX_CTX_TIMESTAMP] =
> +		i915_mmio_reg_offset(RING_CTX_TIMESTAMP(base));
> +	regs[CTX_PDP3_UDW] = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(engine, 3));
> +	regs[CTX_PDP3_LDW] = i915_mmio_reg_offset(GEN8_RING_PDP_LDW(engine, 3));
> +	regs[CTX_PDP2_UDW] = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(engine, 2));
> +	regs[CTX_PDP2_LDW] = i915_mmio_reg_offset(GEN8_RING_PDP_LDW(engine, 2));
> +	regs[CTX_PDP1_UDW] = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(engine, 1));
> +	regs[CTX_PDP1_LDW] = i915_mmio_reg_offset(GEN8_RING_PDP_LDW(engine, 1));
> +	regs[CTX_PDP0_UDW] = i915_mmio_reg_offset(GEN8_RING_PDP_UDW(engine, 0));
> +	regs[CTX_PDP0_LDW] = i915_mmio_reg_offset(GEN8_RING_PDP_LDW(engine, 0));
> +
> +	if (engine->class == RENDER_CLASS) {
> +		regs[CTX_RCS_INDIRECT_CTX] =
> +			i915_mmio_reg_offset(RING_INDIRECT_CTX(base));
> +		regs[CTX_RCS_INDIRECT_CTX_OFFSET] =
> +			i915_mmio_reg_offset(RING_INDIRECT_CTX_OFFSET(base));
> +		regs[CTX_BB_PER_CTX_PTR] =
> +			i915_mmio_reg_offset(RING_BB_PER_CTX_PTR(base));
> +
> +		regs[CTX_R_PWR_CLK_STATE] =
> +			i915_mmio_reg_offset(GEN8_R_PWR_CLK_STATE);
> +	}
> +}
> +
>   static void execlists_dequeue(struct intel_engine_cs *engine)
>   {
>   	struct intel_engine_execlists * const execlists = &engine->execlists;
> @@ -697,6 +791,28 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   	 * and context switches) submission.
>   	 */
>   
> +	for (rb = rb_first_cached(&execlists->virtual); rb; ) {
> +		struct virtual_engine *ve =
> +			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
> +		struct i915_request *rq = READ_ONCE(ve->request);
> +		struct intel_engine_cs *active;
> +
> +		if (!rq) {
> +			rb_erase_cached(rb, &execlists->virtual);
> +			RB_CLEAR_NODE(rb);
> +			rb = rb_first_cached(&execlists->virtual);
> +			continue;
> +		}
> +
> +		active = READ_ONCE(ve->context.active);
> +		if (active && active != engine) {
> +			rb = rb_next(rb);
> +			continue;
> +		}
> +
> +		break;
> +	}
> +
>   	if (last) {
>   		/*
>   		 * Don't resubmit or switch until all outstanding
> @@ -718,7 +834,7 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   		if (!execlists_is_active(execlists, EXECLISTS_ACTIVE_HWACK))
>   			return;
>   
> -		if (need_preempt(engine, last)) {
> +		if (need_preempt(engine, last, rb)) {
>   			inject_preempt_context(engine);
>   			return;
>   		}
> @@ -758,6 +874,72 @@ static void execlists_dequeue(struct intel_engine_cs *engine)
>   		last->tail = last->wa_tail;
>   	}
>   
> +	while (rb) { /* XXX virtual is always taking precedence */
> +		struct virtual_engine *ve =
> +			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
> +		struct i915_request *rq;
> +
> +		spin_lock(&ve->base.timeline.lock);
> +
> +		rq = ve->request;
> +		if (unlikely(!rq)) { /* lost the race to a sibling */
> +			spin_unlock(&ve->base.timeline.lock);
> +			rb_erase_cached(rb, &execlists->virtual);
> +			RB_CLEAR_NODE(rb);
> +			rb = rb_first_cached(&execlists->virtual);
> +			continue;
> +		}
> +
> +		if (rq_prio(rq) >= queue_prio(execlists)) {
> +			if (last && !can_merge_rq(last, rq)) {
> +				spin_unlock(&ve->base.timeline.lock);
> +				return; /* leave this rq for another engine */
> +			}
> +
> +			GEM_BUG_ON(rq->engine != &ve->base);
> +			ve->request = NULL;
> +			ve->base.execlists.queue_priority_hint = INT_MIN;
> +			rb_erase_cached(rb, &execlists->virtual);
> +			RB_CLEAR_NODE(rb);
> +
> +			GEM_BUG_ON(rq->hw_context != &ve->context);
> +			rq->engine = engine;
> +
> +			if (engine != ve->siblings[0]) {
> +				u32 *regs = ve->context.lrc_reg_state;
> +				unsigned int n;
> +
> +				GEM_BUG_ON(READ_ONCE(ve->context.active));
> +				virtual_update_register_offsets(regs, engine);
> +
> +				/*
> +				 * Move the bound engine to the top of the list
> +				 * for future execution. We then kick this
> +				 * tasklet first before checking others, so that
> +				 * we preferentially reuse this set of bound
> +				 * registers.
> +				 */
> +				for (n = 1; n < ve->count; n++) {
> +					if (ve->siblings[n] == engine) {
> +						swap(ve->siblings[n],
> +						     ve->siblings[0]);
> +						break;
> +					}
> +				}
> +
> +				GEM_BUG_ON(ve->siblings[0] != engine);
> +			}
> +
> +			__i915_request_submit(rq);
> +			trace_i915_request_in(rq, port_index(port, execlists));
> +			submit = true;
> +			last = rq;
> +		}
> +
> +		spin_unlock(&ve->base.timeline.lock);
> +		break;
> +	}
> +
>   	while ((rb = rb_first_cached(&execlists->queue))) {
>   		struct i915_priolist *p = to_priolist(rb);
>   		struct i915_request *rq, *rn;
> @@ -2904,6 +3086,304 @@ void intel_lr_context_resume(struct drm_i915_private *i915)
>   	}
>   }
>   
> +static void __virtual_engine_free(struct rcu_head *rcu)
> +{
> +	struct virtual_engine *ve = container_of(rcu, typeof(*ve), rcu);
> +
> +	kfree(ve);
> +}
> +
> +static void virtual_engine_free(struct kref *kref)
> +{
> +	struct virtual_engine *ve = container_of(kref, typeof(*ve), kref);
> +	unsigned int n;
> +
> +	GEM_BUG_ON(ve->request);
> +	GEM_BUG_ON(ve->context.active);
> +
> +	for (n = 0; n < ve->count; n++) {
> +		struct intel_engine_cs *sibling = ve->siblings[n];
> +		struct rb_node *node = &ve->nodes[sibling->id].rb;
> +
> +		if (RB_EMPTY_NODE(node))
> +			continue;
> +
> +		spin_lock_irq(&sibling->timeline.lock);
> +
> +		if (!RB_EMPTY_NODE(node))
> +			rb_erase_cached(node, &sibling->execlists.virtual);
> +
> +		spin_unlock_irq(&sibling->timeline.lock);
> +	}
> +	GEM_BUG_ON(__tasklet_is_scheduled(&ve->base.execlists.tasklet));
> +
> +	if (ve->context.state)
> +		__execlists_context_fini(&ve->context);
> +
> +	i915_timeline_fini(&ve->base.timeline);
> +	call_rcu(&ve->rcu, __virtual_engine_free);
> +}
> +
> +static void virtual_context_unpin(struct intel_context *ce)
> +{
> +	struct virtual_engine *ve = container_of(ce, typeof(*ve), context);
> +
> +	execlists_context_unpin(ce);
> +
> +	kref_put(&ve->kref, virtual_engine_free);
> +}
> +
> +static void virtual_engine_initial_hint(struct virtual_engine *ve)
> +{
> +	int swp;
> +
> +	/*
> +	 * Pick a random sibling on starting to help spread the load around.
> +	 *
> +	 * New contexts are typically created with exactly the same order
> +	 * of siblings, and often started in batches. Due to the way we iterate
> +	 * the array of sibling when submitting requests, sibling[0] is
> +	 * prioritised for dequeuing. If we make sure that sibling[0] is fairly
> +	 * randomised across the system, we also help spread the load by the
> +	 * first engine we inspect being different each time.
> +	 *
> +	 * NB This does not force us to execute on this engine, it will just
> +	 * typically be the first we inspect for submission.
> +	 */
> +	swp = prandom_u32_max(ve->count);
> +	if (!swp)
> +		return;
> +
> +	swap(ve->siblings[swp], ve->siblings[0]);
> +	virtual_update_register_offsets(ve->context.lrc_reg_state,
> +					ve->siblings[0]);
> +}
> +
> +static int virtual_context_pin(struct intel_context *ce)
> +{
> +	struct virtual_engine *ve = container_of(ce, typeof(*ve), context);
> +	int err;
> +
> +	/* Note: we must use a real engine class for setting up reg state */
> +	err = __execlists_context_pin(ce, ve->siblings[0]);
> +	if (err)
> +		return err;
> +
> +	virtual_engine_initial_hint(ve);
> +
> +	kref_get(&ve->kref);
> +	return 0;
> +}
> +
> +static const struct intel_context_ops virtual_context_ops = {
> +	.pin = virtual_context_pin,
> +	.unpin = virtual_context_unpin,
> +};
> +
> +static void virtual_submission_tasklet(unsigned long data)
> +{
> +	struct virtual_engine * const ve = (struct virtual_engine *)data;
> +	unsigned int n;
> +	int prio;
> +
> +	prio = READ_ONCE(ve->base.execlists.queue_priority_hint);
> +	if (prio == INT_MIN)
> +		return;
> +
> +	local_irq_disable();
> +	for (n = 0; READ_ONCE(ve->request) && n < ve->count; n++) {
> +		struct intel_engine_cs *sibling = ve->siblings[n];
> +		struct ve_node * const node = &ve->nodes[sibling->id];
> +		struct rb_node **parent, *rb;
> +		bool first;
> +
> +		spin_lock(&sibling->timeline.lock);
> +
> +		if (!RB_EMPTY_NODE(&node->rb)) {
> +			first = rb_first_cached(&sibling->execlists.virtual) == &node->rb;
> +			if (prio == node->prio || (prio > node->prio && first))
> +				goto submit_engine;
> +
> +			rb_erase_cached(&node->rb, &sibling->execlists.virtual);
> +		}
> +
> +		rb = NULL;
> +		first = true;
> +		parent = &sibling->execlists.virtual.rb_root.rb_node;
> +		while (*parent) {
> +			struct ve_node *other;
> +
> +			rb = *parent;
> +			other = rb_entry(rb, typeof(*other), rb);
> +			if (prio > other->prio) {
> +				parent = &rb->rb_left;
> +			} else {
> +				parent = &rb->rb_right;
> +				first = false;
> +			}
> +		}
> +
> +		rb_link_node(&node->rb, rb, parent);
> +		rb_insert_color_cached(&node->rb,
> +				       &sibling->execlists.virtual,
> +				       first);
> +
> +submit_engine:
> +		GEM_BUG_ON(RB_EMPTY_NODE(&node->rb));
> +		node->prio = prio;
> +		if (first && prio > sibling->execlists.queue_priority_hint) {
> +			sibling->execlists.queue_priority_hint = prio;
> +			tasklet_hi_schedule(&sibling->execlists.tasklet);
> +		}
> +
> +		spin_unlock(&sibling->timeline.lock);
> +	}
> +	local_irq_enable();
> +}
> +
> +static void virtual_submit_request(struct i915_request *request)
> +{
> +	struct virtual_engine *ve = to_virtual_engine(request->engine);
> +
> +	GEM_BUG_ON(ve->base.submit_request != virtual_submit_request);
> +
> +	GEM_BUG_ON(ve->request);
> +	ve->base.execlists.queue_priority_hint = rq_prio(request);
> +	WRITE_ONCE(ve->request, request);
> +
> +	tasklet_schedule(&ve->base.execlists.tasklet);
> +}
> +
> +struct intel_engine_cs *
> +intel_execlists_create_virtual(struct i915_gem_context *ctx,
> +			       struct intel_engine_cs **siblings,
> +			       unsigned int count)
> +{
> +	struct virtual_engine *ve;
> +	unsigned int n;
> +	int err;
> +
> +	if (!count)
> +		return ERR_PTR(-EINVAL);
> +
> +	ve = kzalloc(struct_size(ve, siblings, count), GFP_KERNEL);
> +	if (!ve)
> +		return ERR_PTR(-ENOMEM);
> +
> +	kref_init(&ve->kref);
> +	rcu_head_init(&ve->rcu);
> +	ve->base.i915 = ctx->i915;
> +	ve->base.id = -1;
> +	ve->base.class = OTHER_CLASS;
> +	ve->base.uabi_class = I915_ENGINE_CLASS_INVALID;
> +	ve->base.instance = I915_ENGINE_CLASS_INVALID_VIRTUAL;
> +	ve->base.flags = I915_ENGINE_IS_VIRTUAL;
> +
> +	snprintf(ve->base.name, sizeof(ve->base.name), "virtual");
> +
> +	err = i915_timeline_init(ctx->i915,
> +				 &ve->base.timeline,
> +				 ve->base.name,
> +				 NULL);
> +	if (err)
> +		goto err_put;
> +	i915_timeline_set_subclass(&ve->base.timeline, TIMELINE_VIRTUAL);
> +
> +	ve->base.cops = &virtual_context_ops;
> +	ve->base.request_alloc = execlists_request_alloc;
> +
> +	ve->base.schedule = i915_schedule;
> +	ve->base.submit_request = virtual_submit_request;
> +
> +	ve->base.execlists.queue_priority_hint = INT_MIN;
> +	tasklet_init(&ve->base.execlists.tasklet,
> +		     virtual_submission_tasklet,
> +		     (unsigned long)ve);
> +
> +	intel_context_init(&ve->context, ctx, &ve->base);
> +
> +	for (n = 0; n < count; n++) {
> +		struct intel_engine_cs *sibling = siblings[n];
> +
> +		GEM_BUG_ON(!is_power_of_2(sibling->mask));
> +		if (sibling->mask & ve->base.mask)
> +			continue;
> +
> +		if (sibling->execlists.tasklet.func != execlists_submission_tasklet) {
> +			err = -ENODEV;
> +			goto err_put;
> +		}
> +
> +		GEM_BUG_ON(RB_EMPTY_NODE(&ve->nodes[sibling->id].rb));
> +		RB_CLEAR_NODE(&ve->nodes[sibling->id].rb);
> +
> +		ve->siblings[ve->count++] = sibling;
> +		ve->base.mask |= sibling->mask;
> +
> +		if (ve->base.class != OTHER_CLASS) {
> +			if (ve->base.class != sibling->class) {
> +				err = -EINVAL;
> +				goto err_put;
> +			}
> +			continue;
> +		}
> +
> +		ve->base.class = sibling->class;
> +		snprintf(ve->base.name, sizeof(ve->base.name),
> +			 "v%dx%d", ve->base.class, count);
> +		ve->base.context_size = sibling->context_size;
> +
> +		ve->base.emit_bb_start = sibling->emit_bb_start;
> +		ve->base.emit_flush = sibling->emit_flush;
> +		ve->base.emit_init_breadcrumb = sibling->emit_init_breadcrumb;
> +		ve->base.emit_fini_breadcrumb = sibling->emit_fini_breadcrumb;
> +		ve->base.emit_fini_breadcrumb_dw =
> +			sibling->emit_fini_breadcrumb_dw;
> +	}
> +
> +	/* gracefully replace a degenerate virtual engine */
> +	if (is_power_of_2(ve->base.mask)) {
> +		struct intel_engine_cs *actual = ve->siblings[0];
> +		virtual_engine_free(&ve->kref);
> +		return actual;
> +	}
> +
> +	__intel_context_insert(ctx, &ve->base, &ve->context);
> +	return &ve->base;
> +
> +err_put:
> +	virtual_engine_free(&ve->kref);
> +	return ERR_PTR(err);
> +}
> +
> +struct intel_engine_cs *
> +intel_execlists_clone_virtual(struct i915_gem_context *ctx,
> +			      struct intel_engine_cs *src)
> +{
> +	struct virtual_engine *se = to_virtual_engine(src);
> +	struct intel_engine_cs *dst;
> +
> +	dst = intel_execlists_create_virtual(ctx,
> +					     se->siblings,
> +					     se->count);
> +	if (IS_ERR(dst))
> +		return dst;
> +
> +	return dst;
> +}
> +
> +void intel_virtual_engine_destroy(struct intel_engine_cs *engine)
> +{
> +	struct virtual_engine *ve = to_virtual_engine(engine);
> +
> +	if (!engine || !intel_engine_is_virtual(engine))
> +		return;
> +
> +	__intel_context_remove(&ve->context);
> +
> +	kref_put(&ve->kref, virtual_engine_free);
> +}
> +
>   void intel_execlists_show_requests(struct intel_engine_cs *engine,
>   				   struct drm_printer *m,
>   				   void (*show_request)(struct drm_printer *m,
> @@ -2961,6 +3441,29 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine,
>   		show_request(m, last, "\t\tQ ");
>   	}
>   
> +	last = NULL;
> +	count = 0;
> +	for (rb = rb_first_cached(&execlists->virtual); rb; rb = rb_next(rb)) {
> +		struct virtual_engine *ve =
> +			rb_entry(rb, typeof(*ve), nodes[engine->id].rb);
> +		struct i915_request *rq = READ_ONCE(ve->request);
> +
> +		if (rq) {
> +			if (count++ < max - 1)
> +				show_request(m, rq, "\t\tV ");
> +			else
> +				last = rq;
> +		}
> +	}
> +	if (last) {
> +		if (count > max) {
> +			drm_printf(m,
> +				   "\t\t...skipping %d virtual requests...\n",
> +				   count - max);
> +		}
> +		show_request(m, last, "\t\tV ");
> +	}
> +
>   	spin_unlock_irqrestore(&engine->timeline.lock, flags);
>   }
>   
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> index f1aec8a6986f..9d90dc68e02b 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -112,6 +112,17 @@ void intel_execlists_show_requests(struct intel_engine_cs *engine,
>   							const char *prefix),
>   				   unsigned int max);
>   
> +struct intel_engine_cs *
> +intel_execlists_create_virtual(struct i915_gem_context *ctx,
> +			       struct intel_engine_cs **siblings,
> +			       unsigned int count);
> +
> +struct intel_engine_cs *
> +intel_execlists_clone_virtual(struct i915_gem_context *ctx,
> +			      struct intel_engine_cs *src);
> +
> +void intel_virtual_engine_destroy(struct intel_engine_cs *engine);
> +
>   u32 gen8_make_rpcs(struct drm_i915_private *i915, struct intel_sseu *ctx_sseu);
>   
>   #endif /* _INTEL_LRC_H_ */
> diff --git a/drivers/gpu/drm/i915/selftests/intel_lrc.c b/drivers/gpu/drm/i915/selftests/intel_lrc.c
> index d61520ea03c1..4b8a339529d1 100644
> --- a/drivers/gpu/drm/i915/selftests/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/selftests/intel_lrc.c
> @@ -10,6 +10,7 @@
>   
>   #include "../i915_selftest.h"
>   #include "igt_flush_test.h"
> +#include "igt_live_test.h"
>   #include "igt_spinner.h"
>   #include "i915_random.h"
>   
> @@ -1060,6 +1061,169 @@ static int live_preempt_smoke(void *arg)
>   	return err;
>   }
>   
> +static int nop_virtual_engine(struct drm_i915_private *i915,
> +			      struct intel_engine_cs **siblings,
> +			      unsigned int nsibling,
> +			      unsigned int nctx,
> +			      unsigned int flags)
> +#define CHAIN BIT(0)
> +{
> +	IGT_TIMEOUT(end_time);
> +	struct i915_request *request[16];
> +	struct i915_gem_context *ctx[16];
> +	struct intel_engine_cs *ve[16];
> +	unsigned long n, prime, nc;
> +	struct igt_live_test t;
> +	ktime_t times[2] = {};
> +	int err;
> +
> +	GEM_BUG_ON(!nctx || nctx > ARRAY_SIZE(ctx));
> +
> +	for (n = 0; n < nctx; n++) {
> +		ctx[n] = kernel_context(i915);
> +		if (!ctx[n]) {
> +			err = -ENOMEM;
> +			nctx = n; /* only unwind what we created */
> +			goto out;
> +		}
> +
> +		ve[n] = intel_execlists_create_virtual(ctx[n],
> +						       siblings, nsibling);
> +		if (IS_ERR(ve[n])) {
> +			err = PTR_ERR(ve[n]);
> +			kernel_context_close(ctx[n]);
> +			nctx = n; /* only unwind what we created */
> +			goto out;
> +		}
> +	}
> +
> +	err = igt_live_test_begin(&t, i915, __func__, ve[0]->name);
> +	if (err)
> +		goto out;
> +
> +	for_each_prime_number_from(prime, 1, 8192) {
> +		times[1] = ktime_get_raw();
> +
> +		if (flags & CHAIN) {
> +			for (nc = 0; nc < nctx; nc++) {
> +				for (n = 0; n < prime; n++) {
> +					request[nc] =
> +						i915_request_alloc(ve[nc], ctx[nc]);
> +					if (IS_ERR(request[nc])) {
> +						err = PTR_ERR(request[nc]);
> +						goto out;
> +					}
> +
> +					i915_request_add(request[nc]);
> +				}
> +			}
> +		} else {
> +			for (n = 0; n < prime; n++) {
> +				for (nc = 0; nc < nctx; nc++) {
> +					request[nc] =
> +						i915_request_alloc(ve[nc], ctx[nc]);
> +					if (IS_ERR(request[nc])) {
> +						err = PTR_ERR(request[nc]);
> +						goto out;
> +					}
> +
> +					i915_request_add(request[nc]);
> +				}
> +			}
> +		}
> +
> +		for (nc = 0; nc < nctx; nc++) {
> +			if (i915_request_wait(request[nc],
> +					      I915_WAIT_LOCKED,
> +					      HZ / 10) < 0) {
> +				pr_err("%s(%s): wait for %llx:%lld timed out\n",
> +				       __func__, ve[0]->name,
> +				       request[nc]->fence.context,
> +				       request[nc]->fence.seqno);
> +
> +				GEM_TRACE("%s(%s) failed at request %llx:%lld\n",
> +					  __func__, ve[0]->name,
> +					  request[nc]->fence.context,
> +					  request[nc]->fence.seqno);
> +				GEM_TRACE_DUMP();
> +				i915_gem_set_wedged(i915);
> +				break;
> +			}
> +		}
> +
> +		times[1] = ktime_sub(ktime_get_raw(), times[1]);
> +		if (prime == 1)
> +			times[0] = times[1];
> +
> +		if (__igt_timeout(end_time, NULL))
> +			break;
> +	}
> +
> +	err = igt_live_test_end(&t);
> +	if (err)
> +		goto out;
> +
> +	pr_info("Requestx%d latencies on %s: 1 = %lluns, %lu = %lluns\n",
> +		nctx, ve[0]->name, ktime_to_ns(times[0]),
> +		prime, div64_u64(ktime_to_ns(times[1]), prime));
> +
> +out:
> +	if (igt_flush_test(i915, I915_WAIT_LOCKED))
> +		err = -EIO;
> +
> +	for (nc = 0; nc < nctx; nc++) {
> +		intel_virtual_engine_destroy(ve[nc]);
> +		kernel_context_close(ctx[nc]);
> +	}
> +	return err;
> +}
> +
> +static int live_virtual_engine(void *arg)
> +{
> +	struct drm_i915_private *i915 = arg;
> +	struct intel_engine_cs *siblings[MAX_ENGINE_INSTANCE + 1];
> +	struct intel_engine_cs *engine;
> +	enum intel_engine_id id;
> +	unsigned int class, inst;
> +	int err = -ENODEV;
> +
> +	if (USES_GUC_SUBMISSION(i915))
> +		return 0;
> +
> +	mutex_lock(&i915->drm.struct_mutex);
> +
> +	for_each_engine(engine, i915, id) {
> +		err = nop_virtual_engine(i915, &engine, 1, 1, 0);
> +		if (err) {
> +			pr_err("Failed to wrap engine %s: err=%d\n",
> +			       engine->name, err);
> +			goto out_unlock;
> +		}
> +	}
> +
> +	for (class = 0; class <= MAX_ENGINE_CLASS; class++) {
> +		int nsibling, n;
> +
> +		nsibling = 0;
> +		for (inst = 0; inst <= MAX_ENGINE_INSTANCE; inst++) {
> +			if (!i915->engine_class[class][inst])
> +				break;
> +
> +			siblings[nsibling++] = i915->engine_class[class][inst];
> +		}
> +		if (nsibling < 2)
> +			continue;
> +
> +		for (n = 1; n <= nsibling + 1; n++) {
> +			err = nop_virtual_engine(i915, siblings, nsibling,
> +						 n, 0);
> +			if (err)
> +				goto out_unlock;
> +		}
> +
> +		err = nop_virtual_engine(i915, siblings, nsibling, n, CHAIN);
> +		if (err)
> +			goto out_unlock;
> +	}
> +
> +out_unlock:
> +	mutex_unlock(&i915->drm.struct_mutex);
> +	return err;
> +}
> +
>   int intel_execlists_live_selftests(struct drm_i915_private *i915)
>   {
>   	static const struct i915_subtest tests[] = {
> @@ -1071,6 +1235,7 @@ int intel_execlists_live_selftests(struct drm_i915_private *i915)
>   		SUBTEST(live_chain_preempt),
>   		SUBTEST(live_preempt_hang),
>   		SUBTEST(live_preempt_smoke),
> +		SUBTEST(live_virtual_engine),
>   	};
>   
>   	if (!HAS_EXECLISTS(i915))
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index a609619610f2..592b02676044 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -125,6 +125,7 @@ enum drm_i915_gem_engine_class {
>   };
>   
>   #define I915_ENGINE_CLASS_INVALID_NONE -1
> +#define I915_ENGINE_CLASS_INVALID_VIRTUAL 0
>   
>   /**
>    * DOC: perf_events exposed by i915 through /sys/bus/event_sources/drivers/i915
> @@ -1596,8 +1597,37 @@ struct drm_i915_gem_context_param_sseu {
>   	__u32 rsvd;
>   };
>   
> +/*
> + * i915_context_engines_load_balance:
> + *
> + * Enable load balancing across this set of engines.
> + *
> + * Into the I915_EXEC_DEFAULT slot [0], a virtual engine is created that when
> + * used will proxy the execbuffer request onto one of the set of engines
> + * in such a way as to distribute the load evenly across the set.
> + *
> + * The set of engines must be compatible (e.g. the same HW class) as they
> + * will share the same logical GPU context and ring.
> + *
> + * To intermix rendering with the virtual engine and direct rendering onto
> + * the backing engines (bypassing the load balancing proxy), the context must
> + * be defined to use a single timeline for all engines.
> + */
> +struct i915_context_engines_load_balance {
> +	struct i915_user_extension base;
> +
> +	__u16 engine_index;
> +	__u16 mbz16; /* reserved for future use; must be zero */
> +	__u32 flags; /* all undefined flags must be zero */
> +
> +	__u64 engines_mask; /* selection mask of engines[] */
> +
> +	__u64 mbz64[4]; /* reserved for future use; must be zero */
> +};
> +
>   struct i915_context_param_engines {
>   	__u64 extensions; /* linked chain of extension blocks, 0 terminates */
> +#define I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE 0
>   
>   	struct {
>   		__u16 engine_class; /* see enum drm_i915_gem_engine_class */
> 
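
For illustration, a rough userspace fragment showing how this revision
of the extension might be chained into an engine map. A sketch only:
the engine_instance field, the I915_ENGINE_CLASS_INVALID and
I915_ENGINE_CLASS_VIDEO values and the set-param plumbing that consumes
struct i915_context_param_engines come from other patches in the series
and are assumed here, as are linux/types.h and stdint.h:

	struct i915_context_engines_load_balance balance = {
		.base = { .name = I915_CONTEXT_ENGINES_EXT_LOAD_BALANCE },
		.engine_index = 0, /* create the virtual engine in slot [0] */
		.engines_mask = 0x6, /* balance across engines[1] and [2] */
	};

	struct {
		__u64 extensions; /* mirrors i915_context_param_engines */
		struct {
			__u16 engine_class;
			__u16 engine_instance;
		} engines[3];
	} set_engines = {
		.extensions = (__u64)(uintptr_t)&balance,
		.engines = {
			/* slot [0] is replaced by the virtual engine */
			{ I915_ENGINE_CLASS_INVALID,
			  I915_ENGINE_CLASS_INVALID_VIRTUAL },
			{ I915_ENGINE_CLASS_VIDEO, 0 },
			{ I915_ENGINE_CLASS_VIDEO, 1 },
		},
	};

	/* ...then pass &set_engines via the engines context set-param... */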

* Re: [PATCH 10/13] drm/i915: Load balancing across a virtual engine
  2019-03-12  7:52   ` Tvrtko Ursulin
@ 2019-03-12  8:56     ` Chris Wilson
  0 siblings, 0 replies; 58+ messages in thread
From: Chris Wilson @ 2019-03-12  8:56 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-03-12 07:52:08)
> 
> On 08/03/2019 14:12, Chris Wilson wrote:
> > +static int
> > +set_engines__load_balance(struct i915_user_extension __user *base, void *data)
> > +{
> > +     struct i915_context_engines_load_balance __user *ext =
> > +             container_of_user(base, typeof(*ext), base);
> > +     const struct set_engines *set = data;
> > +     struct intel_engine_cs *ve;
> > +     unsigned int n;
> > +     u64 mask;
> > +     u16 idx;
> > +     int err;
> > +
> > +     if (!HAS_EXECLISTS(set->ctx->i915))
> > +             return -ENODEV;
> > +
> > +     if (USES_GUC_SUBMISSION(set->ctx->i915))
> > +             return -ENODEV; /* not implemented yet */
> 
> Didn't it use to be that you were checking for the single timeline flag
> somewhere around here? Now you allow a multi-timeline map with a virtual
> engine slot?

Yes, on reflection I decided that was overly prescriptive. If userspace
wants to create a context with a mixed setup of veng and normal engines,
it can choose whether or not to use a single timeline for that setup.
The recommendation is that they share a timeline as that is more likely
to match their client API.
-Chris

* Re: [PATCH 02/13] drm/i915: Introduce the i915_user_extension_method
  2019-03-08 14:33   ` Tvrtko Ursulin
@ 2019-03-13 10:50     ` Chris Wilson
  2019-03-13 11:13       ` Tvrtko Ursulin
  0 siblings, 1 reply; 58+ messages in thread
From: Chris Wilson @ 2019-03-13 10:50 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-03-08 14:33:02)
> 
> On 08/03/2019 14:12, Chris Wilson wrote:
> > +int i915_user_extensions(struct i915_user_extension __user *ext,
> > +                      const i915_user_extension_fn *tbl,
> > +                      unsigned long count,
> > +                      void *data)
> > +{
> > +     unsigned int stackdepth = 512;
> 
> I have doubts about usefulness of trying to impose some limit now. And 
> also reservations about using the name stack. But both are irrelevant 
> implementation details at this stage so meh.

We need defence against malicious userspace doing
	struct i915_user_extension ext = {
		.next_extension = &ext,
	};
so sadly some limit is required.

> > +
> > +     while (ext) {
> > +             int err;
> > +             u64 x;
> > +
> > +             if (!stackdepth--) /* recursion vs useful flexibility */
> > +                     return -EINVAL;
> > +
> > +             if (get_user(x, &ext->name))
> > +                     return -EFAULT;
> > +
> > +             err = -EINVAL;
> > +             if (x < count && tbl[x])
> > +                     err = tbl[x](ext, data);
> 
> How about:
> 
>                 put_user(err, &ext->result);
> 
> And:
> 
> struct i915_user_extension {
>         __u64 next_extension;
>         __u64 name;
>         __u32 result;
>         __u32 mbz;
> };
> 
> So we add the ability for each extension to store its exit code, giving
> userspace the opportunity to know which one failed.
> 
> With this I would be satisfied usability is future proof enough.

I'm sorely tempted. The biggest objection I have is this defeats the
elegance of a read-only chain. So who would actually use it?

err = gem_context_create_ext(&chain);
if (err) {
	struct i915_user_extension *ext = (struct i915_user_extension *)chain;
	while (ext && !ext->result)
		ext = (struct i915_user_extension *)ext->next_extension;
	if (ext)
		fprintf(stderr, "context creation failed at extension: %lld", ext->name);
}

What exactly are they going to do? They are not going to do anything
like
	while (err) {
		ext = first_faulty_ext(&chain);
		switch (ext->name) {
		case ...:  do_fixup_A(ext);
		}
		err = gem_context_create_ext(&chain);
	}

I'm not really seeing how they benefit over, and above, handling the
ioctl error by printing out the entire erroneous struct and chain, and
falling back to avoiding that ioctl.

I think what you really want is a per-application/fd debug log, so that
we can dump the actual errors as they arise (without leaking them into
the general syslog).
-Chris

* Re: [PATCH 02/13] drm/i915: Introduce the i915_user_extension_method
  2019-03-13 10:50     ` Chris Wilson
@ 2019-03-13 11:13       ` Tvrtko Ursulin
  2019-03-13 11:21         ` Chris Wilson
  0 siblings, 1 reply; 58+ messages in thread
From: Tvrtko Ursulin @ 2019-03-13 11:13 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 13/03/2019 10:50, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-03-08 14:33:02)
>>
>> On 08/03/2019 14:12, Chris Wilson wrote:
>>> +int i915_user_extensions(struct i915_user_extension __user *ext,
>>> +                      const i915_user_extension_fn *tbl,
>>> +                      unsigned long count,
>>> +                      void *data)
>>> +{
>>> +     unsigned int stackdepth = 512;
>>
>> I have doubts about usefulness of trying to impose some limit now. And
>> also reservations about using the name stack. But both are irrelevant
>> implementation details at this stage so meh.
> 
> We need defence against malicious userspace doing
> 	struct i915_user_extension ext = {
> 		.next_extension = &ext,
> 	};
> so sadly some limit is required.

Oh yes, good point. I wasn't thinking maliciously enough.

A possible alternative solution could be, in conjunction with the result
field from below, to only allow visiting any extension once. It would
require reserving some value to mean "not visited". Probably zero, so a
non-zero result would immediately fail the chain; but that would also, I
think, mean we only support negative values in result as output, mapping
zero to one.
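
As a purely hypothetical sketch of that visit-once idea (the result
field is the one proposed below; process_one_extension() and
next_extension() are stand-ins for the existing tbl[x] dispatch and
the next_extension pointer walk):

	while (ext) {
		__u32 result;
		int err;

		if (get_user(result, &ext->result))
			return -EFAULT;
		if (result) /* already visited: a cycle in the chain */
			return -ELOOP;

		err = process_one_extension(ext, data);
		/* map 0 to 1 so a visited extension never reads as clean */
		if (put_user(err ?: 1, &ext->result))
			return -EFAULT;
		if (err)
			return err;

		ext = next_extension(ext);
	}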

>>> +
>>> +     while (ext) {
>>> +             int err;
>>> +             u64 x;
>>> +
>>> +             if (!stackdepth--) /* recursion vs useful flexibility */
>>> +                     return -EINVAL;
>>> +
>>> +             if (get_user(x, &ext->name))
>>> +                     return -EFAULT;
>>> +
>>> +             err = -EINVAL;
>>> +             if (x < count && tbl[x])
>>> +                     err = tbl[x](ext, data);
>>
>> How about:
>>
>>                  put_user(err, &ext->result);
>>
>> And:
>>
>> struct i915_user_extension {
>>          __u64 next_extension;
>>          __u64 name;
>>          __u32 result;
>>          __u32 mbz;
>> };
>>
> >> So we add the ability for each extension to store its exit code, giving
> >> userspace the opportunity to know which one failed.
>>
>> With this I would be satisfied usability is future proof enough.
> 
> I'm sorely tempted. The biggest objection I have is this defeats the
> elegance of a read-only chain. So who would actually use it?
> 
> err = gem_context_create_ext(&chain);
> if (err) {
> 	struct i915_user_extension *ext = (struct i915_user_extension *)chain;
> 	while (ext && !ext->result)
> 		ext = (struct i915_user_extension *)ext->next_extension;
> 	if (ext)
> 		fprintf(stderr, "context creation failed at extension: %lld", ext->name);
> }
> 
> What exactly are they going to do? They are not going to do anything
> like
> 	while (err) {
> 		ext = first_faulty_ext(&chain);
> 		switch (ext->name) {
> 		case ...:  do_fixup_A(ext);
> 		}
> 		err = gem_context_create_ext(&chain);
> 	}
> 
> I'm not really seeing how they benefit over, and above, handling the
> ioctl error by printing out the entire erroneous struct and chain, and
> falling back to avoiding that ioctl.
> 
> I think what you really want is a per-application/fd debug log, so that
> we can dump the actual errors as they arise (without leaking them into
> the general syslog).

Maybe... could be an extension of the existing problem of "What EINVAL
do you mean exactly?" indeed.

I don't see a problem with writing back though?

Regards,

Tvrtko





* Re: [PATCH 02/13] drm/i915: Introduce the i915_user_extension_method
  2019-03-13 11:13       ` Tvrtko Ursulin
@ 2019-03-13 11:21         ` Chris Wilson
  2019-03-13 11:35           ` Tvrtko Ursulin
  0 siblings, 1 reply; 58+ messages in thread
From: Chris Wilson @ 2019-03-13 11:21 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-03-13 11:13:10)
> 
> On 13/03/2019 10:50, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2019-03-08 14:33:02)
> >>
> >> On 08/03/2019 14:12, Chris Wilson wrote:
> >>> +int i915_user_extensions(struct i915_user_extension __user *ext,
> >>> +                      const i915_user_extension_fn *tbl,
> >>> +                      unsigned long count,
> >>> +                      void *data)
> >>> +{
> >>> +     unsigned int stackdepth = 512;
> >>
> >> I have doubts about usefulness of trying to impose some limit now. And
> >> also reservations about using the name stack. But both are irrelevant
> >> implementation details at this stage so meh.
> > 
> > We need defence against malicious userspace doing
> >       struct i915_user_extension ext = {
> >               .next_extension = &ext,
> >       };
> > so sadly some limit is required.
> 
> Oh yes, good point. I wasn't thinking maliciously enough.
> 
> A possible alternative solution could be, in conjunction with the result
> field from below, to only allow visiting any extension once. It would
> require reserving some value to mean "not visited". Probably zero, so a
> non-zero result would immediately fail the chain; but that would also, I
> think, mean we only support negative values in result as output, mapping
> zero to one.

I've avoided using the struct itself for markup so far.
Ugh, it would also mean that userspace has to sanitize the extension
chain between uses.

> >>> +
> >>> +     while (ext) {
> >>> +             int err;
> >>> +             u64 x;
> >>> +
> >>> +             if (!stackdepth--) /* recursion vs useful flexibility */
> >>> +                     return -EINVAL;
> >>> +
> >>> +             if (get_user(x, &ext->name))
> >>> +                     return -EFAULT;
> >>> +
> >>> +             err = -EINVAL;
> >>> +             if (x < count && tbl[x])
> >>> +                     err = tbl[x](ext, data);
> >>
> >> How about:
> >>
> >>                  put_user(err, &ext->result);
> >>
> >> And:
> >>
> >> struct i915_user_extension {
> >>          __u64 next_extension;
> >>          __u64 name;
> >>          __u32 result;
> >>          __u32 mbz;
> >> };
> >>
> >> So we add the ability for each extension to store its exit code, giving
> >> userspace the opportunity to know which one failed.
> >>
> >> With this I would be satisfied usability is future proof enough.
> > 
> > I'm sorely tempted. The biggest objection I have is this defeats the
> > elegance of a read-only chain. So who would actually use it?
> > 
> > err = gem_context_create_ext(&chain);
> > if (err) {
> >       struct i915_user_extension *ext = (struct i915_user_extension *)chain;
> >       while (ext && !ext->result)
> >               ext = (struct i915_user_extension *)ext->next_extension;
> >       if (ext)
> >               fprintf(stderr, "context creation failed at extension: %lld", ext->name);
> > }
> > 
> > What exactly are they going to do? They are not going to do anything
> > like
> >       while (err) {
> >               ext = first_faulty_ext(&chain);
> >               switch (ext->name) {
> >               case ...:  do_fixup_A(ext);
> >               }
> >               err = gem_context_create_ext(&chain);
> >       }
> > 
> > I'm not really seeing how they benefit over, and above, handling the
> > ioctl error by printing out the entire erroneous struct and chain, and
> > falling back to avoiding that ioctl.
> > 
> > I think what you really want is a per-application/fd debug log, so that
> > we can dump the actual errors as they arise (without leaking them into
> > the general syslog).
> 
> Maybe... could be an extension of the existing problem of "What EINVAL
> do you mean exactly?" indeed.
> 
> I don't see a problem with writing back though?

Writing anything gives me the heebie-jeebies. If we keep it a read-only
struct, we can never be tricked into overwriting something important.

It also makes it harder for userspace to reuse as they have to clear the
result field?
-Chris

* Re: [PATCH 02/13] drm/i915: Introduce the i915_user_extension_method
  2019-03-13 11:21         ` Chris Wilson
@ 2019-03-13 11:35           ` Tvrtko Ursulin
  2019-03-13 11:46             ` Chris Wilson
  0 siblings, 1 reply; 58+ messages in thread
From: Tvrtko Ursulin @ 2019-03-13 11:35 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 13/03/2019 11:21, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-03-13 11:13:10)
>>
>> On 13/03/2019 10:50, Chris Wilson wrote:
>>> Quoting Tvrtko Ursulin (2019-03-08 14:33:02)
>>>>
>>>> On 08/03/2019 14:12, Chris Wilson wrote:
>>>>> +int i915_user_extensions(struct i915_user_extension __user *ext,
>>>>> +                      const i915_user_extension_fn *tbl,
>>>>> +                      unsigned long count,
>>>>> +                      void *data)
>>>>> +{
>>>>> +     unsigned int stackdepth = 512;
>>>>
>>>> I have doubts about usefulness of trying to impose some limit now. And
>>>> also reservations about using the name stack. But both are irrelevant
>>>> implementation details at this stage so meh.
>>>
>>> We need defence against malicious userspace doing
>>>        struct i915_user_extension ext = {
>>>                .next_extension = &ext,
>>>        };
>>> so sadly some limit is required.
>>
>> Oh yes, good point. I wasn't thinking maliciously enough.
>>
>> A possible alternative solution could be, in conjunction with the result
>> field from below, to only allow visiting any extension once. It would
>> require reserving some value to mean "not visited". Probably zero, so a
>> non-zero result would immediately fail the chain; but that would also, I
>> think, mean we only support negative values in result as output, mapping
>> zero to one.
> 
> I've avoided using the struct itself for markup so far.
> Ugh, it would also mean that userspace has to sanitize the extension
> chain between uses.
> 
>>>>> +
>>>>> +     while (ext) {
>>>>> +             int err;
>>>>> +             u64 x;
>>>>> +
>>>>> +             if (!stackdepth--) /* recursion vs useful flexibility */
>>>>> +                     return -EINVAL;
>>>>> +
>>>>> +             if (get_user(x, &ext->name))
>>>>> +                     return -EFAULT;
>>>>> +
>>>>> +             err = -EINVAL;
>>>>> +             if (x < count && tbl[x])
>>>>> +                     err = tbl[x](ext, data);
>>>>
>>>> How about:
>>>>
>>>>                   put_user(err, &ext->result);
>>>>
>>>> And:
>>>>
>>>> struct i915_user_extension {
>>>>           __u64 next_extension;
>>>>           __u64 name;
>>>>           __u32 result;
>>>>           __u32 mbz;
>>>> };
>>>>
> >>> So we add the ability for each extension to store its exit code, giving
> >>> userspace the opportunity to know which one failed.
>>>>
>>>> With this I would be satisfied usability is future proof enough.
>>>
>>> I'm sorely tempted. The biggest objection I have is this defeats the
>>> elegance of a read-only chain. So who would actually use it?
>>>
>>> err = gem_context_create_ext(&chain);
>>> if (err) {
>>>        struct i915_user_extension *ext = (struct i915_user_extension *)chain;
>>>        while (ext && !ext->result)
>>>                ext = (struct i915_user_extension *)ext->next_extension;
>>>        if (ext)
>>>                fprintf(stderr, "context creation failed at extension: %lld", ext->name);
>>> }
>>>
>>> What exactly are they going to do? They are not going to do anything
>>> like
>>>        while (err) {
>>>                ext = first_faulty_ext(&chain);
>>>                switch (ext->name) {
>>>                case ...:  do_fixup_A(ext);
>>>                }
>>>                err = gem_context_create_ext(&chain);
>>>        }
>>>
>>> I'm not really seeing how they benefit over, and above, handling the
>>> ioctl error by printing out the entire erroneous struct and chain, and
>>> falling back to avoiding that ioctl.
>>>
>>> I think what you really want is a per-application/fd debug log, so that
>>> we can dump the actual errors as they arise (without leaking them into
>>> the general syslog).
>>
>> Maybe.. could be an extension of the existing problem of "What EINVAL
>> you mean exactly?" indeed.
>>
>> I don't see a problem with writing back though?
> 
> Writing anything gives me the heebie-jeebies. If we keep it a read-only
> struct, we can never be tricked into overwriting something important.
> 
> It also makes it harder for userspace to reuse as they have to clear the
> result field?

Yeah.. nothing then.

Shall we only reserve some space with a flags field and some rsvd fields
just in case it needs to change/grow?

Regards,

Tvrtko






* Re: [PATCH 02/13] drm/i915: Introduce the i915_user_extension_method
  2019-03-13 11:35           ` Tvrtko Ursulin
@ 2019-03-13 11:46             ` Chris Wilson
  2019-03-13 13:11               ` Tvrtko Ursulin
  0 siblings, 1 reply; 58+ messages in thread
From: Chris Wilson @ 2019-03-13 11:46 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-03-13 11:35:55)
[snip]
> Shall we only reserve some space with a flags field and some rsvd fields
> just in case it needs to change/grow?

The only thing that occurs to me is to exchange the next pointer with a
table of next[] (C++ here we come). But I ask myself, could any
extension like that not be part of the next layer?

That is if any particular extension needs to chain up to more than one
iface, it can call each itself:

struct hypothetical_extension {
	struct i915_user_extension base;

	u64 iface1_extension;
	u64 iface2_extension;
	...
	u64 ifaceN_extension;
}

? So far I haven't thought of anything I can't weasel my way out of by
punting the problem to the caller :)
-Chris

* Re: [PATCH 02/13] drm/i915: Introduce the i915_user_extension_method
  2019-03-13 11:46             ` Chris Wilson
@ 2019-03-13 13:11               ` Tvrtko Ursulin
  2019-03-13 13:14                 ` Chris Wilson
  0 siblings, 1 reply; 58+ messages in thread
From: Tvrtko Ursulin @ 2019-03-13 13:11 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx


On 13/03/2019 11:46, Chris Wilson wrote:
> Quoting Tvrtko Ursulin (2019-03-13 11:35:55)
> [snip]
>> Shall we only reserve some space with a flags field and some rsvd fields
>> just in case it needs to change/grow?
> 
> The only thing that occurs to me is to exchange the next pointer with a
> table of next[] (C++ here we come). But I ask myself, could any
> extension like that not be part of the next layer?
> 
> That is if any particular extension needs to chain up to more than one
> iface, it can call each itself:
> 
> struct hypothetical_extension {
> 	struct i915_user_extension base;
> 
> 	u64 iface1_extension;
> 	u64 iface2_extension;
> 	...
> 	u64 ifaceN_extension;
> }
> 
> ? So far I haven't thought of anything I can't weasel my way out of by
> punting the problem to the caller :)

Just to make sure we are on the same page, I was thinking of:

struct i915_user_extension {
	__u64 next_extension;
	__u64 name;
	__u32 flags;
	__u32 rsvd[7];
};

So we could add things like:

/* Store each extension return code in rsvd[0]. */
#define I915_USER_EXTENSION_STORE_RESULT (1)

/* Only check whether extensions are known by the driver. */
#define I915_USER_EXTENSION_DRY_RUN (2)

And things like that. Because we are putting in a generic extension 
mechanism I am worried that if it itself turns out to have some 
limitation we will not have wiggle room to extend it.
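
Purely as a hypothetical sketch of how the walker might honour such
flags (nothing like this is in the patch; visit_extension() is made
up, and the name is assumed to have been range-checked by the caller
as today):

	static int visit_extension(struct i915_user_extension __user *ext,
				   u64 name, u32 flags,
				   const i915_user_extension_fn *tbl,
				   void *data)
	{
		int err = 0;

		/* DRY_RUN: validate the name only, skip the body */
		if (!(flags & I915_USER_EXTENSION_DRY_RUN))
			err = tbl[name](ext, data);

		if ((flags & I915_USER_EXTENSION_STORE_RESULT) &&
		    put_user(err, &ext->rsvd[0]))
			return -EFAULT;

		return err;
	}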

Regards,

Tvrtko

* Re: [PATCH 02/13] drm/i915: Introduce the i915_user_extension_method
  2019-03-13 13:11               ` Tvrtko Ursulin
@ 2019-03-13 13:14                 ` Chris Wilson
  0 siblings, 0 replies; 58+ messages in thread
From: Chris Wilson @ 2019-03-13 13:14 UTC (permalink / raw)
  To: Tvrtko Ursulin, intel-gfx

Quoting Tvrtko Ursulin (2019-03-13 13:11:09)
> 
> On 13/03/2019 11:46, Chris Wilson wrote:
> > Quoting Tvrtko Ursulin (2019-03-13 11:35:55)
> > [snip]
> >> Shall we only reserve some space with a flags field and some rsvd fields
> >> just in case it needs to change/grow?
> > 
> > The only thing that occurs to me is to exchange the next pointer with a
> > table of next[] (C++ here we come). But I ask myself, could any
> > extension like that not be part of the next layer?
> > 
> > That is if any particular extension needs to chain up to more than one
> > iface, it can call each itself:
> > 
> > struct hypothetical_extension {
> >       struct i915_user_extension base;
> > 
> >       u64 iface1_extension;
> >       u64 iface2_extension;
> >       ...
> >       u64 ifaceN_extension;
> > }
> > 
> > ? So far I haven't thought of anything I can't weasel my way out of by
> > punting the problem to the caller :)
> 
> Just to make sure we are on the same page, I was thinking of:
> 
> struct i915_user_extension {
>         __u64 next_extension;
>         __u64 name;
>         __u32 flags;
>         __u32 rsvd[7];
> };
> 
> So we could add things like:
> 
> /* Store each extension return code in rsvd[0]. */
> #define I915_USER_EXTENSION_STORE_RESULT (1)
> 
> /* Only check whether extensions are known by the driver. */
> #define I915_USER_EXTENSION_DRY_RUN (2)
> 
> And things like that. Because we are putting in a generic extension 
> mechanism I am worried that if it itself turns out to have some 
> limitation we will not have wiggle room to extend it.

u64 next;
u32 name;
u32 flags;
u32 rsvd[4];

Maybe... That's half a 64-byte cacheline.
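
A quick userspace sketch to check that arithmetic, with stdint types
standing in for __u64/__u32 and a made-up struct name:

	#include <stdint.h>

	struct proposed_user_extension {
		uint64_t next;
		uint32_t name;
		uint32_t flags;
		uint32_t rsvd[4];
	};

	/* 8 + 4 + 4 + 16 bytes == 32, half of a 64-byte cacheline */
	_Static_assert(sizeof(struct proposed_user_extension) == 32,
		       "unexpected padding");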
-Chris

end of thread

Thread overview: 58+ messages
2019-03-08 14:12 Home straight for veng, the uAPI wars Chris Wilson
2019-03-08 14:12 ` [PATCH 01/13] drm/i915: Suppress the "Failed to idle" warning for gem_eio Chris Wilson
2019-03-08 14:12 ` [PATCH 02/13] drm/i915: Introduce the i915_user_extension_method Chris Wilson
2019-03-08 14:33   ` Tvrtko Ursulin
2019-03-13 10:50     ` Chris Wilson
2019-03-13 11:13       ` Tvrtko Ursulin
2019-03-13 11:21         ` Chris Wilson
2019-03-13 11:35           ` Tvrtko Ursulin
2019-03-13 11:46             ` Chris Wilson
2019-03-13 13:11               ` Tvrtko Ursulin
2019-03-13 13:14                 ` Chris Wilson
2019-03-08 14:12 ` [PATCH 03/13] drm/i915: Introduce a context barrier callback Chris Wilson
2019-03-08 14:12 ` [PATCH 04/13] drm/i915: Create/destroy VM (ppGTT) for use with contexts Chris Wilson
2019-03-08 15:03   ` Tvrtko Ursulin
2019-03-08 15:35     ` Chris Wilson
2019-03-08 15:41   ` [PATCH v2] " Chris Wilson
2019-03-08 14:12 ` [PATCH 05/13] drm/i915: Extend CONTEXT_CREATE to set parameters upon construction Chris Wilson
2019-03-08 14:12 ` [PATCH 06/13] drm/i915: Allow contexts to share a single timeline across all engines Chris Wilson
2019-03-08 15:56   ` Tvrtko Ursulin
2019-03-08 14:12 ` [PATCH 07/13] drm/i915: Allow userspace to clone contexts on creation Chris Wilson
2019-03-08 16:13   ` Tvrtko Ursulin
2019-03-08 16:34     ` Chris Wilson
2019-03-08 14:12 ` [PATCH 08/13] drm/i915: Allow a context to define its set of engines Chris Wilson
2019-03-08 16:27   ` Tvrtko Ursulin
2019-03-08 16:47     ` Chris Wilson
2019-03-11  9:23       ` Tvrtko Ursulin
2019-03-11  9:45         ` Chris Wilson
2019-03-11 10:12           ` Tvrtko Ursulin
2019-03-11 14:45           ` Chris Wilson
2019-03-11 16:16             ` Tvrtko Ursulin
2019-03-11 16:22               ` Chris Wilson
2019-03-11 16:34                 ` Tvrtko Ursulin
2019-03-11 16:52                   ` Chris Wilson
2019-03-08 14:12 ` [PATCH 09/13] drm/i915: Extend I915_CONTEXT_PARAM_SSEU to support local ctx->engine[] Chris Wilson
2019-03-08 16:31   ` Tvrtko Ursulin
2019-03-08 16:57     ` Chris Wilson
2019-03-11  7:14       ` Tvrtko Ursulin
2019-03-11 10:33         ` Chris Wilson
2019-03-08 17:11     ` Chris Wilson
2019-03-11  7:16       ` Tvrtko Ursulin
2019-03-11 10:31         ` Chris Wilson
2019-03-08 14:12 ` [PATCH 10/13] drm/i915: Load balancing across a virtual engine Chris Wilson
2019-03-11 12:47   ` Tvrtko Ursulin
2019-03-11 13:43     ` Chris Wilson
2019-03-12  7:52   ` Tvrtko Ursulin
2019-03-12  8:56     ` Chris Wilson
2019-03-08 14:12 ` [PATCH 11/13] drm/i915: Extend execution fence to support a callback Chris Wilson
2019-03-11 13:09   ` Tvrtko Ursulin
2019-03-11 14:22     ` Chris Wilson
2019-03-08 14:12 ` [PATCH 12/13] drm/i915/execlists: Virtual engine bonding Chris Wilson
2019-03-11 13:38   ` Tvrtko Ursulin
2019-03-11 14:30     ` Chris Wilson
2019-03-08 14:12 ` [PATCH 13/13] drm/i915: Allow specification of parallel execbuf Chris Wilson
2019-03-11 13:40   ` Tvrtko Ursulin
2019-03-08 14:58 ` ✗ Fi.CI.CHECKPATCH: warning for series starting with [01/13] drm/i915: Suppress the "Failed to idle" warning for gem_eio Patchwork
2019-03-08 15:05 ` ✗ Fi.CI.SPARSE: " Patchwork
2019-03-08 15:19 ` ✗ Fi.CI.BAT: failure " Patchwork
2019-03-08 16:47 ` ✗ Fi.CI.BAT: failure for series starting with [01/13] drm/i915: Suppress the "Failed to idle" warning for gem_eio (rev2) Patchwork
