* [RFC PATCH 0/3] SVM for kbl
@ 2017-01-09 16:52 Mika Kuoppala
  2017-01-09 16:52 ` [RFC PATCH 1/3] drm/i915: Create context desc template when context is created Mika Kuoppala
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Mika Kuoppala @ 2017-01-09 16:52 UTC (permalink / raw)
  To: intel-gfx; +Cc: miku, beignet

Hi,

Now that the blocking problems in the IOMMU layer have been solved
by commits 910170442944e1f8674fd5ddbeeb8ccd1877ea98
and 65ca7f5f7d1cdde6c25172fe6107cd16902f826f, it is possible
to test and experiment with this code on KBL.

I have tried to accommodate all the review feedback that was
given in the previous review round (thanks, reviewers!).
The context is now created by regular means and then promoted to
an SVM context via the context param ioctl, as requested by
Chris Wilson. The hw_id vs pasid question got resolved, as they
are orthogonal things on gen9+. I dropped the BDW support, and I
think SKL should also be excluded before any possible merging.
But I kept it for now in case there are some brave souls who want
to try. Development and bugfixing are now KBL+ only.
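For illustration, the promotion from userspace would look roughly like
this. This is only a sketch: the struct is mirrored locally from the
uapi header, and the actual DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM call on
a device fd is left out.

```c
#include <stdint.h>
#include <string.h>

/* Local mirror of struct drm_i915_gem_context_param; the real
 * definition lives in include/uapi/drm/i915_drm.h. */
struct drm_i915_gem_context_param {
	uint32_t ctx_id;
	uint32_t size;
	uint64_t param;
	uint64_t value;
};

#define I915_CONTEXT_PARAM_SVM 0x6 /* value proposed by this series */

/* Build the setparam argument that promotes an existing context to an
 * SVM context; a caller would hand this to the
 * DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM ioctl on the drm fd. */
static struct drm_i915_gem_context_param
i915_svm_setparam(uint32_t ctx_id)
{
	struct drm_i915_gem_context_param p;

	memset(&p, 0, sizeof(p));
	p.ctx_id = ctx_id;
	p.param = I915_CONTEXT_PARAM_SVM;
	p.value = 1;	/* enable; size must stay 0 per the setparam check */
	return p;
}
```

Note that the kernel rejects the request with -EINVAL if size is
non-zero or the context is already SVM, and there is no demoting the
context afterwards.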

The big open item from the last review round is KMS integration.
I don't have an answer to that at this point. Chris asked if this
should be done by creating regular objects and then mmapping them
for SVM use. Another option could be that, with a new mmap flag,
we could map a range as uncached.

Regardless, for beignet things should already be usable for
experimentation.

I improved the test set, but it is still quite thin. Basic tests
work reliably on KBL. Access to the GTT through SVM doesn't work
(due to the special PTE used to fault on it). SVM is inherently a
ppgtt concept, and the trickery needed to reach GTT space through
it feels awkward. Can anyone think of a use case for it?

The freshest snapshot of this code is at:
https://cgit.freedesktop.org/~miku/drm-intel/log/?h=svm

and tests:
https://cgit.freedesktop.org/~miku/drm-intel/log/?h=svm

Chris already helped to iron out some of the bigger wrinkles, thanks!

Here it is and thanks for any feedback,
-Mika

Jesse Barnes (2):
  drm/i915: IOMMU based SVM implementation v16
  drm/i915: add SVM execbuf ioctl v13

Mika Kuoppala (1):
  drm/i915: Create context desc template when context is created

 drivers/gpu/drm/i915/Kconfig               |   1 +
 drivers/gpu/drm/i915/i915_drv.c            |   1 +
 drivers/gpu/drm/i915/i915_drv.h            |   5 +
 drivers/gpu/drm/i915/i915_gem_context.c    |  96 +++++++++++++++-
 drivers/gpu/drm/i915/i915_gem_context.h    |  30 +++++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 175 +++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_reg.h            |  16 +++
 drivers/gpu/drm/i915/intel_device_info.c   |   6 +
 drivers/gpu/drm/i915/intel_lrc.c           |  32 ++----
 include/uapi/drm/i915_drm.h                |  45 ++++++++
 10 files changed, 384 insertions(+), 23 deletions(-)

-- 
2.7.4

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

* [RFC PATCH 1/3] drm/i915: Create context desc template when context is created
  2017-01-09 16:52 [RFC PATCH 0/3] SVM for kbl Mika Kuoppala
@ 2017-01-09 16:52 ` Mika Kuoppala
  2017-01-09 21:22   ` Chris Wilson
  2017-01-09 16:52 ` [RFC PATCH 2/3] drm/i915: IOMMU based SVM implementation v16 Mika Kuoppala
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 13+ messages in thread
From: Mika Kuoppala @ 2017-01-09 16:52 UTC (permalink / raw)
  To: intel-gfx; +Cc: miku, beignet

Move the invariant parts of the context descriptor setup from
execlist init to context creation. This is advantageous when we
need to create different templates based on the context
parametrization, e.g. for SVM-capable contexts.
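As a sketch of what the template now contains at context-creation time
(bit values taken from the GEN8_CTX_* defines this patch moves into
i915_reg.h; the addressing-mode enum values 0/2 for the 32b/64b legacy
contexts are assumed here, not quoted from the patch):

```c
#include <stdint.h>

/* Bit definitions moved to i915_reg.h by this patch. */
#define GEN8_CTX_VALID			(1 << 0)
#define GEN8_CTX_L3LLC_COHERENT		(1 << 5)
#define GEN8_CTX_PRIVILEGE		(1 << 8)
#define GEN8_CTX_ADDRESSING_MODE_SHIFT	3

/* Assumed enum values for the legacy addressing modes. */
#define INTEL_LEGACY_32B_CONTEXT	0
#define INTEL_LEGACY_64B_CONTEXT	2

/* Compose the invariant part of the context descriptor, mirroring
 * create_default_ctx_desc() from the patch below. */
static uint32_t default_ctx_desc(int is_gen8, int full_48bit_ppgtt)
{
	uint32_t desc = GEN8_CTX_VALID | GEN8_CTX_PRIVILEGE |
		(uint32_t)(full_48bit_ppgtt ? INTEL_LEGACY_64B_CONTEXT :
					      INTEL_LEGACY_32B_CONTEXT)
			<< GEN8_CTX_ADDRESSING_MODE_SHIFT;

	if (is_gen8)
		desc |= GEN8_CTX_L3LLC_COHERENT;

	return desc;
}
```

The per-engine bits previously OR'd in at execlist init collapse into
this per-context template, which is what lets a later patch substitute
a different template for SVM contexts.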

Cc: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_context.c | 18 ++++++++++++++++--
 drivers/gpu/drm/i915/i915_reg.h         | 15 +++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c        | 21 +--------------------
 3 files changed, 32 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 40a6939..dadc845 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -257,6 +257,21 @@ static int assign_hw_id(struct drm_i915_private *dev_priv, unsigned *out)
 	return 0;
 }
 
+static u32 create_default_ctx_desc(const struct drm_i915_private *dev_priv)
+{
+	u32 desc;
+
+	desc = GEN8_CTX_VALID |
+		GEN8_CTX_PRIVILEGE |
+		GEN8_CTX_ADDRESSING_MODE(dev_priv) <<
+		GEN8_CTX_ADDRESSING_MODE_SHIFT;
+
+	if (IS_GEN8(dev_priv))
+		desc |= GEN8_CTX_L3LLC_COHERENT;
+
+	return desc;
+}
+
 static struct i915_gem_context *
 __create_hw_context(struct drm_i915_private *dev_priv,
 		    struct drm_i915_file_private *file_priv)
@@ -330,8 +345,7 @@ __create_hw_context(struct drm_i915_private *dev_priv,
 
 	i915_gem_context_set_bannable(ctx);
 	ctx->ring_size = 4 * PAGE_SIZE;
-	ctx->desc_template = GEN8_CTX_ADDRESSING_MODE(dev_priv) <<
-			     GEN8_CTX_ADDRESSING_MODE_SHIFT;
+	ctx->desc_template = create_default_ctx_desc(dev_priv);
 	ATOMIC_INIT_NOTIFIER_HEAD(&ctx->status_notifier);
 
 	/* GuC requires the ring to be placed above GUC_WOPCM_TOP. If GuC is not
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 00970aa..36f4146 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -3358,11 +3358,26 @@ enum {
 	INTEL_LEGACY_64B_CONTEXT
 };
 
+enum {
+	FAULT_AND_HANG = 0,
+	FAULT_AND_HALT, /* Debug only */
+	FAULT_AND_STREAM,
+	FAULT_AND_CONTINUE /* Unsupported */
+};
+
+#define GEN8_CTX_VALID (1<<0)
+#define GEN8_CTX_FORCE_PD_RESTORE (1<<1)
+#define GEN8_CTX_FORCE_RESTORE (1<<2)
+#define GEN8_CTX_L3LLC_COHERENT (1<<5)
+#define GEN8_CTX_PRIVILEGE (1<<8)
 #define GEN8_CTX_ADDRESSING_MODE_SHIFT 3
 #define GEN8_CTX_ADDRESSING_MODE(dev_priv) (USES_FULL_48BIT_PPGTT(dev_priv) ?\
 				INTEL_LEGACY_64B_CONTEXT : \
 				INTEL_LEGACY_32B_CONTEXT)
 
+#define GEN8_CTX_ID_SHIFT 32
+#define GEN8_CTX_ID_WIDTH 21
+
 #define CHV_CLK_CTL1			_MMIO(0x101100)
 #define VLV_CLK_CTL2			_MMIO(0x101104)
 #define   CLK_CTL2_CZCOUNT_30NS_SHIFT	28
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 6db246a..06322a5 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -190,12 +190,6 @@
 #define CTX_R_PWR_CLK_STATE		0x42
 #define CTX_GPGPU_CSR_BASE_ADDRESS	0x44
 
-#define GEN8_CTX_VALID (1<<0)
-#define GEN8_CTX_FORCE_PD_RESTORE (1<<1)
-#define GEN8_CTX_FORCE_RESTORE (1<<2)
-#define GEN8_CTX_L3LLC_COHERENT (1<<5)
-#define GEN8_CTX_PRIVILEGE (1<<8)
-
 #define ASSIGN_CTX_REG(reg_state, pos, reg, val) do { \
 	(reg_state)[(pos)+0] = i915_mmio_reg_offset(reg); \
 	(reg_state)[(pos)+1] = (val); \
@@ -212,14 +206,6 @@
 	reg_state[CTX_PDP0_LDW + 1] = lower_32_bits(px_dma(&ppgtt->pml4)); \
 } while (0)
 
-enum {
-	FAULT_AND_HANG = 0,
-	FAULT_AND_HALT, /* Debug only */
-	FAULT_AND_STREAM,
-	FAULT_AND_CONTINUE /* Unsupported */
-};
-#define GEN8_CTX_ID_SHIFT 32
-#define GEN8_CTX_ID_WIDTH 21
 #define GEN8_CTX_RCS_INDIRECT_CTX_OFFSET_DEFAULT	0x17
 #define GEN9_CTX_RCS_INDIRECT_CTX_OFFSET_DEFAULT	0x26
 
@@ -276,11 +262,6 @@ logical_ring_init_platform_invariants(struct intel_engine_cs *engine)
 		IS_BXT_REVID(dev_priv, 0, BXT_REVID_A1) &&
 		(engine->id == VCS || engine->id == VCS2);
 
-	engine->ctx_desc_template = GEN8_CTX_VALID;
-	if (IS_GEN8(dev_priv))
-		engine->ctx_desc_template |= GEN8_CTX_L3LLC_COHERENT;
-	engine->ctx_desc_template |= GEN8_CTX_PRIVILEGE;
-
 	/* TODO: WaDisableLiteRestore when we start using semaphore
 	 * signalling between Command Streamers */
 	/* ring->ctx_desc_template |= GEN8_CTX_FORCE_RESTORE; */
@@ -319,7 +300,7 @@ intel_lr_context_descriptor_update(struct i915_gem_context *ctx,
 
 	BUILD_BUG_ON(MAX_CONTEXT_HW_ID > (1<<GEN8_CTX_ID_WIDTH));
 
-	desc = ctx->desc_template;				/* bits  3-4  */
+	desc = ctx->desc_template;				/* bits  0-11 */
 	desc |= engine->ctx_desc_template;			/* bits  0-11 */
 	desc |= i915_ggtt_offset(ce->state) + LRC_PPHWSP_PN * PAGE_SIZE;
 								/* bits 12-31 */
-- 
2.7.4

* [RFC PATCH 2/3] drm/i915: IOMMU based SVM implementation v16
  2017-01-09 16:52 [RFC PATCH 0/3] SVM for kbl Mika Kuoppala
  2017-01-09 16:52 ` [RFC PATCH 1/3] drm/i915: Create context desc template when context is created Mika Kuoppala
@ 2017-01-09 16:52 ` Mika Kuoppala
  2017-01-09 21:18   ` Chris Wilson
  2017-01-09 16:52 ` [RFC PATCH 3/3] drm/i915: add SVM execbuf ioctl v13 Mika Kuoppala
  2017-01-09 17:54 ` ✗ Fi.CI.BAT: failure for SVM for kbl Patchwork
  3 siblings, 1 reply; 13+ messages in thread
From: Mika Kuoppala @ 2017-01-09 16:52 UTC (permalink / raw)
  To: intel-gfx
  Cc: David Woodhouse, Daniel Vetter, Jesse Barnes, miku, beignet,
	David Woodhouse

From: Jesse Barnes <jbarnes@virtuousgeek.org>

Use David's new IOMMU layer functions for supporting SVM in i915.

TODO:
  error record collection for failing SVM contexts
  callback handling for fatal faults
  scheduling

v2: integrate David's core IOMMU support
    make sure we don't clobber the PASID in the context reg state
v3: fixup for intel-svm.h changes (David)
v4: use fault & halt for now (Jesse)
    fix ring free in error path on context alloc (Julia)
v5: update with new callback struct (Jesse)
v6: fix init svm check per new IOMMU code (Jesse)
v7: drop debug code and obsolete i915_svm.c file (Jesse)
v8: fix !CONFIG_INTEL_IOMMU_SVM init stub (Jesse)
v9: update to new execlist and reg handling bits (Jesse)
    context teardown fix (lrc deferred alloc vs teardown race?) (Jesse)
    check for SVM availability at context create (Jesse)
v10: intel_context_svm_init/fini & rebase
v11: move context specific stuff to i915_gem_context
v12: move addressing to context descriptor
v13: strip out workqueue and mm notifiers
v14: remove fault cb
v15: streamline execlist context pdps update, add asserts
v16: introduce HAS_SVM

Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Cc: David Woodhouse <dwmw2@infradead.org>
Signed-off-by: David Woodhouse <David.Woodhouse@intel.com> (v3)
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> (v9)
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h          |  3 ++
 drivers/gpu/drm/i915/i915_gem_context.c  | 78 ++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_context.h  | 30 ++++++++++++
 drivers/gpu/drm/i915/i915_reg.h          |  1 +
 drivers/gpu/drm/i915/intel_device_info.c |  6 +++
 drivers/gpu/drm/i915/intel_lrc.c         | 11 ++++-
 include/uapi/drm/i915_drm.h              |  1 +
 7 files changed, 129 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 52d01be..ca1eaaf 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -39,6 +39,7 @@
 #include <linux/backlight.h>
 #include <linux/hashtable.h>
 #include <linux/intel-iommu.h>
+#include <linux/intel-svm.h>
 #include <linux/kref.h>
 #include <linux/pm_qos.h>
 #include <linux/reservation.h>
@@ -787,6 +788,7 @@ struct intel_csr {
 	func(has_fpga_dbg); \
 	func(has_full_ppgtt); \
 	func(has_full_48bit_ppgtt); \
+	func(has_svm); \
 	func(has_gmbus_irq); \
 	func(has_gmch_display); \
 	func(has_guc); \
@@ -2784,6 +2786,7 @@ intel_info(const struct drm_i915_private *dev_priv)
 #define USES_PPGTT(dev_priv)		(i915.enable_ppgtt)
 #define USES_FULL_PPGTT(dev_priv)	(i915.enable_ppgtt >= 2)
 #define USES_FULL_48BIT_PPGTT(dev_priv)	(i915.enable_ppgtt == 3)
+#define HAS_SVM(dev_priv)		((dev_priv)->info.has_svm)
 
 #define HAS_OVERLAY(dev_priv)		 ((dev_priv)->info.has_overlay)
 #define OVERLAY_NEEDS_PHYSICAL(dev_priv) \
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index dadc845..c0f6c9e 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -134,6 +134,16 @@ static int get_context_size(struct drm_i915_private *dev_priv)
 	return ret;
 }
 
+static void i915_gem_context_svm_fini(struct i915_gem_context *ctx)
+{
+	struct device *dev = &ctx->i915->drm.pdev->dev;
+
+	GEM_BUG_ON(!ctx->task);
+
+	intel_svm_unbind_mm(dev, ctx->pasid);
+	put_task_struct(ctx->task);
+}
+
 void i915_gem_context_free(struct kref *ctx_ref)
 {
 	struct i915_gem_context *ctx = container_of(ctx_ref, typeof(*ctx), ref);
@@ -143,6 +153,9 @@ void i915_gem_context_free(struct kref *ctx_ref)
 	trace_i915_context_free(ctx);
 	GEM_BUG_ON(!i915_gem_context_is_closed(ctx));
 
+	if (i915_gem_context_is_svm(ctx))
+		i915_gem_context_svm_fini(ctx);
+
 	i915_ppgtt_put(ctx->ppgtt);
 
 	for (i = 0; i < I915_NUM_ENGINES; i++) {
@@ -272,6 +285,61 @@ static u32 create_default_ctx_desc(const struct drm_i915_private *dev_priv)
 	return desc;
 }
 
+static u32 create_svm_ctx_desc(void)
+{
+	/*
+	 * Switch to stream once we have a scheduler and can
+	 * re-submit contexts.
+	 */
+	return GEN8_CTX_VALID |
+		INTEL_ADVANCED_CONTEXT << GEN8_CTX_ADDRESSING_MODE_SHIFT |
+		FAULT_AND_HALT << GEN8_CTX_FAULT_SHIFT;
+}
+
+static int i915_gem_context_svm_init(struct i915_gem_context *ctx)
+{
+	struct device *dev = &ctx->i915->drm.pdev->dev;
+	int ret;
+
+	GEM_BUG_ON(ctx->task);
+
+	if (!HAS_SVM(ctx->i915))
+		return -ENODEV;
+
+	get_task_struct(current);
+
+	ret = intel_svm_bind_mm(dev, &ctx->pasid, 0, NULL);
+	if (ret) {
+		DRM_DEBUG_DRIVER("pasid alloc fail: %d\n", ret);
+		put_task_struct(current);
+		return ret;
+	}
+
+	ctx->task = current;
+	i915_gem_context_set_svm(ctx);
+
+	return 0;
+}
+
+static int i915_gem_context_enable_svm(struct i915_gem_context *ctx)
+{
+	int ret;
+
+	if (!HAS_SVM(ctx->i915))
+		return -ENODEV;
+
+	if (i915_gem_context_is_svm(ctx))
+		return -EINVAL;
+
+	ret = i915_gem_context_svm_init(ctx);
+	if (ret)
+		return ret;
+
+	ctx->desc_template = create_svm_ctx_desc();
+
+	return 0;
+}
+
 static struct i915_gem_context *
 __create_hw_context(struct drm_i915_private *dev_priv,
 		    struct drm_i915_file_private *file_priv)
@@ -1071,6 +1139,9 @@ int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
 	case I915_CONTEXT_PARAM_BANNABLE:
 		args->value = i915_gem_context_is_bannable(ctx);
 		break;
+	case I915_CONTEXT_PARAM_SVM:
+		args->value = i915_gem_context_is_svm(ctx);
+		break;
 	default:
 		ret = -EINVAL;
 		break;
@@ -1128,6 +1199,13 @@ int i915_gem_context_setparam_ioctl(struct drm_device *dev, void *data,
 		else
 			i915_gem_context_clear_bannable(ctx);
 		break;
+	case I915_CONTEXT_PARAM_SVM:
+		/* There is no coming back once svm is enabled */
+		if (args->value && !args->size)
+			ret = i915_gem_context_enable_svm(ctx);
+		else
+			ret = -EINVAL;
+		break;
 	default:
 		ret = -EINVAL;
 		break;
diff --git a/drivers/gpu/drm/i915/i915_gem_context.h b/drivers/gpu/drm/i915/i915_gem_context.h
index 0ac750b..2514abb 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.h
+++ b/drivers/gpu/drm/i915/i915_gem_context.h
@@ -108,6 +108,7 @@ struct i915_gem_context {
 #define CONTEXT_BANNABLE		3
 #define CONTEXT_BANNED			4
 #define CONTEXT_FORCE_SINGLE_SUBMISSION	5
+#define CONTEXT_SVM			6
 
 	/**
 	 * @hw_id: - unique identifier for the context
@@ -140,6 +141,25 @@ struct i915_gem_context {
 	 */
 	int priority;
 
+	/**
+	 * @pasid: process address space identifier
+	 *
+	 * Unique identifier for the shared address space with cpu.
+	 * Used by the gpu to associate context's (ppgtt) address
+	 * space with the corresponding process's address space for
+	 * Shared Virtual Memory (SVM). 20 bits.
+	 */
+	u32 pasid;
+
+	/**
+	 * @task: user space task struct for this context
+	 *
+	 * If this is svm context, @task is the corresponding
+	 * user space process with which we share the vm. See
+	 * @pasid.
+	 */
+	struct task_struct *task;
+
 	/** ggtt_alignment: alignment restriction for context objects */
 	u32 ggtt_alignment;
 	/** ggtt_offset_bias: placement restriction for context objects */
@@ -241,6 +261,16 @@ static inline void i915_gem_context_set_force_single_submission(struct i915_gem_
 	__set_bit(CONTEXT_FORCE_SINGLE_SUBMISSION, &ctx->flags);
 }
 
+static inline bool i915_gem_context_is_svm(const struct i915_gem_context *ctx)
+{
+	return test_bit(CONTEXT_SVM, &ctx->flags);
+}
+
+static inline void i915_gem_context_set_svm(struct i915_gem_context *ctx)
+{
+	__set_bit(CONTEXT_SVM, &ctx->flags);
+}
+
 static inline bool i915_gem_context_is_default(const struct i915_gem_context *c)
 {
 	return c->user_handle == DEFAULT_CONTEXT_HANDLE;
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 36f4146..eff0eef 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -3375,6 +3375,7 @@ enum {
 				INTEL_LEGACY_64B_CONTEXT : \
 				INTEL_LEGACY_32B_CONTEXT)
 
+#define GEN8_CTX_FAULT_SHIFT 6
 #define GEN8_CTX_ID_SHIFT 32
 #define GEN8_CTX_ID_WIDTH 21
 
diff --git a/drivers/gpu/drm/i915/intel_device_info.c b/drivers/gpu/drm/i915/intel_device_info.c
index f642f6d..2878bd3 100644
--- a/drivers/gpu/drm/i915/intel_device_info.c
+++ b/drivers/gpu/drm/i915/intel_device_info.c
@@ -414,6 +414,10 @@ void intel_device_info_runtime_init(struct drm_i915_private *dev_priv)
 	if (IS_BXT_REVID(dev_priv, 0, BXT_REVID_A1))
 		info->has_snoop = false;
 
+	info->has_svm = INTEL_GEN(dev_priv) >= 9 &&
+		info->has_full_48bit_ppgtt &&
+		intel_svm_available(&dev_priv->drm.pdev->dev);
+
 	DRM_DEBUG_DRIVER("slice mask: %04x\n", info->sseu.slice_mask);
 	DRM_DEBUG_DRIVER("slice total: %u\n", hweight8(info->sseu.slice_mask));
 	DRM_DEBUG_DRIVER("subslice total: %u\n",
@@ -429,4 +433,6 @@ void intel_device_info_runtime_init(struct drm_i915_private *dev_priv)
 			 info->sseu.has_subslice_pg ? "y" : "n");
 	DRM_DEBUG_DRIVER("has EU power gating: %s\n",
 			 info->sseu.has_eu_pg ? "y" : "n");
+	DRM_DEBUG_DRIVER("has svm: %s\n",
+			 info->has_svm ? "y" : "n");
 }
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 06322a5..13c9c56 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -209,6 +209,13 @@
 #define GEN8_CTX_RCS_INDIRECT_CTX_OFFSET_DEFAULT	0x17
 #define GEN9_CTX_RCS_INDIRECT_CTX_OFFSET_DEFAULT	0x26
 
+#define ASSIGN_CTX_SVM(reg_state, ctx, engine) do { \
+		ASSIGN_CTX_REG(reg_state, CTX_PDP0_UDW, \
+			       GEN8_RING_PDP_UDW((engine), 0), 0); \
+		ASSIGN_CTX_REG(reg_state, CTX_PDP0_LDW, \
+			       GEN8_RING_PDP_LDW((engine), 0), (ctx)->pasid); \
+} while (0)
+
 /* Typical size of the average request (2 pipecontrols and a MI_BB) */
 #define EXECLISTS_REQUEST_SIZE 64 /* bytes */
 
@@ -2084,7 +2091,9 @@ static void execlists_init_reg_state(u32 *reg_state,
 	ASSIGN_CTX_REG(reg_state, CTX_PDP0_LDW, GEN8_RING_PDP_LDW(engine, 0),
 		       0);
 
-	if (USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
+	if (i915_gem_context_is_svm(ctx)) {
+		ASSIGN_CTX_SVM(reg_state, ctx, engine);
+	} else if (USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
 		/* 64b PPGTT (48bit canonical)
 		 * PDP0_DESCRIPTOR contains the base address to PML4 and
 		 * other PDP Descriptors are ignored.
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index da32c2f..5b06ab1 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -1227,6 +1227,7 @@ struct drm_i915_gem_context_param {
 #define I915_CONTEXT_PARAM_GTT_SIZE	0x3
 #define I915_CONTEXT_PARAM_NO_ERROR_CAPTURE	0x4
 #define I915_CONTEXT_PARAM_BANNABLE	0x5
+#define I915_CONTEXT_PARAM_SVM		0x6
 	__u64 value;
 };
 
-- 
2.7.4

* [RFC PATCH 3/3] drm/i915: add SVM execbuf ioctl v13
  2017-01-09 16:52 [RFC PATCH 0/3] SVM for kbl Mika Kuoppala
  2017-01-09 16:52 ` [RFC PATCH 1/3] drm/i915: Create context desc template when context is created Mika Kuoppala
  2017-01-09 16:52 ` [RFC PATCH 2/3] drm/i915: IOMMU based SVM implementation v16 Mika Kuoppala
@ 2017-01-09 16:52 ` Mika Kuoppala
  2017-01-09 21:09   ` Chris Wilson
  2017-01-09 17:54 ` ✗ Fi.CI.BAT: failure for SVM for kbl Patchwork
  3 siblings, 1 reply; 13+ messages in thread
From: Mika Kuoppala @ 2017-01-09 16:52 UTC (permalink / raw)
  To: intel-gfx; +Cc: Daniel Vetter, Jesse Barnes, miku, beignet

From: Jesse Barnes <jbarnes@virtuousgeek.org>

We just need to pass in an address to execute and some flags, since we
don't have to worry about buffer relocation or any of the other usual
stuff. The ioctl takes in a fence and returns a fence to be used for
synchronization.
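Roughly, userspace would drive this as below. A sketch only: the uapi
struct is mirrored locally from the patch, the argument checks
reproduce the kernel-side validation in i915_gem_exec_mm(), and the
actual DRM_IOCTL_I915_GEM_EXEC_MM call is omitted.

```c
#include <stdint.h>
#include <string.h>

/* Local mirror of the proposed uapi; the real definition is added to
 * include/uapi/drm/i915_drm.h by this patch. */
struct drm_i915_exec_mm {
	uint64_t batch_ptr;
	uint32_t ctx_id;
	uint32_t ring_id;
#define I915_EXEC_MM_FENCE_OUT (1 << 0)
#define I915_EXEC_MM_FENCE_IN  (1 << 1)
	uint32_t flags;
	uint32_t out_fence_fd;
	uint32_t in_fence_fd;
	uint32_t pad;
};

/* The same sanity checks i915_gem_exec_mm() applies before submission:
 * dword-aligned batch pointer, known flags only, zero padding. */
static int exec_mm_args_valid(const struct drm_i915_exec_mm *e)
{
	if (e->batch_ptr & 3)
		return 0;
	if (e->flags & ~(uint32_t)(I915_EXEC_MM_FENCE_OUT |
				   I915_EXEC_MM_FENCE_IN))
		return 0;
	return e->pad == 0;
}

/* Fill an exec_mm request for a CPU pointer in the SVM context's
 * shared address space. */
static struct drm_i915_exec_mm
make_exec_mm(uint64_t batch_ptr, uint32_t ctx_id, uint32_t flags)
{
	struct drm_i915_exec_mm e;

	memset(&e, 0, sizeof(e));
	e.batch_ptr = batch_ptr;
	e.ctx_id = ctx_id;
	e.flags = flags;
	return e;
}
```

With I915_EXEC_MM_FENCE_OUT set, the ioctl fills out_fence_fd with a
sync_file fd to wait on; with I915_EXEC_MM_FENCE_IN set, in_fence_fd
must hold a fence the GPU waits for before running the batch.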

v2: add a request after batch submission (Jesse)
v3: add a flag for fence creation (Chris)
v4: add CLOEXEC flag (Kristian)
    add non-RCS ring support (Jesse)
v5: update for request alloc change (Jesse)
v6: new sync file interface, error paths, request breadcrumbs
v7: always CLOEXEC for sync_file_install
v8: rebase on new sync file api
v9: rework on top of fence requests and sync_file
v10: take fence ref for sync_file (Chris)
     use correct flush (Chris)
     limit exec on rcs
v11: allow exec on all engines
v12: dma_fence_put/get, fix error path on getting sync_file
v13: add in fences, removed superfluous dma_fence_put/get

Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> (v5)
Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
---
 drivers/gpu/drm/i915/Kconfig               |   1 +
 drivers/gpu/drm/i915/i915_drv.c            |   1 +
 drivers/gpu/drm/i915/i915_drv.h            |   2 +
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 175 +++++++++++++++++++++++++++++
 include/uapi/drm/i915_drm.h                |  44 ++++++++
 5 files changed, 223 insertions(+)

diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
index 183f5dc..387990d 100644
--- a/drivers/gpu/drm/i915/Kconfig
+++ b/drivers/gpu/drm/i915/Kconfig
@@ -8,6 +8,7 @@ config DRM_I915
 	# the shmem_readpage() which depends upon tmpfs
 	select SHMEM
 	select TMPFS
+	select SYNC_FILE
 	select DRM_KMS_HELPER
 	select DRM_PANEL
 	select DRM_MIPI_DSI
diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
index 4d22b4b..2276ba6 100644
--- a/drivers/gpu/drm/i915/i915_drv.c
+++ b/drivers/gpu/drm/i915/i915_drv.c
@@ -2567,6 +2567,7 @@ static const struct drm_ioctl_desc i915_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_GETPARAM, i915_gem_context_getparam_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_SETPARAM, i915_gem_context_setparam_ioctl, DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_PERF_OPEN, i915_perf_open_ioctl, DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(I915_GEM_EXEC_MM, i915_gem_exec_mm, DRM_RENDER_ALLOW),
 };
 
 static struct drm_driver driver = {
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index ca1eaaf..8436ea3 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3094,6 +3094,8 @@ int i915_gem_execbuffer(struct drm_device *dev, void *data,
 			struct drm_file *file_priv);
 int i915_gem_execbuffer2(struct drm_device *dev, void *data,
 			 struct drm_file *file_priv);
+int i915_gem_exec_mm(struct drm_device *dev, void *data,
+		     struct drm_file *file);
 int i915_gem_busy_ioctl(struct drm_device *dev, void *data,
 			struct drm_file *file_priv);
 int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data,
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index a5fe299..bb51ce4 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -29,6 +29,7 @@
 #include <linux/dma_remapping.h>
 #include <linux/reservation.h>
 #include <linux/uaccess.h>
+#include <linux/sync_file.h>
 
 #include <drm/drmP.h>
 #include <drm/i915_drm.h>
@@ -1984,3 +1985,177 @@ i915_gem_execbuffer2(struct drm_device *dev, void *data,
 	drm_free_large(exec2_list);
 	return ret;
 }
+
+static struct intel_engine_cs *
+exec_mm_select_engine(const struct drm_i915_private *dev_priv,
+		      const struct drm_i915_exec_mm *exec_mm)
+{
+	const unsigned int user_ring_id = exec_mm->ring_id;
+	struct intel_engine_cs *engine;
+
+	if (user_ring_id > I915_USER_RINGS) {
+		DRM_DEBUG("exec_mm with unknown ring: %u\n", user_ring_id);
+		return NULL;
+	}
+
+	engine = dev_priv->engine[user_ring_map[user_ring_id]];
+
+	if (!engine) {
+		DRM_DEBUG("exec_mm with invalid engine: %u\n", user_ring_id);
+		return NULL;
+	}
+
+	return engine;
+}
+
+static int do_exec_mm(struct drm_i915_exec_mm *exec_mm,
+		      struct drm_i915_gem_request *req,
+		      const u32 flags)
+{
+	struct sync_file *out_fence = NULL;
+	struct dma_fence *in_fence = NULL;
+	int out_fence_fd = -1;
+	int ret;
+
+	if (flags & I915_EXEC_MM_FENCE_IN) {
+		in_fence = sync_file_get_fence(exec_mm->in_fence_fd);
+		if (!in_fence)
+			return -EINVAL;
+
+		ret = i915_gem_request_await_dma_fence(req, in_fence);
+		if (ret < 0)
+			goto err_out;
+	}
+
+	if (flags & I915_EXEC_MM_FENCE_OUT) {
+		out_fence = sync_file_create(&req->fence);
+		if (!out_fence) {
+			DRM_DEBUG_DRIVER("sync_file_create failed\n");
+			ret = -ENOMEM;
+			goto err_out;
+		}
+
+		out_fence_fd = get_unused_fd_flags(O_CLOEXEC);
+		if (out_fence_fd < 0) {
+			ret = out_fence_fd;
+			out_fence_fd = -1;
+			goto err_out;
+		}
+
+		fd_install(out_fence_fd, out_fence->file);
+		exec_mm->out_fence_fd = out_fence_fd;
+	}
+
+	ret = req->engine->emit_flush(req, EMIT_INVALIDATE);
+	if (ret) {
+		DRM_DEBUG_DRIVER("engine flush failed: %d\n", ret);
+		goto err_out;
+	}
+
+	ret = req->engine->emit_bb_start(req, exec_mm->batch_ptr, 0, 0);
+	if (ret) {
+		DRM_DEBUG_DRIVER("engine dispatch execbuf failed: %d\n", ret);
+		goto err_out;
+	}
+
+	return 0;
+
+err_out:
+	if (out_fence_fd != -1)
+		put_unused_fd(out_fence_fd);
+
+	if (out_fence)
+		fput(out_fence->file);
+
+	if (in_fence)
+		dma_fence_put(in_fence);
+
+	return ret;
+}
+
+int i915_gem_exec_mm(struct drm_device *dev, void *data, struct drm_file *file)
+{
+	struct drm_i915_private *dev_priv = to_i915(dev);
+	struct drm_i915_exec_mm *exec_mm = data;
+	struct intel_engine_cs *engine;
+	struct i915_gem_context *ctx;
+	struct drm_i915_gem_request *req;
+	const u32 ctx_id = exec_mm->ctx_id;
+	const u32 flags = exec_mm->flags;
+	int ret;
+
+	if (exec_mm->batch_ptr & 3) {
+		DRM_DEBUG_DRIVER("ptr not 4-byte aligned %llu\n",
+				 exec_mm->batch_ptr);
+		return -EINVAL;
+	}
+
+	if (flags & ~(I915_EXEC_MM_FENCE_OUT | I915_EXEC_MM_FENCE_IN)) {
+		DRM_DEBUG_DRIVER("bad flags: 0x%08x\n", flags);
+		return -EINVAL;
+	}
+
+	if (exec_mm->pad != 0) {
+		DRM_DEBUG_DRIVER("bad pad: 0x%08x\n", exec_mm->pad);
+		return -EINVAL;
+	}
+
+	if (file == NULL)
+		return -EINVAL;
+
+	engine = exec_mm_select_engine(dev_priv, exec_mm);
+	if (!engine)
+		return -EINVAL;
+
+	intel_runtime_pm_get(dev_priv);
+
+	ret = i915_mutex_lock_interruptible(dev);
+	if (ret) {
+		DRM_ERROR("mutex interrupted\n");
+		goto pm_put;
+	}
+
+	ctx = i915_gem_validate_context(dev, file, engine, ctx_id);
+	if (IS_ERR(ctx)) {
+		ret = PTR_ERR(ctx);
+		DRM_DEBUG_DRIVER("couldn't get context\n");
+		goto unlock;
+	}
+
+	if (!i915_gem_context_is_svm(ctx)) {
+		ret = -EINVAL;
+		DRM_DEBUG_DRIVER("context is not SVM enabled\n");
+		goto unlock;
+	}
+
+	i915_gem_context_get(ctx);
+
+	req = i915_gem_request_alloc(engine, ctx);
+	if (IS_ERR(req)) {
+		ret = PTR_ERR(req);
+		goto ctx_put;
+	}
+
+	ret = i915_gem_request_add_to_client(req, file);
+	if (ret) {
+		DRM_DEBUG_DRIVER("failed to add request to client, %d\n", ret);
+		goto add_request;
+	}
+
+	/* If we fail here, we still need to submit the req */
+	ret = do_exec_mm(exec_mm, req, flags);
+
+add_request:
+	i915_add_request(req);
+
+ctx_put:
+	i915_gem_context_put(ctx);
+
+unlock:
+	mutex_unlock(&dev->struct_mutex);
+
+pm_put:
+	intel_runtime_pm_put(dev_priv);
+
+	return ret;
+}
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 5b06ab1..843f943 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -259,6 +259,7 @@ typedef struct _drm_i915_sarea {
 #define DRM_I915_GEM_CONTEXT_GETPARAM	0x34
 #define DRM_I915_GEM_CONTEXT_SETPARAM	0x35
 #define DRM_I915_PERF_OPEN		0x36
+#define DRM_I915_GEM_EXEC_MM		0x37
 
 #define DRM_IOCTL_I915_INIT		DRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT, drm_i915_init_t)
 #define DRM_IOCTL_I915_FLUSH		DRM_IO ( DRM_COMMAND_BASE + DRM_I915_FLUSH)
@@ -313,6 +314,7 @@ typedef struct _drm_i915_sarea {
 #define DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_GETPARAM, struct drm_i915_gem_context_param)
 #define DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_SETPARAM, struct drm_i915_gem_context_param)
 #define DRM_IOCTL_I915_PERF_OPEN	DRM_IOW(DRM_COMMAND_BASE + DRM_I915_PERF_OPEN, struct drm_i915_perf_open_param)
+#define DRM_IOCTL_I915_GEM_EXEC_MM			DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_EXEC_MM, struct drm_i915_exec_mm)
 
 /* Allow drivers to submit batchbuffers directly to hardware, relying
  * on the security mechanisms provided by hardware.
@@ -1363,6 +1365,48 @@ enum drm_i915_perf_record_type {
 	DRM_I915_PERF_RECORD_MAX /* non-ABI */
 };
 
+/**
+ * drm_i915_exec_mm - shared address space execbuf
+ * @batch_ptr: address of batch buffer (in context's CPU address space)
+ * @ctx_id: context to use for execution, must be svm enabled
+ * @ring_id: ring to which this context will be submitted, see execbuffer2
+ * @flags: used to signal if in/out fences should be used
+ * @out_fence_fd: returned fence handle of out fence you can wait on
+ * @in_fence_fd: given fence handle of fence the gpu will wait for
+ * @pad: padding, must be zero
+ *
+ * This simplified execbuf just executes an MI_BATCH_BUFFER_START at
+ * @batch_ptr using @ctx_id as the context.  The context will indicate
+ * which address space the @batch_ptr will use.
+ *
+ * Note @batch_ptr must be dword aligned.
+ *
+ * By default, the kernel will simply execute the address given on the GPU.
+ *
+ * If the %I915_EXEC_MM_FENCE_OUT flag is passed in the @flags field however,
+ * the kernel will return a sync_file (dma_fence) object in @out_fence_fd for
+ * the caller to use to synchronize execution of the passed batch.
+ *
+ * If the %I915_EXEC_MM_FENCE_IN flag is passed in the @flags field,
+ * the kernel will wait for the fence (dma_fence) object passed in
+ * @in_fence_fd to complete before submitting the batch to the GPU.
+ *
+ */
+struct drm_i915_exec_mm {
+	__u64 batch_ptr;
+	__u32 ctx_id;
+	__u32 ring_id; /* see execbuffer2 ring flags */
+
+#define I915_EXEC_MM_FENCE_OUT (1<<0)
+#define I915_EXEC_MM_FENCE_IN  (1<<1)
+	__u32 flags;
+
+	__u32 out_fence_fd;
+	__u32 in_fence_fd;
+
+	__u32 pad;
+};
+
 #if defined(__cplusplus)
 }
 #endif
-- 
2.7.4

* ✗ Fi.CI.BAT: failure for SVM for kbl
  2017-01-09 16:52 [RFC PATCH 0/3] SVM for kbl Mika Kuoppala
                   ` (2 preceding siblings ...)
  2017-01-09 16:52 ` [RFC PATCH 3/3] drm/i915: add SVM execbuf ioctl v13 Mika Kuoppala
@ 2017-01-09 17:54 ` Patchwork
  3 siblings, 0 replies; 13+ messages in thread
From: Patchwork @ 2017-01-09 17:54 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-gfx

== Series Details ==

Series: SVM for kbl
URL   : https://patchwork.freedesktop.org/series/17710/
State : failure

== Summary ==

Series 17710v1 SVM for kbl
https://patchwork.freedesktop.org/api/1.0/series/17710/revisions/1/mbox/

Test kms_force_connector_basic:
        Subgroup force-connector-state:
                dmesg-warn -> SKIP       (fi-snb-2520m)
        Subgroup prune-stale-modes:
                skip       -> PASS       (fi-ivb-3520m)
Test pm_rpm:
        Subgroup basic-pci-d3-state:
                pass       -> INCOMPLETE (fi-byt-n2820)

fi-bdw-5557u     total:246  pass:232  dwarn:0   dfail:0   fail:0   skip:14 
fi-bsw-n3050     total:246  pass:207  dwarn:0   dfail:0   fail:0   skip:39 
fi-bxt-j4205     total:246  pass:224  dwarn:0   dfail:0   fail:0   skip:22 
fi-byt-j1900     total:246  pass:219  dwarn:0   dfail:0   fail:0   skip:27 
fi-byt-n2820     total:214  pass:182  dwarn:0   dfail:0   fail:0   skip:31 
fi-hsw-4770      total:246  pass:227  dwarn:0   dfail:0   fail:0   skip:19 
fi-hsw-4770r     total:246  pass:227  dwarn:0   dfail:0   fail:0   skip:19 
fi-ivb-3520m     total:246  pass:225  dwarn:0   dfail:0   fail:0   skip:21 
fi-ivb-3770      total:246  pass:225  dwarn:0   dfail:0   fail:0   skip:21 
fi-kbl-7500u     total:246  pass:225  dwarn:0   dfail:0   fail:0   skip:21 
fi-skl-6260u     total:246  pass:233  dwarn:0   dfail:0   fail:0   skip:13 
fi-skl-6700hq    total:246  pass:226  dwarn:0   dfail:0   fail:0   skip:20 
fi-skl-6700k     total:246  pass:222  dwarn:3   dfail:0   fail:0   skip:21 
fi-skl-6770hq    total:246  pass:233  dwarn:0   dfail:0   fail:0   skip:13 
fi-snb-2520m     total:246  pass:214  dwarn:0   dfail:0   fail:0   skip:32 
fi-snb-2600      total:246  pass:214  dwarn:0   dfail:0   fail:0   skip:32 
fi-bxt-t5700 failed to collect. IGT log at Patchwork_3456/fi-bxt-t5700/igt.log

9ea6a075c23ea914695d57944c0e74cff0c6bff4 drm-tip: 2017y-01m-09d-16h-11m-21s UTC integration manifest
b3f1e06 drm/i915: add SVM execbuf ioctl v13
36cada1 drm/i915: IOMMU based SVM implementation v16
082e27d drm/i915: Create context desc template when context is created

== Logs ==

For more details see: https://intel-gfx-ci.01.org/CI/Patchwork_3456/


* Re: [RFC PATCH 3/3] drm/i915: add SVM execbuf ioctl v13
  2017-01-09 16:52 ` [RFC PATCH 3/3] drm/i915: add SVM execbuf ioctl v13 Mika Kuoppala
@ 2017-01-09 21:09   ` Chris Wilson
  2017-02-07 12:11     ` Mika Kuoppala
  0 siblings, 1 reply; 13+ messages in thread
From: Chris Wilson @ 2017-01-09 21:09 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: Daniel Vetter, intel-gfx, Jesse Barnes, miku, beignet

On Mon, Jan 09, 2017 at 06:52:54PM +0200, Mika Kuoppala wrote:
> From: Jesse Barnes <jbarnes@virtuousgeek.org>
> 
> We just need to pass in an address to execute and some flags, since we
> don't have to worry about buffer relocation or any of the other usual
> stuff.  Takes in a fence and returns a fence to be used for
> synchronization.
> 
> v2: add a request after batch submission (Jesse)
> v3: add a flag for fence creation (Chris)
> v4: add CLOEXEC flag (Kristian)
>     add non-RCS ring support (Jesse)
> v5: update for request alloc change (Jesse)
> v6: new sync file interface, error paths, request breadcrumbs
> v7: always CLOEXEC for sync_file_install
> v8: rebase on new sync file api
> v9: rework on top of fence requests and sync_file
> v10: take fence ref for sync_file (Chris)
>      use correct flush (Chris)
>      limit exec on rcs
> v11: allow exec on all engines
> v12: dma_fence_put/get, fix error path on getting sync_file
> v13: add in fences, removed superfluous dma_fence_put/get
> 
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> (v5)
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>  drivers/gpu/drm/i915/Kconfig               |   1 +
>  drivers/gpu/drm/i915/i915_drv.c            |   1 +
>  drivers/gpu/drm/i915/i915_drv.h            |   2 +
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 175 +++++++++++++++++++++++++++++
>  include/uapi/drm/i915_drm.h                |  44 ++++++++
>  5 files changed, 223 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
> index 183f5dc..387990d 100644
> --- a/drivers/gpu/drm/i915/Kconfig
> +++ b/drivers/gpu/drm/i915/Kconfig
> @@ -8,6 +8,7 @@ config DRM_I915
>  	# the shmem_readpage() which depends upon tmpfs
>  	select SHMEM
>  	select TMPFS
> +	select SYNC_FILE

Duplicate.

>  	select DRM_KMS_HELPER
>  	select DRM_PANEL
>  	select DRM_MIPI_DSI
> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
> index 4d22b4b..2276ba6 100644
> --- a/drivers/gpu/drm/i915/i915_drv.c
> +++ b/drivers/gpu/drm/i915/i915_drv.c
> @@ -2567,6 +2567,7 @@ static const struct drm_ioctl_desc i915_ioctls[] = {
>  	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_GETPARAM, i915_gem_context_getparam_ioctl, DRM_RENDER_ALLOW),
>  	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_SETPARAM, i915_gem_context_setparam_ioctl, DRM_RENDER_ALLOW),
>  	DRM_IOCTL_DEF_DRV(I915_PERF_OPEN, i915_perf_open_ioctl, DRM_RENDER_ALLOW),
> +	DRM_IOCTL_DEF_DRV(I915_GEM_EXEC_MM, i915_gem_exec_mm, DRM_RENDER_ALLOW),
>  };
>  
>  static struct drm_driver driver = {
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index ca1eaaf..8436ea3 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -3094,6 +3094,8 @@ int i915_gem_execbuffer(struct drm_device *dev, void *data,
>  			struct drm_file *file_priv);
>  int i915_gem_execbuffer2(struct drm_device *dev, void *data,
>  			 struct drm_file *file_priv);
> +int i915_gem_exec_mm(struct drm_device *dev, void *data,
> +		     struct drm_file *file);
>  int i915_gem_busy_ioctl(struct drm_device *dev, void *data,
>  			struct drm_file *file_priv);
>  int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data,
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index a5fe299..bb51ce4 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -29,6 +29,7 @@
>  #include <linux/dma_remapping.h>
>  #include <linux/reservation.h>
>  #include <linux/uaccess.h>
> +#include <linux/sync_file.h>

Out of date.
  
>  #include <drm/drmP.h>
>  #include <drm/i915_drm.h>
> @@ -1984,3 +1985,177 @@ i915_gem_execbuffer2(struct drm_device *dev, void *data,
>  	drm_free_large(exec2_list);
>  	return ret;
>  }
> +
> +static struct intel_engine_cs *
> +exec_mm_select_engine(const struct drm_i915_private *dev_priv,
> +		      const struct drm_i915_exec_mm *exec_mm)
> +{
> +	const unsigned int user_ring_id = exec_mm->ring_id;
> +	struct intel_engine_cs *engine;
> +
> +	if (user_ring_id > I915_USER_RINGS) {
> +		DRM_DEBUG("exec_mm with unknown ring: %u\n", user_ring_id);
> +		return NULL;
> +	}
> +
> +	engine = dev_priv->engine[user_ring_map[user_ring_id]];
> +
> +	if (!engine) {
> +		DRM_DEBUG("exec_mm with invalid engine: %u\n", user_ring_id);
> +		return NULL;
> +	}
> +
> +	return engine;
> +}

I would rather we reuse eb_select_engine(). We know we have to fix the
ABI anyway, so might as well kill 2 birds with one stone.

> +static int do_exec_mm(struct drm_i915_exec_mm *exec_mm,
> +		      struct drm_i915_gem_request *req,
> +		      const u32 flags)
> +{
> +	struct sync_file *out_fence = NULL;
> +	struct dma_fence *in_fence = NULL;
> +	int out_fence_fd = -1;
> +	int ret;
> +
> +	if (flags & I915_EXEC_MM_FENCE_IN) {
> +		in_fence = sync_file_get_fence(exec_mm->in_fence_fd);
> +		if (!in_fence)
> +			return -EINVAL;
> +
> +		ret = i915_gem_request_await_dma_fence(req, in_fence);

You can drop the fence ref immediately.

		dma_fence_put(in_fence);
> +		if (ret < 0)
> +			goto err_out;
> +	}
> +
> +	if (flags & I915_EXEC_MM_FENCE_OUT) {
> +		out_fence = sync_file_create(&req->fence);
> +		if (!out_fence) {
> +			DRM_DEBUG_DRIVER("sync_file_create failed\n");
> +			ret = -ENOMEM;
> +			goto err_out;
> +		}
> +
> +		out_fence_fd = get_unused_fd_flags(O_CLOEXEC);
> +		if (out_fence_fd < 0) {
> +			ret = out_fence_fd;
> +			out_fence_fd = -1;
> +			goto err_out;
> +		}
> +
> +		fd_install(out_fence_fd, out_fence->file);
> +		exec_mm->out_fence_fd = out_fence_fd;

Try not to set before success.

> +	ret = req->engine->emit_flush(req, EMIT_INVALIDATE);
> +	if (ret) {
> +		DRM_DEBUG_DRIVER("engine flush failed: %d\n", ret);
> +		goto err_out;
> +	}
> +
> +	ret = req->engine->emit_bb_start(req, exec_mm->batch_ptr, 0, 0);
> +	if (ret) {
> +		DRM_DEBUG_DRIVER("engine dispatch execbuf failed: %d\n", ret);
> +		goto err_out;
> +	}
> +
> +	return 0;
> +
> +err_out:
> +	if (out_fence_fd != -1)
> +		put_unused_fd(out_fence_fd);
> +
> +	if (out_fence)
> +		fput(out_fence->file);
> +
> +	if (in_fence)
> +		dma_fence_put(in_fence);
> +
> +	return ret;
> +}
> +
> +int i915_gem_exec_mm(struct drm_device *dev, void *data, struct drm_file *file)

Might as well call this exec_svm. Unless you have immediate plans to
generalise.

> +{
> +	struct drm_i915_private *dev_priv = to_i915(dev);
> +	struct drm_i915_exec_mm *exec_mm = data;
> +	struct intel_engine_cs *engine;
> +	struct i915_gem_context *ctx;
> +	struct drm_i915_gem_request *req;
> +	const u32 ctx_id = exec_mm->ctx_id;
> +	const u32 flags = exec_mm->flags;
> +	int ret;
> +
> +	if (exec_mm->batch_ptr & 3) {
> +		DRM_DEBUG_DRIVER("ptr not 4-byte aligned %llu\n",
> +				 exec_mm->batch_ptr);
> +		return -EINVAL;
> +	}

Where does access_ok() get performed? A little spiel on the mm validation
would help here if it is all done for us by svm.

> +	if (flags & ~(I915_EXEC_MM_FENCE_OUT | I915_EXEC_MM_FENCE_IN)) {
> +		DRM_DEBUG_DRIVER("bad flags: 0x%08x\n", flags);
> +		return -EINVAL;
> +	}
> +
> +	if (exec_mm->pad != 0) {
> +		DRM_DEBUG_DRIVER("bad pad: 0x%08x\n", exec_mm->pad);
> +		return -EINVAL;
> +	}
> +
> +	if (file == NULL)
> +		return -EINVAL;

? That's not a user error (EINVAL), that's a kaboom!

> +
> +	engine = exec_mm_select_engine(dev_priv, exec_mm);
> +	if (!engine)
> +		return -EINVAL;
> +
> +	intel_runtime_pm_get(dev_priv);
> +
> +	ret = i915_mutex_lock_interruptible(dev);
> +	if (ret) {
> +		DRM_ERROR("mutex interrupted\n");
> +		goto pm_put;
> +	}
> +
> +	ctx = i915_gem_validate_context(dev, file, engine, ctx_id);
> +	if (IS_ERR(ctx)) {
> +		ret = PTR_ERR(ctx);
> +		DRM_DEBUG_DRIVER("couldn't get context\n");
> +		goto unlock;
> +	}
> +
> +	if (!i915_gem_context_is_svm(ctx)) {
> +		ret = -EINVAL;
> +		DRM_DEBUG_DRIVER("context is not SVM enabled\n");
> +		goto unlock;
> +	}
> +
> +	i915_gem_context_get(ctx);
> +
> +	req = i915_gem_request_alloc(engine, ctx);
> +	if (IS_ERR(req)) {
> +		ret = PTR_ERR(req);
> +		goto ctx_put;
> +	}
> +
> +	ret = i915_gem_request_add_to_client(req, file);
> +	if (ret) {
> +		DRM_DEBUG_DRIVER("failed to add request to client, %d\n", ret);
> +		goto add_request;
> +	}
> +
> +	/* If we fail here, we still need to submit the req */
> +	ret = do_exec_mm(exec_mm, req, flags);
> +
> +add_request:
> +	i915_add_request(req);

Do the same as execbuf just for consistency (i.e.
__i915_add_request(req, ret == 0)).

What's the story for i915_gpu_error.c?

> +
> +ctx_put:
> +	i915_gem_context_put(ctx);
> +
> +unlock:
> +	mutex_unlock(&dev->struct_mutex);
> +
> +pm_put:
> +	intel_runtime_pm_put(dev_priv);
> +
> +	return ret;
> +}
> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
> index 5b06ab1..843f943 100644
> --- a/include/uapi/drm/i915_drm.h
> +++ b/include/uapi/drm/i915_drm.h
> @@ -259,6 +259,7 @@ typedef struct _drm_i915_sarea {
>  #define DRM_I915_GEM_CONTEXT_GETPARAM	0x34
>  #define DRM_I915_GEM_CONTEXT_SETPARAM	0x35
>  #define DRM_I915_PERF_OPEN		0x36
> +#define DRM_I915_GEM_EXEC_MM		0x37
>  
>  #define DRM_IOCTL_I915_INIT		DRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT, drm_i915_init_t)
>  #define DRM_IOCTL_I915_FLUSH		DRM_IO ( DRM_COMMAND_BASE + DRM_I915_FLUSH)
> @@ -313,6 +314,7 @@ typedef struct _drm_i915_sarea {
>  #define DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_GETPARAM, struct drm_i915_gem_context_param)
>  #define DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_SETPARAM, struct drm_i915_gem_context_param)
>  #define DRM_IOCTL_I915_PERF_OPEN	DRM_IOW(DRM_COMMAND_BASE + DRM_I915_PERF_OPEN, struct drm_i915_perf_open_param)
> +#define DRM_IOCTL_I915_GEM_EXEC_MM			DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_EXEC_MM, struct drm_i915_exec_mm)
>  
>  /* Allow drivers to submit batchbuffers directly to hardware, relying
>   * on the security mechanisms provided by hardware.
> @@ -1363,6 +1365,48 @@ enum drm_i915_perf_record_type {
>  	DRM_I915_PERF_RECORD_MAX /* non-ABI */
>  };
>  
> +/**
> + * drm_i915_exec_mm - shared address space execbuf
> + * @batch_ptr: address of batch buffer (in context's CPU address space)
> + * @ctx_id: context to use for execution, must be svm enabled
> + * @ring_id: ring to which this context will be submitted, see execbuffer2
> + * @flags: used to signal if in/out fences should be used
> + * @out_fence_fd: returned fence handle of out fence you can wait on
> + * @in_fence_fd: given fence handle of fence the gpu will wait for
> + * @pad: padding, must be zero
> + *
> + * This simplified execbuf just executes an MI_BATCH_BUFFER_START at
> + * @batch_ptr using @ctx_id as the context.  The context will indicate
> + * which address space the @batch_ptr will use.
> + *
> + * Note @batch_ptr must be dword aligned.
> + *
> + * By default, the kernel will simply execute the address given on the GPU.
> + *
> + * If the %I915_EXEC_MM_FENCE_OUT flag is passed in the @flags field however,
> + * the kernel will return a sync_file (dma_fence) object in @out_fence_fd for
> + * the caller to use to synchronize execution of the passed batch.
> + *
> + * If the %I915_EXEC_MM_FENCE_IN flag is passed in the @flags field,
> + * the kernel will wait for the fence (dma_fence) object passed in
> + * @in_fence_fd to complete before submitting the batch to the GPU.
> + *
> + */
> +struct drm_i915_exec_mm {
> +	__u64 batch_ptr;
> +	__u32 ctx_id;
> +	__u32 ring_id; /* see execbuffer2 ring flags */
> +
> +#define I915_EXEC_MM_FENCE_OUT (1<<0)
> +#define I915_EXEC_MM_FENCE_IN  (1<<1)
> +	__u32 flags;
> +
> +	__u32 out_fence_fd;
> +	__u32 in_fence_fd;

In then out just makes more sense to me!
> +
> +	__u32 pad;

If you use u64 flags (as the second parameter) we can put this pad to
good use.

> +};
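A hypothetical repacking along the lines suggested above — u64 flags as the second member so the explicit pad disappears, and the in-fence ahead of the out-fence. This is a sketch of the proposed ABI change, not the posted one, and the struct name is made up:

```c
#include <stdint.h>
#include <stddef.h>

/* Hypothetical revised layout: widening @flags to u64 keeps the struct
 * the same 32 bytes with no implicit padding and leaves room for more
 * than 32 future flag bits. NOT the ABI posted in this series. */
struct drm_i915_exec_mm_v2 {
	uint64_t batch_ptr;
	uint64_t flags;		/* second member; absorbs the old pad */
	uint32_t ctx_id;
	uint32_t ring_id;
	uint32_t in_fence_fd;	/* "in then out" ordering */
	uint32_t out_fence_fd;
};
```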
> +
>  #if defined(__cplusplus)
>  }
>  #endif
> -- 
> 2.7.4
> 

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: [RFC PATCH 2/3] drm/i915: IOMMU based SVM implementation v16
  2017-01-09 16:52 ` [RFC PATCH 2/3] drm/i915: IOMMU based SVM implementation v16 Mika Kuoppala
@ 2017-01-09 21:18   ` Chris Wilson
  2017-01-12 15:48     ` Mika Kuoppala
  0 siblings, 1 reply; 13+ messages in thread
From: Chris Wilson @ 2017-01-09 21:18 UTC (permalink / raw)
  To: Mika Kuoppala
  Cc: David Woodhouse, Daniel Vetter, intel-gfx, Jesse Barnes, miku,
	beignet, David Woodhouse

On Mon, Jan 09, 2017 at 06:52:53PM +0200, Mika Kuoppala wrote:
> +static int i915_gem_context_enable_svm(struct i915_gem_context *ctx)
> +{
> +	int ret;
> +
> +	if (!HAS_SVM(ctx->i915))
> +		return -ENODEV;

How does legacy execbuf work with an svm context? It will write the
ppgtt, but those are no longer read by the GPU. So it will generate
faults at random addresses. Am I right in thinking we need to EINVAL if
using execbuf + context_is_svm?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: [RFC PATCH 1/3] drm/i915: Create context desc template when context is created
  2017-01-09 16:52 ` [RFC PATCH 1/3] drm/i915: Create context desc template when context is created Mika Kuoppala
@ 2017-01-09 21:22   ` Chris Wilson
  0 siblings, 0 replies; 13+ messages in thread
From: Chris Wilson @ 2017-01-09 21:22 UTC (permalink / raw)
  To: Mika Kuoppala; +Cc: intel-gfx, miku, beignet

On Mon, Jan 09, 2017 at 06:52:52PM +0200, Mika Kuoppala wrote:
> Move the invariant parts of context desc setup from execlist init
> to context creation. This is advantageous when we need to
> create different templates based on the context parametrization,
> ie. for svm capable contexts.
> 
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_gem_context.c | 18 ++++++++++++++++--
>  drivers/gpu/drm/i915/i915_reg.h         | 15 +++++++++++++++
>  drivers/gpu/drm/i915/intel_lrc.c        | 21 +--------------------
>  3 files changed, 32 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index 40a6939..dadc845 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -257,6 +257,21 @@ static int assign_hw_id(struct drm_i915_private *dev_priv, unsigned *out)
>  	return 0;
>  }
>  
> +static u32 create_default_ctx_desc(const struct drm_i915_private *dev_priv)

I would refrain from using create. default_desc_template() will do
without suggesting that we create/alloc anything.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre


* Re: [RFC PATCH 2/3] drm/i915: IOMMU based SVM implementation v16
  2017-01-09 21:18   ` Chris Wilson
@ 2017-01-12 15:48     ` Mika Kuoppala
  2017-01-12 16:03       ` Chris Wilson
  0 siblings, 1 reply; 13+ messages in thread
From: Mika Kuoppala @ 2017-01-12 15:48 UTC (permalink / raw)
  To: Chris Wilson
  Cc: David Woodhouse, Daniel Vetter, intel-gfx, Jesse Barnes, miku,
	beignet, David Woodhouse

Chris Wilson <chris@chris-wilson.co.uk> writes:

> On Mon, Jan 09, 2017 at 06:52:53PM +0200, Mika Kuoppala wrote:
>> +static int i915_gem_context_enable_svm(struct i915_gem_context *ctx)
>> +{
>> +	int ret;
>> +
>> +	if (!HAS_SVM(ctx->i915))
>> +		return -ENODEV;
>
> How does legacy execbuf work with an svm context? It will write the
> ppgtt, but those are no longer read by the GPU. So it will generate
> faults at random addresses. Am I right in thinking we need to EINVAL if
> using execbuf + context_is_svm?

Yes without further experiments, it is best to block the legacy path
with -EINVAL. I will add this.

I guess with some tweaking the legacy interface could be made to work,
but it would need is_svm_context() checks in rather many places
in the execbuffer path to avoid relocations/pins.
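A minimal sketch of the guard being agreed on here, under the assumption that it sits at the top of the legacy execbuffer path. The function name is illustrative; the real check would call i915_gem_context_is_svm(ctx) on the validated context:

```c
#include <errno.h>

/* Illustrative stand-in for the legacy-execbuf guard: for SVM contexts
 * the GPU walks the CPU page tables, so the ppgtt that the legacy path
 * relocates and pins into is never read -- bail out early instead of
 * generating faults at random addresses. */
static int legacy_execbuf_check_ctx(int ctx_is_svm)
{
	if (ctx_is_svm)
		return -EINVAL;
	return 0;
}
```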

-Mika

> -Chris
>
> -- 
> Chris Wilson, Intel Open Source Technology Centre


* Re: [RFC PATCH 2/3] drm/i915: IOMMU based SVM implementation v16
  2017-01-12 15:48     ` Mika Kuoppala
@ 2017-01-12 16:03       ` Chris Wilson
  2017-01-12 16:10         ` Jesse Barnes
  0 siblings, 1 reply; 13+ messages in thread
From: Chris Wilson @ 2017-01-12 16:03 UTC (permalink / raw)
  To: Mika Kuoppala
  Cc: David Woodhouse, Daniel Vetter, intel-gfx, Joonas Lahtinen,
	Jesse Barnes, miku, beignet, David Woodhouse

On Thu, Jan 12, 2017 at 05:48:49PM +0200, Mika Kuoppala wrote:
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> 
> > On Mon, Jan 09, 2017 at 06:52:53PM +0200, Mika Kuoppala wrote:
> >> +static int i915_gem_context_enable_svm(struct i915_gem_context *ctx)
> >> +{
> >> +	int ret;
> >> +
> >> +	if (!HAS_SVM(ctx->i915))
> >> +		return -ENODEV;
> >
> > How does legacy execbuf work with an svm context? It will write the
> > ppgtt, but those are no longer read by the GPU. So it will generate
> > faults at random addresses. Am I right in thinking we need to EINVAL if
> > using execbuf + context_is_svm?
> 
> Yes without further experiments, it is best to block the legacy path
> with -EINVAL. I will add this.
> 
> I guess with some tweaking the legacy interface could be made to work,
> but it would need is_svm_context() checks in rather many places
> in the execbuffer path to avoid relocations/pins.

Hmm, right. Basically we have to ignore all objects if svm: we strip
off everything (having to EINVAL if relocs etc. are passed in) and more
or less call exec_svm. The advantage is that it keeps the request tracking
of the objects correct, but it can only work with softpinning of the
objects at their cpu addresses. (I don't propose we map the object in
the cpu table at the ppgtt offset!)

Anyway the decision has to be made upfront whether we want to support
the frankenapi.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Beignet mailing list
Beignet@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/beignet


* Re: [RFC PATCH 2/3] drm/i915: IOMMU based SVM implementation v16
  2017-01-12 16:03       ` Chris Wilson
@ 2017-01-12 16:10         ` Jesse Barnes
  0 siblings, 0 replies; 13+ messages in thread
From: Jesse Barnes @ 2017-01-12 16:10 UTC (permalink / raw)
  To: David Woodhouse, Mika Kuoppala, Chris Wilson, intel-gfx,
	Joonas Lahtinen, David Woodhouse, Daniel Vetter, beignet, miku


On Jan 12, 2017 8:04 AM, "Chris Wilson" <chris@chris-wilson.co.uk> wrote:

On Thu, Jan 12, 2017 at 05:48:49PM +0200, Mika Kuoppala wrote:
> Chris Wilson <chris@chris-wilson.co.uk> writes:
>
> > On Mon, Jan 09, 2017 at 06:52:53PM +0200, Mika Kuoppala wrote:
> >> +static int i915_gem_context_enable_svm(struct i915_gem_context *ctx)
> >> +{
> >> +  int ret;
> >> +
> >> +  if (!HAS_SVM(ctx->i915))
> >> +          return -ENODEV;
> >
> > How does legacy execbuf work with an svm context? It will write the
> > ppgtt, but those are no longer read by the GPU. So it will generate
> > faults at random addresses. Am I right in thinking we need to EINVAL if
> > using execbuf + context_is_svm?
>
> Yes without further experiments, it is best to block the legacy path
> with -EINVAL. I will add this.
>
> I guess with some tweaking the legacy interface could be made to work,
> but it would need is_svm_context() checks in rather many places
> in the execbuffer path to avoid relocations/pins.

Hmm. right. Basically we have to ignore all objects if svm. Basically we
strip off everything (having to EINVAL if passed in relocs etc) and more
or less call exec_svm. The advantage is that it keeps the request tracking
of the objects correct, but it can only work with softpinning of the
objects at their cpu addresses. (I don't propose we map the object in
the cpu table at the ppgtt offset!)

Anyway the decision has to be made upfront whether we want to support
the frankenapi.


FWIW I think the Beignet team wanted this functionality to make their
implementation easier. It's a bit more invasive, but might be worth it
to ease userspace's transition.

Jesse


-Chris

--
Chris Wilson, Intel Open Source Technology Centre



* Re: [RFC PATCH 3/3] drm/i915: add SVM execbuf ioctl v13
  2017-01-09 21:09   ` Chris Wilson
@ 2017-02-07 12:11     ` Mika Kuoppala
  2017-02-07 12:23       ` Chris Wilson
  0 siblings, 1 reply; 13+ messages in thread
From: Mika Kuoppala @ 2017-02-07 12:11 UTC (permalink / raw)
  To: Chris Wilson; +Cc: Daniel Vetter, intel-gfx, Jesse Barnes, miku, beignet

Chris Wilson <chris@chris-wilson.co.uk> writes:

> On Mon, Jan 09, 2017 at 06:52:54PM +0200, Mika Kuoppala wrote:
>> From: Jesse Barnes <jbarnes@virtuousgeek.org>
>> 
>> We just need to pass in an address to execute and some flags, since we
>> don't have to worry about buffer relocation or any of the other usual
>> stuff.  Takes in a fence and returns a fence to be used for
>> synchronization.
>> 
>> v2: add a request after batch submission (Jesse)
>> v3: add a flag for fence creation (Chris)
>> v4: add CLOEXEC flag (Kristian)
>>     add non-RCS ring support (Jesse)
>> v5: update for request alloc change (Jesse)
>> v6: new sync file interface, error paths, request breadcrumbs
>> v7: always CLOEXEC for sync_file_install
>> v8: rebase on new sync file api
>> v9: rework on top of fence requests and sync_file
>> v10: take fence ref for sync_file (Chris)
>>      use correct flush (Chris)
>>      limit exec on rcs
>> v11: allow exec on all engines
>> v12: dma_fence_put/get, fix error path on getting sync_file
>> v13: add in fences, removed superfluous dma_fence_put/get
>> 
>> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> Cc: Joonas Lahtinen <joonas.lahtinen@linux.intel.com>
>> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org> (v5)
>> Signed-off-by: Mika Kuoppala <mika.kuoppala@intel.com>
>> ---
>>  drivers/gpu/drm/i915/Kconfig               |   1 +
>>  drivers/gpu/drm/i915/i915_drv.c            |   1 +
>>  drivers/gpu/drm/i915/i915_drv.h            |   2 +
>>  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 175 +++++++++++++++++++++++++++++
>>  include/uapi/drm/i915_drm.h                |  44 ++++++++
>>  5 files changed, 223 insertions(+)
>> 
>> diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
>> index 183f5dc..387990d 100644
>> --- a/drivers/gpu/drm/i915/Kconfig
>> +++ b/drivers/gpu/drm/i915/Kconfig
>> @@ -8,6 +8,7 @@ config DRM_I915
>>  	# the shmem_readpage() which depends upon tmpfs
>>  	select SHMEM
>>  	select TMPFS
>> +	select SYNC_FILE
>
> Duplicate.
>
>>  	select DRM_KMS_HELPER
>>  	select DRM_PANEL
>>  	select DRM_MIPI_DSI
>> diff --git a/drivers/gpu/drm/i915/i915_drv.c b/drivers/gpu/drm/i915/i915_drv.c
>> index 4d22b4b..2276ba6 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.c
>> +++ b/drivers/gpu/drm/i915/i915_drv.c
>> @@ -2567,6 +2567,7 @@ static const struct drm_ioctl_desc i915_ioctls[] = {
>>  	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_GETPARAM, i915_gem_context_getparam_ioctl, DRM_RENDER_ALLOW),
>>  	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_SETPARAM, i915_gem_context_setparam_ioctl, DRM_RENDER_ALLOW),
>>  	DRM_IOCTL_DEF_DRV(I915_PERF_OPEN, i915_perf_open_ioctl, DRM_RENDER_ALLOW),
>> +	DRM_IOCTL_DEF_DRV(I915_GEM_EXEC_MM, i915_gem_exec_mm, DRM_RENDER_ALLOW),
>>  };
>>  
>>  static struct drm_driver driver = {
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> index ca1eaaf..8436ea3 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -3094,6 +3094,8 @@ int i915_gem_execbuffer(struct drm_device *dev, void *data,
>>  			struct drm_file *file_priv);
>>  int i915_gem_execbuffer2(struct drm_device *dev, void *data,
>>  			 struct drm_file *file_priv);
>> +int i915_gem_exec_mm(struct drm_device *dev, void *data,
>> +		     struct drm_file *file);
>>  int i915_gem_busy_ioctl(struct drm_device *dev, void *data,
>>  			struct drm_file *file_priv);
>>  int i915_gem_get_caching_ioctl(struct drm_device *dev, void *data,
>> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>> index a5fe299..bb51ce4 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>> @@ -29,6 +29,7 @@
>>  #include <linux/dma_remapping.h>
>>  #include <linux/reservation.h>
>>  #include <linux/uaccess.h>
>> +#include <linux/sync_file.h>
>
> Out of date.

Looking at the current code, it is a duplicate, as the explicit fencing
work probably brought it in.

>>  #include <drm/drmP.h>
>>  #include <drm/i915_drm.h>
>> @@ -1984,3 +1985,177 @@ i915_gem_execbuffer2(struct drm_device *dev, void *data,
>>  	drm_free_large(exec2_list);
>>  	return ret;
>>  }
>> +
>> +static struct intel_engine_cs *
>> +exec_mm_select_engine(const struct drm_i915_private *dev_priv,
>> +		      const struct drm_i915_exec_mm *exec_mm)
>> +{
>> +	const unsigned int user_ring_id = exec_mm->ring_id;
>> +	struct intel_engine_cs *engine;
>> +
>> +	if (user_ring_id > I915_USER_RINGS) {
>> +		DRM_DEBUG("exec_mm with unknown ring: %u\n", user_ring_id);
>> +		return NULL;
>> +	}
>> +
>> +	engine = dev_priv->engine[user_ring_map[user_ring_id]];
>> +
>> +	if (!engine) {
>> +		DRM_DEBUG("exec_mm with invalid engine: %u\n", user_ring_id);
>> +		return NULL;
>> +	}
>> +
>> +	return engine;
>> +}
>
> I would rather we reuse eb_select_engine(). We know we have to fix the
> ABI anyway, so might as well kill 2 birds with one stone.
>

I am still hesitant about this. I tried, but it would expose the exec
svm flags to the same kind of trickery used for the BSD ring.

>> +static int do_exec_mm(struct drm_i915_exec_mm *exec_mm,
>> +		      struct drm_i915_gem_request *req,
>> +		      const u32 flags)
>> +{
>> +	struct sync_file *out_fence = NULL;
>> +	struct dma_fence *in_fence = NULL;
>> +	int out_fence_fd = -1;
>> +	int ret;
>> +
>> +	if (flags & I915_EXEC_MM_FENCE_IN) {
>> +		in_fence = sync_file_get_fence(exec_mm->in_fence_fd);
>> +		if (!in_fence)
>> +			return -EINVAL;
>> +
>> +		ret = i915_gem_request_await_dma_fence(req, in_fence);
>
> You can drop the fence ref immediately.

This was mimicked so closely from the explicit fences code that the
execbuf path might be able to do the same :)

>
> 		dma_fence_put(in_fence);
>> +		if (ret < 0)
>> +			goto err_out;
>> +	}
>> +
>> +	if (flags & I915_EXEC_MM_FENCE_OUT) {
>> +		out_fence = sync_file_create(&req->fence);
>> +		if (!out_fence) {
>> +			DRM_DEBUG_DRIVER("sync_file_create failed\n");
>> +			ret = -ENOMEM;
>> +			goto err_out;
>> +		}
>> +
>> +		out_fence_fd = get_unused_fd_flags(O_CLOEXEC);
>> +		if (out_fence_fd < 0) {
>> +			ret = out_fence_fd;
>> +			out_fence_fd = -1;
>> +			goto err_out;
>> +		}
>> +
>> +		fd_install(out_fence_fd, out_fence->file);
>> +		exec_mm->out_fence_fd = out_fence_fd;
>
> Try not to set before success.
>
>> +	ret = req->engine->emit_flush(req, EMIT_INVALIDATE);
>> +	if (ret) {
>> +		DRM_DEBUG_DRIVER("engine flush failed: %d\n", ret);
>> +		goto err_out;
>> +	}
>> +
>> +	ret = req->engine->emit_bb_start(req, exec_mm->batch_ptr, 0, 0);
>> +	if (ret) {
>> +		DRM_DEBUG_DRIVER("engine dispatch execbuf failed: %d\n", ret);
>> +		goto err_out;
>> +	}
>> +
>> +	return 0;
>> +
>> +err_out:
>> +	if (out_fence_fd != -1)
>> +		put_unused_fd(out_fence_fd);
>> +
>> +	if (out_fence)
>> +		fput(out_fence->file);
>> +
>> +	if (in_fence)
>> +		dma_fence_put(in_fence);
>> +
>> +	return ret;
>> +}
>> +
>> +int i915_gem_exec_mm(struct drm_device *dev, void *data, struct drm_file *file)
>
> Might as well call this exec_svm. Unless you have immediate plans to
> generalise.
>

I did s/mm/svm/ everywhere.

>> +{
>> +	struct drm_i915_private *dev_priv = to_i915(dev);
>> +	struct drm_i915_exec_mm *exec_mm = data;
>> +	struct intel_engine_cs *engine;
>> +	struct i915_gem_context *ctx;
>> +	struct drm_i915_gem_request *req;
>> +	const u32 ctx_id = exec_mm->ctx_id;
>> +	const u32 flags = exec_mm->flags;
>> +	int ret;
>> +
>> +	if (exec_mm->batch_ptr & 3) {
>> +		DRM_DEBUG_DRIVER("ptr not 4-byte aligned %llu\n",
>> +				 exec_mm->batch_ptr);
>> +		return -EINVAL;
>> +	}
>
> Where does access_ok() get performed? A little spiel on the mm validation
> would help here if it is all done for us by svm.
>

Passing in a length wouldn't help much, as the references inside the
batch buffer would also need to be checked?

>> +	if (flags & ~(I915_EXEC_MM_FENCE_OUT | I915_EXEC_MM_FENCE_IN)) {
>> +		DRM_DEBUG_DRIVER("bad flags: 0x%08x\n", flags);
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (exec_mm->pad != 0) {
>> +		DRM_DEBUG_DRIVER("bad pad: 0x%08x\n", exec_mm->pad);
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (file == NULL)
>> +		return -EINVAL;
>
> ? That's not a user error (EINVAL), that's a kaboom!
>
>> +
>> +	engine = exec_mm_select_engine(dev_priv, exec_mm);
>> +	if (!engine)
>> +		return -EINVAL;
>> +
>> +	intel_runtime_pm_get(dev_priv);
>> +
>> +	ret = i915_mutex_lock_interruptible(dev);
>> +	if (ret) {
>> +		DRM_ERROR("mutex interrupted\n");
>> +		goto pm_put;
>> +	}
>> +
>> +	ctx = i915_gem_validate_context(dev, file, engine, ctx_id);
>> +	if (IS_ERR(ctx)) {
>> +		ret = PTR_ERR(ctx);
>> +		DRM_DEBUG_DRIVER("couldn't get context\n");
>> +		goto unlock;
>> +	}
>> +
>> +	if (!i915_gem_context_is_svm(ctx)) {
>> +		ret = -EINVAL;
>> +		DRM_DEBUG_DRIVER("context is not SVM enabled\n");
>> +		goto unlock;
>> +	}
>> +
>> +	i915_gem_context_get(ctx);
>> +
>> +	req = i915_gem_request_alloc(engine, ctx);
>> +	if (IS_ERR(req)) {
>> +		ret = PTR_ERR(req);
>> +		goto ctx_put;
>> +	}
>> +
>> +	ret = i915_gem_request_add_to_client(req, file);
>> +	if (ret) {
>> +		DRM_DEBUG_DRIVER("failed to add request to client, %d\n", ret);
>> +		goto add_request;
>> +	}
>> +
>> +	/* If we fail here, we still need to submit the req */
>> +	ret = do_exec_mm(exec_mm, req, flags);
>> +
>> +add_request:
>> +	i915_add_request(req);
>
> Do the same as execbuf just for consistency (i.e.
> __i915_add_request(req, ret == 0)).
>
> What's the story for i915_gpu_error.c?
>

Work in progress.

>> +
>> +ctx_put:
>> +	i915_gem_context_put(ctx);
>> +
>> +unlock:
>> +	mutex_unlock(&dev->struct_mutex);
>> +
>> +pm_put:
>> +	intel_runtime_pm_put(dev_priv);
>> +
>> +	return ret;
>> +}
>> diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
>> index 5b06ab1..843f943 100644
>> --- a/include/uapi/drm/i915_drm.h
>> +++ b/include/uapi/drm/i915_drm.h
>> @@ -259,6 +259,7 @@ typedef struct _drm_i915_sarea {
>>  #define DRM_I915_GEM_CONTEXT_GETPARAM	0x34
>>  #define DRM_I915_GEM_CONTEXT_SETPARAM	0x35
>>  #define DRM_I915_PERF_OPEN		0x36
>> +#define DRM_I915_GEM_EXEC_MM		0x37
>>  
>>  #define DRM_IOCTL_I915_INIT		DRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT, drm_i915_init_t)
>>  #define DRM_IOCTL_I915_FLUSH		DRM_IO ( DRM_COMMAND_BASE + DRM_I915_FLUSH)
>> @@ -313,6 +314,7 @@ typedef struct _drm_i915_sarea {
>>  #define DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_GETPARAM, struct drm_i915_gem_context_param)
>>  #define DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_SETPARAM, struct drm_i915_gem_context_param)
>>  #define DRM_IOCTL_I915_PERF_OPEN	DRM_IOW(DRM_COMMAND_BASE + DRM_I915_PERF_OPEN, struct drm_i915_perf_open_param)
>> +#define DRM_IOCTL_I915_GEM_EXEC_MM			DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_EXEC_MM, struct drm_i915_exec_mm)
>>  
>>  /* Allow drivers to submit batchbuffers directly to hardware, relying
>>   * on the security mechanisms provided by hardware.
>> @@ -1363,6 +1365,48 @@ enum drm_i915_perf_record_type {
>>  	DRM_I915_PERF_RECORD_MAX /* non-ABI */
>>  };
>>  
>> +/**
>> + * drm_i915_exec_mm - shared address space execbuf
>> + * @batch_ptr: address of batch buffer (in context's CPU address space)
>> + * @ctx_id: context to use for execution, must be svm enabled
>> + * @ring_id: ring to which this context will be submitted, see execbuffer2
>> + * @flags: used to signal if in/out fences should be used
>> + * @out_fence_fd: returned fence handle of out fence you can wait on
>> + * @in_fence_fd: given fence handle of fence the gpu will wait for
>> + * @pad: padding, must be zero
>> + *
>> + * This simplified execbuf just executes an MI_BATCH_BUFFER_START at
>> + * @batch_ptr using @ctx_id as the context.  The context will indicate
>> + * which address space the @batch_ptr will use.
>> + *
>> + * Note @batch_ptr must be dword aligned.
>> + *
>> + * By default, the kernel will simply execute the address given on the GPU.
>> + *
>> + * If the %I915_EXEC_MM_FENCE_OUT flag is passed in the @flags field however,
>> + * the kernel will return a sync_file (dma_fence) object in @out_fence_fd for
>> + * the caller to use to synchronize execution of the passed batch.
>> + *
>> + * If the %I915_EXEC_MM_FENCE_IN flag is passed in the @flags field,
>> + * the kernel will wait until the fence (dma_fence) object passed in
>> + * @in_fence_fd to complete before submitting the batch to the GPU.
>> + *
>> + */
>> +struct drm_i915_exec_mm {
>> +	__u64 batch_ptr;
>> +	__u32 ctx_id;
>> +	__u32 ring_id; /* see execbuffer2 ring flags */
>> +
>> +#define I915_EXEC_MM_FENCE_OUT (1<<0)
>> +#define I915_EXEC_MM_FENCE_IN  (1<<1)
>> +	__u32 flags;
>> +
>> +	__u32 out_fence_fd;
>> +	__u32 in_fence_fd;
>
> In then out just makes more sense to me!
>> +
>> +	__u32 pad;
>
> If you use u64 flags (as the second parameter) we can put this pad to
> good use.
>

I have addressed everything with the exception where I have
stated otherwise.

Thank you for the feedback,
-Mika

>> +};
>> +
>>  #if defined(__cplusplus)
>>  }
>>  #endif
>> -- 
>> 2.7.4
>> 
>
> -- 
> Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: [RFC PATCH 3/3] drm/i915: add SVM execbuf ioctl v13
  2017-02-07 12:11     ` Mika Kuoppala
@ 2017-02-07 12:23       ` Chris Wilson
  0 siblings, 0 replies; 13+ messages in thread
From: Chris Wilson @ 2017-02-07 12:23 UTC (permalink / raw)
  To: Mika Kuoppala
  Cc: Daniel Vetter, intel-gfx, Joonas Lahtinen, Jesse Barnes, miku, beignet

On Tue, Feb 07, 2017 at 02:11:41PM +0200, Mika Kuoppala wrote:
> Chris Wilson <chris@chris-wilson.co.uk> writes:
> >> +static struct intel_engine_cs *
> >> +exec_mm_select_engine(const struct drm_i915_private *dev_priv,
> >> +		      const struct drm_i915_exec_mm *exec_mm)
> >> +{
> >> +	const unsigned int user_ring_id = exec_mm->ring_id;
> >> +	struct intel_engine_cs *engine;
> >> +
> >> +	if (user_ring_id > I915_USER_RINGS) {
> >> +		DRM_DEBUG("exec_mm with unknown ring: %u\n", user_ring_id);
> >> +		return NULL;
> >> +	}
> >> +
> >> +	engine = dev_priv->engine[user_ring_map[user_ring_id]];
> >> +
> >> +	if (!engine) {
> >> +		DRM_DEBUG("exec_mm with invalid engine: %u\n", user_ring_id);
> >> +		return NULL;
> >> +	}
> >> +
> >> +	return engine;
> >> +}
> >
> > I would rather we reuse eb_select_engine(). We know we have to fix the
> > ABI anyway, so might as well kill 2 birds with one stone.
> >
> 
> I am still hesitant about this. I tried it, but this would expose the exec
> svm flags to the same kind of trickery used for the BSD ring.

My point was that ABI needs fixing. So rather than fixing it twice, do
it right once.

> >> +static int do_exec_mm(struct drm_i915_exec_mm *exec_mm,
> >> +		      struct drm_i915_gem_request *req,
> >> +		      const u32 flags)
> >> +{
> >> +	struct sync_file *out_fence = NULL;
> >> +	struct dma_fence *in_fence = NULL;
> >> +	int out_fence_fd = -1;
> >> +	int ret;
> >> +
> >> +	if (flags & I915_EXEC_MM_FENCE_IN) {
> >> +		in_fence = sync_file_get_fence(exec_mm->in_fence_fd);
> >> +		if (!in_fence)
> >> +			return -EINVAL;
> >> +
> >> +		ret = i915_gem_request_await_dma_fence(req, in_fence);
> >
> > You can drop the fence ref immediately.
> 
> This was mimicked so closely from the explicit-fence handling that the
> execbuf path might be able to do the same :)

Not quite. We do the easy error checking in execbuf before we allocate
the request and commit hw resources. Being picky, exec_svm should do the
same.

> >> +{
> >> +	struct drm_i915_private *dev_priv = to_i915(dev);
> >> +	struct drm_i915_exec_mm *exec_mm = data;
> >> +	struct intel_engine_cs *engine;
> >> +	struct i915_gem_context *ctx;
> >> +	struct drm_i915_gem_request *req;
> >> +	const u32 ctx_id = exec_mm->ctx_id;
> >> +	const u32 flags = exec_mm->flags;
> >> +	int ret;
> >> +
> >> +	if (exec_mm->batch_ptr & 3) {
> >> +		DRM_DEBUG_DRIVER("ptr not 4-byte aligned %llu\n",
> >> +				 exec_mm->batch_ptr);
> >> +		return -EINVAL;
> >> +	}
> >
> > Where does access_ok() get performed? A little spiel on the mm validation
> > would help here if it is all done for us by svm.
> >
> 
> Passing in a length wouldn't help much, as the references inside the
> batch buffer would also need to be checked?

Sure, I'm quite happy for a comment on why this is special and doesn't
need access validation checks. But we need something.

For example, why even bother checking for ptr alignment? It is just
another faulting instruction.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Beignet mailing list
Beignet@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/beignet


end of thread, other threads:[~2017-02-07 12:23 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-09 16:52 [RFC PATCH 0/3] SVM for kbl Mika Kuoppala
2017-01-09 16:52 ` [RFC PATCH 1/3] drm/i915: Create context desc template when context is created Mika Kuoppala
2017-01-09 21:22   ` Chris Wilson
2017-01-09 16:52 ` [RFC PATCH 2/3] drm/i915: IOMMU based SVM implementation v16 Mika Kuoppala
2017-01-09 21:18   ` Chris Wilson
2017-01-12 15:48     ` Mika Kuoppala
2017-01-12 16:03       ` Chris Wilson
2017-01-12 16:10         ` Jesse Barnes
2017-01-09 16:52 ` [RFC PATCH 3/3] drm/i915: add SVM execbuf ioctl v13 Mika Kuoppala
2017-01-09 21:09   ` Chris Wilson
2017-02-07 12:11     ` Mika Kuoppala
2017-02-07 12:23       ` Chris Wilson
2017-01-09 17:54 ` ✗ Fi.CI.BAT: failure for SVM for kbl Patchwork
