All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC] Page table sharing and bufferless execbuf
@ 2015-09-04 16:58 Jesse Barnes
  2015-09-04 16:58 ` [PATCH 1/9] mm: move mmu_find_ops to mmu_notifier.c Jesse Barnes
                   ` (10 more replies)
  0 siblings, 11 replies; 36+ messages in thread
From: Jesse Barnes @ 2015-09-04 16:58 UTC (permalink / raw)
  To: intel-gfx; +Cc: dwmw2

I've been carrying something looking rougly like this patchset around
internally for a long time now, and with SKL out there now, I figured
it's time to get it posted and start the process of integration.

David is working on pulling over most of the "driver based PASID
handling" and other code into the IOMMU layer, but the integration with
the rest of the driver needs some careful thought (handling task
referencing along with context referencing, context creation, etc.)

I believe John H has a better version of the sync point patch already;
I'm only posting this one because it was already in my tree.  I can
re-base to the new one once it lands.

A few things to think about:
  - how to signal GPU hangs with the new execbuf (a signal might be more
    natural as the execution appears more CPU-like?  what state do we
    have to worry about restoring for bufferless contexts?
  - page fault re-queue needs to be handled properly for bufferless
    contexts that take a fault and need to stall
  - the new context creation ioctl is needed for something else too I
    believe?

Anyway, comments and feedback welcome.  I have some tests and libdrm
code I'll push to my private fdo repos soon as well.

Thanks,
Jesse


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [PATCH 1/9] mm: move mmu_find_ops to mmu_notifier.c
  2015-09-04 16:58 [RFC] Page table sharing and bufferless execbuf Jesse Barnes
@ 2015-09-04 16:58 ` Jesse Barnes
  2015-09-04 16:58 ` [PATCH 2/9] signal: export force_sig_info Jesse Barnes
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 36+ messages in thread
From: Jesse Barnes @ 2015-09-04 16:58 UTC (permalink / raw)
  To: intel-gfx; +Cc: dwmw2

For use by other modules.

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
---
 drivers/misc/sgi-gru/grutlbpurge.c | 19 -------------------
 include/linux/mmu_notifier.h       |  8 ++++++++
 mm/mmu_notifier.c                  | 19 +++++++++++++++++++
 3 files changed, 27 insertions(+), 19 deletions(-)

diff --git a/drivers/misc/sgi-gru/grutlbpurge.c b/drivers/misc/sgi-gru/grutlbpurge.c
index 2129274..0f86892 100644
--- a/drivers/misc/sgi-gru/grutlbpurge.c
+++ b/drivers/misc/sgi-gru/grutlbpurge.c
@@ -275,25 +275,6 @@ static const struct mmu_notifier_ops gru_mmuops = {
 	.release		= gru_release,
 };
 
-/* Move this to the basic mmu_notifier file. But for now... */
-static struct mmu_notifier *mmu_find_ops(struct mm_struct *mm,
-			const struct mmu_notifier_ops *ops)
-{
-	struct mmu_notifier *mn, *gru_mn = NULL;
-
-	if (mm->mmu_notifier_mm) {
-		rcu_read_lock();
-		hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list,
-					 hlist)
-		    if (mn->ops == ops) {
-			gru_mn = mn;
-			break;
-		}
-		rcu_read_unlock();
-	}
-	return gru_mn;
-}
-
 struct gru_mm_struct *gru_register_mmu_notifier(void)
 {
 	struct gru_mm_struct *gms;
diff --git a/include/linux/mmu_notifier.h b/include/linux/mmu_notifier.h
index 61cd67f..0a78f5e 100644
--- a/include/linux/mmu_notifier.h
+++ b/include/linux/mmu_notifier.h
@@ -215,6 +215,8 @@ extern void __mmu_notifier_invalidate_range_end(struct mm_struct *mm,
 				  unsigned long start, unsigned long end);
 extern void __mmu_notifier_invalidate_range(struct mm_struct *mm,
 				  unsigned long start, unsigned long end);
+extern struct mmu_notifier *mmu_find_ops(struct mm_struct *mm,
+					 const struct mmu_notifier_ops *ops);
 
 static inline void mmu_notifier_release(struct mm_struct *mm)
 {
@@ -425,6 +427,12 @@ static inline void mmu_notifier_mm_destroy(struct mm_struct *mm)
 {
 }
 
+static inline struct mmu_notifier *mmu_find_ops(struct mm_struct *mm,
+						struct mmu_notifier_ops *ops)
+{
+	return NULL;
+}
+
 #define ptep_clear_flush_young_notify ptep_clear_flush_young
 #define pmdp_clear_flush_young_notify pmdp_clear_flush_young
 #define	ptep_clear_flush_notify ptep_clear_flush
diff --git a/mm/mmu_notifier.c b/mm/mmu_notifier.c
index 3b9b3d0..d978138 100644
--- a/mm/mmu_notifier.c
+++ b/mm/mmu_notifier.c
@@ -389,6 +389,25 @@ void mmu_notifier_unregister_no_release(struct mmu_notifier *mn,
 }
 EXPORT_SYMBOL_GPL(mmu_notifier_unregister_no_release);
 
+struct mmu_notifier *mmu_find_ops(struct mm_struct *mm,
+				  const struct mmu_notifier_ops *ops)
+{
+	struct mmu_notifier *mn, *tmp_mn = NULL;
+
+	if (mm->mmu_notifier_mm) {
+		rcu_read_lock();
+		hlist_for_each_entry_rcu(mn, &mm->mmu_notifier_mm->list,
+					 hlist)
+		    if (mn->ops == ops) {
+			tmp_mn = mn;
+			break;
+		}
+		rcu_read_unlock();
+	}
+	return tmp_mn;
+}
+EXPORT_SYMBOL_GPL(mmu_find_ops);
+
 static int __init mmu_notifier_init(void)
 {
 	return init_srcu_struct(&srcu);
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 2/9] signal: export force_sig_info
  2015-09-04 16:58 [RFC] Page table sharing and bufferless execbuf Jesse Barnes
  2015-09-04 16:58 ` [PATCH 1/9] mm: move mmu_find_ops to mmu_notifier.c Jesse Barnes
@ 2015-09-04 16:58 ` Jesse Barnes
  2015-09-04 16:58 ` [PATCH 3/9] android/sync: add sync_fence_create_dma Jesse Barnes
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 36+ messages in thread
From: Jesse Barnes @ 2015-09-04 16:58 UTC (permalink / raw)
  To: intel-gfx; +Cc: dwmw2

For signaling tasks from drivers.
---
 kernel/signal.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/signal.c b/kernel/signal.c
index 0f6bbbe..9122aa2 100644
--- a/kernel/signal.c
+++ b/kernel/signal.c
@@ -1227,6 +1227,7 @@ force_sig_info(int sig, struct siginfo *info, struct task_struct *t)
 
 	return ret;
 }
+EXPORT_SYMBOL(force_sig_info);
 
 /*
  * Nuke all other threads in the group.
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 3/9] android/sync: add sync_fence_create_dma
  2015-09-04 16:58 [RFC] Page table sharing and bufferless execbuf Jesse Barnes
  2015-09-04 16:58 ` [PATCH 1/9] mm: move mmu_find_ops to mmu_notifier.c Jesse Barnes
  2015-09-04 16:58 ` [PATCH 2/9] signal: export force_sig_info Jesse Barnes
@ 2015-09-04 16:58 ` Jesse Barnes
  2015-09-04 16:58 ` [PATCH 4/9] android/sync: hack: enable fence signaling in Android Native Sync implementation Jesse Barnes
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 36+ messages in thread
From: Jesse Barnes @ 2015-09-04 16:58 UTC (permalink / raw)
  To: intel-gfx; +Cc: dwmw2

From: Maarten Lankhorst <maarten.lankhorst@canonical.com>

This allows users of dma fences to create a android fence.

Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
---
 drivers/staging/android/sync.c | 13 +++++++++----
 drivers/staging/android/sync.h |  3 ++-
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/android/sync.c b/drivers/staging/android/sync.c
index f83e00c..7f0e919 100644
--- a/drivers/staging/android/sync.c
+++ b/drivers/staging/android/sync.c
@@ -188,7 +188,7 @@ static void fence_check_cb_func(struct fence *f, struct fence_cb *cb)
 }
 
 /* TODO: implement a create which takes more that one sync_pt */
-struct sync_fence *sync_fence_create(const char *name, struct sync_pt *pt)
+struct sync_fence *sync_fence_create_dma(const char *name, struct fence *pt)
 {
 	struct sync_fence *fence;
 
@@ -199,16 +199,21 @@ struct sync_fence *sync_fence_create(const char *name, struct sync_pt *pt)
 	fence->num_fences = 1;
 	atomic_set(&fence->status, 1);
 
-	fence->cbs[0].sync_pt = &pt->base;
+	fence->cbs[0].sync_pt = pt;
 	fence->cbs[0].fence = fence;
-	if (fence_add_callback(&pt->base, &fence->cbs[0].cb,
-			       fence_check_cb_func))
+	if (fence_add_callback(pt, &fence->cbs[0].cb, fence_check_cb_func))
 		atomic_dec(&fence->status);
 
 	sync_fence_debug_add(fence);
 
 	return fence;
 }
+EXPORT_SYMBOL(sync_fence_create_dma);
+
+struct sync_fence *sync_fence_create(const char *name, struct sync_pt *pt)
+{
+	return sync_fence_create_dma(name, &pt->base);
+}
 EXPORT_SYMBOL(sync_fence_create);
 
 struct sync_fence *sync_fence_fdget(int fd)
diff --git a/drivers/staging/android/sync.h b/drivers/staging/android/sync.h
index a21b79f..50052e4 100644
--- a/drivers/staging/android/sync.h
+++ b/drivers/staging/android/sync.h
@@ -250,8 +250,9 @@ void sync_pt_free(struct sync_pt *pt);
  * @pt:		sync_pt to add to the fence
  *
  * Creates a fence containg @pt.  Once this is called, the fence takes
- * ownership of @pt.
+ * a reference on @pt.
  */
+struct sync_fence *sync_fence_create_dma(const char *name, struct fence *pt);
 struct sync_fence *sync_fence_create(const char *name, struct sync_pt *pt);
 
 /*
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 4/9] android/sync: hack: enable fence signaling in Android Native Sync implementation
  2015-09-04 16:58 [RFC] Page table sharing and bufferless execbuf Jesse Barnes
                   ` (2 preceding siblings ...)
  2015-09-04 16:58 ` [PATCH 3/9] android/sync: add sync_fence_create_dma Jesse Barnes
@ 2015-09-04 16:58 ` Jesse Barnes
  2015-09-04 16:58 ` [PATCH 5/9] drm/i915: add create_context2 ioctl Jesse Barnes
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 36+ messages in thread
From: Jesse Barnes @ 2015-09-04 16:58 UTC (permalink / raw)
  To: intel-gfx; +Cc: dwmw2

---
 drivers/staging/android/sync.c | 29 +++++++++++++++++------------
 1 file changed, 17 insertions(+), 12 deletions(-)

diff --git a/drivers/staging/android/sync.c b/drivers/staging/android/sync.c
index 7f0e919..858278d 100644
--- a/drivers/staging/android/sync.c
+++ b/drivers/staging/android/sync.c
@@ -384,24 +384,29 @@ int sync_fence_wait(struct sync_fence *fence, long timeout)
 		timeout = msecs_to_jiffies(timeout);
 
 	trace_sync_wait(fence, 1);
-	for (i = 0; i < fence->num_fences; ++i)
+	for (i = 0; i < fence->num_fences; ++i) {
 		trace_sync_pt(fence->cbs[i].sync_pt);
+		ret = fence_wait_timeout(fence->cbs[i].sync_pt, true, timeout);
+		if (ret < 0) {
+			trace_sync_wait(fence, 0);
+			return ret;
+		} else if (ret == 0) {
+			trace_sync_wait(fence, 0);
+			if (timeout) {
+				pr_info("fence timeout on [%p] after %dms\n",
+					fence, jiffies_to_msecs(timeout));
+				sync_dump();
+			}
+			return -ETIME;
+		}
+	}
+#if 0
 	ret = wait_event_interruptible_timeout(fence->wq,
 					       atomic_read(&fence->status) <= 0,
 					       timeout);
+#endif
 	trace_sync_wait(fence, 0);
 
-	if (ret < 0) {
-		return ret;
-	} else if (ret == 0) {
-		if (timeout) {
-			pr_info("fence timeout on [%p] after %dms\n", fence,
-				jiffies_to_msecs(timeout));
-			sync_dump();
-		}
-		return -ETIME;
-	}
-
 	ret = atomic_read(&fence->status);
 	if (ret) {
 		pr_info("fence error %ld on [%p]\n", ret, fence);
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 5/9] drm/i915: add create_context2 ioctl
  2015-09-04 16:58 [RFC] Page table sharing and bufferless execbuf Jesse Barnes
                   ` (3 preceding siblings ...)
  2015-09-04 16:58 ` [PATCH 4/9] android/sync: hack: enable fence signaling in Android Native Sync implementation Jesse Barnes
@ 2015-09-04 16:58 ` Jesse Barnes
  2015-09-04 16:59 ` [PATCH 6/9] drm/i915: driver based PASID handling Jesse Barnes
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 36+ messages in thread
From: Jesse Barnes @ 2015-09-04 16:58 UTC (permalink / raw)
  To: intel-gfx; +Cc: dwmw2

For passing flags (e.g. SVM) when creating a context.

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
---
 drivers/gpu/drm/i915/i915_dma.c         |  1 +
 drivers/gpu/drm/i915/i915_drv.h         |  2 ++
 drivers/gpu/drm/i915/i915_gem_context.c | 63 +++++++++++++++++++++++++++------
 include/uapi/drm/i915_drm.h             | 18 ++++++++++
 4 files changed, 73 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index f0eaa6f..b868084 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1300,6 +1300,7 @@ const struct drm_ioctl_desc i915_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(I915_GEM_USERPTR, i915_gem_userptr_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_GETPARAM, i915_gem_context_getparam_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_SETPARAM, i915_gem_context_setparam_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
+	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_CREATE2, i915_gem_context_create2_ioctl, DRM_UNLOCKED),
 };
 
 int i915_max_ioctl = ARRAY_SIZE(i915_ioctls);
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 1287007..20beb51 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -3139,6 +3139,8 @@ static inline bool i915_gem_context_is_default(const struct intel_context *c)
 
 int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
 				  struct drm_file *file);
+int i915_gem_context_create2_ioctl(struct drm_device *dev, void *data,
+				   struct drm_file *file);
 int i915_gem_context_destroy_ioctl(struct drm_device *dev, void *data,
 				   struct drm_file *file);
 int i915_gem_context_getparam_ioctl(struct drm_device *dev, void *data,
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index 74aa0c9..e9ec2f3 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -238,14 +238,18 @@ err_out:
  */
 static struct intel_context *
 i915_gem_create_context(struct drm_device *dev,
-			struct drm_i915_file_private *file_priv)
+			struct drm_i915_file_private *file_priv, u32 flags)
 {
 	const bool is_global_default_ctx = file_priv == NULL;
 	struct intel_context *ctx;
+	bool create_vm = false;
 	int ret = 0;
 
 	BUG_ON(!mutex_is_locked(&dev->struct_mutex));
 
+	if (flags & (I915_GEM_CONTEXT_FULL_PPGTT | I915_GEM_CONTEXT_ENABLE_SVM))
+		create_vm = true;
+
 	ctx = __create_hw_context(dev, file_priv);
 	if (IS_ERR(ctx))
 		return ctx;
@@ -266,7 +270,7 @@ i915_gem_create_context(struct drm_device *dev,
 		}
 	}
 
-	if (USES_FULL_PPGTT(dev)) {
+	if (create_vm) {
 		struct i915_hw_ppgtt *ppgtt = i915_ppgtt_create(dev, file_priv);
 
 		if (IS_ERR_OR_NULL(ppgtt)) {
@@ -325,6 +329,7 @@ int i915_gem_context_init(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_context *ctx;
+	u32 flags = 0;
 	int i;
 
 	/* Init should only be called once per module load. Eventually the
@@ -352,7 +357,10 @@ int i915_gem_context_init(struct drm_device *dev)
 		}
 	}
 
-	ctx = i915_gem_create_context(dev, NULL);
+	if (USES_FULL_PPGTT(dev))
+		flags |= I915_GEM_CONTEXT_FULL_PPGTT;
+
+	ctx = i915_gem_create_context(dev, NULL, flags);
 	if (IS_ERR(ctx)) {
 		DRM_ERROR("Failed to create default global context (error %ld)\n",
 			  PTR_ERR(ctx));
@@ -452,7 +460,8 @@ int i915_gem_context_open(struct drm_device *dev, struct drm_file *file)
 	idr_init(&file_priv->context_idr);
 
 	mutex_lock(&dev->struct_mutex);
-	ctx = i915_gem_create_context(dev, file_priv);
+	ctx = i915_gem_create_context(dev, file_priv, USES_FULL_PPGTT(dev) ?
+				      I915_GEM_CONTEXT_FULL_PPGTT : 0);
 	mutex_unlock(&dev->struct_mutex);
 
 	if (IS_ERR(ctx)) {
@@ -820,13 +829,15 @@ static bool contexts_enabled(struct drm_device *dev)
 	return i915.enable_execlists || to_i915(dev)->hw_context_size;
 }
 
-int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
+int i915_gem_context_create2_ioctl(struct drm_device *dev, void *data,
 				  struct drm_file *file)
 {
-	struct drm_i915_gem_context_create *args = data;
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_i915_gem_context_create2 *args = data;
 	struct drm_i915_file_private *file_priv = file->driver_priv;
 	struct intel_context *ctx;
-	int ret;
+	u32 flags = args->flags;
+	int ret = 0;
 
 	if (!contexts_enabled(dev))
 		return -ENODEV;
@@ -835,14 +846,44 @@ int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
 	if (ret)
 		return ret;
 
-	ctx = i915_gem_create_context(dev, file_priv);
-	mutex_unlock(&dev->struct_mutex);
-	if (IS_ERR(ctx))
-		return PTR_ERR(ctx);
+	if (USES_FULL_PPGTT(dev))
+		flags |= I915_GEM_CONTEXT_FULL_PPGTT;
+
+	if ((flags & I915_GEM_CONTEXT_ENABLE_SVM) &&
+	    !dev_priv->svm.svm_available) {
+		ret = -ENODEV;
+		goto out_unlock;
+	}
+
+	ctx = i915_gem_create_context(dev, file_priv, flags);
+	if (IS_ERR(ctx)) {
+		ret = PTR_ERR(ctx);
+		goto out_unlock;
+	}
 
 	args->ctx_id = ctx->user_handle;
 	DRM_DEBUG_DRIVER("HW context %d created\n", args->ctx_id);
 
+out_unlock:
+	mutex_unlock(&dev->struct_mutex);
+	return ret;
+}
+
+int i915_gem_context_create_ioctl(struct drm_device *dev, void *data,
+				  struct drm_file *file)
+{
+	struct drm_i915_gem_context_create *args = data;
+	struct drm_i915_gem_context_create2 tmp;
+	int ret;
+
+	tmp.flags = 0;
+
+	ret = i915_gem_context_create2_ioctl(dev, &tmp, file);
+	if (ret)
+		return ret;
+
+	args->ctx_id = tmp.ctx_id;
+
 	return 0;
 }
 
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index fd5aa47..bac2fb2 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -230,6 +230,7 @@ typedef struct _drm_i915_sarea {
 #define DRM_I915_GEM_USERPTR		0x33
 #define DRM_I915_GEM_CONTEXT_GETPARAM	0x34
 #define DRM_I915_GEM_CONTEXT_SETPARAM	0x35
+#define DRM_I915_GEM_CONTEXT_CREATE2	0x36
 
 #define DRM_IOCTL_I915_INIT		DRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT, drm_i915_init_t)
 #define DRM_IOCTL_I915_FLUSH		DRM_IO ( DRM_COMMAND_BASE + DRM_I915_FLUSH)
@@ -283,6 +284,7 @@ typedef struct _drm_i915_sarea {
 #define DRM_IOCTL_I915_GEM_USERPTR			DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_USERPTR, struct drm_i915_gem_userptr)
 #define DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_GETPARAM, struct drm_i915_gem_context_param)
 #define DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_SETPARAM, struct drm_i915_gem_context_param)
+#define DRM_IOCTL_I915_GEM_CONTEXT_CREATE2	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE2, struct drm_i915_gem_context_create2)
 
 /* Allow drivers to submit batchbuffers directly to hardware, relying
  * on the security mechanisms provided by hardware.
@@ -1072,6 +1074,22 @@ struct drm_i915_gem_context_create {
 	__u32 pad;
 };
 
+/*
+ * SVM handling
+ *
+ * A context can opt in to SVM support (thereby using its CPU page tables
+ * when accessing data from the GPU) by using the %I915_ENABLE_SVM flag
+ * and passing an existing context id.  This is a one way transition; SVM
+ * contexts can not be downgraded into PPGTT contexts once converted.
+ */
+#define I915_GEM_CONTEXT_ENABLE_SVM		(1<<0)
+#define I915_GEM_CONTEXT_FULL_PPGTT		(1<<1)
+struct drm_i915_gem_context_create2 {
+	/*  output: id of new context*/
+	__u32 ctx_id;
+	__u32 flags;
+};
+
 struct drm_i915_gem_context_destroy {
 	__u32 ctx_id;
 	__u32 pad;
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 6/9] drm/i915: driver based PASID handling
  2015-09-04 16:58 [RFC] Page table sharing and bufferless execbuf Jesse Barnes
                   ` (4 preceding siblings ...)
  2015-09-04 16:58 ` [PATCH 5/9] drm/i915: add create_context2 ioctl Jesse Barnes
@ 2015-09-04 16:59 ` Jesse Barnes
  2015-10-07 13:00   ` David Woodhouse
  2015-10-08 15:57   ` Chris Wilson
  2015-09-04 16:59 ` [PATCH 7/9] drm/i915: add fences to the request struct Jesse Barnes
                   ` (4 subsequent siblings)
  10 siblings, 2 replies; 36+ messages in thread
From: Jesse Barnes @ 2015-09-04 16:59 UTC (permalink / raw)
  To: intel-gfx; +Cc: dwmw2

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=y, Size: 52728 bytes --]

New file with VT-d SVM and PASID handling functions and page table
management.  This belongs in the IOMMU code (along with some extra bits
for waiting for invalidations and page faults to complete, flushing the
device IOTLB, etc.)

FIXME:
  need work queue for re-submitting contexts
  TE bit handling on SKL
---
 drivers/gpu/drm/i915/Makefile           |    5 +-
 drivers/gpu/drm/i915/i915_drv.h         |   43 ++
 drivers/gpu/drm/i915/i915_gem.c         |    3 +
 drivers/gpu/drm/i915/i915_gem_context.c |    3 +
 drivers/gpu/drm/i915/i915_irq.c         |    7 +
 drivers/gpu/drm/i915/i915_reg.h         |   47 ++
 drivers/gpu/drm/i915/i915_svm.c         | 1102 +++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c        |  120 +++-
 drivers/gpu/drm/i915/intel_lrc.h        |    1 +
 9 files changed, 1299 insertions(+), 32 deletions(-)
 create mode 100644 drivers/gpu/drm/i915/i915_svm.c

diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index 44d290a..e4883a7 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -38,7 +38,8 @@ i915-y += i915_cmd_parser.o \
 	  intel_lrc.o \
 	  intel_mocs.o \
 	  intel_ringbuffer.o \
-	  intel_uncore.o
+	  intel_uncore.o \
+	  i915_svm.o
 
 # general-purpose microcontroller (GuC) support
 i915-y += intel_guc_loader.o \
@@ -93,6 +94,8 @@ i915-y += dvo_ch7017.o \
 # virtual gpu code
 i915-y += i915_vgpu.o
 
+i915-$(CONFIG_MMU_NOTIFIER) += i915_svm.o
+
 # legacy horrors
 i915-y += i915_dma.o
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 20beb51..ca38a7a 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -47,6 +47,7 @@
 #include <drm/drm_gem.h>
 #include <linux/backlight.h>
 #include <linux/hashtable.h>
+#include <linux/mmu_notifier.h>
 #include <linux/intel-iommu.h>
 #include <linux/kref.h>
 #include <linux/pm_qos.h>
@@ -848,6 +849,13 @@ struct i915_ctx_hang_stats {
 	bool banned;
 };
 
+struct intel_mm_struct {
+	struct kref kref;
+	struct mmu_notifier notifier;
+	struct drm_i915_private *dev_priv;
+	struct list_head context_list;
+};
+
 /* This must match up with the value previously used for execbuf2.rsvd1. */
 #define DEFAULT_CONTEXT_HANDLE 0
 
@@ -874,6 +882,9 @@ struct i915_ctx_hang_stats {
 struct intel_context {
 	struct kref ref;
 	int user_handle;
+	bool is_svm; /* shares x86 page tables */
+	u32 pasid; /* 20 bits */
+	struct intel_mm_struct *ims;
 	uint8_t remap_slice;
 	struct drm_i915_private *i915;
 	int flags;
@@ -895,6 +906,9 @@ struct intel_context {
 		int pin_count;
 	} engine[I915_NUM_RINGS];
 
+	struct list_head mm_list;
+	struct task_struct *tsk;
+
 	struct list_head link;
 };
 
@@ -1334,6 +1348,21 @@ struct i915_gem_mm {
 	u32 object_count;
 };
 
+#define PASID_COUNT 32
+
+struct i915_svm_state {
+	bool svm_available;
+	struct extended_root_table_entry *root_table;
+	struct extended_context_table_entry *context;
+	struct pasid_table_entry *pasid_table;
+	struct pasid_state_table_entry *pasid_state_table;
+        struct intel_context *pasid_ctx[PASID_COUNT];
+	u8 *prq_ring;
+	u8 *ivq_ring;
+	struct work_struct work;
+	spinlock_t lock; /* protects pasid_ctx, prq, and ivq rings */
+};
+
 struct drm_i915_error_state_buf {
 	struct drm_i915_private *i915;
 	unsigned bytes;
@@ -1942,6 +1971,8 @@ struct drm_i915_private {
 
 	struct i915_runtime_pm pm;
 
+	struct i915_svm_state svm;
+
 	/* Abstract the submission mechanism (legacy ringbuffer or execlists) away */
 	struct {
 		int (*execbuf_submit)(struct i915_execbuffer_params *params,
@@ -3342,6 +3373,18 @@ int i915_reg_read_ioctl(struct drm_device *dev, void *data,
 int i915_get_reset_stats_ioctl(struct drm_device *dev, void *data,
 			       struct drm_file *file);
 
+/* svm */
+extern struct intel_mm_struct *intel_bind_mm(struct drm_device *dev,
+					     struct intel_context *ctx);
+extern void intel_unbind_mm(struct intel_context *ctx);
+extern int intel_alloc_pasid(struct drm_device *dev,
+			     struct intel_context *ctx);
+extern void intel_free_pasid(struct drm_device *dev,
+			     struct intel_context *ctx);
+extern void intel_init_svm(struct drm_device *dev);
+extern void intel_iommu_tlb_flush(struct drm_device *dev);
+extern void intel_gpu_fault(struct drm_device *dev);
+
 /* overlay */
 extern struct intel_overlay_error_state *intel_overlay_capture_error_state(struct drm_device *dev);
 extern void intel_overlay_print_error_state(struct drm_i915_error_state_buf *e,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 41263cd..5341591 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4665,6 +4665,9 @@ i915_gem_init_hw(struct drm_device *dev)
 		}
 	}
 
+	if (INTEL_INFO(dev)->gen >= 8)
+		intel_init_svm(dev);
+
 	i915_gem_init_swizzling(dev);
 
 	/*
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index e9ec2f3..3207bbf 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -254,6 +254,9 @@ i915_gem_create_context(struct drm_device *dev,
 	if (IS_ERR(ctx))
 		return ctx;
 
+	if (flags & I915_GEM_CONTEXT_ENABLE_SVM)
+		ctx->is_svm = true;
+
 	if (is_global_default_ctx && ctx->legacy_hw_ctx.rcs_state) {
 		/* We may need to do things with the shrinker which
 		 * require us to immediately switch back to the default
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 2063279..192ad0c 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2176,6 +2176,8 @@ static irqreturn_t gen8_irq_handler(int irq, void *arg)
 			ret = IRQ_HANDLED;
 			if (tmp & GEN8_DE_MISC_GSE)
 				intel_opregion_asle_intr(dev);
+			else if (tmp & GEN8_DE_MISC_SVM_PRQ)
+				queue_work(dev_priv->wq, &dev_priv->svm.work);
 			else
 				DRM_ERROR("Unexpected DE Misc interrupt\n");
 		}
@@ -3571,6 +3573,11 @@ static void gen8_de_irq_postinstall(struct drm_i915_private *dev_priv)
 					  de_pipe_enables);
 
 	GEN5_IRQ_INIT(GEN8_DE_PORT_, ~de_port_masked, de_port_enables);
+
+	I915_WRITE(0x42a4, (1<<29));
+	I915_WRITE(GEN8_DE_MISC_IMR, ~GEN8_DE_MISC_SVM_PRQ);
+	I915_WRITE(GEN8_DE_MISC_IER, GEN8_DE_MISC_SVM_PRQ);
+	POSTING_READ(GEN8_DE_PORT_IER);
 }
 
 static int gen8_irq_postinstall(struct drm_device *dev)
diff --git a/drivers/gpu/drm/i915/i915_reg.h b/drivers/gpu/drm/i915/i915_reg.h
index 84ed9ab..618bd33 100644
--- a/drivers/gpu/drm/i915/i915_reg.h
+++ b/drivers/gpu/drm/i915/i915_reg.h
@@ -2806,6 +2806,52 @@ enum skl_disp_power_wells {
 				INTERVAL_1_33_US(us) : \
 				INTERVAL_1_28_US(us))
 
+/* GFXVTBAR mirror */
+#define GFXVTBAR_BASE		0x118000
+
+#define BDW_SVM_DEV_MODE_CNFG	(0x110000)
+#define   BDW_SVM_MODE_DRIVER	(1<<0)
+
+#define SVM_GCMD		(GFXVTBAR_BASE + 0x18)
+#define   GCMD_TE		(1<<31)
+#define   GCMD_SRTP		(1<<30)
+#define   GCMD_QIE		(1<<26)
+
+#define SVM_GSTS		(GFXVTBAR_BASE + 0x1c)
+#define   GSTS_TES		(1<<31)
+#define   GSTS_RTPS		(1<<30)
+#define   GSTS_QIES		(1<<26)
+
+#define SVM_RTADDR		(GFXVTBAR_BASE + 0x20)
+#define   SVM_RTT_TYPE_EXT	(1<<11)
+
+#define SVM_CCMD		(GFXVTBAR_BASE + 0x28)
+#define   CCMD_ICC		(1ULL<<63)
+#define   CCMD_CIRG_GLOBAL	(1ULL<<61)
+
+#define SVM_IVQ_HEAD		(GFXVTBAR_BASE + 0x80)
+#define SVM_IVQ_TAIL		(GFXVTBAR_BASE + 0x88)
+#define SVM_IQA			(GFXVTBAR_BASE + 0x90)
+#define SVM_IECTL		(GFXVTBAR_BASE + 0xa0)
+
+#define SVM_PRQ_HEAD		(GFXVTBAR_BASE + 0xc0)
+#define SVM_PRQ_TAIL		(GFXVTBAR_BASE + 0xc8)
+#define SVM_PRQA		(GFXVTBAR_BASE + 0xd0)
+
+#define SVM_PRECTL		(GFXVTBAR_BASE + 0xe0)
+#define   PRE_IM		(1<<31)
+
+#define SVM_IOTLB		(GFXVTBAR_BASE + 0x508)
+#define   IOTLB_IVT		(1ULL<<63)
+#define   IOTLB_GLOBAL		(1ULL<<60)
+#define   IOTLB_DOMAIN		(2ULL<<60)
+#define   IOTLB_PAGE		(3ULL<<60)
+#define   IOTLB_IIRG_MASK	(3ULL<<60)
+#define   IOTLB_GLOBAL_DONE	(1ULL<<57)
+#define   IOTLB_DOMAIN_DONE	(2ULL<<57)
+#define   IOTLB_PAGE_DONE	(3ULL<<57)
+#define   IOTLB_IAIG_MASK	(3ULL<<57)
+
 /*
  * Logical Context regs
  */
@@ -5791,6 +5837,7 @@ enum skl_disp_power_wells {
 #define GEN8_DE_MISC_IIR 0x44468
 #define GEN8_DE_MISC_IER 0x4446c
 #define  GEN8_DE_MISC_GSE		(1 << 27)
+#define  GEN8_DE_MISC_SVM_PRQ		(1 << 22)
 
 #define GEN8_PCU_ISR 0x444e0
 #define GEN8_PCU_IMR 0x444e4
diff --git a/drivers/gpu/drm/i915/i915_svm.c b/drivers/gpu/drm/i915/i915_svm.c
new file mode 100644
index 0000000..1d05318
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_svm.c
@@ -0,0 +1,1102 @@
+/*
+ * Copyright 2013-2014 Intel Corporation
+ * All Rights Reserved.
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the
+ * "Software"), to deal in the Software without restriction, including
+ * without limitation the rights to use, copy, modify, merge, publish,
+ * distribute, sub license, and/or sell copies of the Software, and to
+ * permit persons to whom the Software is furnished to do so, subject to
+ * the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the
+ * next paragraph) shall be included in all copies or substantial portions
+ * of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS
+ * OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF
+ * MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT.
+ * IN NO EVENT SHALL TUNGSTEN GRAPHICS AND/OR ITS SUPPLIERS BE LIABLE FOR
+ * ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT,
+ * TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE
+ * SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+ */
+
+#include <linux/mmu_notifier.h>
+#include <drm/drmP.h>
+#include <drm/i915_drm.h>
+#include "i915_drv.h"
+#include "i915_trace.h"
+#include "intel_drv.h"
+
+/*
+ * The root tables have 256 entries, one per bus in the PCI domain.  Each
+ * of them points to a context table (or two context tables in the extended
+ * case), which have entries for each devfn in the domain.  So entry 0 in
+ * each table is devfn(0,0), entry 8 is devfn(1,0) etc.  For gfx, we just
+ * care about entry 16, which is devfn(2,0).
+ */
+
+/**
+ * legacy_root_table_entry - translation root table w/o extended features
+ * @present: entry present & valid
+ * @ctx_addr: address of a legacy_context_table_entry
+ */
+struct legacy_root_table_entry {
+	u64 present:1;
+	u64 rsvd2:11;
+	u64 ctx_addr:27;
+	u64 rsvd1:25;
+	u64 rsvd0;
+} __attribute__((packed));
+
+/**
+ * extended_root_table_entry - translation root table w/extended features
+ * @lo_present: low ctx addr entry present & valid
+ * @lo_ctx_addr: low bits of address of a extended_context_table_entry
+ * @hi_present: high ctx addr entry present & valid
+ * @hi_ctx_addr: high bits of address of a extended_context_table_entry
+ */
+struct extended_root_table_entry {
+	u64 lo_present:1;
+	u64 rsvd3:11;
+	u64 lo_ctx_addr:27;
+	u64 rsvd2:25;
+	u64 hi_present:1;
+	u64 rsvd1:11;
+	u64 hi_ctx_addr:27;
+	u64 rsvd0:25;
+} __attribute__((packed));
+
+/*
+ * Only untranslated requests walk the page tables, translated DMA requests and
+ * translation requests blocked
+ */
+#define LEGACY_TTYPE_UT_ONLY		0
+/* All types are translated with the page tables */
+#define LEGACY_TTYPE_ALL		1
+/*
+ * Untranslated requests are treated as passthrough (host physical ==
+ * guest physical) regardless of translation enable status.  Translated
+ * DMA requests and translation requests are blocked.
+ */
+#define LEGACY_TTYPE_UT_PT		2
+
+#define AGAW_30		0 /* 2 level page tables */
+#define AGAW_39		1 /* 3 level page tables */
+#define AGAW_48		2 /* 4 level page tables */
+#define AGAW_57		3 /* 5 level page tables */
+#define AGAW_64		4 /* 6 level page tables */
+
+/**
+ * legacy_context_table_entry - context table for a non-PASID context
+ * @present: entry present & valid
+ * @fault_disable: disable fault reporting of non-recoverable faults
+ * @translation_type: see legacy translation type defines
+ * @pgt_addr: address of a PML4 page table root
+ * @addr_width: see AGAW defines
+ * @domain_id: domain of request (not applicable to GFX)
+ */
+struct legacy_context_table_entry {
+	/* Bits 0:63 */
+	u64 present:1;
+	u64 fault_disable:1;
+	u64 translation_type:2;
+	u64 rsvd0:8;
+	u64 pgt_addr:27;
+	u64 rsvd1:25;
+	/* Bits 64:127 */
+	u64 addr_width:3;
+	u64 available:4;
+	u64 rsvd2:1;
+	u64 domain_id:16;
+	u64 rsvd3:40;
+} __attribute__((packed));
+
+
+/*
+ * Only untranslated requests walk the page tables, translated DMA requests and
+ * translation requests blocked
+ */
+#define EXTENDED_TTYPE_UT		0
+/*
+ * Untranslated and translation requets with or without PASID are translated,
+ * translated requests bypass translation.
+ */
+#define EXTENDED_TTYPE_UT_TR		1
+/*
+ * Untranslated requests w/o PASID bypass translation, all other requests
+ * blocked.
+ */
+#define EXTENDED_TTYPE_UT_PT		2
+/*
+ * Untranslated requests w/o PASID bypass, untranslated requests with PASID
+ * are translated, all other requests blocked.
+ */
+#define EXTENDED_TTYPE_UT_PT_PASID	4
+/*
+ * Untranslated and translation requests w/o PASID bypass, untranslated
+ * and translation requests w/PASID are translated.  Translated requests
+ * bypass.
+ */
+#define EXTENDED_TTYPE_UT_TR_PASID_PT	5
+/*
+ * Untranslated requests w/o PASID are blocked.  Untranslated requests with
+ * PASID are translated.  All other requests blocked.  N/A for gfx.
+ */
+#define EXTENDED_TTYPE_UT_PASID		6
+#define EXTENDED_TTYPE_UT_TR_PASID	7
+
+#define EXTENDED_MTYPE_UC		0
+#define EXTENDED_MTYPE_WB		6
+
+/**
+ * extended_context_table_entry - context table for a PASID context
+ * @present: entry present & valid
+ * @fault_disable: disable fault reporting of non-recoverable faults
+ * @translation_type: see extended translation type defines
+ * @ext_mem_type: extended memory type lookup requested
+ * @lazy_invalidate_en: allow deferred invalidate for PASIDs not in use
+ * @pg_req_en: report page requests through SW queue
+ * @nesting_en: allow second level page table walk through VT-d
+ * @slpt_addr: address of a PML4 page table root (if nesting_en set)
+ * @addr_width: see AGAW defines
+ * @pge: page global enable
+ * @nxe: no execute enable
+ * @wp: write protect
+ * @cd: cache disable
+ * @ext_mem_type_en: enable extended memory type lookup
+ * @domain_id: domain of request (not applicable to GFX)
+ * @smep: supervisor mode execution protection
+ * @sre: supervisor request enable
+ * @ere: execute request enable
+ * @slee: second level execute enable
+ * @sl64kpe: second level 64KG page enable
+ * @pat: page attribute table
+ * @pasid_table_size: size of PASID table (2^x+5 entries)
+ * @pasid_table_addr: address of PASID table
+ * @paid_state_table_addr: address of PASID state table
+ */
+struct extended_context_table_entry {
+	/* Bits 0:63 */
+	u64 present:1;
+	u64 fault_disable:1;
+	u64 translation_type:3;
+	u64 ext_mem_type:3;
+	u64 lazy_invalidate_en:1;
+	u64 pg_req_en:1;
+	u64 nesting_en:1;
+	u64 pasid_en:1;
+	u64 slpt_addr:27;
+	u64 rsvd0:25;
+	/* Bits 64:127 */
+	u64 addr_width:3;
+	u64 pge:1;
+	u64 nxe:1;
+	u64 wp:1;
+	u64 cd:1;
+	u64 ext_mem_type_en:1;
+	u64 domain_id:16;
+	u64 smep:1;
+	u64 sre:1;
+	u64 ere:1;
+	u64 slee:1;
+	u64 sl64kpe:1;
+	u64 rsvd1:3;
+	u64 pat:32;
+	/* Bits 128:191 */
+	u64 pasid_table_size:4;
+	u64 rsvd2:8;
+	u64 pasid_table_addr:27;
+	u64 rsvd3:25;
+	/* Bits 192:255 */
+	u64 rsvd4:12;
+	u64 pasid_state_table_addr:27;
+	u64 rsvd5:25;
+} __attribute__((packed));
+
+/**
+ * pasid_state_table_entry - state for a PASID
+ * @active_refcnt: active count managed by GPU hw
+ * @lazy_invalidate: use to defer invalidations, not used for gfx
+ */
+struct pasid_state_table_entry {
+	u64 rsvd0:32;
+	u64 active_refcnt:16;
+	u64 rsvd1:15;
+	u64 lazy_invalidate:1;
+} __attribute__((packed));
+
+/**
+ * pasid_table_entry - per-PASID PML4 and ancillary data
+ * @present: entry present & valid
+ * @pwt: write through (not used for gfx)
+ * @pcd: cache disable (not used for gfx)
+ * @eafe: extended access flag enable
+ * @fl64kpe: enable bit 11 IPS handling in first level page tables
+ * @pml4: physical addr of PML4 root of page tables
+ */
+struct pasid_table_entry {
+	u64 present:1;
+	u64 rsvd0:2;
+	u64 pwt:1;
+	u64 pcd:1;
+	u64 rsvd1:5;
+	u64 eafe:1;
+	u64 fl64kpe:1;
+	u64 pml4:27;
+	u64 rsvd2:25;
+} __attribute__((packed));
+
+/* Invalidation queue descriptors */
+#define CC_INV_DSC		1
+#define IOTLB_INV_DSC		2
+#define DEV_IOTLB_INV_DSC	3
+#define IEC_INV_DSC		4
+#define INV_WAIT_DSC		5
+#define EXT_IOTLB_INV_DSC	6
+#define PTC_INV_DSC		7
+#define EXT_DEV_IOTLB_INV_DSC	8
+struct inv_dsc {
+	u64 lo;
+	u64 hi;
+};
+
+#define INV_PAGES_WITH_PASID	0
+#define INV_NON_GLOBALS_WITH_PASID 1
+#define INV_NON_GLOBALS_ALL_PASIDS 2
+#define INV_GLOBAL		3
+struct iotlb_inv_dsc {
+	u64 dsc_type:4;
+	u64 g:2;
+	u64 dw:1;
+	u64 dr:1;
+	u64 rsvd1:8;
+	u64 did:16;
+	u64 rsvd2:32;
+	u64 am:6;
+	u64 ih:1;
+	u64 rsvd3:5;
+	u64 addr:52;
+};
+
+#define EXT_IOTLB_INV_G_ALL_PASIDS		0
+#define EXT_IOTLB_INV_G_ALL_PASIDS_NON_GLOBAL	1
+#define EXT_IOTLB_INV_G_PASID_NON_GLOBAL	2
+#define EXT_IOTLB_INV_G_PASID_PAGE_SELECT	3
+struct ext_iotlb_inv_dsc {
+	u64 dsc_type:4;
+	u64 g:2;
+	u64 rsvd1:10;
+	u64 did:16;
+	u64 pasid:20;
+	u64 rsvd2:12;
+	u64 am:6;
+	u64 ih:1;
+	u64 gl:1;
+	u64 rsvd3:4;
+	u64 addr:52;
+};
+
+struct dev_iotlb_inv_dsc {
+	u64 dsc_type:4;
+	u64 rsvd1:12;
+	u64 max_invs_pend:5;
+	u64 rsvd2:11;
+	u64 sid:16;
+	u64 rsvd3:16;
+	u64 s:1;
+	u64 rsvd4:11;
+	u64 addr:52;
+};
+
+#define PAGE_GRP_RESP_DSC	9
+#define PAGE_STREAM_RESP_DSC	10
+
+#define RESP_CODE_SUCCESS	0
+#define RESP_CODE_NOPAGE	1
+#define RESP_CODE_ERROR		15
+struct page_group_response_dsc {
+	u64 dsc_type:4;
+	u64 pasid_present:1;
+	u64 rsvd1:11;
+	u64 requestor_id:16;
+	u64 pasid:20;
+	u64 rsvd2:12;
+	u64 resp_code:4;
+	u64 rsvd3:28;
+	u64 private:23;
+	u64 prg_index:9;
+};
+
+/* Page request queue descriptor */
+struct page_request_dsc {
+	u64 srr:1;
+	u64 bof:1;
+	u64 pasid_present:1;
+	u64 lpig:1;
+	u64 pasid:20;
+	u64 bus:8;
+	u64 private:23;
+	u64 prg_index:9;
+	u64 rd_req:1;
+	u64 wr_req:1;
+	u64 exe_req:1;
+	u64 priv_req:1;
+	u64 devfn:8;
+	u64 addr:52;
+} __attribute((packed));
+
+/*
+ * TODO:
+ *  - variable size prq and invq rings
+ *  - split out root, context, and pasid table initialization, generalize
+ *  - sync object list in exec_mm to prevent execution
+ *  - fine grained command events with MI_USER_INTERRUPT and events
+ *  - error handling in page fault handler
+ *  - invalidation queue error handling
+ *  - debug support for page faults
+ *    - notification when in-use page is unmapped
+ *  - adv context cleanup
+ *    - allocate one on open by default
+ *    - split from intel_context
+ *  - VT-d nesting support
+ *    - push code to IOMMU layer
+ *  - PASID allocation & wraparound for invalidation handling
+ */
+#if 0
+void intel_iommu_tlb_flush(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	I915_WRITE(0x4260, 1);
+	if (wait_for(!(I915_READ(0x4260) & 1), 10))
+		DRM_ERROR("render TLB invalidate timed out\n");
+
+	I915_WRITE64(SVM_IOTLB, IOTLB_IVT | IOTLB_GLOBAL);
+	if (wait_for(!(I915_READ64(SVM_IOTLB) & IOTLB_IVT), 10))
+		DRM_ERROR("IOMMU TLB invalidate timed out\n");
+	I915_WRITE64(SVM_CCMD, CCMD_ICC | CCMD_CIRG_GLOBAL);
+	if (wait_for(!(I915_READ64(SVM_CCMD) & CCMD_ICC), 10))
+		DRM_ERROR("IOMMU context cache invalidate timed out\n");
+}
+#else
+static void ivq_write_ext_iotlb_inv_descriptor(struct drm_device *dev,
+					       struct ext_iotlb_inv_dsc *desc);
+
+void intel_iommu_tlb_flush(struct drm_device *dev)
+{
+	struct ext_iotlb_inv_dsc dsc = { 0 };
+
+	dsc.dsc_type = EXT_IOTLB_INV_DSC;
+	dsc.g = EXT_IOTLB_INV_G_ALL_PASIDS;
+	dsc.ih = 0;
+	ivq_write_ext_iotlb_inv_descriptor(dev, &dsc);
+}
+#endif
+
+static void prq_read_descriptor(struct drm_device *dev,
+				struct page_request_dsc *desc)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	u8 *prq_ring = dev_priv->svm.prq_ring;
+	struct page_request_dsc *addr;
+	int offset;
+
+	spin_lock(&dev_priv->svm.lock);
+	offset = I915_READ(SVM_PRQ_HEAD);
+	if (offset * sizeof(*addr) > PAGE_SIZE)
+		offset = 0;
+	prq_ring += offset;
+	addr = (struct page_request_dsc *)prq_ring;
+	*desc = *addr;
+	I915_WRITE(SVM_PRQ_HEAD, offset + sizeof(*addr));
+	spin_unlock(&dev_priv->svm.lock);
+}
+
+static void ivq_write_inv_descriptor(struct drm_device *dev,
+				     struct inv_dsc *desc)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	u8 *ivq_ring = dev_priv->svm.ivq_ring;
+	struct inv_dsc *addr;
+	int offset;
+
+	spin_lock(&dev_priv->svm.lock);
+	offset = I915_READ(SVM_IVQ_TAIL);
+	if (offset * sizeof(*addr) > PAGE_SIZE)
+		offset = 0;
+	ivq_ring += offset;
+	addr = (struct inv_dsc *)ivq_ring;
+	*addr = *desc;
+	I915_WRITE(SVM_IVQ_TAIL, offset + sizeof(*addr));
+	spin_unlock(&dev_priv->svm.lock);
+}
+
+static void ivq_write_ext_iotlb_inv_descriptor(struct drm_device *dev,
+					       struct ext_iotlb_inv_dsc *desc)
+{
+
+	ivq_write_inv_descriptor(dev, (struct inv_dsc *)desc);
+}
+
+static void ivq_write_resp_descriptor(struct drm_device *dev,
+				      struct page_group_response_dsc *desc)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	u8 *ivq_ring = dev_priv->svm.ivq_ring;
+	struct page_group_response_dsc *addr;
+	int offset;
+
+	spin_lock(&dev_priv->svm.lock);
+	offset = I915_READ(SVM_IVQ_TAIL);
+	if (offset * sizeof(*addr) > PAGE_SIZE)
+		offset = 0;
+	ivq_ring += offset;
+	addr = (struct page_group_response_dsc *)ivq_ring;
+	*addr = *desc;
+	I915_WRITE(SVM_IVQ_TAIL, offset + sizeof(*addr));
+	spin_unlock(&dev_priv->svm.lock);
+}
+
+static void gpu_mm_segv(struct task_struct *tsk, unsigned long address,
+			int si_code)
+{
+	siginfo_t info;
+
+	/* Need specific signal info here */
+	info.si_signo	= SIGSEGV;
+	info.si_errno	= EIO;
+	info.si_code	= si_code;
+	info.si_addr	= (void __user *)address;
+
+	force_sig_info(SIGSEGV, &info, tsk);
+}
+
+/*
+ * Read the fault descriptor and handle the fault:
+ *   get PML4 from PASID
+ *   get mm struct
+ *   get the vma
+ *   verify the address is valid
+ *   call handle_mm_fault after taking the mm->mmap_sem
+ */
+void intel_gpu_fault_work(struct work_struct *work)
+{
+	struct i915_svm_state *svm = container_of(work, struct i915_svm_state,
+						  work);
+	struct drm_i915_private *dev_priv =
+		container_of(svm, struct drm_i915_private, svm);
+	struct drm_device *dev = dev_priv->dev;
+	struct intel_ringbuffer *ringbuf;
+	struct page_request_dsc desc;
+	struct page_group_response_dsc resp;
+	struct intel_context *ctx;
+	struct task_struct *tsk;
+	struct mm_struct *mm;
+	struct vm_area_struct *vma;
+	u64 address;
+	int ret;
+
+	DRM_ERROR("PRQ updated, head 0x%08x, tail 0x%08x\n",
+		  I915_READ(SVM_PRQ_HEAD), I915_READ(SVM_PRQ_TAIL));
+	prq_read_descriptor(dev, &desc);
+	DRM_ERROR("page fault on addr 0x%016llx, PASID %d, srr %d\n",
+		  (u64)(desc.addr << PAGE_SHIFT), desc.pasid, desc.srr);
+
+	spin_lock(&dev_priv->svm.lock);
+	ctx = dev_priv->svm.pasid_ctx[desc.pasid];
+	tsk = ctx->tsk;
+	mm = tsk->mm;
+	address = desc.addr << PAGE_SHIFT;
+	ringbuf = ctx->engine[RCS].ringbuf;
+	spin_unlock(&dev_priv->svm.lock);
+
+	down_read_trylock(&mm->mmap_sem);
+	vma = find_extend_vma(mm, address);
+	if (!vma || address < vma->vm_start) {
+		DRM_ERROR("bad VMA or address out of range\n");
+		gpu_mm_segv(tsk, address, SEGV_MAPERR);
+		goto out_unlock; /* need to kill process */
+	}
+
+	ret = handle_mm_fault(mm, vma, address,
+			      desc.wr_req ? FAULT_FLAG_WRITE : 0);
+	if (ret & VM_FAULT_ERROR) {
+		gpu_mm_segv(tsk, address, SEGV_ACCERR); /* ? */
+		goto out_unlock;
+	}
+
+	if (ret & VM_FAULT_MAJOR)
+		tsk->maj_flt++;
+	else
+		tsk->min_flt++;
+
+	if (desc.srr)
+		resp.dsc_type = PAGE_STREAM_RESP_DSC;
+	else
+		resp.dsc_type = PAGE_GRP_RESP_DSC;
+	resp.pasid = desc.pasid;
+	resp.pasid_present = 1;
+	resp.requestor_id = PCI_DEVID(0, PCI_DEVFN(2,0));
+	resp.resp_code = RESP_CODE_SUCCESS;
+	resp.prg_index = desc.prg_index;
+	resp.private = desc.private;
+	ivq_write_resp_descriptor(dev, &resp);
+out_unlock:
+	up_read(&mm->mmap_sem);
+
+	/* FIXME: wait for page response to be serviced */
+
+	/* FIXME: queue context for re-submit */
+	/* execlists_context_queue(req); */
+}
+
+/*
+ * SVM with Intel GPUs
+ *
+ * On BDW and newer GPUs, the GPU can use x86 page tables to convert linear
+ * addresses to physical addresses for GPU activity.  This allows applications
+ * to simply malloc() buffers for use on the GPU (whether command buffers,
+ * state, source data, or destination data).
+ *
+ * We accomplish this sharing one of two ways for a given mm_struct:
+ *   1) copy the page tables and manage the mm_struct with mmu_notifier
+ *      callbacks
+ *   2) use the current->mm page tables directly, synchronizing with
+ *      TLB shootdowns and updates on the host with the mmu_notifiers
+ *
+ * In case (1) we can delay page table updates until GPU activity on a given
+ * page or range is complete to make debug on the application side easier.
+ * TLB shootdowns will occur only on the GPU after any outstanding activity
+ * has completed and page tables have been updated.
+ *
+ * In case (2) we must not block, since threads on the CPU will be concurrently
+ * accessing the data, and we don't want to delay them.  This may mean shooting
+ * down GPU activity currently in progress, either by preempting the current
+ * batch or by removing commands from a ringbuffer where we previously
+ * queued them.
+ *
+ * Note that in either of these cases, CPU threads changing the memory
+ * map out from under the GPU is highly likely to be a programming error,
+ * just as it is when a CPU thread modifies the mapping of a virtual
+ * address space in use by another thread.
+ *
+ * We track the mmu notifier bits using struct intel_mm_struct, which has
+ * a one to many relationship with intel_context (i.e. multiple contexts
+ * can exist and share the same address space).  Thus we refcount the
+ * struct and only destroy it when the final reference is dropped.
+ */
+
+/* Make sure GPU writes can't hit the mm that's about to go away */
+static void intel_mm_release(struct mmu_notifier *mn, struct mm_struct *mm)
+{
+	struct intel_mm_struct *ims = container_of(mn, struct intel_mm_struct,
+						   notifier);
+	struct drm_i915_private *dev_priv = ims->dev_priv;
+	struct drm_device *dev = dev_priv->dev;
+	struct intel_context *ctx;
+
+	/*
+	 * Wait for any outstanding activity and unbind the mm.  Since
+	 * each context has its own ring, we can simply wait for the ring
+	 * to idle before invalidating the PASID and flushing the TLB.
+	 */
+	mutex_lock(&dev->struct_mutex);
+	list_for_each_entry(ctx, &ims->context_list, mm_list) {
+		intel_ring_idle(ctx->engine[RCS].ringbuf->ring);
+	}
+
+	intel_iommu_tlb_flush(dev_priv->dev);
+	mutex_unlock(&dev->struct_mutex);
+}
+
+static void intel_flush_page_locked(struct drm_device *dev, int pasid,
+				    unsigned long address)
+{
+	struct ext_iotlb_inv_dsc dsc = { 0 };
+
+	dsc.dsc_type = EXT_IOTLB_INV_DSC;
+	dsc.g = EXT_IOTLB_INV_G_PASID_PAGE_SELECT;
+	dsc.pasid = pasid;
+	dsc.ih = 0;
+	dsc.addr = address;
+	dsc.am = 1;
+	ivq_write_ext_iotlb_inv_descriptor(dev, &dsc);
+}
+
+static void intel_change_pte(struct mmu_notifier *mn, struct mm_struct *mm,
+			     unsigned long address, pte_t pte)
+{
+	struct intel_mm_struct *ims = container_of(mn, struct intel_mm_struct,
+						   notifier);
+	struct drm_i915_private *dev_priv = ims->dev_priv;
+	struct drm_device *dev = dev_priv->dev;
+
+	struct intel_context *ctx;
+
+	mutex_lock(&dev->struct_mutex);
+	list_for_each_entry(ctx, &ims->context_list, mm_list)
+		intel_flush_page_locked(dev, ctx->pasid, address);
+	mutex_unlock(&dev->struct_mutex);
+}
+
+static void intel_invalidate_page(struct mmu_notifier *mn,
+				  struct mm_struct *mm,
+				  unsigned long address)
+{
+	struct intel_mm_struct *ims = container_of(mn, struct intel_mm_struct,
+						   notifier);
+	struct drm_i915_private *dev_priv = ims->dev_priv;
+	struct drm_device *dev = dev_priv->dev;
+	struct intel_context *ctx;
+
+	mutex_lock(&dev->struct_mutex);
+	list_for_each_entry(ctx, &ims->context_list, mm_list)
+		intel_flush_page_locked(dev, ctx->pasid, address);
+	mutex_unlock(&dev->struct_mutex);
+}
+
+/* Need to unmap this range and make sure it doesn't get re-faulted */
+static void intel_invalidate_range_start(struct mmu_notifier *mn,
+					 struct mm_struct *mm,
+					 unsigned long start, unsigned long end)
+{
+	struct intel_mm_struct *ims = container_of(mn, struct intel_mm_struct,
+						   notifier);
+	struct drm_i915_private *dev_priv = ims->dev_priv;
+	struct drm_device *dev = dev_priv->dev;
+
+	/* FIXME: invalidate page only */
+	intel_iommu_tlb_flush(dev);
+}
+
+/* Pages have been freed at this point */
+static void intel_invalidate_range_end(struct mmu_notifier *mn,
+				       struct mm_struct *mm,
+				       unsigned long start, unsigned long end)
+{
+	struct intel_mm_struct *ims = container_of(mn, struct intel_mm_struct,
+						   notifier);
+	struct drm_i915_private *dev_priv = ims->dev_priv;
+	struct drm_device *dev = dev_priv->dev;
+
+	/* FIXME: invalidate page only */
+	intel_iommu_tlb_flush(dev);
+}
+
+static const struct mmu_notifier_ops intel_mmuops = {
+	.release = intel_mm_release,
+	/* no clear_flush_young, we just share the x86 bits */
+	/* no test_young, we just share the x86 bits */
+	.change_pte = intel_change_pte,
+	.invalidate_page = intel_invalidate_page,
+	.invalidate_range_start = intel_invalidate_range_start,
+	.invalidate_range_end = intel_invalidate_range_end,
+};
+
+struct intel_mm_struct *intel_bind_mm(struct drm_device *dev,
+				      struct intel_context *ctx)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct intel_mm_struct *ims;
+	struct mmu_notifier *mn;
+	int ret;
+
+	WARN_ON(!mutex_is_locked(&dev_priv->dev->struct_mutex));
+
+	mn = mmu_find_ops(current->mm, &intel_mmuops);
+	if (mn) {
+		ims = container_of(mn, struct intel_mm_struct, notifier);
+		kref_get(&ims->kref);
+		goto out;
+	}
+
+	ims = kzalloc(sizeof(*ims), GFP_KERNEL);
+	if (!ims) {
+		ret = -ENOMEM;
+		goto error;
+	}
+	INIT_LIST_HEAD(&ims->context_list);
+
+	ims->notifier.ops = &intel_mmuops;
+
+	ret = mmu_notifier_register(&ims->notifier, current->mm);
+	if (ret)
+		goto error;
+
+	ims->dev_priv = dev->dev_private;
+
+out:
+	list_add(&ctx->mm_list, &ims->context_list);
+	return ims;
+error:
+	kfree(ims);
+	return ERR_PTR(ret);
+}
+
+static void intel_mm_free(struct kref *ims_ref)
+{
+	struct intel_mm_struct *ims =
+		container_of(ims_ref, struct intel_mm_struct, kref);
+
+	mmu_notifier_unregister(&ims->notifier, current->mm);
+	kfree(ims);
+}
+
+void intel_unbind_mm(struct intel_context *ctx)
+{
+	struct drm_i915_private *dev_priv = ctx->ims->dev_priv;
+
+	WARN_ON(!mutex_is_locked(&dev_priv->dev->struct_mutex));
+
+	list_del(&ctx->mm_list);
+	kref_put(&ctx->ims->kref, intel_mm_free);
+
+	return;
+}
+
+int intel_exec_mm_ioctl(struct drm_device *dev, void *data,
+			struct drm_file *file)
+{
+//	struct drm_i915_exec_mm *exec_mm = data;
+//	struct drm_i915_private *dev_priv = dev->dev_private;
+
+	/* Load new context into context reg */
+	return 0;
+}
+
+/*
+ * The PASID table has 32 entries in the current config, rotate through
+ * them as needed.
+ */
+int intel_alloc_pasid(struct drm_device *dev, struct intel_context *ctx)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct pasid_table_entry *table;
+	int i;
+
+	WARN_ON(!mutex_is_locked(&dev_priv->dev->struct_mutex));
+
+	spin_lock(&dev_priv->svm.lock);
+	table = dev_priv->svm.pasid_table;
+
+	for (i = 0; i < PASID_COUNT; i++) {
+		if (!table[i].present)
+			goto found;
+	}
+
+	spin_unlock(&dev_priv->svm.lock);
+	return -1;
+
+found:
+	table[i].pml4 = __pa(current->mm->pgd) >> PAGE_SHIFT;
+	table[i].present = 1;
+
+	ctx->pasid = i;
+	dev_priv->svm.pasid_ctx[ctx->pasid] = NULL;
+	spin_unlock(&dev_priv->svm.lock);
+
+	intel_iommu_tlb_flush(dev);
+
+	return 0;
+}
+
+void intel_free_pasid(struct drm_device *dev, struct intel_context *ctx)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct pasid_table_entry *table;
+
+	WARN_ON(!mutex_is_locked(&dev_priv->dev->struct_mutex));
+
+	if (ctx->pasid >= PASID_COUNT)
+		return;
+
+	spin_lock(&dev_priv->svm.lock);
+	table = dev_priv->svm.pasid_table;
+	memset(&table[ctx->pasid], 0, sizeof(struct pasid_table_entry));
+	dev_priv->svm.pasid_ctx[ctx->pasid] = NULL;
+	ctx->pasid = -1;
+	spin_unlock(&dev_priv->svm.lock);
+
+	intel_iommu_tlb_flush(dev);
+}
+
+/*
+ * Each root table entry is 16 bytes wide.  In legacy mode, only
+ * the lower 64 bits are used:
+ *   Bits 38:12: context table pointer
+ *   Bit 0: present
+ *   all other bits reserved
+ * In extended mode (what we use for SVM):
+ *   Bits 102:76: upper context table pointer
+ *   Bit 64: upper present
+ *   Bits 38:12: lower context table pointer
+ *   Bit 0: lower present
+ *   all other bits reserved
+ *
+ * The context entries are 128 bit in legacy mode:
+ *   Bits 87:72: Domain ID
+ *   Bits 70:67: Available
+ *   Bits 66:64: Address width
+ *   Bits 38:12: Page table pointer
+ *   Bits 3:2: Translation type
+ *     00 - only untranslated DMA requests go through this table
+ *          translated and translation requests are blocked
+ *     01 - untranslated, translated, and translation requests supported
+ *     10 - untranslated requests are treated as pass through (HPA == GPA),
+ *          translated DMA requests and translation requests are blocked
+ *     11 - reserved
+ *   Bit 1: fault disable
+ *   Bit 0: Present
+ * and 256 bit in extended:
+ *   Bits 230:204: PASID state table pointer
+ *   Bits 166:140: PASID table pointer
+ *   Bits 131:128: PASID table size
+ *   Bits 127:96: Page table attribute (PAT)
+ *   Bit 92: SL64KPE
+ *   Bit 91: SLEE
+ *   Bit 90: ERE
+ *   Bit 89: SRE
+ *   Bit 88: SMEP
+ *   Bits 87:72: Domain ID
+ *   Bit 71: Extended memory type enable
+ *   Bit 70: cache disable (CD)
+ *   Bit 69: write protect (WP)
+ *   Bit 68: no execute enable (NXE)
+ *   Bit 67: page global enable (PGE)
+ *   Bits 66:64: address width
+ *   Bits 38:12: 2nd level (VT-d) page table pointer
+ *   Bit 11: PASID enable
+ *   Bit 10: Nesting enable
+ *   Bit 9: Page Request enable
+ *   Bit 8: Lazy-Invalidate enable
+ *   Bits 7:5: Extended Memory Type (VT-d)
+ *   Bits 4:2: Translation type
+ *     000 - Only Untranslated DMA requests are translated through this page
+ *           table. Translated DMA requests and Translation Requests are
+ *           blocked.  Untranslated requests-without-PASID are remapped using
+ *           the second-level page-table referenced through SLPTPTR field.
+ *           If PASIDE field is Set, Untranslated requests-with-PASID are
+ *           remapped using the PASID Table referenced through PASIDPTPTR
+ *           field. If PASIDE field is Clear, Untranslated requests-with-PASID
+ *           are blocked.  Translation requests (with or without PASID), and
+ *           Translated Requests are blocked.
+ *     001 - Un-translated and Translation requests without PASID supported
+ *           (and with PASID supported, if PASID Enable Set); Translate
+ *           requests bypass address translation.  Untranslated
+ *           requests-without-PASID and Translation requests-without-PASID are
+ *           remapped using the second level page-table referenced through
+ *           SLPTPTR field. If PASIDE field is Set, Untranslated
+ *           requests-with-PASID and Translation requests-with-PASID are
+ *           remapped using the PASID Table referenced through PASIDPTPTR
+ *           field. If PASIDE field is Clear, Untranslated requests-with-PASID,
+ *           and Translation requests-with-PASID, are blocked. Translated
+ *           requests bypass address translation.
+ *     010 - If Pass-through Supported (GT supports pass-through),
+ *           Un-translated requests without PASID bypass address translation;
+ *           All other requests (with or without PASID) blocked. Untranslated
+ *           requests-without-PASID bypass address translation and are
+ *           processed as passthrough. SLPTPTR field is ignored by hardware.
+ *           Untranslated requests-with-PASID, Translation requests (with or
+ *           without PASID), and Translated requests are blocked.
+ *     011 - Reserved.
+ *     100 - Un-translated requests without PASID bypass address translation;
+ *           Un-translated requests with PASID supported, if PASID Enable Set;
+ *           All other requests blocked. Untranslated requests-without-PASID
+ *           bypass address translation and are processed as passthrough.
+ *           SLPTPTR field is ignored by hardware. Untranslated
+ *           requests-with-PASID are remapped using the PASID Table referenced
+ *           through PASIDPTPTR field. Translation requests (with or without
+ *           PASID) and Translated requests are blocked.
+ *     101 - Un-translated and Translation requests without PASID bypass
+ *           address translation; Un-translated and Translation requests with
+ *           PASID supported, if PASID Enable Set; Translated requests bypass
+ *           address translation.  Untranslated requests-without-PASID bypass
+ *           address translation and are processed as passthrough. SLPTPTR
+ *           field is ignored by hardware.  Translation requests-without-PASID
+ *           are responded with Untranslated access only bit Set (U=1) along
+ *           with read and write permissions (R=W=1). SLPTPTR field is ignored
+ *           by hardware. Untranslated requests-with-PASID, and Translation
+ *           requests-with-PASID are remapped using the PASID Table referenced
+ *           through PASIDPTPTR field.  Translated requests bypass address
+ *           translation.
+ *     110 - Un-translated requests without PASID are blocked; Un-translated
+ *           requests with PASID supported, if PASID Enable Set; All other
+ *           requests blocked – Not applicable to GFX, GT should treat this as
+ *           reserved.
+ *     111 - Un-translated and Translation requests without PASID blocked;
+ *           Un-translated and Translation requests with PASID supported, if
+ *           PASID Enable Set; Translated requests bypass address translation.
+ *           Note: Not applicable to GFX, GT should treat this as reserved.
+ *   Bit 1: Fault disable
+ *   Bit 0: Present
+ *
+ * Page walks for graphics addresses can go through one or two levels of
+ * translation, depending on whether VT-d is enabled.
+ *
+ * If we're in driver mode (currently the only supported mode), we always
+ * use a single level of translation, meaning the second level page table
+ * pointer (if present) is ignored.
+ *
+ * The full walk starts at the root table, which indexes into the upper
+ * and lower context tables.  Those tables point to PASID mapping and state
+ * tables and potentially a second level page table for VT-d (which, as noted
+ * above, is unused currently).  The PASID mapping table points to a PML4
+ * (x86 compatible) page table, while the state table indicates other
+ * information about the PASID involved in the request, which ultimately comes
+ * from the execlist port submission of the context descriptor.
+ *
+ * To enable a shared CPU/GPU address space, we can use a couple of different
+ * translation types, either 101 or 01 w/o nesting.  The main requirement
+ * is that requests with PASID are translated through the page tables provided,
+ * potentially with nesting if we're running in a VT-d context (which we
+ * don't currently support).
+ */
+#define CONTEXT_OFFSET (PAGE_SIZE * 1)
+#define PASID_OFFSET (PAGE_SIZE * 2)
+#define PASID_STATE_OFFSET (PAGE_SIZE * 3)
+#define PRQ_OFFSET (PAGE_SIZE * 4)
+#define IVQ_OFFSET (PAGE_SIZE * 5)
+static void intel_init_svm_root_table(struct drm_device *dev,
+				      drm_dma_handle_t *tables)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct extended_root_table_entry *root_table;
+	struct extended_context_table_entry *context;
+	struct pasid_table_entry *pasid_table;
+	struct pasid_state_table_entry *pasid_state_table;
+	u64 *tmp;
+
+	root_table = tables->vaddr;
+	context = tables->vaddr + CONTEXT_OFFSET;
+        pasid_table = tables->vaddr + PASID_OFFSET;
+	pasid_state_table = tables->vaddr + PASID_STATE_OFFSET;
+
+	DRM_ERROR("programmed PASID table, vaddr %p, busaddr 0x%16llx\n",
+		  pasid_table, tables->busaddr + PASID_OFFSET);
+
+	/* Context entry for gfx device */
+	context[16].pat = 0x66666666;
+	context[16].ere = 1;
+	context[16].sre = 1;
+	context[16].smep = 1;
+	context[16].domain_id = 1;
+	context[16].addr_width = AGAW_48; /* full x86 walk */
+	context[16].pasid_en = 1;
+	context[16].nesting_en = 0; /* not yet */
+	context[16].pg_req_en = 1;
+	context[16].lazy_invalidate_en = 1;
+	context[16].ext_mem_type = EXTENDED_MTYPE_WB;
+	context[16].translation_type = EXTENDED_TTYPE_UT_TR_PASID_PT;
+	context[16].fault_disable = 0;
+	context[16].present = 1;
+	context[16].pasid_state_table_addr = (tables->busaddr + PASID_STATE_OFFSET) >> PAGE_SHIFT;
+	context[16].pasid_table_addr = (tables->busaddr + PASID_OFFSET) >>
+		PAGE_SHIFT;
+	context[16].pasid_table_size = 0; /* 2^(5+x) */
+
+	tmp = (u64 *)&context[16];
+	DRM_ERROR("root entry: 0x%016llx%016llx\n", tmp[1], tmp[0]);
+
+	DRM_ERROR("programmed context table, vaddr %p, busaddr 0x%16llx\n",
+		  context, tables->busaddr + CONTEXT_OFFSET);
+
+	/* Root table */
+	root_table[0].lo_ctx_addr = (tables->busaddr + CONTEXT_OFFSET) >>
+		PAGE_SHIFT;
+	root_table[0].lo_present = 1;
+	root_table[0].hi_present = 0;
+
+	tmp = (u64 *)&root_table[0];
+	DRM_ERROR("root entry: 0x%016llx%016llx\n", tmp[1], tmp[0]);
+
+	dev_priv->svm.root_table = root_table;
+	dev_priv->svm.context = context;
+        dev_priv->svm.pasid_table = pasid_table;
+	dev_priv->svm.pasid_state_table = pasid_state_table;
+	dev_priv->svm.prq_ring = tables->vaddr + PRQ_OFFSET;
+	dev_priv->svm.ivq_ring = tables->vaddr + IVQ_OFFSET;
+
+	/* Enable the page request queue */
+	I915_WRITE64(SVM_PRQA, tables->busaddr + PRQ_OFFSET);
+	I915_WRITE(SVM_PRQ_HEAD, 0);
+	I915_WRITE(SVM_PRQ_TAIL, 0);
+	I915_WRITE(SVM_PRECTL, 0);
+
+	/* Set up the invalidation request queue */
+	I915_WRITE64(SVM_IQA, tables->busaddr + IVQ_OFFSET);
+	I915_WRITE(SVM_IVQ_HEAD, 0);
+	I915_WRITE(SVM_IVQ_TAIL, 0);
+	I915_WRITE(SVM_IECTL, 0);
+
+	I915_WRITE(SVM_GCMD, GCMD_QIE);
+	if (wait_for(I915_READ(SVM_GSTS) & GSTS_QIES, 500))
+		DRM_ERROR("timed out waiting for queued invalidation enable\n");
+
+	/* All set, program the root */
+	I915_WRITE(SVM_RTADDR, tables->busaddr | SVM_RTT_TYPE_EXT);
+
+	I915_WRITE(SVM_GCMD, GCMD_SRTP);
+	if (wait_for(I915_READ(SVM_GSTS) & GSTS_RTPS, 500))
+		DRM_ERROR("timed out waiting for root table to load\n");
+
+	DRM_ERROR("programmed SVM root, vaddr %p, busaddr 0x%16llx\n",
+		  tables->vaddr, tables->busaddr);
+
+	intel_iommu_tlb_flush(dev);
+}
+
+/*
+ * Probe for SVM capability.  If found:
+ *  - try to switch to driver mode
+ *  - set up root PASID table
+ *  - enable page fault and error handling interrupts
+ *  - allow SVM ioctls
+ */
+void intel_init_svm(struct drm_device *dev)
+{
+	struct drm_i915_private *dev_priv = dev->dev_private;
+	drm_dma_handle_t *tables;
+	u32 dev_mode;
+	int num_tables = 6;
+
+	dev_mode = I915_READ(BDW_SVM_DEV_MODE_CNFG);
+	I915_WRITE(BDW_SVM_DEV_MODE_CNFG, dev_mode | BDW_SVM_MODE_DRIVER);
+	dev_mode = I915_READ(BDW_SVM_DEV_MODE_CNFG);
+#if defined(CONFIG_INTEL_IOMMU) || defined(IOMMU_SUPPORT)
+#error must disable IOMMU support
+#endif
+	if (!dev_mode & BDW_SVM_MODE_DRIVER) {
+		DRM_ERROR("driver mode not available, disabling SVM\n");
+		goto err;
+	}
+
+	tables = drm_pci_alloc(dev, PAGE_SIZE*num_tables, PAGE_SIZE);
+	if (!tables) {
+		DRM_ERROR("table alloc failed, disabling SVM\n");
+		goto err;
+	}
+
+	memset(tables->vaddr, 0, PAGE_SIZE*num_tables);
+
+	intel_init_svm_root_table(dev, tables);
+
+	spin_lock_init(&dev_priv->svm.lock);
+
+#if 0
+	I915_WRITE(SVM_GCMD, GCMD_TE);
+	if (wait_for(I915_READ(SVM_GSTS) & GSTS_TES, 500))
+		DRM_ERROR("timed out waiting for translation enable\n");
+#endif
+	INIT_WORK(&dev_priv->svm.work, intel_gpu_fault_work);
+
+	DRM_ERROR("SVM driver mode enabled\n");
+	dev_priv->svm.svm_available = true;
+	return;
+
+err:
+	dev_priv->svm.svm_available = false;
+	return;
+}
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 40cbba4..1450491 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -217,6 +217,7 @@ enum {
 	FAULT_AND_STREAM,
 	FAULT_AND_CONTINUE /* Unsupported */
 };
+#define GEN8_CTX_FAULT_SHIFT 6
 #define GEN8_CTX_ID_SHIFT 32
 #define CTX_RCS_INDIRECT_CTX_OFFSET_DEFAULT  0x17
 
@@ -289,12 +290,21 @@ uint64_t intel_lr_context_descriptor(struct intel_context *ctx,
 	WARN_ON(lrca & 0xFFFFFFFF00000FFFULL);
 
 	desc = GEN8_CTX_VALID;
-	desc |= GEN8_CTX_ADDRESSING_MODE(dev) << GEN8_CTX_ADDRESSING_MODE_SHIFT;
-	if (IS_GEN8(ctx_obj->base.dev))
-		desc |= GEN8_CTX_L3LLC_COHERENT;
-	desc |= GEN8_CTX_PRIVILEGE;
-	desc |= lrca;
-	desc |= (u64)intel_execlists_ctx_id(ctx_obj) << GEN8_CTX_ID_SHIFT;
+	if (ctx->is_svm) {
+		desc |= ADVANCED_CONTEXT << GEN8_CTX_ADDRESSING_MODE_SHIFT;
+		desc |= FAULT_AND_STREAM << GEN8_CTX_FAULT_SHIFT;
+		desc |= lrca;
+		desc |= (u64)intel_execlists_ctx_id(ctx_obj) << GEN8_CTX_ID_SHIFT;
+	} else {
+		desc |= GEN8_CTX_ADDRESSING_MODE(dev) <<
+			GEN8_CTX_ADDRESSING_MODE_SHIFT;
+		if (IS_GEN8(ctx_obj->base.dev))
+			desc |= GEN8_CTX_L3LLC_COHERENT;
+		desc |= GEN8_CTX_PRIVILEGE;
+		desc |= lrca;
+		desc |= (u64)intel_execlists_ctx_id(ctx_obj) <<
+			GEN8_CTX_ID_SHIFT;
+	}
 
 	/* TODO: WaDisableLiteRestore when we start using semaphore
 	 * signalling between Command Streamers */
@@ -545,7 +555,7 @@ void intel_lrc_irq_handler(struct intel_engine_cs *ring)
 		   _MASKED_FIELD(0x07 << 8, ((u32)ring->next_context_status_buffer & 0x07) << 8));
 }
 
-static int execlists_context_queue(struct drm_i915_gem_request *request)
+int execlists_context_queue(struct drm_i915_gem_request *request)
 {
 	struct intel_engine_cs *ring = request->ring;
 	struct drm_i915_gem_request *cursor;
@@ -2273,31 +2283,40 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
 	reg_state[CTX_LRI_HEADER_1] |= MI_LRI_FORCE_POSTED;
 	reg_state[CTX_CTX_TIMESTAMP] = ring->mmio_base + 0x3a8;
 	reg_state[CTX_CTX_TIMESTAMP+1] = 0;
-	reg_state[CTX_PDP3_UDW] = GEN8_RING_PDP_UDW(ring, 3);
-	reg_state[CTX_PDP3_LDW] = GEN8_RING_PDP_LDW(ring, 3);
-	reg_state[CTX_PDP2_UDW] = GEN8_RING_PDP_UDW(ring, 2);
-	reg_state[CTX_PDP2_LDW] = GEN8_RING_PDP_LDW(ring, 2);
-	reg_state[CTX_PDP1_UDW] = GEN8_RING_PDP_UDW(ring, 1);
-	reg_state[CTX_PDP1_LDW] = GEN8_RING_PDP_LDW(ring, 1);
-	reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
-	reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
-
-	if (USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
-		/* 64b PPGTT (48bit canonical)
-		 * PDP0_DESCRIPTOR contains the base address to PML4 and
-		 * other PDP Descriptors are ignored.
-		 */
-		ASSIGN_CTX_PML4(ppgtt, reg_state);
+
+	if (ctx->is_svm) {
+		reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
+		reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
+		reg_state[CTX_PDP0_UDW+1] = 0;
+		reg_state[CTX_PDP0_LDW+1] = ctx->pasid;
 	} else {
-		/* 32b PPGTT
-		 * PDP*_DESCRIPTOR contains the base address of space supported.
-		 * With dynamic page allocation, PDPs may not be allocated at
-		 * this point. Point the unallocated PDPs to the scratch page
-		 */
-		ASSIGN_CTX_PDP(ppgtt, reg_state, 3);
-		ASSIGN_CTX_PDP(ppgtt, reg_state, 2);
-		ASSIGN_CTX_PDP(ppgtt, reg_state, 1);
-		ASSIGN_CTX_PDP(ppgtt, reg_state, 0);
+		reg_state[CTX_PDP3_UDW] = GEN8_RING_PDP_UDW(ring, 3);
+		reg_state[CTX_PDP3_LDW] = GEN8_RING_PDP_LDW(ring, 3);
+		reg_state[CTX_PDP2_UDW] = GEN8_RING_PDP_UDW(ring, 2);
+		reg_state[CTX_PDP2_LDW] = GEN8_RING_PDP_LDW(ring, 2);
+		reg_state[CTX_PDP1_UDW] = GEN8_RING_PDP_UDW(ring, 1);
+		reg_state[CTX_PDP1_LDW] = GEN8_RING_PDP_LDW(ring, 1);
+		reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
+		reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
+
+		if (USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
+			/* 64b PPGTT (48bit canonical)
+			 * PDP0_DESCRIPTOR contains the base address to PML4 and
+			 * other PDP Descriptors are ignored.
+			 */
+			ASSIGN_CTX_PML4(ppgtt, reg_state);
+		} else {
+			/* 32b PPGTT
+			 * PDP*_DESCRIPTOR contains the base address of space
+			 * supported. With dynamic page allocation, PDPs may
+			 * not be allocated at this point. Point the
+			 * unallocated PDPs to the scratch page
+			 */
+			ASSIGN_CTX_PDP(ppgtt, reg_state, 3);
+			ASSIGN_CTX_PDP(ppgtt, reg_state, 2);
+			ASSIGN_CTX_PDP(ppgtt, reg_state, 1);
+			ASSIGN_CTX_PDP(ppgtt, reg_state, 0);
+		}
 	}
 
 	if (ring->id == RCS) {
@@ -2327,6 +2346,12 @@ void intel_lr_context_free(struct intel_context *ctx)
 {
 	int i;
 
+        if (ctx->is_svm) {
+                intel_free_pasid(ctx->ims->dev_priv->dev, ctx);
+                intel_unbind_mm(ctx);
+		put_task_struct(ctx->tsk);
+       }
+
 	for (i = 0; i < I915_NUM_RINGS; i++) {
 		struct drm_i915_gem_object *ctx_obj = ctx->engine[i].state;
 
@@ -2480,6 +2505,37 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 
 	}
 
+	if (ctx->is_svm) {
+		/* FIXME: just skip here, don't bail and trash the ctx */
+		if (ring->id != RCS) {
+			DRM_DEBUG_DRIVER("svm context only allowed on RCS\n");
+			ret = -EINVAL;
+			goto error_destroy_rbuf;
+		}
+
+		ret = intel_alloc_pasid(dev, ctx);
+		if (ret) {
+			DRM_ERROR("pasid alloc fail: %d\n", ret);
+			ret = -ENOSPC;
+			kfree(ring);
+			goto error_destroy_rbuf;
+		}
+
+		ctx->ims = intel_bind_mm(dev, ctx);
+		if (IS_ERR(ctx->ims)) {
+			intel_free_pasid(dev, ctx);
+			DRM_ERROR("bind mm call failed: %ld\n",
+				  PTR_ERR(ctx->ims));
+			ret = PTR_ERR(ctx->ims);
+			kfree(ring);
+			goto error_destroy_rbuf;
+		}
+
+		ctx->tsk = current;
+		get_task_struct(current);
+		ctx->ims->dev_priv->svm.pasid_ctx[ctx->pasid] = ctx;
+	}
+
 	ret = populate_lr_context(ctx, ctx_obj, ring, ringbuf);
 	if (ret) {
 		DRM_DEBUG_DRIVER("Failed to populate LRC: %d\n", ret);
@@ -2520,6 +2576,8 @@ error:
 	if (is_global_default_ctx)
 		intel_unpin_ringbuffer_obj(ringbuf);
 error_destroy_rbuf:
+       if (ctx->is_svm)
+                put_task_struct(current);
 	intel_destroy_ringbuffer_obj(ringbuf);
 error_free_rbuf:
 	kfree(ringbuf);
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 4cc54b3..dcaa65b 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -93,5 +93,6 @@ u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj);
 
 void intel_lrc_irq_handler(struct intel_engine_cs *ring);
 void intel_execlists_retire_requests(struct intel_engine_cs *ring);
+int execlists_context_queue(struct drm_i915_gem_request *request);
 
 #endif /* _INTEL_LRC_H_ */
-- 
1.9.1


[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 7/9] drm/i915: add fences to the request struct
  2015-09-04 16:58 [RFC] Page table sharing and bufferless execbuf Jesse Barnes
                   ` (5 preceding siblings ...)
  2015-09-04 16:59 ` [PATCH 6/9] drm/i915: driver based PASID handling Jesse Barnes
@ 2015-09-04 16:59 ` Jesse Barnes
  2015-10-09 13:29   ` David Woodhouse
  2015-09-04 16:59 ` [PATCH 8/9] drm/i915: Android sync points for i915 v4 (obsolete) Jesse Barnes
                   ` (3 subsequent siblings)
  10 siblings, 1 reply; 36+ messages in thread
From: Jesse Barnes @ 2015-09-04 16:59 UTC (permalink / raw)
  To: intel-gfx; +Cc: dwmw2

This simplifies the sync code quite a bit.  I don't think we'll be able
to get away with using the core fence code's seqno support, since we'll
be moving away from simple seqno comparisions with the scheduler and
preemption, but the additional code is pretty minimal anyway, and lets
us add additional debugging as needed, so it's probably fine to keep
either way.

We still need to add support for other rings here; we ought to be able
to do that with the timeline field of the ioctl (which will include
other "rings" like the display flip queue for example).

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
---
 drivers/gpu/drm/i915/i915_drv.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index ca38a7a..f4a363d 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -51,6 +51,7 @@
 #include <linux/intel-iommu.h>
 #include <linux/kref.h>
 #include <linux/pm_qos.h>
+#include <linux/fence.h>
 #include "intel_guc.h"
 
 /* General customization:
@@ -2286,6 +2287,10 @@ struct drm_i915_gem_request {
 	/** Execlists no. of times this request has been sent to the ELSP */
 	int elsp_submitted;
 
+	/* core fence obj for this request, may be exported */
+	struct fence fence;
+
+	wait_queue_t wait;
 };
 
 int i915_gem_request_alloc(struct intel_engine_cs *ring,
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 8/9] drm/i915: Android sync points for i915 v4 (obsolete)
  2015-09-04 16:58 [RFC] Page table sharing and bufferless execbuf Jesse Barnes
                   ` (6 preceding siblings ...)
  2015-09-04 16:59 ` [PATCH 7/9] drm/i915: add fences to the request struct Jesse Barnes
@ 2015-09-04 16:59 ` Jesse Barnes
  2015-09-04 16:59 ` [PATCH 9/9] drm/i915: add bufferless execbuf ioctl Jesse Barnes
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 36+ messages in thread
From: Jesse Barnes @ 2015-09-04 16:59 UTC (permalink / raw)
  To: intel-gfx; +Cc: dwmw2

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset=y, Size: 15305 bytes --]

Allow for sync point exposure in the i915 driver.  This

There are a couple of goals here:
  1) allow applications and libraries to create fences without an
     associated buffer
  2) re-use a common API so userspace doesn't have to impedance mismatch
     between different driver implementations too much
  3) allow applications and libraries to use explicit synchronization if
     they choose by exposing fences directly

v2: use struct fence directly using Maarten's new interface
v3: use new i915 request structure (Jesse)
    make i915 fences a compile time option since Android support is still
    in staging (Jesse)
    check for request complete after arming IRQs (Chris)
v4: "fix" for latest requeset handling changes, request fences (Jesse)

Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
---
 drivers/gpu/drm/i915/Kconfig     |  14 ++
 drivers/gpu/drm/i915/Makefile    |   1 +
 drivers/gpu/drm/i915/i915_drv.h  |  20 +++
 drivers/gpu/drm/i915/i915_gem.c  |   4 +
 drivers/gpu/drm/i915/i915_irq.c  |   4 +-
 drivers/gpu/drm/i915/i915_sync.c | 334 +++++++++++++++++++++++++++++++++++++++
 6 files changed, 376 insertions(+), 1 deletion(-)
 create mode 100644 drivers/gpu/drm/i915/i915_sync.c

diff --git a/drivers/gpu/drm/i915/Kconfig b/drivers/gpu/drm/i915/Kconfig
index 051eab3..8176aa1 100644
--- a/drivers/gpu/drm/i915/Kconfig
+++ b/drivers/gpu/drm/i915/Kconfig
@@ -36,6 +36,20 @@ config DRM_I915
 	  i810 driver instead, and the Atom z5xx series has an entirely
 	  different implementation.
 
+config DRM_I915_SYNC
+	bool "Enable explicit sync support"
+	depends on DRM_I915
+	depends on STAGING
+	depends on ANDROID
+	depends on SYNC
+	default y
+	help
+	  Choose this option to enable Android native sync support and the
+	  corresponding i915 driver code to expose it.  Slightly increases
+	  driver size and pulls in sync support from staging.
+
+	  If in doubt, say "Y".
+
 config DRM_I915_PRELIMINARY_HW_SUPPORT
 	bool "Enable preliminary support for prerelease Intel hardware by default"
 	depends on DRM_I915
diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
index e4883a7..5a1f260 100644
--- a/drivers/gpu/drm/i915/Makefile
+++ b/drivers/gpu/drm/i915/Makefile
@@ -68,6 +68,7 @@ i915-y += intel_audio.o \
 	  intel_sprite.o
 i915-$(CONFIG_ACPI)		+= intel_acpi.o intel_opregion.o
 i915-$(CONFIG_DRM_FBDEV_EMULATION)	+= intel_fbdev.o
+i915-$(CONFIG_DRM_I915_SYNC)	+= i915_sync.o
 
 # modesetting output/encoder code
 i915-y += dvo_ch7017.o \
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index f4a363d..0c3047b 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1736,6 +1736,8 @@ struct i915_execbuffer_params {
 	struct drm_i915_gem_request     *request;
 };
 
+struct i915_sync_timeline;
+
 struct drm_i915_private {
 	struct drm_device *dev;
 	struct kmem_cache *objects;
@@ -1781,6 +1783,9 @@ struct drm_i915_private {
 	uint32_t last_seqno, next_seqno;
 
 	struct drm_dma_handle *status_page_dmah;
+
+	struct i915_sync_timeline *sync_tl[I915_NUM_RINGS];
+
 	struct resource mch_res;
 
 	/* protects the irq masks */
@@ -2850,6 +2855,21 @@ void i915_gem_vma_destroy(struct i915_vma *vma);
 #define PIN_USER	(1<<4)
 #define PIN_UPDATE	(1<<5)
 #define PIN_OFFSET_MASK (~4095)
+
+/* i915_sync.c */
+#ifdef CONFIG_DRM_I915_SYNC
+int i915_sync_init(struct drm_i915_private *dev_priv);
+void i915_sync_fini(struct drm_i915_private *dev_priv);
+int i915_sync_create_fence_ioctl(struct drm_device *dev, void *data,
+				 struct drm_file *file);
+#else
+static inline int i915_sync_init(struct drm_i915_private *dev_priv)
+{
+	return 0;
+}
+static inline void i915_sync_fini(struct drm_i915_private *dev_priv) { }
+#endif
+
 int __must_check
 i915_gem_object_pin(struct drm_i915_gem_object *obj,
 		    struct i915_address_space *vm,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 5341591..6fe6a42 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4813,6 +4813,8 @@ int i915_gem_init(struct drm_device *dev)
 		ret = 0;
 	}
 
+	i915_sync_init(dev_priv);
+
 out_unlock:
 	intel_uncore_forcewake_put(dev_priv, FORCEWAKE_ALL);
 	mutex_unlock(&dev->struct_mutex);
@@ -4949,6 +4951,8 @@ void i915_gem_release(struct drm_device *dev, struct drm_file *file)
 		list_del(&file_priv->rps.link);
 		spin_unlock(&to_i915(dev)->rps.client_lock);
 	}
+
+	i915_sync_fini(dev->dev_private);
 }
 
 int i915_gem_open(struct drm_device *dev, struct drm_file *file)
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 192ad0c..2a28d43 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -33,6 +33,7 @@
 #include <linux/circ_buf.h>
 #include <drm/drmP.h>
 #include <drm/i915_drm.h>
+#include "../../../staging/android/sync.h"
 #include "i915_drv.h"
 #include "i915_trace.h"
 #include "intel_drv.h"
@@ -2311,8 +2312,9 @@ static void i915_error_wake_up(struct drm_i915_private *dev_priv,
 	 */
 
 	/* Wake up __wait_seqno, potentially holding dev->struct_mutex. */
-	for_each_ring(ring, dev_priv, i)
+	for_each_ring(ring, dev_priv, i) {
 		wake_up_all(&ring->irq_queue);
+	}
 
 	/* Wake up intel_crtc_wait_for_pending_flips, holding crtc->mutex. */
 	wake_up_all(&dev_priv->pending_flip_queue);
diff --git a/drivers/gpu/drm/i915/i915_sync.c b/drivers/gpu/drm/i915/i915_sync.c
new file mode 100644
index 0000000..46cea44
--- /dev/null
+++ b/drivers/gpu/drm/i915/i915_sync.c
@@ -0,0 +1,334 @@
+/*
+ * Copyright © 2014 Intel Corporation
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a
+ * copy of this software and associated documentation files (the "Software"),
+ * to deal in the Software without restriction, including without limitation
+ * the rights to use, copy, modify, merge, publish, distribute, sublicense,
+ * and/or sell copies of the Software, and to permit persons to whom the
+ * Software is furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice (including the next
+ * paragraph) shall be included in all copies or substantial portions of the
+ * Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT.  IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING
+ * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS
+ * IN THE SOFTWARE.
+ *
+ * Authors:
+ *    Jesse Barnes <jbarnes@virtuousgeek.org>
+ *
+ */
+
+#include <drm/drmP.h>
+#include <drm/drm_vma_manager.h>
+#include <drm/i915_drm.h>
+#include "i915_drv.h"
+#include "i915_trace.h"
+#include "intel_drv.h"
+#include <linux/oom.h>
+#include <linux/shmem_fs.h>
+#include <linux/slab.h>
+#include <linux/swap.h>
+#include <linux/pci.h>
+#include <linux/dma-buf.h>
+#include "../../../staging/android/sync.h"
+
+/* Nothing really to protect here... */
+static spinlock_t fence_lock;
+
+struct i915_sync_timeline {
+	struct sync_timeline base;
+	struct intel_engine_cs *ring;
+};
+
+/*
+ * i915 fences on sync timelines
+ *
+ * We implement sync points in terms of i915 seqnos.  They're exposed
+ * through the new DRM_I915_GEM_FENCE ioctl, and can be mixed and matched
+ * with other Android timelines and aggregated into sync_fences, etc.
+ *
+ * TODO:
+ *   allow non-RCS fences (need ring/context association)
+ */
+
+#define to_i915_request(x) container_of(x, struct drm_i915_gem_request, fence)
+
+static const char *i915_fence_get_driver_name(struct fence *fence)
+{
+	return "i915";
+}
+
+static const char *i915_fence_ring_get_timeline_name(struct fence *fence)
+{
+	struct drm_i915_gem_request *req = to_i915_request(fence);
+
+	return req->ring->name;
+}
+
+static int i915_fence_ring_check(wait_queue_t *wait, unsigned mode, int flags,
+			    void *key)
+{
+	struct drm_i915_gem_request *req = wait->private;
+	struct intel_engine_cs *ring = req->ring;
+
+	if (!i915_gem_request_completed(req, false))
+		return 0;
+
+	fence_signal_locked(&req->fence);
+
+	__remove_wait_queue(&ring->irq_queue, wait);
+	fence_put(&req->fence);
+	ring->irq_put(ring);
+
+	return 0;
+}
+
+static bool i915_fence_ring_enable_signaling(struct fence *fence)
+{
+	struct drm_i915_gem_request *req = to_i915_request(fence);
+	struct intel_engine_cs *ring = req->ring;
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	wait_queue_t *wait = &req->wait;
+
+	/* queue fence wait queue on irq queue and get fence */
+	if (i915_gem_request_completed(req, false) ||
+	    i915_terminally_wedged(&dev_priv->gpu_error))
+		return false;
+
+	if (!ring->irq_get(ring))
+		return false;
+
+	if (i915_gem_request_completed(req, false)) {
+		ring->irq_put(ring);
+		return true;
+	}
+
+	wait->flags = 0;
+	wait->private = req;
+	wait->func = i915_fence_ring_check;
+
+	__add_wait_queue(&ring->irq_queue, wait);
+	fence_get(fence);
+
+	return true;
+}
+
+static bool i915_fence_ring_signaled(struct fence *fence)
+{
+	struct drm_i915_gem_request *req = to_i915_request(fence);
+
+	return i915_gem_request_completed(req, false);
+}
+
+static signed long i915_fence_ring_wait(struct fence *fence, bool intr,
+				   signed long timeout_js)
+{
+	struct drm_i915_gem_request *req = to_i915_request(fence);
+	struct drm_i915_private *dev_priv = req->ring->dev->dev_private;
+	int ret;
+	s64 timeout;
+
+	timeout = jiffies_to_nsecs(timeout_js);
+
+	ret = __i915_wait_request(req,
+				  atomic_read(&dev_priv->gpu_error.reset_counter),
+				  intr, &timeout, NULL);
+	if (ret == -ETIME)
+		return nsecs_to_jiffies(timeout);
+
+	return ret;
+}
+
+static int i915_fence_ring_fill_driver_data(struct fence *fence, void *data,
+				      int size)
+{
+	struct drm_i915_gem_request *req = to_i915_request(fence);
+
+	if (size < sizeof(req->seqno))
+		return -ENOMEM;
+
+	memcpy(data, &req->seqno,
+	       sizeof(req->seqno));
+
+	return sizeof(req->seqno);
+}
+
+static void i915_fence_ring_value_str(struct fence *fence, char *str, int size)
+{
+	struct drm_i915_gem_request *req = to_i915_request(fence);
+
+	snprintf(str, size, "%u", req->seqno);
+}
+
+static void i915_fence_ring_timeline_value_str(struct fence *fence, char *str,
+					  int size)
+{
+	struct drm_i915_gem_request *req = to_i915_request(fence);
+	struct intel_engine_cs *ring = req->ring;
+
+	snprintf(str, size, "%u", ring->get_seqno(ring, false));
+}
+
+static struct fence_ops i915_fence_ring_ops = {
+	.get_driver_name = 	i915_fence_get_driver_name,
+	.get_timeline_name =	i915_fence_ring_get_timeline_name,
+	.enable_signaling =	i915_fence_ring_enable_signaling,
+	.signaled =		i915_fence_ring_signaled,
+	.wait =			i915_fence_ring_wait,
+	.fill_driver_data =	i915_fence_ring_fill_driver_data,
+	.fence_value_str =	i915_fence_ring_value_str,
+	.timeline_value_str =	i915_fence_ring_timeline_value_str,
+};
+
+static struct fence *i915_fence_create_ring(struct intel_engine_cs *ring,
+					    struct intel_context *ctx)
+{
+	struct drm_i915_private *dev_priv = ring->dev->dev_private;
+	struct drm_i915_gem_request *request;
+	struct intel_ringbuffer *ringbuf;
+	u32 request_start;
+	int ret;
+
+	ret = i915_gem_request_alloc(ring, ctx, &request);
+	if (ret)
+		return ERR_PTR(ret);
+
+	if (i915.enable_execlists) {
+		ringbuf = ctx->engine[ring->id].ringbuf;
+	} else
+		ringbuf = ring->buffer;
+
+	request_start = intel_ring_get_tail(ringbuf);
+	/*
+	 * Emit any outstanding flushes - execbuf can fail to emit the flush
+	 * after having emitted the batchbuffer command. Hence we need to fix
+	 * things up similar to emitting the lazy request. The difference here
+	 * is that the flush _must_ happen before the next request, no matter
+	 * what.
+	 */
+	if (i915.enable_execlists)
+		ret = logical_ring_flush_all_caches(request);
+	else
+		ret = intel_ring_flush_all_caches(request);
+
+	if (ret) {
+		ret = -EIO;
+		goto err_cancel;
+	}
+
+	/* Record the position of the start of the request so that
+	 * should we detect the updated seqno part-way through the
+	 * GPU processing the request, we never over-estimate the
+	 * position of the head.
+	 */
+	request->postfix = intel_ring_get_tail(ringbuf);
+
+	if (i915.enable_execlists)
+		ret = ring->emit_request(request);
+	else
+		ret = ring->add_request(request);
+
+	if (ret) {
+		ret = -EIO;
+		goto err_cancel;
+	}
+
+	request->head = request_start;
+	request->tail = intel_ring_get_tail(ringbuf);
+	request->batch_obj = NULL;
+
+	request->emitted_jiffies = jiffies;
+	list_add_tail(&request->list, &ring->request_list);
+	request->file_priv = NULL;
+
+#if 0
+	if (file) {
+		struct drm_i915_file_private *file_priv = file->driver_priv;
+
+		spin_lock(&file_priv->mm.lock);
+		request->file_priv = file_priv;
+		list_add_tail(&request->client_list,
+			      &file_priv->mm.request_list);
+		spin_unlock(&file_priv->mm.lock);
+
+		request->pid = get_pid(task_pid(current));
+	}
+#endif
+
+	i915_queue_hangcheck(ring->dev);
+
+	cancel_delayed_work_sync(&dev_priv->mm.idle_work);
+	queue_delayed_work(dev_priv->wq,
+			   &dev_priv->mm.retire_work,
+			   round_jiffies_up_relative(HZ));
+	intel_mark_busy(dev_priv->dev);
+
+	fence_init(&request->fence, &i915_fence_ring_ops, &fence_lock,
+		   ctx->user_handle, request->seqno);
+
+	return &request->fence;
+
+err_cancel:
+	i915_gem_request_cancel(request);
+
+	return ERR_PTR(ret);
+}
+
+#if 0
+static struct fence_ops i915_fence_display_ops = {
+	.get_driver_name = 	i915_fence_get_driver_name,
+	.get_timeline_name =	i915_fence_display_get_timeline_name,
+	.enable_signaling =	i915_fence_display_enable_signaling,
+	.signaled =		i915_fence_display_signaled,
+	.wait =			i915_fence_display_wait,
+	.fill_driver_data =	i915_fence_display_fill_driver_data,
+	.fence_value_str =	i915_fence_display_value_str,
+	.timeline_value_str =	i915_fence_display_timeline_value_str,
+};
+
+static struct fence *i915_fence_create_display(struct intel_context *ctx)
+{
+	struct drm_i915_gem_request *req;
+	int ret;
+
+	ret = ring->add_request(ring);
+	if (ret) {
+		DRM_ERROR("add_request failed\n");
+		return NULL;
+	}
+
+	req = ring->outstanding_lazy_request;
+
+	fence_init(&req->fence, &i915_fence_ops, &fence_lock,
+		   ctx->user_handle, req->seqno);
+
+	return &req->fence;
+}
+#endif
+
+int i915_sync_init(struct drm_i915_private *dev_priv)
+{
+	struct intel_engine_cs *ring;
+	int i, ret = 0;
+
+	for_each_ring(ring, dev_priv, i) {
+		/* FIXME: non-RCS fences */
+	}
+
+	return ret;
+}
+
+void i915_sync_fini(struct drm_i915_private *dev_priv)
+{
+	int i;
+
+	for (i = 0; i < I915_NUM_RINGS; i++) {
+	}
+}
-- 
1.9.1


[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [PATCH 9/9] drm/i915: add bufferless execbuf ioctl
  2015-09-04 16:58 [RFC] Page table sharing and bufferless execbuf Jesse Barnes
                   ` (7 preceding siblings ...)
  2015-09-04 16:59 ` [PATCH 8/9] drm/i915: Android sync points for i915 v4 (obsolete) Jesse Barnes
@ 2015-09-04 16:59 ` Jesse Barnes
  2015-09-04 17:37   ` Chris Wilson
  2015-10-08 10:34   ` David Woodhouse
  2015-09-04 17:23 ` [RFC] Page table sharing and bufferless execbuf Chris Wilson
  2015-09-26 14:55 ` David Woodhouse
  10 siblings, 2 replies; 36+ messages in thread
From: Jesse Barnes @ 2015-09-04 16:59 UTC (permalink / raw)
  To: intel-gfx; +Cc: dwmw2

We just need to pass in an address to execute and some flags, since we
don't have to worry about buffer relocation or any of the other usual
stuff.  Returns a fence to be used for synchronization.
---
 drivers/gpu/drm/i915/i915_dma.c            | 140 ++++++++++++++++++++++++++++-
 drivers/gpu/drm/i915/i915_drv.h            |   7 ++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |   2 +-
 drivers/gpu/drm/i915/i915_svm.c            |  10 ---
 drivers/gpu/drm/i915/i915_sync.c           |   4 +-
 include/uapi/drm/i915_drm.h                |  24 +++++
 6 files changed, 173 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index b868084..19b463a 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -50,7 +50,8 @@
 #include <linux/pm.h>
 #include <linux/pm_runtime.h>
 #include <linux/oom.h>
-
+#include <linux/dma-buf.h>
+#include "../../../staging/android/sync.h"
 
 static int i915_getparam(struct drm_device *dev, void *data,
 			 struct drm_file *file_priv)
@@ -1247,6 +1248,132 @@ i915_gem_reject_pin_ioctl(struct drm_device *dev, void *data,
 	return -ENODEV;
 }
 
+int intel_exec_mm_ioctl(struct drm_device *dev, void *data,
+			struct drm_file *file)
+{
+ 	struct drm_i915_private *dev_priv = dev->dev_private;
+	struct drm_i915_exec_mm *exec_mm = data;
+	struct intel_ringbuffer *ringbuf;
+	struct intel_engine_cs *ring;
+	struct intel_context *ctx;
+	struct drm_i915_gem_request *request;
+	struct fence *fence;
+	struct sync_fence *sfence;
+	u32 ctx_id = exec_mm->ctx_id;
+	int fd = get_unused_fd_flags(O_CLOEXEC);
+	int ret = 0;
+
+	if (exec_mm->batch_ptr & 3) {
+		DRM_ERROR("misaligned batch ptr\n");
+		ret = -ENOEXEC;
+		goto out;
+	}
+
+	if (!dev_priv->svm.svm_available) {
+		ret = -ENODEV;
+		goto out;
+	}
+
+	ret = i915_mutex_lock_interruptible(dev);
+	if (ret) {
+		DRM_ERROR("mutex interrupted\n");
+		goto out;
+	}
+
+	if (file == NULL) {
+		ret = -EINVAL;
+		goto out_unlock;
+	}
+
+	ctx = i915_gem_validate_context(dev, file, &dev_priv->ring[RCS],
+					ctx_id);
+	if (ctx == NULL) {
+		ret = -ENOENT;
+		DRM_ERROR("couldn't get context\n");
+		goto out_unlock;
+	}
+
+	if (!ctx->is_svm) {
+		ret = -EINVAL;
+		DRM_ERROR("context is not SVM enabled\n");
+		goto out_unlock;
+	}
+
+	i915_gem_context_reference(ctx);
+
+	ringbuf = ctx->engine[RCS].ringbuf;
+	ring = ringbuf->ring;
+	if (!ring) {
+		DRM_ERROR("context has no last ring\n");
+		ret = -EIO;
+		goto out_unref;
+	}
+
+	if (!ctx->rcs_initialized) {
+		DRM_DEBUG("ring not ready\n");
+		ret = -EIO;
+		goto out_unref;
+	}
+
+	ret = i915_gem_request_alloc(ring, ctx, &request);
+	if (ret) {
+		DRM_ERROR("request alloc failed\n");
+		goto out_unref;
+	}
+
+	ret = i915_gem_request_add_to_client(request, file);
+	if (ret) {
+		DRM_ERROR("failed to add request to client\n");
+		goto out_free_req;
+	}
+
+	fence = i915_fence_create_ring(ring, ctx);
+	if (!fence) {
+		ret = -ENOMEM;
+		DRM_ERROR("fence creation failed\n");
+		goto out_free_req;
+	}
+
+	sfence = sync_fence_create_dma("svm-execbuf", fence);
+	if (!sfence) {
+		ret = -ENOMEM;
+		DRM_ERROR("sfence creation failed\n");
+		goto out_free_req;
+	}
+
+	exec_mm->fence = fd;
+	sync_fence_install(sfence, fd);
+
+	ret = ring->emit_flush(request, 0, I915_GEM_GPU_DOMAINS);
+	if (ret) {
+		DRM_ERROR("ring flush failed: %d\n", ret);
+		goto out_free_req;
+	}
+
+	ret = ring->emit_bb_start(request, exec_mm->batch_ptr, 0);
+	if (ret) {
+		DRM_ERROR("ring dispatch execbuf failed: %d\n", ret);
+		goto out_free_req;
+	}
+
+	i915_gem_context_unreference(ctx);
+	mutex_unlock(&dev->struct_mutex);
+
+	return 0;
+
+out_free_req:
+	i915_gem_request_cancel(request);
+
+out_unref:
+	i915_gem_context_unreference(ctx);
+out_unlock:
+	mutex_unlock(&dev->struct_mutex);
+	put_unused_fd(fd);
+
+out:
+	return ret;
+}
+
 const struct drm_ioctl_desc i915_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(I915_INIT, drm_noop, DRM_AUTH|DRM_MASTER|DRM_ROOT_ONLY),
 	DRM_IOCTL_DEF_DRV(I915_FLUSH, drm_noop, DRM_AUTH),
@@ -1301,6 +1428,17 @@ const struct drm_ioctl_desc i915_ioctls[] = {
 	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_GETPARAM, i915_gem_context_getparam_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_SETPARAM, i915_gem_context_setparam_ioctl, DRM_UNLOCKED|DRM_RENDER_ALLOW),
 	DRM_IOCTL_DEF_DRV(I915_GEM_CONTEXT_CREATE2, i915_gem_context_create2_ioctl, DRM_UNLOCKED),
+	DRM_IOCTL_DEF_DRV(I915_EXEC_MM, intel_exec_mm_ioctl, DRM_UNLOCKED),
 };
 
 int i915_max_ioctl = ARRAY_SIZE(i915_ioctls);
+
+/*
+ * This is really ugly: Because old userspace abused the linux agp interface to
+ * manage the gtt, we need to claim that all intel devices are agp.  For
+ * otherwise the drm core refuses to initialize the agp support code.
+ */
+int i915_driver_device_is_agp(struct drm_device *dev)
+{
+	return 1;
+}
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 0c3047b..d89955c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2860,6 +2860,8 @@ void i915_gem_vma_destroy(struct i915_vma *vma);
 #ifdef CONFIG_DRM_I915_SYNC
 int i915_sync_init(struct drm_i915_private *dev_priv);
 void i915_sync_fini(struct drm_i915_private *dev_priv);
+struct fence *i915_fence_create_ring(struct intel_engine_cs *ring,
+				     struct intel_context *ctx);
 int i915_sync_create_fence_ioctl(struct drm_device *dev, void *data,
 				 struct drm_file *file);
 #else
@@ -3178,6 +3180,9 @@ i915_gem_context_get(struct drm_i915_file_private *file_priv, u32 id);
 void i915_gem_context_free(struct kref *ctx_ref);
 struct drm_i915_gem_object *
 i915_gem_alloc_context_obj(struct drm_device *dev, size_t size);
+struct intel_context *
+i915_gem_validate_context(struct drm_device *dev, struct drm_file *file,
+			  struct intel_engine_cs *ring, const u32 ctx_id);
 static inline void i915_gem_context_reference(struct intel_context *ctx)
 {
 	kref_get(&ctx->ref);
@@ -3402,6 +3407,8 @@ int i915_get_reset_stats_ioctl(struct drm_device *dev, void *data,
 extern struct intel_mm_struct *intel_bind_mm(struct drm_device *dev,
 					     struct intel_context *ctx);
 extern void intel_unbind_mm(struct intel_context *ctx);
+extern int intel_exec_mm_ioctl(struct drm_device *dev, void *data,
+			       struct drm_file *file);
 extern int intel_alloc_pasid(struct drm_device *dev,
 			     struct intel_context *ctx);
 extern void intel_free_pasid(struct drm_device *dev,
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index a953d49..b1f8819 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -988,7 +988,7 @@ validate_exec_list(struct drm_device *dev,
 	return 0;
 }
 
-static struct intel_context *
+struct intel_context *
 i915_gem_validate_context(struct drm_device *dev, struct drm_file *file,
 			  struct intel_engine_cs *ring, const u32 ctx_id)
 {
diff --git a/drivers/gpu/drm/i915/i915_svm.c b/drivers/gpu/drm/i915/i915_svm.c
index 1d05318..fdb45d6 100644
--- a/drivers/gpu/drm/i915/i915_svm.c
+++ b/drivers/gpu/drm/i915/i915_svm.c
@@ -757,16 +757,6 @@ void intel_unbind_mm(struct intel_context *ctx)
 	return;
 }
 
-int intel_exec_mm_ioctl(struct drm_device *dev, void *data,
-			struct drm_file *file)
-{
-//	struct drm_i915_exec_mm *exec_mm = data;
-//	struct drm_i915_private *dev_priv = dev->dev_private;
-
-	/* Load new context into context reg */
-	return 0;
-}
-
 /*
  * The PASID table has 32 entries in the current config, rotate through
  * them as needed.
diff --git a/drivers/gpu/drm/i915/i915_sync.c b/drivers/gpu/drm/i915/i915_sync.c
index 46cea44..085f1f9 100644
--- a/drivers/gpu/drm/i915/i915_sync.c
+++ b/drivers/gpu/drm/i915/i915_sync.c
@@ -187,8 +187,8 @@ static struct fence_ops i915_fence_ring_ops = {
 	.timeline_value_str =	i915_fence_ring_timeline_value_str,
 };
 
-static struct fence *i915_fence_create_ring(struct intel_engine_cs *ring,
-					    struct intel_context *ctx)
+struct fence *i915_fence_create_ring(struct intel_engine_cs *ring,
+				     struct intel_context *ctx)
 {
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 	struct drm_i915_gem_request *request;
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index bac2fb2..9683e8c 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -231,6 +231,7 @@ typedef struct _drm_i915_sarea {
 #define DRM_I915_GEM_CONTEXT_GETPARAM	0x34
 #define DRM_I915_GEM_CONTEXT_SETPARAM	0x35
 #define DRM_I915_GEM_CONTEXT_CREATE2	0x36
+#define DRM_I915_EXEC_MM		0x37
 
 #define DRM_IOCTL_I915_INIT		DRM_IOW( DRM_COMMAND_BASE + DRM_I915_INIT, drm_i915_init_t)
 #define DRM_IOCTL_I915_FLUSH		DRM_IO ( DRM_COMMAND_BASE + DRM_I915_FLUSH)
@@ -285,6 +286,7 @@ typedef struct _drm_i915_sarea {
 #define DRM_IOCTL_I915_GEM_CONTEXT_GETPARAM	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_GETPARAM, struct drm_i915_gem_context_param)
 #define DRM_IOCTL_I915_GEM_CONTEXT_SETPARAM	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_SETPARAM, struct drm_i915_gem_context_param)
 #define DRM_IOCTL_I915_GEM_CONTEXT_CREATE2	DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_GEM_CONTEXT_CREATE2, struct drm_i915_gem_context_create2)
+#define DRM_IOCTL_I915_EXEC_MM			DRM_IOWR (DRM_COMMAND_BASE + DRM_I915_EXEC_MM, struct drm_i915_exec_mm)
 
 /* Allow drivers to submit batchbuffers directly to hardware, relying
  * on the security mechanisms provided by hardware.
@@ -1147,4 +1149,26 @@ struct drm_i915_gem_context_param {
 	__u64 value;
 };
 
+/**
+ * drm_i915_exec_mm - shared address space execbuf
+ * @batch_ptr: address of batch buffer (in context's CPU address space)
+ * @ctx_id: context to use for execution
+ * @flags: see flags
+ * @fence: returned fence handle
+ * @pad: unused
+ *
+ * This simlified execbuf just executes an MI_BATCH_BUFFER_START at
+ * @batch_ptr using @ctx_id as the context.  The context will indicate
+ * which address space the @batch_ptr will use.
+ *
+ * Note @batch_ptr must be dword aligned.
+ */
+struct drm_i915_exec_mm {
+	__u64 batch_ptr;
+	__u32 ctx_id;
+	__u32 flags;
+	__u32 fence;
+	__u32 pad;
+};
+
 #endif /* _UAPI_I915_DRM_H_ */
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [RFC] Page table sharing and bufferless execbuf
  2015-09-04 16:58 [RFC] Page table sharing and bufferless execbuf Jesse Barnes
                   ` (8 preceding siblings ...)
  2015-09-04 16:59 ` [PATCH 9/9] drm/i915: add bufferless execbuf ioctl Jesse Barnes
@ 2015-09-04 17:23 ` Chris Wilson
  2015-09-04 19:08   ` Jesse Barnes
  2015-09-26 14:55 ` David Woodhouse
  10 siblings, 1 reply; 36+ messages in thread
From: Chris Wilson @ 2015-09-04 17:23 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: intel-gfx, dwmw2

On Fri, Sep 04, 2015 at 09:58:54AM -0700, Jesse Barnes wrote:
> A few things to think about:
>   - how to signal GPU hangs with the new execbuf (a signal might be more
>     natural as the execution appears more CPU-like?  what state do we
>     have to worry about restoring for bufferless contexts?

GPU hangs are not signalled via execbuf. The only time userspace is told
via execbuf is when the kernel rejects the cmdbuffer (EIO when it can't
program the hardware queues). EIO notification currently is at the wait
points, and with manually querying. Having a signal would fit in nicely
with the async nature of cmdbuffers and be useful today - though I don't
know how practical it is to extend the signal ABI.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 9/9] drm/i915: add bufferless execbuf ioctl
  2015-09-04 16:59 ` [PATCH 9/9] drm/i915: add bufferless execbuf ioctl Jesse Barnes
@ 2015-09-04 17:37   ` Chris Wilson
  2015-09-04 19:09     ` Jesse Barnes
  2015-10-08 10:34   ` David Woodhouse
  1 sibling, 1 reply; 36+ messages in thread
From: Chris Wilson @ 2015-09-04 17:37 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: intel-gfx, dwmw2

On Fri, Sep 04, 2015 at 09:59:03AM -0700, Jesse Barnes wrote:
> We just need to pass in an address to execute and some flags, since we
> don't have to worry about buffer relocation or any of the other usual
> stuff.  Returns a fence to be used for synchronization.

There is no need for a flush+fence as part of the execbuf except for
when the client requests one (having it optional in the flags as a
syscall convenience). Just allow the client to inject fences
when it wants and even then it may just elect to do all fencing itself
from userspace. Kernel fencing is only interesting for interclient
operation.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC] Page table sharing and bufferless execbuf
  2015-09-04 17:23 ` [RFC] Page table sharing and bufferless execbuf Chris Wilson
@ 2015-09-04 19:08   ` Jesse Barnes
  0 siblings, 0 replies; 36+ messages in thread
From: Jesse Barnes @ 2015-09-04 19:08 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx, dwmw2

On 09/04/2015 10:23 AM, Chris Wilson wrote:
> On Fri, Sep 04, 2015 at 09:58:54AM -0700, Jesse Barnes wrote:
>> A few things to think about:
>>   - how to signal GPU hangs with the new execbuf (a signal might be more
>>     natural as the execution appears more CPU-like?  what state do we
>>     have to worry about restoring for bufferless contexts?
> 
> GPU hangs are not signalled via execbuf. The only time userspace is told
> via execbuf is when the kernel rejects the cmdbuffer (EIO when it can't
> program the hardware queues). EIO notification currently is at the wait
> points, and with manually querying. Having a signal would fit in nicely
> with the async nature of cmdbuffers and be useful today - though I don't
> know how practical it is to extend the signal ABI.

Right; there's signalfd, but maybe a new type of event would be even
better.  I'm more worried about handling the internals of GPU hangs with
bufferless contexts though; I definitely haven't tested that much (only
accidentally :).

Jesse

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 9/9] drm/i915: add bufferless execbuf ioctl
  2015-09-04 17:37   ` Chris Wilson
@ 2015-09-04 19:09     ` Jesse Barnes
  0 siblings, 0 replies; 36+ messages in thread
From: Jesse Barnes @ 2015-09-04 19:09 UTC (permalink / raw)
  To: Chris Wilson, intel-gfx, dwmw2

On 09/04/2015 10:37 AM, Chris Wilson wrote:
> On Fri, Sep 04, 2015 at 09:59:03AM -0700, Jesse Barnes wrote:
>> We just need to pass in an address to execute and some flags, since we
>> don't have to worry about buffer relocation or any of the other usual
>> stuff.  Returns a fence to be used for synchronization.
> 
> There is no need for a flush+fence as part of the execbuf except for
> when the client requests one (having it optional in the flags as a
> syscall convenience). Just allow the client to inject fences
> when it wants and even then it may just elect to do all fencing itself
> from userspace. Kernel fencing is only interesting for interclient
> operation.

Yeah that's easy enough to add; are you thinking userspace would just be
polling periodically on some ranges?  And only requesting fences for
specific batches for blocking purposes?

Jesse

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [RFC] Page table sharing and bufferless execbuf
  2015-09-04 16:58 [RFC] Page table sharing and bufferless execbuf Jesse Barnes
                   ` (9 preceding siblings ...)
  2015-09-04 17:23 ` [RFC] Page table sharing and bufferless execbuf Chris Wilson
@ 2015-09-26 14:55 ` David Woodhouse
  10 siblings, 0 replies; 36+ messages in thread
From: David Woodhouse @ 2015-09-26 14:55 UTC (permalink / raw)
  To: Jesse Barnes, intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 3810 bytes --]

On Fri, 2015-09-04 at 09:58 -0700, Jesse Barnes wrote:
> I've been carrying something looking rougly like this patchset around
> internally for a long time now, and with SKL out there now, I figured
> it's time to get it posted and start the process of integration.
> 
> David is working on pulling over most of the "driver based PASID
> handling" and other code into the IOMMU layer, but the integration with
> the rest of the driver needs some careful thought (handling task
> referencing along with context referencing, context creation, etc.)

I've pushed an early version of that to
http://git.infradead.org/users/dwmw2/linux-svm.git/shortlog/refs/heads/i915-svm

I don't have page fault handling yet, but I've at least got it to the
point where gem_svm_sanity will succeed. I'll work on page faults next.

However, I couldn't get your current tree to work (even with driver
-mode SVM); I did this based on an older internal (svm-on-iommu-fences)
branch and have blindly ported the patch across.

I get this:

 # ./gem_svm_sanity
using GPU to write 0xdead0000 to 0xca4860
value mismatch: read 0x00000000, expected 0xdead0000

And this:

[  105.490459] ------------[ cut here ]------------
[  105.495756] WARNING: CPU: 2 PID: 1254 at drivers/gpu/drm/i915/intel_ringbuffer.c:2214 intel_ring_reserved_space_reserve+0x47/0x80 [i915]()
[  105.509917] WARN_ON(ringbuf->reserved_size)
[  105.514497] Modules linked in: tun nf_conntrack_netbios_ns nf_conntrack_broadcast ip6t_rpfilter ip6t_REJECT nf_reject_ipv6 nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ebtable_nat ebtable_broi
[  105.614355] CPU: 2 PID: 1254 Comm: gem_svm_sanity Tainted: G        W I     4.2.0-rc8+ #2
[  105.623680] Hardware name: Intel Corporation Skylake Client platform/Skylake Y LPDDR3 RVP3, BIOS SKLSE2R1.R00.X090.B01.1506290657 06/29/2015
[  105.638023]  ffffffffa026ed38 ffff880080093b98 ffffffff81703e35 0000000000000000
[  105.646566]  ffff880080093be8 ffff880080093bd8 ffffffff810916a6 ffff880169e40000
[  105.655100]  ffff88003f96a400 00000000000000a0 ffff880080093ce0 ffff88003f45b200
[  105.663598] Call Trace:
[  105.666392]  [<ffffffff81703e35>] dump_stack+0x45/0x57
[  105.672252]  [<ffffffff810916a6>] warn_slowpath_common+0x86/0xc0
[  105.679119]  [<ffffffff81091726>] warn_slowpath_fmt+0x46/0x50
[  105.685705]  [<ffffffffa01e7c57>] intel_ring_reserved_space_reserve+0x47/0x80 [i915]
[  105.694567]  [<ffffffffa01e23bf>] intel_logical_ring_reserve_space+0x1f/0x30 [i915]
[  105.703303]  [<ffffffffa01d0ef7>] i915_gem_request_alloc+0x107/0x1b0 [i915]
[  105.711274]  [<ffffffffa0226c8d>] i915_fence_create_ring+0x2d/0x190 [i915]
[  105.719121]  [<ffffffffa024f334>] intel_exec_mm_ioctl+0x224/0x3a0 [i915]
[  105.726806]  [<ffffffffa00d7409>] drm_ioctl+0x129/0x4d0 [drm]
[  105.733372]  [<ffffffffa024f110>] ? i915_getparam+0x260/0x260 [i915]
[  105.740601]  [<ffffffff810bef2c>] ? __enqueue_entity+0x6c/0x70
[  105.747282]  [<ffffffff810c4afe>] ? set_next_entity+0x6e/0x400
[  105.753976]  [<ffffffff81209645>] do_vfs_ioctl+0x285/0x460
[  105.760233]  [<ffffffff812f711d>] ? selinux_file_ioctl+0x4d/0xc0
[  105.767079]  [<ffffffff812eecc3>] ? security_file_ioctl+0x43/0x60
[  105.774026]  [<ffffffff81209899>] SyS_ioctl+0x79/0x90
[  105.779812]  [<ffffffff81709c2e>] entry_SYSCALL_64_fastpath+0x12/0x71
[  105.787152] ---[ end trace 633908b4ba32bd71 ]---

And as we discussed a little while ago, I always see this:
     ioctl(5, SYNC_IOC_WAIT, 0xffffffff)     = -1 EFAULT (Bad address)
... so I've just stuck a sleep(5) in there to get the test to succeed.

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation


[-- Attachment #1.2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5691 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 6/9] drm/i915: driver based PASID handling
  2015-09-04 16:59 ` [PATCH 6/9] drm/i915: driver based PASID handling Jesse Barnes
@ 2015-10-07 13:00   ` David Woodhouse
  2015-10-07 15:16     ` Jesse Barnes
  2015-10-08 15:57   ` Chris Wilson
  1 sibling, 1 reply; 36+ messages in thread
From: David Woodhouse @ 2015-10-07 13:00 UTC (permalink / raw)
  To: Jesse Barnes, intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 903 bytes --]

On Fri, 2015-09-04 at 09:59 -0700, Jesse Barnes wrote:
> +
> +       ret = handle_mm_fault(mm, vma, address,
> +                             desc.wr_req ? FAULT_FLAG_WRITE : 0);
> +       if (ret & VM_FAULT_ERROR) {
> +               gpu_mm_segv(tsk, address, SEGV_ACCERR); /* ? */
> +               goto out_unlock;
> +       }
> +

Hm, do you need to force the SEGV there, in what ought to be generic
IOMMU code?

Can you instead just let the fault handler return an appropriate
failure code to the IOMMU request queue and then deal with the
resulting error on the i915 device side?

That way, you should hopefully get to gracefully cope with reporting
errors for a specific *context*, rather than killing the whole process.

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation


[-- Attachment #1.2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5691 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 6/9] drm/i915: driver based PASID handling
  2015-10-07 13:00   ` David Woodhouse
@ 2015-10-07 15:16     ` Jesse Barnes
  2015-10-07 16:14       ` Daniel Vetter
  0 siblings, 1 reply; 36+ messages in thread
From: Jesse Barnes @ 2015-10-07 15:16 UTC (permalink / raw)
  To: David Woodhouse, intel-gfx

On 10/07/2015 06:00 AM, David Woodhouse wrote:
> On Fri, 2015-09-04 at 09:59 -0700, Jesse Barnes wrote:
>> +
>> +       ret = handle_mm_fault(mm, vma, address,
>> +                             desc.wr_req ? FAULT_FLAG_WRITE : 0);
>> +       if (ret & VM_FAULT_ERROR) {
>> +               gpu_mm_segv(tsk, address, SEGV_ACCERR); /* ? */
>> +               goto out_unlock;
>> +       }
>> +
> 
> Hm, do you need to force the SEGV there, in what ought to be generic
> IOMMU code?
> 
> Can you instead just let the fault handler return an appropriate
> failure code to the IOMMU request queue and then deal with the
> resulting error on the i915 device side?

I'm not sure if we get enough info on the i915 side to handle it
reasonably, we'll have to test that out.

> That way, you should hopefully get to gracefully cope with reporting
> errors for a specific *context*, rather than killing the whole process.

It would be best to get per-context error info, but killing the process
may be unavoidable (just as if a single thread clobbers memory in your
process).

Jesse
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 6/9] drm/i915: driver based PASID handling
  2015-10-07 15:16     ` Jesse Barnes
@ 2015-10-07 16:14       ` Daniel Vetter
  2015-10-07 16:28         ` Jesse Barnes
                           ` (2 more replies)
  0 siblings, 3 replies; 36+ messages in thread
From: Daniel Vetter @ 2015-10-07 16:14 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: intel-gfx, David Woodhouse

On Wed, Oct 07, 2015 at 08:16:42AM -0700, Jesse Barnes wrote:
> On 10/07/2015 06:00 AM, David Woodhouse wrote:
> > On Fri, 2015-09-04 at 09:59 -0700, Jesse Barnes wrote:
> >> +
> >> +       ret = handle_mm_fault(mm, vma, address,
> >> +                             desc.wr_req ? FAULT_FLAG_WRITE : 0);
> >> +       if (ret & VM_FAULT_ERROR) {
> >> +               gpu_mm_segv(tsk, address, SEGV_ACCERR); /* ? */
> >> +               goto out_unlock;
> >> +       }
> >> +
> > 
> > Hm, do you need to force the SEGV there, in what ought to be generic
> > IOMMU code?
> > 
> > Can you instead just let the fault handler return an appropriate
> > failure code to the IOMMU request queue and then deal with the
> > resulting error on the i915 device side?
> 
> I'm not sure if we get enough info on the i915 side to handle it
> reasonably, we'll have to test that out.

We do know precisely which context blew up, but without the TDR work we
can't yet just kill the offender selective without affecting the other
active gpu contexts.

But besides that I really don't see a reason why we need to kill the
process if the gpu faults. After all if a thread sigfaults then signal
goes to that thread and not some random one (or the one thread that forked
the thread that blew up). And we do have interfaces to tell userspace that
something bad happened with the gpu work it submitted.

Chris made a similar patch for userptr and I didn't like that one either.
Worst case userspace has a special SEGV handler and then things really go
down badly when that handler gets triggered at an unexpected place.
-Daniel


> > That way, you should hopefully get to gracefully cope with reporting
> > errors for a specific *context*, rather than killing the whole process.
> 
> It would be best to get per-context error info, but killing the process
> may be unavoidable (just as if a single thread clobbers memory in your
> process).
> 
> Jesse
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 6/9] drm/i915: driver based PASID handling
  2015-10-07 16:14       ` Daniel Vetter
@ 2015-10-07 16:28         ` Jesse Barnes
  2015-10-07 17:17           ` David Woodhouse
  2015-10-08 10:28         ` Tomas Elf
  2015-10-08 11:29         ` Tomas Elf
  2 siblings, 1 reply; 36+ messages in thread
From: Jesse Barnes @ 2015-10-07 16:28 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx, David Woodhouse

On 10/07/2015 09:14 AM, Daniel Vetter wrote:
> On Wed, Oct 07, 2015 at 08:16:42AM -0700, Jesse Barnes wrote:
>> On 10/07/2015 06:00 AM, David Woodhouse wrote:
>>> On Fri, 2015-09-04 at 09:59 -0700, Jesse Barnes wrote:
>>>> +
>>>> +       ret = handle_mm_fault(mm, vma, address,
>>>> +                             desc.wr_req ? FAULT_FLAG_WRITE : 0);
>>>> +       if (ret & VM_FAULT_ERROR) {
>>>> +               gpu_mm_segv(tsk, address, SEGV_ACCERR); /* ? */
>>>> +               goto out_unlock;
>>>> +       }
>>>> +
>>>
>>> Hm, do you need to force the SEGV there, in what ought to be generic
>>> IOMMU code?
>>>
>>> Can you instead just let the fault handler return an appropriate
>>> failure code to the IOMMU request queue and then deal with the
>>> resulting error on the i915 device side?
>>
>> I'm not sure if we get enough info on the i915 side to handle it
>> reasonably, we'll have to test that out.
> 
> We do know precisely which context blew up, but without the TDR work we
> can't yet just kill the offender selective without affecting the other
> active gpu contexts.

How?  The notification from the IOMMU queue is asynchronous...

> But besides that I really don't see a reason why we need to kill the
> process if the gpu faults. After all if a thread sigfaults then signal
> goes to that thread and not some random one (or the one thread that forked
> the thread that blew up). And we do have interfaces to tell userspace that
> something bad happened with the gpu work it submitted.

We will send a signal, just as in the thread case.  That generally kills
the process, but the process is free to install a handler and try to do
something of course.  The trouble is that a fault like this indicates a
bug, just as it would in the multithreaded case (processors manipulating
the address space without locking for example, or a use after free, or a
simple bad pointer reference).

> Chris made a similar patch for userptr and I didn't like that one either.
> Worst case userspace has a special SEGV handler and then things really go
> down badly when that handler gets triggered at an unexpected place.

Not sure what you're suggesting as an alternative; just let things keep
running somehow?

Jesse

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 6/9] drm/i915: driver based PASID handling
  2015-10-07 16:28         ` Jesse Barnes
@ 2015-10-07 17:17           ` David Woodhouse
  2015-10-07 17:26             ` Jesse Barnes
  0 siblings, 1 reply; 36+ messages in thread
From: David Woodhouse @ 2015-10-07 17:17 UTC (permalink / raw)
  To: Jesse Barnes, Daniel Vetter; +Cc: intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 2619 bytes --]

On Wed, 2015-10-07 at 09:28 -0700, Jesse Barnes wrote:
> On 10/07/2015 09:14 AM, Daniel Vetter wrote:
> > On Wed, Oct 07, 2015 at 08:16:42AM -0700, Jesse Barnes wrote:
> > > On 10/07/2015 06:00 AM, David Woodhouse wrote:
> > > > On Fri, 2015-09-04 at 09:59 -0700, Jesse Barnes wrote:
> > > > > +
> > > > > +       ret = handle_mm_fault(mm, vma, address,
> > > > > +                             desc.wr_req ? FAULT_FLAG_WRITE : 0);
> > > > > +       if (ret & VM_FAULT_ERROR) {
> > > > > +               gpu_mm_segv(tsk, address, SEGV_ACCERR); /* ? */
> > > > > +               goto out_unlock;
> > > > > +       }
> > > > > +
> > > > 
> > > > Hm, do you need to force the SEGV there, in what ought to be generic
> > > > IOMMU code?
> > > > 
> > > > Can you instead just let the fault handler return an appropriate
> > > > failure code to the IOMMU request queue and then deal with the
> > > > resulting error on the i915 device side?
> > > 
> > > I'm not sure if we get enough info on the i915 side to handle it
> > > reasonably, we'll have to test that out.
> > 
> > We do know precisely which context blew up, but without the TDR work we
> > can't yet just kill the offender selective without affecting the other
> > active gpu contexts.
> 
> How?  The notification from the IOMMU queue is asynchronous...

The page request, and the response, include 'private data' which an
endpoint can use to carry that kind of information.

In $7.5.1.1 of the VT-d specification it tells us:

	"Private Data: The Private Data field can be used by 
	 Root-Complex integrated endpoints to uniquely identify
	 device-specific private information associated with an 
	 individual page request.

	"For Intel ® Processor Graphics device, the Private Data field 
	 specifies the identity of the GPU advanced-context (see 
	 Section 3.10) sending the page request."

> > But besides that I really don't see a reason why we need to kill the
> > process if the gpu faults. After all if a thread sigfaults then signal
> > goes to that thread and not some random one (or the one thread that forked
> > the thread that blew up). And we do have interfaces to tell userspace that
> > something bad happened with the gpu work it submitted.

I certainly don't want the core IOMMU code killing things. I really
want to just complete the page request with an appropriate failure
code, and let the endpoint device deal with it from there.

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation


[-- Attachment #1.2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5691 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 6/9] drm/i915: driver based PASID handling
  2015-10-07 17:17           ` David Woodhouse
@ 2015-10-07 17:26             ` Jesse Barnes
  2015-10-08  8:12               ` Daniel Vetter
  0 siblings, 1 reply; 36+ messages in thread
From: Jesse Barnes @ 2015-10-07 17:26 UTC (permalink / raw)
  To: David Woodhouse, Daniel Vetter; +Cc: intel-gfx

On 10/07/2015 10:17 AM, David Woodhouse wrote:
> On Wed, 2015-10-07 at 09:28 -0700, Jesse Barnes wrote:
>> On 10/07/2015 09:14 AM, Daniel Vetter wrote:
>>> On Wed, Oct 07, 2015 at 08:16:42AM -0700, Jesse Barnes wrote:
>>>> On 10/07/2015 06:00 AM, David Woodhouse wrote:
>>>>> On Fri, 2015-09-04 at 09:59 -0700, Jesse Barnes wrote:
>>>>>> +
>>>>>> +       ret = handle_mm_fault(mm, vma, address,
>>>>>> +                             desc.wr_req ? FAULT_FLAG_WRITE : 0);
>>>>>> +       if (ret & VM_FAULT_ERROR) {
>>>>>> +               gpu_mm_segv(tsk, address, SEGV_ACCERR); /* ? */
>>>>>> +               goto out_unlock;
>>>>>> +       }
>>>>>> +
>>>>>
>>>>> Hm, do you need to force the SEGV there, in what ought to be generic
>>>>> IOMMU code?
>>>>>
>>>>> Can you instead just let the fault handler return an appropriate
>>>>> failure code to the IOMMU request queue and then deal with the
>>>>> resulting error on the i915 device side?
>>>>
>>>> I'm not sure if we get enough info on the i915 side to handle it
>>>> reasonably, we'll have to test that out.
>>>
>>> We do know precisely which context blew up, but without the TDR work we
>>> can't yet just kill the offender selective without affecting the other
>>> active gpu contexts.
>>
>> How?  The notification from the IOMMU queue is asynchronous...
> 
> The page request, and the response, include 'private data' which an
> endpoint can use to carry that kind of information.
> 
> In $7.5.1.1 of the VT-d specification it tells us:
> 
> 	"Private Data: The Private Data field can be used by 
> 	 Root-Complex integrated endpoints to uniquely identify
> 	 device-specific private information associated with an 
> 	 individual page request.
> 
> 	"For Intel ® Processor Graphics device, the Private Data field 
> 	 specifies the identity of the GPU advanced-context (see 
> 	 Section 3.10) sending the page request."

Ah right so we could put our private context ID in there if the PASID doesn't end up being per-context.  That would work fine (though as Daniel said we still need fancier reset to handle things more gracefully).

>>> But besides that I really don't see a reason why we need to kill the
>>> process if the gpu faults. After all if a thread sigfaults then signal
>>> goes to that thread and not some random one (or the one thread that forked
>>> the thread that blew up). And we do have interfaces to tell userspace that
>>> something bad happened with the gpu work it submitted.
> 
> I certainly don't want the core IOMMU code killing things. I really
> want to just complete the page request with an appropriate failure
> code, and let the endpoint device deal with it from there.

I think that will work, but I want to test and make sure.  In the driver mode version I took advantage of the fact that I got an unambiguous page request failure from the IOMMU along with a unique PASID to send the signal.  Getting it on the GPU side means looking at some of our existing error state bits, which is something I've been avoiding...

Jesse

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 6/9] drm/i915: driver based PASID handling
  2015-10-07 17:26             ` Jesse Barnes
@ 2015-10-08  8:12               ` Daniel Vetter
  0 siblings, 0 replies; 36+ messages in thread
From: Daniel Vetter @ 2015-10-08  8:12 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: intel-gfx, David Woodhouse

On Wed, Oct 07, 2015 at 10:26:20AM -0700, Jesse Barnes wrote:
> On 10/07/2015 10:17 AM, David Woodhouse wrote:
> > On Wed, 2015-10-07 at 09:28 -0700, Jesse Barnes wrote:
> >> On 10/07/2015 09:14 AM, Daniel Vetter wrote:
> >>> On Wed, Oct 07, 2015 at 08:16:42AM -0700, Jesse Barnes wrote:
> >>>> On 10/07/2015 06:00 AM, David Woodhouse wrote:
> >>>>> On Fri, 2015-09-04 at 09:59 -0700, Jesse Barnes wrote:
> >>>>>> +
> >>>>>> +       ret = handle_mm_fault(mm, vma, address,
> >>>>>> +                             desc.wr_req ? FAULT_FLAG_WRITE : 0);
> >>>>>> +       if (ret & VM_FAULT_ERROR) {
> >>>>>> +               gpu_mm_segv(tsk, address, SEGV_ACCERR); /* ? */
> >>>>>> +               goto out_unlock;
> >>>>>> +       }
> >>>>>> +
> >>>>>
> >>>>> Hm, do you need to force the SEGV there, in what ought to be generic
> >>>>> IOMMU code?
> >>>>>
> >>>>> Can you instead just let the fault handler return an appropriate
> >>>>> failure code to the IOMMU request queue and then deal with the
> >>>>> resulting error on the i915 device side?
> >>>>
> >>>> I'm not sure if we get enough info on the i915 side to handle it
> >>>> reasonably, we'll have to test that out.
> >>>
> >>> We do know precisely which context blew up, but without the TDR work we
> >>> can't yet just kill the offender selective without affecting the other
> >>> active gpu contexts.
> >>
> >> How?  The notification from the IOMMU queue is asynchronous...
> > 
> > The page request, and the response, include 'private data' which an
> > endpoint can use to carry that kind of information.
> > 
> > In $7.5.1.1 of the VT-d specification it tells us:
> > 
> > 	"Private Data: The Private Data field can be used by 
> > 	 Root-Complex integrated endpoints to uniquely identify
> > 	 device-specific private information associated with an 
> > 	 individual page request.
> > 
> > 	"For Intel ® Processor Graphics device, the Private Data field 
> > 	 specifies the identity of the GPU advanced-context (see 
> > 	 Section 3.10) sending the page request."
> 
> Ah right so we could put our private context ID in there if the PASID
> doesn't end up being per-context.  That would work fine (though as
> Daniel said we still need fancier reset to handle things more
> gracefully).

I'd hope we can be even more lazy: If we complete the page request with a
failure then hopefully the gpu read/write transaction never completes in
the EU/ff-unit which means it'll be stuck forever and our hangcheck will
get around to clean up the mess. That should work with 0 code changes (but
needs a testcase ofc).

Later on we can get fancy and try to immediate preempt the ctx and kill it
if it faults. But there's a bit of infrastructure missing for that, and I
think it'd be better to not stall svm on that.

> >>> But besides that I really don't see a reason why we need to kill the
> >>> process if the gpu faults. After all if a thread sigfaults then signal
> >>> goes to that thread and not some random one (or the one thread that forked
> >>> the thread that blew up). And we do have interfaces to tell userspace that
> >>> something bad happened with the gpu work it submitted.
> > 
> > I certainly don't want the core IOMMU code killing things. I really
> > want to just complete the page request with an appropriate failure
> > code, and let the endpoint device deal with it from there.
> 
> I think that will work, but I want to test and make sure.  In the driver
> mode version I took advantage of the fact that I got an unambiguous page
> request failure from the IOMMU along with a unique PASID to send the
> signal.  Getting it on the GPU side means looking at some of our
> existing error state bits, which is something I've been avoiding...

gem_reset_stats does this for your already, including test coverage and
all. What might be missing is getting reset events from a pollable fd,
since the current needs explicit polling. It works that way since that's
all arb_robustness wants.

And with an fd we can always use the generic SIG_IO stuff for userspace
that wants a signal, but by default I don't think we should use signals at
all for gpu page faults: The current SIG_SEGV can't be used (since it's
clearly for thread-local faults only), same for SIG_BUS, SIG_IO is for fd
polling and there's nothing else available.

But even the pollable fd is probably best done in a follow-up series, if
we even have a need for it.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 6/9] drm/i915: driver based PASID handling
  2015-10-07 16:14       ` Daniel Vetter
  2015-10-07 16:28         ` Jesse Barnes
@ 2015-10-08 10:28         ` Tomas Elf
  2015-10-08 11:29         ` Tomas Elf
  2 siblings, 0 replies; 36+ messages in thread
From: Tomas Elf @ 2015-10-08 10:28 UTC (permalink / raw)
  To: Daniel Vetter, Jesse Barnes; +Cc: intel-gfx, David Woodhouse

On 07/10/2015 17:14, Daniel Vetter wrote:
> On Wed, Oct 07, 2015 at 08:16:42AM -0700, Jesse Barnes wrote:
>> On 10/07/2015 06:00 AM, David Woodhouse wrote:
>>> On Fri, 2015-09-04 at 09:59 -0700, Jesse Barnes wrote:
>>>> +
>>>> +       ret = handle_mm_fault(mm, vma, address,
>>>> +                             desc.wr_req ? FAULT_FLAG_WRITE : 0);
>>>> +       if (ret & VM_FAULT_ERROR) {
>>>> +               gpu_mm_segv(tsk, address, SEGV_ACCERR); /* ? */
>>>> +               goto out_unlock;
>>>> +       }
>>>> +
>>>
>>> Hm, do you need to force the SEGV there, in what ought to be generic
>>> IOMMU code?
>>>
>>> Can you instead just let the fault handler return an appropriate
>>> failure code to the IOMMU request queue and then deal with the
>>> resulting error on the i915 device side?
>>
>> I'm not sure if we get enough info on the i915 side to handle it
>> reasonably, we'll have to test that out.
>
> We do know precisely which context blew up, but without the TDR work we
> can't yet just kill the offender selective without affecting the other
> active gpu contexts.

Could someone clarify what this means from the TDR point of view, 
please? When you say "context blew up" I'm guessing that you mean that 
come context caused the fault handler to get involved somehow?

Does this imply that the offending context will hang and the driver will 
have to detect this hang? If so, then yes - if we have the per-engine 
hang recovery mode as part of the upcoming TDR work in place then we 
could handle it by stepping over the offending batch buffer and moving 
on with a minimum of side-effects on the rest of the driver/GPU.

Or does this imply that we have some new integration path to deal with? 
(something that I should be aware of when upstreaming TDR?)

Sorry if I missed something blatantly obvious in the patch ;).

Thanks,
Tomas

>
> But besides that I really don't see a reason why we need to kill the
> process if the gpu faults. After all if a thread sigfaults then signal
> goes to that thread and not some random one (or the one thread that forked
> the thread that blew up). And we do have interfaces to tell userspace that
> something bad happened with the gpu work it submitted.
>
> Chris made a similar patch for userptr and I didn't like that one either.
> Worst case userspace has a special SEGV handler and then things really go
> down badly when that handler gets triggered at an unexpected place.
> -Daniel
>
>
>>> That way, you should hopefully get to gracefully cope with reporting
>>> errors for a specific *context*, rather than killing the whole process.
>>
>> It would be best to get per-context error info, but killing the process
>> may be unavoidable (just as if a single thread clobbers memory in your
>> process).
>>
>> Jesse
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 9/9] drm/i915: add bufferless execbuf ioctl
  2015-09-04 16:59 ` [PATCH 9/9] drm/i915: add bufferless execbuf ioctl Jesse Barnes
  2015-09-04 17:37   ` Chris Wilson
@ 2015-10-08 10:34   ` David Woodhouse
  1 sibling, 0 replies; 36+ messages in thread
From: David Woodhouse @ 2015-10-08 10:34 UTC (permalink / raw)
  To: Jesse Barnes, intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 4995 bytes --]

On Fri, 2015-09-04 at 09:59 -0700, Jesse Barnes wrote:
> We just need to pass in an address to execute and some flags, since we
> don't have to worry about buffer relocation or any of the other usual
> stuff.  Returns a fence to be used for synchronization.
> ---
>  drivers/gpu/drm/i915/i915_dma.c            | 140 ++++++++++++++++++++++++++++-
>  drivers/gpu/drm/i915/i915_drv.h            |   7 ++
>  drivers/gpu/drm/i915/i915_gem_execbuffer.c |   2 +-
>  drivers/gpu/drm/i915/i915_svm.c            |  10 ---
>  drivers/gpu/drm/i915/i915_sync.c           |   4 +-
>  include/uapi/drm/i915_drm.h                |  24 +++++
>  6 files changed, 173 insertions(+), 14 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index b868084..19b463a 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -50,7 +50,8 @@
>  #include 
>  #include 
>  #include 
> -
> +#include 
> +#include "../../../staging/android/sync.h"
>  
>  static int i915_getparam(struct drm_device *dev, void *data,
>  > 	> 	> 	>  struct drm_file *file_priv)
> @@ -1247,6 +1248,132 @@ i915_gem_reject_pin_ioctl(struct drm_device *dev, void *data,
>  > 	> return -ENODEV;
>  }
>  
> +int intel_exec_mm_ioctl(struct drm_device *dev, void *data,
> +> 	> 	> 	> struct drm_file *file)
> +{
> + > 	> struct drm_i915_private *dev_priv = dev->dev_private;
> +> 	> struct drm_i915_exec_mm *exec_mm = data;
> +> 	> struct intel_ringbuffer *ringbuf;
> +> 	> struct intel_engine_cs *ring;
> +> 	> struct intel_context *ctx;
> +> 	> struct drm_i915_gem_request *request;
> +> 	> struct fence *fence;
> +> 	> struct sync_fence *sfence;
> +> 	> u32 ctx_id = exec_mm->ctx_id;
> +> 	> int fd = get_unused_fd_flags(O_CLOEXEC);
> +> 	> int ret = 0;
> +
> +> 	> if (exec_mm->batch_ptr & 3) {
> +> 	> 	> DRM_ERROR("misaligned batch ptr\n");
> +> 	> 	> ret = -ENOEXEC;
> +> 	> 	> goto out;
> +> 	> }
> +
> +> 	> if (!dev_priv->svm.svm_available) {
> +> 	> 	> ret = -ENODEV;
> +> 	> 	> goto out;
> +> 	> }
> +
> +> 	> ret = i915_mutex_lock_interruptible(dev);
> +> 	> if (ret) {
> +> 	> 	> DRM_ERROR("mutex interrupted\n");
> +> 	> 	> goto out;
> +> 	> }
> +
> +> 	> if (file == NULL) {
> +> 	> 	> ret = -EINVAL;
> +> 	> 	> goto out_unlock;
> +> 	> }
> +
> +> 	> ctx = i915_gem_validate_context(dev, file, &dev_priv->ring[RCS],
> +> 	> 	> 	> 	> 	> ctx_id);
> +> 	> if (ctx == NULL) {
> +> 	> 	> ret = -ENOENT;
> +> 	> 	> DRM_ERROR("couldn't get context\n");
> +> 	> 	> goto out_unlock;
> +> 	> }
> +
> +> 	> if (!ctx->is_svm) {
> +> 	> 	> ret = -EINVAL;
> +> 	> 	> DRM_ERROR("context is not SVM enabled\n");
> +> 	> 	> goto out_unlock;
> +> 	> }
> +
> +> 	> i915_gem_context_reference(ctx);
> +
> +> 	> ringbuf = ctx->engine[RCS].ringbuf;
> +> 	> ring = ringbuf->ring;
> +> 	> if (!ring) {
> +> 	> 	> DRM_ERROR("context has no last ring\n");
> +> 	> 	> ret = -EIO;
> +> 	> 	> goto out_unref;
> +> 	> }
> +
> +> 	> if (!ctx->rcs_initialized) {
> +> 	> 	> DRM_DEBUG("ring not ready\n");
> +> 	> 	> ret = -EIO;
> +> 	> 	> goto out_unref;
> +> 	> }
> +
> +> 	> ret = i915_gem_request_alloc(ring, ctx, &request);
> +> 	> if (ret) {
> +> 	> 	> DRM_ERROR("request alloc failed\n");
> +> 	> 	> goto out_unref;
> +> 	> }
> +
> +> 	> ret = i915_gem_request_add_to_client(request, file);
> +> 	> if (ret) {
> +> 	> 	> DRM_ERROR("failed to add request to client\n");
> +> 	> 	> goto out_free_req;
> +> 	> }
> +
> +> 	> fence = i915_fence_create_ring(ring, ctx);
> +> 	> if (!fence) {
> +> 	> 	> ret = -ENOMEM;
> +> 	> 	> DRM_ERROR("fence creation failed\n");
> +> 	> 	> goto out_free_req;
> +> 	> }
> +
> +> 	> sfence = sync_fence_create_dma("svm-execbuf", fence);
> +> 	> if (!sfence) {
> +> 	> 	> ret = -ENOMEM;
> +> 	> 	> DRM_ERROR("sfence creation failed\n");
> +> 	> 	> goto out_free_req;
> +> 	> }
> +
> +> 	> exec_mm->fence = fd;
> +> 	> sync_fence_install(sfence, fd);
> +
> +> 	> ret = ring->emit_flush(request, 0, I915_GEM_GPU_DOMAINS);
> +> 	> if (ret) {
> +> 	> 	> DRM_ERROR("ring flush failed: %d\n", ret);
> +> 	> 	> goto out_free_req;
> +> 	> }
> +
> +> 	> ret = ring->emit_bb_start(request, exec_mm->batch_ptr, 0);
> +> 	> if (ret) {
> +> 	> 	> DRM_ERROR("ring dispatch execbuf failed: %d\n", ret);
> +> 	> 	> goto out_free_req;
> +> 	> }

FWIW naïvely adding a call to 

 __i915_add_request(request, NULL, true);

here in the hope that it would actually do something *useful* rather
than just reserving space in the ring and never using it, causing the
warning in intel_ring_reserved_space_reserve() that I observed next
time we do anything... didn't work. With various and confusing failure
modes, about which I won't go into detail.

I'm going to leave it to the big boys and keep hacking on my backported
version of the IOMMU code to 4.0, where the SVM bits in the i915 driver
*did* work :)

-- 
dwmw2


[-- Attachment #1.2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5691 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 6/9] drm/i915: driver based PASID handling
  2015-10-07 16:14       ` Daniel Vetter
  2015-10-07 16:28         ` Jesse Barnes
  2015-10-08 10:28         ` Tomas Elf
@ 2015-10-08 11:29         ` Tomas Elf
  2015-10-08 22:46           ` David Woodhouse
  2 siblings, 1 reply; 36+ messages in thread
From: Tomas Elf @ 2015-10-08 11:29 UTC (permalink / raw)
  To: Daniel Vetter, Jesse Barnes; +Cc: intel-gfx, David Woodhouse

On 07/10/2015 17:14, Daniel Vetter wrote:
> On Wed, Oct 07, 2015 at 08:16:42AM -0700, Jesse Barnes wrote:
>> On 10/07/2015 06:00 AM, David Woodhouse wrote:
>>> On Fri, 2015-09-04 at 09:59 -0700, Jesse Barnes wrote:
>>>> +
>>>> +       ret = handle_mm_fault(mm, vma, address,
>>>> +                             desc.wr_req ? FAULT_FLAG_WRITE : 0);
>>>> +       if (ret & VM_FAULT_ERROR) {
>>>> +               gpu_mm_segv(tsk, address, SEGV_ACCERR); /* ? */
>>>> +               goto out_unlock;
>>>> +       }
>>>> +
>>>
>>> Hm, do you need to force the SEGV there, in what ought to be generic
>>> IOMMU code?
>>>
>>> Can you instead just let the fault handler return an appropriate
>>> failure code to the IOMMU request queue and then deal with the
>>> resulting error on the i915 device side?
>>
>> I'm not sure if we get enough info on the i915 side to handle it
>> reasonably, we'll have to test that out.
>
> We do know precisely which context blew up, but without the TDR work we
> can't yet just kill the offender selective without affecting the other
> active gpu contexts.

Could someone clarify what this means from the TDR point of view, 
please? When you say "context blew up" I'm guessing that you mean that 
come context caused the fault handler to get involved somehow?

Does this imply that the offending context will hang and the driver will 
have to detect this hang? If so, then yes - if we have the per-engine 
hang recovery mode as part of the upcoming TDR work in place then we 
could handle it by stepping over the offending batch buffer and moving 
on with a minimum of side-effects on the rest of the driver/GPU.

Or does this imply that we have some new integration path to deal with? 
(something that I should be aware of when upstreaming TDR?)

Sorry if I missed something blatantly obvious in the patch ;).

Thanks,
Tomas

>
> But besides that I really don't see a reason why we need to kill the
> process if the gpu faults. After all if a thread sigfaults then signal
> goes to that thread and not some random one (or the one thread that forked
> the thread that blew up). And we do have interfaces to tell userspace that
> something bad happened with the gpu work it submitted.
>
> Chris made a similar patch for userptr and I didn't like that one either.
> Worst case userspace has a special SEGV handler and then things really go
> down badly when that handler gets triggered at an unexpected place.
> -Daniel
>
>
>>> That way, you should hopefully get to gracefully cope with reporting
>>> errors for a specific *context*, rather than killing the whole process.
>>
>> It would be best to get per-context error info, but killing the process
>> may be unavoidable (just as if a single thread clobbers memory in your
>> process).
>>
>> Jesse
>> _______________________________________________
>> Intel-gfx mailing list
>> Intel-gfx@lists.freedesktop.org
>> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 6/9] drm/i915: driver based PASID handling
  2015-09-04 16:59 ` [PATCH 6/9] drm/i915: driver based PASID handling Jesse Barnes
  2015-10-07 13:00   ` David Woodhouse
@ 2015-10-08 15:57   ` Chris Wilson
  2015-10-09  7:24     ` David Woodhouse
  1 sibling, 1 reply; 36+ messages in thread
From: Chris Wilson @ 2015-10-08 15:57 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: intel-gfx, dwmw2

On Fri, Sep 04, 2015 at 09:59:00AM -0700, Jesse Barnes wrote:
> New file with VT-d SVM and PASID handling functions and page table
> management.  This belongs in the IOMMU code (along with some extra bits
> for waiting for invalidations and page faults to complete, flushing the
> device IOTLB, etc.)
> 
> FIXME:
>   need work queue for re-submitting contexts
>   TE bit handling on SKL
> ---
>  drivers/gpu/drm/i915/Makefile           |    5 +-
>  drivers/gpu/drm/i915/i915_drv.h         |   43 ++
>  drivers/gpu/drm/i915/i915_gem.c         |    3 +
>  drivers/gpu/drm/i915/i915_gem_context.c |    3 +
>  drivers/gpu/drm/i915/i915_irq.c         |    7 +
>  drivers/gpu/drm/i915/i915_reg.h         |   47 ++
>  drivers/gpu/drm/i915/i915_svm.c         | 1102 +++++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/intel_lrc.c        |  120 +++-
>  drivers/gpu/drm/i915/intel_lrc.h        |    1 +
>  9 files changed, 1299 insertions(+), 32 deletions(-)
>  create mode 100644 drivers/gpu/drm/i915/i915_svm.c
> 
> diff --git a/drivers/gpu/drm/i915/Makefile b/drivers/gpu/drm/i915/Makefile
> index 44d290a..e4883a7 100644
> --- a/drivers/gpu/drm/i915/Makefile
> +++ b/drivers/gpu/drm/i915/Makefile
> @@ -38,7 +38,8 @@ i915-y += i915_cmd_parser.o \
>  	  intel_lrc.o \
>  	  intel_mocs.o \
>  	  intel_ringbuffer.o \
> -	  intel_uncore.o
> +	  intel_uncore.o \
> +	  i915_svm.o

Correct me if I am wrong, but it looks like i915_svm implements the
lowlevel interface with the hardware, so by convention is intel_svm.c
  
>  # general-purpose microcontroller (GuC) support
>  i915-y += intel_guc_loader.o \
> @@ -93,6 +94,8 @@ i915-y += dvo_ch7017.o \
>  # virtual gpu code
>  i915-y += i915_vgpu.o
>  
> +i915-$(CONFIG_MMU_NOTIFIER) += i915_svm.o

Added twice?

> +
>  # legacy horrors
>  i915-y += i915_dma.o
>  
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 20beb51..ca38a7a 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -47,6 +47,7 @@
>  #include <drm/drm_gem.h>
>  #include <linux/backlight.h>
>  #include <linux/hashtable.h>
> +#include <linux/mmu_notifier.h>
>  #include <linux/intel-iommu.h>
>  #include <linux/kref.h>
>  #include <linux/pm_qos.h>
> @@ -848,6 +849,13 @@ struct i915_ctx_hang_stats {
>  	bool banned;
>  };
>  
> +struct intel_mm_struct {
> +	struct kref kref;
> +	struct mmu_notifier notifier;
> +	struct drm_i915_private *dev_priv;
> +	struct list_head context_list;
> +};

Doesn't this look kind of familiar? struct i915_mm_struct perhaps?

> +
>  /* This must match up with the value previously used for execbuf2.rsvd1. */
>  #define DEFAULT_CONTEXT_HANDLE 0
>  
> @@ -874,6 +882,9 @@ struct i915_ctx_hang_stats {
>  struct intel_context {
>  	struct kref ref;
>  	int user_handle;
> +	bool is_svm; /* shares x86 page tables */
> +	u32 pasid; /* 20 bits */
> +	struct intel_mm_struct *ims;
>  	uint8_t remap_slice;
>  	struct drm_i915_private *i915;
>  	int flags;
> @@ -895,6 +906,9 @@ struct intel_context {
>  		int pin_count;
>  	} engine[I915_NUM_RINGS];
>  
> +	struct list_head mm_list;

This is a link, name it so.

> +	struct task_struct *tsk;

One task? A context can be passed by the device fd to another process.
Do we inherit the VM along with the context? I don't anything to prevent
such.

> +static void gpu_mm_segv(struct task_struct *tsk, unsigned long address,
> +			int si_code)
> +{
> +	siginfo_t info;
> +
> +	/* Need specific signal info here */
> +	info.si_signo	= SIGSEGV;
> +	info.si_errno	= EIO;
> +	info.si_code	= si_code;
> +	info.si_addr	= (void __user *)address;
> +
> +	force_sig_info(SIGSEGV, &info, tsk);

force_sig_info() is not exported, ah you builtin i915-svm.c

> +}
> +
> +/*
> + * Read the fault descriptor and handle the fault:
> + *   get PML4 from PASID
> + *   get mm struct
> + *   get the vma
> + *   verify the address is valid
> + *   call handle_mm_fault after taking the mm->mmap_sem
> + */
> +void intel_gpu_fault_work(struct work_struct *work)
> +{
> +	struct i915_svm_state *svm = container_of(work, struct i915_svm_state,
> +						  work);
> +	struct drm_i915_private *dev_priv =
> +		container_of(svm, struct drm_i915_private, svm);
> +	struct drm_device *dev = dev_priv->dev;
> +	struct intel_ringbuffer *ringbuf;
> +	struct page_request_dsc desc;
> +	struct page_group_response_dsc resp;
> +	struct intel_context *ctx;
> +	struct task_struct *tsk;
> +	struct mm_struct *mm;
> +	struct vm_area_struct *vma;
> +	u64 address;
> +	int ret;
> +
> +	DRM_ERROR("PRQ updated, head 0x%08x, tail 0x%08x\n",
> +		  I915_READ(SVM_PRQ_HEAD), I915_READ(SVM_PRQ_TAIL));
> +	prq_read_descriptor(dev, &desc);
> +	DRM_ERROR("page fault on addr 0x%016llx, PASID %d, srr %d\n",
> +		  (u64)(desc.addr << PAGE_SHIFT), desc.pasid, desc.srr);
> +
> +	spin_lock(&dev_priv->svm.lock);
> +	ctx = dev_priv->svm.pasid_ctx[desc.pasid];
> +	tsk = ctx->tsk;
> +	mm = tsk->mm;
> +	address = desc.addr << PAGE_SHIFT;
> +	ringbuf = ctx->engine[RCS].ringbuf;
> +	spin_unlock(&dev_priv->svm.lock);

All of the above can disappear at anytime after the unlock?

> +
> +	down_read_trylock(&mm->mmap_sem);
> +	vma = find_extend_vma(mm, address);
> +	if (!vma || address < vma->vm_start) {
> +		DRM_ERROR("bad VMA or address out of range\n");
> +		gpu_mm_segv(tsk, address, SEGV_MAPERR);
> +		goto out_unlock; /* need to kill process */
> +	}
> +
> +	ret = handle_mm_fault(mm, vma, address,
> +			      desc.wr_req ? FAULT_FLAG_WRITE : 0);
> +	if (ret & VM_FAULT_ERROR) {
> +		gpu_mm_segv(tsk, address, SEGV_ACCERR); /* ? */
> +		goto out_unlock;
> +	}
> +
> +	if (ret & VM_FAULT_MAJOR)
> +		tsk->maj_flt++;
> +	else
> +		tsk->min_flt++;
> +
> +	if (desc.srr)
> +		resp.dsc_type = PAGE_STREAM_RESP_DSC;
> +	else
> +		resp.dsc_type = PAGE_GRP_RESP_DSC;
> +	resp.pasid = desc.pasid;
> +	resp.pasid_present = 1;
> +	resp.requestor_id = PCI_DEVID(0, PCI_DEVFN(2,0));
> +	resp.resp_code = RESP_CODE_SUCCESS;
> +	resp.prg_index = desc.prg_index;
> +	resp.private = desc.private;
> +	ivq_write_resp_descriptor(dev, &resp);
> +out_unlock:
> +	up_read(&mm->mmap_sem);
> +
> +	/* FIXME: wait for page response to be serviced */
> +
> +	/* FIXME: queue context for re-submit */
> +	/* execlists_context_queue(req); */
> +}

> +/* Make sure GPU writes can't hit the mm that's about to go away */
> +static void intel_mm_release(struct mmu_notifier *mn, struct mm_struct *mm)
> +{
> +	struct intel_mm_struct *ims = container_of(mn, struct intel_mm_struct,
> +						   notifier);
> +	struct drm_i915_private *dev_priv = ims->dev_priv;
> +	struct drm_device *dev = dev_priv->dev;
> +	struct intel_context *ctx;
> +
> +	/*
> +	 * Wait for any outstanding activity and unbind the mm.  Since
> +	 * each context has its own ring, we can simply wait for the ring
> +	 * to idle before invalidating the PASID and flushing the TLB.
> +	 */
> +	mutex_lock(&dev->struct_mutex);
> +	list_for_each_entry(ctx, &ims->context_list, mm_list) {
> +		intel_ring_idle(ctx->engine[RCS].ringbuf->ring);
> +	}
> +
> +	intel_iommu_tlb_flush(dev_priv->dev);
> +	mutex_unlock(&dev->struct_mutex);

Erm, what! So you halt the GPU everytime? But you've already invalidated
the shadow PTE -- ah, invalidate-range looks to be a wip.

> +static void intel_flush_page_locked(struct drm_device *dev, int pasid,
> +				    unsigned long address)
> +{
> +	struct ext_iotlb_inv_dsc dsc = { 0 };
> +
> +	dsc.dsc_type = EXT_IOTLB_INV_DSC;
> +	dsc.g = EXT_IOTLB_INV_G_PASID_PAGE_SELECT;
> +	dsc.pasid = pasid;
> +	dsc.ih = 0;
> +	dsc.addr = address;
> +	dsc.am = 1;
> +	ivq_write_ext_iotlb_inv_descriptor(dev, &dsc);
> +}
> +
> +static void intel_change_pte(struct mmu_notifier *mn, struct mm_struct *mm,
> +			     unsigned long address, pte_t pte)
> +{
> +	struct intel_mm_struct *ims = container_of(mn, struct intel_mm_struct,
> +						   notifier);
> +	struct drm_i915_private *dev_priv = ims->dev_priv;
> +	struct drm_device *dev = dev_priv->dev;
> +
> +	struct intel_context *ctx;
> +
> +	mutex_lock(&dev->struct_mutex);
> +	list_for_each_entry(ctx, &ims->context_list, mm_list)
> +		intel_flush_page_locked(dev, ctx->pasid, address);
> +	mutex_unlock(&dev->struct_mutex);

Suggests you really want a ims->spinlock for context_list instead.

> +}
> +
> +static void intel_invalidate_page(struct mmu_notifier *mn,
> +				  struct mm_struct *mm,
> +				  unsigned long address)
> +{
> +	struct intel_mm_struct *ims = container_of(mn, struct intel_mm_struct,
> +						   notifier);
> +	struct drm_i915_private *dev_priv = ims->dev_priv;
> +	struct drm_device *dev = dev_priv->dev;
> +	struct intel_context *ctx;
> +
> +	mutex_lock(&dev->struct_mutex);
> +	list_for_each_entry(ctx, &ims->context_list, mm_list)
> +		intel_flush_page_locked(dev, ctx->pasid, address);
> +	mutex_unlock(&dev->struct_mutex);
> +}
> +
> +/* Need to unmap this range and make sure it doesn't get re-faulted */
> +static void intel_invalidate_range_start(struct mmu_notifier *mn,
> +					 struct mm_struct *mm,
> +					 unsigned long start, unsigned long end)
> +{
> +	struct intel_mm_struct *ims = container_of(mn, struct intel_mm_struct,
> +						   notifier);
> +	struct drm_i915_private *dev_priv = ims->dev_priv;
> +	struct drm_device *dev = dev_priv->dev;
> +
> +	/* FIXME: invalidate page only */
> +	intel_iommu_tlb_flush(dev);
> +}
> +
> +/* Pages have been freed at this point */
> +static void intel_invalidate_range_end(struct mmu_notifier *mn,
> +				       struct mm_struct *mm,
> +				       unsigned long start, unsigned long end)
> +{
> +	struct intel_mm_struct *ims = container_of(mn, struct intel_mm_struct,
> +						   notifier);
> +	struct drm_i915_private *dev_priv = ims->dev_priv;
> +	struct drm_device *dev = dev_priv->dev;
> +
> +	/* FIXME: invalidate page only */
> +	intel_iommu_tlb_flush(dev);
> +}
> +
> +static const struct mmu_notifier_ops intel_mmuops = {
> +	.release = intel_mm_release,
> +	/* no clear_flush_young, we just share the x86 bits */
> +	/* no test_young, we just share the x86 bits */
> +	.change_pte = intel_change_pte,
> +	.invalidate_page = intel_invalidate_page,
> +	.invalidate_range_start = intel_invalidate_range_start,
> +	.invalidate_range_end = intel_invalidate_range_end,
> +};
> +
> +struct intel_mm_struct *intel_bind_mm(struct drm_device *dev,
> +				      struct intel_context *ctx)
> +{
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct intel_mm_struct *ims;
> +	struct mmu_notifier *mn;
> +	int ret;
> +
> +	WARN_ON(!mutex_is_locked(&dev_priv->dev->struct_mutex));
> +
> +	mn = mmu_find_ops(current->mm, &intel_mmuops);

Magic function, I am missing its definition

> +	if (mn) {
> +		ims = container_of(mn, struct intel_mm_struct, notifier);
> +		kref_get(&ims->kref);
> +		goto out;
> +	}
> +
> +	ims = kzalloc(sizeof(*ims), GFP_KERNEL);
> +	if (!ims) {
> +		ret = -ENOMEM;
> +		goto error;
> +	}
> +	INIT_LIST_HEAD(&ims->context_list);
> +
> +	ims->notifier.ops = &intel_mmuops;
> +
> +	ret = mmu_notifier_register(&ims->notifier, current->mm);

This has lock inversion between struct_mutex and mm->mmap_sem.

> +	if (ret)
> +		goto error;
> +
> +	ims->dev_priv = dev->dev_private;
> +
> +out:
> +	list_add(&ctx->mm_list, &ims->context_list);
> +	return ims;
> +error:
> +	kfree(ims);
> +	return ERR_PTR(ret);
> +}
> +
> +static void intel_mm_free(struct kref *ims_ref)
> +{
> +	struct intel_mm_struct *ims =
> +		container_of(ims_ref, struct intel_mm_struct, kref);
> +
> +	mmu_notifier_unregister(&ims->notifier, current->mm);

More lock inversion.

> +	kfree(ims);
> +}
> +
> +void intel_unbind_mm(struct intel_context *ctx)
> +{
> +	struct drm_i915_private *dev_priv = ctx->ims->dev_priv;
> +
> +	WARN_ON(!mutex_is_locked(&dev_priv->dev->struct_mutex));
> +
> +	list_del(&ctx->mm_list);
> +	kref_put(&ctx->ims->kref, intel_mm_free);
> +
> +	return;
> +}
> +
> +int intel_exec_mm_ioctl(struct drm_device *dev, void *data,
> +			struct drm_file *file)
> +{
> +//	struct drm_i915_exec_mm *exec_mm = data;
> +//	struct drm_i915_private *dev_priv = dev->dev_private;
> +
> +	/* Load new context into context reg */

Ah, there is a modicum of user API here.

> +	return 0;
> +}
> +
> +/*
> + * The PASID table has 32 entries in the current config, rotate through
> + * them as needed.
> + */
> +int intel_alloc_pasid(struct drm_device *dev, struct intel_context *ctx)
> +{
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct pasid_table_entry *table;
> +	int i;
> +
> +	WARN_ON(!mutex_is_locked(&dev_priv->dev->struct_mutex));
> +
> +	spin_lock(&dev_priv->svm.lock);
> +	table = dev_priv->svm.pasid_table;
> +
> +	for (i = 0; i < PASID_COUNT; i++) {
> +		if (!table[i].present)
> +			goto found;
> +	}
> +
> +	spin_unlock(&dev_priv->svm.lock);
> +	return -1;
> +
> +found:
> +	table[i].pml4 = __pa(current->mm->pgd) >> PAGE_SHIFT;
> +	table[i].present = 1;
> +
> +	ctx->pasid = i;
> +	dev_priv->svm.pasid_ctx[ctx->pasid] = NULL;
> +	spin_unlock(&dev_priv->svm.lock);
> +
> +	intel_iommu_tlb_flush(dev);
> +
> +	return 0;
> +}
> +
> +void intel_free_pasid(struct drm_device *dev, struct intel_context *ctx)
> +{
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct pasid_table_entry *table;
> +
> +	WARN_ON(!mutex_is_locked(&dev_priv->dev->struct_mutex));
> +
> +	if (ctx->pasid >= PASID_COUNT)
> +		return;
> +
> +	spin_lock(&dev_priv->svm.lock);
> +	table = dev_priv->svm.pasid_table;
> +	memset(&table[ctx->pasid], 0, sizeof(struct pasid_table_entry));
> +	dev_priv->svm.pasid_ctx[ctx->pasid] = NULL;
> +	ctx->pasid = -1;
> +	spin_unlock(&dev_priv->svm.lock);
> +
> +	intel_iommu_tlb_flush(dev);
> +}
> +
> +/*
> + * Each root table entry is 16 bytes wide.  In legacy mode, only
> + * the lower 64 bits are used:
> + *   Bits 38:12: context table pointer
> + *   Bit 0: present
> + *   all other bits reserved
> + * In extended mode (what we use for SVM):
> + *   Bits 102:76: upper context table pointer
> + *   Bit 64: upper present
> + *   Bits 38:12: lower context table pointer
> + *   Bit 0: lower present
> + *   all other bits reserved
> + *
> + * The context entries are 128 bit in legacy mode:
> + *   Bits 87:72: Domain ID
> + *   Bits 70:67: Available
> + *   Bits 66:64: Address width
> + *   Bits 38:12: Page table pointer
> + *   Bits 3:2: Translation type
> + *     00 - only untranslated DMA requests go through this table
> + *          translated and translation requests are blocked
> + *     01 - untranslated, translated, and translation requests supported
> + *     10 - untranslated requests are treated as pass through (HPA == GPA),
> + *          translated DMA requests and translation requests are blocked
> + *     11 - reserved
> + *   Bit 1: fault disable
> + *   Bit 0: Present
> + * and 256 bit in extended:
> + *   Bits 230:204: PASID state table pointer
> + *   Bits 166:140: PASID table pointer
> + *   Bits 131:128: PASID table size
> + *   Bits 127:96: Page table attribute (PAT)
> + *   Bit 92: SL64KPE
> + *   Bit 91: SLEE
> + *   Bit 90: ERE
> + *   Bit 89: SRE
> + *   Bit 88: SMEP
> + *   Bits 87:72: Domain ID
> + *   Bit 71: Extended memory type enable
> + *   Bit 70: cache disable (CD)
> + *   Bit 69: write protect (WP)
> + *   Bit 68: no execute enable (NXE)
> + *   Bit 67: page global enable (PGE)
> + *   Bits 66:64: address width
> + *   Bits 38:12: 2nd level (VT-d) page table pointer
> + *   Bit 11: PASID enable
> + *   Bit 10: Nesting enable
> + *   Bit 9: Page Request enable
> + *   Bit 8: Lazy-Invalidate enable
> + *   Bits 7:5: Extended Memory Type (VT-d)
> + *   Bits 4:2: Translation type
> + *     000 - Only Untranslated DMA requests are translated through this page
> + *           table. Translated DMA requests and Translation Requests are
> + *           blocked.  Untranslated requests-without-PASID are remapped using
> + *           the second-level page-table referenced through SLPTPTR field.
> + *           If PASIDE field is Set, Untranslated requests-with-PASID are
> + *           remapped using the PASID Table referenced through PASIDPTPTR
> + *           field. If PASIDE field is Clear, Untranslated requests-with-PASID
> + *           are blocked.  Translation requests (with or without PASID), and
> + *           Translated Requests are blocked.
> + *     001 - Un-translated and Translation requests without PASID supported
> + *           (and with PASID supported, if PASID Enable Set); Translate
> + *           requests bypass address translation.  Untranslated
> + *           requests-without-PASID and Translation requests-without-PASID are
> + *           remapped using the second level page-table referenced through
> + *           SLPTPTR field. If PASIDE field is Set, Untranslated
> + *           requests-with-PASID and Translation requests-with-PASID are
> + *           remapped using the PASID Table referenced through PASIDPTPTR
> + *           field. If PASIDE field is Clear, Untranslated requests-with-PASID,
> + *           and Translation requests-with-PASID, are blocked. Translated
> + *           requests bypass address translation.
> + *     010 - If Pass-through Supported (GT supports pass-through),
> + *           Un-translated requests without PASID bypass address translation;
> + *           All other requests (with or without PASID) blocked. Untranslated
> + *           requests-without-PASID bypass address translation and are
> + *           processed as passthrough. SLPTPTR field is ignored by hardware.
> + *           Untranslated requests-with-PASID, Translation requests (with or
> + *           without PASID), and Translated requests are blocked.
> + *     011 - Reserved.
> + *     100 - Un-translated requests without PASID bypass address translation;
> + *           Un-translated requests with PASID supported, if PASID Enable Set;
> + *           All other requests blocked. Untranslated requests-without-PASID
> + *           bypass address translation and are processed as passthrough.
> + *           SLPTPTR field is ignored by hardware. Untranslated
> + *           requests-with-PASID are remapped using the PASID Table referenced
> + *           through PASIDPTPTR field. Translation requests (with or without
> + *           PASID) and Translated requests are blocked.
> + *     101 - Un-translated and Translation requests without PASID bypass
> + *           address translation; Un-translated and Translation requests with
> + *           PASID supported, if PASID Enable Set; Translated requests bypass
> + *           address translation.  Untranslated requests-without-PASID bypass
> + *           address translation and are processed as passthrough. SLPTPTR
> + *           field is ignored by hardware.  Translation requests-without-PASID
> + *           are responded with Untranslated access only bit Set (U=1) along
> + *           with read and write permissions (R=W=1). SLPTPTR field is ignored
> + *           by hardware. Untranslated requests-with-PASID, and Translation
> + *           requests-with-PASID are remapped using the PASID Table referenced
> + *           through PASIDPTPTR field.  Translated requests bypass address
> + *           translation.
> + *     110 - Un-translated requests without PASID are blocked; Un-translated
> + *           requests with PASID supported, if PASID Enable Set; All other
> + *           requests blocked – Not applicable to GFX, GT should treat this as
> + *           reserved.
> + *     111 - Un-translated and Translation requests without PASID blocked;
> + *           Un-translated and Translation requests with PASID supported, if
> + *           PASID Enable Set; Translated requests bypass address translation.
> + *           Note: Not applicable to GFX, GT should treat this as reserved.
> + *   Bit 1: Fault disable
> + *   Bit 0: Present
> + *
> + * Page walks for graphics addresses can go through one or two levels of
> + * translation, depending on whether VT-d is enabled.
> + *
> + * If we're in driver mode (currently the only supported mode), we always
> + * use a single level of translation, meaning the second level page table
> + * pointer (if present) is ignored.
> + *
> + * The full walk starts at the root table, which indexes into the upper
> + * and lower context tables.  Those tables point to PASID mapping and state
> + * tables and potentially a second level page table for VT-d (which, as noted
> + * above, is unused currently).  The PASID mapping table points to a PML4
> + * (x86 compatible) page table, while the state table indicates other
> + * information about the PASID involved in the request, which ultimately comes
> + * from the execlist port submission of the context descriptor.
> + *
> + * To enable a shared CPU/GPU address space, we can use a couple of different
> + * translation types, either 101 or 01 w/o nesting.  The main requirement
> + * is that requests with PASID are translated through the page tables provided,
> + * potentially with nesting if we're running in a VT-d context (which we
> + * don't currently support).
> + */
> +#define CONTEXT_OFFSET (PAGE_SIZE * 1)
> +#define PASID_OFFSET (PAGE_SIZE * 2)
> +#define PASID_STATE_OFFSET (PAGE_SIZE * 3)
> +#define PRQ_OFFSET (PAGE_SIZE * 4)
> +#define IVQ_OFFSET (PAGE_SIZE * 5)
> +static void intel_init_svm_root_table(struct drm_device *dev,
> +				      drm_dma_handle_t *tables)
> +{
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	struct extended_root_table_entry *root_table;
> +	struct extended_context_table_entry *context;
> +	struct pasid_table_entry *pasid_table;
> +	struct pasid_state_table_entry *pasid_state_table;
> +	u64 *tmp;
> +
> +	root_table = tables->vaddr;
> +	context = tables->vaddr + CONTEXT_OFFSET;
> +        pasid_table = tables->vaddr + PASID_OFFSET;
> +	pasid_state_table = tables->vaddr + PASID_STATE_OFFSET;
> +
> +	DRM_ERROR("programmed PASID table, vaddr %p, busaddr 0x%16llx\n",
> +		  pasid_table, tables->busaddr + PASID_OFFSET);
> +
> +	/* Context entry for gfx device */
> +	context[16].pat = 0x66666666;
> +	context[16].ere = 1;
> +	context[16].sre = 1;
> +	context[16].smep = 1;
> +	context[16].domain_id = 1;
> +	context[16].addr_width = AGAW_48; /* full x86 walk */
> +	context[16].pasid_en = 1;
> +	context[16].nesting_en = 0; /* not yet */
> +	context[16].pg_req_en = 1;
> +	context[16].lazy_invalidate_en = 1;
> +	context[16].ext_mem_type = EXTENDED_MTYPE_WB;
> +	context[16].translation_type = EXTENDED_TTYPE_UT_TR_PASID_PT;
> +	context[16].fault_disable = 0;
> +	context[16].present = 1;
> +	context[16].pasid_state_table_addr = (tables->busaddr + PASID_STATE_OFFSET) >> PAGE_SHIFT;
> +	context[16].pasid_table_addr = (tables->busaddr + PASID_OFFSET) >>
> +		PAGE_SHIFT;
> +	context[16].pasid_table_size = 0; /* 2^(5+x) */
> +
> +	tmp = (u64 *)&context[16];
> +	DRM_ERROR("root entry: 0x%016llx%016llx\n", tmp[1], tmp[0]);
> +
> +	DRM_ERROR("programmed context table, vaddr %p, busaddr 0x%16llx\n",
> +		  context, tables->busaddr + CONTEXT_OFFSET);
> +
> +	/* Root table */
> +	root_table[0].lo_ctx_addr = (tables->busaddr + CONTEXT_OFFSET) >>
> +		PAGE_SHIFT;
> +	root_table[0].lo_present = 1;
> +	root_table[0].hi_present = 0;
> +
> +	tmp = (u64 *)&root_table[0];
> +	DRM_ERROR("root entry: 0x%016llx%016llx\n", tmp[1], tmp[0]);
> +
> +	dev_priv->svm.root_table = root_table;
> +	dev_priv->svm.context = context;
> +        dev_priv->svm.pasid_table = pasid_table;
> +	dev_priv->svm.pasid_state_table = pasid_state_table;
> +	dev_priv->svm.prq_ring = tables->vaddr + PRQ_OFFSET;
> +	dev_priv->svm.ivq_ring = tables->vaddr + IVQ_OFFSET;
> +
> +	/* Enable the page request queue */
> +	I915_WRITE64(SVM_PRQA, tables->busaddr + PRQ_OFFSET);
> +	I915_WRITE(SVM_PRQ_HEAD, 0);
> +	I915_WRITE(SVM_PRQ_TAIL, 0);
> +	I915_WRITE(SVM_PRECTL, 0);
> +
> +	/* Set up the invalidation request queue */
> +	I915_WRITE64(SVM_IQA, tables->busaddr + IVQ_OFFSET);
> +	I915_WRITE(SVM_IVQ_HEAD, 0);
> +	I915_WRITE(SVM_IVQ_TAIL, 0);
> +	I915_WRITE(SVM_IECTL, 0);
> +
> +	I915_WRITE(SVM_GCMD, GCMD_QIE);
> +	if (wait_for(I915_READ(SVM_GSTS) & GSTS_QIES, 500))
> +		DRM_ERROR("timed out waiting for queued invalidation enable\n");
> +
> +	/* All set, program the root */
> +	I915_WRITE(SVM_RTADDR, tables->busaddr | SVM_RTT_TYPE_EXT);
> +
> +	I915_WRITE(SVM_GCMD, GCMD_SRTP);
> +	if (wait_for(I915_READ(SVM_GSTS) & GSTS_RTPS, 500))
> +		DRM_ERROR("timed out waiting for root table to load\n");
> +
> +	DRM_ERROR("programmed SVM root, vaddr %p, busaddr 0x%16llx\n",
> +		  tables->vaddr, tables->busaddr);
> +
> +	intel_iommu_tlb_flush(dev);
> +}
> +
> +/*
> + * Probe for SVM capability.  If found:
> + *  - try to switch to driver mode
> + *  - set up root PASID table
> + *  - enable page fault and error handling interrupts
> + *  - allow SVM ioctls
> + */
> +void intel_init_svm(struct drm_device *dev)
> +{
> +	struct drm_i915_private *dev_priv = dev->dev_private;
> +	drm_dma_handle_t *tables;
> +	u32 dev_mode;
> +	int num_tables = 6;
> +
> +	dev_mode = I915_READ(BDW_SVM_DEV_MODE_CNFG);
> +	I915_WRITE(BDW_SVM_DEV_MODE_CNFG, dev_mode | BDW_SVM_MODE_DRIVER);
> +	dev_mode = I915_READ(BDW_SVM_DEV_MODE_CNFG);
> +#if defined(CONFIG_INTEL_IOMMU) || defined(IOMMU_SUPPORT)
> +#error must disable IOMMU support
> +#endif
> +	if (!dev_mode & BDW_SVM_MODE_DRIVER) {
> +		DRM_ERROR("driver mode not available, disabling SVM\n");
> +		goto err;
> +	}
> +
> +	tables = drm_pci_alloc(dev, PAGE_SIZE*num_tables, PAGE_SIZE);
> +	if (!tables) {
> +		DRM_ERROR("table alloc failed, disabling SVM\n");
> +		goto err;
> +	}
> +
> +	memset(tables->vaddr, 0, PAGE_SIZE*num_tables);
> +
> +	intel_init_svm_root_table(dev, tables);
> +
> +	spin_lock_init(&dev_priv->svm.lock);
> +
> +#if 0
> +	I915_WRITE(SVM_GCMD, GCMD_TE);
> +	if (wait_for(I915_READ(SVM_GSTS) & GSTS_TES, 500))
> +		DRM_ERROR("timed out waiting for translation enable\n");
> +#endif
> +	INIT_WORK(&dev_priv->svm.work, intel_gpu_fault_work);
> +
> +	DRM_ERROR("SVM driver mode enabled\n");
> +	dev_priv->svm.svm_available = true;
> +	return;
> +
> +err:
> +	dev_priv->svm.svm_available = false;
> +	return;
> +}
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 40cbba4..1450491 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -217,6 +217,7 @@ enum {
>  	FAULT_AND_STREAM,
>  	FAULT_AND_CONTINUE /* Unsupported */
>  };
> +#define GEN8_CTX_FAULT_SHIFT 6
>  #define GEN8_CTX_ID_SHIFT 32
>  #define CTX_RCS_INDIRECT_CTX_OFFSET_DEFAULT  0x17
>  
> @@ -289,12 +290,21 @@ uint64_t intel_lr_context_descriptor(struct intel_context *ctx,
>  	WARN_ON(lrca & 0xFFFFFFFF00000FFFULL);
>  
>  	desc = GEN8_CTX_VALID;
> -	desc |= GEN8_CTX_ADDRESSING_MODE(dev) << GEN8_CTX_ADDRESSING_MODE_SHIFT;
> -	if (IS_GEN8(ctx_obj->base.dev))
> -		desc |= GEN8_CTX_L3LLC_COHERENT;
> -	desc |= GEN8_CTX_PRIVILEGE;
> -	desc |= lrca;
> -	desc |= (u64)intel_execlists_ctx_id(ctx_obj) << GEN8_CTX_ID_SHIFT;
> +	if (ctx->is_svm) {
> +		desc |= ADVANCED_CONTEXT << GEN8_CTX_ADDRESSING_MODE_SHIFT;
> +		desc |= FAULT_AND_STREAM << GEN8_CTX_FAULT_SHIFT;
> +		desc |= lrca;
> +		desc |= (u64)intel_execlists_ctx_id(ctx_obj) << GEN8_CTX_ID_SHIFT;
> +	} else {
> +		desc |= GEN8_CTX_ADDRESSING_MODE(dev) <<
> +			GEN8_CTX_ADDRESSING_MODE_SHIFT;
> +		if (IS_GEN8(ctx_obj->base.dev))
> +			desc |= GEN8_CTX_L3LLC_COHERENT;
> +		desc |= GEN8_CTX_PRIVILEGE;
> +		desc |= lrca;
> +		desc |= (u64)intel_execlists_ctx_id(ctx_obj) <<
> +			GEN8_CTX_ID_SHIFT;
> +	}
>  
>  	/* TODO: WaDisableLiteRestore when we start using semaphore
>  	 * signalling between Command Streamers */
> @@ -545,7 +555,7 @@ void intel_lrc_irq_handler(struct intel_engine_cs *ring)
>  		   _MASKED_FIELD(0x07 << 8, ((u32)ring->next_context_status_buffer & 0x07) << 8));
>  }
>  
> -static int execlists_context_queue(struct drm_i915_gem_request *request)
> +int execlists_context_queue(struct drm_i915_gem_request *request)
>  {
>  	struct intel_engine_cs *ring = request->ring;
>  	struct drm_i915_gem_request *cursor;
> @@ -2273,31 +2283,40 @@ populate_lr_context(struct intel_context *ctx, struct drm_i915_gem_object *ctx_o
>  	reg_state[CTX_LRI_HEADER_1] |= MI_LRI_FORCE_POSTED;
>  	reg_state[CTX_CTX_TIMESTAMP] = ring->mmio_base + 0x3a8;
>  	reg_state[CTX_CTX_TIMESTAMP+1] = 0;
> -	reg_state[CTX_PDP3_UDW] = GEN8_RING_PDP_UDW(ring, 3);
> -	reg_state[CTX_PDP3_LDW] = GEN8_RING_PDP_LDW(ring, 3);
> -	reg_state[CTX_PDP2_UDW] = GEN8_RING_PDP_UDW(ring, 2);
> -	reg_state[CTX_PDP2_LDW] = GEN8_RING_PDP_LDW(ring, 2);
> -	reg_state[CTX_PDP1_UDW] = GEN8_RING_PDP_UDW(ring, 1);
> -	reg_state[CTX_PDP1_LDW] = GEN8_RING_PDP_LDW(ring, 1);
> -	reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
> -	reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
> -
> -	if (USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
> -		/* 64b PPGTT (48bit canonical)
> -		 * PDP0_DESCRIPTOR contains the base address to PML4 and
> -		 * other PDP Descriptors are ignored.
> -		 */
> -		ASSIGN_CTX_PML4(ppgtt, reg_state);
> +
> +	if (ctx->is_svm) {
> +		reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
> +		reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
> +		reg_state[CTX_PDP0_UDW+1] = 0;
> +		reg_state[CTX_PDP0_LDW+1] = ctx->pasid;
>  	} else {
> -		/* 32b PPGTT
> -		 * PDP*_DESCRIPTOR contains the base address of space supported.
> -		 * With dynamic page allocation, PDPs may not be allocated at
> -		 * this point. Point the unallocated PDPs to the scratch page
> -		 */
> -		ASSIGN_CTX_PDP(ppgtt, reg_state, 3);
> -		ASSIGN_CTX_PDP(ppgtt, reg_state, 2);
> -		ASSIGN_CTX_PDP(ppgtt, reg_state, 1);
> -		ASSIGN_CTX_PDP(ppgtt, reg_state, 0);
> +		reg_state[CTX_PDP3_UDW] = GEN8_RING_PDP_UDW(ring, 3);
> +		reg_state[CTX_PDP3_LDW] = GEN8_RING_PDP_LDW(ring, 3);
> +		reg_state[CTX_PDP2_UDW] = GEN8_RING_PDP_UDW(ring, 2);
> +		reg_state[CTX_PDP2_LDW] = GEN8_RING_PDP_LDW(ring, 2);
> +		reg_state[CTX_PDP1_UDW] = GEN8_RING_PDP_UDW(ring, 1);
> +		reg_state[CTX_PDP1_LDW] = GEN8_RING_PDP_LDW(ring, 1);
> +		reg_state[CTX_PDP0_UDW] = GEN8_RING_PDP_UDW(ring, 0);
> +		reg_state[CTX_PDP0_LDW] = GEN8_RING_PDP_LDW(ring, 0);
> +
> +		if (USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
> +			/* 64b PPGTT (48bit canonical)
> +			 * PDP0_DESCRIPTOR contains the base address to PML4 and
> +			 * other PDP Descriptors are ignored.
> +			 */
> +			ASSIGN_CTX_PML4(ppgtt, reg_state);
> +		} else {
> +			/* 32b PPGTT
> +			 * PDP*_DESCRIPTOR contains the base address of space
> +			 * supported. With dynamic page allocation, PDPs may
> +			 * not be allocated at this point. Point the
> +			 * unallocated PDPs to the scratch page
> +			 */
> +			ASSIGN_CTX_PDP(ppgtt, reg_state, 3);
> +			ASSIGN_CTX_PDP(ppgtt, reg_state, 2);
> +			ASSIGN_CTX_PDP(ppgtt, reg_state, 1);
> +			ASSIGN_CTX_PDP(ppgtt, reg_state, 0);
> +		}
>  	}
>  
>  	if (ring->id == RCS) {
> @@ -2327,6 +2346,12 @@ void intel_lr_context_free(struct intel_context *ctx)
>  {
>  	int i;
>  
> +        if (ctx->is_svm) {
> +                intel_free_pasid(ctx->ims->dev_priv->dev, ctx);
> +                intel_unbind_mm(ctx);
> +		put_task_struct(ctx->tsk);
> +       }
> +
>  	for (i = 0; i < I915_NUM_RINGS; i++) {
>  		struct drm_i915_gem_object *ctx_obj = ctx->engine[i].state;
>  
> @@ -2480,6 +2505,37 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
>  
>  	}
>  
> +	if (ctx->is_svm) {
> +		/* FIXME: just skip here, don't bail and trash the ctx */
> +		if (ring->id != RCS) {
> +			DRM_DEBUG_DRIVER("svm context only allowed on RCS\n");

That's fairly useless then :)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 6/9] drm/i915: driver based PASID handling
  2015-10-08 11:29         ` Tomas Elf
@ 2015-10-08 22:46           ` David Woodhouse
  2015-10-09  7:28             ` Daniel Vetter
  0 siblings, 1 reply; 36+ messages in thread
From: David Woodhouse @ 2015-10-08 22:46 UTC (permalink / raw)
  To: Tomas Elf, Daniel Vetter, Jesse Barnes; +Cc: intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 1034 bytes --]

On Thu, 2015-10-08 at 12:29 +0100, Tomas Elf wrote:
> 
> Could someone clarify what this means from the TDR point of view, 
> please? When you say "context blew up" I'm guessing that you mean that 
> come context caused the fault handler to get involved somehow?
> 
> Does this imply that the offending context will hang and the driver will 
> have to detect this hang? If so, then yes - if we have the per-engine 
> hang recovery mode as part of the upcoming TDR work in place then we 
> could handle it by stepping over the offending batch buffer and moving 
> on with a minimum of side-effects on the rest of the driver/GPU.

I don't think the context does hang.

I've made the page-request code artificially fail and report that it
was an invalid page fault. The gem_svm_fault test seems to complete
(albeit complaining that the test failed). Whereas if I just don't
service the page-request at all, *then* the GPU hang is detected.

I haven't actually looked at precisely what *is* happening.

-- 
dwmw2


[-- Attachment #1.2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5691 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 6/9] drm/i915: driver based PASID handling
  2015-10-08 15:57   ` Chris Wilson
@ 2015-10-09  7:24     ` David Woodhouse
  0 siblings, 0 replies; 36+ messages in thread
From: David Woodhouse @ 2015-10-09  7:24 UTC (permalink / raw)
  To: Chris Wilson, Jesse Barnes; +Cc: intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 7422 bytes --]

On Thu, 2015-10-08 at 16:57 +0100, Chris Wilson wrote:
> Correct me if I am wrong, but it looks like i915_svm implements the
> lowlevel interface with the hardware, so by convention is intel_svm.c

I think the plan is to drop the driver-mode interface altogether and
use the OS-mode version that I posted a few hours ago. So some of your
review may not apply; other parts might.

> > +	struct task_struct *tsk;
> 
> One task? A context can be passed by the device fd to another process.
> Do we inherit the VM along with the context? I don't anything to prevent
> such.

You still get to deal with this, and yes — the assigned PASID value
(given by the intel_svm_bind_mm() call) will continue to refer to the
VM of the process which called intel_svm_bind_mm(). Even when it's used
from elsewhere.

Note that you *could* bind the PASID at the time you actually submit
the request, and in that case you could bind it to the VM of the
process which is submitting the request, rather than the process which
did the initial setup.

> > +static void gpu_mm_segv(struct task_struct *tsk, unsigned long
> > address,
> > +			int si_code)
> > +{
> > +	siginfo_t info;
> > +
> > +	/* Need specific signal info here */
> > +	info.si_signo	= SIGSEGV;
> > +	info.si_errno	= EIO;
> > +	info.si_code	= si_code;
> > +	info.si_addr	= (void __user *)address;
> > +
> > +	force_sig_info(SIGSEGV, &info, tsk);
> 
> force_sig_info() is not exported, ah you builtin i915-svm.c

I'm leaving this with you too, as discussed in a separate thread.
The OS-mode IOMMU support is just returning an appropriate 'failed'
response code to the page request, and letting the endpoint device
handle that as it sees fit.


> > +	spin_lock(&dev_priv->svm.lock);
> > +	ctx = dev_priv->svm.pasid_ctx[desc.pasid];
> > +	tsk = ctx->tsk;
> > +	mm = tsk->mm;
> > +	address = desc.addr << PAGE_SHIFT;
> > +	ringbuf = ctx->engine[RCS].ringbuf;
> > +	spin_unlock(&dev_priv->svm.lock);
> 
> All of the above can disappear at anytime after the unlock?

Answering for my own code... I don't touch any of the gfx context
stuff, obviously, and I don't even use the task. I do have a reference
on the mm and it isn't going away.

> > +
> > +	down_read_trylock(&mm->mmap_sem);
> > +	vma = find_extend_vma(mm, address);
> > +	if (!vma || address < vma->vm_start) {
> > +		DRM_ERROR("bad VMA or address out of range\n");

Um... Jesse, I used that same 'address < vma->vm_start' check in my own
version. But is that going to prevent us from allowing the GPU to grow
the stack VMA downwards? I note the AMD one does it too.

> > +/* Make sure GPU writes can't hit the mm that's about to go away
> > */
> > +static void intel_mm_release(struct mmu_notifier *mn, struct
> > mm_struct *mm)
> > +{
> > +	struct intel_mm_struct *ims = container_of(mn, struct
> > intel_mm_struct,
> > +						   notifier);
> > +	struct drm_i915_private *dev_priv = ims->dev_priv;
> > +	struct drm_device *dev = dev_priv->dev;
> > +	struct intel_context *ctx;
> > +
> > +> > 	> > /*
> > +> > 	> >  * Wait for any outstanding activity and unbind the mm.  Since
> > +> > 	> >  * each context has its own ring, we can simply wait for the ring
> > +> > 	> >  * to idle before invalidating the PASID and flushing the TLB.
> > +> > 	> >  */
> > +	mutex_lock(&dev->struct_mutex);
> > +	list_for_each_entry(ctx, &ims->context_list, mm_list) {
> > +		intel_ring_idle(ctx->engine[RCS].ringbuf->ring);
> > +	}
> > +
> > +	intel_iommu_tlb_flush(dev_priv->dev);
> > +	mutex_unlock(&dev->struct_mutex);
> 
> Erm, what! So you halt the GPU everytime? But you've already 
> invalidated the shadow PTE -- ah, invalidate-range looks to be a wip.

That's the callback for when the MM goes away — on process exit, or
when we unbind the PASID. But it's still suboptimal, and I am strongly
resisting the temptation to have a callback from the OS-mode code into
the device driver in this code path.

For the normal case of unbinding the PASID cleanly, device drivers are
expected to flush their queue of requests for a given PASID *before*
calling intel_svm_unbind_pasid(). That includes contexts which might be
stalled, waiting for a page request. So you're expected to do any ring
management in advance.

If a process exits uncleanly, exit_mm() happens before exit_files(). So
the MM has gone before the i915 driver gets to clean up. In that case,
we end up in the intel_mm_release() function and it just clears the
page tables and flushes the IOTLB. Accesses will take faults, and the
proper cleanup will happen shortly.

> > +static void intel_change_pte(struct mmu_notifier *mn, struct mm_struct *mm,
> > +> > 	> > 	> > 	> >      unsigned long address, pte_t pte)
> > +{
> > +> > 	> > struct intel_mm_struct *ims = container_of(mn, struct intel_mm_struct,
> > +> > 	> > 	> > 	> > 	> > 	> > 	> >    notifier);
> > +> > 	> > struct drm_i915_private *dev_priv = ims->dev_priv;
> > +> > 	> > struct drm_device *dev = dev_priv->dev;
> > +
> > +> > 	> > struct intel_context *ctx;
> > +
> > +> > 	> > mutex_lock(&dev->struct_mutex);
> > +> > 	> > list_for_each_entry(ctx, &ims->context_list, mm_list)
> > +> > 	> > 	> > intel_flush_page_locked(dev, ctx->pasid, address);
> > +> > 	> > mutex_unlock(&dev->struct_mutex);
> 
> Suggests you really want a ims->spinlock for context_list instead.

We now use a single PASID for all contexts, so we don't need any list
traversal like this anyway.

> > +struct intel_mm_struct *intel_bind_mm(struct drm_device *dev,
> > +				      struct intel_context *ctx)
> > +{
> > +	struct drm_i915_private *dev_priv = dev->dev_private;
> > +	struct intel_mm_struct *ims;
> > +	struct mmu_notifier *mn;
> > +	int ret;
> > +
> > +	WARN_ON(!mutex_is_locked(&dev_priv->dev->struct_mutex));
> > +
> > +	mn = mmu_find_ops(current->mm, &intel_mmuops);
> 
> Magic function, I am missing its definition

A previous patch in the series exports this. It basically just checks
if there is *already* a mmu_notifier registered with the same
mmu_notifier_ops structure, and returns it if so.

Because of the hairy locking rules around the mmu_notifier, I've done
this differently in my own code — using idr_for_each_entry() on the IDR
I use to allocate PASIDs. That's fine while there is a relatively low
number of PASIDs allocated (i.e. a low number of *processes* using
PASIDs, since there's one-PASID-per-process). But if we end up with
lots, that idr_for_each_entry() might become burdensome and we might
need to take another look.
 
> > +	if (mn) {
> > +		ims = container_of(mn, struct intel_mm_struct,
> > notifier);
> > +		kref_get(&ims->kref);
> > +		goto out;
> > +	}
> > +
> > +	ims = kzalloc(sizeof(*ims), GFP_KERNEL);
> > +	if (!ims) {
> > +		ret = -ENOMEM;
> > +		goto error;
> > +	}
> > +	INIT_LIST_HEAD(&ims->context_list);
> > +
> > +	ims->notifier.ops = &intel_mmuops;
> > +
> > +	ret = mmu_notifier_register(&ims->notifier, current->mm);
> 
> This has lock inversion between struct_mutex and mm->mmap_sem.

I think this is gone, because I no longer take my equivalent
pasid_mutex within the mmu_notifier callbacks. Unless I'm missing
something.


-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation


[-- Attachment #1.2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5691 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 6/9] drm/i915: driver based PASID handling
  2015-10-08 22:46           ` David Woodhouse
@ 2015-10-09  7:28             ` Daniel Vetter
  2015-10-09  7:52               ` Daniel Vetter
  2015-10-09  7:56               ` David Woodhouse
  0 siblings, 2 replies; 36+ messages in thread
From: Daniel Vetter @ 2015-10-09  7:28 UTC (permalink / raw)
  To: David Woodhouse; +Cc: intel-gfx

On Thu, Oct 08, 2015 at 11:46:08PM +0100, David Woodhouse wrote:
> On Thu, 2015-10-08 at 12:29 +0100, Tomas Elf wrote:
> > 
> > Could someone clarify what this means from the TDR point of view, 
> > please? When you say "context blew up" I'm guessing that you mean that 
> > come context caused the fault handler to get involved somehow?
> > 
> > Does this imply that the offending context will hang and the driver will 
> > have to detect this hang? If so, then yes - if we have the per-engine 
> > hang recovery mode as part of the upcoming TDR work in place then we 
> > could handle it by stepping over the offending batch buffer and moving 
> > on with a minimum of side-effects on the rest of the driver/GPU.
> 
> I don't think the context does hang.
> 
> I've made the page-request code artificially fail and report that it
> was an invalid page fault. The gem_svm_fault test seems to complete
> (albeit complaining that the test failed). Whereas if I just don't
> service the page-request at all, *then* the GPU hang is detected.
> 
> I haven't actually looked at precisely what *is* happening.

Hm if this still works the same way as on older platforms then pagefaults
just read all 0 and writes go nowhere from the gpu. That generally also
explains ever-increasing numbers of the CS execution pointer since it's
busy churning through 48b worth of address space filled with MI_NOP. I'd
have hoped our hw would do better than that with svm ...

If there's really no way to make it hang when we complete the fault then I
guess we'll have to hang it by not completing. Otherwise we'll have to
roll our own fault detection code right from the start.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 6/9] drm/i915: driver based PASID handling
  2015-10-09  7:28             ` Daniel Vetter
@ 2015-10-09  7:52               ` Daniel Vetter
  2015-10-09  7:56               ` David Woodhouse
  1 sibling, 0 replies; 36+ messages in thread
From: Daniel Vetter @ 2015-10-09  7:52 UTC (permalink / raw)
  To: David Woodhouse; +Cc: intel-gfx

On Fri, Oct 09, 2015 at 09:28:37AM +0200, Daniel Vetter wrote:
> On Thu, Oct 08, 2015 at 11:46:08PM +0100, David Woodhouse wrote:
> > On Thu, 2015-10-08 at 12:29 +0100, Tomas Elf wrote:
> > > 
> > > Could someone clarify what this means from the TDR point of view, 
> > > please? When you say "context blew up" I'm guessing that you mean that 
> > > come context caused the fault handler to get involved somehow?
> > > 
> > > Does this imply that the offending context will hang and the driver will 
> > > have to detect this hang? If so, then yes - if we have the per-engine 
> > > hang recovery mode as part of the upcoming TDR work in place then we 
> > > could handle it by stepping over the offending batch buffer and moving 
> > > on with a minimum of side-effects on the rest of the driver/GPU.
> > 
> > I don't think the context does hang.
> > 
> > I've made the page-request code artificially fail and report that it
> > was an invalid page fault. The gem_svm_fault test seems to complete
> > (albeit complaining that the test failed). Whereas if I just don't
> > service the page-request at all, *then* the GPU hang is detected.
> > 
> > I haven't actually looked at precisely what *is* happening.
> 
> Hm if this still works the same way as on older platforms then pagefaults
> just read all 0 and writes go nowhere from the gpu. That generally also
> explains ever-increasing numbers of the CS execution pointer since it's
> busy churning through 48b worth of address space filled with MI_NOP. I'd
> have hoped our hw would do better than that with svm ...
> 
> If there's really no way to make it hang when we complete the fault then I
> guess we'll have to hang it by not completing. Otherwise we'll have to
> roll our own fault detection code right from the start.

s/detection/handling/ I meant ofc.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 6/9] drm/i915: driver based PASID handling
  2015-10-09  7:28             ` Daniel Vetter
  2015-10-09  7:52               ` Daniel Vetter
@ 2015-10-09  7:56               ` David Woodhouse
  2015-10-09  8:47                 ` Daniel Vetter
  1 sibling, 1 reply; 36+ messages in thread
From: David Woodhouse @ 2015-10-09  7:56 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 1561 bytes --]

On Fri, 2015-10-09 at 09:28 +0200, Daniel Vetter wrote:
> 
> Hm if this still works the same way as on older platforms then pagefaults
> just read all 0 and writes go nowhere from the gpu. That generally also
> explains ever-increasing numbers of the CS execution pointer since it's
> busy churning through 48b worth of address space filled with MI_NOP. I'd
> have hoped our hw would do better than that with svm ...

I'm looking at simple cases like Jesse's 'gem_svm_fault' test. If the
access to process address space (a single dword write) does nothing,
I'm not sure why it would then churn through MI_NOOPs; why would the
batch still not complete?

> If there's really no way to make it hang when we complete the fault then I
> guess we'll have to hang it by not completing. Otherwise we'll have to
> roll our own fault detection code right from the start.

Well, theoretically there are ways we could handle this. It looks like
if I *don't* give the IOMMU a response to the page request, the context
remains hung and waiting for it. So I could give you a callback,
including the 'private' data from the page request that we know
identifies the context. So perhaps we *could* contrive to give you
precise exceptions when the hardware doesn't really do that sanely.

But I was really trying hard to avoid the necessity for that kind of
hack to work around stupid hardware :)

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation


[-- Attachment #1.2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5691 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 6/9] drm/i915: driver based PASID handling
  2015-10-09  7:56               ` David Woodhouse
@ 2015-10-09  8:47                 ` Daniel Vetter
  2015-10-09  9:07                   ` David Woodhouse
  0 siblings, 1 reply; 36+ messages in thread
From: Daniel Vetter @ 2015-10-09  8:47 UTC (permalink / raw)
  To: David Woodhouse; +Cc: intel-gfx

On Fri, Oct 09, 2015 at 08:56:24AM +0100, David Woodhouse wrote:
> On Fri, 2015-10-09 at 09:28 +0200, Daniel Vetter wrote:
> > 
> > Hm if this still works the same way as on older platforms then pagefaults
> > just read all 0 and writes go nowhere from the gpu. That generally also
> > explains ever-increasing numbers of the CS execution pointer since it's
> > busy churning through 48b worth of address space filled with MI_NOP. I'd
> > have hoped our hw would do better than that with svm ...
> 
> I'm looking at simple cases like Jesse's 'gem_svm_fault' test. If the
> access to process address space (a single dword write) does nothing,
> I'm not sure why it would then churn through MI_NOOPs; why would the
> batch still not complete?

Yeah that testcase doesn't fit, the one I had in mind is where the batch
itself faults and the CS just reads MI_NOP forever. No idea why the gpu
just keeps walking through the address space here. Puzzling.

> > If there's really no way to make it hang when we complete the fault then I
> > guess we'll have to hang it by not completing. Otherwise we'll have to
> > roll our own fault detection code right from the start.
> 
> Well, theoretically there are ways we could handle this. It looks like
> if I *don't* give the IOMMU a response to the page request, the context
> remains hung and waiting for it. So I could give you a callback,
> including the 'private' data from the page request that we know
> identifies the context. So perhaps we *could* contrive to give you
> precise exceptions when the hardware doesn't really do that sanely.
> 
> But I was really trying hard to avoid the necessity for that kind of
> hack to work around stupid hardware :)

I'll take a look at your latest series. But yeah we might need that
callback to update resets/ctx-is-toast stats on our side if the gpu
doesn't naturally fall over if there's no memory.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 6/9] drm/i915: driver based PASID handling
  2015-10-09  8:47                 ` Daniel Vetter
@ 2015-10-09  9:07                   ` David Woodhouse
  2015-10-09 16:20                     ` Jesse Barnes
  0 siblings, 1 reply; 36+ messages in thread
From: David Woodhouse @ 2015-10-09  9:07 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 1453 bytes --]

On Fri, 2015-10-09 at 10:47 +0200, Daniel Vetter wrote:
> On Fri, Oct 09, 2015 at 08:56:24AM +0100, David Woodhouse wrote:
> > On Fri, 2015-10-09 at 09:28 +0200, Daniel Vetter wrote:
> > > 
> > > Hm if this still works the same way as on older platforms then pagefaults
> > > just read all 0 and writes go nowhere from the gpu. That generally also
> > > explains ever-increasing numbers of the CS execution pointer since it's
> > > busy churning through 48b worth of address space filled with MI_NOP. I'd
> > > have hoped our hw would do better than that with svm ...
> > 
> > I'm looking at simple cases like Jesse's 'gem_svm_fault' test. If the
> > access to process address space (a single dword write) does nothing,
> > I'm not sure why it would then churn through MI_NOOPs; why would the
> > batch still not complete?
> 
> Yeah that testcase doesn't fit, the one I had in mind is where the batch
> itself faults and the CS just reads MI_NOP forever. No idea why the gpu
> just keeps walking through the address space here. Puzzling.

Does it just keep walking through the address space?

When I hacked my page request handler to *not* service the fault and
just say it failed, the batch did seem to complete as normal. Just
without doing the write, as you described.

-- 
David Woodhouse                            Open Source Technology Centre
David.Woodhouse@intel.com                              Intel Corporation


[-- Attachment #1.2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5691 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 7/9] drm/i915: add fences to the request struct
  2015-09-04 16:59 ` [PATCH 7/9] drm/i915: add fences to the request struct Jesse Barnes
@ 2015-10-09 13:29   ` David Woodhouse
  2015-10-09 16:11     ` Jesse Barnes
  0 siblings, 1 reply; 36+ messages in thread
From: David Woodhouse @ 2015-10-09 13:29 UTC (permalink / raw)
  To: Jesse Barnes, intel-gfx


[-- Attachment #1.1: Type: text/plain, Size: 3475 bytes --]

On Fri, 2015-09-04 at 09:59 -0700, Jesse Barnes wrote:
> 
> @@ -2286,6 +2287,10 @@ struct drm_i915_gem_request {
>         /** Execlists no. of times this request has been sent to the ELSP */
>         int elsp_submitted;
>  
> +       /* core fence obj for this request, may be exported */
> +       struct fence fence;

As discussed, this doesn't work as-is. The final fence_put() will
attempt to free(&req->fence). Unless you have a .release method in your
fence ops, which you don't.

I suppose we could tie up a .release method with the existing release
method for the drm_i915_gem_request.

As things stand, though, bad things are happening. This makes it go
away and at least lets me get on with testing.

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 8ef19e2..2d0c93c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2297,7 +2298,7 @@ struct drm_i915_gem_request {
 	int elsp_submitted;
 
 	/* core fence obj for this request, may be exported */
-	struct fence fence;
+	struct fence *fence;
 
 	wait_queue_t wait;
 };
diff --git a/drivers/gpu/drm/i915/i915_sync.c b/drivers/gpu/drm/i915/i915_sync.c
index 085f1f9..6ffe273 100644
--- a/drivers/gpu/drm/i915/i915_sync.c
+++ b/drivers/gpu/drm/i915/i915_sync.c
@@ -58,7 +58,12 @@ struct i915_sync_timeline {
  *   allow non-RCS fences (need ring/context association)
  */
 
-#define to_i915_request(x) container_of(x, struct drm_i915_gem_request, fence)
+struct foo {
+	struct fence fence;
+	struct drm_i915_gem_request *req;
+};
+
+#define to_i915_request(x) (((struct foo *)(x))->req)
 
 static const char *i915_fence_get_driver_name(struct fence *fence)
 {
@@ -81,10 +86,10 @@ static int i915_fence_ring_check(wait_queue_t *wait, unsigned mode, int flags,
 	if (!i915_gem_request_completed(req, false))
 		return 0;
 
-	fence_signal_locked(&req->fence);
+	fence_signal_locked(req->fence);
 
 	__remove_wait_queue(&ring->irq_queue, wait);
-	fence_put(&req->fence);
+	fence_put(req->fence);
 	ring->irq_put(ring);
 
 	return 0;
@@ -200,6 +205,15 @@ struct fence *i915_fence_create_ring(struct intel_engine_cs *ring,
 	if (ret)
 		return ERR_PTR(ret);
 
+	request->fence = kmalloc(sizeof(struct foo), GFP_KERNEL);
+	if (!request->fence) {
+		ret = -ENOMEM;
+		goto err_cancel;
+	}
+	/* I have no clue how this is *supposed* to work and no real interest
+	   in finding out. Just stop hurting me please. */
+	((struct foo *)request->fence)->req = request;
+
 	if (i915.enable_execlists) {
 		ringbuf = ctx->engine[ring->id].ringbuf;
 	} else
@@ -270,10 +284,10 @@ struct fence *i915_fence_create_ring(struct intel_engine_cs *ring,
 			   round_jiffies_up_relative(HZ));
 	intel_mark_busy(dev_priv->dev);
 
-	fence_init(&request->fence, &i915_fence_ring_ops, &fence_lock,
+	fence_init(request->fence, &i915_fence_ring_ops, &fence_lock,
 		   ctx->user_handle, request->seqno);
 
-	return &request->fence;
+	return request->fence;
 
 err_cancel:
 	i915_gem_request_cancel(request);
@@ -306,10 +320,10 @@ static struct fence *i915_fence_create_display(struct intel_context *ctx)
 
 	req = ring->outstanding_lazy_request;
 
-	fence_init(&req->fence, &i915_fence_ops, &fence_lock,
+	fence_init(req->fence, &i915_fence_ops, &fence_lock,
 		   ctx->user_handle, req->seqno);
 
-	return &req->fence;
+	return req->fence;
 }
 #endif
 
-- 
dwmw2


[-- Attachment #1.2: smime.p7s --]
[-- Type: application/x-pkcs7-signature, Size: 5691 bytes --]

[-- Attachment #2: Type: text/plain, Size: 159 bytes --]

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [PATCH 7/9] drm/i915: add fences to the request struct
  2015-10-09 13:29   ` David Woodhouse
@ 2015-10-09 16:11     ` Jesse Barnes
  0 siblings, 0 replies; 36+ messages in thread
From: Jesse Barnes @ 2015-10-09 16:11 UTC (permalink / raw)
  To: David Woodhouse, intel-gfx

On 10/09/2015 06:29 AM, David Woodhouse wrote:
> On Fri, 2015-09-04 at 09:59 -0700, Jesse Barnes wrote:
>>
>> @@ -2286,6 +2287,10 @@ struct drm_i915_gem_request {
>>         /** Execlists no. of times this request has been sent to the ELSP */
>>         int elsp_submitted;
>>  
>> +       /* core fence obj for this request, may be exported */
>> +       struct fence fence;
> 
> As discussed, this doesn't work as-is. The final fence_put() will
> attempt to free(&req->fence). Unless you have a .release method in your
> fence ops, which you don't.
> 
> I suppose we could tie up a .release method with the existing release
> method for the drm_i915_gem_request.
> 
> As things stand, though, bad things are happening. This makes it go
> away and at least lets me get on with testing.
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 8ef19e2..2d0c93c 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2297,7 +2298,7 @@ struct drm_i915_gem_request {
>  	int elsp_submitted;
>  
>  	/* core fence obj for this request, may be exported */
> -	struct fence fence;
> +	struct fence *fence;
>  
>  	wait_queue_t wait;
>  };
> diff --git a/drivers/gpu/drm/i915/i915_sync.c b/drivers/gpu/drm/i915/i915_sync.c
> index 085f1f9..6ffe273 100644
> --- a/drivers/gpu/drm/i915/i915_sync.c
> +++ b/drivers/gpu/drm/i915/i915_sync.c
> @@ -58,7 +58,12 @@ struct i915_sync_timeline {
>   *   allow non-RCS fences (need ring/context association)
>   */
>  
> -#define to_i915_request(x) container_of(x, struct drm_i915_gem_request, fence)
> +struct foo {
> +	struct fence fence;
> +	struct drm_i915_gem_request *req;
> +};
> +
> +#define to_i915_request(x) (((struct foo *)(x))->req)
>  
>  static const char *i915_fence_get_driver_name(struct fence *fence)
>  {
> @@ -81,10 +86,10 @@ static int i915_fence_ring_check(wait_queue_t *wait, unsigned mode, int flags,
>  	if (!i915_gem_request_completed(req, false))
>  		return 0;
>  
> -	fence_signal_locked(&req->fence);
> +	fence_signal_locked(req->fence);
>  
>  	__remove_wait_queue(&ring->irq_queue, wait);
> -	fence_put(&req->fence);
> +	fence_put(req->fence);
>  	ring->irq_put(ring);
>  
>  	return 0;
> @@ -200,6 +205,15 @@ struct fence *i915_fence_create_ring(struct intel_engine_cs *ring,
>  	if (ret)
>  		return ERR_PTR(ret);
>  
> +	request->fence = kmalloc(sizeof(struct foo), GFP_KERNEL);
> +	if (!request->fence) {
> +		ret = -ENOMEM;
> +		goto err_cancel;
> +	}
> +	/* I have no clue how this is *supposed* to work and no real interest
> +	   in finding out. Just stop hurting me please. */
> +	((struct foo *)request->fence)->req = request;
> +
>  	if (i915.enable_execlists) {
>  		ringbuf = ctx->engine[ring->id].ringbuf;
>  	} else
> @@ -270,10 +284,10 @@ struct fence *i915_fence_create_ring(struct intel_engine_cs *ring,
>  			   round_jiffies_up_relative(HZ));
>  	intel_mark_busy(dev_priv->dev);
>  
> -	fence_init(&request->fence, &i915_fence_ring_ops, &fence_lock,
> +	fence_init(request->fence, &i915_fence_ring_ops, &fence_lock,
>  		   ctx->user_handle, request->seqno);
>  
> -	return &request->fence;
> +	return request->fence;
>  
>  err_cancel:
>  	i915_gem_request_cancel(request);
> @@ -306,10 +320,10 @@ static struct fence *i915_fence_create_display(struct intel_context *ctx)
>  
>  	req = ring->outstanding_lazy_request;
>  
> -	fence_init(&req->fence, &i915_fence_ops, &fence_lock,
> +	fence_init(req->fence, &i915_fence_ops, &fence_lock,
>  		   ctx->user_handle, req->seqno);
>  
> -	return &req->fence;
> +	return req->fence;
>  }
>  #endif

Yeah this is definitely better than what I had (untested code and all
that).  But the actual signaling and such still needs work.  I had a
question for Maarten on that actually; today it doesn't look like the
fence would enabling signaling at the right point, so I had to add
something.  But I'll look and see what the latest is here from John H; I
know his Android code worked, so it would probably be best to just use that.

Jesse

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [PATCH 6/9] drm/i915: driver based PASID handling
  2015-10-09  9:07                   ` David Woodhouse
@ 2015-10-09 16:20                     ` Jesse Barnes
  0 siblings, 0 replies; 36+ messages in thread
From: Jesse Barnes @ 2015-10-09 16:20 UTC (permalink / raw)
  To: David Woodhouse, Daniel Vetter; +Cc: intel-gfx

On 10/09/2015 02:07 AM, David Woodhouse wrote:
> On Fri, 2015-10-09 at 10:47 +0200, Daniel Vetter wrote:
>> On Fri, Oct 09, 2015 at 08:56:24AM +0100, David Woodhouse wrote:
>>> On Fri, 2015-10-09 at 09:28 +0200, Daniel Vetter wrote:
>>>>
>>>> Hm if this still works the same way as on older platforms then pagefaults
>>>> just read all 0 and writes go nowhere from the gpu. That generally also
>>>> explains ever-increasing numbers of the CS execution pointer since it's
>>>> busy churning through 48b worth of address space filled with MI_NOP. I'd
>>>> have hoped our hw would do better than that with svm ...
>>>
>>> I'm looking at simple cases like Jesse's 'gem_svm_fault' test. If the
>>> access to process address space (a single dword write) does nothing,
>>> I'm not sure why it would then churn through MI_NOOPs; why would the
>>> batch still not complete?
>>
>> Yeah that testcase doesn't fit, the one I had in mind is where the batch
>> itself faults and the CS just reads MI_NOP forever. No idea why the gpu
>> just keeps walking through the address space here. Puzzling.
> 
> Does it just keep walking through the address space?
> 
> When I hacked my page request handler to *not* service the fault and
> just say it failed, the batch did seem to complete as normal. Just
> without doing the write, as you described.

My understanding is that this behavior will depend on how we submit the
work.  We have to faulting modes: halt and stream.  In either case, the
context that faults will be switched out, and the hardware will either
wait for a resubmit (the halt case) to restart the context, or switch to
the next context in the execlist queue.

If the fault is then serviced by the IOMMU layer, potentially as an
error, I'd expect the faulting context to simply fault again.  I don't
think we'd see a GPU hang in the same way we do today, where we get an
indication in the GPU private fault regs and such; they go through the
IOMMU in advanced context mode.

So I think we'll need a callback in the fatal case; we can just kick off
a private i915 worker for that, just like we do for the recoverable case
that's now hidden in the IOMMU layer.

Jesse

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2015-10-09 16:19 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-09-04 16:58 [RFC] Page table sharing and bufferless execbuf Jesse Barnes
2015-09-04 16:58 ` [PATCH 1/9] mm: move mmu_find_ops to mmu_notifier.c Jesse Barnes
2015-09-04 16:58 ` [PATCH 2/9] signal: export force_sig_info Jesse Barnes
2015-09-04 16:58 ` [PATCH 3/9] android/sync: add sync_fence_create_dma Jesse Barnes
2015-09-04 16:58 ` [PATCH 4/9] android/sync: hack: enable fence signaling in Android Native Sync implementation Jesse Barnes
2015-09-04 16:58 ` [PATCH 5/9] drm/i915: add create_context2 ioctl Jesse Barnes
2015-09-04 16:59 ` [PATCH 6/9] drm/i915: driver based PASID handling Jesse Barnes
2015-10-07 13:00   ` David Woodhouse
2015-10-07 15:16     ` Jesse Barnes
2015-10-07 16:14       ` Daniel Vetter
2015-10-07 16:28         ` Jesse Barnes
2015-10-07 17:17           ` David Woodhouse
2015-10-07 17:26             ` Jesse Barnes
2015-10-08  8:12               ` Daniel Vetter
2015-10-08 10:28         ` Tomas Elf
2015-10-08 11:29         ` Tomas Elf
2015-10-08 22:46           ` David Woodhouse
2015-10-09  7:28             ` Daniel Vetter
2015-10-09  7:52               ` Daniel Vetter
2015-10-09  7:56               ` David Woodhouse
2015-10-09  8:47                 ` Daniel Vetter
2015-10-09  9:07                   ` David Woodhouse
2015-10-09 16:20                     ` Jesse Barnes
2015-10-08 15:57   ` Chris Wilson
2015-10-09  7:24     ` David Woodhouse
2015-09-04 16:59 ` [PATCH 7/9] drm/i915: add fences to the request struct Jesse Barnes
2015-10-09 13:29   ` David Woodhouse
2015-10-09 16:11     ` Jesse Barnes
2015-09-04 16:59 ` [PATCH 8/9] drm/i915: Android sync points for i915 v4 (obsolete) Jesse Barnes
2015-09-04 16:59 ` [PATCH 9/9] drm/i915: add bufferless execbuf ioctl Jesse Barnes
2015-09-04 17:37   ` Chris Wilson
2015-09-04 19:09     ` Jesse Barnes
2015-10-08 10:34   ` David Woodhouse
2015-09-04 17:23 ` [RFC] Page table sharing and bufferless execbuf Chris Wilson
2015-09-04 19:08   ` Jesse Barnes
2015-09-26 14:55 ` David Woodhouse

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.