* [RFC 0/9] Convert requests to use struct fence
@ 2015-07-17 14:31 John.C.Harrison
  2015-07-17 14:31 ` [RFC 1/9] staging/android/sync: Support sync points created from dma-fences John.C.Harrison
                   ` (8 more replies)
  0 siblings, 9 replies; 38+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:31 UTC
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

There is a construct in the Linux kernel called 'struct fence' that is intended
to keep track of work that is executed on hardware, i.e. it solves the basic
problem that the driver's 'struct drm_i915_gem_request' is trying to address.
The request structure does quite a lot more than simply track execution
progress, so it is very definitely still required. However, the basic completion
status side could be updated to use the ready-made fence implementation and gain
all the advantages that provides.

Using the struct fence object also has the advantage that the fence can be used
outside of the i915 driver (by other drivers or by userland applications). That
is the basis of the dma-buf synchronisation API and allows asynchronous
tracking of work completion. In this case, it allows applications to be
signalled directly when a batch buffer completes without having to make an IOCTL
call into the driver.
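
As an illustration of that point: once the fence is allowed to escape the
driver, any other kernel code can wait on it through the generic API alone,
with no i915 knowledge required. A minimal, hypothetical consumer sketch
(not part of this series):

	#include <linux/fence.h>

	/* Block (interruptibly) until the GPU work behind @f has completed. */
	static int wait_for_gpu_work(struct fence *f)
	{
		/* fence_wait() returns 0 on success or -ERESTARTSYS. */
		return fence_wait(f, true);
	}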

This work has been planned ever since the driver was converted from being
seqno-value based to being request-structure based. This patch series
implements that plan.

[Patches against drm-intel-nightly tree fetched 15/07/2015]

John Harrison (7):
  drm/i915: Convert requests to use struct fence
  drm/i915: Removed now redundant parameter to i915_gem_request_completed()
  drm/i915: Add per context timelines to fence object
  drm/i915: Delay the freeing of requests until retire time
  drm/i915: Interrupt driven fences
  drm/i915: Updated request structure tracing
  drm/i915: Add sync framework support to execbuff IOCTL

Maarten Lankhorst (1):
  android: add sync_fence_create_dma

Tvrtko Ursulin (1):
  staging/android/sync: Support sync points created from dma-fences

 drivers/gpu/drm/i915/i915_debugfs.c        |   2 +-
 drivers/gpu/drm/i915/i915_drv.h            |  73 +++---
 drivers/gpu/drm/i915/i915_gem.c            | 369 +++++++++++++++++++++++++++--
 drivers/gpu/drm/i915/i915_gem_context.c    |  15 +-
 drivers/gpu/drm/i915/i915_gem_execbuffer.c |  90 ++++++-
 drivers/gpu/drm/i915/i915_irq.c            |   2 +-
 drivers/gpu/drm/i915/i915_trace.h          |   7 +-
 drivers/gpu/drm/i915/intel_display.c       |   4 +-
 drivers/gpu/drm/i915/intel_lrc.c           |  12 +
 drivers/gpu/drm/i915/intel_pm.c            |   6 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c    |   4 +
 drivers/gpu/drm/i915/intel_ringbuffer.h    |   7 +
 drivers/staging/android/sync.c             |  13 +-
 drivers/staging/android/sync.h             |  12 +-
 drivers/staging/android/sync_debug.c       |  42 ++--
 include/uapi/drm/i915_drm.h                |  16 +-
 16 files changed, 583 insertions(+), 91 deletions(-)

-- 
1.9.1

* [RFC 1/9] staging/android/sync: Support sync points created from dma-fences
  2015-07-17 14:31 [RFC 0/9] Convert requests to use struct fence John.C.Harrison
@ 2015-07-17 14:31 ` John.C.Harrison
  2015-07-17 14:44   ` Tvrtko Ursulin
  2015-07-17 14:31 ` [RFC 2/9] android: add sync_fence_create_dma John.C.Harrison
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 38+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:31 UTC
  To: Intel-GFX
  Cc: devel, Greg Kroah-Hartman, Arve Hjønnevåg, Riley Andrews

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

The debug output assumes all sync points are built on top of Android sync
points, and will NULL-pointer dereference when we start creating them from
dma-fences unless it is taught about the new type.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: devel@driverdev.osuosl.org
Cc: Riley Andrews <riandrews@android.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Arve Hjønnevåg <arve@android.com>
---
 drivers/staging/android/sync_debug.c | 42 +++++++++++++++++++-----------------
 1 file changed, 22 insertions(+), 20 deletions(-)

diff --git a/drivers/staging/android/sync_debug.c b/drivers/staging/android/sync_debug.c
index 91ed2c4..f45d13c 100644
--- a/drivers/staging/android/sync_debug.c
+++ b/drivers/staging/android/sync_debug.c
@@ -82,36 +82,42 @@ static const char *sync_status_str(int status)
 	return "error";
 }
 
-static void sync_print_pt(struct seq_file *s, struct sync_pt *pt, bool fence)
+static void sync_print_pt(struct seq_file *s, struct fence *pt, bool fence)
 {
 	int status = 1;
-	struct sync_timeline *parent = sync_pt_parent(pt);
 
-	if (fence_is_signaled_locked(&pt->base))
-		status = pt->base.status;
+	if (fence_is_signaled_locked(pt))
+		status = pt->status;
 
 	seq_printf(s, "  %s%spt %s",
-		   fence ? parent->name : "",
+		   fence && pt->ops->get_timeline_name ?
+		   pt->ops->get_timeline_name(pt) : "",
 		   fence ? "_" : "",
 		   sync_status_str(status));
 
 	if (status <= 0) {
 		struct timespec64 ts64 =
-			ktime_to_timespec64(pt->base.timestamp);
+			ktime_to_timespec64(pt->timestamp);
 
 		seq_printf(s, "@%lld.%09ld", (s64)ts64.tv_sec, ts64.tv_nsec);
 	}
 
-	if (parent->ops->timeline_value_str &&
-	    parent->ops->pt_value_str) {
+	if ((!fence || pt->ops->timeline_value_str) &&
+	    pt->ops->fence_value_str) {
 		char value[64];
+		bool success;
 
-		parent->ops->pt_value_str(pt, value, sizeof(value));
-		seq_printf(s, ": %s", value);
-		if (fence) {
-			parent->ops->timeline_value_str(parent, value,
-						    sizeof(value));
-			seq_printf(s, " / %s", value);
+		pt->ops->fence_value_str(pt, value, sizeof(value));
+		success = strlen(value);
+
+		if (success)
+			seq_printf(s, ": %s", value);
+
+		if (success && fence) {
+			pt->ops->timeline_value_str(pt, value, sizeof(value));
+
+			if (strlen(value))
+				seq_printf(s, " / %s", value);
 		}
 	}
 
@@ -138,7 +144,7 @@ static void sync_print_obj(struct seq_file *s, struct sync_timeline *obj)
 	list_for_each(pos, &obj->child_list_head) {
 		struct sync_pt *pt =
 			container_of(pos, struct sync_pt, child_list);
-		sync_print_pt(s, pt, false);
+		sync_print_pt(s, &pt->base, false);
 	}
 	spin_unlock_irqrestore(&obj->child_list_lock, flags);
 }
@@ -153,11 +159,7 @@ static void sync_print_fence(struct seq_file *s, struct sync_fence *fence)
 		   sync_status_str(atomic_read(&fence->status)));
 
 	for (i = 0; i < fence->num_fences; ++i) {
-		struct sync_pt *pt =
-			container_of(fence->cbs[i].sync_pt,
-				     struct sync_pt, base);
-
-		sync_print_pt(s, pt, true);
+		sync_print_pt(s, fence->cbs[i].sync_pt, true);
 	}
 
 	spin_lock_irqsave(&fence->wq.lock, flags);
-- 
1.9.1

* [RFC 2/9] android: add sync_fence_create_dma
  2015-07-17 14:31 [RFC 0/9] Convert requests to use struct fence John.C.Harrison
  2015-07-17 14:31 ` [RFC 1/9] staging/android/sync: Support sync points created from dma-fences John.C.Harrison
@ 2015-07-17 14:31 ` John.C.Harrison
  2015-07-17 14:31 ` [RFC 3/9] drm/i915: Convert requests to use struct fence John.C.Harrison
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 38+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:31 UTC
  To: Intel-GFX
  Cc: devel, Greg Kroah-Hartman, Arve Hjønnevåg,
	Riley Andrews, Maarten Lankhorst

From: Maarten Lankhorst <maarten.lankhorst@canonical.com>

This allows users of dma fences to create an Android fence.

v2: Added kerneldoc. (Tvrtko Ursulin).
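
For context, the expected consumer is a driver that wants to hand a dma-fence
to userland as an Android sync file descriptor. A hedged sketch of that usage
(hypothetical helper, not part of this patch; the real user arrives in the
execbuff patch later in the series):

	#include <linux/file.h>
	#include <linux/fcntl.h>
	#include "sync.h"

	/* Wrap the dma-fence @f in a sync_fence and return an fd for it. */
	static int fence_to_sync_fd(struct fence *f)
	{
		struct sync_fence *sf;
		int fd;

		fd = get_unused_fd_flags(O_CLOEXEC);
		if (fd < 0)
			return fd;

		sf = sync_fence_create_dma("fence", f);
		if (!sf) {
			put_unused_fd(fd);
			return -ENOMEM;
		}

		sync_fence_install(sf, fd);	/* @fd now owns @sf */
		return fd;
	}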

Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Cc: Daniel Vetter <daniel@ffwll.ch>
Cc: Jesse Barnes <jbarnes@virtuousgeek.org>
Cc: devel@driverdev.osuosl.org
Cc: Riley Andrews <riandrews@android.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Arve Hjønnevåg <arve@android.com>
---
 drivers/staging/android/sync.c | 13 +++++++++----
 drivers/staging/android/sync.h | 12 +++++++++++-
 2 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/drivers/staging/android/sync.c b/drivers/staging/android/sync.c
index f83e00c..7f0e919 100644
--- a/drivers/staging/android/sync.c
+++ b/drivers/staging/android/sync.c
@@ -188,7 +188,7 @@ static void fence_check_cb_func(struct fence *f, struct fence_cb *cb)
 }
 
 /* TODO: implement a create which takes more that one sync_pt */
-struct sync_fence *sync_fence_create(const char *name, struct sync_pt *pt)
+struct sync_fence *sync_fence_create_dma(const char *name, struct fence *pt)
 {
 	struct sync_fence *fence;
 
@@ -199,16 +199,21 @@ struct sync_fence *sync_fence_create(const char *name, struct sync_pt *pt)
 	fence->num_fences = 1;
 	atomic_set(&fence->status, 1);
 
-	fence->cbs[0].sync_pt = &pt->base;
+	fence->cbs[0].sync_pt = pt;
 	fence->cbs[0].fence = fence;
-	if (fence_add_callback(&pt->base, &fence->cbs[0].cb,
-			       fence_check_cb_func))
+	if (fence_add_callback(pt, &fence->cbs[0].cb, fence_check_cb_func))
 		atomic_dec(&fence->status);
 
 	sync_fence_debug_add(fence);
 
 	return fence;
 }
+EXPORT_SYMBOL(sync_fence_create_dma);
+
+struct sync_fence *sync_fence_create(const char *name, struct sync_pt *pt)
+{
+	return sync_fence_create_dma(name, &pt->base);
+}
 EXPORT_SYMBOL(sync_fence_create);
 
 struct sync_fence *sync_fence_fdget(int fd)
diff --git a/drivers/staging/android/sync.h b/drivers/staging/android/sync.h
index a21b79f..0f1299e 100644
--- a/drivers/staging/android/sync.h
+++ b/drivers/staging/android/sync.h
@@ -250,10 +250,20 @@ void sync_pt_free(struct sync_pt *pt);
  * @pt:		sync_pt to add to the fence
  *
  * Creates a fence containg @pt.  Once this is called, the fence takes
- * ownership of @pt.
+ * a reference on @pt.
  */
 struct sync_fence *sync_fence_create(const char *name, struct sync_pt *pt);
 
+/**
+ * sync_fence_create_dma() - creates a sync fence from dma-fence
+ * @name:	name of fence to create
+ * @pt:	dma-fence to add to the fence
+ *
+ * Creates a fence containing @pt.  Once this is called, the fence takes
+ * a reference on @pt.
+ */
+struct sync_fence *sync_fence_create_dma(const char *name, struct fence *pt);
+
 /*
  * API for sync_fence consumers
  */
-- 
1.9.1

* [RFC 3/9] drm/i915: Convert requests to use struct fence
  2015-07-17 14:31 [RFC 0/9] Convert requests to use struct fence John.C.Harrison
  2015-07-17 14:31 ` [RFC 1/9] staging/android/sync: Support sync points created from dma-fences John.C.Harrison
  2015-07-17 14:31 ` [RFC 2/9] android: add sync_fence_create_dma John.C.Harrison
@ 2015-07-17 14:31 ` John.C.Harrison
  2015-07-21  7:05   ` Daniel Vetter
                     ` (2 more replies)
  2015-07-17 14:31 ` [RFC 4/9] drm/i915: Removed now redundant parameter to i915_gem_request_completed() John.C.Harrison
                   ` (5 subsequent siblings)
  8 siblings, 3 replies; 38+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:31 UTC
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

There is a construct in the Linux kernel called 'struct fence' that is intended
to keep track of work that is executed on hardware, i.e. it solves the basic
problem that the driver's 'struct drm_i915_gem_request' is trying to address.
The request structure does quite a lot more than simply track execution
progress, so it is very definitely still required. However, the basic completion
status side could be updated to use the ready-made fence implementation and gain
all the advantages that provides.

This patch takes the first step of integrating a struct fence into the request.
It replaces the explicit reference count with that of the fence. It also
replaces the 'is completed' test with the fence's equivalent. Currently, that
simply chains on to the original request implementation. A future patch will
improve this.
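
The idiom being adopted is the standard one for embedding a struct fence in a
larger object: the fence's reference count becomes the object's reference
count, and the release hook recovers the wrapper with container_of(). A
generic sketch of that shape (the concrete i915 version is in the diff below):

	struct my_request {
		struct fence fence;	/* embedded, not a pointer */
		/* ... driver-private state ... */
	};

	static void my_request_release(struct fence *f)
	{
		struct my_request *req =
			container_of(f, struct my_request, fence);

		/* tear down driver state, then free the wrapper */
		kfree(req);
	}

	/* callers use fence_get(&req->fence) / fence_put(&req->fence) */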

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         | 45 +++++++++++++------------
 drivers/gpu/drm/i915/i915_gem.c         | 58 ++++++++++++++++++++++++++++++---
 drivers/gpu/drm/i915/intel_lrc.c        |  1 +
 drivers/gpu/drm/i915/intel_ringbuffer.c |  1 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |  3 ++
 5 files changed, 80 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index cf6761c..79d346c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -50,6 +50,7 @@
 #include <linux/intel-iommu.h>
 #include <linux/kref.h>
 #include <linux/pm_qos.h>
+#include <linux/fence.h>
 
 /* General customization:
  */
@@ -2150,7 +2151,17 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
  * initial reference taken using kref_init
  */
 struct drm_i915_gem_request {
-	struct kref ref;
+	/**
+	 * Underlying object for implementing the signal/wait stuff.
+	 * NB: Never call fence_later() or return this fence object to user
+	 * land! Due to lazy allocation, scheduler re-ordering, pre-emption,
+	 * etc., there is no guarantee at all about the validity or
+	 * sequentiality of the fence's seqno! It is also unsafe to let
+	 * anything outside of the i915 driver get hold of the fence object
+	 * as the clean up when decrementing the reference count requires
+	 * holding the driver mutex lock.
+	 */
+	struct fence fence;
 
 	/** On Which ring this request was generated */
 	struct drm_i915_private *i915;
@@ -2227,7 +2238,13 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 			   struct intel_context *ctx,
 			   struct drm_i915_gem_request **req_out);
 void i915_gem_request_cancel(struct drm_i915_gem_request *req);
-void i915_gem_request_free(struct kref *req_ref);
+
+static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
+					      bool lazy_coherency)
+{
+	return fence_is_signaled(&req->fence);
+}
+
 int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
 				   struct drm_file *file);
 
@@ -2247,7 +2264,7 @@ static inline struct drm_i915_gem_request *
 i915_gem_request_reference(struct drm_i915_gem_request *req)
 {
 	if (req)
-		kref_get(&req->ref);
+		fence_get(&req->fence);
 	return req;
 }
 
@@ -2255,7 +2272,7 @@ static inline void
 i915_gem_request_unreference(struct drm_i915_gem_request *req)
 {
 	WARN_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
-	kref_put(&req->ref, i915_gem_request_free);
+	fence_put(&req->fence);
 }
 
 static inline void
@@ -2267,7 +2284,7 @@ i915_gem_request_unreference__unlocked(struct drm_i915_gem_request *req)
 		return;
 
 	dev = req->ring->dev;
-	if (kref_put_mutex(&req->ref, i915_gem_request_free, &dev->struct_mutex))
+	if (kref_put_mutex(&req->fence.refcount, fence_release, &dev->struct_mutex))
 		mutex_unlock(&dev->struct_mutex);
 }
 
@@ -2284,12 +2301,6 @@ static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
 }
 
 /*
- * XXX: i915_gem_request_completed should be here but currently needs the
- * definition of i915_seqno_passed() which is below. It will be moved in
- * a later patch when the call to i915_seqno_passed() is obsoleted...
- */
-
-/*
  * A command that requires special handling by the command parser.
  */
 struct drm_i915_cmd_descriptor {
@@ -2851,18 +2862,6 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
 	return (int32_t)(seq1 - seq2) >= 0;
 }
 
-static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
-					      bool lazy_coherency)
-{
-	u32 seqno;
-
-	BUG_ON(req == NULL);
-
-	seqno = req->ring->get_seqno(req->ring, lazy_coherency);
-
-	return i915_seqno_passed(seqno, req->seqno);
-}
-
 int __must_check i915_gem_get_seqno(struct drm_device *dev, u32 *seqno);
 int __must_check i915_gem_set_seqno(struct drm_device *dev, u32 seqno);
 int __must_check i915_gem_object_get_fence(struct drm_i915_gem_object *obj);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index d9f2701..888bb72 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2616,12 +2616,14 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv,
 	}
 }
 
-void i915_gem_request_free(struct kref *req_ref)
+static void i915_gem_request_free(struct fence *req_fence)
 {
-	struct drm_i915_gem_request *req = container_of(req_ref,
-						 typeof(*req), ref);
+	struct drm_i915_gem_request *req = container_of(req_fence,
+						 typeof(*req), fence);
 	struct intel_context *ctx = req->ctx;
 
+	BUG_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
+
 	if (req->file_priv)
 		i915_gem_request_remove_from_client(req);
 
@@ -2637,6 +2639,47 @@ void i915_gem_request_free(struct kref *req_ref)
 	kmem_cache_free(req->i915->requests, req);
 }
 
+static const char *i915_gem_request_get_driver_name(struct fence *req_fence)
+{
+	return "i915_request";
+}
+
+static const char *i915_gem_request_get_timeline_name(struct fence *req_fence)
+{
+	struct drm_i915_gem_request *req = container_of(req_fence,
+						 typeof(*req), fence);
+	return req->ring->name;
+}
+
+static bool i915_gem_request_enable_signaling(struct fence *req_fence)
+{
+	/* Interrupt driven fences are not implemented yet.*/
+	WARN(true, "This should not be called!");
+	return true;
+}
+
+static bool i915_gem_request_is_completed(struct fence *req_fence)
+{
+	struct drm_i915_gem_request *req = container_of(req_fence,
+						 typeof(*req), fence);
+	u32 seqno;
+
+	BUG_ON(req == NULL);
+
+	seqno = req->ring->get_seqno(req->ring, false/*lazy_coherency*/);
+
+	return i915_seqno_passed(seqno, req->seqno);
+}
+
+static const struct fence_ops i915_gem_request_fops = {
+	.get_driver_name	= i915_gem_request_get_driver_name,
+	.get_timeline_name	= i915_gem_request_get_timeline_name,
+	.enable_signaling	= i915_gem_request_enable_signaling,
+	.signaled		= i915_gem_request_is_completed,
+	.wait			= fence_default_wait,
+	.release		= i915_gem_request_free,
+};
+
 int i915_gem_request_alloc(struct intel_engine_cs *ring,
 			   struct intel_context *ctx,
 			   struct drm_i915_gem_request **req_out)
@@ -2658,7 +2701,6 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 	if (ret)
 		goto err;
 
-	kref_init(&req->ref);
 	req->i915 = dev_priv;
 	req->ring = ring;
 	req->ctx  = ctx;
@@ -2673,6 +2715,8 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 		goto err;
 	}
 
+	fence_init(&req->fence, &i915_gem_request_fops, &ring->fence_lock, ring->fence_context, req->seqno);
+
 	/*
 	 * Reserve space in the ring buffer for all the commands required to
 	 * eventually emit this request. This is to guarantee that the
@@ -5021,7 +5065,7 @@ i915_gem_init_hw(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_engine_cs *ring;
-	int ret, i, j;
+	int ret, i, j, fence_base;
 
 	if (INTEL_INFO(dev)->gen < 6 && !intel_enable_gtt())
 		return -EIO;
@@ -5073,12 +5117,16 @@ i915_gem_init_hw(struct drm_device *dev)
 			goto out;
 	}
 
+	fence_base = fence_context_alloc(I915_NUM_RINGS);
+
 	/* Now it is safe to go back round and do everything else: */
 	for_each_ring(ring, dev_priv, i) {
 		struct drm_i915_gem_request *req;
 
 		WARN_ON(!ring->default_context);
 
+		ring->fence_context = fence_base + i;
+
 		ret = i915_gem_request_alloc(ring, ring->default_context, &req);
 		if (ret) {
 			i915_gem_cleanup_ringbuffer(dev);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 9faad82..ee4aecd 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1808,6 +1808,7 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 	ring->dev = dev;
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
+	spin_lock_init(&ring->fence_lock);
 	i915_gem_batch_pool_init(dev, &ring->batch_pool);
 	init_waitqueue_head(&ring->irq_queue);
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 177f7ed..d1ced30 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2040,6 +2040,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
 	INIT_LIST_HEAD(&ring->execlist_queue);
+	spin_lock_init(&ring->fence_lock);
 	i915_gem_batch_pool_init(dev, &ring->batch_pool);
 	ringbuf->size = 32 * PAGE_SIZE;
 	ringbuf->ring = ring;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 2e85fda..a4b0545 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -346,6 +346,9 @@ struct  intel_engine_cs {
 	 * to encode the command length in the header).
 	 */
 	u32 (*get_cmd_length_mask)(u32 cmd_header);
+
+	unsigned fence_context;
+	spinlock_t fence_lock;
 };
 
 bool intel_ring_initialized(struct intel_engine_cs *ring);
-- 
1.9.1

* [RFC 4/9] drm/i915: Removed now redundant parameter to i915_gem_request_completed()
  2015-07-17 14:31 [RFC 0/9] Convert requests to use struct fence John.C.Harrison
                   ` (2 preceding siblings ...)
  2015-07-17 14:31 ` [RFC 3/9] drm/i915: Convert requests to use struct fence John.C.Harrison
@ 2015-07-17 14:31 ` John.C.Harrison
  2015-07-17 14:31 ` [RFC 5/9] drm/i915: Add per context timelines to fence object John.C.Harrison
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 38+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:31 UTC
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The change to the implementation of i915_gem_request_completed() means that the
lazy coherency flag is no longer used. This can now be removed to simplify the
interface.

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c  |  2 +-
 drivers/gpu/drm/i915/i915_drv.h      |  3 +--
 drivers/gpu/drm/i915/i915_gem.c      | 18 +++++++++---------
 drivers/gpu/drm/i915/intel_display.c |  2 +-
 drivers/gpu/drm/i915/intel_pm.c      |  4 ++--
 5 files changed, 14 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index bc817da..b9a92fe 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -602,7 +602,7 @@ static int i915_gem_pageflip_info(struct seq_file *m, void *data)
 					   i915_gem_request_get_seqno(work->flip_queued_req),
 					   dev_priv->next_seqno,
 					   ring->get_seqno(ring, true),
-					   i915_gem_request_completed(work->flip_queued_req, true));
+					   i915_gem_request_completed(work->flip_queued_req));
 			} else
 				seq_printf(m, "Flip not associated with any ring\n");
 			seq_printf(m, "Flip queued on frame %d, (was ready on frame %d), now %d\n",
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 79d346c..0c7df46 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2239,8 +2239,7 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 			   struct drm_i915_gem_request **req_out);
 void i915_gem_request_cancel(struct drm_i915_gem_request *req);
 
-static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
-					      bool lazy_coherency)
+static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
 {
 	return fence_is_signaled(&req->fence);
 }
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 888bb72..3970250 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1170,7 +1170,7 @@ static int __i915_spin_request(struct drm_i915_gem_request *req)
 
 	timeout = jiffies + 1;
 	while (!need_resched()) {
-		if (i915_gem_request_completed(req, true))
+		if (i915_gem_request_completed(req))
 			return 0;
 
 		if (time_after_eq(jiffies, timeout))
@@ -1178,7 +1178,7 @@ static int __i915_spin_request(struct drm_i915_gem_request *req)
 
 		cpu_relax_lowlatency();
 	}
-	if (i915_gem_request_completed(req, false))
+	if (i915_gem_request_completed(req))
 		return 0;
 
 	return -EAGAIN;
@@ -1222,7 +1222,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	if (list_empty(&req->list))
 		return 0;
 
-	if (i915_gem_request_completed(req, true))
+	if (i915_gem_request_completed(req))
 		return 0;
 
 	timeout_expire = timeout ?
@@ -1262,7 +1262,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 			break;
 		}
 
-		if (i915_gem_request_completed(req, false)) {
+		if (i915_gem_request_completed(req)) {
 			ret = 0;
 			break;
 		}
@@ -2759,7 +2759,7 @@ i915_gem_find_active_request(struct intel_engine_cs *ring)
 	struct drm_i915_gem_request *request;
 
 	list_for_each_entry(request, &ring->request_list, list) {
-		if (i915_gem_request_completed(request, false))
+		if (i915_gem_request_completed(request))
 			continue;
 
 		return request;
@@ -2902,7 +2902,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 					   struct drm_i915_gem_request,
 					   list);
 
-		if (!i915_gem_request_completed(request, true))
+		if (!i915_gem_request_completed(request))
 			break;
 
 		i915_gem_request_retire(request);
@@ -2926,7 +2926,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 	}
 
 	if (unlikely(ring->trace_irq_req &&
-		     i915_gem_request_completed(ring->trace_irq_req, true))) {
+		     i915_gem_request_completed(ring->trace_irq_req))) {
 		ring->irq_put(ring);
 		i915_gem_request_assign(&ring->trace_irq_req, NULL);
 	}
@@ -3032,7 +3032,7 @@ i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
 		if (list_empty(&req->list))
 			goto retire;
 
-		if (i915_gem_request_completed(req, true)) {
+		if (i915_gem_request_completed(req)) {
 			__i915_gem_request_retire__upto(req);
 retire:
 			i915_gem_object_retire__read(obj, i);
@@ -3144,7 +3144,7 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
 	if (to == from)
 		return 0;
 
-	if (i915_gem_request_completed(from_req, true))
+	if (i915_gem_request_completed(from_req))
 		return 0;
 
 	if (!i915_semaphore_is_enabled(obj->base.dev)) {
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 88dca8d..93ac43c 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11342,7 +11342,7 @@ static bool __intel_pageflip_stall_check(struct drm_device *dev,
 
 	if (work->flip_ready_vblank == 0) {
 		if (work->flip_queued_req &&
-		    !i915_gem_request_completed(work->flip_queued_req, true))
+		    !i915_gem_request_completed(work->flip_queued_req))
 			return false;
 
 		work->flip_ready_vblank = drm_crtc_vblank_count(crtc);
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 5eeddc9..cb08d9e 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -7313,7 +7313,7 @@ static void __intel_rps_boost_work(struct work_struct *work)
 	struct request_boost *boost = container_of(work, struct request_boost, work);
 	struct drm_i915_gem_request *req = boost->req;
 
-	if (!i915_gem_request_completed(req, true))
+	if (!i915_gem_request_completed(req))
 		gen6_rps_boost(to_i915(req->ring->dev), NULL,
 			       req->emitted_jiffies);
 
@@ -7329,7 +7329,7 @@ void intel_queue_rps_boost_for_request(struct drm_device *dev,
 	if (req == NULL || INTEL_INFO(dev)->gen < 6)
 		return;
 
-	if (i915_gem_request_completed(req, true))
+	if (i915_gem_request_completed(req))
 		return;
 
 	boost = kmalloc(sizeof(*boost), GFP_ATOMIC);
-- 
1.9.1

* [RFC 5/9] drm/i915: Add per context timelines to fence object
  2015-07-17 14:31 [RFC 0/9] Convert requests to use struct fence John.C.Harrison
                   ` (3 preceding siblings ...)
  2015-07-17 14:31 ` [RFC 4/9] drm/i915: Removed now redundant parameter to i915_gem_request_completed() John.C.Harrison
@ 2015-07-17 14:31 ` John.C.Harrison
  2015-07-23 13:50   ` Tvrtko Ursulin
  2015-07-17 14:31 ` [RFC 6/9] drm/i915: Delay the freeing of requests until retire time John.C.Harrison
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 38+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:31 UTC
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The fence object used inside the request structure requires a sequence number.
Although this is not used by the i915 driver itself, it could potentially be
used by non-i915 code if the fence is passed outside of the driver. This is the
intention, as it allows external kernel drivers and user applications to wait on
batch buffer completion asynchronously via the dma-buf fence API.

To ensure that such external users are not confused by strange things happening
with the seqno, this patch adds a per-context timeline that can provide a
guaranteed in-order seqno value for the fence. This is safe because the
scheduler will not re-order batch buffers within a context - they are considered
to be mutually dependent.
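
The underlying constraint comes from the generic fence code: seqnos are only
comparable between fences that share a fence context, so helpers such as
fence_later() require each in-order stream to be its own context. A brief
hypothetical illustration:

	#include <linux/fence.h>

	/* Only meaningful when both fences are on the same timeline. */
	static struct fence *pick_later(struct fence *a, struct fence *b)
	{
		if (a->context != b->context)
			return NULL;	/* different timelines: unordered */

		return fence_later(a, b); /* NULL if both already signalled */
	}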

[new patch in series]

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         | 25 ++++++++----
 drivers/gpu/drm/i915/i915_gem.c         | 69 ++++++++++++++++++++++++++++++---
 drivers/gpu/drm/i915/i915_gem_context.c | 15 ++++++-
 drivers/gpu/drm/i915/intel_lrc.c        |  8 ++++
 drivers/gpu/drm/i915/intel_ringbuffer.h |  1 -
 5 files changed, 103 insertions(+), 15 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 0c7df46..88a4746 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -840,6 +840,15 @@ struct i915_ctx_hang_stats {
 	bool banned;
 };
 
+struct i915_fence_timeline {
+	unsigned    fence_context;
+	uint32_t    context;
+	uint32_t    next;
+
+	struct intel_context *ctx;
+	struct intel_engine_cs *ring;
+};
+
 /* This must match up with the value previously used for execbuf2.rsvd1. */
 #define DEFAULT_CONTEXT_HANDLE 0
 
@@ -885,6 +894,7 @@ struct intel_context {
 		struct drm_i915_gem_object *state;
 		struct intel_ringbuffer *ringbuf;
 		int pin_count;
+		struct i915_fence_timeline fence_timeline;
 	} engine[I915_NUM_RINGS];
 
 	struct list_head link;
@@ -2153,13 +2163,10 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
 struct drm_i915_gem_request {
 	/**
 	 * Underlying object for implementing the signal/wait stuff.
-	 * NB: Never call fence_later() or return this fence object to user
-	 * land! Due to lazy allocation, scheduler re-ordering, pre-emption,
-	 * etc., there is no guarantee at all about the validity or
-	 * sequentiality of the fence's seqno! It is also unsafe to let
-	 * anything outside of the i915 driver get hold of the fence object
-	 * as the clean up when decrementing the reference count requires
-	 * holding the driver mutex lock.
+	 * NB: Never return this fence object to user land! It is unsafe to
+	 * let anything outside of the i915 driver get hold of the fence
+	 * object as the clean up when decrementing the reference count
+	 * requires holding the driver mutex lock.
 	 */
 	struct fence fence;
 
@@ -2239,6 +2246,10 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 			   struct drm_i915_gem_request **req_out);
 void i915_gem_request_cancel(struct drm_i915_gem_request *req);
 
+int i915_create_fence_timeline(struct drm_device *dev,
+			       struct intel_context *ctx,
+			       struct intel_engine_cs *ring);
+
 static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
 {
 	return fence_is_signaled(&req->fence);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 3970250..af79716 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2671,6 +2671,25 @@ static bool i915_gem_request_is_completed(struct fence *req_fence)
 	return i915_seqno_passed(seqno, req->seqno);
 }
 
+static void i915_fence_timeline_value_str(struct fence *fence, char *str, int size)
+{
+	struct drm_i915_gem_request *req;
+
+	req = container_of(fence, typeof(*req), fence);
+
+	/* Last signalled timeline value ??? */
+	snprintf(str, size, "? [%d]"/*, tl->value*/, req->ring->get_seqno(req->ring, true));
+}
+
+static void i915_fence_value_str(struct fence *fence, char *str, int size)
+{
+	struct drm_i915_gem_request *req;
+
+	req = container_of(fence, typeof(*req), fence);
+
+	snprintf(str, size, "%d [%d]", req->fence.seqno, req->seqno);
+}
+
 static const struct fence_ops i915_gem_request_fops = {
 	.get_driver_name	= i915_gem_request_get_driver_name,
 	.get_timeline_name	= i915_gem_request_get_timeline_name,
@@ -2678,8 +2697,48 @@ static const struct fence_ops i915_gem_request_fops = {
 	.signaled		= i915_gem_request_is_completed,
 	.wait			= fence_default_wait,
 	.release		= i915_gem_request_free,
+	.fence_value_str	= i915_fence_value_str,
+	.timeline_value_str	= i915_fence_timeline_value_str,
 };
 
+int i915_create_fence_timeline(struct drm_device *dev,
+			       struct intel_context *ctx,
+			       struct intel_engine_cs *ring)
+{
+	struct i915_fence_timeline *timeline;
+
+	timeline = &ctx->engine[ring->id].fence_timeline;
+
+	if (timeline->ring)
+		return 0;
+
+	timeline->fence_context = fence_context_alloc(1);
+
+	/*
+	 * Start the timeline from seqno 0 as this is a special value
+	 * that is reserved for invalid sync points.
+	 */
+	timeline->next       = 1;
+	timeline->ctx        = ctx;
+	timeline->ring       = ring;
+
+	return 0;
+}
+
+static uint32_t i915_fence_timeline_get_next_seqno(struct i915_fence_timeline *timeline)
+{
+	uint32_t seqno;
+
+	seqno = timeline->next;
+
+	/* Reserve zero for invalid */
+	if (++timeline->next == 0) {
+		timeline->next = 1;
+	}
+
+	return seqno;
+}
+
 int i915_gem_request_alloc(struct intel_engine_cs *ring,
 			   struct intel_context *ctx,
 			   struct drm_i915_gem_request **req_out)
@@ -2715,7 +2774,9 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 		goto err;
 	}
 
-	fence_init(&req->fence, &i915_gem_request_fops, &ring->fence_lock, ring->fence_context, req->seqno);
+	fence_init(&req->fence, &i915_gem_request_fops, &ring->fence_lock,
+		   ctx->engine[ring->id].fence_timeline.fence_context,
+		   i915_fence_timeline_get_next_seqno(&ctx->engine[ring->id].fence_timeline));
 
 	/*
 	 * Reserve space in the ring buffer for all the commands required to
@@ -5065,7 +5126,7 @@ i915_gem_init_hw(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_engine_cs *ring;
-	int ret, i, j, fence_base;
+	int ret, i, j;
 
 	if (INTEL_INFO(dev)->gen < 6 && !intel_enable_gtt())
 		return -EIO;
@@ -5117,16 +5178,12 @@ i915_gem_init_hw(struct drm_device *dev)
 			goto out;
 	}
 
-	fence_base = fence_context_alloc(I915_NUM_RINGS);
-
 	/* Now it is safe to go back round and do everything else: */
 	for_each_ring(ring, dev_priv, i) {
 		struct drm_i915_gem_request *req;
 
 		WARN_ON(!ring->default_context);
 
-		ring->fence_context = fence_base + i;
-
 		ret = i915_gem_request_alloc(ring, ring->default_context, &req);
 		if (ret) {
 			i915_gem_cleanup_ringbuffer(dev);
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index b77a8f7..7eb8694 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -242,7 +242,7 @@ i915_gem_create_context(struct drm_device *dev,
 {
 	const bool is_global_default_ctx = file_priv == NULL;
 	struct intel_context *ctx;
-	int ret = 0;
+	int i, ret = 0;
 
 	BUG_ON(!mutex_is_locked(&dev->struct_mutex));
 
@@ -250,6 +250,19 @@ i915_gem_create_context(struct drm_device *dev,
 	if (IS_ERR(ctx))
 		return ctx;
 
+	if (!i915.enable_execlists) {
+		struct intel_engine_cs *ring;
+
+		/* Create a per context timeline for fences */
+		for_each_ring(ring, to_i915(dev), i) {
+			ret = i915_create_fence_timeline(dev, ctx, ring);
+			if (ret) {
+				DRM_ERROR("Fence timeline creation failed for legacy %s: %p\n", ring->name, ctx);
+				goto err_destroy;
+			}
+		}
+	}
+
 	if (is_global_default_ctx && ctx->legacy_hw_ctx.rcs_state) {
 		/* We may need to do things with the shrinker which
 		 * require us to immediately switch back to the default
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index ee4aecd..8f255de 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2376,6 +2376,14 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
 		goto error;
 	}
 
+	/* Create a per context timeline for fences */
+	ret = i915_create_fence_timeline(dev, ctx, ring);
+	if (ret) {
+		DRM_ERROR("Fence timeline creation failed for ring %s, ctx %p\n",
+			  ring->name, ctx);
+		goto error;
+	}
+
 	ctx->engine[ring->id].ringbuf = ringbuf;
 	ctx->engine[ring->id].state = ctx_obj;
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index a4b0545..e2eebc0 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -347,7 +347,6 @@ struct  intel_engine_cs {
 	 */
 	u32 (*get_cmd_length_mask)(u32 cmd_header);
 
-	unsigned fence_context;
 	spinlock_t fence_lock;
 };
 
-- 
1.9.1

* [RFC 6/9] drm/i915: Delay the freeing of requests until retire time
  2015-07-17 14:31 [RFC 0/9] Convert requests to use struct fence John.C.Harrison
                   ` (4 preceding siblings ...)
  2015-07-17 14:31 ` [RFC 5/9] drm/i915: Add per context timelines to fence object John.C.Harrison
@ 2015-07-17 14:31 ` John.C.Harrison
  2015-07-23 14:25   ` Tvrtko Ursulin
  2015-07-17 14:31 ` [RFC 7/9] drm/i915: Interrupt driven fences John.C.Harrison
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 38+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:31 UTC
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The request structure is reference counted. Previously, when the count
reached zero the request was immediately freed and all associated objects
were unreferenced/deallocated. This meant that the driver mutex lock had to
be held at the point where the count reached zero. That was fine while all
references were held internally to the driver. However, the plan is to allow
the underlying fence object (and hence the request itself) to be returned to
other drivers and to userland. External users cannot be expected to acquire
a driver-private mutex lock.

Rather than attempt to disentangle the request structure from the
driver mutex lock, the decision was to defer the freeing code until a
later (safer) point. Hence this patch changes the unreference callback
to merely move the request onto a delayed-free list. The driver's
retire worker thread then processes that list and actually calls the
free function on the requests.
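
This is the usual deferred-destruction pattern: the release callback, which
may be invoked from any context, only queues the object on a spinlock
protected list, and a worker that can safely take the mutex performs the
real teardown. A generic sketch of the shape (hypothetical names; the i915
specific version is in the diff below):

	/* Called from fence_put(); no mutex may be taken here. */
	static void my_request_release(struct fence *f)
	{
		struct my_request *req =
			container_of(f, struct my_request, fence);
		unsigned long flags;

		spin_lock_irqsave(&req->engine->free_lock, flags);
		list_add_tail(&req->free_link, &req->engine->free_list);
		spin_unlock_irqrestore(&req->engine->free_lock, flags);

		/* the retire worker drains free_list under the mutex */
		queue_work(system_wq, &req->engine->retire_work);
	}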

[new patch in series]

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         | 22 +++---------------
 drivers/gpu/drm/i915/i915_gem.c         | 41 +++++++++++++++++++++++++++++----
 drivers/gpu/drm/i915/intel_display.c    |  2 +-
 drivers/gpu/drm/i915/intel_lrc.c        |  2 ++
 drivers/gpu/drm/i915/intel_pm.c         |  2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c |  2 ++
 drivers/gpu/drm/i915/intel_ringbuffer.h |  4 ++++
 7 files changed, 50 insertions(+), 25 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 88a4746..61c3db2 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2161,14 +2161,9 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
  * initial reference taken using kref_init
  */
 struct drm_i915_gem_request {
-	/**
-	 * Underlying object for implementing the signal/wait stuff.
-	 * NB: Never return this fence object to user land! It is unsafe to
-	 * let anything outside of the i915 driver get hold of the fence
-	 * object as the clean up when decrementing the reference count
-	 * requires holding the driver mutex lock.
-	 */
+	/** Underlying object for implementing the signal/wait stuff. */
 	struct fence fence;
+	struct list_head delay_free_list;
 
 	/** On Which ring this request was generated */
 	struct drm_i915_private *i915;
@@ -2281,21 +2276,10 @@ i915_gem_request_reference(struct drm_i915_gem_request *req)
 static inline void
 i915_gem_request_unreference(struct drm_i915_gem_request *req)
 {
-	WARN_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
-	fence_put(&req->fence);
-}
-
-static inline void
-i915_gem_request_unreference__unlocked(struct drm_i915_gem_request *req)
-{
-	struct drm_device *dev;
-
 	if (!req)
 		return;
 
-	dev = req->ring->dev;
-	if (kref_put_mutex(&req->fence.refcount, fence_release, &dev->struct_mutex))
-		mutex_unlock(&dev->struct_mutex);
+	fence_put(&req->fence);
 }
 
 static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index af79716..482835a 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2616,10 +2616,27 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv,
 	}
 }
 
-static void i915_gem_request_free(struct fence *req_fence)
+static void i915_gem_request_release(struct fence *req_fence)
 {
 	struct drm_i915_gem_request *req = container_of(req_fence,
 						 typeof(*req), fence);
+	struct intel_engine_cs *ring = req->ring;
+	struct drm_i915_private *dev_priv = to_i915(ring->dev);
+	unsigned long flags;
+
+	/*
+	 * Need to add the request to a deferred dereference list to be
+	 * processed at a mutex lock safe time.
+	 */
+	spin_lock_irqsave(&ring->delayed_free_lock, flags);
+	list_add_tail(&req->delay_free_list, &ring->delayed_free_list);
+	spin_unlock_irqrestore(&ring->delayed_free_lock, flags);
+
+	queue_delayed_work(dev_priv->wq, &dev_priv->mm.retire_work, 0);
+}
+
+static void i915_gem_request_free(struct drm_i915_gem_request *req)
+{
 	struct intel_context *ctx = req->ctx;
 
 	BUG_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
@@ -2696,7 +2713,7 @@ static const struct fence_ops i915_gem_request_fops = {
 	.enable_signaling	= i915_gem_request_enable_signaling,
 	.signaled		= i915_gem_request_is_completed,
 	.wait			= fence_default_wait,
-	.release		= i915_gem_request_free,
+	.release		= i915_gem_request_release,
 	.fence_value_str	= i915_fence_value_str,
 	.timeline_value_str	= i915_fence_timeline_value_str,
 };
@@ -2992,6 +3009,21 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 		i915_gem_request_assign(&ring->trace_irq_req, NULL);
 	}
 
+	while (!list_empty(&ring->delayed_free_list)) {
+		struct drm_i915_gem_request *request;
+		unsigned long flags;
+
+		request = list_first_entry(&ring->delayed_free_list,
+					   struct drm_i915_gem_request,
+					   delay_free_list);
+
+		spin_lock_irqsave(&ring->delayed_free_lock, flags);
+		list_del(&request->delay_free_list);
+		spin_unlock_irqrestore(&ring->delayed_free_lock, flags);
+
+		i915_gem_request_free(request);
+	}
+
 	WARN_ON(i915_verify_lists(ring->dev));
 }
 
@@ -3182,7 +3214,7 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 			ret = __i915_wait_request(req[i], reset_counter, true,
 						  args->timeout_ns > 0 ? &args->timeout_ns : NULL,
 						  file->driver_priv);
-		i915_gem_request_unreference__unlocked(req[i]);
+		i915_gem_request_unreference(req[i]);
 	}
 	return ret;
 
@@ -4425,7 +4457,7 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 	if (ret == 0)
 		queue_delayed_work(dev_priv->wq, &dev_priv->mm.retire_work, 0);
 
-	i915_gem_request_unreference__unlocked(target);
+	i915_gem_request_unreference(target);
 
 	return ret;
 }
@@ -5313,6 +5345,7 @@ init_ring_lists(struct intel_engine_cs *ring)
 {
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
+	INIT_LIST_HEAD(&ring->delayed_free_list);
 }
 
 void i915_init_vm(struct drm_i915_private *dev_priv,
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 93ac43c..59541ad 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11289,7 +11289,7 @@ static void intel_mmio_flip_work_func(struct work_struct *work)
 
 	intel_do_mmio_flip(mmio_flip->crtc);
 
-	i915_gem_request_unreference__unlocked(mmio_flip->req);
+	i915_gem_request_unreference(mmio_flip->req);
 	kfree(mmio_flip);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 8f255de..9ee80f5 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1808,7 +1808,9 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 	ring->dev = dev;
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
+	INIT_LIST_HEAD(&ring->delayed_free_list);
 	spin_lock_init(&ring->fence_lock);
+	spin_lock_init(&ring->delayed_free_lock);
 	i915_gem_batch_pool_init(dev, &ring->batch_pool);
 	init_waitqueue_head(&ring->irq_queue);
 
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index cb08d9e..52b1c37 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -7317,7 +7317,7 @@ static void __intel_rps_boost_work(struct work_struct *work)
 		gen6_rps_boost(to_i915(req->ring->dev), NULL,
 			       req->emitted_jiffies);
 
-	i915_gem_request_unreference__unlocked(req);
+	i915_gem_request_unreference(req);
 	kfree(boost);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index d1ced30..11494a3 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2040,7 +2040,9 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
 	INIT_LIST_HEAD(&ring->execlist_queue);
+	INIT_LIST_HEAD(&ring->delayed_free_list);
 	spin_lock_init(&ring->fence_lock);
+	spin_lock_init(&ring->delayed_free_lock);
 	i915_gem_batch_pool_init(dev, &ring->batch_pool);
 	ringbuf->size = 32 * PAGE_SIZE;
 	ringbuf->ring = ring;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index e2eebc0..68173a3 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -299,6 +299,10 @@ struct  intel_engine_cs {
 	 */
 	u32 last_submitted_seqno;
 
+	/* deferred free list to allow unreferencing requests outside the driver */
+	struct list_head delayed_free_list;
+	spinlock_t delayed_free_lock;
+
 	bool gpu_caches_dirty;
 
 	wait_queue_head_t irq_queue;
-- 
1.9.1

* [RFC 7/9] drm/i915: Interrupt driven fences
  2015-07-17 14:31 [RFC 0/9] Convert requests to use struct fence John.C.Harrison
                   ` (5 preceding siblings ...)
  2015-07-17 14:31 ` [RFC 6/9] drm/i915: Delay the freeing of requests until retire time John.C.Harrison
@ 2015-07-17 14:31 ` John.C.Harrison
  2015-07-20  9:09   ` Maarten Lankhorst
                     ` (3 more replies)
  2015-07-17 14:31 ` [RFC 8/9] drm/i915: Updated request structure tracing John.C.Harrison
  2015-07-17 14:31 ` [RFC 9/9] drm/i915: Add sync framework support to execbuff IOCTL John.C.Harrison
  8 siblings, 4 replies; 38+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:31 UTC
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The intended usage model for struct fence is that the signalled status should be
set on demand rather than polled. That is, there should not be a need for a
'signaled' function to be called every time the status is queried. Instead,
'something' should be done to enable a signal callback from the hardware which
will update the state directly. In the case of requests, this is the seqno
update interrupt. The idea is that this callback will only be enabled on demand
when something actually tries to wait on the fence.

This change removes the polling test and replaces it with the callback scheme.
Each fence is added to a 'please poke me' list at the start of
i915_add_request(). The interrupt handler then scans through the 'poke me' list
when a new seqno pops out and signals any matching fence/request. The fence is
then removed from the list so the entire request stack does not need to be
scanned every time. Note that the fence is added to the list before the commands
to generate the seqno interrupt are added to the ring. Thus the sequence is
guaranteed to be race free if the interrupt is already enabled.

Note that the interrupt is only enabled on demand (i.e. when __wait_request() is
called). Thus there is still a potential race when enabling the interrupt as the
request may already have completed. However, this is simply solved by calling
the interrupt processing code immediately after enabling the interrupt and
thereby checking for already completed requests.
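
That 'enable, then re-check' ordering is what closes the window where a
completion races with arming the interrupt. A sketch of the ordering, with
hypothetical names:

	static void enable_completion_irq(struct my_request *req)
	{
		if (req->irq_enabled)
			return;

		req->engine->irq_get(req->engine);	/* arm the interrupt */
		req->irq_enabled = true;

		/*
		 * The request may have completed before the interrupt was
		 * armed, in which case no interrupt will ever fire for it.
		 * Run the notify path once by hand to catch that case.
		 */
		notify_completed_requests(req->engine);
	}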

Lastly, the ring clean-up code has the possibility to cancel outstanding
requests (e.g. because TDR has reset the ring). These requests will never get
signalled and so must be removed from the signal list manually. This is done by
setting a 'cancelled' flag and then calling the regular notify/retire code path
rather than attempting to duplicate the list manipulation and clean-up code in
multiple places. This also avoids any race condition where the cancellation
request might occur after/during the completion interrupt actually arriving.

v2: Updated to take advantage of the request unreference no longer requiring the
mutex lock.

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         |   8 ++
 drivers/gpu/drm/i915/i915_gem.c         | 132 +++++++++++++++++++++++++++++---
 drivers/gpu/drm/i915/i915_irq.c         |   2 +
 drivers/gpu/drm/i915/intel_lrc.c        |   1 +
 drivers/gpu/drm/i915/intel_ringbuffer.c |   1 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |   1 +
 6 files changed, 136 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 61c3db2..d7f1aa5 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2163,7 +2163,11 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
 struct drm_i915_gem_request {
 	/** Underlying object for implementing the signal/wait stuff. */
 	struct fence fence;
+	struct list_head signal_list;
+	struct list_head unsignal_list;
 	struct list_head delay_free_list;
+	bool cancelled;
+	bool irq_enabled;
 
 	/** On Which ring this request was generated */
 	struct drm_i915_private *i915;
@@ -2241,6 +2245,10 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 			   struct drm_i915_gem_request **req_out);
 void i915_gem_request_cancel(struct drm_i915_gem_request *req);
 
+void i915_gem_request_submit(struct drm_i915_gem_request *req);
+void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req);
+void i915_gem_request_notify(struct intel_engine_cs *ring);
+
 int i915_create_fence_timeline(struct drm_device *dev,
 			       struct intel_context *ctx,
 			       struct intel_engine_cs *ring);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 482835a..7c589a9 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1222,6 +1222,11 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	if (list_empty(&req->list))
 		return 0;
 
+	/*
+	 * Enable interrupt completion of the request.
+	 */
+	i915_gem_request_enable_interrupt(req);
+
 	if (i915_gem_request_completed(req))
 		return 0;
 
@@ -1382,6 +1387,10 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
 	list_del_init(&request->list);
 	i915_gem_request_remove_from_client(request);
 
+	/* In case the request is still in the signal pending list */
+	if (!list_empty(&request->signal_list))
+		request->cancelled = true;
+
 	i915_gem_request_unreference(request);
 }
 
@@ -2534,6 +2543,12 @@ void __i915_add_request(struct drm_i915_gem_request *request,
 	 */
 	request->postfix = intel_ring_get_tail(ringbuf);
 
+	/*
+	 * Add the fence to the pending list before emitting the commands to
+	 * generate a seqno notification interrupt.
+	 */
+	i915_gem_request_submit(request);
+
 	if (i915.enable_execlists)
 		ret = ring->emit_request(request);
 	else {
@@ -2653,6 +2668,9 @@ static void i915_gem_request_free(struct drm_i915_gem_request *req)
 		i915_gem_context_unreference(ctx);
 	}
 
+	if (req->irq_enabled)
+		req->ring->irq_put(req->ring);
+
 	kmem_cache_free(req->i915->requests, req);
 }
 
@@ -2668,24 +2686,105 @@ static const char *i915_gem_request_get_timeline_name(struct fence *req_fence)
 	return req->ring->name;
 }
 
-static bool i915_gem_request_enable_signaling(struct fence *req_fence)
+/*
+ * The request has been submitted to the hardware so add the fence to the
+ * list of signalable fences.
+ *
+ * NB: This does not enable interrupts yet. That only occurs on demand when
+ * the request is actually waited on. However, adding it to the list early
+ * ensures that there is no race condition where the interrupt could pop
+ * out prematurely and thus be completely lost. The race is merely that the
+ * interrupt must be manually checked for after being enabled.
+ */
+void i915_gem_request_submit(struct drm_i915_gem_request *req)
 {
-	/* Interrupt driven fences are not implemented yet.*/
-	WARN(true, "This should not be called!");
-	return true;
+	fence_enable_sw_signaling(&req->fence);
 }
 
-static bool i915_gem_request_is_completed(struct fence *req_fence)
+/*
+ * The request is being actively waited on, so enable interrupt based
+ * completion signalling.
+ */
+void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req)
+{
+	if (req->irq_enabled)
+		return;
+
+	WARN_ON(!req->ring->irq_get(req->ring));
+	req->irq_enabled = true;
+
+	/*
+	 * Because the interrupt is only enabled on demand, there is a race
+	 * where the interrupt can fire before anyone is looking for it. So
+	 * do an explicit check for missed interrupts.
+	 */
+	i915_gem_request_notify(req->ring);
+}
+
+static bool i915_gem_request_enable_signaling(struct fence *req_fence)
 {
 	struct drm_i915_gem_request *req = container_of(req_fence,
 						 typeof(*req), fence);
+
+	i915_gem_request_reference(req);
+	WARN_ON(!list_empty(&req->signal_list));
+	list_add_tail(&req->signal_list, &req->ring->fence_signal_list);
+
+	/*
+	 * Note that signalling is always enabled for every request before
+	 * that request is submitted to the hardware. Therefore there is
+	 * no race condition whereby the signal could pop out before the
+	 * request has been added to the list. Hence no need to check
+	 * for completion, undo the list add and return false.
+	 *
+	 * NB: Interrupts are only enabled on demand. Thus there is still a
+	 * race where the request could complete before the interrupt has
+	 * been enabled. Thus care must be taken at that point.
+	 */
+
+	return true;
+}
+
+void i915_gem_request_notify(struct intel_engine_cs *ring)
+{
+	struct drm_i915_gem_request *req, *req_next;
+	unsigned long flags;
 	u32 seqno;
+	LIST_HEAD(free_list);
 
-	BUG_ON(req == NULL);
+	if (list_empty(&ring->fence_signal_list))
+		return;
+
+	seqno = ring->get_seqno(ring, false);
+
+	spin_lock_irqsave(&ring->fence_lock, flags);
+	list_for_each_entry_safe(req, req_next, &ring->fence_signal_list, signal_list) {
+		if (!req->cancelled) {
+			if (!i915_seqno_passed(seqno, req->seqno))
+				continue;
 
-	seqno = req->ring->get_seqno(req->ring, false/*lazy_coherency*/);
+			fence_signal_locked(&req->fence);
+		}
+
+		list_del_init(&req->signal_list);
+		if (req->irq_enabled) {
+			req->ring->irq_put(req->ring);
+			req->irq_enabled = false;
+		}
 
-	return i915_seqno_passed(seqno, req->seqno);
+		/* Can't unreference here because that might grab fence_lock */
+		list_add_tail(&req->unsignal_list, &free_list);
+	}
+	spin_unlock_irqrestore(&ring->fence_lock, flags);
+
+	/* It should now be safe to actually free the requests */
+	while (!list_empty(&free_list)) {
+		req = list_first_entry(&free_list,
+				       struct drm_i915_gem_request, unsignal_list);
+		list_del(&req->unsignal_list);
+
+		i915_gem_request_unreference(req);
+	}
 }
 
 static void i915_fence_timeline_value_str(struct fence *fence, char *str, int size)
@@ -2711,7 +2810,6 @@ static const struct fence_ops i915_gem_request_fops = {
 	.get_driver_name	= i915_gem_request_get_driver_name,
 	.get_timeline_name	= i915_gem_request_get_timeline_name,
 	.enable_signaling	= i915_gem_request_enable_signaling,
-	.signaled		= i915_gem_request_is_completed,
 	.wait			= fence_default_wait,
 	.release		= i915_gem_request_release,
 	.fence_value_str	= i915_fence_value_str,
@@ -2791,6 +2889,7 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 		goto err;
 	}
 
+	INIT_LIST_HEAD(&req->signal_list);
 	fence_init(&req->fence, &i915_gem_request_fops, &ring->fence_lock,
 		   ctx->engine[ring->id].fence_timeline.fence_context,
 		   i915_fence_timeline_get_next_seqno(&ctx->engine[ring->id].fence_timeline));
@@ -2913,6 +3012,13 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
 
 		i915_gem_request_retire(request);
 	}
+
+	/*
+	 * Make sure any requests that were on the signal pending list get
+	 * cleaned up.
+	 */
+	i915_gem_request_notify(ring);
+	i915_gem_retire_requests_ring(ring);
 }
 
 void i915_gem_restore_fences(struct drm_device *dev)
@@ -2968,6 +3074,13 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 {
 	WARN_ON(i915_verify_lists(ring->dev));
 
+	/*
+	 * If no-one has waited on a request recently then interrupts will
+	 * not have been enabled and thus no requests will ever be marked as
+	 * completed. So do an interrupt check now.
+	 */
+	i915_gem_request_notify(ring);
+
 	/* Retire requests first as we use it above for the early return.
 	 * If we retire requests last, we may use a later seqno and so clear
 	 * the requests lists without clearing the active list, leading to
@@ -5345,6 +5458,7 @@ init_ring_lists(struct intel_engine_cs *ring)
 {
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
+	INIT_LIST_HEAD(&ring->fence_signal_list);
 	INIT_LIST_HEAD(&ring->delayed_free_list);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index d87f173..e446509 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -853,6 +853,8 @@ static void notify_ring(struct intel_engine_cs *ring)
 
 	trace_i915_gem_request_notify(ring);
 
+	i915_gem_request_notify(ring);
+
 	wake_up_all(&ring->irq_queue);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 9ee80f5..18dbd5c 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1808,6 +1808,7 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 	ring->dev = dev;
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
+	INIT_LIST_HEAD(&ring->fence_signal_list);
 	INIT_LIST_HEAD(&ring->delayed_free_list);
 	spin_lock_init(&ring->fence_lock);
 	spin_lock_init(&ring->delayed_free_lock);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 11494a3..83a5254 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2040,6 +2040,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
 	INIT_LIST_HEAD(&ring->execlist_queue);
+	INIT_LIST_HEAD(&ring->fence_signal_list);
 	INIT_LIST_HEAD(&ring->delayed_free_list);
 	spin_lock_init(&ring->fence_lock);
 	spin_lock_init(&ring->delayed_free_lock);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 68173a3..2e68b73 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -352,6 +352,7 @@ struct  intel_engine_cs {
 	u32 (*get_cmd_length_mask)(u32 cmd_header);
 
 	spinlock_t fence_lock;
+	struct list_head fence_signal_list;
 };
 
 bool intel_ring_initialized(struct intel_engine_cs *ring);
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC 8/9] drm/i915: Updated request structure tracing
  2015-07-17 14:31 [RFC 0/9] Convert requests to use struct fence John.C.Harrison
                   ` (6 preceding siblings ...)
  2015-07-17 14:31 ` [RFC 7/9] drm/i915: Interrupt driven fences John.C.Harrison
@ 2015-07-17 14:31 ` John.C.Harrison
  2015-07-17 14:31 ` [RFC 9/9] drm/i915: Add sync framework support to execbuff IOCTL John.C.Harrison
  8 siblings, 0 replies; 38+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:31 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Added the '_complete' trace event which occurs when a fence/request is signaled
as complete. Also moved the notify event from the IRQ handler code to inside the
notify function itself.
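
With the extra field the notify event then comes out roughly as follows
(illustrative values only):

	i915_gem_request_notify: dev=0, ring=0, seqno=42, empty=0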

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c   | 3 +++
 drivers/gpu/drm/i915/i915_irq.c   | 2 --
 drivers/gpu/drm/i915/i915_trace.h | 7 +++++--
 3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 7c589a9..3f20087 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2752,6 +2752,8 @@ void i915_gem_request_notify(struct intel_engine_cs *ring)
 	u32 seqno;
 	LIST_HEAD(free_list);
 
+	trace_i915_gem_request_notify(ring);
+
 	if (list_empty(&ring->fence_signal_list))
 		return;
 
@@ -2764,6 +2766,7 @@ void i915_gem_request_notify(struct intel_engine_cs *ring)
 				continue;
 
 			fence_signal_locked(&req->fence);
+			trace_i915_gem_request_complete(req);
 		}
 
 		list_del_init(&req->signal_list);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index e446509..d4500cc 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -851,8 +851,6 @@ static void notify_ring(struct intel_engine_cs *ring)
 	if (!intel_ring_initialized(ring))
 		return;
 
-	trace_i915_gem_request_notify(ring);
-
 	i915_gem_request_notify(ring);
 
 	wake_up_all(&ring->irq_queue);
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 2f34c47..f455194 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -532,16 +532,19 @@ TRACE_EVENT(i915_gem_request_notify,
 			     __field(u32, dev)
 			     __field(u32, ring)
 			     __field(u32, seqno)
+			     __field(bool, is_empty)
 			     ),
 
 	    TP_fast_assign(
 			   __entry->dev = ring->dev->primary->index;
 			   __entry->ring = ring->id;
 			   __entry->seqno = ring->get_seqno(ring, false);
+			   __entry->is_empty = list_empty(&ring->fence_signal_list);
 			   ),
 
-	    TP_printk("dev=%u, ring=%u, seqno=%u",
-		      __entry->dev, __entry->ring, __entry->seqno)
+	    TP_printk("dev=%u, ring=%u, seqno=%u, empty=%d",
+		      __entry->dev, __entry->ring, __entry->seqno,
+		      __entry->is_empty)
 );
 
 DEFINE_EVENT(i915_gem_request, i915_gem_request_retire,
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [RFC 9/9] drm/i915: Add sync framework support to execbuff IOCTL
  2015-07-17 14:31 [RFC 0/9] Convert requests to use struct fence John.C.Harrison
                   ` (7 preceding siblings ...)
  2015-07-17 14:31 ` [RFC 8/9] drm/i915: Updated request structure tracing John.C.Harrison
@ 2015-07-17 14:31 ` John.C.Harrison
  2015-07-27 13:00   ` Tvrtko Ursulin
  8 siblings, 1 reply; 38+ messages in thread
From: John.C.Harrison @ 2015-07-17 14:31 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Various projects desire a mechanism for managing dependencies between
work items asynchronously. This can also include work items across
completely different and independent systems. For example, an
application wants to retrieve a frame from a video-in device, use it
for rendering on a GPU, then send it to the video-out device for
display, all without having to stall waiting for completion along the
way. The sync framework allows this. It encapsulates synchronisation
events in file descriptors. The application can request a sync point
for the completion of each piece of work. Drivers should also take
sync points in with each new work request and not schedule the work to
start until the sync has been signalled.

This patch adds sync framework support to the exec buffer IOCTL. A
sync point can be passed in to stall execution of the batch buffer
until signalled. And a sync point can be returned after each batch
buffer submission which will be signalled upon that batch buffer's
completion.
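
As a rough illustration (not part of the patch), a userspace submission
using both flags would look something like this; buffer setup and error
handling are omitted, 'fd' is the opened DRM device and 'in_fence_fd'
is assumed to have come from a previous submission or another driver:

	struct drm_i915_gem_execbuffer2 execbuf = { 0 };

	/* buffers_ptr, buffer_count, batch_len, etc. set up as usual */

	/* stall the batch on an existing sync point */
	execbuf.flags |= I915_EXEC_WAIT_FENCE;
	execbuf.rsvd2 = (__u64) in_fence_fd;

	/* and request a sync point for this batch's completion */
	execbuf.flags |= I915_EXEC_CREATE_FENCE;

	drmIoctl(fd, DRM_IOCTL_I915_GEM_EXECBUFFER2, &execbuf);

	/* rsvd2 now holds the new fence fd, or -1 on failure */
	int out_fence_fd = (int) execbuf.rsvd2;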

At present, the input sync point is simply waited on synchronously
inside the exec buffer IOCTL call. Once the GPU scheduler arrives,
this will be handled asynchronously inside the scheduler and the IOCTL
can return without having to wait.

Note also that the scheduler will re-order the execution of batch
buffers, e.g. because a batch buffer is stalled on a sync point and
cannot be submitted yet but other, independent, batch buffers are
being presented to the driver. This means that the timeline within the
sync points returned cannot be global to the engine. Instead it must
be kept per context, per engine (the scheduler may not re-order batches
within a context). Hence the timeline cannot be based on the existing
seqno values but must be a new implementation.
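
As a quick illustration of the problem, assume context A's batch is
stalled on an external sync point while an independent batch from
context B is runnable:

	ctx A, engine 0: batch A1 - blocked on sync point, completes second
	ctx B, engine 0: batch B1 - runnable, completes first

On a single engine-global timeline, signalling B1's point would imply
that all earlier points on that timeline (A1's included) had completed
too, which is no longer true once batches are re-ordered. A per-context
timeline only ever advances in the order batches complete within that
context, so the ordering guarantee holds.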

This patch is a port of work by several people that has been pulled
across from Android. It has been updated several times across several
patches. Rather than attempt to port each individual patch, this
version is the finished product as a single patch. The various
contributors/authors along the way (in addition to myself) were:
  Satyanantha RamaGopal M <rama.gopal.m.satyanantha@intel.com>
  Tvrtko Ursulin <tvrtko.ursulin@intel.com>
  Michel Thierry <michel.thierry@intel.com>
  Arun Siluvery <arun.siluvery@linux.intel.com>

[new patch in series]

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h            |  6 ++
 drivers/gpu/drm/i915/i915_gem.c            | 84 ++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_execbuffer.c | 90 ++++++++++++++++++++++++++++--
 include/uapi/drm/i915_drm.h                | 16 +++++-
 4 files changed, 188 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index d7f1aa5..cf6b7cd 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2168,6 +2168,7 @@ struct drm_i915_gem_request {
 	struct list_head delay_free_list;
 	bool cancelled;
 	bool irq_enabled;
+	bool fence_external;
 
 	/** On Which ring this request was generated */
 	struct drm_i915_private *i915;
@@ -2252,6 +2253,11 @@ void i915_gem_request_notify(struct intel_engine_cs *ring);
 int i915_create_fence_timeline(struct drm_device *dev,
 			       struct intel_context *ctx,
 			       struct intel_engine_cs *ring);
+#ifdef CONFIG_SYNC
+struct sync_fence;
+int i915_create_sync_fence(struct drm_i915_gem_request *req, int *fence_fd);
+bool i915_safe_to_ignore_fence(struct intel_engine_cs *ring, struct sync_fence *fence);
+#endif
 
 static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
 {
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 3f20087..de93422 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -37,6 +37,9 @@
 #include <linux/swap.h>
 #include <linux/pci.h>
 #include <linux/dma-buf.h>
+#ifdef CONFIG_SYNC
+#include <../drivers/staging/android/sync.h>
+#endif
 
 #define RQ_BUG_ON(expr)
 
@@ -2549,6 +2552,15 @@ void __i915_add_request(struct drm_i915_gem_request *request,
 	 */
 	i915_gem_request_submit(request);
 
+	/*
+	 * If an external sync point has been requested for this request then
+	 * it can be waited on without the driver's knowledge, i.e. without
+	 * calling __i915_wait_request(). Thus interrupts must be enabled
+	 * from the start rather than only on demand.
+	 */
+	if (request->fence_external)
+		i915_gem_request_enable_interrupt(request);
+
 	if (i915.enable_execlists)
 		ret = ring->emit_request(request);
 	else {
@@ -2857,6 +2869,78 @@ static uint32_t i915_fence_timeline_get_next_seqno(struct i915_fence_timeline *t
 	return seqno;
 }
 
+#ifdef CONFIG_SYNC
+int i915_create_sync_fence(struct drm_i915_gem_request *req, int *fence_fd)
+{
+	char ring_name[] = "i915_ring0";
+	struct sync_fence *sync_fence;
+	int fd;
+
+	fd = get_unused_fd_flags(O_CLOEXEC);
+	if (fd < 0) {
+		DRM_DEBUG("No available file descriptors!\n");
+		*fence_fd = -1;
+		return fd;
+	}
+
+	ring_name[9] += req->ring->id;
+	sync_fence = sync_fence_create_dma(ring_name, &req->fence);
+	if (!sync_fence) {
+		put_unused_fd(fd);
+		*fence_fd = -1;
+		return -ENOMEM;
+	}
+
+	sync_fence_install(sync_fence, fd);
+	*fence_fd = fd;
+
+	/* Necessary??? Who does the put??? */
+	fence_get(&req->fence);
+
+	req->fence_external = true;
+
+	return 0;
+}
+
+bool i915_safe_to_ignore_fence(struct intel_engine_cs *ring, struct sync_fence *sync_fence)
+{
+	struct fence *dma_fence;
+	struct drm_i915_gem_request *req;
+	bool ignore;
+	int i;
+
+	if (atomic_read(&sync_fence->status) != 0)
+		return true;
+
+	ignore = true;
+	for (i = 0; i < sync_fence->num_fences; i++) {
+		dma_fence = sync_fence->cbs[i].sync_pt;
+
+		/* No need to worry about dead points: */
+		if (fence_is_signaled(dma_fence))
+			continue;
+
+		/* Can't ignore other people's points: */
+		if (dma_fence->ops != &i915_gem_request_fops) {
+			ignore = false;
+			break;
+		}
+
+		req = container_of(dma_fence, typeof(*req), fence);
+
+		/* Can't ignore points on other rings: */
+		if (req->ring != ring) {
+			ignore = false;
+			break;
+		}
+
+		/* Same ring means guaranteed to be in order so ignore it. */
+	}
+
+	return ignore;
+}
+#endif
+
 int i915_gem_request_alloc(struct intel_engine_cs *ring,
 			   struct intel_context *ctx,
 			   struct drm_i915_gem_request **req_out)
diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
index 923a3c4..b1a1659 100644
--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
@@ -26,6 +26,7 @@
  *
  */
 
+#include <linux/syscalls.h>
 #include <drm/drmP.h>
 #include <drm/i915_drm.h>
 #include "i915_drv.h"
@@ -33,6 +34,9 @@
 #include "intel_drv.h"
 #include <linux/dma_remapping.h>
 #include <linux/uaccess.h>
+#ifdef CONFIG_SYNC
+#include <../drivers/staging/android/sync.h>
+#endif
 
 #define  __EXEC_OBJECT_HAS_PIN (1<<31)
 #define  __EXEC_OBJECT_HAS_FENCE (1<<30)
@@ -1403,6 +1407,35 @@ eb_get_batch(struct eb_vmas *eb)
 	return vma->obj;
 }
 
+#ifdef CONFIG_SYNC
+static int i915_early_fence_wait(struct intel_engine_cs *ring, int fence_fd)
+{
+	struct sync_fence *fence;
+	int ret = 0;
+
+	if (fence_fd < 0) {
+		DRM_ERROR("Invalid wait fence fd %d on ring %d\n", fence_fd,
+			  (int) ring->id);
+		return 1;
+	}
+
+	fence = sync_fence_fdget(fence_fd);
+	if (fence == NULL) {
+		DRM_ERROR("Invalid wait fence %d on ring %d\n", fence_fd,
+			  (int) ring->id);
+		return 1;
+	}
+
+	if (atomic_read(&fence->status) == 0) {
+		if (!i915_safe_to_ignore_fence(ring, fence))
+			ret = sync_fence_wait(fence, 1000);
+	}
+
+	sync_fence_put(fence);
+	return ret;
+}
+#endif
+
 static int
 i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		       struct drm_file *file,
@@ -1422,6 +1455,18 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	u32 dispatch_flags;
 	int ret;
 	bool need_relocs;
+	int fd_fence_complete = -1;
+#ifdef CONFIG_SYNC
+	int fd_fence_wait = lower_32_bits(args->rsvd2);
+#endif
+
+	/*
+	 * Make sure a broken fence handle is not returned no matter
+	 * how early an error might be hit. Note that rsvd2 has to be
+	 * saved away first because it is also an input parameter!
+	 */
+	if (args->flags & I915_EXEC_CREATE_FENCE)
+		args->rsvd2 = (__u64) -1;
 
 	if (!i915_gem_check_execbuffer(args))
 		return -EINVAL;
@@ -1505,6 +1550,19 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 		dispatch_flags |= I915_DISPATCH_RS;
 	}
 
+#ifdef CONFIG_SYNC
+	/*
+	 * Without a GPU scheduler, any fence waits must be done up front.
+	 */
+	if (args->flags & I915_EXEC_WAIT_FENCE) {
+		ret = i915_early_fence_wait(ring, fd_fence_wait);
+		if (ret < 0)
+			return ret;
+
+		args->flags &= ~I915_EXEC_WAIT_FENCE;
+	}
+#endif
+
 	intel_runtime_pm_get(dev_priv);
 
 	ret = i915_mutex_lock_interruptible(dev);
@@ -1652,6 +1710,27 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
 	params->batch_obj               = batch_obj;
 	params->ctx                     = ctx;
 
+#ifdef CONFIG_SYNC
+	if (args->flags & I915_EXEC_CREATE_FENCE) {
+		/*
+		 * Caller has requested a sync fence.
+		 * User interrupts will be enabled to make sure that
+		 * the timeline is signalled on completion.
+		 */
+		ret = i915_create_sync_fence(params->request,
+					     &fd_fence_complete);
+		if (ret) {
+			DRM_ERROR("Fence creation failed for ring %d, ctx %p\n",
+				  ring->id, ctx);
+			args->rsvd2 = (__u64) -1;
+			goto err;
+		}
+
+		/* Return the fence through the rsvd2 field */
+		args->rsvd2 = (__u64) fd_fence_complete;
+	}
+#endif
+
 	ret = dev_priv->gt.execbuf_submit(params, args, &eb->vmas);
 
 err_batch_unpin:
@@ -1683,6 +1762,12 @@ pre_mutex_err:
 	/* intel_gpu_busy should also get a ref, so it will free when the device
 	 * is really idle. */
 	intel_runtime_pm_put(dev_priv);
+
+	if (fd_fence_complete != -1) {
+		sys_close(fd_fence_complete);
+		args->rsvd2 = (__u64) -1;
+	}
+
 	return ret;
 }
 
@@ -1788,11 +1873,6 @@ i915_gem_execbuffer2(struct drm_device *dev, void *data,
 		return -EINVAL;
 	}
 
-	if (args->rsvd2 != 0) {
-		DRM_DEBUG("dirty rvsd2 field\n");
-		return -EINVAL;
-	}
-
 	exec2_list = kmalloc(sizeof(*exec2_list)*args->buffer_count,
 			     GFP_TEMPORARY | __GFP_NOWARN | __GFP_NORETRY);
 	if (exec2_list == NULL)
diff --git a/include/uapi/drm/i915_drm.h b/include/uapi/drm/i915_drm.h
index 192027b..9dbf67e 100644
--- a/include/uapi/drm/i915_drm.h
+++ b/include/uapi/drm/i915_drm.h
@@ -250,7 +250,7 @@ typedef struct _drm_i915_sarea {
 #define DRM_IOCTL_I915_HWS_ADDR		DRM_IOW(DRM_COMMAND_BASE + DRM_I915_HWS_ADDR, struct drm_i915_gem_init)
 #define DRM_IOCTL_I915_GEM_INIT		DRM_IOW(DRM_COMMAND_BASE + DRM_I915_GEM_INIT, struct drm_i915_gem_init)
 #define DRM_IOCTL_I915_GEM_EXECBUFFER	DRM_IOW(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER, struct drm_i915_gem_execbuffer)
-#define DRM_IOCTL_I915_GEM_EXECBUFFER2	DRM_IOW(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER2, struct drm_i915_gem_execbuffer2)
+#define DRM_IOCTL_I915_GEM_EXECBUFFER2	DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_EXECBUFFER2, struct drm_i915_gem_execbuffer2)
 #define DRM_IOCTL_I915_GEM_PIN		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_PIN, struct drm_i915_gem_pin)
 #define DRM_IOCTL_I915_GEM_UNPIN	DRM_IOW(DRM_COMMAND_BASE + DRM_I915_GEM_UNPIN, struct drm_i915_gem_unpin)
 #define DRM_IOCTL_I915_GEM_BUSY		DRM_IOWR(DRM_COMMAND_BASE + DRM_I915_GEM_BUSY, struct drm_i915_gem_busy)
@@ -694,7 +694,7 @@ struct drm_i915_gem_exec_object2 {
 	__u64 flags;
 
 	__u64 rsvd1;
-	__u64 rsvd2;
+	__u64 rsvd2;	/* Used for fence fd */
 };
 
 struct drm_i915_gem_execbuffer2 {
@@ -775,7 +775,17 @@ struct drm_i915_gem_execbuffer2 {
  */
 #define I915_EXEC_RESOURCE_STREAMER     (1<<15)
 
-#define __I915_EXEC_UNKNOWN_FLAGS -(I915_EXEC_RESOURCE_STREAMER<<1)
+/** Caller supplies a sync fence fd in the rsvd2 field.
+ * Wait for it to be signalled before starting the work
+ */
+#define I915_EXEC_WAIT_FENCE		(1<<16)
+
+/** Caller wants a sync fence fd for this execbuffer.
+ *  It will be returned in rsvd2
+ */
+#define I915_EXEC_CREATE_FENCE		(1<<17)
+
+#define __I915_EXEC_UNKNOWN_FLAGS -(I915_EXEC_CREATE_FENCE<<1)
 
 #define I915_EXEC_CONTEXT_ID_MASK	(0xffffffff)
 #define i915_execbuffer2_set_context_id(eb2, context) \
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [RFC 1/9] staging/android/sync: Support sync points created from dma-fences
  2015-07-17 14:31 ` [RFC 1/9] staging/android/sync: Support sync points created from dma-fences John.C.Harrison
@ 2015-07-17 14:44   ` Tvrtko Ursulin
  0 siblings, 0 replies; 38+ messages in thread
From: Tvrtko Ursulin @ 2015-07-17 14:44 UTC (permalink / raw)
  To: John.C.Harrison, Intel-GFX
  Cc: devel, Greg Kroah-Hartman, Arve Hjønnevåg, Riley Andrews


On 07/17/2015 03:31 PM, John.C.Harrison@Intel.com wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> Debug output assumes all sync points are built on top of Android sync points
> and when we start creating them from dma-fences will NULL ptr deref unless
> taught about this.

This is Maarten's code; the patch just had a troubled history where it
got misplaced, forgotten and then resurrected, but with the commit
message lost.

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 7/9] drm/i915: Interrupt driven fences
  2015-07-17 14:31 ` [RFC 7/9] drm/i915: Interrupt driven fences John.C.Harrison
@ 2015-07-20  9:09   ` Maarten Lankhorst
  2015-07-21  7:19   ` Daniel Vetter
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 38+ messages in thread
From: Maarten Lankhorst @ 2015-07-20  9:09 UTC (permalink / raw)
  To: John.C.Harrison, Intel-GFX

On 17-07-15 at 16:31, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> The intended usage model for struct fence is that the signalled status should be
> set on demand rather than polled. That is, there should not be a need for a
> 'signaled' function to be called every time the status is queried. Instead,
> 'something' should be done to enable a signal callback from the hardware which
> will update the state directly. In the case of requests, this is the seqno
> update interrupt. The idea is that this callback will only be enabled on demand
> when something actually tries to wait on the fence.
>
> This change removes the polling test and replaces it with the callback scheme.
> Each fence is added to a 'please poke me' list at the start of
> i915_add_request(). The interrupt handler then scans through the 'poke me' list
> when a new seqno pops out and signals any matching fence/request. The fence is
> then removed from the list so the entire request stack does not need to be
> scanned every time. Note that the fence is added to the list before the commands
> to generate the seqno interrupt are added to the ring. Thus the sequence is
> guaranteed to be race free if the interrupt is already enabled.
>
> Note that the interrupt is only enabled on demand (i.e. when __wait_request() is
> called). Thus there is still a potential race when enabling the interrupt as the
> request may already have completed. However, this is simply solved by calling
> the interrupt processing code immediately after enabling the interrupt and
> thereby checking for already completed requests.
This race will happen on any enable_signaling implementation, just something to be aware of.
> Lastly, the ring clean up code has the possibility to cancel outstanding
> requests (e.g. because TDR has reset the ring). These requests will never get
> signalled and so must be removed from the signal list manually. This is done by
> setting a 'cancelled' flag and then calling the regular notify/retire code path
> rather than attempting to duplicate the list manipulation and clean up code in
> multiple places. This also avoids any race condition where the cancellation
> request might occur after/during the completion interrupt actually arriving.

I notice in this commit you only clean up requests when the refcount drops to 0.
What resources need to be kept after the fence is signaled? Can't you queue the
delayed work from the function that signals the fence and make sure it holds a
refcount?
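
Something like this (hand-waving sketch only, the work item is made up):

	fence_signal_locked(&req->fence);
	i915_gem_request_reference(req);
	/* the worker does the actual free and drops the reference */
	queue_work(req->i915->wq, &req->delayed_free_work);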

I'm curious because if userspace can grab a reference to a fence it might lead
to resources not being freed for a long time...

~Maarten

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 3/9] drm/i915: Convert requests to use struct fence
  2015-07-17 14:31 ` [RFC 3/9] drm/i915: Convert requests to use struct fence John.C.Harrison
@ 2015-07-21  7:05   ` Daniel Vetter
  2015-07-28 10:01     ` John Harrison
  2015-07-22 14:26   ` Tvrtko Ursulin
  2015-07-22 14:45   ` Tvrtko Ursulin
  2 siblings, 1 reply; 38+ messages in thread
From: Daniel Vetter @ 2015-07-21  7:05 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Fri, Jul 17, 2015 at 03:31:17PM +0100, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> There is a construct in the linux kernel called 'struct fence' that is intended
> to keep track of work that is executed on hardware. I.e. it solves the basic
> problem that the driver's 'struct drm_i915_gem_request' is trying to address. The
> request structure does quite a lot more than simply track the execution progress
> so is very definitely still required. However, the basic completion status side
> could be updated to use the ready made fence implementation and gain all the
> advantages that provides.
> 
> This patch makes the first step of integrating a struct fence into the request.
> It replaces the explicit reference count with that of the fence. It also
> replaces the 'is completed' test with the fence's equivalent. Currently, that
> simply chains on to the original request implementation. A future patch will
> improve this.
> 
> For: VIZ-5190
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h         | 45 +++++++++++++------------
>  drivers/gpu/drm/i915/i915_gem.c         | 58 ++++++++++++++++++++++++++++++---
>  drivers/gpu/drm/i915/intel_lrc.c        |  1 +
>  drivers/gpu/drm/i915/intel_ringbuffer.c |  1 +
>  drivers/gpu/drm/i915/intel_ringbuffer.h |  3 ++
>  5 files changed, 80 insertions(+), 28 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index cf6761c..79d346c 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -50,6 +50,7 @@
>  #include <linux/intel-iommu.h>
>  #include <linux/kref.h>
>  #include <linux/pm_qos.h>
> +#include <linux/fence.h>
>  
>  /* General customization:
>   */
> @@ -2150,7 +2151,17 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
>   * initial reference taken using kref_init
>   */
>  struct drm_i915_gem_request {
> -	struct kref ref;
> +	/**
> +	 * Underlying object for implementing the signal/wait stuff.
> +	 * NB: Never call fence_later() or return this fence object to user
> +	 * land! Due to lazy allocation, scheduler re-ordering, pre-emption,
> +	 * etc., there is no guarantee at all about the validity or
> +	 * sequentiality of the fence's seqno! It is also unsafe to let
> +	 * anything outside of the i915 driver get hold of the fence object
> +	 * as the clean up when decrementing the reference count requires
> +	 * holding the driver mutex lock.
> +	 */

This comment is outdated. Also I'm leaning towards squashing this patch
with the one implementing fences with explicit irq enabling, to avoid
churn and intermediate WARN_ONs. Each patch should be fully functional
without requiring follow-up patches.
-Daniel


> +	struct fence fence;
>  
>  	/** On Which ring this request was generated */
>  	struct drm_i915_private *i915;
> @@ -2227,7 +2238,13 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
>  			   struct intel_context *ctx,
>  			   struct drm_i915_gem_request **req_out);
>  void i915_gem_request_cancel(struct drm_i915_gem_request *req);
> -void i915_gem_request_free(struct kref *req_ref);
> +
> +static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
> +					      bool lazy_coherency)
> +{
> +	return fence_is_signaled(&req->fence);
> +}
> +
>  int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
>  				   struct drm_file *file);
>  
> @@ -2247,7 +2264,7 @@ static inline struct drm_i915_gem_request *
>  i915_gem_request_reference(struct drm_i915_gem_request *req)
>  {
>  	if (req)
> -		kref_get(&req->ref);
> +		fence_get(&req->fence);
>  	return req;
>  }
>  
> @@ -2255,7 +2272,7 @@ static inline void
>  i915_gem_request_unreference(struct drm_i915_gem_request *req)
>  {
>  	WARN_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
> -	kref_put(&req->ref, i915_gem_request_free);
> +	fence_put(&req->fence);
>  }
>  
>  static inline void
> @@ -2267,7 +2284,7 @@ i915_gem_request_unreference__unlocked(struct drm_i915_gem_request *req)
>  		return;
>  
>  	dev = req->ring->dev;
> -	if (kref_put_mutex(&req->ref, i915_gem_request_free, &dev->struct_mutex))
> +	if (kref_put_mutex(&req->fence.refcount, fence_release, &dev->struct_mutex))
>  		mutex_unlock(&dev->struct_mutex);
>  }
>  
> @@ -2284,12 +2301,6 @@ static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
>  }
>  
>  /*
> - * XXX: i915_gem_request_completed should be here but currently needs the
> - * definition of i915_seqno_passed() which is below. It will be moved in
> - * a later patch when the call to i915_seqno_passed() is obsoleted...
> - */
> -
> -/*
>   * A command that requires special handling by the command parser.
>   */
>  struct drm_i915_cmd_descriptor {
> @@ -2851,18 +2862,6 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
>  	return (int32_t)(seq1 - seq2) >= 0;
>  }
>  
> -static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
> -					      bool lazy_coherency)
> -{
> -	u32 seqno;
> -
> -	BUG_ON(req == NULL);
> -
> -	seqno = req->ring->get_seqno(req->ring, lazy_coherency);
> -
> -	return i915_seqno_passed(seqno, req->seqno);
> -}
> -
>  int __must_check i915_gem_get_seqno(struct drm_device *dev, u32 *seqno);
>  int __must_check i915_gem_set_seqno(struct drm_device *dev, u32 seqno);
>  int __must_check i915_gem_object_get_fence(struct drm_i915_gem_object *obj);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index d9f2701..888bb72 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2616,12 +2616,14 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv,
>  	}
>  }
>  
> -void i915_gem_request_free(struct kref *req_ref)
> +static void i915_gem_request_free(struct fence *req_fence)
>  {
> -	struct drm_i915_gem_request *req = container_of(req_ref,
> -						 typeof(*req), ref);
> +	struct drm_i915_gem_request *req = container_of(req_fence,
> +						 typeof(*req), fence);
>  	struct intel_context *ctx = req->ctx;
>  
> +	BUG_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
> +
>  	if (req->file_priv)
>  		i915_gem_request_remove_from_client(req);
>  
> @@ -2637,6 +2639,47 @@ void i915_gem_request_free(struct kref *req_ref)
>  	kmem_cache_free(req->i915->requests, req);
>  }
>  
> +static const char *i915_gem_request_get_driver_name(struct fence *req_fence)
> +{
> +	return "i915_request";
> +}
> +
> +static const char *i915_gem_request_get_timeline_name(struct fence *req_fence)
> +{
> +	struct drm_i915_gem_request *req = container_of(req_fence,
> +						 typeof(*req), fence);
> +	return req->ring->name;
> +}
> +
> +static bool i915_gem_request_enable_signaling(struct fence *req_fence)
> +{
> +	/* Interrupt driven fences are not implemented yet.*/
> +	WARN(true, "This should not be called!");
> +	return true;
> +}
> +
> +static bool i915_gem_request_is_completed(struct fence *req_fence)
> +{
> +	struct drm_i915_gem_request *req = container_of(req_fence,
> +						 typeof(*req), fence);
> +	u32 seqno;
> +
> +	BUG_ON(req == NULL);
> +
> +	seqno = req->ring->get_seqno(req->ring, false/*lazy_coherency*/);
> +
> +	return i915_seqno_passed(seqno, req->seqno);
> +}
> +
> +static const struct fence_ops i915_gem_request_fops = {
> +	.get_driver_name	= i915_gem_request_get_driver_name,
> +	.get_timeline_name	= i915_gem_request_get_timeline_name,
> +	.enable_signaling	= i915_gem_request_enable_signaling,
> +	.signaled		= i915_gem_request_is_completed,
> +	.wait			= fence_default_wait,
> +	.release		= i915_gem_request_free,
> +};
> +
>  int i915_gem_request_alloc(struct intel_engine_cs *ring,
>  			   struct intel_context *ctx,
>  			   struct drm_i915_gem_request **req_out)
> @@ -2658,7 +2701,6 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
>  	if (ret)
>  		goto err;
>  
> -	kref_init(&req->ref);
>  	req->i915 = dev_priv;
>  	req->ring = ring;
>  	req->ctx  = ctx;
> @@ -2673,6 +2715,8 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
>  		goto err;
>  	}
>  
> +	fence_init(&req->fence, &i915_gem_request_fops, &ring->fence_lock, ring->fence_context, req->seqno);
> +
>  	/*
>  	 * Reserve space in the ring buffer for all the commands required to
>  	 * eventually emit this request. This is to guarantee that the
> @@ -5021,7 +5065,7 @@ i915_gem_init_hw(struct drm_device *dev)
>  {
>  	struct drm_i915_private *dev_priv = dev->dev_private;
>  	struct intel_engine_cs *ring;
> -	int ret, i, j;
> +	int ret, i, j, fence_base;
>  
>  	if (INTEL_INFO(dev)->gen < 6 && !intel_enable_gtt())
>  		return -EIO;
> @@ -5073,12 +5117,16 @@ i915_gem_init_hw(struct drm_device *dev)
>  			goto out;
>  	}
>  
> +	fence_base = fence_context_alloc(I915_NUM_RINGS);
> +
>  	/* Now it is safe to go back round and do everything else: */
>  	for_each_ring(ring, dev_priv, i) {
>  		struct drm_i915_gem_request *req;
>  
>  		WARN_ON(!ring->default_context);
>  
> +		ring->fence_context = fence_base + i;
> +
>  		ret = i915_gem_request_alloc(ring, ring->default_context, &req);
>  		if (ret) {
>  			i915_gem_cleanup_ringbuffer(dev);
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 9faad82..ee4aecd 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1808,6 +1808,7 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
>  	ring->dev = dev;
>  	INIT_LIST_HEAD(&ring->active_list);
>  	INIT_LIST_HEAD(&ring->request_list);
> +	spin_lock_init(&ring->fence_lock);
>  	i915_gem_batch_pool_init(dev, &ring->batch_pool);
>  	init_waitqueue_head(&ring->irq_queue);
>  
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 177f7ed..d1ced30 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -2040,6 +2040,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
>  	INIT_LIST_HEAD(&ring->active_list);
>  	INIT_LIST_HEAD(&ring->request_list);
>  	INIT_LIST_HEAD(&ring->execlist_queue);
> +	spin_lock_init(&ring->fence_lock);
>  	i915_gem_batch_pool_init(dev, &ring->batch_pool);
>  	ringbuf->size = 32 * PAGE_SIZE;
>  	ringbuf->ring = ring;
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 2e85fda..a4b0545 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -346,6 +346,9 @@ struct  intel_engine_cs {
>  	 * to encode the command length in the header).
>  	 */
>  	u32 (*get_cmd_length_mask)(u32 cmd_header);
> +
> +	unsigned fence_context;
> +	spinlock_t fence_lock;
>  };
>  
>  bool intel_ring_initialized(struct intel_engine_cs *ring);
> -- 
> 1.9.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 7/9] drm/i915: Interrupt driven fences
  2015-07-17 14:31 ` [RFC 7/9] drm/i915: Interrupt driven fences John.C.Harrison
  2015-07-20  9:09   ` Maarten Lankhorst
@ 2015-07-21  7:19   ` Daniel Vetter
  2015-07-27 11:33   ` Tvrtko Ursulin
  2015-07-27 13:20   ` Tvrtko Ursulin
  3 siblings, 0 replies; 38+ messages in thread
From: Daniel Vetter @ 2015-07-21  7:19 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Fri, Jul 17, 2015 at 03:31:21PM +0100, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The intended usage model for struct fence is that the signalled status should be
> set on demand rather than polled. That is, there should not be a need for a
> 'signaled' function to be called every time the status is queried. Instead,
> 'something' should be done to enable a signal callback from the hardware which
> will update the state directly. In the case of requests, this is the seqno
> update interrupt. The idea is that this callback will only be enabled on demand
> when something actually tries to wait on the fence.
> 
> This change removes the polling test and replaces it with the callback scheme.
> Each fence is added to a 'please poke me' list at the start of
> i915_add_request(). The interrupt handler then scans through the 'poke me' list
> when a new seqno pops out and signals any matching fence/request. The fence is
> then removed from the list so the entire request stack does not need to be
> scanned every time. Note that the fence is added to the list before the commands
> to generate the seqno interrupt are added to the ring. Thus the sequence is
> guaranteed to be race free if the interrupt is already enabled.
> 
> Note that the interrupt is only enabled on demand (i.e. when __wait_request() is
> called). Thus there is still a potential race when enabling the interrupt as the
> request may already have completed. However, this is simply solved by calling
> the interrupt processing code immediately after enabling the interrupt and
> thereby checking for already completed requests.
> 
> Lastly, the ring clean up code has the possibility to cancel outstanding
> requests (e.g. because TDR has reset the ring). These requests will never get
> signalled and so must be removed from the signal list manually. This is done by
> setting a 'cancelled' flag and then calling the regular notify/retire code path
> rather than attempting to duplicate the list manipulation and clean up code in
> multiple places. This also avoids any race condition where the cancellation
> request might occur after/during the completion interrupt actually arriving.
> 
> v2: Updated to take advantage of the request unreference no longer requiring the
> mutex lock.
> 
> For: VIZ-5190
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h         |   8 ++
>  drivers/gpu/drm/i915/i915_gem.c         | 132 +++++++++++++++++++++++++++++---
>  drivers/gpu/drm/i915/i915_irq.c         |   2 +
>  drivers/gpu/drm/i915/intel_lrc.c        |   1 +
>  drivers/gpu/drm/i915/intel_ringbuffer.c |   1 +
>  drivers/gpu/drm/i915/intel_ringbuffer.h |   1 +
>  6 files changed, 136 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 61c3db2..d7f1aa5 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2163,7 +2163,11 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
>  struct drm_i915_gem_request {
>  	/** Underlying object for implementing the signal/wait stuff. */
>  	struct fence fence;
> +	struct list_head signal_list;
> +	struct list_head unsignal_list;
>  	struct list_head delay_free_list;

From a very quick look, a request can only ever be on one of these lists.
It would be clearer to just use one list and list_move to make the
reassignment and change of ownership explicit.
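
E.g. something like (untested):

	/* one list_head in the request, simply moved between the lists */
	list_move_tail(&req->signal_list, &free_list);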

> +	bool cancelled;
> +	bool irq_enabled;
>  
>  	/** On Which ring this request was generated */
>  	struct drm_i915_private *i915;
> @@ -2241,6 +2245,10 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
>  			   struct drm_i915_gem_request **req_out);
>  void i915_gem_request_cancel(struct drm_i915_gem_request *req);
>  
> +void i915_gem_request_submit(struct drm_i915_gem_request *req);
> +void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req);
> +void i915_gem_request_notify(struct intel_engine_cs *ring);
> +
>  int i915_create_fence_timeline(struct drm_device *dev,
>  			       struct intel_context *ctx,
>  			       struct intel_engine_cs *ring);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 482835a..7c589a9 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1222,6 +1222,11 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>  	if (list_empty(&req->list))
>  		return 0;
>  
> +	/*
> +	 * Enable interrupt completion of the request.
> +	 */
> +	i915_gem_request_enable_interrupt(req);
> +
>  	if (i915_gem_request_completed(req))
>  		return 0;
>  
> @@ -1382,6 +1387,10 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
>  	list_del_init(&request->list);
>  	i915_gem_request_remove_from_client(request);
>  
> +	/* In case the request is still in the signal pending list */
> +	if (!list_empty(&request->signal_list))
> +		request->cancelled = true;
> +
>  	i915_gem_request_unreference(request);
>  }
>  
> @@ -2534,6 +2543,12 @@ void __i915_add_request(struct drm_i915_gem_request *request,
>  	 */
>  	request->postfix = intel_ring_get_tail(ringbuf);
>  
> +	/*
> +	 * Add the fence to the pending list before emitting the commands to
> +	 * generate a seqno notification interrupt.
> +	 */
> +	i915_gem_request_submit(request);
> +
>  	if (i915.enable_execlists)
>  		ret = ring->emit_request(request);
>  	else {
> @@ -2653,6 +2668,9 @@ static void i915_gem_request_free(struct drm_i915_gem_request *req)
>  		i915_gem_context_unreference(ctx);
>  	}
>  
> +	if (req->irq_enabled)
> +		req->ring->irq_put(req->ring);
> +
>  	kmem_cache_free(req->i915->requests, req);
>  }
>  
> @@ -2668,24 +2686,105 @@ static const char *i915_gem_request_get_timeline_name(struct fence *req_fence)
>  	return req->ring->name;
>  }
>  
> -static bool i915_gem_request_enable_signaling(struct fence *req_fence)
> +/*
> + * The request has been submitted to the hardware so add the fence to the
> + * list of signalable fences.
> + *
> + * NB: This does not enable interrupts yet. That only occurs on demand when
> + * the request is actually waited on. However, adding it to the list early
> + * ensures that there is no race condition where the interrupt could pop
> + * out prematurely and thus be completely lost. The only remaining race is
> + * that the interrupt must be manually checked for after being enabled.
> + */
> +void i915_gem_request_submit(struct drm_i915_gem_request *req)
>  {
> -	/* Interrupt driven fences are not implemented yet.*/
> -	WARN(true, "This should not be called!");
> -	return true;
> +	fence_enable_sw_signaling(&req->fence);
>  }
>  
> -static bool i915_gem_request_is_completed(struct fence *req_fence)
> +/*
> + * The request is being actively waited on, so enable interrupt based
> + * completion signalling.
> + */
> +void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req)
> +{
> +	if (req->irq_enabled)
> +		return;
> +
> +	WARN_ON(!req->ring->irq_get(req->ring));
> +	req->irq_enabled = true;
> +
> +	/*
> +	 * Because the interrupt is only enabled on demand, there is a race
> +	 * where the interrupt can fire before anyone is looking for it. So
> +	 * do an explicit check for missed interrupts.
> +	 */
> +	i915_gem_request_notify(req->ring);
> +}
> +
> +static bool i915_gem_request_enable_signaling(struct fence *req_fence)
>  {
>  	struct drm_i915_gem_request *req = container_of(req_fence,
>  						 typeof(*req), fence);
> +
> +	i915_gem_request_reference(req);
> +	WARN_ON(!list_empty(&req->signal_list));
> +	list_add_tail(&req->signal_list, &req->ring->fence_signal_list);
> +
> +	/*
> +	 * Note that signalling is always enabled for every request before
> +	 * that request is submitted to the hardware. Therefore there is
> +	 * no race condition whereby the signal could pop out before the
> +	 * request has been added to the list. Hence no need to check
> +	 * for completion, undo the list add and return false.
> +	 *
> +	 * NB: Interrupts are only enabled on demand. Thus there is still a
> +	 * race where the request could complete before the interrupt has
> +	 * been enabled. Thus care must be taken at that point.
> +	 */
> +
> +	return true;

fence->enable_signalling is the part that should enable timely signalling,
i.e. interrupts. Adding the fence to completion lists for eventual
signalling should be done unconditionally.

Note that struct fence supports irq context callbacks and not just
waiting, so just enabling interrupts in the wait callback is not enough.

fence->wait callback is optional and really just for some additional
tricks when waiting with a process context, like busy-spinning (which we
do). The current i915_wait_request should be moved into that callback.
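
Roughly (untested sketch, return semantics as per struct fence_ops):

	static signed long i915_fence_wait(struct fence *fence, bool intr,
					   signed long timeout)
	{
		/*
		 * Busy-spin then sleep on the request, i.e. what
		 * __i915_wait_request() does today. Return the remaining
		 * jiffies, 0 on timeout or -ERESTARTSYS if interrupted.
		 */
		return timeout;	/* placeholder */
	}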

Also I'd like to move i915 internally over to calling fence functions so
that we can test this code without the android syncpts.
-Daniel

> +}
> +
> +void i915_gem_request_notify(struct intel_engine_cs *ring)
> +{
> +	struct drm_i915_gem_request *req, *req_next;
> +	unsigned long flags;
>  	u32 seqno;
> +	LIST_HEAD(free_list);
>  
> -	BUG_ON(req == NULL);
> +	if (list_empty(&ring->fence_signal_list))
> +		return;
> +
> +	seqno = ring->get_seqno(ring, false);
> +
> +	spin_lock_irqsave(&ring->fence_lock, flags);
> +	list_for_each_entry_safe(req, req_next, &ring->fence_signal_list, signal_list) {
> +		if (!req->cancelled) {
> +			if (!i915_seqno_passed(seqno, req->seqno))
> +				continue;
>  
> -	seqno = req->ring->get_seqno(req->ring, false/*lazy_coherency*/);
> +			fence_signal_locked(&req->fence);
> +		}
> +
> +		list_del_init(&req->signal_list);
> +		if (req->irq_enabled) {
> +			req->ring->irq_put(req->ring);
> +			req->irq_enabled = false;
> +		}
>  
> -	return i915_seqno_passed(seqno, req->seqno);
> +		/* Can't unreference here because that might grab fence_lock */
> +		list_add_tail(&req->unsignal_list, &free_list);
> +	}
> +	spin_unlock_irqrestore(&ring->fence_lock, flags);
> +
> +	/* It should now be safe to actually free the requests */
> +	while (!list_empty(&free_list)) {
> +		req = list_first_entry(&free_list,
> +				       struct drm_i915_gem_request, unsignal_list);
> +		list_del(&req->unsignal_list);
> +
> +		i915_gem_request_unreference(req);
> +	}
>  }
>  
>  static void i915_fence_timeline_value_str(struct fence *fence, char *str, int size)
> @@ -2711,7 +2810,6 @@ static const struct fence_ops i915_gem_request_fops = {
>  	.get_driver_name	= i915_gem_request_get_driver_name,
>  	.get_timeline_name	= i915_gem_request_get_timeline_name,
>  	.enable_signaling	= i915_gem_request_enable_signaling,
> -	.signaled		= i915_gem_request_is_completed,
>  	.wait			= fence_default_wait,
>  	.release		= i915_gem_request_release,
>  	.fence_value_str	= i915_fence_value_str,
> @@ -2791,6 +2889,7 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
>  		goto err;
>  	}
>  
> +	INIT_LIST_HEAD(&req->signal_list);
>  	fence_init(&req->fence, &i915_gem_request_fops, &ring->fence_lock,
>  		   ctx->engine[ring->id].fence_timeline.fence_context,
>  		   i915_fence_timeline_get_next_seqno(&ctx->engine[ring->id].fence_timeline));
> @@ -2913,6 +3012,13 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
>  
>  		i915_gem_request_retire(request);
>  	}
> +
> +	/*
> +	 * Make sure any requests that were on the signal pending list get
> +	 * cleaned up.
> +	 */
> +	i915_gem_request_notify(ring);
> +	i915_gem_retire_requests_ring(ring);
>  }
>  
>  void i915_gem_restore_fences(struct drm_device *dev)
> @@ -2968,6 +3074,13 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
>  {
>  	WARN_ON(i915_verify_lists(ring->dev));
>  
> +	/*
> +	 * If no-one has waited on a request recently then interrupts will
> +	 * not have been enabled and thus no requests will ever be marked as
> +	 * completed. So do an interrupt check now.
> +	 */
> +	i915_gem_request_notify(ring);
> +
>  	/* Retire requests first as we use it above for the early return.
>  	 * If we retire requests last, we may use a later seqno and so clear
>  	 * the requests lists without clearing the active list, leading to
> @@ -5345,6 +5458,7 @@ init_ring_lists(struct intel_engine_cs *ring)
>  {
>  	INIT_LIST_HEAD(&ring->active_list);
>  	INIT_LIST_HEAD(&ring->request_list);
> +	INIT_LIST_HEAD(&ring->fence_signal_list);
>  	INIT_LIST_HEAD(&ring->delayed_free_list);
>  }
>  
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index d87f173..e446509 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -853,6 +853,8 @@ static void notify_ring(struct intel_engine_cs *ring)
>  
>  	trace_i915_gem_request_notify(ring);
>  
> +	i915_gem_request_notify(ring);
> +
>  	wake_up_all(&ring->irq_queue);
>  }
>  
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 9ee80f5..18dbd5c 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1808,6 +1808,7 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
>  	ring->dev = dev;
>  	INIT_LIST_HEAD(&ring->active_list);
>  	INIT_LIST_HEAD(&ring->request_list);
> +	INIT_LIST_HEAD(&ring->fence_signal_list);
>  	INIT_LIST_HEAD(&ring->delayed_free_list);
>  	spin_lock_init(&ring->fence_lock);
>  	spin_lock_init(&ring->delayed_free_lock);
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 11494a3..83a5254 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -2040,6 +2040,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
>  	INIT_LIST_HEAD(&ring->active_list);
>  	INIT_LIST_HEAD(&ring->request_list);
>  	INIT_LIST_HEAD(&ring->execlist_queue);
> +	INIT_LIST_HEAD(&ring->fence_signal_list);
>  	INIT_LIST_HEAD(&ring->delayed_free_list);
>  	spin_lock_init(&ring->fence_lock);
>  	spin_lock_init(&ring->delayed_free_lock);
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 68173a3..2e68b73 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -352,6 +352,7 @@ struct  intel_engine_cs {
>  	u32 (*get_cmd_length_mask)(u32 cmd_header);
>  
>  	spinlock_t fence_lock;
> +	struct list_head fence_signal_list;
>  };
>  
>  bool intel_ring_initialized(struct intel_engine_cs *ring);
> -- 
> 1.9.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 3/9] drm/i915: Convert requests to use struct fence
  2015-07-17 14:31 ` [RFC 3/9] drm/i915: Convert requests to use struct fence John.C.Harrison
  2015-07-21  7:05   ` Daniel Vetter
@ 2015-07-22 14:26   ` Tvrtko Ursulin
  2015-07-28 10:10     ` John Harrison
  2015-07-22 14:45   ` Tvrtko Ursulin
  2 siblings, 1 reply; 38+ messages in thread
From: Tvrtko Ursulin @ 2015-07-22 14:26 UTC (permalink / raw)
  To: John.C.Harrison, Intel-GFX


Hi,

On 07/17/2015 03:31 PM, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> There is a construct in the linux kernel called 'struct fence' that is intended
> to keep track of work that is executed on hardware. I.e. it solves the basic
> problem that the driver's 'struct drm_i915_gem_request' is trying to address. The
> request structure does quite a lot more than simply track the execution progress
> so is very definitely still required. However, the basic completion status side
> could be updated to use the ready made fence implementation and gain all the
> advantages that provides.
>
> This patch makes the first step of integrating a struct fence into the request.
> It replaces the explicit reference count with that of the fence. It also
> replaces the 'is completed' test with the fence's equivalent. Currently, that
> simply chains on to the original request implementation. A future patch will
> improve this.
>
> For: VIZ-5190
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> ---
>   drivers/gpu/drm/i915/i915_drv.h         | 45 +++++++++++++------------
>   drivers/gpu/drm/i915/i915_gem.c         | 58 ++++++++++++++++++++++++++++++---
>   drivers/gpu/drm/i915/intel_lrc.c        |  1 +
>   drivers/gpu/drm/i915/intel_ringbuffer.c |  1 +
>   drivers/gpu/drm/i915/intel_ringbuffer.h |  3 ++
>   5 files changed, 80 insertions(+), 28 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index cf6761c..79d346c 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -50,6 +50,7 @@
>   #include <linux/intel-iommu.h>
>   #include <linux/kref.h>
>   #include <linux/pm_qos.h>
> +#include <linux/fence.h>
>
>   /* General customization:
>    */
> @@ -2150,7 +2151,17 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
>    * initial reference taken using kref_init
>    */
>   struct drm_i915_gem_request {
> -	struct kref ref;
> +	/**
> +	 * Underlying object for implementing the signal/wait stuff.
> +	 * NB: Never call fence_later() or return this fence object to user
> +	 * land! Due to lazy allocation, scheduler re-ordering, pre-emption,
> +	 * etc., there is no guarantee at all about the validity or
> +	 * sequentiality of the fence's seqno! It is also unsafe to let
> +	 * anything outside of the i915 driver get hold of the fence object
> +	 * as the clean up when decrementing the reference count requires
> +	 * holding the driver mutex lock.
> +	 */
> +	struct fence fence;
>
>   	/** On Which ring this request was generated */
>   	struct drm_i915_private *i915;
> @@ -2227,7 +2238,13 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
>   			   struct intel_context *ctx,
>   			   struct drm_i915_gem_request **req_out);
>   void i915_gem_request_cancel(struct drm_i915_gem_request *req);
> -void i915_gem_request_free(struct kref *req_ref);
> +
> +static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
> +					      bool lazy_coherency)
> +{
> +	return fence_is_signaled(&req->fence);
> +}
> +
>   int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
>   				   struct drm_file *file);
>
> @@ -2247,7 +2264,7 @@ static inline struct drm_i915_gem_request *
>   i915_gem_request_reference(struct drm_i915_gem_request *req)
>   {
>   	if (req)
> -		kref_get(&req->ref);
> +		fence_get(&req->fence);
>   	return req;
>   }
>
> @@ -2255,7 +2272,7 @@ static inline void
>   i915_gem_request_unreference(struct drm_i915_gem_request *req)
>   {
>   	WARN_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
> -	kref_put(&req->ref, i915_gem_request_free);
> +	fence_put(&req->fence);
>   }
>
>   static inline void
> @@ -2267,7 +2284,7 @@ i915_gem_request_unreference__unlocked(struct drm_i915_gem_request *req)
>   		return;
>
>   	dev = req->ring->dev;
> -	if (kref_put_mutex(&req->ref, i915_gem_request_free, &dev->struct_mutex))
> +	if (kref_put_mutex(&req->fence.refcount, fence_release, &dev->struct_mutex))
>   		mutex_unlock(&dev->struct_mutex);

Would it be nicer to add fence_put_mutex(struct fence *, struct mutex *) 
for this? It would avoid the layering violation of requests peeking into 
fence implementation details.
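
Something like this perhaps, mirroring kref_put_mutex() (untested
sketch; fence_put_mutex would be a new addition to linux/fence.h, it
does not exist today):

static inline void fence_put_mutex(struct fence *fence, struct mutex *lock)
{
	if (fence && kref_put_mutex(&fence->refcount, fence_release, lock))
		mutex_unlock(lock);
}

The unlocked unreference helper here would then just be:

	fence_put_mutex(&req->fence, &dev->struct_mutex);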

>   }
>
> @@ -2284,12 +2301,6 @@ static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
>   }
>
>   /*
> - * XXX: i915_gem_request_completed should be here but currently needs the
> - * definition of i915_seqno_passed() which is below. It will be moved in
> - * a later patch when the call to i915_seqno_passed() is obsoleted...
> - */
> -
> -/*
>    * A command that requires special handling by the command parser.
>    */
>   struct drm_i915_cmd_descriptor {
> @@ -2851,18 +2862,6 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
>   	return (int32_t)(seq1 - seq2) >= 0;
>   }
>
> -static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
> -					      bool lazy_coherency)
> -{
> -	u32 seqno;
> -
> -	BUG_ON(req == NULL);
> -
> -	seqno = req->ring->get_seqno(req->ring, lazy_coherency);
> -
> -	return i915_seqno_passed(seqno, req->seqno);
> -}
> -
>   int __must_check i915_gem_get_seqno(struct drm_device *dev, u32 *seqno);
>   int __must_check i915_gem_set_seqno(struct drm_device *dev, u32 seqno);
>   int __must_check i915_gem_object_get_fence(struct drm_i915_gem_object *obj);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index d9f2701..888bb72 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2616,12 +2616,14 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv,
>   	}
>   }
>
> -void i915_gem_request_free(struct kref *req_ref)
> +static void i915_gem_request_free(struct fence *req_fence)
>   {
> -	struct drm_i915_gem_request *req = container_of(req_ref,
> -						 typeof(*req), ref);
> +	struct drm_i915_gem_request *req = container_of(req_fence,
> +						 typeof(*req), fence);
>   	struct intel_context *ctx = req->ctx;
>
> +	BUG_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));

It would be nicer (for the user experience, even if only the developer 
working on the code) to WARN and leak.
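
I.e. something like (sketch only):

	/* Better to leak the request than take the machine down. */
	if (WARN_ON(!mutex_is_locked(&req->ring->dev->struct_mutex)))
		return;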

> +
>   	if (req->file_priv)
>   		i915_gem_request_remove_from_client(req);
>
> @@ -2637,6 +2639,47 @@ void i915_gem_request_free(struct kref *req_ref)
>   	kmem_cache_free(req->i915->requests, req);
>   }
>
> +static const char *i915_gem_request_get_driver_name(struct fence *req_fence)
> +{
> +	return "i915_request";
> +}

I think this becomes kind of ABI once added so we need to make sure the 
best name is chosen to start with. I couldn't immediately figure out why 
not just "i915"?

> +
> +static const char *i915_gem_request_get_timeline_name(struct fence *req_fence)
> +{
> +	struct drm_i915_gem_request *req = container_of(req_fence,
> +						 typeof(*req), fence);
> +	return req->ring->name;
> +}
> +
> +static bool i915_gem_request_enable_signaling(struct fence *req_fence)
> +{
> +	/* Interrupt driven fences are not implemented yet.*/
> +	WARN(true, "This should not be called!");
> +	return true;

I suppose WARN is not really needed in the interim patch. Would 
returning false work?

> +}
> +
> +static bool i915_gem_request_is_completed(struct fence *req_fence)
> +{
> +	struct drm_i915_gem_request *req = container_of(req_fence,
> +						 typeof(*req), fence);
> +	u32 seqno;
> +
> +	BUG_ON(req == NULL);

Hm, I don't think container_of can return NULL in a meaningful way.

> +
> +	seqno = req->ring->get_seqno(req->ring, false/*lazy_coherency*/);
> +
> +	return i915_seqno_passed(seqno, req->seqno);
> +}
> +
> +static const struct fence_ops i915_gem_request_fops = {
> +	.get_driver_name	= i915_gem_request_get_driver_name,
> +	.get_timeline_name	= i915_gem_request_get_timeline_name,
> +	.enable_signaling	= i915_gem_request_enable_signaling,
> +	.signaled		= i915_gem_request_is_completed,
> +	.wait			= fence_default_wait,
> +	.release		= i915_gem_request_free,
> +};
> +
>   int i915_gem_request_alloc(struct intel_engine_cs *ring,
>   			   struct intel_context *ctx,
>   			   struct drm_i915_gem_request **req_out)
> @@ -2658,7 +2701,6 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
>   	if (ret)
>   		goto err;
>
> -	kref_init(&req->ref);
>   	req->i915 = dev_priv;
>   	req->ring = ring;
>   	req->ctx  = ctx;
> @@ -2673,6 +2715,8 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
>   		goto err;
>   	}
>
> +	fence_init(&req->fence, &i915_gem_request_fops, &ring->fence_lock, ring->fence_context, req->seqno);
> +
>   	/*
>   	 * Reserve space in the ring buffer for all the commands required to
>   	 * eventually emit this request. This is to guarantee that the
> @@ -5021,7 +5065,7 @@ i915_gem_init_hw(struct drm_device *dev)
>   {
>   	struct drm_i915_private *dev_priv = dev->dev_private;
>   	struct intel_engine_cs *ring;
> -	int ret, i, j;
> +	int ret, i, j, fence_base;
>
>   	if (INTEL_INFO(dev)->gen < 6 && !intel_enable_gtt())
>   		return -EIO;
> @@ -5073,12 +5117,16 @@ i915_gem_init_hw(struct drm_device *dev)
>   			goto out;
>   	}
>
> +	fence_base = fence_context_alloc(I915_NUM_RINGS);
> +
>   	/* Now it is safe to go back round and do everything else: */
>   	for_each_ring(ring, dev_priv, i) {
>   		struct drm_i915_gem_request *req;
>
>   		WARN_ON(!ring->default_context);
>
> +		ring->fence_context = fence_base + i;

Could you store fence_base in dev_priv and then ring->init_hw could set 
up the fence_context on its own?
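
Rough sketch, assuming a new fence_base field in drm_i915_private
(untested):

	/* in i915_gem_init_hw(), once, before the ring loop: */
	dev_priv->fence_base = fence_context_alloc(I915_NUM_RINGS);

	/* and then in each engine's init_hw() implementation: */
	ring->fence_context = to_i915(ring->dev)->fence_base + ring->id;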

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 3/9] drm/i915: Convert requests to use struct fence
  2015-07-17 14:31 ` [RFC 3/9] drm/i915: Convert requests to use struct fence John.C.Harrison
  2015-07-21  7:05   ` Daniel Vetter
  2015-07-22 14:26   ` Tvrtko Ursulin
@ 2015-07-22 14:45   ` Tvrtko Ursulin
  2015-07-28 10:18     ` John Harrison
  2 siblings, 1 reply; 38+ messages in thread
From: Tvrtko Ursulin @ 2015-07-22 14:45 UTC (permalink / raw)
  To: John.C.Harrison, Intel-GFX


On 07/17/2015 03:31 PM, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> There is a construct in the linux kernel called 'struct fence' that is intended
> to keep track of work that is executed on hardware. I.e. it solves the basic
> problem that the drivers 'struct drm_i915_gem_request' is trying to address. The
> request structure does quite a lot more than simply track the execution progress
> so is very definitely still required. However, the basic completion status side
> could be updated to use the ready made fence implementation and gain all the
> advantages that provides.
>
> This patch makes the first step of integrating a struct fence into the request.
> It replaces the explicit reference count with that of the fence. It also
> replaces the 'is completed' test with the fence's equivalent. Currently, that
> simply chains on to the original request implementation. A future patch will
> improve this.
>
> For: VIZ-5190
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> ---
>   drivers/gpu/drm/i915/i915_drv.h         | 45 +++++++++++++------------
>   drivers/gpu/drm/i915/i915_gem.c         | 58 ++++++++++++++++++++++++++++++---
>   drivers/gpu/drm/i915/intel_lrc.c        |  1 +
>   drivers/gpu/drm/i915/intel_ringbuffer.c |  1 +
>   drivers/gpu/drm/i915/intel_ringbuffer.h |  3 ++
>   5 files changed, 80 insertions(+), 28 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index cf6761c..79d346c 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -50,6 +50,7 @@
>   #include <linux/intel-iommu.h>
>   #include <linux/kref.h>
>   #include <linux/pm_qos.h>
> +#include <linux/fence.h>
>
>   /* General customization:
>    */
> @@ -2150,7 +2151,17 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
>    * initial reference taken using kref_init
>    */
>   struct drm_i915_gem_request {
> -	struct kref ref;
> +	/**
> +	 * Underlying object for implementing the signal/wait stuff.
> +	 * NB: Never call fence_later() or return this fence object to user
> +	 * land! Due to lazy allocation, scheduler re-ordering, pre-emption,
> +	 * etc., there is no guarantee at all about the validity or
> +	 * sequentiality of the fence's seqno! It is also unsafe to let
> +	 * anything outside of the i915 driver get hold of the fence object
> +	 * as the clean up when decrementing the reference count requires
> +	 * holding the driver mutex lock.
> +	 */
> +	struct fence fence;
>
>   	/** On Which ring this request was generated */
>   	struct drm_i915_private *i915;
> @@ -2227,7 +2238,13 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
>   			   struct intel_context *ctx,
>   			   struct drm_i915_gem_request **req_out);
>   void i915_gem_request_cancel(struct drm_i915_gem_request *req);
> -void i915_gem_request_free(struct kref *req_ref);
> +
> +static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
> +					      bool lazy_coherency)
> +{
> +	return fence_is_signaled(&req->fence);
> +}
> +
>   int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
>   				   struct drm_file *file);
>
> @@ -2247,7 +2264,7 @@ static inline struct drm_i915_gem_request *
>   i915_gem_request_reference(struct drm_i915_gem_request *req)
>   {
>   	if (req)
> -		kref_get(&req->ref);
> +		fence_get(&req->fence);
>   	return req;
>   }
>
> @@ -2255,7 +2272,7 @@ static inline void
>   i915_gem_request_unreference(struct drm_i915_gem_request *req)
>   {
>   	WARN_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
> -	kref_put(&req->ref, i915_gem_request_free);
> +	fence_put(&req->fence);
>   }
>
>   static inline void
> @@ -2267,7 +2284,7 @@ i915_gem_request_unreference__unlocked(struct drm_i915_gem_request *req)
>   		return;
>
>   	dev = req->ring->dev;
> -	if (kref_put_mutex(&req->ref, i915_gem_request_free, &dev->struct_mutex))
> +	if (kref_put_mutex(&req->fence.refcount, fence_release, &dev->struct_mutex))
>   		mutex_unlock(&dev->struct_mutex);
>   }
>
> @@ -2284,12 +2301,6 @@ static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
>   }
>
>   /*
> - * XXX: i915_gem_request_completed should be here but currently needs the
> - * definition of i915_seqno_passed() which is below. It will be moved in
> - * a later patch when the call to i915_seqno_passed() is obsoleted...
> - */
> -
> -/*
>    * A command that requires special handling by the command parser.
>    */
>   struct drm_i915_cmd_descriptor {
> @@ -2851,18 +2862,6 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
>   	return (int32_t)(seq1 - seq2) >= 0;
>   }
>
> -static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
> -					      bool lazy_coherency)
> -{
> -	u32 seqno;
> -
> -	BUG_ON(req == NULL);
> -
> -	seqno = req->ring->get_seqno(req->ring, lazy_coherency);
> -
> -	return i915_seqno_passed(seqno, req->seqno);
> -}
> -
>   int __must_check i915_gem_get_seqno(struct drm_device *dev, u32 *seqno);
>   int __must_check i915_gem_set_seqno(struct drm_device *dev, u32 seqno);
>   int __must_check i915_gem_object_get_fence(struct drm_i915_gem_object *obj);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index d9f2701..888bb72 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2616,12 +2616,14 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv,
>   	}
>   }
>
> -void i915_gem_request_free(struct kref *req_ref)
> +static void i915_gem_request_free(struct fence *req_fence)
>   {
> -	struct drm_i915_gem_request *req = container_of(req_ref,
> -						 typeof(*req), ref);
> +	struct drm_i915_gem_request *req = container_of(req_fence,
> +						 typeof(*req), fence);
>   	struct intel_context *ctx = req->ctx;
>
> +	BUG_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
> +
>   	if (req->file_priv)
>   		i915_gem_request_remove_from_client(req);
>
> @@ -2637,6 +2639,47 @@ void i915_gem_request_free(struct kref *req_ref)
>   	kmem_cache_free(req->i915->requests, req);
>   }
>
> +static const char *i915_gem_request_get_driver_name(struct fence *req_fence)
> +{
> +	return "i915_request";
> +}
> +
> +static const char *i915_gem_request_get_timeline_name(struct fence *req_fence)
> +{
> +	struct drm_i915_gem_request *req = container_of(req_fence,
> +						 typeof(*req), fence);
> +	return req->ring->name;
> +}
> +
> +static bool i915_gem_request_enable_signaling(struct fence *req_fence)
> +{
> +	/* Interrupt driven fences are not implemented yet.*/
> +	WARN(true, "This should not be called!");
> +	return true;
> +}
> +
> +static bool i915_gem_request_is_completed(struct fence *req_fence)
> +{
> +	struct drm_i915_gem_request *req = container_of(req_fence,
> +						 typeof(*req), fence);
> +	u32 seqno;
> +
> +	BUG_ON(req == NULL);
> +
> +	seqno = req->ring->get_seqno(req->ring, false/*lazy_coherency*/);
> +
> +	return i915_seqno_passed(seqno, req->seqno);
> +}

How does this really work? I don't see any fence code calling this, 
plus, this patch is not doing fence_signal anywhere. So is the whole 
thing functional at this point?

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 5/9] drm/i915: Add per context timelines to fence object
  2015-07-17 14:31 ` [RFC 5/9] drm/i915: Add per context timelines to fence object John.C.Harrison
@ 2015-07-23 13:50   ` Tvrtko Ursulin
  2015-10-28 12:59     ` John Harrison
  0 siblings, 1 reply; 38+ messages in thread
From: Tvrtko Ursulin @ 2015-07-23 13:50 UTC (permalink / raw)
  To: John.C.Harrison, Intel-GFX


Hi,

On 07/17/2015 03:31 PM, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> The fence object used inside the request structure requires a sequence number.
> Although this is not used by the i915 driver itself, it could potentially be
> used by non-i915 code if the fence is passed outside of the driver. This is the
> intention as it allows external kernel drivers and user applications to wait on
> batch buffer completion asynchronously via the dma-buf fence API.
>
> To ensure that such external users are not confused by strange things happening
> with the seqno, this patch adds in a per context timeline that can provide a
> guaranteed in-order seqno value for the fence. This is safe because the
> scheduler will not re-order batch buffers within a context - they are considered
> to be mutually dependent.
>
> [new patch in series]
>
> For: VIZ-5190
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> ---
>   drivers/gpu/drm/i915/i915_drv.h         | 25 ++++++++----
>   drivers/gpu/drm/i915/i915_gem.c         | 69 ++++++++++++++++++++++++++++++---
>   drivers/gpu/drm/i915/i915_gem_context.c | 15 ++++++-
>   drivers/gpu/drm/i915/intel_lrc.c        |  8 ++++
>   drivers/gpu/drm/i915/intel_ringbuffer.h |  1 -
>   5 files changed, 103 insertions(+), 15 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 0c7df46..88a4746 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -840,6 +840,15 @@ struct i915_ctx_hang_stats {
>   	bool banned;
>   };
>
> +struct i915_fence_timeline {
> +	unsigned    fence_context;
> +	uint32_t    context;

Unused field?

> +	uint32_t    next;

fence.h defines seqnos as 'unsigned', which matches this in practice, 
but maybe it would be nicer to use the same type name.

> +
> +	struct intel_context *ctx;
> +	struct intel_engine_cs *ring;
> +};
> +
>   /* This must match up with the value previously used for execbuf2.rsvd1. */
>   #define DEFAULT_CONTEXT_HANDLE 0
>
> @@ -885,6 +894,7 @@ struct intel_context {
>   		struct drm_i915_gem_object *state;
>   		struct intel_ringbuffer *ringbuf;
>   		int pin_count;
> +		struct i915_fence_timeline fence_timeline;
>   	} engine[I915_NUM_RINGS];
>
>   	struct list_head link;
> @@ -2153,13 +2163,10 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
>   struct drm_i915_gem_request {
>   	/**
>   	 * Underlying object for implementing the signal/wait stuff.
> -	 * NB: Never call fence_later() or return this fence object to user
> -	 * land! Due to lazy allocation, scheduler re-ordering, pre-emption,
> -	 * etc., there is no guarantee at all about the validity or
> -	 * sequentiality of the fence's seqno! It is also unsafe to let
> -	 * anything outside of the i915 driver get hold of the fence object
> -	 * as the clean up when decrementing the reference count requires
> -	 * holding the driver mutex lock.
> +	 * NB: Never return this fence object to user land! It is unsafe to
> +	 * let anything outside of the i915 driver get hold of the fence
> +	 * object as the clean up when decrementing the reference count
> +	 * requires holding the driver mutex lock.
>   	 */
>   	struct fence fence;
>
> @@ -2239,6 +2246,10 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
>   			   struct drm_i915_gem_request **req_out);
>   void i915_gem_request_cancel(struct drm_i915_gem_request *req);
>
> +int i915_create_fence_timeline(struct drm_device *dev,
> +			       struct intel_context *ctx,
> +			       struct intel_engine_cs *ring);
> +
>   static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
>   {
>   	return fence_is_signaled(&req->fence);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 3970250..af79716 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2671,6 +2671,25 @@ static bool i915_gem_request_is_completed(struct fence *req_fence)
>   	return i915_seqno_passed(seqno, req->seqno);
>   }
>
> +static void i915_fence_timeline_value_str(struct fence *fence, char *str, int size)
> +{
> +	struct drm_i915_gem_request *req;
> +
> +	req = container_of(fence, typeof(*req), fence);
> +
> +	/* Last signalled timeline value ??? */
> +	snprintf(str, size, "? [%d]"/*, tl->value*/, req->ring->get_seqno(req->ring, true));
> +}

If timelines are per context now, maybe we should update 
i915_gem_request_get_timeline_name to be per context instead of per 
engine as well? As it stands we have a namespace overlap / seqno 
collisions from the userspace point of view.
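
For example the timeline could build its own name at creation time -
rough sketch, assuming a new char name[32] field is added to struct
i915_fence_timeline:

	/* in i915_create_fence_timeline(): */
	snprintf(timeline->name, sizeof(timeline->name), "%s.%u",
		 ring->name, timeline->fence_context);

static const char *i915_gem_request_get_timeline_name(struct fence *req_fence)
{
	struct drm_i915_gem_request *req = container_of(req_fence,
						 typeof(*req), fence);

	return req->ctx->engine[req->ring->id].fence_timeline.name;
}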

> +static void i915_fence_value_str(struct fence *fence, char *str, int size)
> +{
> +	struct drm_i915_gem_request *req;
> +
> +	req = container_of(fence, typeof(*req), fence);
> +
> +	snprintf(str, size, "%d [%d]", req->fence.seqno, req->seqno);
> +}
> +
>   static const struct fence_ops i915_gem_request_fops = {
>   	.get_driver_name	= i915_gem_request_get_driver_name,
>   	.get_timeline_name	= i915_gem_request_get_timeline_name,
> @@ -2678,8 +2697,48 @@ static const struct fence_ops i915_gem_request_fops = {
>   	.signaled		= i915_gem_request_is_completed,
>   	.wait			= fence_default_wait,
>   	.release		= i915_gem_request_free,
> +	.fence_value_str	= i915_fence_value_str,
> +	.timeline_value_str	= i915_fence_timeline_value_str,
>   };
>
> +int i915_create_fence_timeline(struct drm_device *dev,
> +			       struct intel_context *ctx,
> +			       struct intel_engine_cs *ring)
> +{
> +	struct i915_fence_timeline *timeline;
> +
> +	timeline = &ctx->engine[ring->id].fence_timeline;
> +
> +	if (timeline->ring)
> +		return 0;
> +
> +	timeline->fence_context = fence_context_alloc(1);
> +
> +	/*
> +	 * Start the timeline from seqno 0 as this is a special value
> +	 * that is reserved for invalid sync points.
> +	 */
> +	timeline->next       = 1;
> +	timeline->ctx        = ctx;
> +	timeline->ring       = ring;
> +
> +	return 0;
> +}
> +
> +static uint32_t i915_fence_timeline_get_next_seqno(struct i915_fence_timeline *timeline)
> +{
> +	uint32_t seqno;
> +
> +	seqno = timeline->next;
> +
> +	/* Reserve zero for invalid */
> +	if (++timeline->next == 0 ) {
> +		timeline->next = 1;
> +	}
> +
> +	return seqno;
> +}
> +
>   int i915_gem_request_alloc(struct intel_engine_cs *ring,
>   			   struct intel_context *ctx,
>   			   struct drm_i915_gem_request **req_out)
> @@ -2715,7 +2774,9 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
>   		goto err;
>   	}
>
> -	fence_init(&req->fence, &i915_gem_request_fops, &ring->fence_lock, ring->fence_context, req->seqno);
> +	fence_init(&req->fence, &i915_gem_request_fops, &ring->fence_lock,
> +		   ctx->engine[ring->id].fence_timeline.fence_context,
> +		   i915_fence_timeline_get_next_seqno(&ctx->engine[ring->id].fence_timeline));

I suppose for debugging it could be useful to add this new seqno in 
i915_gem_request_info too, to have visibility on both sides, i.e. to 
map userspace seqnos to driver state.
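
Something along these lines in the request list dump, say (sketch
against the existing loop there, the exact variable names may differ):

		seq_printf(m, "    %x @ %d [fence %u]\n",
			   req->seqno,
			   (int) (jiffies - req->emitted_jiffies),
			   req->fence.seqno);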

>   	/*
>   	 * Reserve space in the ring buffer for all the commands required to
> @@ -5065,7 +5126,7 @@ i915_gem_init_hw(struct drm_device *dev)
>   {
>   	struct drm_i915_private *dev_priv = dev->dev_private;
>   	struct intel_engine_cs *ring;
> -	int ret, i, j, fence_base;
> +	int ret, i, j;
>
>   	if (INTEL_INFO(dev)->gen < 6 && !intel_enable_gtt())
>   		return -EIO;
> @@ -5117,16 +5178,12 @@ i915_gem_init_hw(struct drm_device *dev)
>   			goto out;
>   	}
>
> -	fence_base = fence_context_alloc(I915_NUM_RINGS);
> -
>   	/* Now it is safe to go back round and do everything else: */
>   	for_each_ring(ring, dev_priv, i) {
>   		struct drm_i915_gem_request *req;
>
>   		WARN_ON(!ring->default_context);
>
> -		ring->fence_context = fence_base + i;
> -
>   		ret = i915_gem_request_alloc(ring, ring->default_context, &req);
>   		if (ret) {
>   			i915_gem_cleanup_ringbuffer(dev);
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index b77a8f7..7eb8694 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -242,7 +242,7 @@ i915_gem_create_context(struct drm_device *dev,
>   {
>   	const bool is_global_default_ctx = file_priv == NULL;
>   	struct intel_context *ctx;
> -	int ret = 0;
> +	int i, ret = 0;
>
>   	BUG_ON(!mutex_is_locked(&dev->struct_mutex));
>
> @@ -250,6 +250,19 @@ i915_gem_create_context(struct drm_device *dev,
>   	if (IS_ERR(ctx))
>   		return ctx;
>
> +	if (!i915.enable_execlists) {
> +		struct intel_engine_cs *ring;
> +
> +		/* Create a per context timeline for fences */
> +		for_each_ring(ring, to_i915(dev), i) {
> +			ret = i915_create_fence_timeline(dev, ctx, ring);
> +			if (ret) {
> +				DRM_ERROR("Fence timeline creation failed for legacy %s: %p\n", ring->name, ctx);
> +				goto err_destroy;
> +			}
> +		}
> +	}
> +
>   	if (is_global_default_ctx && ctx->legacy_hw_ctx.rcs_state) {
>   		/* We may need to do things with the shrinker which
>   		 * require us to immediately switch back to the default
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index ee4aecd..8f255de 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -2376,6 +2376,14 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
>   		goto error;
>   	}
>
> +	/* Create a per context timeline for fences */
> +	ret = i915_create_fence_timeline(dev, ctx, ring);
> +	if (ret) {
> +		DRM_ERROR("Fence timeline creation failed for ring %s, ctx %p\n",
> +			  ring->name, ctx);
> +		goto error;
> +	}
> +

We must be 100% sure userspace cannot provoke context creation failure 
by accident or deliberately. Otherwise we would leak fence contexts 
until overflow, which would be bad.

Perhaps matching fence_context_release for existing fence_context_alloc 
should be added?

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 6/9] drm/i915: Delay the freeing of requests until retire time
  2015-07-17 14:31 ` [RFC 6/9] drm/i915: Delay the freeing of requests until retire time John.C.Harrison
@ 2015-07-23 14:25   ` Tvrtko Ursulin
  2015-10-28 13:00     ` John Harrison
  0 siblings, 1 reply; 38+ messages in thread
From: Tvrtko Ursulin @ 2015-07-23 14:25 UTC (permalink / raw)
  To: John.C.Harrison, Intel-GFX


Hi,

On 07/17/2015 03:31 PM, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> The request structure is reference counted. When the count reached
> zero, the request was immediately freed and all associated objects
> were unreferenced/deallocated. This meant that the driver mutex lock
> must be held at the point where the count reaches zero. This was fine
> while all references were held internally to the driver. However, the
> plan is to allow the underlying fence object (and hence the request
> itself) to be returned to other drivers and to userland. External
> users cannot be expected to acquire a driver private mutex lock.
>
> Rather than attempt to disentangle the request structure from the
> driver mutex lock, the decision was to defer the free code until a
> later (safer) point. Hence this patch changes the unreference callback
> to merely move the request onto a delayed free list. The driver's
> retire worker thread will then process the list and actually call the
> free function on the requests.
>
> [new patch in series]
>
> For: VIZ-5190
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> ---
>   drivers/gpu/drm/i915/i915_drv.h         | 22 +++---------------
>   drivers/gpu/drm/i915/i915_gem.c         | 41 +++++++++++++++++++++++++++++----
>   drivers/gpu/drm/i915/intel_display.c    |  2 +-
>   drivers/gpu/drm/i915/intel_lrc.c        |  2 ++
>   drivers/gpu/drm/i915/intel_pm.c         |  2 +-
>   drivers/gpu/drm/i915/intel_ringbuffer.c |  2 ++
>   drivers/gpu/drm/i915/intel_ringbuffer.h |  4 ++++
>   7 files changed, 50 insertions(+), 25 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 88a4746..61c3db2 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2161,14 +2161,9 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
>    * initial reference taken using kref_init
>    */
>   struct drm_i915_gem_request {
> -	/**
> -	 * Underlying object for implementing the signal/wait stuff.
> -	 * NB: Never return this fence object to user land! It is unsafe to
> -	 * let anything outside of the i915 driver get hold of the fence
> -	 * object as the clean up when decrementing the reference count
> -	 * requires holding the driver mutex lock.
> -	 */
> +	/** Underlying object for implementing the signal/wait stuff. */
>   	struct fence fence;
> +	struct list_head delay_free_list;

Maybe call this delay_free_link to continue the established convention.

>
>   	/** On Which ring this request was generated */
>   	struct drm_i915_private *i915;
> @@ -2281,21 +2276,10 @@ i915_gem_request_reference(struct drm_i915_gem_request *req)
>   static inline void
>   i915_gem_request_unreference(struct drm_i915_gem_request *req)
>   {
> -	WARN_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
> -	fence_put(&req->fence);
> -}
> -
> -static inline void
> -i915_gem_request_unreference__unlocked(struct drm_i915_gem_request *req)
> -{
> -	struct drm_device *dev;
> -
>   	if (!req)
>   		return;
>
> -	dev = req->ring->dev;
> -	if (kref_put_mutex(&req->fence.refcount, fence_release, &dev->struct_mutex))
> -		mutex_unlock(&dev->struct_mutex);
> +	fence_put(&req->fence);
>   }
>
>   static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index af79716..482835a 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2616,10 +2616,27 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv,
>   	}
>   }
>
> -static void i915_gem_request_free(struct fence *req_fence)
> +static void i915_gem_request_release(struct fence *req_fence)
>   {
>   	struct drm_i915_gem_request *req = container_of(req_fence,
>   						 typeof(*req), fence);
> +	struct intel_engine_cs *ring = req->ring;
> +	struct drm_i915_private *dev_priv = to_i915(ring->dev);
> +	unsigned long flags;
> +
> +	/*
> +	 * Need to add the request to a deferred dereference list to be
> +	 * processed at a mutex lock safe time.
> +	 */
> +	spin_lock_irqsave(&ring->delayed_free_lock, flags);

At the moment there is no request unreferencing from irq handlers, 
right? Unless (or until) you plan to add that, you could use a simple 
spin_lock here. (And in i915_gem_retire_requests_ring.)
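
I.e. just (sketch):

	spin_lock(&ring->delayed_free_lock);
	list_add_tail(&req->delay_free_list, &ring->delayed_free_list);
	spin_unlock(&ring->delayed_free_lock);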

> +	list_add_tail(&req->delay_free_list, &ring->delayed_free_list);
> +	spin_unlock_irqrestore(&ring->delayed_free_lock, flags);
> +
> +	queue_delayed_work(dev_priv->wq, &dev_priv->mm.retire_work, 0);

Have you decided to re-use the retire worker just for convenience or 
for some other reason as well?

I found it a bit unexpected and thought a dedicated request free worker 
would be cleaner, but I don't know, not a strong opinion.

> +}
> +
> +static void i915_gem_request_free(struct drm_i915_gem_request *req)
> +{
>   	struct intel_context *ctx = req->ctx;
>
>   	BUG_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
> @@ -2696,7 +2713,7 @@ static const struct fence_ops i915_gem_request_fops = {
>   	.enable_signaling	= i915_gem_request_enable_signaling,
>   	.signaled		= i915_gem_request_is_completed,
>   	.wait			= fence_default_wait,
> -	.release		= i915_gem_request_free,
> +	.release		= i915_gem_request_release,
>   	.fence_value_str	= i915_fence_value_str,
>   	.timeline_value_str	= i915_fence_timeline_value_str,
>   };
> @@ -2992,6 +3009,21 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
>   		i915_gem_request_assign(&ring->trace_irq_req, NULL);
>   	}
>
> +	while (!list_empty(&ring->delayed_free_list)) {
> +		struct drm_i915_gem_request *request;
> +		unsigned long flags;
> +
> +		request = list_first_entry(&ring->delayed_free_list,
> +					   struct drm_i915_gem_request,
> +					   delay_free_list);

A spinlock is needed to sample the list head here. Then maybe move the 
entries onto a temporary list and do the freeing afterwards.
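
Something like this (untested sketch):

	LIST_HEAD(free_list);
	struct drm_i915_gem_request *req, *req_next;

	spin_lock(&ring->delayed_free_lock);
	list_splice_init(&ring->delayed_free_list, &free_list);
	spin_unlock(&ring->delayed_free_lock);

	list_for_each_entry_safe(req, req_next, &free_list, delay_free_list) {
		list_del(&req->delay_free_list);
		i915_gem_request_free(req);
	}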

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 7/9] drm/i915: Interrupt driven fences
  2015-07-17 14:31 ` [RFC 7/9] drm/i915: Interrupt driven fences John.C.Harrison
  2015-07-20  9:09   ` Maarten Lankhorst
  2015-07-21  7:19   ` Daniel Vetter
@ 2015-07-27 11:33   ` Tvrtko Ursulin
  2015-10-28 13:00     ` John Harrison
  2015-07-27 13:20   ` Tvrtko Ursulin
  3 siblings, 1 reply; 38+ messages in thread
From: Tvrtko Ursulin @ 2015-07-27 11:33 UTC (permalink / raw)
  To: John.C.Harrison, Intel-GFX


Hi,

On 07/17/2015 03:31 PM, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> The intended usage model for struct fence is that the signalled status should be
> set on demand rather than polled. That is, there should not be a need for a
> 'signaled' function to be called every time the status is queried. Instead,
> 'something' should be done to enable a signal callback from the hardware which
> will update the state directly. In the case of requests, this is the seqno
> update interrupt. The idea is that this callback will only be enabled on demand
> when something actually tries to wait on the fence.
>
> This change removes the polling test and replaces it with the callback scheme.
> Each fence is added to a 'please poke me' list at the start of
> i915_add_request(). The interrupt handler then scans through the 'poke me' list
> when a new seqno pops out and signals any matching fence/request. The fence is
> then removed from the list so the entire request stack does not need to be
> scanned every time. Note that the fence is added to the list before the commands
> to generate the seqno interrupt are added to the ring. Thus the sequence is
> guaranteed to be race free if the interrupt is already enabled.
>
> Note that the interrupt is only enabled on demand (i.e. when __wait_request() is
> called). Thus there is still a potential race when enabling the interrupt as the
> request may already have completed. However, this is simply solved by calling
> the interrupt processing code immediately after enabling the interrupt and
> thereby checking for already completed requests.
>
> Lastly, the ring clean up code has the possibility to cancel outstanding
> requests (e.g. because TDR has reset the ring). These requests will never get
> signalled and so must be removed from the signal list manually. This is done by
> setting a 'cancelled' flag and then calling the regular notify/retire code path
> rather than attempting to duplicate the list manipulation and clean up code in
> multiple places. This also avoids any race condition where the cancellation
> request might occur after/during the completion interrupt actually arriving.
>
> v2: Updated to take advantage of the request unreference no longer requiring the
> mutex lock.
>
> For: VIZ-5190
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> ---
>   drivers/gpu/drm/i915/i915_drv.h         |   8 ++
>   drivers/gpu/drm/i915/i915_gem.c         | 132 +++++++++++++++++++++++++++++---
>   drivers/gpu/drm/i915/i915_irq.c         |   2 +
>   drivers/gpu/drm/i915/intel_lrc.c        |   1 +
>   drivers/gpu/drm/i915/intel_ringbuffer.c |   1 +
>   drivers/gpu/drm/i915/intel_ringbuffer.h |   1 +
>   6 files changed, 136 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 61c3db2..d7f1aa5 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2163,7 +2163,11 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
>   struct drm_i915_gem_request {
>   	/** Underlying object for implementing the signal/wait stuff. */
>   	struct fence fence;
> +	struct list_head signal_list;
> +	struct list_head unsignal_list;

In addition to what Daniel said (one list_head looks enough) it is 
customary to call it _link.

>   	struct list_head delay_free_list;
> +	bool cancelled;
> +	bool irq_enabled;
>
>   	/** On Which ring this request was generated */
>   	struct drm_i915_private *i915;
> @@ -2241,6 +2245,10 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
>   			   struct drm_i915_gem_request **req_out);
>   void i915_gem_request_cancel(struct drm_i915_gem_request *req);
>
> +void i915_gem_request_submit(struct drm_i915_gem_request *req);
> +void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req);
> +void i915_gem_request_notify(struct intel_engine_cs *ring);
> +
>   int i915_create_fence_timeline(struct drm_device *dev,
>   			       struct intel_context *ctx,
>   			       struct intel_engine_cs *ring);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 482835a..7c589a9 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1222,6 +1222,11 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>   	if (list_empty(&req->list))
>   		return 0;
>
> +	/*
> +	 * Enable interrupt completion of the request.
> +	 */
> +	i915_gem_request_enable_interrupt(req);
> +
>   	if (i915_gem_request_completed(req))
>   		return 0;
>
> @@ -1382,6 +1387,10 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
>   	list_del_init(&request->list);
>   	i915_gem_request_remove_from_client(request);
>
> +	/* In case the request is still in the signal pending list */
> +	if (!list_empty(&request->signal_list))
> +		request->cancelled = true;
> +
>   	i915_gem_request_unreference(request);
>   }
>
> @@ -2534,6 +2543,12 @@ void __i915_add_request(struct drm_i915_gem_request *request,
>   	 */
>   	request->postfix = intel_ring_get_tail(ringbuf);
>
> +	/*
> +	 * Add the fence to the pending list before emitting the commands to
> +	 * generate a seqno notification interrupt.
> +	 */
> +	i915_gem_request_submit(request);
> +
>   	if (i915.enable_execlists)
>   		ret = ring->emit_request(request);
>   	else {
> @@ -2653,6 +2668,9 @@ static void i915_gem_request_free(struct drm_i915_gem_request *req)
>   		i915_gem_context_unreference(ctx);
>   	}
>
> +	if (req->irq_enabled)
> +		req->ring->irq_put(req->ring);
> +

We get here with interrupts still enabled only if userspace is 
abandoning a wait on an unsignaled fence, did I get that right?

>   	kmem_cache_free(req->i915->requests, req);
>   }
>
> @@ -2668,24 +2686,105 @@ static const char *i915_gem_request_get_timeline_name(struct fence *req_fence)
>   	return req->ring->name;
>   }
>
> -static bool i915_gem_request_enable_signaling(struct fence *req_fence)
> +/*
> + * The request has been submitted to the hardware so add the fence to the
> + * list of signalable fences.
> + *
> + * NB: This does not enable interrupts yet. That only occurs on demand when
> + * the request is actually waited on. However, adding it to the list early
> + * ensures that there is no race condition where the interrupt could pop
> + * out prematurely and thus be completely lost. The race is merely that the
> + * interrupt must be manually checked for after being enabled.
> + */
> +void i915_gem_request_submit(struct drm_i915_gem_request *req)
>   {
> -	/* Interrupt driven fences are not implemented yet.*/
> -	WARN(true, "This should not be called!");
> -	return true;
> +	fence_enable_sw_signaling(&req->fence);
>   }
>
> -static bool i915_gem_request_is_completed(struct fence *req_fence)
> +/*
> + * The request is being actively waited on, so enable interrupt based
> + * completion signalling.
> + */
> +void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req)
> +{
> +	if (req->irq_enabled)
> +		return;
> +
> +	WARN_ON(!req->ring->irq_get(req->ring));
> +	req->irq_enabled = true;

req->irq_enabled manipulations look racy. Here and in request free it 
is protected by struct_mutex, but that is not held in 
i915_gem_request_notify. My initial feeling is that you should use 
ring->fence_lock everywhere you query/manipulate req->irq_enabled.
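
Sketch (whether ring->irq_get() is happy being called with fence_lock
held would need checking):

void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req)
{
	struct intel_engine_cs *ring = req->ring;
	unsigned long flags;

	spin_lock_irqsave(&ring->fence_lock, flags);
	if (!req->irq_enabled) {
		WARN_ON(!ring->irq_get(ring));
		req->irq_enabled = true;
	}
	spin_unlock_irqrestore(&ring->fence_lock, flags);

	/* Catch a completion that may have already happened. */
	i915_gem_request_notify(ring);
}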

> +
> +	/*
> +	 * Because the interrupt is only enabled on demand, there is a race
> +	 * where the interrupt can fire before anyone is looking for it. So
> +	 * do an explicit check for missed interrupts.
> +	 */
> +	i915_gem_request_notify(req->ring);
> +}
> +
> +static bool i915_gem_request_enable_signaling(struct fence *req_fence)
>   {
>   	struct drm_i915_gem_request *req = container_of(req_fence,
>   						 typeof(*req), fence);
> +
> +	i915_gem_request_reference(req);
> +	WARN_ON(!list_empty(&req->signal_list));

It looks very unsafe to proceed normally after this WARN_ON. It should 
probably return false here to preserve data structure sanity.

> +	list_add_tail(&req->signal_list, &req->ring->fence_signal_list);
> +
> +	/*
> +	 * Note that signalling is always enabled for every request before
> +	 * that request is submitted to the hardware. Therefore there is
> +	 * no race condition whereby the signal could pop out before the
> +	 * request has been added to the list. Hence no need to check
> +	 * for completion, undo the list add and return false.
> +	 *
> +	 * NB: Interrupts are only enabled on demand. Thus there is still a
> +	 * race where the request could complete before the interrupt has
> +	 * been enabled. Thus care must be taken at that point.
> +	 */
> +
> +	return true;
> +}
> +
> +void i915_gem_request_notify(struct intel_engine_cs *ring)
> +{
> +	struct drm_i915_gem_request *req, *req_next;
> +	unsigned long flags;
>   	u32 seqno;
> +	LIST_HEAD(free_list);
>
> -	BUG_ON(req == NULL);
> +	if (list_empty(&ring->fence_signal_list))
> +		return;
> +
> +	seqno = ring->get_seqno(ring, false);
> +
> +	spin_lock_irqsave(&ring->fence_lock, flags);
> +	list_for_each_entry_safe(req, req_next, &ring->fence_signal_list, signal_list) {
> +		if (!req->cancelled) {
> +			if (!i915_seqno_passed(seqno, req->seqno))
> +				continue;
>
> -	seqno = req->ring->get_seqno(req->ring, false/*lazy_coherency*/);
> +			fence_signal_locked(&req->fence);
> +		}
> +
> +		list_del_init(&req->signal_list);

I haven't managed to figure out why this is apparently removing 
requests which have not been signalled from the signal_list?

Shouldn't they be moved to free_list only if i915_seqno_passed?

> +		if (req->irq_enabled) {
> +			req->ring->irq_put(req->ring);
> +			req->irq_enabled = false;
> +		}
>
> -	return i915_seqno_passed(seqno, req->seqno);
> +		/* Can't unreference here because that might grab fence_lock */
> +		list_add_tail(&req->unsignal_list, &free_list);
> +	}
> +	spin_unlock_irqrestore(&ring->fence_lock, flags);
> +
> +	/* It should now be safe to actually free the requests */
> +	while (!list_empty(&free_list)) {
> +		req = list_first_entry(&free_list,
> +				       struct drm_i915_gem_request, unsignal_list);
> +		list_del(&req->unsignal_list);
> +
> +		i915_gem_request_unreference(req);
> +	}
>   }
>
>   static void i915_fence_timeline_value_str(struct fence *fence, char *str, int size)
> @@ -2711,7 +2810,6 @@ static const struct fence_ops i915_gem_request_fops = {
>   	.get_driver_name	= i915_gem_request_get_driver_name,
>   	.get_timeline_name	= i915_gem_request_get_timeline_name,
>   	.enable_signaling	= i915_gem_request_enable_signaling,
> -	.signaled		= i915_gem_request_is_completed,
>   	.wait			= fence_default_wait,
>   	.release		= i915_gem_request_release,
>   	.fence_value_str	= i915_fence_value_str,
> @@ -2791,6 +2889,7 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
>   		goto err;
>   	}
>
> +	INIT_LIST_HEAD(&req->signal_list);
>   	fence_init(&req->fence, &i915_gem_request_fops, &ring->fence_lock,
>   		   ctx->engine[ring->id].fence_timeline.fence_context,
>   		   i915_fence_timeline_get_next_seqno(&ctx->engine[ring->id].fence_timeline));
> @@ -2913,6 +3012,13 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
>
>   		i915_gem_request_retire(request);
>   	}
> +
> +	/*
> +	 * Make sure any requests that were on the signal pending list get
> +	 * cleaned up.
> +	 */
> +	i915_gem_request_notify(ring);
> +	i915_gem_retire_requests_ring(ring);

Would i915_gem_retire_requests_ring be enough given how it calls 
i915_gem_request_notify itself as the first thing below?

>   }
>
>   void i915_gem_restore_fences(struct drm_device *dev)
> @@ -2968,6 +3074,13 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
>   {
>   	WARN_ON(i915_verify_lists(ring->dev));
>
> +	/*
> +	 * If no-one has waited on a request recently then interrupts will
> +	 * not have been enabled and thus no requests will ever be marked as
> +	 * completed. So do an interrupt check now.
> +	 */
> +	i915_gem_request_notify(ring);
> +
>   	/* Retire requests first as we use it above for the early return.
>   	 * If we retire requests last, we may use a later seqno and so clear
>   	 * the requests lists without clearing the active list, leading to
> @@ -5345,6 +5458,7 @@ init_ring_lists(struct intel_engine_cs *ring)
>   {
>   	INIT_LIST_HEAD(&ring->active_list);
>   	INIT_LIST_HEAD(&ring->request_list);
> +	INIT_LIST_HEAD(&ring->fence_signal_list);
>   	INIT_LIST_HEAD(&ring->delayed_free_list);
>   }
>
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index d87f173..e446509 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -853,6 +853,8 @@ static void notify_ring(struct intel_engine_cs *ring)
>
>   	trace_i915_gem_request_notify(ring);
>
> +	i915_gem_request_notify(ring);
> +

How many requests are typically on the signal_list in typical 
workloads? This could be a significant performance change since on 
every user interrupt it would walk the whole list, potentially only 
removing one request at a time.

These are just review comments on this particular patch, without yet 
getting into the bigger design questions Daniel has raised.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [RFC 9/9] drm/i915: Add sync framework support to execbuff IOCTL
  2015-07-17 14:31 ` [RFC 9/9] drm/i915: Add sync framework support to execbuff IOCTL John.C.Harrison
@ 2015-07-27 13:00   ` Tvrtko Ursulin
  2015-10-28 13:01     ` John Harrison
  0 siblings, 1 reply; 38+ messages in thread
From: Tvrtko Ursulin @ 2015-07-27 13:00 UTC (permalink / raw)
  To: John.C.Harrison, Intel-GFX



On 07/17/2015 03:31 PM, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> Various projects desire a mechanism for managing dependencies between
> work items asynchronously. This can also include work items across
> completely different and independent systems. For example, an
> application wants to retrieve a frame from a video-in device, use it
> for rendering on a GPU, then send it to the video-out device for
> display, all without having to stall waiting for completion along
> the way. The sync framework allows this. It encapsulates
> synchronisation events in file descriptors. The application can
> request a sync point for the completion of each piece of work. Drivers
> should also take sync points in with each new work request and not
> schedule the work to start until the sync has been signalled.
>
> This patch adds sync framework support to the exec buffer IOCTL. A
> sync point can be passed in to stall execution of the batch buffer
> until signalled. And a sync point can be returned after each batch
> buffer submission which will be signalled upon that batch buffer's
> completion.
>
> At present, the input sync point is simply waited on synchronously
> inside the exec buffer IOCTL call. Once the GPU scheduler arrives,
> this will be handled asynchronously inside the scheduler and the IOCTL
> can return without having to wait.
>
> Note also that the scheduler will re-order the execution of batch
> buffers, e.g. because a batch buffer is stalled on a sync point and
> cannot be submitted yet but other, independent, batch buffers are
> being presented to the driver. This means that the timeline within the
> sync points returned cannot be global to the engine. Instead they must
> be kept per context per engine (the scheduler may not re-order batches
> within a context). Hence the timeline cannot be based on the existing
> seqno values but must be a new implementation.
>
> This patch is a port of work by several people that has been pulled
> across from Android. It has been updated several times across several
> patches. Rather than attempt to port each individual patch, this
> version is the finished product as a single patch. The various
> contributors/authors along the way (in addition to myself) were:
>    Satyanantha RamaGopal M <rama.gopal.m.satyanantha@intel.com>
>    Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>    Michel Thierry <michel.thierry@intel.com>
>    Arun Siluvery <arun.siluvery@linux.intel.com>
>
> [new patch in series]
>
> For: VIZ-5190
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> ---
>   drivers/gpu/drm/i915/i915_drv.h            |  6 ++
>   drivers/gpu/drm/i915/i915_gem.c            | 84 ++++++++++++++++++++++++++++
>   drivers/gpu/drm/i915/i915_gem_execbuffer.c | 90 ++++++++++++++++++++++++++++--
>   include/uapi/drm/i915_drm.h                | 16 +++++-
>   4 files changed, 188 insertions(+), 8 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index d7f1aa5..cf6b7cd 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2168,6 +2168,7 @@ struct drm_i915_gem_request {
>   	struct list_head delay_free_list;
>   	bool cancelled;
>   	bool irq_enabled;
> +	bool fence_external;
>
>   	/** On Which ring this request was generated */
>   	struct drm_i915_private *i915;
> @@ -2252,6 +2253,11 @@ void i915_gem_request_notify(struct intel_engine_cs *ring);
>   int i915_create_fence_timeline(struct drm_device *dev,
>   			       struct intel_context *ctx,
>   			       struct intel_engine_cs *ring);
> +#ifdef CONFIG_SYNC
> +struct sync_fence;
> +int i915_create_sync_fence(struct drm_i915_gem_request *req, int *fence_fd);
> +bool i915_safe_to_ignore_fence(struct intel_engine_cs *ring, struct sync_fence *fence);
> +#endif
>
>   static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
>   {
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 3f20087..de93422 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -37,6 +37,9 @@
>   #include <linux/swap.h>
>   #include <linux/pci.h>
>   #include <linux/dma-buf.h>
> +#ifdef CONFIG_SYNC
> +#include <../drivers/staging/android/sync.h>
> +#endif
>
>   #define RQ_BUG_ON(expr)
>
> @@ -2549,6 +2552,15 @@ void __i915_add_request(struct drm_i915_gem_request *request,
>   	 */
>   	i915_gem_request_submit(request);
>
> +	/*
> +	 * If an external sync point has been requested for this request then
> +	 * it can be waited on without the driver's knowledge, i.e. without
> +	 * calling __i915_wait_request(). Thus interrupts must be enabled
> +	 * from the start rather than only on demand.
> +	 */
> +	if (request->fence_external)
> +		i915_gem_request_enable_interrupt(request);

Maybe then fence_exported would be clearer; fence_external at first 
sounds like it is coming from another driver or something.

> +
>   	if (i915.enable_execlists)
>   		ret = ring->emit_request(request);
>   	else {
> @@ -2857,6 +2869,78 @@ static uint32_t i915_fence_timeline_get_next_seqno(struct i915_fence_timeline *t
>   	return seqno;
>   }
>
> +#ifdef CONFIG_SYNC
> +int i915_create_sync_fence(struct drm_i915_gem_request *req, int *fence_fd)
> +{
> +	char ring_name[] = "i915_ring0";
> +	struct sync_fence *sync_fence;
> +	int fd;
> +
> +	fd = get_unused_fd_flags(O_CLOEXEC);
> +	if (fd < 0) {
> +		DRM_DEBUG("No available file descriptors!\n");
> +		*fence_fd = -1;
> +		return fd;
> +	}
> +
> +	ring_name[9] += req->ring->id;
> +	sync_fence = sync_fence_create_dma(ring_name, &req->fence);

This will call ->enable_signaling so perhaps you could enable 
interrupts in there for exported fences. Maybe it would be a tiny bit 
more logically grouped (rather than have _add_request do it).
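
I.e. something like this in the fence op itself (sketch; it assumes
fence_external is already set by the time enable_signaling first runs,
which would need checking given that _submit also triggers it via
fence_enable_sw_signaling):

static bool i915_gem_request_enable_signaling(struct fence *req_fence)
{
	struct drm_i915_gem_request *req = container_of(req_fence,
						 typeof(*req), fence);

	i915_gem_request_reference(req);
	list_add_tail(&req->signal_list, &req->ring->fence_signal_list);

	/* Exported fences can be waited on behind the driver's back. */
	if (req->fence_external)
		i915_gem_request_enable_interrupt(req);

	return true;
}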

> +	if (!sync_fence) {
> +		put_unused_fd(fd);
> +		*fence_fd = -1;
> +		return -ENOMEM;
> +	}
> +
> +	sync_fence_install(sync_fence, fd);
> +	*fence_fd = fd;
> +
> +	// Necessary??? Who does the put???
> +	fence_get(&req->fence);

sync_fence_release?

> +
> +	req->fence_external = true;
> +
> +	return 0;
> +}
> +
> +bool i915_safe_to_ignore_fence(struct intel_engine_cs *ring, struct sync_fence *sync_fence)
> +{
> +	struct fence *dma_fence;
> +	struct drm_i915_gem_request *req;
> +	bool ignore;
> +	int i;
> +
> +	if (atomic_read(&sync_fence->status) != 0)
> +		return true;
> +
> +	ignore = true;
> +	for(i = 0; i < sync_fence->num_fences; i++) {
> +		dma_fence = sync_fence->cbs[i].sync_pt;
> +
> +		/* No need to worry about dead points: */
> +		if (fence_is_signaled(dma_fence))
> +			continue;
> +
> +		/* Can't ignore other people's points: */
> +		if(dma_fence->ops != &i915_gem_request_fops) {
> +			ignore = false;
> +			break;

That is the same as returning false, and then bool ignore is not 
needed at all.

> +		}
> +
> +		req = container_of(dma_fence, typeof(*req), fence);
> +
> +		/* Can't ignore points on other rings: */
> +		if (req->ring != ring) {
> +			ignore = false;
> +			break;
> +		}
> +
> +		/* Same ring means guaranteed to be in order so ignore it. */
> +	}
> +
> +	return ignore;
> +}
> +#endif
> +
>   int i915_gem_request_alloc(struct intel_engine_cs *ring,
>   			   struct intel_context *ctx,
>   			   struct drm_i915_gem_request **req_out)
> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> index 923a3c4..b1a1659 100644
> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> @@ -26,6 +26,7 @@
>    *
>    */
>
> +#include <linux/syscalls.h>
>   #include <drm/drmP.h>
>   #include <drm/i915_drm.h>
>   #include "i915_drv.h"
> @@ -33,6 +34,9 @@
>   #include "intel_drv.h"
>   #include <linux/dma_remapping.h>
>   #include <linux/uaccess.h>
> +#ifdef CONFIG_SYNC
> +#include <../drivers/staging/android/sync.h>
> +#endif
>
>   #define  __EXEC_OBJECT_HAS_PIN (1<<31)
>   #define  __EXEC_OBJECT_HAS_FENCE (1<<30)
> @@ -1403,6 +1407,35 @@ eb_get_batch(struct eb_vmas *eb)
>   	return vma->obj;
>   }
>
> +#ifdef CONFIG_SYNC

I don't expect you'll be able to get away with ifdefs in the code like 
this, so for the non-RFC version it will have to be cleaned up.

> +static int i915_early_fence_wait(struct intel_engine_cs *ring, int fence_fd)
> +{
> +	struct sync_fence *fence;
> +	int ret = 0;
> +
> +	if (fence_fd < 0) {
> +		DRM_ERROR("Invalid wait fence fd %d on ring %d\n", fence_fd,
> +			  (int) ring->id);
> +		return 1;
> +	}
> +
> +	fence = sync_fence_fdget(fence_fd);
> +	if (fence == NULL) {
> +		DRM_ERROR("Invalid wait fence %d on ring %d\n", fence_fd,
> +			  (int) ring->id);

These two should be DRM_DEBUG to prevent userspace from spamming the 
logs too easily.

> +		return 1;
> +	}
> +
> +	if (atomic_read(&fence->status) == 0) {
> +		if (!i915_safe_to_ignore_fence(ring, fence))
> +			ret = sync_fence_wait(fence, 1000);

I expect you have to wait indefinitely here, not just for one second.
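
I.e. (if I remember correctly that a negative timeout means wait
forever in the staging sync code):

			ret = sync_fence_wait(fence, -1);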

> +	}
> +
> +	sync_fence_put(fence);
> +	return ret;
> +}
> +#endif
> +
>   static int
>   i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>   		       struct drm_file *file,
> @@ -1422,6 +1455,18 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>   	u32 dispatch_flags;
>   	int ret;
>   	bool need_relocs;
> +	int fd_fence_complete = -1;
> +#ifdef CONFIG_SYNC
> +	int fd_fence_wait = lower_32_bits(args->rsvd2);
> +#endif
> +
> +	/*
> +	 * Make sure a broken fence handle is not returned no matter
> +	 * how early an error might be hit. Note that rsvd2 has to be
> +	 * saved away first because it is also an input parameter!
> +	 */
> +	if (args->flags & I915_EXEC_CREATE_FENCE)
> +		args->rsvd2 = (__u64) -1;
>
>   	if (!i915_gem_check_execbuffer(args))
>   		return -EINVAL;
> @@ -1505,6 +1550,19 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>   		dispatch_flags |= I915_DISPATCH_RS;
>   	}
>
> +#ifdef CONFIG_SYNC
> +	/*
> +	 * Without a GPU scheduler, any fence waits must be done up front.
> +	 */
> +	if (args->flags & I915_EXEC_WAIT_FENCE) {
> +		ret = i915_early_fence_wait(ring, fd_fence_wait);
> +		if (ret < 0)
> +			return ret;
> +
> +		args->flags &= ~I915_EXEC_WAIT_FENCE;
> +	}
> +#endif
> +
>   	intel_runtime_pm_get(dev_priv);
>
>   	ret = i915_mutex_lock_interruptible(dev);
> @@ -1652,6 +1710,27 @@ i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>   	params->batch_obj               = batch_obj;
>   	params->ctx                     = ctx;
>
> +#ifdef CONFIG_SYNC
> +	if (args->flags & I915_EXEC_CREATE_FENCE) {
> +		/*
> +		 * Caller has requested a sync fence.
> +		 * User interrupts will be enabled to make sure that
> +		 * the timeline is signalled on completion.
> +		 */
> +		ret = i915_create_sync_fence(params->request,
> +					     &fd_fence_complete);
> +		if (ret) {
> +			DRM_ERROR("Fence creation failed for ring %d, ctx %p\n",
> +				  ring->id, ctx);
> +			args->rsvd2 = (__u64) -1;
> +			goto err;
> +		}
> +
> +		/* Return the fence through the rsvd2 field */
> +		args->rsvd2 = (__u64) fd_fence_complete;
> +	}
> +#endif
> +
>   	ret = dev_priv->gt.execbuf_submit(params, args, &eb->vmas);
>
>   err_batch_unpin:
> @@ -1683,6 +1762,12 @@ pre_mutex_err:
>   	/* intel_gpu_busy should also get a ref, so it will free when the device
>   	 * is really idle. */
>   	intel_runtime_pm_put(dev_priv);
> +
> +	if (fd_fence_complete != -1) {
> +		sys_close(fd_fence_complete);

I am not sure calling system call functions from driver code will be 
allowed. That's why I was doing fd_install() only once it was certain 
that everything had gone OK.
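
Roughly the shape I had in mind - reserve the fd early but only publish 
it with fd_install() once the submission has succeeded, so the error 
path only needs put_unused_fd() on a descriptor userspace has never 
seen. Sketch only; the fence->file member and the surrounding variables 
are assumptions:

	int fd = get_unused_fd_flags(O_CLOEXEC);

	if (fd < 0)
		return fd;

	/* ... create the sync fence, submit the execbuffer ... */

	if (ret) {
		put_unused_fd(fd);	/* fd was never installed */
		return ret;
	}

	fd_install(fd, fence->file);	/* publish only on success */
	args->rsvd2 = (__u64) fd;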

Regards,

Tvrtko

* Re: [RFC 7/9] drm/i915: Interrupt driven fences
  2015-07-17 14:31 ` [RFC 7/9] drm/i915: Interrupt driven fences John.C.Harrison
                     ` (2 preceding siblings ...)
  2015-07-27 11:33   ` Tvrtko Ursulin
@ 2015-07-27 13:20   ` Tvrtko Ursulin
  2015-07-27 14:00     ` Daniel Vetter
  3 siblings, 1 reply; 38+ messages in thread
From: Tvrtko Ursulin @ 2015-07-27 13:20 UTC (permalink / raw)
  To: John.C.Harrison, Intel-GFX


On 07/17/2015 03:31 PM, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> The intended usage model for struct fence is that the signalled status should be
> set on demand rather than polled. That is, there should not be a need for a
> 'signaled' function to be called every time the status is queried. Instead,
> 'something' should be done to enable a signal callback from the hardware which
> will update the state directly. In the case of requests, this is the seqno
> update interrupt. The idea is that this callback will only be enabled on demand
> when something actually tries to wait on the fence.
>
> This change removes the polling test and replaces it with the callback scheme.
> Each fence is added to a 'please poke me' list at the start of
> i915_add_request(). The interrupt handler then scans through the 'poke me' list
> when a new seqno pops out and signals any matching fence/request. The fence is
> then removed from the list so the entire request stack does not need to be
> scanned every time. Note that the fence is added to the list before the commands
> to generate the seqno interrupt are added to the ring. Thus the sequence is
> guaranteed to be race free if the interrupt is already enabled.
>
> Note that the interrupt is only enabled on demand (i.e. when __wait_request() is
> called). Thus there is still a potential race when enabling the interrupt as the
> request may already have completed. However, this is simply solved by calling
> the interrupt processing code immediately after enabling the interrupt and
> thereby checking for already completed requests.
>
> Lastly, the ring clean up code has the possibility to cancel outstanding
> requests (e.g. because TDR has reset the ring). These requests will never get
> signalled and so must be removed from the signal list manually. This is done by
> setting a 'cancelled' flag and then calling the regular notify/retire code path
> rather than attempting to duplicate the list manipulation and clean up code in
> multiple places. This also avoids any race condition where the cancellation
> request might occur after/during the completion interrupt actually arriving.
>
> v2: Updated to take advantage of the request unreference no longer requiring the
> mutex lock.
>
> For: VIZ-5190
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> ---

[snip]

> @@ -1382,6 +1387,10 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
>   	list_del_init(&request->list);
>   	i915_gem_request_remove_from_client(request);
>
> +	/* In case the request is still in the signal pending list */
> +	if (!list_empty(&request->signal_list))
> +		request->cancelled = true;
> +

Another thing I did not see implemented is the sync_fence error state.

This is more about the Android part, but related to this canceled flag 
so I am commenting here.

I thought when TDR kicks in and we set request->cancelled to true, there 
should be a code path which somehow makes sync_fence->status negative.

As it is, because fence_signal will not be called on cancelled requests, I 
thought waiters would wait until the timeout rather than being woken up and 
returning an error status.

For this to work you would somehow need to make sync_fence->status go 
negative. With normal fence completion it goes from 1 -> 0, via the 
completion callback. I did not immediately see how to make it go 
negative using the existing API.

Regards,

Tvrtko



* Re: [RFC 7/9] drm/i915: Interrupt driven fences
  2015-07-27 13:20   ` Tvrtko Ursulin
@ 2015-07-27 14:00     ` Daniel Vetter
  2015-08-03  9:20       ` Tvrtko Ursulin
  0 siblings, 1 reply; 38+ messages in thread
From: Daniel Vetter @ 2015-07-27 14:00 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-GFX

On Mon, Jul 27, 2015 at 02:20:43PM +0100, Tvrtko Ursulin wrote:
> 
> On 07/17/2015 03:31 PM, John.C.Harrison@Intel.com wrote:
> >From: John Harrison <John.C.Harrison@Intel.com>
> >
> >The intended usage model for struct fence is that the signalled status should be
> >set on demand rather than polled. That is, there should not be a need for a
> >'signaled' function to be called every time the status is queried. Instead,
> >'something' should be done to enable a signal callback from the hardware which
> >will update the state directly. In the case of requests, this is the seqno
> >update interrupt. The idea is that this callback will only be enabled on demand
> >when something actually tries to wait on the fence.
> >
> >This change removes the polling test and replaces it with the callback scheme.
> >Each fence is added to a 'please poke me' list at the start of
> >i915_add_request(). The interrupt handler then scans through the 'poke me' list
> >when a new seqno pops out and signals any matching fence/request. The fence is
> >then removed from the list so the entire request stack does not need to be
> >scanned every time. Note that the fence is added to the list before the commands
> >to generate the seqno interrupt are added to the ring. Thus the sequence is
> >guaranteed to be race free if the interrupt is already enabled.
> >
> >Note that the interrupt is only enabled on demand (i.e. when __wait_request() is
> >called). Thus there is still a potential race when enabling the interrupt as the
> >request may already have completed. However, this is simply solved by calling
> >the interrupt processing code immediately after enabling the interrupt and
> >thereby checking for already completed requests.
> >
> >Lastly, the ring clean up code has the possibility to cancel outstanding
> >requests (e.g. because TDR has reset the ring). These requests will never get
> >signalled and so must be removed from the signal list manually. This is done by
> >setting a 'cancelled' flag and then calling the regular notify/retire code path
> >rather than attempting to duplicate the list manipulation and clean up code in
> >multiple places. This also avoids any race condition where the cancellation
> >request might occur after/during the completion interrupt actually arriving.
> >
> >v2: Updated to take advantage of the request unreference no longer requiring the
> >mutex lock.
> >
> >For: VIZ-5190
> >Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> >---
> 
> [snip]
> 
> >@@ -1382,6 +1387,10 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
> >  	list_del_init(&request->list);
> >  	i915_gem_request_remove_from_client(request);
> >
> >+	/* In case the request is still in the signal pending list */
> >+	if (!list_empty(&request->signal_list))
> >+		request->cancelled = true;
> >+
> 
> Another thing I did not see implemented is the sync_fence error state.
> 
> This is more about the Android part, but related to this canceled flag so I
> am commenting here.
> 
> I thought when TDR kicks in and we set request->cancelled to true, there
> should be a code path which somehow makes sync_fence->status negative.
> 
> As it is, because fence_signal will not be called on cancelled requests, I
> thought waiters would wait until the timeout rather than being woken up and
> returning an error status.
> 
> For this to work you would somehow need to make sync_fence->status go
> negative. With normal fence completion it goes from 1 -> 0, via the
> completion callback. I did not immediately see how to make it go negative
> using the existing API.

I think back when we did struct fence we decided that we won't care yet
about forwarding error state, since doing that across drivers, if you have
a chain of fences, looked complicated. And no one had any clear idea about
what kind of semantics we really want. If we want this we'd need to add
it, but probably better to do that as a follow-up (the usual caveat about
open-source userspace and demonstration vehicles applies and all that).
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [RFC 3/9] drm/i915: Convert requests to use struct fence
  2015-07-21  7:05   ` Daniel Vetter
@ 2015-07-28 10:01     ` John Harrison
  0 siblings, 0 replies; 38+ messages in thread
From: John Harrison @ 2015-07-28 10:01 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Intel-GFX

On 21/07/2015 08:05, Daniel Vetter wrote:
> On Fri, Jul 17, 2015 at 03:31:17PM +0100, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> There is a construct in the linux kernel called 'struct fence' that is intended
>> to keep track of work that is executed on hardware. I.e. it solves the basic
>> problem that the driver's 'struct drm_i915_gem_request' is trying to address. The
>> request structure does quite a lot more than simply track the execution progress
>> so is very definitely still required. However, the basic completion status side
>> could be updated to use the ready made fence implementation and gain all the
>> advantages that provides.
>>
>> This patch makes the first step of integrating a struct fence into the request.
>> It replaces the explicit reference count with that of the fence. It also
>> replaces the 'is completed' test with the fence's equivalent. Currently, that
>> simply chains on to the original request implementation. A future patch will
>> improve this.
>>
>> For: VIZ-5190
>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_drv.h         | 45 +++++++++++++------------
>>   drivers/gpu/drm/i915/i915_gem.c         | 58 ++++++++++++++++++++++++++++++---
>>   drivers/gpu/drm/i915/intel_lrc.c        |  1 +
>>   drivers/gpu/drm/i915/intel_ringbuffer.c |  1 +
>>   drivers/gpu/drm/i915/intel_ringbuffer.h |  3 ++
>>   5 files changed, 80 insertions(+), 28 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> index cf6761c..79d346c 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -50,6 +50,7 @@
>>   #include <linux/intel-iommu.h>
>>   #include <linux/kref.h>
>>   #include <linux/pm_qos.h>
>> +#include <linux/fence.h>
>>   
>>   /* General customization:
>>    */
>> @@ -2150,7 +2151,17 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
>>    * initial reference taken using kref_init
>>    */
>>   struct drm_i915_gem_request {
>> -	struct kref ref;
>> +	/**
>> +	 * Underlying object for implementing the signal/wait stuff.
>> +	 * NB: Never call fence_later() or return this fence object to user
>> +	 * land! Due to lazy allocation, scheduler re-ordering, pre-emption,
>> +	 * etc., there is no guarantee at all about the validity or
>> +	 * sequentiality of the fence's seqno! It is also unsafe to let
>> +	 * anything outside of the i915 driver get hold of the fence object
>> +	 * as the clean up when decrementing the reference count requires
>> +	 * holding the driver mutex lock.
>> +	 */
> This comment is outdated.
Not at this point in the patch series. Until the 'delay freeing of 
requests' patch it is very definitely unsafe to allow external reference 
counting of the fence object as decrementing the count must only be done 
with the mutex lock held. Likewise, without the per context per ring 
timeline, the seqno value would be unsafe if any scheduling code was 
added. The comment is written from the point of view of what happens if 
this patch is taken as a stand-alone patch without the rest of the series 
following, and describes the state of the driver at that moment in time.

> Also I'm leaning towards squashing this patch
> with the one implementing fences with explicit irq enabling, to avoid
> churn and intermediate WARN_ONs. Each patch should be fully functional
> without requiring follow-up patches.

It is fully functional as a stand-alone patch. The intermediate WARN_ONs 
would never fire unless someone takes only this patch and then starts 
adding extra (unsupported) usage of the fence object. If they want to do 
that then they must first add in the extra support that they need. Or 
they could just take the rest of the series which adds in that support 
for them.

Squashing the interrupt support into the basic fence implementation 
would just make for a much more complicated single patch.


> -Daniel
>
>
>> +	struct fence fence;
>>   
>>   	/** On Which ring this request was generated */
>>   	struct drm_i915_private *i915;
>> @@ -2227,7 +2238,13 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
>>   			   struct intel_context *ctx,
>>   			   struct drm_i915_gem_request **req_out);
>>   void i915_gem_request_cancel(struct drm_i915_gem_request *req);
>> -void i915_gem_request_free(struct kref *req_ref);
>> +
>> +static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
>> +					      bool lazy_coherency)
>> +{
>> +	return fence_is_signaled(&req->fence);
>> +}
>> +
>>   int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
>>   				   struct drm_file *file);
>>   
>> @@ -2247,7 +2264,7 @@ static inline struct drm_i915_gem_request *
>>   i915_gem_request_reference(struct drm_i915_gem_request *req)
>>   {
>>   	if (req)
>> -		kref_get(&req->ref);
>> +		fence_get(&req->fence);
>>   	return req;
>>   }
>>   
>> @@ -2255,7 +2272,7 @@ static inline void
>>   i915_gem_request_unreference(struct drm_i915_gem_request *req)
>>   {
>>   	WARN_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
>> -	kref_put(&req->ref, i915_gem_request_free);
>> +	fence_put(&req->fence);
>>   }
>>   
>>   static inline void
>> @@ -2267,7 +2284,7 @@ i915_gem_request_unreference__unlocked(struct drm_i915_gem_request *req)
>>   		return;
>>   
>>   	dev = req->ring->dev;
>> -	if (kref_put_mutex(&req->ref, i915_gem_request_free, &dev->struct_mutex))
>> +	if (kref_put_mutex(&req->fence.refcount, fence_release, &dev->struct_mutex))
>>   		mutex_unlock(&dev->struct_mutex);
>>   }
>>   
>> @@ -2284,12 +2301,6 @@ static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
>>   }
>>   
>>   /*
>> - * XXX: i915_gem_request_completed should be here but currently needs the
>> - * definition of i915_seqno_passed() which is below. It will be moved in
>> - * a later patch when the call to i915_seqno_passed() is obsoleted...
>> - */
>> -
>> -/*
>>    * A command that requires special handling by the command parser.
>>    */
>>   struct drm_i915_cmd_descriptor {
>> @@ -2851,18 +2862,6 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
>>   	return (int32_t)(seq1 - seq2) >= 0;
>>   }
>>   
>> -static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
>> -					      bool lazy_coherency)
>> -{
>> -	u32 seqno;
>> -
>> -	BUG_ON(req == NULL);
>> -
>> -	seqno = req->ring->get_seqno(req->ring, lazy_coherency);
>> -
>> -	return i915_seqno_passed(seqno, req->seqno);
>> -}
>> -
>>   int __must_check i915_gem_get_seqno(struct drm_device *dev, u32 *seqno);
>>   int __must_check i915_gem_set_seqno(struct drm_device *dev, u32 seqno);
>>   int __must_check i915_gem_object_get_fence(struct drm_i915_gem_object *obj);
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> index d9f2701..888bb72 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -2616,12 +2616,14 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv,
>>   	}
>>   }
>>   
>> -void i915_gem_request_free(struct kref *req_ref)
>> +static void i915_gem_request_free(struct fence *req_fence)
>>   {
>> -	struct drm_i915_gem_request *req = container_of(req_ref,
>> -						 typeof(*req), ref);
>> +	struct drm_i915_gem_request *req = container_of(req_fence,
>> +						 typeof(*req), fence);
>>   	struct intel_context *ctx = req->ctx;
>>   
>> +	BUG_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
>> +
>>   	if (req->file_priv)
>>   		i915_gem_request_remove_from_client(req);
>>   
>> @@ -2637,6 +2639,47 @@ void i915_gem_request_free(struct kref *req_ref)
>>   	kmem_cache_free(req->i915->requests, req);
>>   }
>>   
>> +static const char *i915_gem_request_get_driver_name(struct fence *req_fence)
>> +{
>> +	return "i915_request";
>> +}
>> +
>> +static const char *i915_gem_request_get_timeline_name(struct fence *req_fence)
>> +{
>> +	struct drm_i915_gem_request *req = container_of(req_fence,
>> +						 typeof(*req), fence);
>> +	return req->ring->name;
>> +}
>> +
>> +static bool i915_gem_request_enable_signaling(struct fence *req_fence)
>> +{
>> +	/* Interrupt driven fences are not implemented yet.*/
>> +	WARN(true, "This should not be called!");
>> +	return true;
>> +}
>> +
>> +static bool i915_gem_request_is_completed(struct fence *req_fence)
>> +{
>> +	struct drm_i915_gem_request *req = container_of(req_fence,
>> +						 typeof(*req), fence);
>> +	u32 seqno;
>> +
>> +	BUG_ON(req == NULL);
>> +
>> +	seqno = req->ring->get_seqno(req->ring, false/*lazy_coherency*/);
>> +
>> +	return i915_seqno_passed(seqno, req->seqno);
>> +}
>> +
>> +static const struct fence_ops i915_gem_request_fops = {
>> +	.get_driver_name	= i915_gem_request_get_driver_name,
>> +	.get_timeline_name	= i915_gem_request_get_timeline_name,
>> +	.enable_signaling	= i915_gem_request_enable_signaling,
>> +	.signaled		= i915_gem_request_is_completed,
>> +	.wait			= fence_default_wait,
>> +	.release		= i915_gem_request_free,
>> +};
>> +
>>   int i915_gem_request_alloc(struct intel_engine_cs *ring,
>>   			   struct intel_context *ctx,
>>   			   struct drm_i915_gem_request **req_out)
>> @@ -2658,7 +2701,6 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
>>   	if (ret)
>>   		goto err;
>>   
>> -	kref_init(&req->ref);
>>   	req->i915 = dev_priv;
>>   	req->ring = ring;
>>   	req->ctx  = ctx;
>> @@ -2673,6 +2715,8 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
>>   		goto err;
>>   	}
>>   
>> +	fence_init(&req->fence, &i915_gem_request_fops, &ring->fence_lock, ring->fence_context, req->seqno);
>> +
>>   	/*
>>   	 * Reserve space in the ring buffer for all the commands required to
>>   	 * eventually emit this request. This is to guarantee that the
>> @@ -5021,7 +5065,7 @@ i915_gem_init_hw(struct drm_device *dev)
>>   {
>>   	struct drm_i915_private *dev_priv = dev->dev_private;
>>   	struct intel_engine_cs *ring;
>> -	int ret, i, j;
>> +	int ret, i, j, fence_base;
>>   
>>   	if (INTEL_INFO(dev)->gen < 6 && !intel_enable_gtt())
>>   		return -EIO;
>> @@ -5073,12 +5117,16 @@ i915_gem_init_hw(struct drm_device *dev)
>>   			goto out;
>>   	}
>>   
>> +	fence_base = fence_context_alloc(I915_NUM_RINGS);
>> +
>>   	/* Now it is safe to go back round and do everything else: */
>>   	for_each_ring(ring, dev_priv, i) {
>>   		struct drm_i915_gem_request *req;
>>   
>>   		WARN_ON(!ring->default_context);
>>   
>> +		ring->fence_context = fence_base + i;
>> +
>>   		ret = i915_gem_request_alloc(ring, ring->default_context, &req);
>>   		if (ret) {
>>   			i915_gem_cleanup_ringbuffer(dev);
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>> index 9faad82..ee4aecd 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -1808,6 +1808,7 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
>>   	ring->dev = dev;
>>   	INIT_LIST_HEAD(&ring->active_list);
>>   	INIT_LIST_HEAD(&ring->request_list);
>> +	spin_lock_init(&ring->fence_lock);
>>   	i915_gem_batch_pool_init(dev, &ring->batch_pool);
>>   	init_waitqueue_head(&ring->irq_queue);
>>   
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> index 177f7ed..d1ced30 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> @@ -2040,6 +2040,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
>>   	INIT_LIST_HEAD(&ring->active_list);
>>   	INIT_LIST_HEAD(&ring->request_list);
>>   	INIT_LIST_HEAD(&ring->execlist_queue);
>> +	spin_lock_init(&ring->fence_lock);
>>   	i915_gem_batch_pool_init(dev, &ring->batch_pool);
>>   	ringbuf->size = 32 * PAGE_SIZE;
>>   	ringbuf->ring = ring;
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
>> index 2e85fda..a4b0545 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
>> @@ -346,6 +346,9 @@ struct  intel_engine_cs {
>>   	 * to encode the command length in the header).
>>   	 */
>>   	u32 (*get_cmd_length_mask)(u32 cmd_header);
>> +
>> +	unsigned fence_context;
>> +	spinlock_t fence_lock;
>>   };
>>   
>>   bool intel_ring_initialized(struct intel_engine_cs *ring);
>> -- 
>> 1.9.1
>>

* Re: [RFC 3/9] drm/i915: Convert requests to use struct fence
  2015-07-22 14:26   ` Tvrtko Ursulin
@ 2015-07-28 10:10     ` John Harrison
  2015-08-03  9:17       ` Tvrtko Ursulin
  0 siblings, 1 reply; 38+ messages in thread
From: John Harrison @ 2015-07-28 10:10 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-GFX

On 22/07/2015 15:26, Tvrtko Ursulin wrote:
>
> Hi,
>
> On 07/17/2015 03:31 PM, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> There is a construct in the linux kernel called 'struct fence' that 
>> is intended
>> to keep track of work that is executed on hardware. I.e. it solves 
>> the basic
>> problem that the driver's 'struct drm_i915_gem_request' is trying to 
>> address. The
>> request structure does quite a lot more than simply track the 
>> execution progress
>> so is very definitely still required. However, the basic completion 
>> status side
>> could be updated to use the ready made fence implementation and gain 
>> all the
>> advantages that provides.
>>
>> This patch makes the first step of integrating a struct fence into 
>> the request.
>> It replaces the explicit reference count with that of the fence. It also
>> replaces the 'is completed' test with the fence's equivalent. 
>> Currently, that
>> simply chains on to the original request implementation. A future 
>> patch will
>> improve this.
>>
>> For: VIZ-5190
>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_drv.h         | 45 +++++++++++++------------
>>   drivers/gpu/drm/i915/i915_gem.c         | 58 
>> ++++++++++++++++++++++++++++++---
>>   drivers/gpu/drm/i915/intel_lrc.c        |  1 +
>>   drivers/gpu/drm/i915/intel_ringbuffer.c |  1 +
>>   drivers/gpu/drm/i915/intel_ringbuffer.h |  3 ++
>>   5 files changed, 80 insertions(+), 28 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h 
>> b/drivers/gpu/drm/i915/i915_drv.h
>> index cf6761c..79d346c 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -50,6 +50,7 @@
>>   #include <linux/intel-iommu.h>
>>   #include <linux/kref.h>
>>   #include <linux/pm_qos.h>
>> +#include <linux/fence.h>
>>
>>   /* General customization:
>>    */
>> @@ -2150,7 +2151,17 @@ void i915_gem_track_fb(struct 
>> drm_i915_gem_object *old,
>>    * initial reference taken using kref_init
>>    */
>>   struct drm_i915_gem_request {
>> -    struct kref ref;
>> +    /**
>> +     * Underlying object for implementing the signal/wait stuff.
>> +     * NB: Never call fence_later() or return this fence object to user
>> +     * land! Due to lazy allocation, scheduler re-ordering, 
>> pre-emption,
>> +     * etc., there is no guarantee at all about the validity or
>> +     * sequentiality of the fence's seqno! It is also unsafe to let
>> +     * anything outside of the i915 driver get hold of the fence object
>> +     * as the clean up when decrementing the reference count requires
>> +     * holding the driver mutex lock.
>> +     */
>> +    struct fence fence;
>>
>>       /** On Which ring this request was generated */
>>       struct drm_i915_private *i915;
>> @@ -2227,7 +2238,13 @@ int i915_gem_request_alloc(struct 
>> intel_engine_cs *ring,
>>                  struct intel_context *ctx,
>>                  struct drm_i915_gem_request **req_out);
>>   void i915_gem_request_cancel(struct drm_i915_gem_request *req);
>> -void i915_gem_request_free(struct kref *req_ref);
>> +
>> +static inline bool i915_gem_request_completed(struct 
>> drm_i915_gem_request *req,
>> +                          bool lazy_coherency)
>> +{
>> +    return fence_is_signaled(&req->fence);
>> +}
>> +
>>   int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
>>                      struct drm_file *file);
>>
>> @@ -2247,7 +2264,7 @@ static inline struct drm_i915_gem_request *
>>   i915_gem_request_reference(struct drm_i915_gem_request *req)
>>   {
>>       if (req)
>> -        kref_get(&req->ref);
>> +        fence_get(&req->fence);
>>       return req;
>>   }
>>
>> @@ -2255,7 +2272,7 @@ static inline void
>>   i915_gem_request_unreference(struct drm_i915_gem_request *req)
>>   {
>> WARN_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
>> -    kref_put(&req->ref, i915_gem_request_free);
>> +    fence_put(&req->fence);
>>   }
>>
>>   static inline void
>> @@ -2267,7 +2284,7 @@ i915_gem_request_unreference__unlocked(struct 
>> drm_i915_gem_request *req)
>>           return;
>>
>>       dev = req->ring->dev;
>> -    if (kref_put_mutex(&req->ref, i915_gem_request_free, 
>> &dev->struct_mutex))
>> +    if (kref_put_mutex(&req->fence.refcount, fence_release, 
>> &dev->struct_mutex))
>>           mutex_unlock(&dev->struct_mutex);
>
> Would it be nicer to add fence_put_mutex(struct fence *, struct mutex 
> *) for this? It would avoid the layering violation of requests peeking 
> into fence implementation details.

Maarten pointed out that adding 'fence_put_mutex()' is breaking the 
fence ABI itself. It requires users of any random fence to know and 
worry about what mutex objects that fence's internal implementation 
might require.

This is a bit more hacky from our point of view but it is only a 
temporary measure until the mutex lock is not required for 
dereferencing. At that point all the nasty stuff disappears completely.
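
I.e. once the mutex requirement is gone the whole thing should collapse 
to just (sketch of the intended end state):

static inline void
i915_gem_request_unreference(struct drm_i915_gem_request *req)
{
	fence_put(&req->fence);
}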


>
>>   }
>>
>> @@ -2284,12 +2301,6 @@ static inline void 
>> i915_gem_request_assign(struct drm_i915_gem_request **pdst,
>>   }
>>
>>   /*
>> - * XXX: i915_gem_request_completed should be here but currently 
>> needs the
>> - * definition of i915_seqno_passed() which is below. It will be 
>> moved in
>> - * a later patch when the call to i915_seqno_passed() is obsoleted...
>> - */
>> -
>> -/*
>>    * A command that requires special handling by the command parser.
>>    */
>>   struct drm_i915_cmd_descriptor {
>> @@ -2851,18 +2862,6 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
>>       return (int32_t)(seq1 - seq2) >= 0;
>>   }
>>
>> -static inline bool i915_gem_request_completed(struct 
>> drm_i915_gem_request *req,
>> -                          bool lazy_coherency)
>> -{
>> -    u32 seqno;
>> -
>> -    BUG_ON(req == NULL);
>> -
>> -    seqno = req->ring->get_seqno(req->ring, lazy_coherency);
>> -
>> -    return i915_seqno_passed(seqno, req->seqno);
>> -}
>> -
>>   int __must_check i915_gem_get_seqno(struct drm_device *dev, u32 
>> *seqno);
>>   int __must_check i915_gem_set_seqno(struct drm_device *dev, u32 
>> seqno);
>>   int __must_check i915_gem_object_get_fence(struct 
>> drm_i915_gem_object *obj);
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c 
>> b/drivers/gpu/drm/i915/i915_gem.c
>> index d9f2701..888bb72 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -2616,12 +2616,14 @@ static void i915_set_reset_status(struct 
>> drm_i915_private *dev_priv,
>>       }
>>   }
>>
>> -void i915_gem_request_free(struct kref *req_ref)
>> +static void i915_gem_request_free(struct fence *req_fence)
>>   {
>> -    struct drm_i915_gem_request *req = container_of(req_ref,
>> -                         typeof(*req), ref);
>> +    struct drm_i915_gem_request *req = container_of(req_fence,
>> +                         typeof(*req), fence);
>>       struct intel_context *ctx = req->ctx;
>>
>> + BUG_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
>
> It would be nicer (for the user experience, even if only the developer 
> working on the code) to WARN and leak.
Habit. Daniel hasn't quite broken my fingers of their BUG_ON preference.

>
>> +
>>       if (req->file_priv)
>>           i915_gem_request_remove_from_client(req);
>>
>> @@ -2637,6 +2639,47 @@ void i915_gem_request_free(struct kref *req_ref)
>>       kmem_cache_free(req->i915->requests, req);
>>   }
>>
>> +static const char *i915_gem_request_get_driver_name(struct fence 
>> *req_fence)
>> +{
>> +    return "i915_request";
>> +}
>
> I think this becomes kind of ABI once added so we need to make sure 
> the best name is chosen to start with. I couldn't immediately figure 
> out why not just "i915"?

The thought was that we might start using fences for other purposes at 
some point in the future, at which point it would be helpful to 
differentiate them. If nothing else, a previous iteration of the native 
sync implementation was using different fence objects due to worries 
about allowing userland to get its grubby mitts on important driver 
internal structures.

>
>> +
>> +static const char *i915_gem_request_get_timeline_name(struct fence 
>> *req_fence)
>> +{
>> +    struct drm_i915_gem_request *req = container_of(req_fence,
>> +                         typeof(*req), fence);
>> +    return req->ring->name;
>> +}
>> +
>> +static bool i915_gem_request_enable_signaling(struct fence *req_fence)
>> +{
>> +    /* Interrupt driven fences are not implemented yet.*/
>> +    WARN(true, "This should not be called!");
>> +    return true;
>
> I suppose WARN is not really needed in the interim patch. Would return 
> false work?

The point is to keep the driver 'safe' if that patch is viewed as 
stand-alone. I.e. if the interrupt follow-up does not get merged 
immediately after (or not at all) then this prevents people accidentally 
trying to use an unsupported interface without first implementing it.

>
>> +}
>> +
>> +static bool i915_gem_request_is_completed(struct fence *req_fence)
>> +{
>> +    struct drm_i915_gem_request *req = container_of(req_fence,
>> +                         typeof(*req), fence);
>> +    u32 seqno;
>> +
>> +    BUG_ON(req == NULL);
>
> Hm, I don't think container_of can return NULL in a meaningful way.
Hmm, historical code. The request used to be a direct parameter.

>
>> +
>> +    seqno = req->ring->get_seqno(req->ring, false/*lazy_coherency*/);
>> +
>> +    return i915_seqno_passed(seqno, req->seqno);
>> +}
>> +
>> +static const struct fence_ops i915_gem_request_fops = {
>> +    .get_driver_name    = i915_gem_request_get_driver_name,
>> +    .get_timeline_name    = i915_gem_request_get_timeline_name,
>> +    .enable_signaling    = i915_gem_request_enable_signaling,
>> +    .signaled        = i915_gem_request_is_completed,
>> +    .wait            = fence_default_wait,
>> +    .release        = i915_gem_request_free,
>> +};
>> +
>>   int i915_gem_request_alloc(struct intel_engine_cs *ring,
>>                  struct intel_context *ctx,
>>                  struct drm_i915_gem_request **req_out)
>> @@ -2658,7 +2701,6 @@ int i915_gem_request_alloc(struct 
>> intel_engine_cs *ring,
>>       if (ret)
>>           goto err;
>>
>> -    kref_init(&req->ref);
>>       req->i915 = dev_priv;
>>       req->ring = ring;
>>       req->ctx  = ctx;
>> @@ -2673,6 +2715,8 @@ int i915_gem_request_alloc(struct 
>> intel_engine_cs *ring,
>>           goto err;
>>       }
>>
>> +    fence_init(&req->fence, &i915_gem_request_fops, 
>> &ring->fence_lock, ring->fence_context, req->seqno);
>> +
>>       /*
>>        * Reserve space in the ring buffer for all the commands 
>> required to
>>        * eventually emit this request. This is to guarantee that the
>> @@ -5021,7 +5065,7 @@ i915_gem_init_hw(struct drm_device *dev)
>>   {
>>       struct drm_i915_private *dev_priv = dev->dev_private;
>>       struct intel_engine_cs *ring;
>> -    int ret, i, j;
>> +    int ret, i, j, fence_base;
>>
>>       if (INTEL_INFO(dev)->gen < 6 && !intel_enable_gtt())
>>           return -EIO;
>> @@ -5073,12 +5117,16 @@ i915_gem_init_hw(struct drm_device *dev)
>>               goto out;
>>       }
>>
>> +    fence_base = fence_context_alloc(I915_NUM_RINGS);
>> +
>>       /* Now it is safe to go back round and do everything else: */
>>       for_each_ring(ring, dev_priv, i) {
>>           struct drm_i915_gem_request *req;
>>
>>           WARN_ON(!ring->default_context);
>>
>> +        ring->fence_context = fence_base + i;
>
> Could you store fence_base in dev_priv and then ring->init_hw could 
> set up the fence_context on its own?

Probably. It seemed to make more sense to me to keep the fence 
allocation all in one place rather than splitting it in half and 
distributing it. It gets removed again with the per context per ring 
timelines anyway.
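
For reference, the split version would look roughly like this (sketch, 
the dev_priv field name is made up):

	/* in i915_gem_init_hw() */
	dev_priv->fence_context_base = fence_context_alloc(I915_NUM_RINGS);

	/* in each ring's init hook */
	ring->fence_context = dev_priv->fence_context_base + ring->id;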

>
> Regards,
>
> Tvrtko


* Re: [RFC 3/9] drm/i915: Convert requests to use struct fence
  2015-07-22 14:45   ` Tvrtko Ursulin
@ 2015-07-28 10:18     ` John Harrison
  2015-08-03  9:18       ` Tvrtko Ursulin
  0 siblings, 1 reply; 38+ messages in thread
From: John Harrison @ 2015-07-28 10:18 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-GFX

On 22/07/2015 15:45, Tvrtko Ursulin wrote:
>
> On 07/17/2015 03:31 PM, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> There is a construct in the linux kernel called 'struct fence' that 
>> is intended
>> to keep track of work that is executed on hardware. I.e. it solves 
>> the basic
>> problem that the driver's 'struct drm_i915_gem_request' is trying to 
>> address. The
>> request structure does quite a lot more than simply track the 
>> execution progress
>> so is very definitely still required. However, the basic completion 
>> status side
>> could be updated to use the ready made fence implementation and gain 
>> all the
>> advantages that provides.
>>
>> This patch makes the first step of integrating a struct fence into 
>> the request.
>> It replaces the explicit reference count with that of the fence. It also
>> replaces the 'is completed' test with the fence's equivalent. 
>> Currently, that
>> simply chains on to the original request implementation. A future 
>> patch will
>> improve this.
>>
>> For: VIZ-5190
>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_drv.h         | 45 +++++++++++++------------
>>   drivers/gpu/drm/i915/i915_gem.c         | 58 
>> ++++++++++++++++++++++++++++++---
>>   drivers/gpu/drm/i915/intel_lrc.c        |  1 +
>>   drivers/gpu/drm/i915/intel_ringbuffer.c |  1 +
>>   drivers/gpu/drm/i915/intel_ringbuffer.h |  3 ++
>>   5 files changed, 80 insertions(+), 28 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h 
>> b/drivers/gpu/drm/i915/i915_drv.h
>> index cf6761c..79d346c 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -50,6 +50,7 @@
>>   #include <linux/intel-iommu.h>
>>   #include <linux/kref.h>
>>   #include <linux/pm_qos.h>
>> +#include <linux/fence.h>
>>
>>   /* General customization:
>>    */
>> @@ -2150,7 +2151,17 @@ void i915_gem_track_fb(struct 
>> drm_i915_gem_object *old,
>>    * initial reference taken using kref_init
>>    */
>>   struct drm_i915_gem_request {
>> -    struct kref ref;
>> +    /**
>> +     * Underlying object for implementing the signal/wait stuff.
>> +     * NB: Never call fence_later() or return this fence object to user
>> +     * land! Due to lazy allocation, scheduler re-ordering, 
>> pre-emption,
>> +     * etc., there is no guarantee at all about the validity or
>> +     * sequentiality of the fence's seqno! It is also unsafe to let
>> +     * anything outside of the i915 driver get hold of the fence object
>> +     * as the clean up when decrementing the reference count requires
>> +     * holding the driver mutex lock.
>> +     */
>> +    struct fence fence;
>>
>>       /** On Which ring this request was generated */
>>       struct drm_i915_private *i915;
>> @@ -2227,7 +2238,13 @@ int i915_gem_request_alloc(struct 
>> intel_engine_cs *ring,
>>                  struct intel_context *ctx,
>>                  struct drm_i915_gem_request **req_out);
>>   void i915_gem_request_cancel(struct drm_i915_gem_request *req);
>> -void i915_gem_request_free(struct kref *req_ref);
>> +
>> +static inline bool i915_gem_request_completed(struct 
>> drm_i915_gem_request *req,
>> +                          bool lazy_coherency)
>> +{
>> +    return fence_is_signaled(&req->fence);
>> +}
>> +
>>   int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
>>                      struct drm_file *file);
>>
>> @@ -2247,7 +2264,7 @@ static inline struct drm_i915_gem_request *
>>   i915_gem_request_reference(struct drm_i915_gem_request *req)
>>   {
>>       if (req)
>> -        kref_get(&req->ref);
>> +        fence_get(&req->fence);
>>       return req;
>>   }
>>
>> @@ -2255,7 +2272,7 @@ static inline void
>>   i915_gem_request_unreference(struct drm_i915_gem_request *req)
>>   {
>> WARN_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
>> -    kref_put(&req->ref, i915_gem_request_free);
>> +    fence_put(&req->fence);
>>   }
>>
>>   static inline void
>> @@ -2267,7 +2284,7 @@ i915_gem_request_unreference__unlocked(struct 
>> drm_i915_gem_request *req)
>>           return;
>>
>>       dev = req->ring->dev;
>> -    if (kref_put_mutex(&req->ref, i915_gem_request_free, 
>> &dev->struct_mutex))
>> +    if (kref_put_mutex(&req->fence.refcount, fence_release, 
>> &dev->struct_mutex))
>>           mutex_unlock(&dev->struct_mutex);
>>   }
>>
>> @@ -2284,12 +2301,6 @@ static inline void 
>> i915_gem_request_assign(struct drm_i915_gem_request **pdst,
>>   }
>>
>>   /*
>> - * XXX: i915_gem_request_completed should be here but currently 
>> needs the
>> - * definition of i915_seqno_passed() which is below. It will be 
>> moved in
>> - * a later patch when the call to i915_seqno_passed() is obsoleted...
>> - */
>> -
>> -/*
>>    * A command that requires special handling by the command parser.
>>    */
>>   struct drm_i915_cmd_descriptor {
>> @@ -2851,18 +2862,6 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
>>       return (int32_t)(seq1 - seq2) >= 0;
>>   }
>>
>> -static inline bool i915_gem_request_completed(struct 
>> drm_i915_gem_request *req,
>> -                          bool lazy_coherency)
>> -{
>> -    u32 seqno;
>> -
>> -    BUG_ON(req == NULL);
>> -
>> -    seqno = req->ring->get_seqno(req->ring, lazy_coherency);
>> -
>> -    return i915_seqno_passed(seqno, req->seqno);
>> -}
>> -
>>   int __must_check i915_gem_get_seqno(struct drm_device *dev, u32 
>> *seqno);
>>   int __must_check i915_gem_set_seqno(struct drm_device *dev, u32 
>> seqno);
>>   int __must_check i915_gem_object_get_fence(struct 
>> drm_i915_gem_object *obj);
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c 
>> b/drivers/gpu/drm/i915/i915_gem.c
>> index d9f2701..888bb72 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -2616,12 +2616,14 @@ static void i915_set_reset_status(struct 
>> drm_i915_private *dev_priv,
>>       }
>>   }
>>
>> -void i915_gem_request_free(struct kref *req_ref)
>> +static void i915_gem_request_free(struct fence *req_fence)
>>   {
>> -    struct drm_i915_gem_request *req = container_of(req_ref,
>> -                         typeof(*req), ref);
>> +    struct drm_i915_gem_request *req = container_of(req_fence,
>> +                         typeof(*req), fence);
>>       struct intel_context *ctx = req->ctx;
>>
>> + BUG_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
>> +
>>       if (req->file_priv)
>>           i915_gem_request_remove_from_client(req);
>>
>> @@ -2637,6 +2639,47 @@ void i915_gem_request_free(struct kref *req_ref)
>>       kmem_cache_free(req->i915->requests, req);
>>   }
>>
>> +static const char *i915_gem_request_get_driver_name(struct fence 
>> *req_fence)
>> +{
>> +    return "i915_request";
>> +}
>> +
>> +static const char *i915_gem_request_get_timeline_name(struct fence 
>> *req_fence)
>> +{
>> +    struct drm_i915_gem_request *req = container_of(req_fence,
>> +                         typeof(*req), fence);
>> +    return req->ring->name;
>> +}
>> +
>> +static bool i915_gem_request_enable_signaling(struct fence *req_fence)
>> +{
>> +    /* Interrupt driven fences are not implemented yet.*/
>> +    WARN(true, "This should not be called!");
>> +    return true;
>> +}
>> +
>> +static bool i915_gem_request_is_completed(struct fence *req_fence)
>> +{
>> +    struct drm_i915_gem_request *req = container_of(req_fence,
>> +                         typeof(*req), fence);
>> +    u32 seqno;
>> +
>> +    BUG_ON(req == NULL);
>> +
>> +    seqno = req->ring->get_seqno(req->ring, false/*lazy_coherency*/);
>> +
>> +    return i915_seqno_passed(seqno, req->seqno);
>> +}
>
> How does this really work? I don't see any fence code calling this, 
> plus, this patch is not doing fence_signal anywhere. So is the whole 
> thing functional at this point?

Do you mean fence code calling i915_gem_request_is_completed? It is a 
callback in the fence ops structure assigned a few lines lower in the patch:
 > + .signaled = i915_gem_request_is_completed,

Whenever 'fence_is_signaled(&fence)' is called it basically chains on to 
the .signaled callback inside the fence implementation. When the 
interrupt version comes in with the later patch, this is removed as the 
fence is then tracking its completion state and does not need to do the 
hardware query every time. However, at this point it still does need to 
chain on to reading the HWS. And yes, it is certainly supposed to be 
fully functional at this point! It certainly was when I was testing it.
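
For reference, the core helper is essentially this (paraphrasing the 
inline in linux/fence.h):

static inline bool fence_is_signaled(struct fence *fence)
{
	if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags))
		return true;

	if (fence->ops->signaled && fence->ops->signaled(fence)) {
		fence_signal(fence);
		return true;
	}

	return false;
}

so with .signaled wired up, the request completion query keeps working 
exactly as before, just routed through the fence.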


>
> Regards,
>
> Tvrtko


* Re: [RFC 3/9] drm/i915: Convert requests to use struct fence
  2015-07-28 10:10     ` John Harrison
@ 2015-08-03  9:17       ` Tvrtko Ursulin
  0 siblings, 0 replies; 38+ messages in thread
From: Tvrtko Ursulin @ 2015-08-03  9:17 UTC (permalink / raw)
  To: John Harrison, Intel-GFX


On 07/28/2015 11:10 AM, John Harrison wrote:
[snip]
>>>   static inline void
>>> @@ -2267,7 +2284,7 @@ i915_gem_request_unreference__unlocked(struct
>>> drm_i915_gem_request *req)
>>>           return;
>>>
>>>       dev = req->ring->dev;
>>> -    if (kref_put_mutex(&req->ref, i915_gem_request_free,
>>> &dev->struct_mutex))
>>> +    if (kref_put_mutex(&req->fence.refcount, fence_release,
>>> &dev->struct_mutex))
>>>           mutex_unlock(&dev->struct_mutex);
>>
>> Would it be nicer to add fence_put_mutex(struct fence *, struct mutex
>> *) for this? It would avoid the layering violation of requests peeking
>> into fence implementation details.
>
> Maarten pointed out that adding 'fence_put_mutex()' is breaking the
> fence ABI itself. It requires users of any random fence to know and
> worry about what mutex objects that fence's internal implementation
> might require.

I don't see what it has to do with the ABI? I dislike both approaches 
actually and don't like the question of what is the lesser evil. :)

> This is a bit more hacky from our point of view but it is only a
> temporary measure until the mutex lock is not required for
> dereferencing. At that point all the nasty stuff disappears completely.

Yes, saw that in later patches. No worries then, just a consequence of 
going patch by patch.

>>> +
>>>       if (req->file_priv)
>>>           i915_gem_request_remove_from_client(req);
>>>
>>> @@ -2637,6 +2639,47 @@ void i915_gem_request_free(struct kref *req_ref)
>>>       kmem_cache_free(req->i915->requests, req);
>>>   }
>>>
>>> +static const char *i915_gem_request_get_driver_name(struct fence
>>> *req_fence)
>>> +{
>>> +    return "i915_request";
>>> +}
>>
>> I think this becomes kind of ABI once added so we need to make sure
>> the best name is chosen to start with. I couldn't immediately figure
>> out why not just "i915"?
>
> The thought was that we might start using fences for other purposes at
> some point in the future. At which point it is helpful to differentiate
> them. If nothing else, a previous iteration of the native sync
> implementation was using different fence objects due to worries about
> allowing user land to get its grubby mitts on important driver internal
> structures.

I don't follow the connection between these names and the last concern. 
If I did, would it explain why driver name "i915" with differentiation 
by timeline names would be problematic? Looks much cleaner to me.

>>> +
>>> +static const char *i915_gem_request_get_timeline_name(struct fence
>>> *req_fence)
>>> +{
>>> +    struct drm_i915_gem_request *req = container_of(req_fence,
>>> +                         typeof(*req), fence);
>>> +    return req->ring->name;
>>> +}
>>> +
>>> +static bool i915_gem_request_enable_signaling(struct fence *req_fence)
>>> +{
>>> +    /* Interrupt driven fences are not implemented yet.*/
>>> +    WARN(true, "This should not be called!");
>>> +    return true;
>>
>> I suppose WARN is not really needed in the interim patch. Would return
>> false work?
>
> The point is to keep the driver 'safe' if that patch is viewed as
> stand-alone. I.e. if the interrupt follow-up does not get merged immediately
> after (or not at all) then this prevents people accidentally trying to
> use an unsupported interface without first implementing it.

I assumed true means "signaling enabled successfully" but "false" 
actually means "fence already passed", so you are right. I blame the 
unintuitive API. :)

>>> +    fence_base = fence_context_alloc(I915_NUM_RINGS);
>>> +
>>>       /* Now it is safe to go back round and do everything else: */
>>>       for_each_ring(ring, dev_priv, i) {
>>>           struct drm_i915_gem_request *req;
>>>
>>>           WARN_ON(!ring->default_context);
>>>
>>> +        ring->fence_context = fence_base + i;
>>
>> Could you store fence_base in dev_priv and then ring->init_hw could
>> set up the fence_context on its own?
>
> Probably. It seemed to make more sense to me to keep the fence
> allocation all in one place rather than splitting it in half and
> distributing it. It gets removed again with the per context per ring
> timelines anyway.

Yes, saw that later, never mind then.

Tvrtko






* Re: [RFC 3/9] drm/i915: Convert requests to use struct fence
  2015-07-28 10:18     ` John Harrison
@ 2015-08-03  9:18       ` Tvrtko Ursulin
  0 siblings, 0 replies; 38+ messages in thread
From: Tvrtko Ursulin @ 2015-08-03  9:18 UTC (permalink / raw)
  To: John Harrison, Intel-GFX


On 07/28/2015 11:18 AM, John Harrison wrote:
> On 22/07/2015 15:45, Tvrtko Ursulin wrote:
>>
>> On 07/17/2015 03:31 PM, John.C.Harrison@Intel.com wrote:
>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>
>>> There is a construct in the linux kernel called 'struct fence' that
>>> is intended
>>> to keep track of work that is executed on hardware. I.e. it solves
>>> the basic
>>> problem that the driver's 'struct drm_i915_gem_request' is trying to
>>> address. The
>>> request structure does quite a lot more than simply track the
>>> execution progress
>>> so is very definitely still required. However, the basic completion
>>> status side
>>> could be updated to use the ready made fence implementation and gain
>>> all the
>>> advantages that provides.
>>>
>>> This patch makes the first step of integrating a struct fence into
>>> the request.
>>> It replaces the explicit reference count with that of the fence. It also
>>> replaces the 'is completed' test with the fence's equivalent.
>>> Currently, that
>>> simply chains on to the original request implementation. A future
>>> patch will
>>> improve this.
>>>
>>> For: VIZ-5190
>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/i915_drv.h         | 45 +++++++++++++------------
>>>   drivers/gpu/drm/i915/i915_gem.c         | 58
>>> ++++++++++++++++++++++++++++++---
>>>   drivers/gpu/drm/i915/intel_lrc.c        |  1 +
>>>   drivers/gpu/drm/i915/intel_ringbuffer.c |  1 +
>>>   drivers/gpu/drm/i915/intel_ringbuffer.h |  3 ++
>>>   5 files changed, 80 insertions(+), 28 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h
>>> b/drivers/gpu/drm/i915/i915_drv.h
>>> index cf6761c..79d346c 100644
>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>> @@ -50,6 +50,7 @@
>>>   #include <linux/intel-iommu.h>
>>>   #include <linux/kref.h>
>>>   #include <linux/pm_qos.h>
>>> +#include <linux/fence.h>
>>>
>>>   /* General customization:
>>>    */
>>> @@ -2150,7 +2151,17 @@ void i915_gem_track_fb(struct
>>> drm_i915_gem_object *old,
>>>    * initial reference taken using kref_init
>>>    */
>>>   struct drm_i915_gem_request {
>>> -    struct kref ref;
>>> +    /**
>>> +     * Underlying object for implementing the signal/wait stuff.
>>> +     * NB: Never call fence_later() or return this fence object to user
>>> +     * land! Due to lazy allocation, scheduler re-ordering,
>>> pre-emption,
>>> +     * etc., there is no guarantee at all about the validity or
>>> +     * sequentiality of the fence's seqno! It is also unsafe to let
>>> +     * anything outside of the i915 driver get hold of the fence object
>>> +     * as the clean up when decrementing the reference count requires
>>> +     * holding the driver mutex lock.
>>> +     */
>>> +    struct fence fence;
>>>
>>>       /** On Which ring this request was generated */
>>>       struct drm_i915_private *i915;
>>> @@ -2227,7 +2238,13 @@ int i915_gem_request_alloc(struct
>>> intel_engine_cs *ring,
>>>                  struct intel_context *ctx,
>>>                  struct drm_i915_gem_request **req_out);
>>>   void i915_gem_request_cancel(struct drm_i915_gem_request *req);
>>> -void i915_gem_request_free(struct kref *req_ref);
>>> +
>>> +static inline bool i915_gem_request_completed(struct
>>> drm_i915_gem_request *req,
>>> +                          bool lazy_coherency)
>>> +{
>>> +    return fence_is_signaled(&req->fence);
>>> +}
>>> +
>>>   int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
>>>                      struct drm_file *file);
>>>
>>> @@ -2247,7 +2264,7 @@ static inline struct drm_i915_gem_request *
>>>   i915_gem_request_reference(struct drm_i915_gem_request *req)
>>>   {
>>>       if (req)
>>> -        kref_get(&req->ref);
>>> +        fence_get(&req->fence);
>>>       return req;
>>>   }
>>>
>>> @@ -2255,7 +2272,7 @@ static inline void
>>>   i915_gem_request_unreference(struct drm_i915_gem_request *req)
>>>   {
>>> WARN_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
>>> -    kref_put(&req->ref, i915_gem_request_free);
>>> +    fence_put(&req->fence);
>>>   }
>>>
>>>   static inline void
>>> @@ -2267,7 +2284,7 @@ i915_gem_request_unreference__unlocked(struct
>>> drm_i915_gem_request *req)
>>>           return;
>>>
>>>       dev = req->ring->dev;
>>> -    if (kref_put_mutex(&req->ref, i915_gem_request_free,
>>> &dev->struct_mutex))
>>> +    if (kref_put_mutex(&req->fence.refcount, fence_release,
>>> &dev->struct_mutex))
>>>           mutex_unlock(&dev->struct_mutex);
>>>   }
>>>
>>> @@ -2284,12 +2301,6 @@ static inline void
>>> i915_gem_request_assign(struct drm_i915_gem_request **pdst,
>>>   }
>>>
>>>   /*
>>> - * XXX: i915_gem_request_completed should be here but currently
>>> needs the
>>> - * definition of i915_seqno_passed() which is below. It will be
>>> moved in
>>> - * a later patch when the call to i915_seqno_passed() is obsoleted...
>>> - */
>>> -
>>> -/*
>>>    * A command that requires special handling by the command parser.
>>>    */
>>>   struct drm_i915_cmd_descriptor {
>>> @@ -2851,18 +2862,6 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
>>>       return (int32_t)(seq1 - seq2) >= 0;
>>>   }
>>>
>>> -static inline bool i915_gem_request_completed(struct
>>> drm_i915_gem_request *req,
>>> -                          bool lazy_coherency)
>>> -{
>>> -    u32 seqno;
>>> -
>>> -    BUG_ON(req == NULL);
>>> -
>>> -    seqno = req->ring->get_seqno(req->ring, lazy_coherency);
>>> -
>>> -    return i915_seqno_passed(seqno, req->seqno);
>>> -}
>>> -
>>>   int __must_check i915_gem_get_seqno(struct drm_device *dev, u32
>>> *seqno);
>>>   int __must_check i915_gem_set_seqno(struct drm_device *dev, u32
>>> seqno);
>>>   int __must_check i915_gem_object_get_fence(struct
>>> drm_i915_gem_object *obj);
>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c
>>> b/drivers/gpu/drm/i915/i915_gem.c
>>> index d9f2701..888bb72 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>> @@ -2616,12 +2616,14 @@ static void i915_set_reset_status(struct
>>> drm_i915_private *dev_priv,
>>>       }
>>>   }
>>>
>>> -void i915_gem_request_free(struct kref *req_ref)
>>> +static void i915_gem_request_free(struct fence *req_fence)
>>>   {
>>> -    struct drm_i915_gem_request *req = container_of(req_ref,
>>> -                         typeof(*req), ref);
>>> +    struct drm_i915_gem_request *req = container_of(req_fence,
>>> +                         typeof(*req), fence);
>>>       struct intel_context *ctx = req->ctx;
>>>
>>> + BUG_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
>>> +
>>>       if (req->file_priv)
>>>           i915_gem_request_remove_from_client(req);
>>>
>>> @@ -2637,6 +2639,47 @@ void i915_gem_request_free(struct kref *req_ref)
>>>       kmem_cache_free(req->i915->requests, req);
>>>   }
>>>
>>> +static const char *i915_gem_request_get_driver_name(struct fence
>>> *req_fence)
>>> +{
>>> +    return "i915_request";
>>> +}
>>> +
>>> +static const char *i915_gem_request_get_timeline_name(struct fence
>>> *req_fence)
>>> +{
>>> +    struct drm_i915_gem_request *req = container_of(req_fence,
>>> +                         typeof(*req), fence);
>>> +    return req->ring->name;
>>> +}
>>> +
>>> +static bool i915_gem_request_enable_signaling(struct fence *req_fence)
>>> +{
>>> +    /* Interrupt driven fences are not implemented yet.*/
>>> +    WARN(true, "This should not be called!");
>>> +    return true;
>>> +}
>>> +
>>> +static bool i915_gem_request_is_completed(struct fence *req_fence)
>>> +{
>>> +    struct drm_i915_gem_request *req = container_of(req_fence,
>>> +                         typeof(*req), fence);
>>> +    u32 seqno;
>>> +
>>> +    BUG_ON(req == NULL);
>>> +
>>> +    seqno = req->ring->get_seqno(req->ring, false/*lazy_coherency*/);
>>> +
>>> +    return i915_seqno_passed(seqno, req->seqno);
>>> +}
>>
>> How does this really work? I don't see any fence code calling this,
>> plus, this patch is not doing fence_signal anywhere. So is the whole
>> thing functional at this point?
>
> Do you mean fence code calling i915_gem_request_is_completed? It is a
> callback in the fence ops structure assigned a few lines lower in the
> patch:
>  > +    .signaled = i915_gem_request_is_completed,
>
> Whenever 'fence_is_signaled(&fence)' is called it basically chains on to
> the .signaled callback inside the fence implementation. When the
> interrupt version comes in with the later patch, this is removed as the
> fence is then tracking its completion state and does not need to do the
> hardware query every time. However, at this point it still does need to
> chain on to reading the HWS. And yes, it is certainly supposed to be
> fully functional at this point! It certainly was when I was testing it.
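>
> Roughly, the chaining inside the core looks like this (a sketch of the
> logic only, not necessarily the exact fence.h source):
>
>     static inline bool fence_is_signaled(struct fence *fence)
>     {
>         if (test_bit(FENCE_FLAG_SIGNALED_BIT, &fence->flags))
>             return true;
>
>         if (fence->ops->signaled && fence->ops->signaled(fence)) {
>             fence_signal(fence);
>             return true;
>         }
>
>         return false;
>     }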

Yeah, looks OK. I probably missed something when initially reading this, 
or confused the two variants of "is complete" / "completed".

Tvrtko


* Re: [RFC 7/9] drm/i915: Interrupt driven fences
  2015-07-27 14:00     ` Daniel Vetter
@ 2015-08-03  9:20       ` Tvrtko Ursulin
  2015-08-05  8:05         ` Daniel Vetter
  0 siblings, 1 reply; 38+ messages in thread
From: Tvrtko Ursulin @ 2015-08-03  9:20 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Intel-GFX


On 07/27/2015 03:00 PM, Daniel Vetter wrote:
> On Mon, Jul 27, 2015 at 02:20:43PM +0100, Tvrtko Ursulin wrote:
>>
>> On 07/17/2015 03:31 PM, John.C.Harrison@Intel.com wrote:
>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>
>>> The intended usage model for struct fence is that the signalled status should be
>>> set on demand rather than polled. That is, there should not be a need for a
>>> 'signaled' function to be called every time the status is queried. Instead,
>>> 'something' should be done to enable a signal callback from the hardware which
>>> will update the state directly. In the case of requests, this is the seqno
>>> update interrupt. The idea is that this callback will only be enabled on demand
>>> when something actually tries to wait on the fence.
>>>
>>> This change removes the polling test and replaces it with the callback scheme.
>>> Each fence is added to a 'please poke me' list at the start of
>>> i915_add_request(). The interrupt handler then scans through the 'poke me' list
>>> when a new seqno pops out and signals any matching fence/request. The fence is
>>> then removed from the list so the entire request stack does not need to be
>>> scanned every time. Note that the fence is added to the list before the commands
>>> to generate the seqno interrupt are added to the ring. Thus the sequence is
>>> guaranteed to be race free if the interrupt is already enabled.
>>>
>>> Note that the interrupt is only enabled on demand (i.e. when __wait_request() is
>>> called). Thus there is still a potential race when enabling the interrupt as the
>>> request may already have completed. However, this is simply solved by calling
>>> the interrupt processing code immediately after enabling the interrupt and
>>> thereby checking for already completed requests.
>>>
>>> Lastly, the ring clean up code has the possibility to cancel outstanding
>>> requests (e.g. because TDR has reset the ring). These requests will never get
>>> signalled and so must be removed from the signal list manually. This is done by
>>> setting a 'cancelled' flag and then calling the regular notify/retire code path
>>> rather than attempting to duplicate the list manipulation and clean up code in
>>> multiple places. This also avoids any race condition where the cancellation
>>> request might occur after/during the completion interrupt actually arriving.
>>>
>>> v2: Updated to take advantage of the request unreference no longer requiring the
>>> mutex lock.
>>>
>>> For: VIZ-5190
>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>> ---
>>
>> [snip]
>>
>>> @@ -1382,6 +1387,10 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
>>>   	list_del_init(&request->list);
>>>   	i915_gem_request_remove_from_client(request);
>>>
>>> +	/* In case the request is still in the signal pending list */
>>> +	if (!list_empty(&request->signal_list))
>>> +		request->cancelled = true;
>>> +
>>
>> Another thing I did not see implemented is the sync_fence error state.
>>
>> This is more about the Android part, but related to this canceled flag so I
>> am commenting here.
>>
>> I thought when TDR kicks in and we set request->cancelled to true, there
>> should be a code path which somehow makes sync_fence->status negative.
>>
>> As it is, because fence_signal will not be called on canceled, I thought
>> waiters will wait until timeout, rather than being woken up and return error
>> status.
>>
>> For this to work you would somehow need to make sync_fence->status go
>> negative. With normal fence completion it goes from 1 -> 0, via the
>> completion callback. I did not immediately see how to make it go negative
>> using the existing API.
>
> I think back when we did struct fence we decided that we won't care yet
> about forwarding error state since doing that across drivers if you have a
> chain of fences looked complicated. And no one had any clear idea about
> what kind of semantics we really want. If we want this we'd need to add
> it, but probably better to do that as a follow-up (usual caveat about
> open-source userspace and demonstration vehicles apply and all that).

Hm, I am not sure but it feels to me not having an error state is a 
problem. Without it userspace can just keep waiting and waiting upon a 
fence even though the driver has discarded that workload and never plans 
to resubmit it. Am I missing something?

Tvrtko

* Re: [RFC 7/9] drm/i915: Interrupt driven fences
  2015-08-03  9:20       ` Tvrtko Ursulin
@ 2015-08-05  8:05         ` Daniel Vetter
  2015-08-05 11:05           ` Maarten Lankhorst
  0 siblings, 1 reply; 38+ messages in thread
From: Daniel Vetter @ 2015-08-05  8:05 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-GFX

On Mon, Aug 03, 2015 at 10:20:29AM +0100, Tvrtko Ursulin wrote:
> 
> On 07/27/2015 03:00 PM, Daniel Vetter wrote:
> >On Mon, Jul 27, 2015 at 02:20:43PM +0100, Tvrtko Ursulin wrote:
> >>
> >>On 07/17/2015 03:31 PM, John.C.Harrison@Intel.com wrote:
> >>>From: John Harrison <John.C.Harrison@Intel.com>
> >>>
> >>
> >>[snip]
> >>
> >>>@@ -1382,6 +1387,10 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
> >>>  	list_del_init(&request->list);
> >>>  	i915_gem_request_remove_from_client(request);
> >>>
> >>>+	/* In case the request is still in the signal pending list */
> >>>+	if (!list_empty(&request->signal_list))
> >>>+		request->cancelled = true;
> >>>+
> >>
> >>Another thing I did not see implemented is the sync_fence error state.
> >>
> >>This is more about the Android part, but related to this canceled flag so I
> >>am commenting here.
> >>
> >>I thought when TDR kicks in and we set request->cancelled to true, there
> >>should be a code path which somehow makes sync_fence->status negative.
> >>
> >>As it is, because fence_signal will not be called on canceled, I thought
> >>waiters will wait until timeout, rather than being woken up and return error
> >>status.
> >>
> >>For this to work you would somehow need to make sync_fence->status go
> >>negative. With normal fence completion it goes from 1 -> 0, via the
> >>completion callback. I did not immediately see how to make it go negative
> >>using the existing API.
> >
> >I think back when we did struct fence we decided that we won't care yet
> >about forwarding error state since doing that across drivers if you have a
> >chain of fences looked complicated. And no one had any clear idea about
> >what kind of semantics we really want. If we want this we'd need to add
> >it, but probably better to do that as a follow-up (usual caveat about
> >open-source userspace and demonstration vehicles apply and all that).
> 
> Hm, I am not sure but it feels to me not having an error state is a problem.
> Without it userspace can just keep waiting and waiting upon a fence even
> though the driver has discarded that workload and never plans to resubmit
> it. Am I missing something?

Fences must still complete eventually; they simply won't tell you whether
the GPU died before completing the fence or not. That is in line with the
GL spec for fences and ARB_robustness: inquiring about GPU hangs is done
through sideband APIs.
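
On the userspace side that looks roughly like the following (a sketch
using the ARB_robustness query; error handling elided):

    /* The fence wait only says "done"; whether the context survived
     * is a separate, sideband query. */
    GLenum status = glGetGraphicsResetStatusARB();
    if (status != GL_NO_ERROR) {
        /* GL_GUILTY/INNOCENT/UNKNOWN_CONTEXT_RESET_ARB: the context
         * was lost; recreate it and resubmit work if desired. */
    }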
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [RFC 7/9] drm/i915: Interrupt driven fences
  2015-08-05  8:05         ` Daniel Vetter
@ 2015-08-05 11:05           ` Maarten Lankhorst
  0 siblings, 0 replies; 38+ messages in thread
From: Maarten Lankhorst @ 2015-08-05 11:05 UTC (permalink / raw)
  To: Daniel Vetter, Tvrtko Ursulin; +Cc: Intel-GFX

Op 05-08-15 om 10:05 schreef Daniel Vetter:
> On Mon, Aug 03, 2015 at 10:20:29AM +0100, Tvrtko Ursulin wrote:
>> On 07/27/2015 03:00 PM, Daniel Vetter wrote:
>>> On Mon, Jul 27, 2015 at 02:20:43PM +0100, Tvrtko Ursulin wrote:
>>>> On 07/17/2015 03:31 PM, John.C.Harrison@Intel.com wrote:
>>>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>>>
>>>>> ---
>>>> [snip]
>>>>
>>>>> @@ -1382,6 +1387,10 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
>>>>>  	list_del_init(&request->list);
>>>>>  	i915_gem_request_remove_from_client(request);
>>>>>
>>>>> +	/* In case the request is still in the signal pending list */
>>>>> +	if (!list_empty(&request->signal_list))
>>>>> +		request->cancelled = true;
>>>>> +
>>>> Another thing I did not see implemented is the sync_fence error state.
>>>>
>>>> This is more about the Android part, but related to this canceled flag so I
>>>> am commenting here.
>>>>
>>>> I thought when TDR kicks in and we set request->cancelled to true, there
>>>> should be a code path which somehow makes sync_fence->status negative.
>>>>
>>>> As it is, because fence_signal will not be called on canceled, I thought
>>>> waiters will wait until timeout, rather than being woken up and return error
>>>> status.
>>>>
>>>> For this to work you would somehow need to make sync_fence->status go
>>>> negative. With normal fence completion it goes from 1 -> 0, via the
>>>> completion callback. I did not immediately see how to make it go negative
>>>> using the existing API.
>>> I think back when we did struct fence we decided that we won't care yet
>>> about forwarding error state since doing that across drivers if you have a
>>> chain of fences looked complicated. And no one had any clear idea about
>>> what kind of semantics we really want. If we want this we'd need to add
>>> it, but probably better to do that as a follow-up (usual caveat about
>>> open-source userspace and demonstration vehicles apply and all that).
>> Hm, I am not sure but it feels to me not having an error state is a problem.
>> Without it userspace can just keep waiting and waiting upon a fence even
>> though the driver has discarded that workload and never plans to resubmit
>> it. Am I missing something?
> Fences must still complete eventually; they simply won't tell you whether
> the GPU died before completing the fence or not. That is in line with the
> GL spec for fences and ARB_robustness: inquiring about GPU hangs is done
> through sideband APIs.
Actually you can tell. Set fence->status before signalling.
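
In the i915 case that would be something along these lines at
cancellation time (a sketch only, assuming the 'status' field that
struct fence carried at the time):

    /* Mark the fence as errored before signalling so that waiters
     * see a negative status instead of plain completion. */
    req->fence.status = -EIO;
    fence_signal(&req->fence);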

~Maarten

* Re: [RFC 5/9] drm/i915: Add per context timelines to fence object
  2015-07-23 13:50   ` Tvrtko Ursulin
@ 2015-10-28 12:59     ` John Harrison
  2015-11-17 13:54       ` Daniel Vetter
  0 siblings, 1 reply; 38+ messages in thread
From: John Harrison @ 2015-10-28 12:59 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-GFX

Have finally had some time to come back to this and respond 
to/incorporate the comments made some while ago...


On 23/07/2015 14:50, Tvrtko Ursulin wrote:
> Hi,
>
> On 07/17/2015 03:31 PM, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> The fence object used inside the request structure requires a 
>> sequence number.
>> Although this is not used by the i915 driver itself, it could 
>> potentially be
>> used by non-i915 code if the fence is passed outside of the driver. 
>> This is the
>> intention as it allows external kernel drivers and user applications 
>> to wait on
>> batch buffer completion asynchronously via the dma-buff fence API.
>>
>> To ensure that such external users are not confused by strange things 
>> happening
>> with the seqno, this patch adds in a per context timeline that can 
>> provide a
>> guaranteed in-order seqno value for the fence. This is safe because the
>> scheduler will not re-order batch buffers within a context - they are 
>> considered
>> to be mutually dependent.
>>
>> [new patch in series]
>>
>> For: VIZ-5190
>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_drv.h         | 25 ++++++++----
>>   drivers/gpu/drm/i915/i915_gem.c         | 69 ++++++++++++++++++++++++++++---
>>   drivers/gpu/drm/i915/i915_gem_context.c | 15 ++++++-
>>   drivers/gpu/drm/i915/intel_lrc.c        |  8 ++++
>>   drivers/gpu/drm/i915/intel_ringbuffer.h |  1 -
>>   5 files changed, 103 insertions(+), 15 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> index 0c7df46..88a4746 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -840,6 +840,15 @@ struct i915_ctx_hang_stats {
>>       bool banned;
>>   };
>>
>> +struct i915_fence_timeline {
>> +    unsigned    fence_context;
>> +    uint32_t    context;
>
> Unused field?
>
>> +    uint32_t    next;
>
> fence.h defines seqnos as 'unsigned', which matches this in practice, 
> but maybe it would be nicer to use the same type name.
>
>> +
>> +    struct intel_context *ctx;
>> +    struct intel_engine_cs *ring;
>> +};
>> +
>>   /* This must match up with the value previously used for execbuf2.rsvd1. */
>>   #define DEFAULT_CONTEXT_HANDLE 0
>>
>> @@ -885,6 +894,7 @@ struct intel_context {
>>           struct drm_i915_gem_object *state;
>>           struct intel_ringbuffer *ringbuf;
>>           int pin_count;
>> +        struct i915_fence_timeline fence_timeline;
>>       } engine[I915_NUM_RINGS];
>>
>>       struct list_head link;
>> @@ -2153,13 +2163,10 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
>>   struct drm_i915_gem_request {
>>       /**
>>        * Underlying object for implementing the signal/wait stuff.
>> -     * NB: Never call fence_later() or return this fence object to user
>> -     * land! Due to lazy allocation, scheduler re-ordering, pre-emption,
>> -     * etc., there is no guarantee at all about the validity or
>> -     * sequentiality of the fence's seqno! It is also unsafe to let
>> -     * anything outside of the i915 driver get hold of the fence object
>> -     * as the clean up when decrementing the reference count requires
>> -     * holding the driver mutex lock.
>> +     * NB: Never return this fence object to user land! It is unsafe to
>> +     * let anything outside of the i915 driver get hold of the fence
>> +     * object as the clean up when decrementing the reference count
>> +     * requires holding the driver mutex lock.
>>        */
>>       struct fence fence;
>>
>> @@ -2239,6 +2246,10 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
>>                  struct drm_i915_gem_request **req_out);
>>   void i915_gem_request_cancel(struct drm_i915_gem_request *req);
>>
>> +int i915_create_fence_timeline(struct drm_device *dev,
>> +                   struct intel_context *ctx,
>> +                   struct intel_engine_cs *ring);
>> +
>>   static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
>>   {
>>       return fence_is_signaled(&req->fence);
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> index 3970250..af79716 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -2671,6 +2671,25 @@ static bool i915_gem_request_is_completed(struct fence *req_fence)
>>       return i915_seqno_passed(seqno, req->seqno);
>>   }
>>
>> +static void i915_fence_timeline_value_str(struct fence *fence, char *str, int size)
>> +{
>> +    struct drm_i915_gem_request *req;
>> +
>> +    req = container_of(fence, typeof(*req), fence);
>> +
>> +    /* Last signalled timeline value ??? */
>> +    snprintf(str, size, "? [%d]"/*, tl->value*/, req->ring->get_seqno(req->ring, true));
>> +}
>
> If timelines are per context now maybe we should update 
> i915_gem_request_get_timeline_name to be per context instead of per 
> engine as well? As it is, we have a namespace overlap / seqno 
> collisions from the userspace point of view.
>
>> +static void i915_fence_value_str(struct fence *fence, char *str, int size)
>> +{
>> +    struct drm_i915_gem_request *req;
>> +
>> +    req = container_of(fence, typeof(*req), fence);
>> +
>> +    snprintf(str, size, "%d [%d]", req->fence.seqno, req->seqno);
>> +}
>> +
>>   static const struct fence_ops i915_gem_request_fops = {
>>       .get_driver_name    = i915_gem_request_get_driver_name,
>>       .get_timeline_name    = i915_gem_request_get_timeline_name,
>> @@ -2678,8 +2697,48 @@ static const struct fence_ops i915_gem_request_fops = {
>>       .signaled        = i915_gem_request_is_completed,
>>       .wait            = fence_default_wait,
>>       .release        = i915_gem_request_free,
>> +    .fence_value_str    = i915_fence_value_str,
>> +    .timeline_value_str    = i915_fence_timeline_value_str,
>>   };
>>
>> +int i915_create_fence_timeline(struct drm_device *dev,
>> +                   struct intel_context *ctx,
>> +                   struct intel_engine_cs *ring)
>> +{
>> +    struct i915_fence_timeline *timeline;
>> +
>> +    timeline = &ctx->engine[ring->id].fence_timeline;
>> +
>> +    if (timeline->ring)
>> +        return 0;
>> +
>> +    timeline->fence_context = fence_context_alloc(1);
>> +
>> +    /*
>> +     * Start the timeline from seqno 0 as this is a special value
>> +     * that is reserved for invalid sync points.
>> +     */
>> +    timeline->next       = 1;
>> +    timeline->ctx        = ctx;
>> +    timeline->ring       = ring;
>> +
>> +    return 0;
>> +}
>> +
>> +static uint32_t i915_fence_timeline_get_next_seqno(struct i915_fence_timeline *timeline)
>> +{
>> +    uint32_t seqno;
>> +
>> +    seqno = timeline->next;
>> +
>> +    /* Reserve zero for invalid */
>> +    if (++timeline->next == 0 ) {
>> +        timeline->next = 1;
>> +    }
>> +
>> +    return seqno;
>> +}
>> +
>>   int i915_gem_request_alloc(struct intel_engine_cs *ring,
>>                  struct intel_context *ctx,
>>                  struct drm_i915_gem_request **req_out)
>> @@ -2715,7 +2774,9 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
>>           goto err;
>>       }
>>
>> -    fence_init(&req->fence, &i915_gem_request_fops, &ring->fence_lock, ring->fence_context, req->seqno);
>> +    fence_init(&req->fence, &i915_gem_request_fops, &ring->fence_lock,
>> +               ctx->engine[ring->id].fence_timeline.fence_context,
>> +               i915_fence_timeline_get_next_seqno(&ctx->engine[ring->id].fence_timeline));
>
> I suppose for debugging it could be useful to add this new seqno in 
> i915_gem_request_info to have visibility on both sides, i.e. to map 
> userspace seqnos to driver state.
>
>>       /*
>>        * Reserve space in the ring buffer for all the commands required to
>> @@ -5065,7 +5126,7 @@ i915_gem_init_hw(struct drm_device *dev)
>>   {
>>       struct drm_i915_private *dev_priv = dev->dev_private;
>>       struct intel_engine_cs *ring;
>> -    int ret, i, j, fence_base;
>> +    int ret, i, j;
>>
>>       if (INTEL_INFO(dev)->gen < 6 && !intel_enable_gtt())
>>           return -EIO;
>> @@ -5117,16 +5178,12 @@ i915_gem_init_hw(struct drm_device *dev)
>>               goto out;
>>       }
>>
>> -    fence_base = fence_context_alloc(I915_NUM_RINGS);
>> -
>>       /* Now it is safe to go back round and do everything else: */
>>       for_each_ring(ring, dev_priv, i) {
>>           struct drm_i915_gem_request *req;
>>
>>           WARN_ON(!ring->default_context);
>>
>> -        ring->fence_context = fence_base + i;
>> -
>>           ret = i915_gem_request_alloc(ring, ring->default_context, &req);
>>           if (ret) {
>>               i915_gem_cleanup_ringbuffer(dev);
>> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
>> index b77a8f7..7eb8694 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_context.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
>> @@ -242,7 +242,7 @@ i915_gem_create_context(struct drm_device *dev,
>>   {
>>       const bool is_global_default_ctx = file_priv == NULL;
>>       struct intel_context *ctx;
>> -    int ret = 0;
>> +    int i, ret = 0;
>>
>>       BUG_ON(!mutex_is_locked(&dev->struct_mutex));
>>
>> @@ -250,6 +250,19 @@ i915_gem_create_context(struct drm_device *dev,
>>       if (IS_ERR(ctx))
>>           return ctx;
>>
>> +    if (!i915.enable_execlists) {
>> +        struct intel_engine_cs *ring;
>> +
>> +        /* Create a per context timeline for fences */
>> +        for_each_ring(ring, to_i915(dev), i) {
>> +            ret = i915_create_fence_timeline(dev, ctx, ring);
>> +            if (ret) {
>> +                DRM_ERROR("Fence timeline creation failed for legacy %s: %p\n", ring->name, ctx);
>> +                goto err_destroy;
>> +            }
>> +        }
>> +    }
>> +
>>       if (is_global_default_ctx && ctx->legacy_hw_ctx.rcs_state) {
>>           /* We may need to do things with the shrinker which
>>            * require us to immediately switch back to the default
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>> index ee4aecd..8f255de 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -2376,6 +2376,14 @@ int intel_lr_context_deferred_create(struct intel_context *ctx,
>>           goto error;
>>       }
>>
>> +    /* Create a per context timeline for fences */
>> +    ret = i915_create_fence_timeline(dev, ctx, ring);
>> +    if (ret) {
>> +        DRM_ERROR("Fence timeline creation failed for ring %s, ctx %p\n",
>> +              ring->name, ctx);
>> +        goto error;
>> +    }
>> +
>
> We must be 100% sure userspace cannot provoke context creation failure 
> by accident or deliberately. Otherwise we would leak fence contexts 
> until overflow which would be bad.
>
> Perhaps matching fence_context_release for existing 
> fence_context_alloc should be added?

Note that there is no fence_context_release. The fence_context_alloc 
code is simply 'return static_count++;'. There is no overflow checking 
and no anti-re-use checking. Once 2^32 contexts have been allocated, 
the old values will get handed out again and if they are still in use 
then tough. It's a really bad API! On the other hand, the context is 
not actually used for anything, so it doesn't really matter.
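
For reference, the allocator is essentially just a bare counter,
something like this (a sketch of the behaviour described above, not a
verbatim copy of the core code):

    static atomic_t fence_context_counter = ATOMIC_INIT(0);

    /* No overflow or re-use checking: after 2^32 allocations the
     * returned values simply wrap and repeat. */
    unsigned fence_context_alloc(unsigned num)
    {
        BUG_ON(!num);
        return atomic_add_return(num, &fence_context_counter) - num;
    }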


>
> Regards,
>
> Tvrtko


* Re: [RFC 6/9] drm/i915: Delay the freeing of requests until retire time
  2015-07-23 14:25   ` Tvrtko Ursulin
@ 2015-10-28 13:00     ` John Harrison
  2015-10-28 13:42       ` Tvrtko Ursulin
  0 siblings, 1 reply; 38+ messages in thread
From: John Harrison @ 2015-10-28 13:00 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-GFX

On 23/07/2015 15:25, Tvrtko Ursulin wrote:
>
> Hi,
>
> On 07/17/2015 03:31 PM, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> The request structure is reference counted. When the count reached
>> zero, the request was immediately freed and all associated objects
>> were unreferenced/deallocated. This meant that the driver mutex lock
>> must be held at the point where the count reaches zero. This was fine
>> while all references were held internally to the driver. However, the
>> plan is to allow the underlying fence object (and hence the request
>> itself) to be returned to other drivers and to userland. External
>> users cannot be expected to acquire a driver private mutex lock.
>>
>> Rather than attempt to disentangle the request structure from the
>> driver mutex lock, the decision was to defer the free code until a
>> later (safer) point. Hence this patch changes the unreference callback
>> to merely move the request onto a delayed free list. The driver's
>> retire worker thread will then process the list and actually call the
>> free function on the requests.
>>
>> [new patch in series]
>>
>> For: VIZ-5190
>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_drv.h         | 22 +++---------------
>>   drivers/gpu/drm/i915/i915_gem.c         | 41 +++++++++++++++++++++++++++++----
>>   drivers/gpu/drm/i915/intel_display.c    |  2 +-
>>   drivers/gpu/drm/i915/intel_lrc.c        |  2 ++
>>   drivers/gpu/drm/i915/intel_pm.c         |  2 +-
>>   drivers/gpu/drm/i915/intel_ringbuffer.c |  2 ++
>>   drivers/gpu/drm/i915/intel_ringbuffer.h |  4 ++++
>>   7 files changed, 50 insertions(+), 25 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> index 88a4746..61c3db2 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -2161,14 +2161,9 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
>>    * initial reference taken using kref_init
>>    */
>>   struct drm_i915_gem_request {
>> -    /**
>> -     * Underlying object for implementing the signal/wait stuff.
>> -     * NB: Never return this fence object to user land! It is unsafe to
>> -     * let anything outside of the i915 driver get hold of the fence
>> -     * object as the clean up when decrementing the reference count
>> -     * requires holding the driver mutex lock.
>> -     */
>> +    /** Underlying object for implementing the signal/wait stuff. */
>>       struct fence fence;
>> +    struct list_head delay_free_list;
>
> Maybe call this delay_free_link to continue the established convention.
>
>>
>>       /** On Which ring this request was generated */
>>       struct drm_i915_private *i915;
>> @@ -2281,21 +2276,10 @@ i915_gem_request_reference(struct drm_i915_gem_request *req)
>>   static inline void
>>   i915_gem_request_unreference(struct drm_i915_gem_request *req)
>>   {
>> -    WARN_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
>> -    fence_put(&req->fence);
>> -}
>> -
>> -static inline void
>> -i915_gem_request_unreference__unlocked(struct drm_i915_gem_request *req)
>> -{
>> -    struct drm_device *dev;
>> -
>>       if (!req)
>>           return;
>>
>> -    dev = req->ring->dev;
>> -    if (kref_put_mutex(&req->fence.refcount, fence_release, &dev->struct_mutex))
>> -        mutex_unlock(&dev->struct_mutex);
>> +    fence_put(&req->fence);
>>   }
>>
>>   static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> index af79716..482835a 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -2616,10 +2616,27 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv,
>>       }
>>   }
>>
>> -static void i915_gem_request_free(struct fence *req_fence)
>> +static void i915_gem_request_release(struct fence *req_fence)
>>   {
>>       struct drm_i915_gem_request *req = container_of(req_fence,
>>                            typeof(*req), fence);
>> +    struct intel_engine_cs *ring = req->ring;
>> +    struct drm_i915_private *dev_priv = to_i915(ring->dev);
>> +    unsigned long flags;
>> +
>> +    /*
>> +     * Need to add the request to a deferred dereference list to be
>> +     * processed at a mutex lock safe time.
>> +     */
>> +    spin_lock_irqsave(&ring->delayed_free_lock, flags);
>
> At the moment there is no request unreferencing from irq handlers 
> right? Unless (or until) you plan to add that you could use simple 
> spin_lock here. (And in the i915_gem_retire_requests_ring.)

I don't believe there is an unreference at IRQ time at this precise 
moment. However, there certainly have been in various other iterations 
of the code (including one on the display side that has since 
disappeared due to changes by others completely unrelated to this work). 
So I would be nervous about not making it IRQ compatible. It seems like 
a bug waiting to happen.


>> +    list_add_tail(&req->delay_free_list, &ring->delayed_free_list);
>> +    spin_unlock_irqrestore(&ring->delayed_free_lock, flags);
>> +
>> +    queue_delayed_work(dev_priv->wq, &dev_priv->mm.retire_work, 0);
>
> Have you decided to re-use the retire worker just for convenience or 
> for some other reason as well?

It seemed like the logical place to do this. It is a periodic function 
that cleans up requests. Converting the unreference into a free isn't a 
hugely time critical thing so adding an entire dedicated work handler 
seems like overkill. Plus, retire requests is usually the place that 
releases the last reference so it makes sense to do the free at the end 
of that code.

>
> I found it a bit unexpected and though dedicated request free worker 
> would be cleaner, but I don't know, not a strong opinion.
>
>> +}
>> +
>> +static void i915_gem_request_free(struct drm_i915_gem_request *req)
>> +{
>>       struct intel_context *ctx = req->ctx;
>>
>>       BUG_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
>> @@ -2696,7 +2713,7 @@ static const struct fence_ops i915_gem_request_fops = {
>>       .enable_signaling    = i915_gem_request_enable_signaling,
>>       .signaled        = i915_gem_request_is_completed,
>>       .wait            = fence_default_wait,
>> -    .release        = i915_gem_request_free,
>> +    .release        = i915_gem_request_release,
>>       .fence_value_str    = i915_fence_value_str,
>>       .timeline_value_str    = i915_fence_timeline_value_str,
>>   };
>> @@ -2992,6 +3009,21 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
>>           i915_gem_request_assign(&ring->trace_irq_req, NULL);
>>       }
>>
>> +    while (!list_empty(&ring->delayed_free_list)) {
>> +        struct drm_i915_gem_request *request;
>> +        unsigned long flags;
>> +
>> +        request = list_first_entry(&ring->delayed_free_list,
>> +                       struct drm_i915_gem_request,
>> +                       delay_free_list);
>
> Need a spinlock to sample list head here. Then maybe move it on a 
> temporary list and do the freeing afterwards.

Not necessary. The only other usage of the list is to add to it. So this 
code can't pull an entry that gets removed beneath its feet. Either the 
list empty test will return true and nothing further happens or there is 
definitely a node on the list and list_first_entry() will return 
something sane. The spinlock is only required when actually deleting 
that node.
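
For comparison, the locked variant suggested above would look something
like this (a sketch using the names from the patch):

    struct drm_i915_gem_request *req, *next;
    unsigned long flags;
    LIST_HEAD(free_list);

    /* Detach all pending entries under the lock... */
    spin_lock_irqsave(&ring->delayed_free_lock, flags);
    list_splice_init(&ring->delayed_free_list, &free_list);
    spin_unlock_irqrestore(&ring->delayed_free_lock, flags);

    /* ...then free them with no lock held. */
    list_for_each_entry_safe(req, next, &free_list, delay_free_list) {
        list_del(&req->delay_free_list);
        i915_gem_request_free(req);
    }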


>
> Regards,
>
> Tvrtko


* Re: [RFC 7/9] drm/i915: Interrupt driven fences
  2015-07-27 11:33   ` Tvrtko Ursulin
@ 2015-10-28 13:00     ` John Harrison
  0 siblings, 0 replies; 38+ messages in thread
From: John Harrison @ 2015-10-28 13:00 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-GFX

On 27/07/2015 12:33, Tvrtko Ursulin wrote:
>
> Hi,
>
> On 07/17/2015 03:31 PM, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> The intended usage model for struct fence is that the signalled 
>> status should be
>> set on demand rather than polled. That is, there should not be a need 
>> for a
>> 'signaled' function to be called every time the status is queried. 
>> Instead,
>> 'something' should be done to enable a signal callback from the 
>> hardware which
>> will update the state directly. In the case of requests, this is the 
>> seqno
>> update interrupt. The idea is that this callback will only be enabled 
>> on demand
>> when something actually tries to wait on the fence.
>>
>> This change removes the polling test and replaces it with the 
>> callback scheme.
>> Each fence is added to a 'please poke me' list at the start of
>> i915_add_request(). The interrupt handler then scans through the 
>> 'poke me' list
>> when a new seqno pops out and signals any matching fence/request. The 
>> fence is
>> then removed from the list so the entire request stack does not need 
>> to be
>> scanned every time. Note that the fence is added to the list before 
>> the commands
>> to generate the seqno interrupt are added to the ring. Thus the 
>> sequence is
>> guaranteed to be race free if the interrupt is already enabled.
>>
>> Note that the interrupt is only enabled on demand (i.e. when 
>> __wait_request() is
>> called). Thus there is still a potential race when enabling the 
>> interrupt as the
>> request may already have completed. However, this is simply solved by 
>> calling
>> the interrupt processing code immediately after enabling the 
>> interrupt and
>> thereby checking for already completed requests.
>>
>> Lastly, the ring clean up code has the possibility to cancel outstanding
>> requests (e.g. because TDR has reset the ring). These requests will 
>> never get
>> signalled and so must be removed from the signal list manually. This 
>> is done by
>> setting a 'cancelled' flag and then calling the regular notify/retire 
>> code path
>> rather than attempting to duplicate the list manipulation and clean 
>> up code in
>> multiple places. This also avoids any race condition where the 
>> cancellation
>> request might occur after/during the completion interrupt actually 
>> arriving.
>>
>> v2: Updated to take advantage of the request unreference no longer 
>> requiring the
>> mutex lock.
>>
>> For: VIZ-5190
>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_drv.h         |   8 ++
>>   drivers/gpu/drm/i915/i915_gem.c         | 132 +++++++++++++++++++++++++++++---
>>   drivers/gpu/drm/i915/i915_irq.c         |   2 +
>>   drivers/gpu/drm/i915/intel_lrc.c        |   1 +
>>   drivers/gpu/drm/i915/intel_ringbuffer.c |   1 +
>>   drivers/gpu/drm/i915/intel_ringbuffer.h |   1 +
>>   6 files changed, 136 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> index 61c3db2..d7f1aa5 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -2163,7 +2163,11 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
>>   struct drm_i915_gem_request {
>>       /** Underlying object for implementing the signal/wait stuff. */
>>       struct fence fence;
>> +    struct list_head signal_list;
>> +    struct list_head unsignal_list;
>
> In addition to what Daniel said (one list_head looks enough) it is 
> customary to call it _link.
>
>>       struct list_head delay_free_list;
>> +    bool cancelled;
>> +    bool irq_enabled;
>>
>>       /** On Which ring this request was generated */
>>       struct drm_i915_private *i915;
>> @@ -2241,6 +2245,10 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
>>                  struct drm_i915_gem_request **req_out);
>>   void i915_gem_request_cancel(struct drm_i915_gem_request *req);
>>
>> +void i915_gem_request_submit(struct drm_i915_gem_request *req);
>> +void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req);
>> +void i915_gem_request_notify(struct intel_engine_cs *ring);
>> +
>>   int i915_create_fence_timeline(struct drm_device *dev,
>>                      struct intel_context *ctx,
>>                      struct intel_engine_cs *ring);
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> index 482835a..7c589a9 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -1222,6 +1222,11 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>>       if (list_empty(&req->list))
>>           return 0;
>>
>> +    /*
>> +     * Enable interrupt completion of the request.
>> +     */
>> +    i915_gem_request_enable_interrupt(req);
>> +
>>       if (i915_gem_request_completed(req))
>>           return 0;
>>
>> @@ -1382,6 +1387,10 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
>>       list_del_init(&request->list);
>>       i915_gem_request_remove_from_client(request);
>>
>> +    /* In case the request is still in the signal pending list */
>> +    if (!list_empty(&request->signal_list))
>> +        request->cancelled = true;
>> +
>>       i915_gem_request_unreference(request);
>>   }
>>
>> @@ -2534,6 +2543,12 @@ void __i915_add_request(struct drm_i915_gem_request *request,
>>        */
>>       request->postfix = intel_ring_get_tail(ringbuf);
>>
>> +    /*
>> +     * Add the fence to the pending list before emitting the commands to
>> +     * generate a seqno notification interrupt.
>> +     */
>> +    i915_gem_request_submit(request);
>> +
>>       if (i915.enable_execlists)
>>           ret = ring->emit_request(request);
>>       else {
>> @@ -2653,6 +2668,9 @@ static void i915_gem_request_free(struct drm_i915_gem_request *req)
>>           i915_gem_context_unreference(ctx);
>>       }
>>
>> +    if (req->irq_enabled)
>> +        req->ring->irq_put(req->ring);
>> +
>
> We get here with interrupts still enabled only if userspace is 
> abandoning a wait on an unsignaled fence, did I get that right?
It implies the request has been abandoned in some manner, yes. E.g. TDR 
has killed it, user space has given up, ...

>
>>       kmem_cache_free(req->i915->requests, req);
>>   }
>>
>> @@ -2668,24 +2686,105 @@ static const char *i915_gem_request_get_timeline_name(struct fence *req_fence)
>>       return req->ring->name;
>>   }
>>
>> -static bool i915_gem_request_enable_signaling(struct fence *req_fence)
>> +/*
>> + * The request has been submitted to the hardware so add the fence to the
>> + * list of signalable fences.
>> + *
>> + * NB: This does not enable interrupts yet. That only occurs on demand when
>> + * the request is actually waited on. However, adding it to the list early
>> + * ensures that there is no race condition where the interrupt could pop
>> + * out prematurely and thus be completely lost. The race is merely that the
>> + * interrupt must be manually checked for after being enabled.
>> + */
>> +void i915_gem_request_submit(struct drm_i915_gem_request *req)
>>   {
>> -    /* Interrupt driven fences are not implemented yet.*/
>> -    WARN(true, "This should not be called!");
>> -    return true;
>> +    fence_enable_sw_signaling(&req->fence);
>>   }
>>
>> -static bool i915_gem_request_is_completed(struct fence *req_fence)
>> +/*
>> + * The request is being actively waited on, so enable interrupt based
>> + * completion signalling.
>> + */
>> +void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req)
>> +{
>> +    if (req->irq_enabled)
>> +        return;
>> +
>> +    WARN_ON(!req->ring->irq_get(req->ring));
>> +    req->irq_enabled = true;
>
> req->irq_enabled manipulations look racy. Here and in request free it 
> is protected by struct_mutex, but that is not held in 
> i915_gem_request_notify. Initial feeling is you should use 
> ring->fence_lock everywhere you query/manipulate req->irq_enabled.

The only asynchronous access is from _notify() which disables IRQs if 
the flag is set and then clears it. That can't race with the enable 
because the enable only sets the flag after setting IRQs on. The worst 
that can happen on a race is that IRQs are enabled and then immediately 
disabled - truly concurrent execution would result in one test or the 
other failing and so only one code path would be taken. The only other 
usage is in _request_free() but that can only run when the last 
reference has been dropped and that means it is no longer on any list 
that _notify() can see.


>> +
>> +    /*
>> +     * Because the interrupt is only enabled on demand, there is a race
>> +     * where the interrupt can fire before anyone is looking for it. So
>> +     * do an explicit check for missed interrupts.
>> +     */
>> +    i915_gem_request_notify(req->ring);
>> +}
>> +
>> +static bool i915_gem_request_enable_signaling(struct fence *req_fence)
>>   {
>>       struct drm_i915_gem_request *req = container_of(req_fence,
>>                            typeof(*req), fence);
>> +
>> +    i915_gem_request_reference(req);
>> +    WARN_ON(!list_empty(&req->signal_list));
>
> It looks very unsafe to proceed normally after this WARN_ON. It should 
> probably return false here to preserve data structure sanity.
This really should be a BUG_ON but Daniel doesn't like those. It should 
be an impossible code path and not something that can be hit by the user 
being dumb. Anyway, this code has all been changed in the latest 
incarnation.


>
>> +    list_add_tail(&req->signal_list, &req->ring->fence_signal_list);
>> +
>> +    /*
>> +     * Note that signalling is always enabled for every request before
>> +     * that request is submitted to the hardware. Therefore there is
>> +     * no race condition whereby the signal could pop out before the
>> +     * request has been added to the list. Hence no need to check
>> +     * for completion, undo the list add and return false.
>> +     *
>> +     * NB: Interrupts are only enabled on demand. Thus there is still a
>> +     * race where the request could complete before the interrupt has
>> +     * been enabled. Thus care must be taken at that point.
>> +     */
>> +
>> +    return true;
>> +}
>> +
>> +void i915_gem_request_notify(struct intel_engine_cs *ring)
>> +{
>> +    struct drm_i915_gem_request *req, *req_next;
>> +    unsigned long flags;
>>       u32 seqno;
>> +    LIST_HEAD(free_list);
>>
>> -    BUG_ON(req == NULL);
>> +    if (list_empty(&ring->fence_signal_list))
>> +        return;
>> +
>> +    seqno = ring->get_seqno(ring, false);
>> +
>> +    spin_lock_irqsave(&ring->fence_lock, flags);
>> +    list_for_each_entry_safe(req, req_next, &ring->fence_signal_list, signal_list) {
>> +        if (!req->cancelled) {
>> +            if (!i915_seqno_passed(seqno, req->seqno))
>> +                continue;
>>
>> -    seqno = req->ring->get_seqno(req->ring, false/*lazy_coherency*/);
>> +            fence_signal_locked(&req->fence);
>> +        }
>> +
>> +        list_del_init(&req->signal_list);
>
> I haven't managed to figure out why this is apparently removing 
> requests which have not been signalled from the signal_list?
>
> Shouldn't they be moved to free_list only if i915_seqno_passed?
Requests are only removed from the signal list if either a) the seqno 
has passed or b) the request has been cancelled and thus will never 
actually complete. Not sure what other scenario you are seeing.

>
>> +        if (req->irq_enabled) {
>> +            req->ring->irq_put(req->ring);
>> +            req->irq_enabled = false;
>> +        }
>>
>> -    return i915_seqno_passed(seqno, req->seqno);
>> +        /* Can't unreference here because that might grab fence_lock */
>> +        list_add_tail(&req->unsignal_list, &free_list);
>> +    }
>> +    spin_unlock_irqrestore(&ring->fence_lock, flags);
>> +
>> +    /* It should now be safe to actually free the requests */
>> +    while (!list_empty(&free_list)) {
>> +        req = list_first_entry(&free_list,
>> +                       struct drm_i915_gem_request, unsignal_list);
>> +        list_del(&req->unsignal_list);
>> +
>> +        i915_gem_request_unreference(req);
>> +    }
>>   }
>>
>>   static void i915_fence_timeline_value_str(struct fence *fence, char *str, int size)
>> @@ -2711,7 +2810,6 @@ static const struct fence_ops i915_gem_request_fops = {
>>       .get_driver_name    = i915_gem_request_get_driver_name,
>>       .get_timeline_name    = i915_gem_request_get_timeline_name,
>>       .enable_signaling    = i915_gem_request_enable_signaling,
>> -    .signaled        = i915_gem_request_is_completed,
>>       .wait            = fence_default_wait,
>>       .release        = i915_gem_request_release,
>>       .fence_value_str    = i915_fence_value_str,
>> @@ -2791,6 +2889,7 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
>>           goto err;
>>       }
>>
>> +    INIT_LIST_HEAD(&req->signal_list);
>> +    fence_init(&req->fence, &i915_gem_request_fops, &ring->fence_lock,
>> +               ctx->engine[ring->id].fence_timeline.fence_context,
>> +               i915_fence_timeline_get_next_seqno(&ctx->engine[ring->id].fence_timeline));
>> @@ -2913,6 +3012,13 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
>>
>>           i915_gem_request_retire(request);
>>       }
>> +
>> +    /*
>> +     * Make sure any requests that were on the signal pending list get
>> +     * cleaned up.
>> +     */
>> +    i915_gem_request_notify(ring);
>> +    i915_gem_retire_requests_ring(ring);
>
> Would i915_gem_retire_requests_ring be enough given how it calls 
> i915_gem_request_notify itself as the first thing below?
>
Oops, left over from before retire called notify explicitly.

>>   }
>>
>>   void i915_gem_restore_fences(struct drm_device *dev)
>> @@ -2968,6 +3074,13 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
>>   {
>>       WARN_ON(i915_verify_lists(ring->dev));
>>
>> +    /*
>> +     * If no-one has waited on a request recently then interrupts will
>> +     * not have been enabled and thus no requests will ever be marked as
>> +     * completed. So do an interrupt check now.
>> +     */
>> +    i915_gem_request_notify(ring);
>> +
>>       /* Retire requests first as we use it above for the early return.
>>        * If we retire requests last, we may use a later seqno and so clear
>>        * the requests lists without clearing the active list, leading to
>> @@ -5345,6 +5458,7 @@ init_ring_lists(struct intel_engine_cs *ring)
>>   {
>>       INIT_LIST_HEAD(&ring->active_list);
>>       INIT_LIST_HEAD(&ring->request_list);
>> +    INIT_LIST_HEAD(&ring->fence_signal_list);
>>       INIT_LIST_HEAD(&ring->delayed_free_list);
>>   }
>>
>> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
>> index d87f173..e446509 100644
>> --- a/drivers/gpu/drm/i915/i915_irq.c
>> +++ b/drivers/gpu/drm/i915/i915_irq.c
>> @@ -853,6 +853,8 @@ static void notify_ring(struct intel_engine_cs *ring)
>>
>>       trace_i915_gem_request_notify(ring);
>>
>> +    i915_gem_request_notify(ring);
>> +
>
> How many requests are typically on signal_list for some typical 
> workloads? This could be a significant performance change since on 
> every user interrupt it would walk the whole list, potentially only 
> removing one request at a time.
>
Obviously, some of the IGT tests can produce very large request lists 
(e.g. gem_exec_nop) but running 'normal' stuff rarely seems to generate 
a long list. E.g. running GLBench + GLXGears on an Ubuntu desktop, 95% 
of the time there are ten or fewer requests in the list, and the loop 
iterates only once (49%) or twice (46%) because the first request gets 
signalled and the second (if present) aborts the loop.

The biggest problem seems to be that the hardware is brain dead with 
respect to generating interrupts. So if two requests complete in quick 
succession and the ISR only gets to see the second seqno, it still gets 
called a second time. Thus we actually get a ridiculous figure of 60% of 
ISR calls being no-ops because the seqno has not actually advanced. The 
code now checks for duplicate seqnos and early exits. Not sure how to 
get rid of the call completely.
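
The duplicate check amounts to something like this at the top of the
notify handler ('last_irq_seqno' is an assumed field name for
illustration, not necessarily what the real patch uses):

    u32 seqno = ring->get_seqno(ring, false);

    /* Bail out early if the seqno has not moved since the last call. */
    if (seqno == ring->last_irq_seqno)
        return;
    ring->last_irq_seqno = seqno;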


> These are just review comments on this particular patch without 
> thinking yet of the bigger design questions Daniel has raised.
>
> Regards,
>
> Tvrtko


* Re: [RFC 9/9] drm/i915: Add sync framework support to execbuff IOCTL
  2015-07-27 13:00   ` Tvrtko Ursulin
@ 2015-10-28 13:01     ` John Harrison
  2015-10-28 14:31       ` Tvrtko Ursulin
  2015-11-17 13:59       ` Daniel Vetter
  0 siblings, 2 replies; 38+ messages in thread
From: John Harrison @ 2015-10-28 13:01 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-GFX

On 27/07/2015 14:00, Tvrtko Ursulin wrote:
> On 07/17/2015 03:31 PM, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> Various projects desire a mechanism for managing dependencies between
>> work items asynchronously. This can also include work items across
>> completely different and independent systems. For example, an
>> application wants to retrieve a frame from a video-in device, use it
>> for rendering on a GPU, then send it to the video-out device for
>> display, all without having to stall waiting for completion along
>> the way. The sync framework allows this. It encapsulates
>> synchronisation events in file descriptors. The application can
>> request a sync point for the completion of each piece of work. Drivers
>> should also take sync points in with each new work request and not
>> schedule the work to start until the sync has been signalled.
>>
>> This patch adds sync framework support to the exec buffer IOCTL. A
>> sync point can be passed in to stall execution of the batch buffer
>> until signalled. And a sync point can be returned after each batch
>> buffer submission which will be signalled upon that batch buffer's
>> completion.
>>
>> At present, the input sync point is simply waited on synchronously
>> inside the exec buffer IOCTL call. Once the GPU scheduler arrives,
>> this will be handled asynchronously inside the scheduler and the IOCTL
>> can return without having to wait.
>>
>> Note also that the scheduler will re-order the execution of batch
>> buffers, e.g. because a batch buffer is stalled on a sync point and
>> cannot be submitted yet but other, independent, batch buffers are
>> being presented to the driver. This means that the timeline within the
>> sync points returned cannot be global to the engine. Instead they must
>> be kept per context per engine (the scheduler may not re-order batches
>> within a context). Hence the timeline cannot be based on the existing
>> seqno values but must be a new implementation.
>>
>> This patch is a port of work by several people that has been pulled
>> across from Android. It has been updated several times across several
>> patches. Rather than attempt to port each individual patch, this
>> version is the finished product as a single patch. The various
>> contributors/authors along the way (in addition to myself) were:
>>    Satyanantha RamaGopal M <rama.gopal.m.satyanantha@intel.com>
>>    Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>    Michel Thierry <michel.thierry@intel.com>
>>    Arun Siluvery <arun.siluvery@linux.intel.com>
>>
>> [new patch in series]
>>
>> For: VIZ-5190
>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_drv.h            |  6 ++
>>   drivers/gpu/drm/i915/i915_gem.c            | 84 
>> ++++++++++++++++++++++++++++
>>   drivers/gpu/drm/i915/i915_gem_execbuffer.c | 90 
>> ++++++++++++++++++++++++++++--
>>   include/uapi/drm/i915_drm.h                | 16 +++++-
>>   4 files changed, 188 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h 
>> b/drivers/gpu/drm/i915/i915_drv.h
>> index d7f1aa5..cf6b7cd 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -2168,6 +2168,7 @@ struct drm_i915_gem_request {
>>       struct list_head delay_free_list;
>>       bool cancelled;
>>       bool irq_enabled;
>> +    bool fence_external;
>>
>>       /** On Which ring this request was generated */
>>       struct drm_i915_private *i915;
>> @@ -2252,6 +2253,11 @@ void i915_gem_request_notify(struct 
>> intel_engine_cs *ring);
>>   int i915_create_fence_timeline(struct drm_device *dev,
>>                      struct intel_context *ctx,
>>                      struct intel_engine_cs *ring);
>> +#ifdef CONFIG_SYNC
>> +struct sync_fence;
>> +int i915_create_sync_fence(struct drm_i915_gem_request *req, int 
>> *fence_fd);
>> +bool i915_safe_to_ignore_fence(struct intel_engine_cs *ring, struct 
>> sync_fence *fence);
>> +#endif
>>
>>   static inline bool i915_gem_request_completed(struct 
>> drm_i915_gem_request *req)
>>   {
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c 
>> b/drivers/gpu/drm/i915/i915_gem.c
>> index 3f20087..de93422 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -37,6 +37,9 @@
>>   #include <linux/swap.h>
>>   #include <linux/pci.h>
>>   #include <linux/dma-buf.h>
>> +#ifdef CONFIG_SYNC
>> +#include <../drivers/staging/android/sync.h>
>> +#endif
>>
>>   #define RQ_BUG_ON(expr)
>>
>> @@ -2549,6 +2552,15 @@ void __i915_add_request(struct 
>> drm_i915_gem_request *request,
>>        */
>>       i915_gem_request_submit(request);
>>
>> +    /*
>> +     * If an external sync point has been requested for this request 
>> then
>> +     * it can be waited on without the driver's knowledge, i.e. without
>> +     * calling __i915_wait_request(). Thus interrupts must be enabled
>> +     * from the start rather than only on demand.
>> +     */
>> +    if (request->fence_external)
>> +        i915_gem_request_enable_interrupt(request);
>
> Maybe then fence_exported would be clearer, fence_external at first 
> sounds like it is coming from another driver or something.
Turns out it is not necessary anyway as mentioned below.

>> +
>>       if (i915.enable_execlists)
>>           ret = ring->emit_request(request);
>>       else {
>> @@ -2857,6 +2869,78 @@ static uint32_t 
>> i915_fence_timeline_get_next_seqno(struct i915_fence_timeline *t
>>       return seqno;
>>   }
>>
>> +#ifdef CONFIG_SYNC
>> +int i915_create_sync_fence(struct drm_i915_gem_request *req, int 
>> *fence_fd)
>> +{
>> +    char ring_name[] = "i915_ring0";
>> +    struct sync_fence *sync_fence;
>> +    int fd;
>> +
>> +    fd = get_unused_fd_flags(O_CLOEXEC);
>> +    if (fd < 0) {
>> +        DRM_DEBUG("No available file descriptors!\n");
>> +        *fence_fd = -1;
>> +        return fd;
>> +    }
>> +
>> +    ring_name[9] += req->ring->id;
>> +    sync_fence = sync_fence_create_dma(ring_name, &req->fence);
>
> This will call ->enable_signalling so perhaps you could enable 
> interrupts in there for exported fences. Maybe it would be a tiny bit 
> more logically grouped. (Rather than have _add_request do it.)

Yeah, hadn't quite spotted this first time around. It now all happens 
'magically' without needing any explicit code - just some explicit 
comments to say that the behind-the-scenes magic is a) happening and b) 
necessary.
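
That is, the interrupt enabling can live in the fence's 
.enable_signaling hook, which the framework calls the first time 
anything waits on (or adds a callback to) the fence. A rough sketch, 
reusing helper names from earlier in the series:

    static bool i915_fence_enable_signaling(struct fence *req_fence)
    {
        struct drm_i915_gem_request *req = container_of(req_fence,
                                                        typeof(*req), fence);

        /* An already completed request needs no interrupt */
        if (i915_gem_request_completed(req))
            return false;

        /* Turn on user interrupts so the fence gets signalled */
        i915_gem_request_enable_interrupt(req);
        return true;
    }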

>
>> +    if (!sync_fence) {
>> +        put_unused_fd(fd);
>> +        *fence_fd = -1;
>> +        return -ENOMEM;
>> +    }
>> +
>> +    sync_fence_install(sync_fence, fd);
>> +    *fence_fd = fd;
>> +
>> +    // Necessary??? Who does the put???
>> +    fence_get(&req->fence);
>
> sync_fence_release?
Yes but where? Does the driver need to call this? Is it userland's 
responsibility? Does it happen automatically when the fd is closed? Do 
we even need to do the _get() in the first place? It seems to be working 
in that I don't get any unexpected free of the fence and I don't get 
huge numbers of leaked fences. However, it would be nice to know how it 
is working!

>
>> +
>> +    req->fence_external = true;
>> +
>> +    return 0;
>> +}
>> +
>> +bool i915_safe_to_ignore_fence(struct intel_engine_cs *ring, struct 
>> sync_fence *sync_fence)
>> +{
>> +    struct fence *dma_fence;
>> +    struct drm_i915_gem_request *req;
>> +    bool ignore;
>> +    int i;
>> +
>> +    if (atomic_read(&sync_fence->status) != 0)
>> +        return true;
>> +
>> +    ignore = true;
>> +    for(i = 0; i < sync_fence->num_fences; i++) {
>> +        dma_fence = sync_fence->cbs[i].sync_pt;
>> +
>> +        /* No need to worry about dead points: */
>> +        if (fence_is_signaled(dma_fence))
>> +            continue;
>> +
>> +        /* Can't ignore other people's points: */
>> +        if(dma_fence->ops != &i915_gem_request_fops) {
>> +            ignore = false;
>> +            break;
>
> The same as return false and then don't need bool ignore at all.
Yeah, left over from when there was cleanup to be done at function exit 
time. The cleanup code was removed but the single exit point was not - 
see the simplified loop below.
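
With the flag gone, the loop in the quoted code reduces to something 
like:

    for (i = 0; i < sync_fence->num_fences; i++) {
        struct fence *dma_fence = sync_fence->cbs[i].sync_pt;
        struct drm_i915_gem_request *req;

        /* No need to worry about dead points: */
        if (fence_is_signaled(dma_fence))
            continue;

        /* Can't ignore other people's points: */
        if (dma_fence->ops != &i915_gem_request_fops)
            return false;

        /* Can't ignore points on other rings: */
        req = container_of(dma_fence, typeof(*req), fence);
        if (req->ring != ring)
            return false;
    }

    /* Same ring means guaranteed to be in order so ignore them all. */
    return true;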

>
>> +        }
>> +
>> +        req = container_of(dma_fence, typeof(*req), fence);
>> +
>> +        /* Can't ignore points on other rings: */
>> +        if (req->ring != ring) {
>> +            ignore = false;
>> +            break;
>> +        }
>> +
>> +        /* Same ring means guaranteed to be in order so ignore it. */
>> +    }
>> +
>> +    return ignore;
>> +}
>> +#endif
>> +
>>   int i915_gem_request_alloc(struct intel_engine_cs *ring,
>>                  struct intel_context *ctx,
>>                  struct drm_i915_gem_request **req_out)
>> diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c 
>> b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>> index 923a3c4..b1a1659 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
>> @@ -26,6 +26,7 @@
>>    *
>>    */
>>
>> +#include <linux/syscalls.h>
>>   #include <drm/drmP.h>
>>   #include <drm/i915_drm.h>
>>   #include "i915_drv.h"
>> @@ -33,6 +34,9 @@
>>   #include "intel_drv.h"
>>   #include <linux/dma_remapping.h>
>>   #include <linux/uaccess.h>
>> +#ifdef CONFIG_SYNC
>> +#include <../drivers/staging/android/sync.h>
>> +#endif
>>
>>   #define  __EXEC_OBJECT_HAS_PIN (1<<31)
>>   #define  __EXEC_OBJECT_HAS_FENCE (1<<30)
>> @@ -1403,6 +1407,35 @@ eb_get_batch(struct eb_vmas *eb)
>>       return vma->obj;
>>   }
>>
>> +#ifdef CONFIG_SYNC
>
> I don't expect you'll be able to get away with ifdef's in the code 
> like this so for non-RFC it will have to be cleaned up.

Not a lot of choice at the moment. The sync_* code is all #ifdef 
CONFIG_SYNC so any code that references it must be likewise. As to how 
we get the CONFIG_SYNC tag removed, that is another discussion...

>
>> +static int i915_early_fence_wait(struct intel_engine_cs *ring, int 
>> fence_fd)
>> +{
>> +    struct sync_fence *fence;
>> +    int ret = 0;
>> +
>> +    if (fence_fd < 0) {
>> +        DRM_ERROR("Invalid wait fence fd %d on ring %d\n", fence_fd,
>> +              (int) ring->id);
>> +        return 1;
>> +    }
>> +
>> +    fence = sync_fence_fdget(fence_fd);
>> +    if (fence == NULL) {
>> +        DRM_ERROR("Invalid wait fence %d on ring %d\n", fence_fd,
>> +              (int) ring->id);
>
> These two should be DRM_DEBUG to prevent userspace from spamming the 
> logs too easily.
>
>> +        return 1;
>> +    }
>> +
>> +    if (atomic_read(&fence->status) == 0) {
>> +        if (!i915_safe_to_ignore_fence(ring, fence))
>> +            ret = sync_fence_wait(fence, 1000);
>
> I expect you have to wait indefinitely here, not just for one second.
Causing the driver to wait indefinitely under userland control is surely 
a Bad Thing(tm)? Okay, this is done before acquiring the mutex lock and 
presumably the wait can be interrupted, e.g. if the user land process 
gets a KILL signal. It still seems a bad idea to wait forever. Various 
bits of Android generally use a timeout of either 1s or 3s.

Daniel or anyone else, any views of driver time outs?

>
>> +    }
>> +
>> +    sync_fence_put(fence);
>> +    return ret;
>> +}
>> +#endif
>> +
>>   static int
>>   i915_gem_do_execbuffer(struct drm_device *dev, void *data,
>>                  struct drm_file *file,
>> @@ -1422,6 +1455,18 @@ i915_gem_do_execbuffer(struct drm_device *dev, 
>> void *data,
>>       u32 dispatch_flags;
>>       int ret;
>>       bool need_relocs;
>> +    int fd_fence_complete = -1;
>> +#ifdef CONFIG_SYNC
>> +    int fd_fence_wait = lower_32_bits(args->rsvd2);
>> +#endif
>> +
>> +    /*
>> +     * Make sure a broken fence handle is not returned no matter
>> +     * how early an error might be hit. Note that rsvd2 has to be
>> +     * saved away first because it is also an input parameter!
>> +     */
>> +    if (args->flags & I915_EXEC_CREATE_FENCE)
>> +        args->rsvd2 = (__u64) -1;
>>
>>       if (!i915_gem_check_execbuffer(args))
>>           return -EINVAL;
>> @@ -1505,6 +1550,19 @@ i915_gem_do_execbuffer(struct drm_device *dev, 
>> void *data,
>>           dispatch_flags |= I915_DISPATCH_RS;
>>       }
>>
>> +#ifdef CONFIG_SYNC
>> +    /*
>> +     * Without a GPU scheduler, any fence waits must be done up front.
>> +     */
>> +    if (args->flags & I915_EXEC_WAIT_FENCE) {
>> +        ret = i915_early_fence_wait(ring, fd_fence_wait);
>> +        if (ret < 0)
>> +            return ret;
>> +
>> +        args->flags &= ~I915_EXEC_WAIT_FENCE;
>> +    }
>> +#endif
>> +
>>       intel_runtime_pm_get(dev_priv);
>>
>>       ret = i915_mutex_lock_interruptible(dev);
>> @@ -1652,6 +1710,27 @@ i915_gem_do_execbuffer(struct drm_device *dev, 
>> void *data,
>>       params->batch_obj               = batch_obj;
>>       params->ctx                     = ctx;
>>
>> +#ifdef CONFIG_SYNC
>> +    if (args->flags & I915_EXEC_CREATE_FENCE) {
>> +        /*
>> +         * Caller has requested a sync fence.
>> +         * User interrupts will be enabled to make sure that
>> +         * the timeline is signalled on completion.
>> +         */
>> +        ret = i915_create_sync_fence(params->request,
>> +                         &fd_fence_complete);
>> +        if (ret) {
>> +            DRM_ERROR("Fence creation failed for ring %d, ctx %p\n",
>> +                  ring->id, ctx);
>> +            args->rsvd2 = (__u64) -1;
>> +            goto err;
>> +        }
>> +
>> +        /* Return the fence through the rsvd2 field */
>> +        args->rsvd2 = (__u64) fd_fence_complete;
>> +    }
>> +#endif
>> +
>>       ret = dev_priv->gt.execbuf_submit(params, args, &eb->vmas);
>>
>>   err_batch_unpin:
>> @@ -1683,6 +1762,12 @@ pre_mutex_err:
>>       /* intel_gpu_busy should also get a ref, so it will free when 
>> the device
>>        * is really idle. */
>>       intel_runtime_pm_put(dev_priv);
>> +
>> +    if (fd_fence_complete != -1) {
>> +        sys_close(fd_fence_complete);
>
> I am not sure calling system call functions from driver code will be 
> allowed. that's why I was doing fd_install only when sure everything 
> went OK.

Daniel or others, any thoughts? Is the clean up allowed in the driver? 
Is there an alternative driver friendly option? It makes the sync 
creating code cleaner if we can do everything in one place rather than 
do some processing up front and some at the end.

>
> Regards,
>
> Tvrtko


* Re: [RFC 6/9] drm/i915: Delay the freeing of requests until retire time
  2015-10-28 13:00     ` John Harrison
@ 2015-10-28 13:42       ` Tvrtko Ursulin
  0 siblings, 0 replies; 38+ messages in thread
From: Tvrtko Ursulin @ 2015-10-28 13:42 UTC (permalink / raw)
  To: John Harrison, Intel-GFX


On 28/10/15 13:00, John Harrison wrote:
> On 23/07/2015 15:25, Tvrtko Ursulin wrote:
>>
>> Hi,
>>
>> On 07/17/2015 03:31 PM, John.C.Harrison@Intel.com wrote:
>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>
>>> The request structure is reference counted. When the count reached
>>> zero, the request was immediately freed and all associated objects
>>> were unreferenced/deallocated. This meant that the driver mutex lock
>>> must be held at the point where the count reaches zero. This was fine
>>> while all references were held internally to the driver. However, the
>>> plan is to allow the underlying fence object (and hence the request
>>> itself) to be returned to other drivers and to userland. External
>>> users cannot be expected to acquire a driver private mutex lock.
>>>
>>> Rather than attempt to disentangle the request structure from the
>>> driver mutex lock, the decision was to defer the free code until a
>>> later (safer) point. Hence this patch changes the unreference callback
>>> to merely move the request onto a delayed free list. The driver's
>>> retire worker thread will then process the list and actually call the
>>> free function on the requests.
>>>
>>> [new patch in series]
>>>
>>> For: VIZ-5190
>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/i915_drv.h         | 22 +++---------------
>>>   drivers/gpu/drm/i915/i915_gem.c         | 41
>>> +++++++++++++++++++++++++++++----
>>>   drivers/gpu/drm/i915/intel_display.c    |  2 +-
>>>   drivers/gpu/drm/i915/intel_lrc.c        |  2 ++
>>>   drivers/gpu/drm/i915/intel_pm.c         |  2 +-
>>>   drivers/gpu/drm/i915/intel_ringbuffer.c |  2 ++
>>>   drivers/gpu/drm/i915/intel_ringbuffer.h |  4 ++++
>>>   7 files changed, 50 insertions(+), 25 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h
>>> b/drivers/gpu/drm/i915/i915_drv.h
>>> index 88a4746..61c3db2 100644
>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>> @@ -2161,14 +2161,9 @@ void i915_gem_track_fb(struct
>>> drm_i915_gem_object *old,
>>>    * initial reference taken using kref_init
>>>    */
>>>   struct drm_i915_gem_request {
>>> -    /**
>>> -     * Underlying object for implementing the signal/wait stuff.
>>> -     * NB: Never return this fence object to user land! It is unsafe to
>>> -     * let anything outside of the i915 driver get hold of the fence
>>> -     * object as the clean up when decrementing the reference count
>>> -     * requires holding the driver mutex lock.
>>> -     */
>>> +    /** Underlying object for implementing the signal/wait stuff. */
>>>       struct fence fence;
>>> +    struct list_head delay_free_list;
>>
>> Maybe call this delay_free_link to continue the established convention.
>>
>>>
>>>       /** On Which ring this request was generated */
>>>       struct drm_i915_private *i915;
>>> @@ -2281,21 +2276,10 @@ i915_gem_request_reference(struct
>>> drm_i915_gem_request *req)
>>>   static inline void
>>>   i915_gem_request_unreference(struct drm_i915_gem_request *req)
>>>   {
>>> - WARN_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
>>> -    fence_put(&req->fence);
>>> -}
>>> -
>>> -static inline void
>>> -i915_gem_request_unreference__unlocked(struct drm_i915_gem_request
>>> *req)
>>> -{
>>> -    struct drm_device *dev;
>>> -
>>>       if (!req)
>>>           return;
>>>
>>> -    dev = req->ring->dev;
>>> -    if (kref_put_mutex(&req->fence.refcount, fence_release,
>>> &dev->struct_mutex))
>>> -        mutex_unlock(&dev->struct_mutex);
>>> +    fence_put(&req->fence);
>>>   }
>>>
>>>   static inline void i915_gem_request_assign(struct
>>> drm_i915_gem_request **pdst,
>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c
>>> b/drivers/gpu/drm/i915/i915_gem.c
>>> index af79716..482835a 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>> @@ -2616,10 +2616,27 @@ static void i915_set_reset_status(struct
>>> drm_i915_private *dev_priv,
>>>       }
>>>   }
>>>
>>> -static void i915_gem_request_free(struct fence *req_fence)
>>> +static void i915_gem_request_release(struct fence *req_fence)
>>>   {
>>>       struct drm_i915_gem_request *req = container_of(req_fence,
>>>                            typeof(*req), fence);
>>> +    struct intel_engine_cs *ring = req->ring;
>>> +    struct drm_i915_private *dev_priv = to_i915(ring->dev);
>>> +    unsigned long flags;
>>> +
>>> +    /*
>>> +     * Need to add the request to a deferred dereference list to be
>>> +     * processed at a mutex lock safe time.
>>> +     */
>>> +    spin_lock_irqsave(&ring->delayed_free_lock, flags);
>>
>> At the moment there is no request unreferencing from irq handlers
>> right? Unless (or until) you plan to add that you could use simple
>> spin_lock here. (And in the i915_gem_retire_requests_ring.)
>
> I don't believe there is an unreference at IRQ time at this precise
> moment. However, there certainly have been in various other iterations
> of the code (including one on the display side that has since
> disappeared due to changes by others completely unrelated to this work).
> So I would be nervous about not making it IRQ compatible. It seems like
> a bug waiting to happen.

I think it is bad to take the cost of disabling interrupts for nothing.

Once the unthinkable happens and the driver is re-designed so that it 
is possible to unreference from IRQ context, it could be added.

>>> @@ -2992,6 +3009,21 @@ i915_gem_retire_requests_ring(struct
>>> intel_engine_cs *ring)
>>>           i915_gem_request_assign(&ring->trace_irq_req, NULL);
>>>       }
>>>
>>> +    while (!list_empty(&ring->delayed_free_list)) {
>>> +        struct drm_i915_gem_request *request;
>>> +        unsigned long flags;
>>> +
>>> +        request = list_first_entry(&ring->delayed_free_list,
>>> +                       struct drm_i915_gem_request,
>>> +                       delay_free_list);
>>
>> Need a spinlock to sample list head here. Then maybe move it on a
>> temporary list and do the freeing afterwards.
>
> Not necessary. The only other usage of the list is to add to it. So this
> code can't pull an entry that gets removed beneath its feet. Either the
> list empty test will return true and nothing further happens or there is
> definitely a node on the list and list_first_entry() will return
> something sane. The spinlock is only required when actually deleting
> that node.

NAK! :D

It only works because you know how lists are implemented. Say 
list_empty checked for head->prev == head while list_first_entry 
obviously uses head->next; then, depending on the ordering of writes in 
list_add, you could grab some garbage, take the lock and dereference 
that garbage.

I see no gain in doing this trickery and it is fragile. And you still 
lock/unlock once per loop.

Why not use the common pattern of replacing the list under the lock and 
then operating on your local copy unrestricted?
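
Something along these lines (a sketch against the fields in the patch; 
the free helper name is assumed):

    struct drm_i915_gem_request *req, *next;
    unsigned long flags;
    LIST_HEAD(free_list);

    /* Detach the whole pending list in one locked operation... */
    spin_lock_irqsave(&ring->delayed_free_lock, flags);
    list_splice_init(&ring->delayed_free_list, &free_list);
    spin_unlock_irqrestore(&ring->delayed_free_lock, flags);

    /* ...then free the requests with no lock held. */
    list_for_each_entry_safe(req, next, &free_list, delay_free_list) {
        list_del(&req->delay_free_list);
        i915_gem_request_free(req);    /* assumed deferred-free helper */
    }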

Regards,

Tvrtko

* Re: [RFC 9/9] drm/i915: Add sync framework support to execbuff IOCTL
  2015-10-28 13:01     ` John Harrison
@ 2015-10-28 14:31       ` Tvrtko Ursulin
  2015-11-17 13:59       ` Daniel Vetter
  1 sibling, 0 replies; 38+ messages in thread
From: Tvrtko Ursulin @ 2015-10-28 14:31 UTC (permalink / raw)
  To: John Harrison, Intel-GFX


On 28/10/15 13:01, John Harrison wrote:
> On 27/07/2015 14:00, Tvrtko Ursulin wrote:

[snip]

>>> +    if (!sync_fence) {
>>> +        put_unused_fd(fd);
>>> +        *fence_fd = -1;
>>> +        return -ENOMEM;
>>> +    }
>>> +
>>> +    sync_fence_install(sync_fence, fd);
>>> +    *fence_fd = fd;
>>> +
>>> +    // Necessary??? Who does the put???
>>> +    fence_get(&req->fence);
>>
>> sync_fence_release?
> Yes but where? Does the driver need to call this? Is it userland's
> responsibility? Does it happen automatically when the fd is closed? Do
> we even need to do the _get() in the first place? It seems to be working
> in that I don't get any unexpected free of the fence and I don't get
> huge numbers of leaked fences. However, it would be nice to know how it
> is working!

When the fd is closed, implicitly or explicitly (close(2)/exit(2)/any 
process termination), the kernel will automatically call 
sync_fence_release via the file_operations installed on the fence's 
file at sync_fence creation time.

(Strictly speaking, not when the fd is closed but when the last 
reference to the struct file in question goes away.)
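
For reference, a minimal sketch of that mechanism (names hypothetical, 
not the actual sync.c code) using the usual anon-inode pattern:

    static int example_fence_release(struct inode *inode, struct file *file)
    {
        struct sync_fence *fence = file->private_data;

        sync_fence_put(fence);    /* drop the reference the fd held */
        return 0;
    }

    static const struct file_operations example_fence_fops = {
        .release = example_fence_release,    /* runs on final fput() */
    };

    /* At creation time the fence is bound to a file and the file to an fd: */
    struct file *file = anon_inode_getfile("sync_fence",
                                           &example_fence_fops, fence, 0);
    fd_install(fd, file);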

>>> +        return 1;
>>> +    }
>>> +
>>> +    if (atomic_read(&fence->status) == 0) {
>>> +        if (!i915_safe_to_ignore_fence(ring, fence))
>>> +            ret = sync_fence_wait(fence, 1000);
>>
>> I expect you have to wait indefinitely here, not just for one second.
> Causing the driver to wait indefinitely under userland control is surely
> a Bad Thing(tm)? Okay, this is done before acquiring the mutex lock and
> presumably the wait can be interrupted, e.g. if the user land process
> gets a KILL signal. It still seems a bad idea to wait forever. Various
> bits of Android generally use a timeout of either 1s or 3s.
>
> Daniel or anyone else, any views of driver time outs?

It is slightly irrelevant since this is a temporary code path before the 
scheduler lands.

But I don't see that it makes sense to have made-up timeouts. If 
userspace gave us something to wait on, then I think we should wait 
until it is done or it fails. It is not blocking the driver but is 
running on behalf of the same process which passed in the fd to wait 
on. So the
process can only block itself.

Regards,

Tvrtko

* Re: [RFC 5/9] drm/i915: Add per context timelines to fence object
  2015-10-28 12:59     ` John Harrison
@ 2015-11-17 13:54       ` Daniel Vetter
  0 siblings, 0 replies; 38+ messages in thread
From: Daniel Vetter @ 2015-11-17 13:54 UTC (permalink / raw)
  To: John Harrison; +Cc: Intel-GFX

On Wed, Oct 28, 2015 at 12:59:31PM +0000, John Harrison wrote:
> Have finally had some time to come back to this and respond to/incorporate
> the comments made some while ago...
> 
> 
> On 23/07/2015 14:50, Tvrtko Ursulin wrote:
> >Hi,
> >
> >On 07/17/2015 03:31 PM, John.C.Harrison@Intel.com wrote:
> >>From: John Harrison <John.C.Harrison@Intel.com>
> >>
> >>The fence object used inside the request structure requires a sequence
> >>number.
> >>Although this is not used by the i915 driver itself, it could
> >>potentially be
> >>used by non-i915 code if the fence is passed outside of the driver. This
> >>is the
> >>intention as it allows external kernel drivers and user applications to
> >>wait on
> >>batch buffer completion asynchronously via the dma-buff fence API.
> >>
> >>To ensure that such external users are not confused by strange things
> >>happening
> >>with the seqno, this patch adds in a per context timeline that can
> >>provide a
> >>guaranteed in-order seqno value for the fence. This is safe because the
> >>scheduler will not re-order batch buffers within a context - they are
> >>considered
> >>to be mutually dependent.
> >>
> >>[new patch in series]
> >>
> >>For: VIZ-5190
> >>Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> >>---
> >>  drivers/gpu/drm/i915/i915_drv.h         | 25 ++++++++----
> >>  drivers/gpu/drm/i915/i915_gem.c         | 69
> >>++++++++++++++++++++++++++++++---
> >>  drivers/gpu/drm/i915/i915_gem_context.c | 15 ++++++-
> >>  drivers/gpu/drm/i915/intel_lrc.c        |  8 ++++
> >>  drivers/gpu/drm/i915/intel_ringbuffer.h |  1 -
> >>  5 files changed, 103 insertions(+), 15 deletions(-)
> >>
> >>diff --git a/drivers/gpu/drm/i915/i915_drv.h
> >>b/drivers/gpu/drm/i915/i915_drv.h
> >>index 0c7df46..88a4746 100644
> >>--- a/drivers/gpu/drm/i915/i915_drv.h
> >>+++ b/drivers/gpu/drm/i915/i915_drv.h
> >>@@ -840,6 +840,15 @@ struct i915_ctx_hang_stats {
> >>      bool banned;
> >>  };
> >>
> >>+struct i915_fence_timeline {
> >>+    unsigned    fence_context;
> >>+    uint32_t    context;
> >
> >Unused field?
> >
> >>+    uint32_t    next;
> >
> >fence.h defines seqnos as 'unsigned', which matches this in practice, but
> >maybe it would be nicer to use the same type name.
> >
> >>+
> >>+    struct intel_context *ctx;
> >>+    struct intel_engine_cs *ring;
> >>+};
> >>+
> >>  /* This must match up with the value previously used for
> >>execbuf2.rsvd1. */
> >>  #define DEFAULT_CONTEXT_HANDLE 0
> >>
> >>@@ -885,6 +894,7 @@ struct intel_context {
> >>          struct drm_i915_gem_object *state;
> >>          struct intel_ringbuffer *ringbuf;
> >>          int pin_count;
> >>+        struct i915_fence_timeline fence_timeline;
> >>      } engine[I915_NUM_RINGS];
> >>
> >>      struct list_head link;
> >>@@ -2153,13 +2163,10 @@ void i915_gem_track_fb(struct
> >>drm_i915_gem_object *old,
> >>  struct drm_i915_gem_request {
> >>      /**
> >>       * Underlying object for implementing the signal/wait stuff.
> >>-     * NB: Never call fence_later() or return this fence object to user
> >>-     * land! Due to lazy allocation, scheduler re-ordering,
> >>pre-emption,
> >>-     * etc., there is no guarantee at all about the validity or
> >>-     * sequentiality of the fence's seqno! It is also unsafe to let
> >>-     * anything outside of the i915 driver get hold of the fence object
> >>-     * as the clean up when decrementing the reference count requires
> >>-     * holding the driver mutex lock.
> >>+     * NB: Never return this fence object to user land! It is unsafe to
> >>+     * let anything outside of the i915 driver get hold of the fence
> >>+     * object as the clean up when decrementing the reference count
> >>+     * requires holding the driver mutex lock.
> >>       */
> >>      struct fence fence;
> >>
> >>@@ -2239,6 +2246,10 @@ int i915_gem_request_alloc(struct intel_engine_cs
> >>*ring,
> >>                 struct drm_i915_gem_request **req_out);
> >>  void i915_gem_request_cancel(struct drm_i915_gem_request *req);
> >>
> >>+int i915_create_fence_timeline(struct drm_device *dev,
> >>+                   struct intel_context *ctx,
> >>+                   struct intel_engine_cs *ring);
> >>+
> >>  static inline bool i915_gem_request_completed(struct
> >>drm_i915_gem_request *req)
> >>  {
> >>      return fence_is_signaled(&req->fence);
> >>diff --git a/drivers/gpu/drm/i915/i915_gem.c
> >>b/drivers/gpu/drm/i915/i915_gem.c
> >>index 3970250..af79716 100644
> >>--- a/drivers/gpu/drm/i915/i915_gem.c
> >>+++ b/drivers/gpu/drm/i915/i915_gem.c
> >>@@ -2671,6 +2671,25 @@ static bool i915_gem_request_is_completed(struct
> >>fence *req_fence)
> >>      return i915_seqno_passed(seqno, req->seqno);
> >>  }
> >>
> >>+static void i915_fence_timeline_value_str(struct fence *fence, char
> >>*str, int size)
> >>+{
> >>+    struct drm_i915_gem_request *req;
> >>+
> >>+    req = container_of(fence, typeof(*req), fence);
> >>+
> >>+    /* Last signalled timeline value ??? */
> >>+    snprintf(str, size, "? [%d]"/*, tl->value*/,
> >>req->ring->get_seqno(req->ring, true));
> >>+}
> >
> >If timelines are per context now maybe we should update
> >i915_gem_request_get_timeline_name to be per context instead of per engine
> >as well? Like this we have a name space overlap / seqno collisions from
> >userspace point of view.
> >
> >>+static void i915_fence_value_str(struct fence *fence, char *str, int
> >>size)
> >>+{
> >>+    struct drm_i915_gem_request *req;
> >>+
> >>+    req = container_of(fence, typeof(*req), fence);
> >>+
> >>+    snprintf(str, size, "%d [%d]", req->fence.seqno, req->seqno);
> >>+}
> >>+
> >>  static const struct fence_ops i915_gem_request_fops = {
> >>      .get_driver_name    = i915_gem_request_get_driver_name,
> >>      .get_timeline_name    = i915_gem_request_get_timeline_name,
> >>@@ -2678,8 +2697,48 @@ static const struct fence_ops
> >>i915_gem_request_fops = {
> >>      .signaled        = i915_gem_request_is_completed,
> >>      .wait            = fence_default_wait,
> >>      .release        = i915_gem_request_free,
> >>+    .fence_value_str    = i915_fence_value_str,
> >>+    .timeline_value_str    = i915_fence_timeline_value_str,
> >>  };
> >>
> >>+int i915_create_fence_timeline(struct drm_device *dev,
> >>+                   struct intel_context *ctx,
> >>+                   struct intel_engine_cs *ring)
> >>+{
> >>+    struct i915_fence_timeline *timeline;
> >>+
> >>+    timeline = &ctx->engine[ring->id].fence_timeline;
> >>+
> >>+    if (timeline->ring)
> >>+        return 0;
> >>+
> >>+    timeline->fence_context = fence_context_alloc(1);
> >>+
> >>+    /*
> >>+     * Start the timeline from seqno 0 as this is a special value
> >>+     * that is reserved for invalid sync points.
> >>+     */
> >>+    timeline->next       = 1;
> >>+    timeline->ctx        = ctx;
> >>+    timeline->ring       = ring;
> >>+
> >>+    return 0;
> >>+}
> >>+
> >>+static uint32_t i915_fence_timeline_get_next_seqno(struct
> >>i915_fence_timeline *timeline)
> >>+{
> >>+    uint32_t seqno;
> >>+
> >>+    seqno = timeline->next;
> >>+
> >>+    /* Reserve zero for invalid */
> >>+    if (++timeline->next == 0 ) {
> >>+        timeline->next = 1;
> >>+    }
> >>+
> >>+    return seqno;
> >>+}
> >>+
> >>  int i915_gem_request_alloc(struct intel_engine_cs *ring,
> >>                 struct intel_context *ctx,
> >>                 struct drm_i915_gem_request **req_out)
> >>@@ -2715,7 +2774,9 @@ int i915_gem_request_alloc(struct intel_engine_cs
> >>*ring,
> >>          goto err;
> >>      }
> >>
> >>-    fence_init(&req->fence, &i915_gem_request_fops, &ring->fence_lock,
> >>ring->fence_context, req->seqno);
> >>+    fence_init(&req->fence, &i915_gem_request_fops, &ring->fence_lock,
> >>+ ctx->engine[ring->id].fence_timeline.fence_context,
> >>+
> >>i915_fence_timeline_get_next_seqno(&ctx->engine[ring->id].fence_timeline));
> >
> >I suppose for debugging it could be useful to add this new seqno in
> >i915_gem_request_info to have visibility at both sides. To map userspace
> >seqnos to driver state.
> >
> >>      /*
> >>       * Reserve space in the ring buffer for all the commands required
> >>to
> >>@@ -5065,7 +5126,7 @@ i915_gem_init_hw(struct drm_device *dev)
> >>  {
> >>      struct drm_i915_private *dev_priv = dev->dev_private;
> >>      struct intel_engine_cs *ring;
> >>-    int ret, i, j, fence_base;
> >>+    int ret, i, j;
> >>
> >>      if (INTEL_INFO(dev)->gen < 6 && !intel_enable_gtt())
> >>          return -EIO;
> >>@@ -5117,16 +5178,12 @@ i915_gem_init_hw(struct drm_device *dev)
> >>              goto out;
> >>      }
> >>
> >>-    fence_base = fence_context_alloc(I915_NUM_RINGS);
> >>-
> >>      /* Now it is safe to go back round and do everything else: */
> >>      for_each_ring(ring, dev_priv, i) {
> >>          struct drm_i915_gem_request *req;
> >>
> >>          WARN_ON(!ring->default_context);
> >>
> >>-        ring->fence_context = fence_base + i;
> >>-
> >>          ret = i915_gem_request_alloc(ring, ring->default_context,
> >>&req);
> >>          if (ret) {
> >>              i915_gem_cleanup_ringbuffer(dev);
> >>diff --git a/drivers/gpu/drm/i915/i915_gem_context.c
> >>b/drivers/gpu/drm/i915/i915_gem_context.c
> >>index b77a8f7..7eb8694 100644
> >>--- a/drivers/gpu/drm/i915/i915_gem_context.c
> >>+++ b/drivers/gpu/drm/i915/i915_gem_context.c
> >>@@ -242,7 +242,7 @@ i915_gem_create_context(struct drm_device *dev,
> >>  {
> >>      const bool is_global_default_ctx = file_priv == NULL;
> >>      struct intel_context *ctx;
> >>-    int ret = 0;
> >>+    int i, ret = 0;
> >>
> >>      BUG_ON(!mutex_is_locked(&dev->struct_mutex));
> >>
> >>@@ -250,6 +250,19 @@ i915_gem_create_context(struct drm_device *dev,
> >>      if (IS_ERR(ctx))
> >>          return ctx;
> >>
> >>+    if (!i915.enable_execlists) {
> >>+        struct intel_engine_cs *ring;
> >>+
> >>+        /* Create a per context timeline for fences */
> >>+        for_each_ring(ring, to_i915(dev), i) {
> >>+            ret = i915_create_fence_timeline(dev, ctx, ring);
> >>+            if (ret) {
> >>+                DRM_ERROR("Fence timeline creation failed for legacy
> >>%s: %p\n", ring->name, ctx);
> >>+                goto err_destroy;
> >>+            }
> >>+        }
> >>+    }
> >>+
> >>      if (is_global_default_ctx && ctx->legacy_hw_ctx.rcs_state) {
> >>          /* We may need to do things with the shrinker which
> >>           * require us to immediately switch back to the default
> >>diff --git a/drivers/gpu/drm/i915/intel_lrc.c
> >>b/drivers/gpu/drm/i915/intel_lrc.c
> >>index ee4aecd..8f255de 100644
> >>--- a/drivers/gpu/drm/i915/intel_lrc.c
> >>+++ b/drivers/gpu/drm/i915/intel_lrc.c
> >>@@ -2376,6 +2376,14 @@ int intel_lr_context_deferred_create(struct
> >>intel_context *ctx,
> >>          goto error;
> >>      }
> >>
> >>+    /* Create a per context timeline for fences */
> >>+    ret = i915_create_fence_timeline(dev, ctx, ring);
> >>+    if (ret) {
> >>+        DRM_ERROR("Fence timeline creation failed for ring %s, ctx
> >>%p\n",
> >>+              ring->name, ctx);
> >>+        goto error;
> >>+    }
> >>+
> >
> >We must be 100% sure userspace cannot provoke context creation failure by
> >accident or deliberately. Otherwise we would leak fence contexts until
> >overflow which would be bad.
> >
> >Perhaps matching fence_context_release for existing fence_context_alloc
> >should be added?
> 
> Note that there is no fence_context_release. The fence_context_alloc code is
> simply 'return static_count++;'. There is no overflow checking. There is no
> anti-re-use checking. When 4GB contexts have been allocated, the old ones
> will get re-allocated and if they are still in use then tough. It's a really
> bad API! On the other hand, the context is not actually used for anything.
> So it doesn't really matter.

When it was written all the users of the fence API only allocated contexts
for hw resources, which meant overflowing was pretty much impossible. With
execlists this is different. I guess it's time to fix up the fence timeline
creation API.
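
For context, fence_context_alloc() at this point is approximately just 
a bare counter, with no release and no overflow handling:

    static atomic_t fence_context_counter = ATOMIC_INIT(0);

    unsigned fence_context_alloc(unsigned num)
    {
        BUG_ON(!num);
        return atomic_add_return(num, &fence_context_counter) - num;
    }
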
-Daniel

> 
> 
> >
> >Regards,
> >
> >Tvrtko
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [RFC 9/9] drm/i915: Add sync framework support to execbuff IOCTL
  2015-10-28 13:01     ` John Harrison
  2015-10-28 14:31       ` Tvrtko Ursulin
@ 2015-11-17 13:59       ` Daniel Vetter
  1 sibling, 0 replies; 38+ messages in thread
From: Daniel Vetter @ 2015-11-17 13:59 UTC (permalink / raw)
  To: John Harrison; +Cc: Intel-GFX

On Wed, Oct 28, 2015 at 01:01:17PM +0000, John Harrison wrote:
> On 27/07/2015 14:00, Tvrtko Ursulin wrote:
> >On 07/17/2015 03:31 PM, John.C.Harrison@Intel.com wrote:
> >>From: John Harrison <John.C.Harrison@Intel.com>
> >>
> >>Various projects desire a mechanism for managing dependencies between
> >>work items asynchronously. This can also include work items across
> >>completely different and independent systems. For example, an
> >>application wants to retrieve a frame from a video-in device, use it
> >>for rendering on a GPU, then send it to the video-out device for
> >>display, all without having to stall waiting for completion along
> >>the way. The sync framework allows this. It encapsulates
> >>synchronisation events in file descriptors. The application can
> >>request a sync point for the completion of each piece of work. Drivers
> >>should also take sync points in with each new work request and not
> >>schedule the work to start until the sync has been signalled.
> >>
> >>This patch adds sync framework support to the exec buffer IOCTL. A
> >>sync point can be passed in to stall execution of the batch buffer
> >>until signalled. And a sync point can be returned after each batch
> >>buffer submission which will be signalled upon that batch buffer's
> >>completion.
> >>
> >>At present, the input sync point is simply waited on synchronously
> >>inside the exec buffer IOCTL call. Once the GPU scheduler arrives,
> >>this will be handled asynchronously inside the scheduler and the IOCTL
> >>can return without having to wait.
> >>
> >>Note also that the scheduler will re-order the execution of batch
> >>buffers, e.g. because a batch buffer is stalled on a sync point and
> >>cannot be submitted yet but other, independent, batch buffers are
> >>being presented to the driver. This means that the timeline within the
> >>sync points returned cannot be global to the engine. Instead they must
> >>be kept per context per engine (the scheduler may not re-order batches
> >>within a context). Hence the timeline cannot be based on the existing
> >>seqno values but must be a new implementation.
> >>
> >>This patch is a port of work by several people that has been pulled
> >>across from Android. It has been updated several times across several
> >>patches. Rather than attempt to port each individual patch, this
> >>version is the finished product as a single patch. The various
> >>contributors/authors along the way (in addition to myself) were:
> >>   Satyanantha RamaGopal M <rama.gopal.m.satyanantha@intel.com>
> >>   Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>   Michel Thierry <michel.thierry@intel.com>
> >>   Arun Siluvery <arun.siluvery@linux.intel.com>
> >>
> >>[new patch in series]
> >>
> >>For: VIZ-5190
> >>Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> >>---
> >>  drivers/gpu/drm/i915/i915_drv.h            |  6 ++
> >>  drivers/gpu/drm/i915/i915_gem.c            | 84
> >>++++++++++++++++++++++++++++
> >>  drivers/gpu/drm/i915/i915_gem_execbuffer.c | 90
> >>++++++++++++++++++++++++++++--
> >>  include/uapi/drm/i915_drm.h                | 16 +++++-
> >>  4 files changed, 188 insertions(+), 8 deletions(-)
> >>
> >>diff --git a/drivers/gpu/drm/i915/i915_drv.h
> >>b/drivers/gpu/drm/i915/i915_drv.h
> >>index d7f1aa5..cf6b7cd 100644
> >>--- a/drivers/gpu/drm/i915/i915_drv.h
> >>+++ b/drivers/gpu/drm/i915/i915_drv.h
> >>@@ -2168,6 +2168,7 @@ struct drm_i915_gem_request {
> >>      struct list_head delay_free_list;
> >>      bool cancelled;
> >>      bool irq_enabled;
> >>+    bool fence_external;
> >>
> >>      /** On Which ring this request was generated */
> >>      struct drm_i915_private *i915;
> >>@@ -2252,6 +2253,11 @@ void i915_gem_request_notify(struct
> >>intel_engine_cs *ring);
> >>  int i915_create_fence_timeline(struct drm_device *dev,
> >>                     struct intel_context *ctx,
> >>                     struct intel_engine_cs *ring);
> >>+#ifdef CONFIG_SYNC
> >>+struct sync_fence;
> >>+int i915_create_sync_fence(struct drm_i915_gem_request *req, int
> >>*fence_fd);
> >>+bool i915_safe_to_ignore_fence(struct intel_engine_cs *ring, struct
> >>sync_fence *fence);
> >>+#endif
> >>
> >>  static inline bool i915_gem_request_completed(struct
> >>drm_i915_gem_request *req)
> >>  {
> >>diff --git a/drivers/gpu/drm/i915/i915_gem.c
> >>b/drivers/gpu/drm/i915/i915_gem.c
> >>index 3f20087..de93422 100644
> >>--- a/drivers/gpu/drm/i915/i915_gem.c
> >>+++ b/drivers/gpu/drm/i915/i915_gem.c
> >>@@ -37,6 +37,9 @@
> >>  #include <linux/swap.h>
> >>  #include <linux/pci.h>
> >>  #include <linux/dma-buf.h>
> >>+#ifdef CONFIG_SYNC
> >>+#include <../drivers/staging/android/sync.h>
> >>+#endif
> >>
> >>  #define RQ_BUG_ON(expr)
> >>
> >>@@ -2549,6 +2552,15 @@ void __i915_add_request(struct
> >>drm_i915_gem_request *request,
> >>       */
> >>      i915_gem_request_submit(request);
> >>
> >>+    /*
> >>+     * If an external sync point has been requested for this request
> >>then
> >>+     * it can be waited on without the driver's knowledge, i.e. without
> >>+     * calling __i915_wait_request(). Thus interrupts must be enabled
> >>+     * from the start rather than only on demand.
> >>+     */
> >>+    if (request->fence_external)
> >>+        i915_gem_request_enable_interrupt(request);
> >
> >Maybe then fence_exported would be clearer, fence_external at first sounds
> >like it is coming from another driver or something.
> Turns out it is not necessary anyway as mentioned below.
> 
> >>+
> >>      if (i915.enable_execlists)
> >>          ret = ring->emit_request(request);
> >>      else {
> >>@@ -2857,6 +2869,78 @@ static uint32_t
> >>i915_fence_timeline_get_next_seqno(struct i915_fence_timeline *t
> >>      return seqno;
> >>  }
> >>
> >>+#ifdef CONFIG_SYNC
> >>+int i915_create_sync_fence(struct drm_i915_gem_request *req, int
> >>*fence_fd)
> >>+{
> >>+    char ring_name[] = "i915_ring0";
> >>+    struct sync_fence *sync_fence;
> >>+    int fd;
> >>+
> >>+    fd = get_unused_fd_flags(O_CLOEXEC);
> >>+    if (fd < 0) {
> >>+        DRM_DEBUG("No available file descriptors!\n");
> >>+        *fence_fd = -1;
> >>+        return fd;
> >>+    }
> >>+
> >>+    ring_name[9] += req->ring->id;
> >>+    sync_fence = sync_fence_create_dma(ring_name, &req->fence);
> >
> >This will call ->enable_signalling so perhaps you could enable interrupts
> >in there for exported fences. Maybe it would be a tiny bit more logically
> >grouped. (Rather than have _add_request do it.)
> 
> Yeah, hadn't quite spotted this first time around. It now all happens
> 'magically' without needing any explicit code - just some explicit comments
> to say that the behind-the-scenes magic is a) happening and b) necessary.
> 
> >
> >>+    if (!sync_fence) {
> >>+        put_unused_fd(fd);
> >>+        *fence_fd = -1;
> >>+        return -ENOMEM;
> >>+    }
> >>+
> >>+    sync_fence_install(sync_fence, fd);
> >>+    *fence_fd = fd;
> >>+
> >>+    // Necessary??? Who does the put???
> >>+    fence_get(&req->fence);
> >
> >sync_fence_release?
> Yes but where? Does the driver need to call this? Is it userland's
> responsibility? Does it happen automatically when the fd is closed? Do we
> even need to do the _get() in the first place? It seems to be working in
> that I don't get any unexpected free of the fence and I don't get huge
> numbers of leaked fences. However, it would be nice to know how it is
> working!
> 
> >
> >>+
> >>+    req->fence_external = true;
> >>+
> >>+    return 0;
> >>+}
> >>+
> >>+bool i915_safe_to_ignore_fence(struct intel_engine_cs *ring, struct
> >>sync_fence *sync_fence)
> >>+{
> >>+    struct fence *dma_fence;
> >>+    struct drm_i915_gem_request *req;
> >>+    bool ignore;
> >>+    int i;
> >>+
> >>+    if (atomic_read(&sync_fence->status) != 0)
> >>+        return true;
> >>+
> >>+    ignore = true;
> >>+    for(i = 0; i < sync_fence->num_fences; i++) {
> >>+        dma_fence = sync_fence->cbs[i].sync_pt;
> >>+
> >>+        /* No need to worry about dead points: */
> >>+        if (fence_is_signaled(dma_fence))
> >>+            continue;
> >>+
> >>+        /* Can't ignore other people's points: */
> >>+        if(dma_fence->ops != &i915_gem_request_fops) {
> >>+            ignore = false;
> >>+            break;
> >
> >The same as return false and then don't need bool ignore at all.
> Yeah, left over from when there was cleanup to be done at function exit
> time. The cleanup code was removed but the single exit point was not.
> 
> >
> >>+        }
> >>+
> >>+        req = container_of(dma_fence, typeof(*req), fence);
> >>+
> >>+        /* Can't ignore points on other rings: */
> >>+        if (req->ring != ring) {
> >>+            ignore = false;
> >>+            break;
> >>+        }
> >>+
> >>+        /* Same ring means guaranteed to be in order so ignore it. */
> >>+    }
> >>+
> >>+    return ignore;
> >>+}
> >>+#endif
> >>+
> >>  int i915_gem_request_alloc(struct intel_engine_cs *ring,
> >>                 struct intel_context *ctx,
> >>                 struct drm_i915_gem_request **req_out)
> >>diff --git a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >>b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >>index 923a3c4..b1a1659 100644
> >>--- a/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >>+++ b/drivers/gpu/drm/i915/i915_gem_execbuffer.c
> >>@@ -26,6 +26,7 @@
> >>   *
> >>   */
> >>
> >>+#include <linux/syscalls.h>
> >>  #include <drm/drmP.h>
> >>  #include <drm/i915_drm.h>
> >>  #include "i915_drv.h"
> >>@@ -33,6 +34,9 @@
> >>  #include "intel_drv.h"
> >>  #include <linux/dma_remapping.h>
> >>  #include <linux/uaccess.h>
> >>+#ifdef CONFIG_SYNC
> >>+#include <../drivers/staging/android/sync.h>
> >>+#endif
> >>
> >>  #define  __EXEC_OBJECT_HAS_PIN (1<<31)
> >>  #define  __EXEC_OBJECT_HAS_FENCE (1<<30)
> >>@@ -1403,6 +1407,35 @@ eb_get_batch(struct eb_vmas *eb)
> >>      return vma->obj;
> >>  }
> >>
> >>+#ifdef CONFIG_SYNC
> >
> >I don't expect you'll be able to get away with ifdef's in the code like
> >this so for non-RFC it will have to be cleaned up.
> 
> Not a lot of choice at the moment. The sync_* code is all #ifdef CONFIG_SYNC
> so any code that references it must be likewise. As to how we get the
> CONFIG_SYNC tag removed, that is another discussion...

Destaging the sync stuff is part of the merge criteria for this feature.
So yeah, all the #ifdefery has to go and be replaced by a select FENCE or
whatever in the i915 Kconfig.

> >>+static int i915_early_fence_wait(struct intel_engine_cs *ring, int
> >>fence_fd)
> >>+{
> >>+    struct sync_fence *fence;
> >>+    int ret = 0;
> >>+
> >>+    if (fence_fd < 0) {
> >>+        DRM_ERROR("Invalid wait fence fd %d on ring %d\n", fence_fd,
> >>+              (int) ring->id);
> >>+        return 1;
> >>+    }
> >>+
> >>+    fence = sync_fence_fdget(fence_fd);
> >>+    if (fence == NULL) {
> >>+        DRM_ERROR("Invalid wait fence %d on ring %d\n", fence_fd,
> >>+              (int) ring->id);
> >
> >These two should be DRM_DEBUG to prevent userspace from spamming the logs
> >too easily.
> >
> >>+        return 1;
> >>+    }
> >>+
> >>+    if (atomic_read(&fence->status) == 0) {
> >>+        if (!i915_safe_to_ignore_fence(ring, fence))
> >>+            ret = sync_fence_wait(fence, 1000);
> >
> >I expect you have to wait indefinitely here, not just for one second.
> Causing the driver to wait indefinitely under userland control is surely a
> Bad Thing(tm)? Okay, this is done before acquiring the mutex lock and
> presumably the wait can be interrupted, e.g. if the user land process gets a
> KILL signal. It still seems a bad idea to wait forever. Various bits of
> Android generally use a timeout of either 1s or 3s.
> 
> Daniel or anyone else, any views of driver time outs?

Wait forever, but interruptibly. Have an igt to exercise the deadlock case
and make sure gpu reset can recover.
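
In the staging API that amounts to passing a negative timeout, which 
sync_fence_wait() treats as an indefinite, interruptible wait. A 
sketch:

    /* Sketch: no artificial timeout; a signal or GPU reset unblocks us. */
    ret = sync_fence_wait(fence, -1);
    if (ret < 0)
        return ret;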

> >
> >>+    }
> >>+
> >>+    sync_fence_put(fence);
> >>+    return ret;
> >>+}
> >>+#endif
> >>+
> >>  static int
> >>  i915_gem_do_execbuffer(struct drm_device *dev, void *data,
> >>                 struct drm_file *file,
> >>@@ -1422,6 +1455,18 @@ i915_gem_do_execbuffer(struct drm_device *dev,
> >>void *data,
> >>      u32 dispatch_flags;
> >>      int ret;
> >>      bool need_relocs;
> >>+    int fd_fence_complete = -1;
> >>+#ifdef CONFIG_SYNC
> >>+    int fd_fence_wait = lower_32_bits(args->rsvd2);
> >>+#endif
> >>+
> >>+    /*
> >>+     * Make sure a broken fence handle is not returned no matter
> >>+     * how early an error might be hit. Note that rsvd2 has to be
> >>+     * saved away first because it is also an input parameter!
> >>+     */
> >>+    if (args->flags & I915_EXEC_CREATE_FENCE)
> >>+        args->rsvd2 = (__u64) -1;
> >>
> >>      if (!i915_gem_check_execbuffer(args))
> >>          return -EINVAL;
> >>@@ -1505,6 +1550,19 @@ i915_gem_do_execbuffer(struct drm_device *dev,
> >>void *data,
> >>          dispatch_flags |= I915_DISPATCH_RS;
> >>      }
> >>
> >>+#ifdef CONFIG_SYNC
> >>+    /*
> >>+     * Without a GPU scheduler, any fence waits must be done up front.
> >>+     */
> >>+    if (args->flags & I915_EXEC_WAIT_FENCE) {
> >>+        ret = i915_early_fence_wait(ring, fd_fence_wait);
> >>+        if (ret < 0)
> >>+            return ret;
> >>+
> >>+        args->flags &= ~I915_EXEC_WAIT_FENCE;
> >>+    }
> >>+#endif
> >>+
> >>      intel_runtime_pm_get(dev_priv);
> >>
> >>      ret = i915_mutex_lock_interruptible(dev);
> >>@@ -1652,6 +1710,27 @@ i915_gem_do_execbuffer(struct drm_device *dev,
> >>void *data,
> >>      params->batch_obj               = batch_obj;
> >>      params->ctx                     = ctx;
> >>
> >>+#ifdef CONFIG_SYNC
> >>+    if (args->flags & I915_EXEC_CREATE_FENCE) {
> >>+        /*
> >>+         * Caller has requested a sync fence.
> >>+         * User interrupts will be enabled to make sure that
> >>+         * the timeline is signalled on completion.
> >>+         */
> >>+        ret = i915_create_sync_fence(params->request,
> >>+                         &fd_fence_complete);
> >>+        if (ret) {
> >>+            DRM_ERROR("Fence creation failed for ring %d, ctx %p\n",
> >>+                  ring->id, ctx);
> >>+            args->rsvd2 = (__u64) -1;
> >>+            goto err;
> >>+        }
> >>+
> >>+        /* Return the fence through the rsvd2 field */
> >>+        args->rsvd2 = (__u64) fd_fence_complete;
> >>+    }
> >>+#endif
> >>+
> >>      ret = dev_priv->gt.execbuf_submit(params, args, &eb->vmas);
> >>
> >>  err_batch_unpin:
> >>@@ -1683,6 +1762,12 @@ pre_mutex_err:
> >>      /* intel_gpu_busy should also get a ref, so it will free when the
> >>device
> >>       * is really idle. */
> >>      intel_runtime_pm_put(dev_priv);
> >>+
> >>+    if (fd_fence_complete != -1) {
> >>+        sys_close(fd_fence_complete);
> >
> >I am not sure calling system call functions from driver code will be
> >allowed. that's why I was doing fd_install only when sure everything went
> >OK.
> 
> Daniel or others, any thoughts? Is the clean up allowed in the driver? Is
> there an alternative driver friendly option? It makes the sync creating code
> cleaner if we can do everything in one place rather than do some processing
> up front and some at the end.

You don't get to do this ;-) Fix it by only installing the fd if
everything has succeeded.
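
That is the usual pattern: reserve the fd and create the fence up 
front, but only bind the two together once submission can no longer 
fail. A sketch against the functions in the patch:

    fd = get_unused_fd_flags(O_CLOEXEC);
    if (fd < 0)
        return fd;

    sync_fence = sync_fence_create_dma(ring_name, &req->fence);
    if (!sync_fence) {
        put_unused_fd(fd);
        return -ENOMEM;
    }

    ret = dev_priv->gt.execbuf_submit(params, args, &eb->vmas);
    if (ret) {
        sync_fence_put(sync_fence);    /* fence never reached userspace */
        put_unused_fd(fd);             /* fd was reserved, never installed */
        goto err;
    }

    /* Point of no return: the fd now owns a fence reference. */
    sync_fence_install(sync_fence, fd);
    args->rsvd2 = (__u64)fd;
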
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Thread overview: 38+ messages
2015-07-17 14:31 [RFC 0/9] Convert requests to use struct fence John.C.Harrison
2015-07-17 14:31 ` [RFC 1/9] staging/android/sync: Support sync points created from dma-fences John.C.Harrison
2015-07-17 14:44   ` Tvrtko Ursulin
2015-07-17 14:31 ` [RFC 2/9] android: add sync_fence_create_dma John.C.Harrison
2015-07-17 14:31 ` [RFC 3/9] drm/i915: Convert requests to use struct fence John.C.Harrison
2015-07-21  7:05   ` Daniel Vetter
2015-07-28 10:01     ` John Harrison
2015-07-22 14:26   ` Tvrtko Ursulin
2015-07-28 10:10     ` John Harrison
2015-08-03  9:17       ` Tvrtko Ursulin
2015-07-22 14:45   ` Tvrtko Ursulin
2015-07-28 10:18     ` John Harrison
2015-08-03  9:18       ` Tvrtko Ursulin
2015-07-17 14:31 ` [RFC 4/9] drm/i915: Removed now redudant parameter to i915_gem_request_completed() John.C.Harrison
2015-07-17 14:31 ` [RFC 5/9] drm/i915: Add per context timelines to fence object John.C.Harrison
2015-07-23 13:50   ` Tvrtko Ursulin
2015-10-28 12:59     ` John Harrison
2015-11-17 13:54       ` Daniel Vetter
2015-07-17 14:31 ` [RFC 6/9] drm/i915: Delay the freeing of requests until retire time John.C.Harrison
2015-07-23 14:25   ` Tvrtko Ursulin
2015-10-28 13:00     ` John Harrison
2015-10-28 13:42       ` Tvrtko Ursulin
2015-07-17 14:31 ` [RFC 7/9] drm/i915: Interrupt driven fences John.C.Harrison
2015-07-20  9:09   ` Maarten Lankhorst
2015-07-21  7:19   ` Daniel Vetter
2015-07-27 11:33   ` Tvrtko Ursulin
2015-10-28 13:00     ` John Harrison
2015-07-27 13:20   ` Tvrtko Ursulin
2015-07-27 14:00     ` Daniel Vetter
2015-08-03  9:20       ` Tvrtko Ursulin
2015-08-05  8:05         ` Daniel Vetter
2015-08-05 11:05           ` Maarten Lankhorst
2015-07-17 14:31 ` [RFC 8/9] drm/i915: Updated request structure tracing John.C.Harrison
2015-07-17 14:31 ` [RFC 9/9] drm/i915: Add sync framework support to execbuff IOCTL John.C.Harrison
2015-07-27 13:00   ` Tvrtko Ursulin
2015-10-28 13:01     ` John Harrison
2015-10-28 14:31       ` Tvrtko Ursulin
2015-11-17 13:59       ` Daniel Vetter
