* [PATCH 0/4] Convert requests to use struct fence
@ 2015-06-26 12:58 John.C.Harrison
  2015-06-26 12:58 ` [PATCH 1/4] drm/i915: " John.C.Harrison
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: John.C.Harrison @ 2015-06-26 12:58 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

There is a construct in the Linux kernel called 'struct fence' that is intended
to keep track of work that is executed on hardware, i.e. it solves the basic
problem that the driver's 'struct drm_i915_gem_request' is trying to address.
The request structure does quite a lot more than simply track execution
progress, so it is very definitely still required. However, the basic
completion status side could be updated to use the ready-made fence
implementation and gain all the advantages that provides.

This work has been planned since the driver was converted from being seqno
value based to being request structure based. This patch series does that
work.
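
For orientation, the completion-tracking side ends up following the usual
embedded-fence pattern. A minimal sketch, using made-up example_* names rather
than the driver's real ones (the actual structure and helpers are in patch 1/4):

#include <linux/fence.h>

/* Illustrative only: a request wrapping a struct fence instead of a bare
 * kref. The fence supplies the reference count and the completion state. */
struct example_request {
	struct fence fence;
	/* ... ring, seqno, lists and other driver bookkeeping ... */
};

static inline bool example_request_completed(struct example_request *req)
{
	return fence_is_signaled(&req->fence);	/* replaces the seqno compare */
}

static inline struct example_request *
example_request_get(struct example_request *req)
{
	fence_get(&req->fence);			/* replaces kref_get(&req->ref) */
	return req;
}

static inline void example_request_put(struct example_request *req)
{
	fence_put(&req->fence);			/* replaces kref_put(&req->ref, ...) */
}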

[Patches against drm-intel-nightly tree fetched 23/06/2015]

John Harrison (4):
  drm/i915: Convert requests to use struct fence
  drm/i915: Removed now redundant parameter to i915_gem_request_completed()
  drm/i915: Interrupt driven fences
  drm/i915: Updated request structure tracing

 drivers/gpu/drm/i915/i915_debugfs.c     |   2 +-
 drivers/gpu/drm/i915/i915_drv.h         |  52 +++++----
 drivers/gpu/drm/i915/i915_gem.c         | 195 +++++++++++++++++++++++++++++---
 drivers/gpu/drm/i915/i915_irq.c         |   4 +-
 drivers/gpu/drm/i915/i915_trace.h       |   7 +-
 drivers/gpu/drm/i915/intel_display.c    |   2 +-
 drivers/gpu/drm/i915/intel_lrc.c        |   3 +
 drivers/gpu/drm/i915/intel_pm.c         |   4 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c |   3 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |   5 +
 10 files changed, 232 insertions(+), 45 deletions(-)

-- 
1.9.1

* [PATCH 1/4] drm/i915: Convert requests to use struct fence
  2015-06-26 12:58 [PATCH 0/4] Convert requests to use struct fence John.C.Harrison
@ 2015-06-26 12:58 ` John.C.Harrison
  2015-06-26 12:58 ` [PATCH 2/4] drm/i915: Removed now redundant parameter to i915_gem_request_completed() John.C.Harrison
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 9+ messages in thread
From: John.C.Harrison @ 2015-06-26 12:58 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

There is a construct in the Linux kernel called 'struct fence' that is intended
to keep track of work that is executed on hardware, i.e. it solves the basic
problem that the driver's 'struct drm_i915_gem_request' is trying to address.
The request structure does quite a lot more than simply track execution
progress, so it is very definitely still required. However, the basic
completion status side could be updated to use the ready-made fence
implementation and gain all the advantages that provides.

This patch takes the first step of integrating a struct fence into the request.
It replaces the explicit reference count with that of the fence. It also
replaces the 'is completed' test with the fence's equivalent. Currently, that
simply chains on to the original request implementation. A future patch will
improve this.

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         | 45 +++++++++++++-------------
 drivers/gpu/drm/i915/i915_gem.c         | 56 ++++++++++++++++++++++++++++++---
 drivers/gpu/drm/i915/intel_lrc.c        |  1 +
 drivers/gpu/drm/i915/intel_ringbuffer.c |  1 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |  3 ++
 5 files changed, 78 insertions(+), 28 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index afcca15..fe2c1af 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -50,6 +50,7 @@
 #include <linux/intel-iommu.h>
 #include <linux/kref.h>
 #include <linux/pm_qos.h>
+#include <linux/fence.h>
 
 /* General customization:
  */
@@ -2134,7 +2135,17 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
  * initial reference taken using kref_init
  */
 struct drm_i915_gem_request {
-	struct kref ref;
+	/*
+	 * Underlying object for implementing the signal/wait stuff.
+	 * NB: Never call fence_later() or return this fence object to user
+	 * land! Due to lazy allocation, scheduler re-ordering, pre-emption,
+	 * etc., there is no guarantee at all about the validity or
+	 * sequentiality of the fence's seqno! It is also unsafe to let
+	 * anything outside of the i915 driver get hold of the fence object
+	 * as the clean up when decrementing the reference count requires
+	 * holding the driver mutex lock.
+	 */
+	struct fence fence;
 
 	/** On Which ring this request was generated */
 	struct drm_i915_private *i915;
@@ -2211,7 +2222,13 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 			   struct intel_context *ctx,
 			   struct drm_i915_gem_request **req_out);
 void i915_gem_request_cancel(struct drm_i915_gem_request *req);
-void i915_gem_request_free(struct kref *req_ref);
+
+static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
+					      bool lazy_coherency)
+{
+	return fence_is_signaled(&req->fence);
+}
+
 int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
 				   struct drm_file *file);
 
@@ -2231,7 +2248,7 @@ static inline struct drm_i915_gem_request *
 i915_gem_request_reference(struct drm_i915_gem_request *req)
 {
 	if (req)
-		kref_get(&req->ref);
+		fence_get(&req->fence);
 	return req;
 }
 
@@ -2239,7 +2256,7 @@ static inline void
 i915_gem_request_unreference(struct drm_i915_gem_request *req)
 {
 	WARN_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
-	kref_put(&req->ref, i915_gem_request_free);
+	fence_put(&req->fence);
 }
 
 static inline void
@@ -2251,7 +2268,7 @@ i915_gem_request_unreference__unlocked(struct drm_i915_gem_request *req)
 		return;
 
 	dev = req->ring->dev;
-	if (kref_put_mutex(&req->ref, i915_gem_request_free, &dev->struct_mutex))
+	if (kref_put_mutex(&req->fence.refcount, fence_release, &dev->struct_mutex))
 		mutex_unlock(&dev->struct_mutex);
 }
 
@@ -2268,12 +2285,6 @@ static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
 }
 
 /*
- * XXX: i915_gem_request_completed should be here but currently needs the
- * definition of i915_seqno_passed() which is below. It will be moved in
- * a later patch when the call to i915_seqno_passed() is obsoleted...
- */
-
-/*
  * A command that requires special handling by the command parser.
  */
 struct drm_i915_cmd_descriptor {
@@ -2830,18 +2841,6 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
 	return (int32_t)(seq1 - seq2) >= 0;
 }
 
-static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
-					      bool lazy_coherency)
-{
-	u32 seqno;
-
-	BUG_ON(req == NULL);
-
-	seqno = req->ring->get_seqno(req->ring, lazy_coherency);
-
-	return i915_seqno_passed(seqno, req->seqno);
-}
-
 int __must_check i915_gem_get_seqno(struct drm_device *dev, u32 *seqno);
 int __must_check i915_gem_set_seqno(struct drm_device *dev, u32 seqno);
 int __must_check i915_gem_object_get_fence(struct drm_i915_gem_object *obj);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 45f0460..5fc44bf 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2610,10 +2610,10 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv,
 	}
 }
 
-void i915_gem_request_free(struct kref *req_ref)
+static void i915_gem_request_free(struct fence *req_fence)
 {
-	struct drm_i915_gem_request *req = container_of(req_ref,
-						 typeof(*req), ref);
+	struct drm_i915_gem_request *req = container_of(req_fence,
+						 typeof(*req), fence);
 	struct intel_context *ctx = req->ctx;
 
 	if (req->file_priv)
@@ -2633,6 +2633,47 @@ void i915_gem_request_free(struct kref *req_ref)
 	kmem_cache_free(req->i915->requests, req);
 }
 
+static const char *i915_gem_request_get_driver_name(struct fence *req_fence)
+{
+	return "i915_request";
+}
+
+static const char *i915_gem_request_get_timeline_name(struct fence *req_fence)
+{
+	struct drm_i915_gem_request *req = container_of(req_fence,
+						 typeof(*req), fence);
+	return req->ring->name;
+}
+
+static bool i915_gem_request_enable_signaling(struct fence *req_fence)
+{
+	/* Interrupt driven fences are not implemented yet.*/
+	WARN(true, "This should not be called!");
+	return true;
+}
+
+static bool i915_gem_request_is_completed(struct fence *req_fence)
+{
+	struct drm_i915_gem_request *req = container_of(req_fence,
+						 typeof(*req), fence);
+	u32 seqno;
+
+	BUG_ON(req == NULL);
+
+	seqno = req->ring->get_seqno(req->ring, false/*lazy_coherency*/);
+
+	return i915_seqno_passed(seqno, req->seqno);
+}
+
+static const struct fence_ops i915_gem_request_fops = {
+	.get_driver_name	= i915_gem_request_get_driver_name,
+	.get_timeline_name	= i915_gem_request_get_timeline_name,
+	.enable_signaling	= i915_gem_request_enable_signaling,
+	.signaled		= i915_gem_request_is_completed,
+	.wait			= fence_default_wait,
+	.release		= i915_gem_request_free,
+};
+
 int i915_gem_request_alloc(struct intel_engine_cs *ring,
 			   struct intel_context *ctx,
 			   struct drm_i915_gem_request **req_out)
@@ -2654,7 +2695,6 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 	if (ret)
 		goto err;
 
-	kref_init(&req->ref);
 	req->i915 = dev_priv;
 	req->ring = ring;
 	req->ctx  = ctx;
@@ -2669,6 +2709,8 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 		goto err;
 	}
 
+	fence_init(&req->fence, &i915_gem_request_fops, &ring->fence_lock, ring->fence_context, req->seqno);
+
 	/*
 	 * Reserve space in the ring buffer for all the commands required to
 	 * eventually emit this request. This is to guarantee that the
@@ -5031,7 +5073,7 @@ i915_gem_init_hw(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_engine_cs *ring;
-	int ret, i, j;
+	int ret, i, j, fence_base;
 
 	if (INTEL_INFO(dev)->gen < 6 && !intel_enable_gtt())
 		return -EIO;
@@ -5083,12 +5125,16 @@ i915_gem_init_hw(struct drm_device *dev)
 			goto out;
 	}
 
+	fence_base = fence_context_alloc(I915_NUM_RINGS);
+
 	/* Now it is safe to go back round and do everything else: */
 	for_each_ring(ring, dev_priv, i) {
 		struct drm_i915_gem_request *req;
 
 		WARN_ON(!ring->default_context);
 
+		ring->fence_context = fence_base + i;
+
 		ret = i915_gem_request_alloc(ring, ring->default_context, &req);
 		if (ret) {
 			i915_gem_cleanup_ringbuffer(dev);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index e998a54..3dc7e38 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1651,6 +1651,7 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 	ring->dev = dev;
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
+	spin_lock_init(&ring->fence_lock);
 	i915_gem_batch_pool_init(dev, &ring->batch_pool);
 	init_waitqueue_head(&ring->irq_queue);
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 7c5b4c2..cb63e2f 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2030,6 +2030,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
 	INIT_LIST_HEAD(&ring->execlist_queue);
+	spin_lock_init(&ring->fence_lock);
 	i915_gem_batch_pool_init(dev, &ring->batch_pool);
 	ringbuf->size = 32 * PAGE_SIZE;
 	ringbuf->ring = ring;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 304cac4..8ad25b0 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -338,6 +338,9 @@ struct  intel_engine_cs {
 	 * to encode the command length in the header).
 	 */
 	u32 (*get_cmd_length_mask)(u32 cmd_header);
+
+	unsigned fence_context;
+	spinlock_t fence_lock;
 };
 
 bool intel_ring_initialized(struct intel_engine_cs *ring);
-- 
1.9.1

* [PATCH 2/4] drm/i915: Removed now redundant parameter to i915_gem_request_completed()
  2015-06-26 12:58 [PATCH 0/4] Convert requests to use struct fence John.C.Harrison
  2015-06-26 12:58 ` [PATCH 1/4] drm/i915: " John.C.Harrison
@ 2015-06-26 12:58 ` John.C.Harrison
  2015-06-26 12:58 ` [PATCH 3/4] drm/i915: Interrupt driven fences John.C.Harrison
  2015-06-26 12:58 ` [PATCH 4/4] drm/i915: Updated request structure tracing John.C.Harrison
  3 siblings, 0 replies; 9+ messages in thread
From: John.C.Harrison @ 2015-06-26 12:58 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The change to the implementation of i915_gem_request_completed() means that the
lazy coherency flag is no longer used. The flag can now be removed to simplify
the interface.

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c  |  2 +-
 drivers/gpu/drm/i915/i915_drv.h      |  3 +--
 drivers/gpu/drm/i915/i915_gem.c      | 18 +++++++++---------
 drivers/gpu/drm/i915/i915_irq.c      |  2 +-
 drivers/gpu/drm/i915/intel_display.c |  2 +-
 drivers/gpu/drm/i915/intel_pm.c      |  4 ++--
 6 files changed, 15 insertions(+), 16 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 114aa13..3e19d04 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -588,7 +588,7 @@ static int i915_gem_pageflip_info(struct seq_file *m, void *data)
 					   i915_gem_request_get_seqno(work->flip_queued_req),
 					   dev_priv->next_seqno,
 					   ring->get_seqno(ring, true),
-					   i915_gem_request_completed(work->flip_queued_req, true));
+					   i915_gem_request_completed(work->flip_queued_req));
 			} else
 				seq_printf(m, "Flip not associated with any ring\n");
 			seq_printf(m, "Flip queued on frame %d, (was ready on frame %d), now %d\n",
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index fe2c1af..2e6c151 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2223,8 +2223,7 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 			   struct drm_i915_gem_request **req_out);
 void i915_gem_request_cancel(struct drm_i915_gem_request *req);
 
-static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
-					      bool lazy_coherency)
+static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
 {
 	return fence_is_signaled(&req->fence);
 }
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 5fc44bf..0ae76b4 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1179,7 +1179,7 @@ static int __i915_spin_request(struct drm_i915_gem_request *req)
 
 	timeout = jiffies + 1;
 	while (!need_resched()) {
-		if (i915_gem_request_completed(req, true))
+		if (i915_gem_request_completed(req))
 			return 0;
 
 		if (time_after_eq(jiffies, timeout))
@@ -1187,7 +1187,7 @@ static int __i915_spin_request(struct drm_i915_gem_request *req)
 
 		cpu_relax_lowlatency();
 	}
-	if (i915_gem_request_completed(req, false))
+	if (i915_gem_request_completed(req))
 		return 0;
 
 	return -EAGAIN;
@@ -1231,7 +1231,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	if (list_empty(&req->list))
 		return 0;
 
-	if (i915_gem_request_completed(req, true))
+	if (i915_gem_request_completed(req))
 		return 0;
 
 	timeout_expire = timeout ?
@@ -1271,7 +1271,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 			break;
 		}
 
-		if (i915_gem_request_completed(req, false)) {
+		if (i915_gem_request_completed(req)) {
 			ret = 0;
 			break;
 		}
@@ -2753,7 +2753,7 @@ i915_gem_find_active_request(struct intel_engine_cs *ring)
 	struct drm_i915_gem_request *request;
 
 	list_for_each_entry(request, &ring->request_list, list) {
-		if (i915_gem_request_completed(request, false))
+		if (i915_gem_request_completed(request))
 			continue;
 
 		return request;
@@ -2896,7 +2896,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 					   struct drm_i915_gem_request,
 					   list);
 
-		if (!i915_gem_request_completed(request, true))
+		if (!i915_gem_request_completed(request))
 			break;
 
 		i915_gem_request_retire(request);
@@ -2920,7 +2920,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 	}
 
 	if (unlikely(ring->trace_irq_req &&
-		     i915_gem_request_completed(ring->trace_irq_req, true))) {
+		     i915_gem_request_completed(ring->trace_irq_req))) {
 		ring->irq_put(ring);
 		i915_gem_request_assign(&ring->trace_irq_req, NULL);
 	}
@@ -3026,7 +3026,7 @@ i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
 		if (list_empty(&req->list))
 			goto retire;
 
-		if (i915_gem_request_completed(req, true)) {
+		if (i915_gem_request_completed(req)) {
 			__i915_gem_request_retire__upto(req);
 retire:
 			i915_gem_object_retire__read(obj, i);
@@ -3138,7 +3138,7 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
 	if (to == from)
 		return 0;
 
-	if (i915_gem_request_completed(from_req, true))
+	if (i915_gem_request_completed(from_req))
 		return 0;
 
 	if (!i915_semaphore_is_enabled(obj->base.dev)) {
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 9e5aea6..bab2ca2 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2499,7 +2499,7 @@ static bool
 ring_idle(struct intel_engine_cs *ring)
 {
 	return (list_empty(&ring->request_list) ||
-		i915_gem_request_completed(ring_last_request(ring), false));
+		i915_gem_request_completed(ring_last_request(ring)));
 }
 
 static bool
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index bf630a2..20e6675 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11292,7 +11292,7 @@ static bool __intel_pageflip_stall_check(struct drm_device *dev,
 
 	if (work->flip_ready_vblank == 0) {
 		if (work->flip_queued_req &&
-		    !i915_gem_request_completed(work->flip_queued_req, true))
+		    !i915_gem_request_completed(work->flip_queued_req))
 			return false;
 
 		work->flip_ready_vblank = drm_crtc_vblank_count(crtc);
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 32ff034..1c8e2b2 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -6877,7 +6877,7 @@ static void __intel_rps_boost_work(struct work_struct *work)
 	struct request_boost *boost = container_of(work, struct request_boost, work);
 	struct drm_i915_gem_request *req = boost->req;
 
-	if (!i915_gem_request_completed(req, true))
+	if (!i915_gem_request_completed(req))
 		gen6_rps_boost(to_i915(req->ring->dev), NULL,
 			       req->emitted_jiffies);
 
@@ -6893,7 +6893,7 @@ void intel_queue_rps_boost_for_request(struct drm_device *dev,
 	if (req == NULL || INTEL_INFO(dev)->gen < 6)
 		return;
 
-	if (i915_gem_request_completed(req, true))
+	if (i915_gem_request_completed(req))
 		return;
 
 	boost = kmalloc(sizeof(*boost), GFP_ATOMIC);
-- 
1.9.1

* [PATCH 3/4] drm/i915: Interrupt driven fences
  2015-06-26 12:58 [PATCH 0/4] Convert requests to use struct fence John.C.Harrison
  2015-06-26 12:58 ` [PATCH 1/4] drm/i915: " John.C.Harrison
  2015-06-26 12:58 ` [PATCH 2/4] drm/i915: Removed now redundant parameter to i915_gem_request_completed() John.C.Harrison
@ 2015-06-26 12:58 ` John.C.Harrison
  2015-06-26 13:34   ` Chris Wilson
  2015-06-26 12:58 ` [PATCH 4/4] drm/i915: Updated request structure tracing John.C.Harrison
  3 siblings, 1 reply; 9+ messages in thread
From: John.C.Harrison @ 2015-06-26 12:58 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The intended usage model for struct fence is that the signalled status should be
set on demand rather than polled. That is, there should not be a need for a
'signaled' function to be called every time the status is queried. Instead,
'something' should be done to enable a signal callback from the hardware which
will update the state directly. In the case of requests, this is the seqno
update interrupt. The idea is that this callback will only be enabled on demand
when something actually tries to wait on the fence.

This change removes the polling test and replaces it with the callback scheme.
Each fence is added to a 'please poke me' list at the start of
i915_add_request(). The interrupt handler then scans through the 'poke me' list
when a new seqno pops out and signals any matching fence/request. The fence is
then removed from the list so the entire request stack does not need to be
scanned every time. Note that the fence is added to the list before the commands
to generate the seqno interrupt are added to the ring. Thus the sequence is
guaranteed to be race free if the interrupt is already enabled.

One complication here is that the 'poke me' system requires holding a reference
count on the request to guarantee that it won't be freed prematurely.
Unfortunately, it is unsafe to decrement the reference count from the interrupt
handler because if that is the last reference, the clean up code gets run and
the clean up code is not IRQ friendly. Hence, the request is added to a 'please
clean me' list that gets processed at retire time. Any request in this list
simply has its count decremented and is then removed from that list.

Note that the interrupt is only enabled on demand (i.e. when __wait_request() is
called). Thus there is still a potential race when enabling the interrupt as the
request may already have completed. However, this is simply solved by calling
the interrupt processing code immediately after enabling the interrupt and
thereby checking for already completed requests.

Lastly, the ring clean up code has the possibility to cancel outstanding
requests (e.g. because TDR has reset the ring). These requests will never get
signalled and so must be removed from the signal list manually. This is done by
setting a 'cancelled' flag and then calling the regular notify/retire code path
rather than attempting to duplicate the list manipulation and clean up code in
multiple places. This also avoids any race condition where the cancellation
request might occur after/during the completion interrupt actually arriving.

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
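Roughly, the scheme in the diff below boils down to the following shape. This
is a condensed, illustrative sketch with made-up example_* names; it omits the
on-demand interrupt enabling and the retire-time processing of the unsignal
list, and the real code is spread over several hunks in i915_gem.c and
i915_irq.c.

#include <linux/fence.h>
#include <linux/list.h>
#include <linux/spinlock.h>

struct example_ring {
	spinlock_t fence_lock;
	struct list_head fence_signal_list;
	struct list_head fence_unsignal_list;
	u32 hw_seqno;			/* really read from the HW status page */
};

struct example_request {
	struct fence fence;		/* initialised with &ring->fence_lock as its lock */
	struct example_ring *ring;
	struct list_head signal_list;
	struct list_head unsignal_list;
	bool cancelled;
	u32 seqno;
};

static inline bool example_seqno_passed(u32 seq1, u32 seq2)
{
	return (s32)(seq1 - seq2) >= 0;
}

/* Called once by the fence core (which already holds fence_lock) when
 * signalling is first requested: park the request on the per-ring signal
 * list, holding a reference for the IRQ path. */
static bool example_enable_signaling(struct fence *f)
{
	struct example_request *req = container_of(f, struct example_request, fence);

	fence_get(&req->fence);
	list_add_tail(&req->signal_list, &req->ring->fence_signal_list);
	return true;
}

/* Called from the seqno interrupt (and on demand): scan the list, signal
 * completed or cancelled requests, and defer the unreference to retire. */
static void example_notify(struct example_ring *ring)
{
	struct example_request *req, *next;
	unsigned long flags;
	u32 seqno = READ_ONCE(ring->hw_seqno);

	spin_lock_irqsave(&ring->fence_lock, flags);
	list_for_each_entry_safe(req, next, &ring->fence_signal_list, signal_list) {
		if (!req->cancelled) {
			if (!example_seqno_passed(seqno, req->seqno))
				continue;
			fence_signal_locked(&req->fence);	/* fence_lock already held */
		}

		list_del_init(&req->signal_list);
		list_add_tail(&req->unsignal_list, &ring->fence_unsignal_list);
	}
	spin_unlock_irqrestore(&ring->fence_lock, flags);
}
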
 drivers/gpu/drm/i915/i915_drv.h         |   8 ++
 drivers/gpu/drm/i915/i915_gem.c         | 136 +++++++++++++++++++++++++++++---
 drivers/gpu/drm/i915/i915_irq.c         |   2 +
 drivers/gpu/drm/i915/intel_lrc.c        |   2 +
 drivers/gpu/drm/i915/intel_ringbuffer.c |   2 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |   2 +
 6 files changed, 143 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 2e6c151..c1f69cc 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2146,6 +2146,10 @@ struct drm_i915_gem_request {
 	 * holding the driver mutex lock.
 	 */
 	struct fence fence;
+	struct list_head signal_list;
+	struct list_head unsignal_list;
+	bool cancelled;
+	bool irq_enabled;
 
 	/** On Which ring this request was generated */
 	struct drm_i915_private *i915;
@@ -2223,6 +2227,10 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 			   struct drm_i915_gem_request **req_out);
 void i915_gem_request_cancel(struct drm_i915_gem_request *req);
 
+void i915_gem_request_submit(struct drm_i915_gem_request *req);
+void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req);
+void i915_gem_request_notify(struct intel_engine_cs *ring);
+
 static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
 {
 	return fence_is_signaled(&req->fence);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 0ae76b4..8aec326 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1231,6 +1231,11 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	if (list_empty(&req->list))
 		return 0;
 
+	/*
+	 * Enable interrupt completion of the request.
+	 */
+	i915_gem_request_enable_interrupt(req);
+
 	if (i915_gem_request_completed(req))
 		return 0;
 
@@ -1391,6 +1396,10 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
 	list_del_init(&request->list);
 	i915_gem_request_remove_from_client(request);
 
+	/* In case the request is still in the signal pending list */
+	if (!list_empty(&request->signal_list))
+		request->cancelled = true;
+
 	i915_gem_request_unreference(request);
 }
 
@@ -2529,6 +2538,12 @@ void __i915_add_request(struct drm_i915_gem_request *request,
 	 */
 	request->postfix = intel_ring_get_tail(ringbuf);
 
+	/*
+	 * Add the fence to the pending list before emitting the commands to
+	 * generate a seqno notification interrupt.
+	 */
+	i915_gem_request_submit(request);
+
 	if (i915.enable_execlists)
 		ret = ring->emit_request(request);
 	else {
@@ -2630,6 +2645,9 @@ static void i915_gem_request_free(struct fence *req_fence)
 		i915_gem_context_unreference(ctx);
 	}
 
+	if (req->irq_enabled)
+		req->ring->irq_put(req->ring);
+
 	kmem_cache_free(req->i915->requests, req);
 }
 
@@ -2645,31 +2663,100 @@ static const char *i915_gem_request_get_timeline_name(struct fence *req_fence)
 	return req->ring->name;
 }
 
-static bool i915_gem_request_enable_signaling(struct fence *req_fence)
+/*
+ * The request has been submitted to the hardware so add the fence to the
+ * list of signalable fences.
+ *
+ * NB: This does not enable interrupts yet. That only occurs on demand when
+ * the request is actually waited on. However, adding it to the list early
+ * ensures that there is no race condition where the interrupt could pop
+ * out prematurely and thus be completely lost. The race is merely that the
+ * interrupt must be manually checked for after being enabled.
+ */
+void i915_gem_request_submit(struct drm_i915_gem_request *req)
 {
-	/* Interrupt driven fences are not implemented yet.*/
-	WARN(true, "This should not be called!");
-	return true;
+	fence_enable_sw_signaling(&req->fence);
+}
+
+/*
+ * The request is being actively waited on, so enable interrupt based
+ * completion signalling.
+ */
+void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req)
+{
+	if (req->irq_enabled)
+		return;
+
+	WARN_ON(!req->ring->irq_get(req->ring));
+	req->irq_enabled = true;
+
+	/*
+	 * Because the interrupt is only enabled on demand, there is a race
+	 * where the interrupt can fire before anyone is looking for it. So
+	 * do an explicit check for missed interrupts.
+	 */
+	i915_gem_request_notify(req->ring);
 }
 
-static bool i915_gem_request_is_completed(struct fence *req_fence)
+static bool i915_gem_request_enable_signaling(struct fence *req_fence)
 {
 	struct drm_i915_gem_request *req = container_of(req_fence,
 						 typeof(*req), fence);
+
+	i915_gem_request_reference(req);
+	WARN_ON(!list_empty(&req->signal_list));
+	list_add_tail(&req->signal_list, &req->ring->fence_signal_list);
+
+	/*
+	 * Note that signalling is always enabled for every request before
+	 * that request is submitted to the hardware. Therefore there is
+	 * no race condition whereby the signal could pop out before the
+	 * request has been added to the list. Hence no need to check
+	 * for completion, undo the list add and return false.
+	 *
+	 * NB: Interrupts are only enabled on demand. Thus there is still a
+	 * race where the request could complete before the interrupt has
+	 * been enabled. Thus care must be taken at that point.
+	 */
+
+	return true;
+}
+
+void i915_gem_request_notify(struct intel_engine_cs *ring)
+{
+	struct drm_i915_gem_request *req, *req_next;
+	unsigned long flags;
 	u32 seqno;
 
-	BUG_ON(req == NULL);
+	if (list_empty(&ring->fence_signal_list))
+		return;
+
+	seqno = ring->get_seqno(ring, false);
+
+	spin_lock_irqsave(&ring->fence_lock, flags);
+	list_for_each_entry_safe(req, req_next, &ring->fence_signal_list, signal_list) {
+		if (!req->cancelled) {
+			if (!i915_seqno_passed(seqno, req->seqno))
+				continue;
 
-	seqno = req->ring->get_seqno(req->ring, false/*lazy_coherency*/);
+			fence_signal_locked(&req->fence);
+		}
+
+		list_del_init(&req->signal_list);
+		if (req->irq_enabled) {
+			req->ring->irq_put(req->ring);
+			req->irq_enabled = false;
+		}
 
-	return i915_seqno_passed(seqno, req->seqno);
+		list_add_tail(&req->unsignal_list, &req->ring->fence_unsignal_list);
+	}
+	spin_unlock_irqrestore(&ring->fence_lock, flags);
 }
 
 static const struct fence_ops i915_gem_request_fops = {
 	.get_driver_name	= i915_gem_request_get_driver_name,
 	.get_timeline_name	= i915_gem_request_get_timeline_name,
 	.enable_signaling	= i915_gem_request_enable_signaling,
-	.signaled		= i915_gem_request_is_completed,
 	.wait			= fence_default_wait,
 	.release		= i915_gem_request_free,
 };
@@ -2709,6 +2796,7 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 		goto err;
 	}
 
+	INIT_LIST_HEAD(&req->signal_list);
 	fence_init(&req->fence, &i915_gem_request_fops, &ring->fence_lock, ring->fence_context, req->seqno);
 
 	/*
@@ -2829,6 +2917,13 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
 
 		i915_gem_request_retire(request);
 	}
+
+	/*
+	 * Make sure any requests that were on the signal pending list get
+	 * cleaned up.
+	 */
+	i915_gem_request_notify(ring);
+	i915_gem_retire_requests_ring(ring);
 }
 
 void i915_gem_restore_fences(struct drm_device *dev)
@@ -2884,6 +2979,13 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 {
 	WARN_ON(i915_verify_lists(ring->dev));
 
+	/*
+	 * If no-one has waited on a request recently then interrupts will
+	 * not have been enabled and thus no requests will ever be marked as
+	 * completed. So do an interrupt check now.
+	 */
+	i915_gem_request_notify(ring);
+
 	/* Retire requests first as we use it above for the early return.
 	 * If we retire requests last, we may use a later seqno and so clear
 	 * the requests lists without clearing the active list, leading to
@@ -2925,6 +3027,20 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 		i915_gem_request_assign(&ring->trace_irq_req, NULL);
 	}
 
+	while (!list_empty(&ring->fence_unsignal_list)) {
+		struct drm_i915_gem_request *request;
+		unsigned long flags;
+
+		spin_lock_irqsave(&ring->fence_lock, flags);
+		request = list_first_entry(&ring->fence_unsignal_list,
+					   struct drm_i915_gem_request,
+					   unsignal_list);
+		list_del(&request->unsignal_list);
+		spin_unlock_irqrestore(&ring->fence_lock, flags);
+
+		i915_gem_request_unreference(request);
+	}
+
 	WARN_ON(i915_verify_lists(ring->dev));
 }
 
@@ -5256,6 +5372,8 @@ init_ring_lists(struct intel_engine_cs *ring)
 {
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
+	INIT_LIST_HEAD(&ring->fence_signal_list);
+	INIT_LIST_HEAD(&ring->fence_unsignal_list);
 }
 
 void i915_init_vm(struct drm_i915_private *dev_priv,
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index bab2ca2..3390943 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -853,6 +853,8 @@ static void notify_ring(struct intel_engine_cs *ring)
 
 	trace_i915_gem_request_notify(ring);
 
+	i915_gem_request_notify(ring);
+
 	wake_up_all(&ring->irq_queue);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 3dc7e38..b4e45c5 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1651,6 +1651,8 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 	ring->dev = dev;
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
+	INIT_LIST_HEAD(&ring->fence_signal_list);
+	INIT_LIST_HEAD(&ring->fence_unsignal_list);
 	spin_lock_init(&ring->fence_lock);
 	i915_gem_batch_pool_init(dev, &ring->batch_pool);
 	init_waitqueue_head(&ring->irq_queue);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index cb63e2f..4b2b669 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2030,6 +2030,8 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
 	INIT_LIST_HEAD(&ring->execlist_queue);
+	INIT_LIST_HEAD(&ring->fence_signal_list);
+	INIT_LIST_HEAD(&ring->fence_unsignal_list);
 	spin_lock_init(&ring->fence_lock);
 	i915_gem_batch_pool_init(dev, &ring->batch_pool);
 	ringbuf->size = 32 * PAGE_SIZE;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 8ad25b0..3491b48 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -341,6 +341,8 @@ struct  intel_engine_cs {
 
 	unsigned fence_context;
 	spinlock_t fence_lock;
+	struct list_head fence_signal_list;
+	struct list_head fence_unsignal_list;
 };
 
 bool intel_ring_initialized(struct intel_engine_cs *ring);
-- 
1.9.1

* [PATCH 4/4] drm/i915: Updated request structure tracing
  2015-06-26 12:58 [PATCH 0/4] Convert requests to use struct fence John.C.Harrison
                   ` (2 preceding siblings ...)
  2015-06-26 12:58 ` [PATCH 3/4] drm/i915: Interrupt driven fences John.C.Harrison
@ 2015-06-26 12:58 ` John.C.Harrison
  2015-06-28 15:07   ` shuang.he
  3 siblings, 1 reply; 9+ messages in thread
From: John.C.Harrison @ 2015-06-26 12:58 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Added the '_complete' trace event which occurs when a fence/request is signaled
as complete. Also moved the notify event from the IRQ handler code to inside the
notify function itself.

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c   | 3 +++
 drivers/gpu/drm/i915/i915_irq.c   | 2 --
 drivers/gpu/drm/i915/i915_trace.h | 7 +++++--
 3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 8aec326..ded5609 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2728,6 +2728,8 @@ void i915_gem_request_notify(struct intel_engine_cs *ring)
 	unsigned long flags;
 	u32 seqno;
 
+	trace_i915_gem_request_notify(ring);
+
 	if (list_empty(&ring->fence_signal_list))
 		return;
 
@@ -2740,6 +2742,7 @@ void i915_gem_request_notify(struct intel_engine_cs *ring)
 				continue;
 
 			fence_signal_locked(&req->fence);
+			trace_i915_gem_request_complete(req);
 		}
 
 		list_del_init(&req->signal_list);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 3390943..8083d2f 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -851,8 +851,6 @@ static void notify_ring(struct intel_engine_cs *ring)
 	if (!intel_ring_initialized(ring))
 		return;
 
-	trace_i915_gem_request_notify(ring);
-
 	i915_gem_request_notify(ring);
 
 	wake_up_all(&ring->irq_queue);
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 63328b6..e03d6fc 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -532,16 +532,19 @@ TRACE_EVENT(i915_gem_request_notify,
 			     __field(u32, dev)
 			     __field(u32, ring)
 			     __field(u32, seqno)
+			     __field(bool, is_empty)
 			     ),
 
 	    TP_fast_assign(
 			   __entry->dev = ring->dev->primary->index;
 			   __entry->ring = ring->id;
 			   __entry->seqno = ring->get_seqno(ring, false);
+			   __entry->is_empty = list_empty(&ring->fence_signal_list);
 			   ),
 
-	    TP_printk("dev=%u, ring=%u, seqno=%u",
-		      __entry->dev, __entry->ring, __entry->seqno)
+	    TP_printk("dev=%u, ring=%u, seqno=%u, empty=%d",
+		      __entry->dev, __entry->ring, __entry->seqno,
+		      __entry->is_empty)
 );
 
 DEFINE_EVENT(i915_gem_request, i915_gem_request_retire,
-- 
1.9.1

* Re: [PATCH 3/4] drm/i915: Interrupt driven fences
  2015-06-26 12:58 ` [PATCH 3/4] drm/i915: Interrupt driven fences John.C.Harrison
@ 2015-06-26 13:34   ` Chris Wilson
  2015-06-26 17:00     ` John Harrison
  0 siblings, 1 reply; 9+ messages in thread
From: Chris Wilson @ 2015-06-26 13:34 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Fri, Jun 26, 2015 at 01:58:11PM +0100, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The intended usage model for struct fence is that the signalled status should be
> set on demand rather than polled. That is, there should not be a need for a
> 'signaled' function to be called every time the status is queried. Instead,
> 'something' should be done to enable a signal callback from the hardware which
> will update the state directly. In the case of requests, this is the seqno
> update interrupt. The idea is that this callback will only be enabled on demand
> when something actually tries to wait on the fence.
> 
> This change removes the polling test and replaces it with the callback scheme.
> Each fence is added to a 'please poke me' list at the start of
> i915_add_request(). The interrupt handler then scans through the 'poke me' list
> when a new seqno pops out and signals any matching fence/request. The fence is
> then removed from the list so the entire request stack does not need to be
> scanned every time. Note that the fence is added to the list before the commands
> to generate the seqno interrupt are added to the ring. Thus the sequence is
> guaranteed to be race free if the interrupt is already enabled.
> 
> One complication here is that the 'poke me' system requires holding a reference
> count on the request to guarantee that it won't be freed prematurely.
> Unfortunately, it is unsafe to decrement the reference count from the interrupt
> handler because if that is the last reference, the clean up code gets run and
> the clean up code is not IRQ friendly. Hence, the request is added to a 'please
> clean me' list that gets processed at retire time. Any request in this list
> simply has its count decremented and is then removed from that list.
> 
> Note that the interrupt is only enabled on demand (i.e. when __wait_request() is
> called). Thus there is still a potential race when enabling the interrupt as the
> request may already have completed. However, this is simply solved by calling
> the interrupt processing code immediately after enabling the interrupt and
> thereby checking for already completed requests.
> 
> Lastly, the ring clean up code has the possibility to cancel outstanding
> requests (e.g. because TDR has reset the ring). These requests will never get
> signalled and so must be removed from the signal list manually. This is done by
> setting a 'cancelled' flag and then calling the regular notify/retire code path
> rather than attempting to duplicate the list manipulation and clean up code in
> multiple places. This also avoids any race condition where the cancellation
> request might occur after/during the completion interrupt actually arriving.

-nightly nop:
Time to exec x 1:                15.000µs (ring=render)
Time to exec x 1:                 2.000µs (ring=blt)
Time to exec x 131072:            1.827µs (ring=render)
Time to exec x 131072:            1.555µs (ring=blt)

rq tuning patches nop:
Time to exec x 1:		 12.200µs (ring=render)
Time to exec x 1:		  1.600µs (ring=blt)
Time to exec x 131072:		  1.516µs (ring=render)
Time to exec x 131072:		  0.812µs (ring=blt)

interrupt driven nop:
Time to exec x 1:		 19.200µs (ring=render)
Time to exec x 1:		  5.200µs (ring=blt)
Time to exec x 131072:		  2.381µs (ring=render)
Time to exec x 131072:		  2.009µs (ring=blt)

So the basic question that is left unanswered from last time is why
would we want to slow down __i915_wait_request? And enabling IRQs still
generates very high system load when processing the 30-40k IRQs per
second found under some workloads.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 3/4] drm/i915: Interrupt driven fences
  2015-06-26 13:34   ` Chris Wilson
@ 2015-06-26 17:00     ` John Harrison
  2015-06-26 17:19       ` Chris Wilson
  0 siblings, 1 reply; 9+ messages in thread
From: John Harrison @ 2015-06-26 17:00 UTC (permalink / raw)
  To: Chris Wilson, Intel-GFX

On 26/06/2015 14:34, Chris Wilson wrote:
> On Fri, Jun 26, 2015 at 01:58:11PM +0100, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> The intended usage model for struct fence is that the signalled status should be
>> set on demand rather than polled. That is, there should not be a need for a
>> 'signaled' function to be called every time the status is queried. Instead,
>> 'something' should be done to enable a signal callback from the hardware which
>> will update the state directly. In the case of requests, this is the seqno
>> update interrupt. The idea is that this callback will only be enabled on demand
>> when something actually tries to wait on the fence.
>>
>> This change removes the polling test and replaces it with the callback scheme.
>> Each fence is added to a 'please poke me' list at the start of
>> i915_add_request(). The interrupt handler then scans through the 'poke me' list
>> when a new seqno pops out and signals any matching fence/request. The fence is
>> then removed from the list so the entire request stack does not need to be
>> scanned every time. Note that the fence is added to the list before the commands
>> to generate the seqno interrupt are added to the ring. Thus the sequence is
>> guaranteed to be race free if the interrupt is already enabled.
>>
>> One complication here is that the 'poke me' system requires holding a reference
>> count on the request to guarantee that it won't be freed prematurely.
>> Unfortunately, it is unsafe to decrement the reference count from the interrupt
>> handler because if that is the last reference, the clean up code gets run and
>> the clean up code is not IRQ friendly. Hence, the request is added to a 'please
>> clean me' list that gets processed at retire time. Any request in this list
>> simply has its count decremented and is then removed from that list.
>>
>> Note that the interrupt is only enabled on demand (i.e. when __wait_request() is
>> called). Thus there is still a potential race when enabling the interrupt as the
>> request may already have completed. However, this is simply solved by calling
>> the interrupt processing code immediately after enabling the interrupt and
>> thereby checking for already completed requests.
>>
>> Lastly, the ring clean up code has the possibility to cancel outstanding
>> requests (e.g. because TDR has reset the ring). These requests will never get
>> signalled and so must be removed from the signal list manually. This is done by
>> setting a 'cancelled' flag and then calling the regular notify/retire code path
>> rather than attempting to duplicate the list manipulation and clean up code in
>> multiple places. This also avoids any race condition where the cancellation
>> request might occur after/during the completion interrupt actually arriving.
> -nightly nop:
> Time to exec x 1:                15.000µs (ring=render)
> Time to exec x 1:                 2.000µs (ring=blt)
> Time to exec x 131072:            1.827µs (ring=render)
> Time to exec x 131072:            1.555µs (ring=blt)
>
> rq tuning patches nop:
> Time to exec x 1:		 12.200µs (ring=render)
> Time to exec x 1:		  1.600µs (ring=blt)
> Time to exec x 131072:		  1.516µs (ring=render)
> Time to exec x 131072:		  0.812µs (ring=blt)
>
> interrupt driven nop:
> Time to exec x 1:		 19.200µs (ring=render)
> Time to exec x 1:		  5.200µs (ring=blt)
> Time to exec x 131072:		  2.381µs (ring=render)
> Time to exec x 131072:		  2.009µs (ring=blt)
>
> So the basic question that is left unanswered from last time is why
> would we want to slow down __i915_wait_request? And enabling IRQs still
> generates very high system load when processing the 30-40k IRQs per
> second found under some workloads.
> -Chris
>
As previously stated, the scheduler requires enabling interrupts for 
each batch buffer as it needs to know when something more needs sending 
to the hardware. Android requires enabling interrupts for each batch 
buffer as it uses the sync framework to wait on batch buffer completion 
asynchronously to the driver (i.e. without calling __i915_wait_request 
or any other driver code). I presume much of the slow down to 
wait_request itself is because it has to check for missed interrupts. It 
should be possible to optimise that somewhat although it was completely 
unnecessary in the original version as you can't miss interrupts if they 
are always on.
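
For illustration, a waiter outside the driver (such as a sync-framework style
consumer) would typically attach to the request's fence along these lines; this
is a hypothetical sketch, not code from this series:

#include <linux/fence.h>

static void example_fence_done(struct fence *f, struct fence_cb *cb)
{
	/* Runs from fence_signal_locked() in the driver's notify path. */
}

static int example_async_wait(struct fence *f, struct fence_cb *cb)
{
	int ret;

	/*
	 * Registering a callback makes the fence core invoke
	 * ->enable_signaling(), i.e. the request joins the signal list.
	 * With this series the seqno interrupt itself is still only
	 * enabled on demand from __i915_wait_request(), which is what
	 * the per-batch interrupt point above is about.
	 */
	ret = fence_add_callback(f, cb, example_fence_done);
	if (ret == -ENOENT)
		return 0;	/* already signalled */

	return ret;
}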

How do you get consistent results from gem_exec_nop? For the x1 case, I 
see random variation from one run to the next of the order of 10us -> 
over 100us. And that is with a straight nightly build.

John.

* Re: [PATCH 3/4] drm/i915: Interrupt driven fences
  2015-06-26 17:00     ` John Harrison
@ 2015-06-26 17:19       ` Chris Wilson
  0 siblings, 0 replies; 9+ messages in thread
From: Chris Wilson @ 2015-06-26 17:19 UTC (permalink / raw)
  To: John Harrison; +Cc: Intel-GFX

On Fri, Jun 26, 2015 at 06:00:03PM +0100, John Harrison wrote:
> On 26/06/2015 14:34, Chris Wilson wrote:
> >On Fri, Jun 26, 2015 at 01:58:11PM +0100, John.C.Harrison@Intel.com wrote:
> >>From: John Harrison <John.C.Harrison@Intel.com>
> >>
> >>The intended usage model for struct fence is that the signalled status should be
> >>set on demand rather than polled. That is, there should not be a need for a
> >>'signaled' function to be called every time the status is queried. Instead,
> >>'something' should be done to enable a signal callback from the hardware which
> >>will update the state directly. In the case of requests, this is the seqno
> >>update interrupt. The idea is that this callback will only be enabled on demand
> >>when something actually tries to wait on the fence.
> >>
> >>This change removes the polling test and replaces it with the callback scheme.
> >>Each fence is added to a 'please poke me' list at the start of
> >>i915_add_request(). The interrupt handler then scans through the 'poke me' list
> >>when a new seqno pops out and signals any matching fence/request. The fence is
> >>then removed from the list so the entire request stack does not need to be
> >>scanned every time. Note that the fence is added to the list before the commands
> >>to generate the seqno interrupt are added to the ring. Thus the sequence is
> >>guaranteed to be race free if the interrupt is already enabled.
> >>
> >>One complication here is that the 'poke me' system requires holding a reference
> >>count on the request to guarantee that it won't be freed prematurely.
> >>Unfortunately, it is unsafe to decrement the reference count from the interrupt
> >>handler because if that is the last reference, the clean up code gets run and
> >>the clean up code is not IRQ friendly. Hence, the request is added to a 'please
> >>clean me' list that gets processed at retire time. Any request in this list
> >>simply has its count decremented and is then removed from that list.
> >>
> >>Note that the interrupt is only enabled on demand (i.e. when __wait_request() is
> >>called). Thus there is still a potential race when enabling the interrupt as the
> >>request may already have completed. However, this is simply solved by calling
> >>the interrupt processing code immediately after enabling the interrupt and
> >>thereby checking for already completed requests.
> >>
> >>Lastly, the ring clean up code has the possibility to cancel outstanding
> >>requests (e.g. because TDR has reset the ring). These requests will never get
> >>signalled and so must be removed from the signal list manually. This is done by
> >>setting a 'cancelled' flag and then calling the regular notify/retire code path
> >>rather than attempting to duplicate the list manipulation and clean up code in
> >>multiple places. This also avoids any race condition where the cancellation
> >>request might occur after/during the completion interrupt actually arriving.
> >-nightly nop:
> >Time to exec x 1:                15.000µs (ring=render)
> >Time to exec x 1:                 2.000µs (ring=blt)
> >Time to exec x 131072:            1.827µs (ring=render)
> >Time to exec x 131072:            1.555µs (ring=blt)
> >
> >rq tuning patches nop:
> >Time to exec x 1:		 12.200µs (ring=render)
> >Time to exec x 1:		  1.600µs (ring=blt)
> >Time to exec x 131072:		  1.516µs (ring=render)
> >Time to exec x 131072:		  0.812µs (ring=blt)
> >
> >interrupt driven nop:
> >Time to exec x 1:		 19.200µs (ring=render)
> >Time to exec x 1:		  5.200µs (ring=blt)
> >Time to exec x 131072:		  2.381µs (ring=render)
> >Time to exec x 131072:		  2.009µs (ring=blt)
> >
> >So the basic question that is left unanswered from last time is why
> >would we want to slow down __i915_wait_request? And enabling IRQs still
> >generates very high system load when processing the 30-40k IRQs per
> >second found under some workloads.
> >-Chris
> >
> As previously stated, the scheduler requires enabling interrupts for
> each batch buffer as it needs to know when something more needs
> sending to the hardware. Android requires enabling interrupts for
> each batch buffer as it uses the sync framework to wait on batch
> buffer completion asynchronously to the driver (i.e. without calling
> __i915_wait_request or any other driver code). I presume much of the
> slow down to wait_request itself is because it has to check for
> missed interrupts. It should be possible to optimise that somewhat
> although it was completely unnecessary in the original version as
> you can't miss interrupts if they are always on.

That discussion is missing from the changelog, which is where any
discussion addressing review should be if not acted upon, and very much
misses why we should be introducing regressions.

> How do you get consistent results from gem_exec_nop? For the x1
> case, I see random variation from one run to the next of the order
> of 10us -> over 100us. And that is with a straight nightly build.

I have never seen variance of that magnitude. That strongly suggests
something is suspect in your environment and would be worth tracking
down. The first run should closely match the final run on each ring.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 4/4] drm/i915: Updated request structure tracing
  2015-06-26 12:58 ` [PATCH 4/4] drm/i915: Updated request structure tracing John.C.Harrison
@ 2015-06-28 15:07   ` shuang.he
  0 siblings, 0 replies; 9+ messages in thread
From: shuang.he @ 2015-06-28 15:07 UTC (permalink / raw)
  To: shuang.he, lei.a.liu, intel-gfx, John.C.Harrison

Tested-By: Intel Graphics QA PRTS (Patch Regression Test System Contact: shuang.he@intel.com)
Task id: 6615
-------------------------------------Summary-------------------------------------
Platform          Delta          drm-intel-nightly          Series Applied
ILK                 -8              302/302              294/302
SNB                                  312/316              312/316
IVB                                  343/343              343/343
BYT                 -3              287/287              284/287
HSW                                  380/380              380/380
-------------------------------------Detailed-------------------------------------
Platform  Test                                drm-intel-nightly          Series Applied
*ILK  igt@gem_persistent_relocs@forked-faulting-reloc-thrashing      PASS(1)      DMESG_WARN(1)
(dmesg patch applied)drm:i915_context_is_banned[i915]]*ERROR*gpu_hanging_too_fast,banning@gpu hanging too
*ILK  igt@gem_persistent_relocs@forked-interruptible-thrashing      PASS(1)      DMESG_WARN(1)
(dmesg patch applied)drm:i915_context_is_banned[i915]]*ERROR*gpu_hanging_too_fast,banning@gpu hanging too
*ILK  igt@gem_persistent_relocs@forked-thrashing      PASS(1)      TIMEOUT(1)
*ILK  igt@gem_reloc_vs_gpu@forked-faulting-reloc-thrashing      PASS(1)      DMESG_WARN(1)
(dmesg patch applied)drm:i915_context_is_banned[i915]]*ERROR*gpu_hanging_too_fast,banning@gpu hanging too
*ILK  igt@gem_reloc_vs_gpu@forked-interruptible-faulting-reloc-thrashing      PASS(1)      TIMEOUT(1)
*ILK  igt@gem_reloc_vs_gpu@forked-interruptible-thrashing      PASS(1)      DMESG_WARN(1)
(dmesg patch applied)drm:i915_context_is_banned[i915]]*ERROR*gpu_hanging_too_fast,banning@gpu hanging too
*ILK  igt@gem_reloc_vs_gpu@forked-thrashing      PASS(1)      DMESG_WARN(1)
(dmesg patch applied)drm:i915_context_is_banned[i915]]*ERROR*gpu_hanging_too_fast,banning@gpu hanging too
*ILK  igt@gem_seqno_wrap      PASS(1)      DMESG_WARN(1)
(dmesg patch applied)drm:i915_hangcheck_elapsed[i915]]*ERROR*Hangcheck_timer_elapsed...bsd_ring_idle@Hangcheck timer elapsed... bsd ring idle
*BYT  igt@gem_partial_pwrite_pread@reads      PASS(1)      FAIL(1)
*BYT  igt@gem_partial_pwrite_pread@reads-display      PASS(1)      FAIL(1)
*BYT  igt@gem_partial_pwrite_pread@reads-uncached      PASS(1)      FAIL(1)
Note: You need to pay more attention to line start with '*'