* [RFC 0/4] Convert requests to use struct fence
From: John.C.Harrison @ 2015-03-20 17:48 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

There is a construct in the Linux kernel called 'struct fence' that is intended
to keep track of work that is executed on hardware. That is, it solves the same
basic problem that the driver's 'struct drm_i915_gem_request' is trying to
address. The request structure does quite a lot more than simply track execution
progress, so it is very definitely still required. However, the basic completion
status side could be updated to use the ready-made fence implementation and gain
all the advantages that it provides.
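
As an illustration of the idea, here is a minimal sketch using the kernel's
struct fence API (the 'example_*' names are invented for the sketch and are
not part of this series):

#include <linux/fence.h>

struct example_request {
	struct fence fence;	/* replaces an explicit struct kref */
	/* ... all the other request tracking state ... */
};

static bool example_request_completed(struct example_request *req)
{
	/* was: i915_seqno_passed(hw_seqno, req->seqno) */
	return fence_is_signaled(&req->fence);
}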

This work has been planned since the conversion of the driver from being seqno
value based to being request structure based. This patch series does that work.

The set is being posted as an RFC. It is built on top of the OLR removal patch
series, so it cannot be accepted upstream until that series has landed. However,
it would be useful to at least get the design review going on these patches
while the OLR patches work through the technical review process.

[Patches are against the drm-intel-nightly tree fetched 18/03/2015, with the
anti-OLR patches on top]

John Harrison (4):
  drm/i915: Convert requests to use struct fence
  drm/i915: Removed now redundant parameter to i915_gem_request_completed()
  drm/i915: Interrupt driven fences
  drm/i915: Updated request structure tracing

 drivers/gpu/drm/i915/i915_debugfs.c     |    2 +-
 drivers/gpu/drm/i915/i915_drv.h         |   41 +++++----
 drivers/gpu/drm/i915/i915_gem.c         |  140 ++++++++++++++++++++++++++++---
 drivers/gpu/drm/i915/i915_irq.c         |    5 +-
 drivers/gpu/drm/i915/i915_trace.h       |    7 +-
 drivers/gpu/drm/i915/intel_display.c    |    2 +-
 drivers/gpu/drm/i915/intel_lrc.c        |    3 +
 drivers/gpu/drm/i915/intel_ringbuffer.c |    3 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |    5 ++
 9 files changed, 168 insertions(+), 40 deletions(-)

-- 
1.7.9.5


* [RFC 1/4] drm/i915: Convert requests to use struct fence
From: John.C.Harrison @ 2015-03-20 17:48 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

There is a construct in the Linux kernel called 'struct fence' that is intended
to keep track of work that is executed on hardware. That is, it solves the same
basic problem that the driver's 'struct drm_i915_gem_request' is trying to
address. The request structure does quite a lot more than simply track execution
progress, so it is very definitely still required. However, the basic completion
status side could be updated to use the ready-made fence implementation and gain
all the advantages that it provides.

This patch takes the first step of integrating a struct fence into the request.
It replaces the explicit reference count with that of the fence. It also
replaces the 'is completed' test with the fence's equivalent. Currently, that
simply chains on to the original request implementation. A future patch will
improve this.

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         |   37 +++++++++------------
 drivers/gpu/drm/i915/i915_gem.c         |   55 ++++++++++++++++++++++++++++---
 drivers/gpu/drm/i915/intel_lrc.c        |    1 +
 drivers/gpu/drm/i915/intel_ringbuffer.c |    1 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |    3 ++
 5 files changed, 70 insertions(+), 27 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index ce3a536..7dcaf8c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -50,6 +50,7 @@
 #include <linux/intel-iommu.h>
 #include <linux/kref.h>
 #include <linux/pm_qos.h>
+#include <linux/fence.h>
 
 /* General customization:
  */
@@ -2048,7 +2049,11 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
  * initial reference taken using kref_init
  */
 struct drm_i915_gem_request {
-	struct kref ref;
+	/** Underlying object for implementing the signal/wait stuff.
+	  * NB: Never call fence_later()! Due to lazy allocation, scheduler
+	  * re-ordering, pre-emption, etc., there is no guarantee at all
+	  * about the validity or sequentiality of the fence's seqno! */
+	struct fence fence;
 
 	/** On Which ring this request was generated */
 	struct intel_engine_cs *ring;
@@ -2126,7 +2131,13 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 			   struct intel_context *ctx,
 			   struct drm_i915_gem_request **req_out);
 void i915_gem_request_cancel(struct drm_i915_gem_request *req);
-void i915_gem_request_free(struct kref *req_ref);
+
+static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
+					      bool lazy_coherency)
+{
+	return fence_is_signaled(&req->fence);
+}
+
 void i915_gem_request_remove_from_client(struct drm_i915_gem_request *request);
 int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
 				   struct drm_file *file);
@@ -2146,14 +2157,14 @@ i915_gem_request_get_ring(struct drm_i915_gem_request *req)
 static inline void
 i915_gem_request_reference(struct drm_i915_gem_request *req)
 {
-	kref_get(&req->ref);
+	fence_get(&req->fence);
 }
 
 static inline void
 i915_gem_request_unreference(struct drm_i915_gem_request *req)
 {
 	WARN_ON(!mutex_is_locked(&req->ring->dev->struct_mutex));
-	kref_put(&req->ref, i915_gem_request_free);
+	fence_put(&req->fence);
 }
 
 static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
@@ -2168,12 +2179,6 @@ static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
 	*pdst = src;
 }
 
-/*
- * XXX: i915_gem_request_completed should be here but currently needs the
- * definition of i915_seqno_passed() which is below. It will be moved in
- * a later patch when the call to i915_seqno_passed() is obsoleted...
- */
-
 struct drm_i915_file_private {
 	struct drm_i915_private *dev_priv;
 	struct drm_file *file;
@@ -2691,18 +2696,6 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
 	return (int32_t)(seq1 - seq2) >= 0;
 }
 
-static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
-					      bool lazy_coherency)
-{
-	u32 seqno;
-
-	BUG_ON(req == NULL);
-
-	seqno = req->ring->get_seqno(req->ring, lazy_coherency);
-
-	return i915_seqno_passed(seqno, req->seqno);
-}
-
 int __must_check i915_gem_get_seqno(struct drm_device *dev, u32 *seqno);
 int __must_check i915_gem_set_seqno(struct drm_device *dev, u32 seqno);
 int __must_check i915_gem_object_get_fence(struct drm_i915_gem_object *obj);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 4eb7fc2..5ede297 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2495,10 +2495,10 @@ static void i915_gem_free_request(struct drm_i915_gem_request *request)
 	i915_gem_request_unreference(request);
 }
 
-void i915_gem_request_free(struct kref *req_ref)
+static void i915_gem_request_free(struct fence *req_fence)
 {
-	struct drm_i915_gem_request *req = container_of(req_ref,
-						 typeof(*req), ref);
+	struct drm_i915_gem_request *req = container_of(req_fence,
+						 typeof(*req), fence);
 	struct intel_context *ctx = req->ctx;
 
 	if (req->file_priv)
@@ -2518,6 +2518,46 @@ void i915_gem_request_free(struct kref *req_ref)
 	kfree(req);
 }
 
+static const char *i915_gem_request_get_driver_name(struct fence *req_fence)
+{
+	return "i915_request";
+}
+
+static const char *i915_gem_request_get_timeline_name(struct fence *req_fence)
+{
+	struct drm_i915_gem_request *req = container_of(req_fence,
+						 typeof(*req), fence);
+	return req->ring->name;
+}
+
+static bool i915_gem_request_enable_signaling(struct fence *req_fence)
+{
+	WARN(true, "Is this required?");
+	return true;
+}
+
+static bool i915_gem_request_is_completed(struct fence *req_fence)
+{
+	struct drm_i915_gem_request *req = container_of(req_fence,
+						 typeof(*req), fence);
+	u32 seqno;
+
+	BUG_ON(req == NULL);
+
+	seqno = req->ring->get_seqno(req->ring, false/*lazy_coherency*/);
+
+	return i915_seqno_passed(seqno, req->seqno);
+}
+
+static const struct fence_ops i915_gem_request_fops = {
+	.get_driver_name	= i915_gem_request_get_driver_name,
+	.get_timeline_name	= i915_gem_request_get_timeline_name,
+	.enable_signaling	= i915_gem_request_enable_signaling,
+	.signaled		= i915_gem_request_is_completed,
+	.wait			= fence_default_wait,
+	.release		= i915_gem_request_free,
+};
+
 int i915_gem_request_alloc(struct intel_engine_cs *ring,
 			   struct intel_context *ctx,
 			   struct drm_i915_gem_request **req_out)
@@ -2541,7 +2581,6 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 		return ret;
 	}
 
-	kref_init(&request->ref);
 	request->ring = ring;
 	request->uniq = dev_private->request_uniq++;
 	request->ctx  = ctx;
@@ -2557,6 +2596,8 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 		return ret;
 	}
 
+	fence_init(&request->fence, &i915_gem_request_fops, &ring->fence_lock, ring->fence_context, request->seqno);
+
 	/*
 	 * Reserve space in the ring buffer for all the commands required to
 	 * eventually emit this request. This is to guarantee that the
@@ -4825,7 +4866,7 @@ i915_gem_init_hw(struct drm_device *dev)
 {
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct intel_engine_cs *ring;
-	int ret, i;
+	int ret, i, fence_base;
 
 	if (INTEL_INFO(dev)->gen < 6 && !intel_enable_gtt())
 		return -EIO;
@@ -4877,12 +4918,16 @@ i915_gem_init_hw(struct drm_device *dev)
 			goto out;
 	}
 
+	fence_base = fence_context_alloc(I915_NUM_RINGS);
+
 	/* Now it is safe to go back round and do everything else: */
 	for_each_ring(ring, dev_priv, i) {
 		struct drm_i915_gem_request *req;
 
 		WARN_ON(!ring->default_context);
 
+		ring->fence_context = fence_base + i;
+
 		ret = i915_gem_request_alloc(ring, ring->default_context, &req);
 		if (ret) {
 			i915_gem_cleanup_ringbuffer(dev);
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index ae00054..c1072b1 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1337,6 +1337,7 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 	ring->dev = dev;
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
+	spin_lock_init(&ring->fence_lock);
 	init_waitqueue_head(&ring->irq_queue);
 
 	INIT_LIST_HEAD(&ring->execlist_queue);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 6099fce..fd65c0d 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1981,6 +1981,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
 	INIT_LIST_HEAD(&ring->execlist_queue);
+	spin_lock_init(&ring->fence_lock);
 	ringbuf->size = 32 * PAGE_SIZE;
 	ringbuf->ring = ring;
 	memset(ring->semaphore.sync_seqno, 0, sizeof(ring->semaphore.sync_seqno));
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 68097c1..a0ce08e 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -308,6 +308,9 @@ struct  intel_engine_cs {
 	 * to encode the command length in the header).
 	 */
 	u32 (*get_cmd_length_mask)(u32 cmd_header);
+
+	unsigned fence_context;
+	spinlock_t fence_lock;
 };
 
 bool intel_ring_initialized(struct intel_engine_cs *ring);
-- 
1.7.9.5


* [RFC 2/4] drm/i915: Removed now redundant parameter to i915_gem_request_completed()
From: John.C.Harrison @ 2015-03-20 17:48 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The change to the implementation of i915_gem_request_completed() means that the
lazy coherency flag is no longer used. The flag can now be removed to simplify
the interface.

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c  |    2 +-
 drivers/gpu/drm/i915/i915_drv.h      |    3 +--
 drivers/gpu/drm/i915/i915_gem.c      |   14 +++++++-------
 drivers/gpu/drm/i915/i915_irq.c      |    2 +-
 drivers/gpu/drm/i915/intel_display.c |    2 +-
 5 files changed, 11 insertions(+), 12 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index d70b9d3..678459c 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -575,7 +575,7 @@ static int i915_gem_pageflip_info(struct seq_file *m, void *data)
 					   i915_gem_request_get_seqno(work->flip_queued_req),
 					   dev_priv->next_seqno,
 					   ring->get_seqno(ring, true),
-					   i915_gem_request_completed(work->flip_queued_req, true));
+					   i915_gem_request_completed(work->flip_queued_req));
 			} else
 				seq_printf(m, "Flip not associated with any ring\n");
 			seq_printf(m, "Flip queued on frame %d, (was ready on frame %d), now %d\n",
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 7dcaf8c..28b3c3c 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2132,8 +2132,7 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 			   struct drm_i915_gem_request **req_out);
 void i915_gem_request_cancel(struct drm_i915_gem_request *req);
 
-static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
-					      bool lazy_coherency)
+static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
 {
 	return fence_is_signaled(&req->fence);
 }
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 5ede297..b1cde7d 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1219,7 +1219,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 
 	WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled");
 
-	if (i915_gem_request_completed(req, true))
+	if (i915_gem_request_completed(req))
 		return 0;
 
 	timeout_expire = timeout ?
@@ -1256,7 +1256,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 			break;
 		}
 
-		if (i915_gem_request_completed(req, false)) {
+		if (i915_gem_request_completed(req)) {
 			ret = 0;
 			break;
 		}
@@ -2237,7 +2237,7 @@ i915_gem_object_retire(struct drm_i915_gem_object *obj)
 	if (obj->last_read_req == NULL)
 		return;
 
-	if (i915_gem_request_completed(obj->last_read_req, true))
+	if (i915_gem_request_completed(obj->last_read_req))
 		i915_gem_object_move_to_inactive(obj);
 }
 
@@ -2637,7 +2637,7 @@ i915_gem_find_active_request(struct intel_engine_cs *ring)
 	struct drm_i915_gem_request *request;
 
 	list_for_each_entry(request, &ring->request_list, list) {
-		if (i915_gem_request_completed(request, false))
+		if (i915_gem_request_completed(request))
 			continue;
 
 		return request;
@@ -2781,7 +2781,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 				      struct drm_i915_gem_object,
 				      ring_list);
 
-		if (!i915_gem_request_completed(obj->last_read_req, true))
+		if (!i915_gem_request_completed(obj->last_read_req))
 			break;
 
 		i915_gem_object_move_to_inactive(obj);
@@ -2795,7 +2795,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 					   struct drm_i915_gem_request,
 					   list);
 
-		if (!i915_gem_request_completed(request, true))
+		if (!i915_gem_request_completed(request))
 			break;
 
 		trace_i915_gem_request_retire(request);
@@ -2811,7 +2811,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 	}
 
 	if (unlikely(ring->trace_irq_req &&
-		     i915_gem_request_completed(ring->trace_irq_req, true))) {
+		     i915_gem_request_completed(ring->trace_irq_req))) {
 		ring->irq_put(ring);
 		i915_gem_request_assign(&ring->trace_irq_req, NULL);
 	}
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index a495c41..cc2796b 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -2661,7 +2661,7 @@ static bool
 ring_idle(struct intel_engine_cs *ring)
 {
 	return (list_empty(&ring->request_list) ||
-		i915_gem_request_completed(ring_last_request(ring), false));
+		i915_gem_request_completed(ring_last_request(ring)));
 }
 
 static bool
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index f9ac3c2..e507303 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -9825,7 +9825,7 @@ static bool __intel_pageflip_stall_check(struct drm_device *dev,
 
 	if (work->flip_ready_vblank == 0) {
 		if (work->flip_queued_req &&
-		    !i915_gem_request_completed(work->flip_queued_req, true))
+		    !i915_gem_request_completed(work->flip_queued_req))
 			return false;
 
 		work->flip_ready_vblank = drm_crtc_vblank_count(crtc);
-- 
1.7.9.5


* [RFC 3/4] drm/i915: Interrupt driven fences
From: John.C.Harrison @ 2015-03-20 17:48 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The intended usage model for struct fence is that the signalled status should be
set on demand rather than polled. That is, there should not be a need for a
'signaled' function to be called every time the status is queried. Instead,
'something' should be done to enable a signal callback from the hardware which
will update the state directly. In the case of requests, this is the seqno
update interrupt. The idea is that this callback will only be enabled on demand
when something actually tries to wait on the fence.

This change removes the polling test and replaces it with the callback scheme.
To avoid race conditions where signals can be sent before anyone is waiting for
them, it does not implement the callback on demand feature. When the GPU
scheduler arrives, it will need to know about the completion of every single
request anyway. So it is far simpler to not put in complex and messy anti-race
code in the first place given that it will not be needed in the future.

Instead, each fence is added to a 'please poke me' list at the start of
i915_add_request(). This happens before the commands to generate the seqno
interrupt are added to the ring and thus is guaranteed to be race free. The
interrupt handler then scans through the 'poke me' list when a new seqno pops
out and signals any matching fence/request. The fence is then removed from the
list so the entire request stack does not need to be scanned every time.

The only complication here is that the 'poke me' system requires holding a
reference count on the request to guarantee that it won't be freed prematurely.
Unfortunately, it is unsafe to decrement the reference count from the interrupt
handler because, if that is the last reference, the clean-up code gets run and
that code is not IRQ friendly. Hence, the request is added to a 'please clean
me' list that gets processed at retire time. Any request in this list simply
has its count decremented and is then removed from that list.

Lastly, the ring clean-up code may cancel outstanding requests (e.g. because
TDR has reset the ring). These requests will never get signalled and so must be
removed from the signal list manually. This is done by setting a 'cancelled'
flag and then calling the regular notify/retire code path, rather than
attempting to duplicate the list manipulation and clean-up code in multiple
places.

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         |    5 ++
 drivers/gpu/drm/i915/i915_gem.c         |   84 ++++++++++++++++++++++++++++---
 drivers/gpu/drm/i915/i915_irq.c         |    3 ++
 drivers/gpu/drm/i915/intel_lrc.c        |    2 +
 drivers/gpu/drm/i915/intel_ringbuffer.c |    2 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |    2 +
 6 files changed, 90 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 28b3c3c..ff662c9 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2054,6 +2054,9 @@ struct drm_i915_gem_request {
 	  * re-ordering, pre-emption, etc., there is no guarantee at all
 	  * about the validity or sequentiality of the fence's seqno! */
 	struct fence fence;
+	struct list_head signal_list;
+	struct list_head unsignal_list;
+	bool cancelled;
 
 	/** On Which ring this request was generated */
 	struct intel_engine_cs *ring;
@@ -2132,6 +2135,8 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 			   struct drm_i915_gem_request **req_out);
 void i915_gem_request_cancel(struct drm_i915_gem_request *req);
 
+void i915_gem_request_notify(struct intel_engine_cs *ring);
+
 static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
 {
 	return fence_is_signaled(&req->fence);
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index b1cde7d..27b8893 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2364,6 +2364,12 @@ void __i915_add_request(struct drm_i915_gem_request *request,
 	 */
 	request->postfix = intel_ring_get_tail(ringbuf);
 
+	/*
+	 * Add the fence to the pending list before emitting the commands to
+	 * generate a seqno notification interrupt.
+	 */
+	fence_enable_sw_signaling(&request->fence);
+
 	if (i915.enable_execlists)
 		ret = ring->emit_request(request);
 	else
@@ -2492,6 +2498,10 @@ static void i915_gem_free_request(struct drm_i915_gem_request *request)
 
 	put_pid(request->pid);
 
+	/* In case the request is still in the signal pending list */
+	if (!list_empty(&request->signal_list))
+		request->cancelled = true;
+
 	i915_gem_request_unreference(request);
 }
 
@@ -2532,28 +2542,62 @@ static const char *i915_gem_request_get_timeline_name(struct fence *req_fence)
 
 static bool i915_gem_request_enable_signaling(struct fence *req_fence)
 {
-	WARN(true, "Is this required?");
+	struct drm_i915_gem_request *req = container_of(req_fence,
+						 typeof(*req), fence);
+	bool was_empty;
+
+	was_empty = list_empty(&req->ring->fence_signal_list);
+	if (was_empty)
+		WARN_ON(!req->ring->irq_get(req->ring));
+
+	i915_gem_request_reference(req);
+	list_add_tail(&req->signal_list, &req->ring->fence_signal_list);
+
+	/*
+	 * Note that signalling is always enabled for every request before
+	 * that request is submitted to the hardware. Therefore there is
+	 * no race condition whereby the signal could pop out before the
+	 * request has been added to the list. Hence no need to check
+	 * for completion and undo to the list add and return false.
+	 */
+
 	return true;
 }
 
-static bool i915_gem_request_is_completed(struct fence *req_fence)
+void i915_gem_request_notify(struct intel_engine_cs *ring)
 {
-	struct drm_i915_gem_request *req = container_of(req_fence,
-						 typeof(*req), fence);
+	struct drm_i915_gem_request *req, *req_next;
+	unsigned long flags;
 	u32 seqno;
 
-	BUG_ON(req == NULL);
+	if (list_empty(&ring->fence_signal_list))
+		return;
+
+	seqno = ring->get_seqno(ring, false);
+
+	spin_lock_irqsave(&ring->fence_lock, flags);
+	list_for_each_entry_safe(req, req_next, &ring->fence_signal_list, signal_list) {
+		if (!req->cancelled) {
+			if (!i915_seqno_passed(seqno, req->seqno))
+				continue;
 
-	seqno = req->ring->get_seqno(req->ring, false/*lazy_coherency*/);
+			fence_signal_locked(&req->fence);
+		}
+
+		list_del(&req->signal_list);
+		INIT_LIST_HEAD(&req->signal_list);
+		if (list_empty(&req->ring->fence_signal_list))
+			req->ring->irq_put(req->ring);
 
-	return i915_seqno_passed(seqno, req->seqno);
+		list_add_tail(&req->unsignal_list, &req->ring->fence_unsignal_list);
+	}
+	spin_unlock_irqrestore(&ring->fence_lock, flags);
 }
 
 static const struct fence_ops i915_gem_request_fops = {
 	.get_driver_name	= i915_gem_request_get_driver_name,
 	.get_timeline_name	= i915_gem_request_get_timeline_name,
 	.enable_signaling	= i915_gem_request_enable_signaling,
-	.signaled		= i915_gem_request_is_completed,
 	.wait			= fence_default_wait,
 	.release		= i915_gem_request_free,
 };
@@ -2596,6 +2640,7 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
 		return ret;
 	}
 
+	INIT_LIST_HEAD(&request->signal_list);
 	fence_init(&request->fence, &i915_gem_request_fops, &ring->fence_lock, ring->fence_context, request->seqno);
 
 	/*
@@ -2714,6 +2759,13 @@ static void i915_gem_reset_ring_cleanup(struct drm_i915_private *dev_priv,
 
 		i915_gem_free_request(request);
 	}
+
+	/*
+	 * Make sure any requests that were on the signal pending list get
+	 * cleaned up.
+	 */
+	i915_gem_request_notify(ring);
+	i915_gem_retire_requests_ring(ring);
 }
 
 void i915_gem_restore_fences(struct drm_device *dev)
@@ -2816,6 +2868,20 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 		i915_gem_request_assign(&ring->trace_irq_req, NULL);
 	}
 
+	while (!list_empty(&ring->fence_unsignal_list)) {
+		struct drm_i915_gem_request *request;
+		unsigned long flags;
+
+		spin_lock_irqsave(&ring->fence_lock, flags);
+		request = list_first_entry(&ring->fence_unsignal_list,
+					   struct drm_i915_gem_request,
+					   unsignal_list);
+		list_del(&request->unsignal_list);
+		spin_unlock_irqrestore(&ring->fence_lock, flags);
+
+		i915_gem_request_unreference(request);
+	}
+
 	WARN_ON(i915_verify_lists(ring->dev));
 }
 
@@ -5049,6 +5115,8 @@ init_ring_lists(struct intel_engine_cs *ring)
 {
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
+	INIT_LIST_HEAD(&ring->fence_signal_list);
+	INIT_LIST_HEAD(&ring->fence_unsignal_list);
 }
 
 void i915_init_vm(struct drm_i915_private *dev_priv,
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index cc2796b..d1cf226 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -994,6 +994,8 @@ static void notify_ring(struct drm_device *dev,
 
 	trace_i915_gem_request_notify(ring);
 
+	i915_gem_request_notify(ring);
+
 	wake_up_all(&ring->irq_queue);
 }
 
@@ -2959,6 +2961,7 @@ static void i915_hangcheck_elapsed(struct work_struct *work)
 			DRM_INFO("%s on %s\n",
 				 stuck[i] ? "stuck" : "no progress",
 				 ring->name);
+			trace_printk("%s:%d> \x1B[31;1m<%s> Borked: %s @ %d!\x1B[0m\n", __func__, __LINE__, ring->name, stuck[i] ? "stuck" : "no progress", ring->hangcheck.seqno);
 			rings_hung++;
 		}
 	}
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index c1072b1..d87126e 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1337,6 +1337,8 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 	ring->dev = dev;
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
+	INIT_LIST_HEAD(&ring->fence_signal_list);
+	INIT_LIST_HEAD(&ring->fence_unsignal_list);
 	spin_lock_init(&ring->fence_lock);
 	init_waitqueue_head(&ring->irq_queue);
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index fd65c0d..9d7ad51 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1981,6 +1981,8 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	INIT_LIST_HEAD(&ring->active_list);
 	INIT_LIST_HEAD(&ring->request_list);
 	INIT_LIST_HEAD(&ring->execlist_queue);
+	INIT_LIST_HEAD(&ring->fence_signal_list);
+	INIT_LIST_HEAD(&ring->fence_unsignal_list);
 	spin_lock_init(&ring->fence_lock);
 	ringbuf->size = 32 * PAGE_SIZE;
 	ringbuf->ring = ring;
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index a0ce08e..7412fe4 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -311,6 +311,8 @@ struct  intel_engine_cs {
 
 	unsigned fence_context;
 	spinlock_t fence_lock;
+	struct list_head fence_signal_list;
+	struct list_head fence_unsignal_list;
 };
 
 bool intel_ring_initialized(struct intel_engine_cs *ring);
-- 
1.7.9.5


* [RFC 4/4] drm/i915: Updated request structure tracing
From: John.C.Harrison @ 2015-03-20 17:48 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Added the '_complete' trace event, which fires when a fence/request is signaled
as complete. Also moved the notify event from the IRQ handler code into the
notify function itself.

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c   |    3 +++
 drivers/gpu/drm/i915/i915_irq.c   |    2 --
 drivers/gpu/drm/i915/i915_trace.h |    7 +++++--
 3 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 27b8893..e3ed94a 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2570,6 +2570,8 @@ void i915_gem_request_notify(struct intel_engine_cs *ring)
 	unsigned long flags;
 	u32 seqno;
 
+	trace_i915_gem_request_notify(ring);
+
 	if (list_empty(&ring->fence_signal_list))
 		return;
 
@@ -2582,6 +2584,7 @@ void i915_gem_request_notify(struct intel_engine_cs *ring)
 				continue;
 
 			fence_signal_locked(&req->fence);
+			trace_i915_gem_request_complete(req);
 		}
 
 		list_del(&req->signal_list);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index d1cf226..34e7933 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -992,8 +992,6 @@ static void notify_ring(struct drm_device *dev,
 	if (!intel_ring_initialized(ring))
 		return;
 
-	trace_i915_gem_request_notify(ring);
-
 	i915_gem_request_notify(ring);
 
 	wake_up_all(&ring->irq_queue);
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 8ca536c..0a215d5 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -441,16 +441,19 @@ TRACE_EVENT(i915_gem_request_notify,
 			     __field(u32, dev)
 			     __field(u32, ring)
 			     __field(u32, seqno)
+			     __field(bool, is_empty)
 			     ),
 
 	    TP_fast_assign(
 			   __entry->dev = ring->dev->primary->index;
 			   __entry->ring = ring->id;
 			   __entry->seqno = ring->get_seqno(ring, false);
+			   __entry->is_empty = list_empty(&ring->fence_signal_list);
 			   ),
 
-	    TP_printk("dev=%u, ring=%u, seqno=%u",
-		      __entry->dev, __entry->ring, __entry->seqno)
+	    TP_printk("dev=%u, ring=%u, seqno=%u, empty=%d",
+		      __entry->dev, __entry->ring, __entry->seqno,
+		      __entry->is_empty)
 );
 
 DEFINE_EVENT(i915_gem_request, i915_gem_request_retire,
-- 
1.7.9.5


* Re: [RFC 3/4] drm/i915: Interrupt driven fences
From: Chris Wilson @ 2015-03-20 21:11 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Fri, Mar 20, 2015 at 05:48:36PM +0000, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The intended usage model for struct fence is that the signalled status should be
> set on demand rather than polled. That is, there should not be a need for a
> 'signaled' function to be called everytime the status is queried. Instead,
> 'something' should be done to enable a signal callback from the hardware which
> will update the state directly. In the case of requests, this is the seqno
> update interrupt. The idea is that this callback will only be enabled on demand
> when something actually tries to wait on the fence.
> 
> This change removes the polling test and replaces it with the callback scheme.
> To avoid race conditions where signals can be sent before anyone is waiting for
> them, it does not implement the callback on demand feature. When the GPU
> scheduler arrives, it will need to know about the completion of every single
> request anyway. So it is far simpler to not put in complex and messy anti-race
> code in the first place given that it will not be needed in the future.
> 
> Instead, each fence is added to a 'please poke me' list at the start of
> i915_add_request(). This happens before the commands to generate the seqno
> interrupt are added to the ring thus is guaranteed to be race free. The
> interrupt handler then scans through the 'poke me' list when a new seqno pops
> out and signals any matching fence/request. The fence is then removed from the
> list so the entire request stack does not need to be scanned every time.

No. Please let's not go back to the bad old days of generating an interrupt
per batch, and doing a lot more work inside the interrupt handler.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [RFC 3/4] drm/i915: Interrupt driven fences
From: Daniel Vetter @ 2015-03-23  9:22 UTC (permalink / raw)
  To: Chris Wilson, John.C.Harrison, Intel-GFX

On Fri, Mar 20, 2015 at 09:11:35PM +0000, Chris Wilson wrote:
> On Fri, Mar 20, 2015 at 05:48:36PM +0000, John.C.Harrison@Intel.com wrote:
> > From: John Harrison <John.C.Harrison@Intel.com>
> > 
> > The intended usage model for struct fence is that the signalled status should be
> > set on demand rather than polled. That is, there should not be a need for a
> > 'signaled' function to be called everytime the status is queried. Instead,
> > 'something' should be done to enable a signal callback from the hardware which
> > will update the state directly. In the case of requests, this is the seqno
> > update interrupt. The idea is that this callback will only be enabled on demand
> > when something actually tries to wait on the fence.
> > 
> > This change removes the polling test and replaces it with the callback scheme.
> > To avoid race conditions where signals can be sent before anyone is waiting for
> > them, it does not implement the callback on demand feature. When the GPU
> > scheduler arrives, it will need to know about the completion of every single
> > request anyway. So it is far simpler to not put in complex and messy anti-race
> > code in the first place given that it will not be needed in the future.
> > 
> > Instead, each fence is added to a 'please poke me' list at the start of
> > i915_add_request(). This happens before the commands to generate the seqno
> > interrupt are added to the ring thus is guaranteed to be race free. The
> > interrupt handler then scans through the 'poke me' list when a new seqno pops
> > out and signals any matching fence/request. The fence is then removed from the
> > list so the entire request stack does not need to be scanned every time.
> 
> No. Please let's not go back to the bad old days of generating an interrupt
> per batch, and doing a lot more work inside the interrupt handler.

Yeah, enable_signalling should be the place where we grab the interrupt
reference. Also, we shouldn't call it unconditionally; that pretty much defeats
the point of the fastpath optimization.

Another complication is missed interrupts. If we detect those and someone
calls enable_signalling then we need to fire up a timer to wake up once
per jiffy and save stuck fences. To avoid duplication with the threaded
wait code we could remove the fallback wakeups from there and just rely on
that timer everywhere.
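
A rough sketch of that combination, purely to illustrate the shape (the
fence_fallback_timer field is hypothetical and not part of the posted series;
the other names come from the patches above):

static bool i915_gem_request_enable_signaling(struct fence *req_fence)
{
	struct drm_i915_gem_request *req =
		container_of(req_fence, typeof(*req), fence);
	struct intel_engine_cs *ring = req->ring;

	/* Grab the interrupt reference only when a waiter first appears. */
	if (list_empty(&ring->fence_signal_list))
		WARN_ON(!ring->irq_get(ring));

	i915_gem_request_reference(req);
	list_add_tail(&req->signal_list, &ring->fence_signal_list);

	/* Fallback for missed interrupts: poll once per jiffy until the
	 * signal list drains, e.g. by having the timer handler call
	 * i915_gem_request_notify() and re-arm itself while non-empty. */
	mod_timer(&ring->fence_fallback_timer, jiffies + 1);

	return true;
}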
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [RFC 3/4] drm/i915: Interrupt driven fences
From: John Harrison @ 2015-03-23 12:13 UTC (permalink / raw)
  To: Daniel Vetter, Chris Wilson, Intel-GFX

On 23/03/2015 09:22, Daniel Vetter wrote:
> On Fri, Mar 20, 2015 at 09:11:35PM +0000, Chris Wilson wrote:
>> On Fri, Mar 20, 2015 at 05:48:36PM +0000, John.C.Harrison@Intel.com wrote:
>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>
>>> [commit message snipped]
>> No. Please let's not go back to the bad old days of generating an interrupt
>> per batch, and doing a lot more work inside the interrupt handler.
> Yeah, enable_signalling should be the place where we grab the interrupt
> reference. Also that we shouldn't call this unconditionally, that pretty
> much defeats the point of that fastpath optimization.
>
> Another complication is missed interrupts. If we detect those and someone
> calls enable_signalling then we need to fire up a timer to wake up once
> per jiffy and save stuck fences. To avoid duplication with the threaded
> wait code we could remove the fallback wakeups from there and just rely on
> that timer everywhere.
> -Daniel

As has been discussed many times in many forums, the scheduler requires 
notification of each batch buffer's completion. It needs to know so that 
it can submit new work, keep dependencies of outstanding work up to 
date, etc.

Android is similar. With the native sync API, Android wants to be 
signaled about the completion of everything. Every single batch buffer 
submission comes with a request for a sync point that will be poked when 
that buffer completes. The kernel has no way of knowing which buffers 
are actually going to be waited on. There is no driver call anymore. 
User land simply waits on a file descriptor.
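
For illustration, waiting on such a sync point from userspace amounts to
polling the fence file descriptor. A hypothetical helper (error handling
omitted):

#include <poll.h>

/* Block until the fence fd signals completion or timeout_ms elapses. */
static int wait_on_fence_fd(int fence_fd, int timeout_ms)
{
	struct pollfd pfd = { .fd = fence_fd, .events = POLLIN };

	return poll(&pfd, 1, timeout_ms);	/* > 0 means signalled */
}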

I don't see how we can get away without generating an interrupt per batch.


* Re: [RFC 3/4] drm/i915: Interrupt driven fences
From: Daniel Vetter @ 2015-03-26 13:22 UTC (permalink / raw)
  To: John Harrison; +Cc: Intel-GFX

On Mon, Mar 23, 2015 at 12:13:56PM +0000, John Harrison wrote:
> On 23/03/2015 09:22, Daniel Vetter wrote:
> >On Fri, Mar 20, 2015 at 09:11:35PM +0000, Chris Wilson wrote:
> >>On Fri, Mar 20, 2015 at 05:48:36PM +0000, John.C.Harrison@Intel.com wrote:
> >>>From: John Harrison <John.C.Harrison@Intel.com>
> >>>
> >>>[commit message snipped]
> >>No. Please let's not go back to the bad old days of generating an interrupt
> >>per batch, and doing a lot more work inside the interrupt handler.
> >Yeah, enable_signalling should be the place where we grab the interrupt
> >reference. Also that we shouldn't call this unconditionally, that pretty
> >much defeats the point of that fastpath optimization.
> >
> >Another complication is missed interrupts. If we detect those and someone
> >calls enable_signalling then we need to fire up a timer to wake up once
> >per jiffy and save stuck fences. To avoid duplication with the threaded
> >wait code we could remove the fallback wakeups from there and just rely on
> >that timer everywhere.
> >-Daniel
> 
> As has been discussed many times in many forums, the scheduler requires
> notification of each batch buffer's completion. It needs to know so that it
> can submit new work, keep dependencies of outstanding work up to date, etc.
> 
> Android is similar. With the native sync API, Android wants to be signaled
> about the completion of everything. Every single batch buffer submission
> comes with a request for a sync point that will be poked when that buffer
> completes. The kernel has no way of knowing which buffers are actually going
> to be waited on. There is no driver call anymore. User land simply waits on
> a file descriptor.
> 
> I don't see how we can get away without generating an interrupt per batch.

I've explained this a bit offline in a meeting, but here's finally the
mail version for the record. The reason we want to enable interrupts only
when needed is that interrupts don't scale. Looking around, high-throughput
peripherals all try to avoid interrupts like the plague: netdev has netpoll,
and block devices just gained the same because of ridiculously fast SSDs
connected to PCIe. And lots of people are talking about insanely tightly
coupled GPU compute workloads (maybe not yet on Intel GPUs, but it'll come).

Now I fully agree that unfortunately the execlist hw design isn't awesome
and there's no way around receiving and processing an interrupt per batch.
But the hw folks are working on fixing these overheads again (or at least
attempting to, using the GuC; I haven't seen the new numbers yet), and old hw
without the scheduler works perfectly fine with interrupts mostly
disabled. So just because we currently have a suboptimal hw design is imo
not a good reason to throw all the on-demand interrupt enabling and
handling overboard. I fully expect that we'll need it again. And I think
it's easier to keep it working than to first kick it out and then rebuild
it again.

That's in a nutshell why I think we should keep all that machinery, even
though it won't be terribly useful for execlist (with or without the
scheduler).

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch

* Re: [RFC 3/4] drm/i915: Interrupt driven fences
From: Jesse Barnes @ 2015-03-26 17:27 UTC (permalink / raw)
  To: Daniel Vetter, John Harrison; +Cc: Intel-GFX

On 03/26/2015 06:22 AM, Daniel Vetter wrote:
> On Mon, Mar 23, 2015 at 12:13:56PM +0000, John Harrison wrote:
>> On 23/03/2015 09:22, Daniel Vetter wrote:
>>> On Fri, Mar 20, 2015 at 09:11:35PM +0000, Chris Wilson wrote:
>>>> On Fri, Mar 20, 2015 at 05:48:36PM +0000, John.C.Harrison@Intel.com wrote:
>>>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>>>
>>>>> [commit message snipped]
>>>> No. Please let's not go back to the bad old days of generating an interrupt
>>>> per batch, and doing a lot more work inside the interrupt handler.
>>> Yeah, enable_signalling should be the place where we grab the interrupt
>>> reference. Also that we shouldn't call this unconditionally, that pretty
>>> much defeats the point of that fastpath optimization.
>>>
>>> Another complication is missed interrupts. If we detect those and someone
>>> calls enable_signalling then we need to fire up a timer to wake up once
>>> per jiffy and save stuck fences. To avoid duplication with the threaded
>>> wait code we could remove the fallback wakeups from there and just rely on
>>> that timer everywhere.
>>> -Daniel
>>
>> As has been discussed many times in many forums, the scheduler requires
>> notification of each batch buffer's completion. It needs to know so that it
>> can submit new work, keep dependencies of outstanding work up to date, etc.
>>
>> Android is similar. With the native sync API, Android wants to be signaled
>> about the completion of everything. Every single batch buffer submission
>> comes with a request for a sync point that will be poked when that buffer
>> completes. The kernel has no way of knowing which buffers are actually going
>> to be waited on. There is no driver call anymore. User land simply waits on
>> a file descriptor.
>>
>> I don't see how we can get away without generating an interrupt per batch.
> 
> I've explained this a bit offline in a meeting, but here's finally the
> mail version for the record. The reason we want to enable interrupts only
> when needed is that interrupts don't scale. Looking around high throughput
> pheriferals all try to avoid interrupts like the plague: netdev has
> netpoll, block devices just gained the same because of ridiculously fast
> ssds connected to pcie. And there's lots of people talking about insanely
> tightly coupled gpu compute workloads (maybe not yet on intel gpus, but
> it'll come).
> 
> Now I fully agree that unfortunately the execlist hw design isn't awesome
> and there's no way around receiving and processing an interrupt per batch.
> But the hw folks are working on fixing these overheads again (or at least
> attempting using the guc, I haven't seen the new numbers yet) and old hw
> without the scheduler works perfectly fine with interrupts mostly
> disabled. So just because we currently have a suboptimal hw design is imo
> not a good reason to throw all the on-demand interrupt enabling and
> handling overboard. I fully expect that we'll need it again. And I think
> it's easier to keep it working than to first kick it out and then rebuild
> it again.
> 
> That's in a nutshell why I think we should keep all that machinery, even
> though it won't be terribly useful for execlist (with or without the
> scheduler).

What is our interrupt frequency these days anyway, for an interrupt per
batch completion, for a somewhat real set of workloads?  There's
probably more to shave off of our interrupt handling overhead, which
ought to help universally, but especially with execlists and sync point
usages.  I think Chris was looking at that a while back and removed some
MMIO and such and got the overhead down, but I don't know where we stand
today...

None of this means that there isn't room for polling and interrupt
disabling etc, even in the context of scheduling and execlists of course.

Thanks,
Jesse

* Re: [RFC 3/4] drm/i915: Interrupt driven fences
From: Daniel Vetter @ 2015-03-27  8:24 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: Intel-GFX

On Thu, Mar 26, 2015 at 10:27:25AM -0700, Jesse Barnes wrote:
> On 03/26/2015 06:22 AM, Daniel Vetter wrote:
> > On Mon, Mar 23, 2015 at 12:13:56PM +0000, John Harrison wrote:
> >> On 23/03/2015 09:22, Daniel Vetter wrote:
> >>> On Fri, Mar 20, 2015 at 09:11:35PM +0000, Chris Wilson wrote:
> >>>> On Fri, Mar 20, 2015 at 05:48:36PM +0000, John.C.Harrison@Intel.com wrote:
> >>>>> From: John Harrison <John.C.Harrison@Intel.com>
> >>>>>
> >>>>> [commit message snipped]
> >>>> No. Please let's not go back to the bad old days of generating an interrupt
> >>>> per batch, and doing a lot more work inside the interrupt handler.
> >>> Yeah, enable_signalling should be the place where we grab the interrupt
> >>> reference. Also that we shouldn't call this unconditionally, that pretty
> >>> much defeats the point of that fastpath optimization.
> >>>
> >>> Another complication is missed interrupts. If we detect those and someone
> >>> calls enable_signalling then we need to fire up a timer to wake up once
> >>> per jiffy and save stuck fences. To avoid duplication with the threaded
> >>> wait code we could remove the fallback wakeups from there and just rely on
> >>> that timer everywhere.
> >>> -Daniel
> >>
> >> As has been discussed many times in many forums, the scheduler requires
> >> notification of each batch buffer's completion. It needs to know so that it
> >> can submit new work, keep dependencies of outstanding work up to date, etc.
> >>
> >> Android is similar. With the native sync API, Android wants to be signaled
> >> about the completion of everything. Every single batch buffer submission
> >> comes with a request for a sync point that will be poked when that buffer
> >> completes. The kernel has no way of knowing which buffers are actually going
> >> to be waited on. There is no driver call anymore. User land simply waits on
> >> a file descriptor.
> >>
> >> I don't see how we can get away without generating an interrupt per batch.
> > 
> > I've explained this a bit offline in a meeting, but here's finally the
> > mail version for the record. The reason we want to enable interrupts only
> > when needed is that interrupts don't scale. Looking around, high throughput
> > peripherals all try to avoid interrupts like the plague: netdev has
> > netpoll, block devices just gained the same because of ridiculously fast
> > ssds connected to pcie. And there's lots of people talking about insanely
> > tightly coupled gpu compute workloads (maybe not yet on intel gpus, but
> > it'll come).
> > 
> > Now I fully agree that unfortunately the execlist hw design isn't awesome
> > and there's no way around receiving and processing an interrupt per batch.
> > But the hw folks are working on fixing these overheads again (or at least
> > attempting to with the guc; I haven't seen the new numbers yet) and old hw
> > without the scheduler works perfectly fine with interrupts mostly
> > disabled. So just because we currently have a suboptimal hw design is imo
> > not a good reason to throw all the on-demand interrupt enabling and
> > handling overboard. I fully expect that we'll need it again. And I think
> > it's easier to keep it working than to first kick it out and then rebuild
> > it again.
> > 
> > That's in a nutshell why I think we should keep all that machinery, even
> > though it won't be terribly useful for execlist (with or without the
> > scheduler).
> 
> What is our interrupt frequency these days anyway, for an interrupt per
> batch completion, for a somewhat real set of workloads?  There's
> probably more to shave off of our interrupt handling overhead, which
> ought to help universally, but especially with execlists and sync point
> usages.  I think Chris was looking at that a while back and removed some
> MMIO and such and got the overhead down, but I don't know where we stand
> today...

I guess you're referring to the pile of patches to reorder the
reads/writes for subordinate irq sources to only happen when they need to?
I.e. read only when we have a bit indicating so (unfortunately not
available for all of them) and write only if there's something to clear.
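
I.e. the fast path in the handler is roughly this shape now (a sketch;
the register and helper names are approximate):

        u32 master_ctl = I915_READ(GEN8_MASTER_IRQ);

        if (master_ctl & GEN8_GT_RCS_IRQ) {
                /* Read the subordinate IIR only when the master bit says so. */
                u32 iir = I915_READ(GEN8_GT_IIR(0));

                if (iir) {
                        /* Write back only the bits that actually need clearing. */
                        I915_WRITE(GEN8_GT_IIR(0), iir);
                        notify_ring(dev, &dev_priv->ring[RCS]);
                }
        }
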

On a quick scan those patches all landed.

The other bit is making the mmio debug stuff faster. That one hasn't
yet converged on a version which reduces the overhead without
destroying the usefulness of the debug functionality itself - unclaimed
mmio has helped a lot in chasing down runtime pm and power domain bugs in
our driver. So I really want to keep it around in some form by default, if
at all possible.

Maybe check out Chris' latest patches and see whether you have a good idea?
I've run out of ideas on them a bit.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC, 1/4] drm/i915: Convert requests to use struct fence
  2015-03-20 17:48 ` [RFC 1/4] drm/i915: " John.C.Harrison
@ 2015-04-07  9:18   ` Maarten Lankhorst
  2015-04-07 10:59     ` John Harrison
  0 siblings, 1 reply; 17+ messages in thread
From: Maarten Lankhorst @ 2015-04-07  9:18 UTC (permalink / raw)
  To: John.C.Harrison, Intel-GFX

Hey,

Op 20-03-15 om 18:48 schreef John.C.Harrison@Intel.com:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> There is a construct in the linux kernel called 'struct fence' that is intended
> to keep track of work that is executed on hardware. I.e. it solves the basic
> problem that the drivers 'struct drm_i915_gem_request' is trying to address. The
> request structure does quite a lot more than simply track the execution progress
> so is very definitely still required. However, the basic completion status side
> could be updated to use the ready made fence implementation and gain all the
> advantages that provides.
>
> This patch makes the first step of integrating a struct fence into the request.
> It replaces the explicit reference count with that of the fence. It also
> replaces the 'is completed' test with the fence's equivalent. Currently, that
> simply chains on to the original request implementation. A future patch will
> improve this.
>
> For: VIZ-5190
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>
> ---
> drivers/gpu/drm/i915/i915_drv.h         |   37 +++++++++------------
>  drivers/gpu/drm/i915/i915_gem.c         |   55 ++++++++++++++++++++++++++++---
>  drivers/gpu/drm/i915/intel_lrc.c        |    1 +
>  drivers/gpu/drm/i915/intel_ringbuffer.c |    1 +
>  drivers/gpu/drm/i915/intel_ringbuffer.h |    3 ++
>  5 files changed, 70 insertions(+), 27 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index ce3a536..7dcaf8c 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -50,6 +50,7 @@
>  #include <linux/intel-iommu.h>
>  #include <linux/kref.h>
>  #include <linux/pm_qos.h>
> +#include <linux/fence.h>
>  
>  /* General customization:
>   */
> @@ -2048,7 +2049,11 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
>   * initial reference taken using kref_init
>   */
>  struct drm_i915_gem_request {
> -	struct kref ref;
> +	/** Underlying object for implementing the signal/wait stuff.
> +	  * NB: Never call fence_later()! Due to lazy allocation, scheduler
> +	  * re-ordering, pre-emption, etc., there is no guarantee at all
> +	  * about the validity or sequentiality of the fence's seqno! */
> +	struct fence fence;
Set fence.context differently for each per-context timeline. :-)
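E.g. something like this (a sketch; where the context id gets stored is
purely illustrative):

        /* One fence context per i915 context/engine timeline, from the
         * generic allocator, rather than a single context per ring. */
        ctx->engine[ring->id].fence_context = fence_context_alloc(1);

        /* ... and at request-creation time: */
        fence_init(&request->fence, &i915_gem_request_fops, &ring->fence_lock,
                   ctx->engine[ring->id].fence_context, request->seqno);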

>+static bool i915_gem_request_enable_signaling(struct fence *req_fence)
>+{
>+	WARN(true, "Is this required?");
>+	return true;
>+}

Yes, try calling fence_wait() on the fence. :-) This function should call irq_get and add itself to ring->irq_queue.
See radeon_fence_enable_signaling for an example.
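Roughly along these lines (a sketch; the signal-list bookkeeping is
illustrative rather than anything in the patch):

        static bool i915_gem_request_enable_signaling(struct fence *req_fence)
        {
                struct drm_i915_gem_request *req =
                        container_of(req_fence, struct drm_i915_gem_request, fence);
                struct intel_engine_cs *ring = req->ring;

                /* Called with the fence lock (ring->fence_lock) already held. */
                if (i915_seqno_passed(ring->get_seqno(ring, false), req->seqno))
                        return false;   /* already completed, nothing to enable */

                /* Keep user interrupts unmasked while anyone is listening. */
                WARN_ON(!ring->irq_get(ring));

                /* The seqno interrupt handler scans this list and signals
                 * matching fences (list names illustrative). */
                list_add_tail(&req->signal_list, &ring->fence_signal_list);
                return true;
        }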

>@@ -2557,6 +2596,8 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
> 		return ret;
> 	}
> 
>+	fence_init(&request->fence, &i915_gem_request_fops, &ring->fence_lock, ring->fence_context, request->seqno);
>+
> 	/*
> 	 * Reserve space in the ring buffer for all the commands required to
> 	 * eventually emit this request. This is to guarantee that the

Use ring->irq_queue.lock instead of making a new lock? This will make implementing enable_signaling easier too.

~Maarten

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC, 1/4] drm/i915: Convert requests to use struct fence
  2015-04-07  9:18   ` [RFC, " Maarten Lankhorst
@ 2015-04-07 10:59     ` John Harrison
  2015-04-07 11:18       ` Maarten Lankhorst
  0 siblings, 1 reply; 17+ messages in thread
From: John Harrison @ 2015-04-07 10:59 UTC (permalink / raw)
  To: Maarten Lankhorst, Intel-GFX

On 07/04/2015 10:18, Maarten Lankhorst wrote:
> Hey,
>
> Op 20-03-15 om 18:48 schreef John.C.Harrison@Intel.com:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> There is a construct in the linux kernel called 'struct fence' that is intended
>> to keep track of work that is executed on hardware. I.e. it solves the basic
>> problem that the drivers 'struct drm_i915_gem_request' is trying to address. The
>> request structure does quite a lot more than simply track the execution progress
>> so is very definitely still required. However, the basic completion status side
>> could be updated to use the ready made fence implementation and gain all the
>> advantages that provides.
>>
>> This patch makes the first step of integrating a struct fence into the request.
>> It replaces the explicit reference count with that of the fence. It also
>> replaces the 'is completed' test with the fence's equivalent. Currently, that
>> simply chains on to the original request implementation. A future patch will
>> improve this.
>>
>> For: VIZ-5190
>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>
>> ---
>> drivers/gpu/drm/i915/i915_drv.h         |   37 +++++++++------------
>>   drivers/gpu/drm/i915/i915_gem.c         |   55 ++++++++++++++++++++++++++++---
>>   drivers/gpu/drm/i915/intel_lrc.c        |    1 +
>>   drivers/gpu/drm/i915/intel_ringbuffer.c |    1 +
>>   drivers/gpu/drm/i915/intel_ringbuffer.h |    3 ++
>>   5 files changed, 70 insertions(+), 27 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> index ce3a536..7dcaf8c 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -50,6 +50,7 @@
>>   #include <linux/intel-iommu.h>
>>   #include <linux/kref.h>
>>   #include <linux/pm_qos.h>
>> +#include <linux/fence.h>
>>   
>>   /* General customization:
>>    */
>> @@ -2048,7 +2049,11 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
>>    * initial reference taken using kref_init
>>    */
>>   struct drm_i915_gem_request {
>> -	struct kref ref;
>> +	/** Underlying object for implementing the signal/wait stuff.
>> +	  * NB: Never call fence_later()! Due to lazy allocation, scheduler
>> +	  * re-ordering, pre-emption, etc., there is no guarantee at all
>> +	  * about the validity or sequentiality of the fence's seqno! */
>> +	struct fence fence;
> Set fence.context differently for each per-context timeline. :-)

Yeah, I didn't like the way the description for fence_later() says 
'returns NULL if both fences are signaled' and then also returns null on 
a context mismatch. I was also not entirely sure what the fence context 
thing is meant to be for. AFAICT, the expectation is that there is only 
supposed to be a finite and small number of contexts as there is no 
management of them. They are simply an incrementing number with no way 
to 'release' a previously allocated context. Whereas, the i915 context 
is per application in an execlist enabled system. Potentially, multiple 
contexts per application even. So there is an unbounded and large number 
of them about. That sounds like a bad idea for the fence context 
implementation!


>> +static bool i915_gem_request_enable_signaling(struct fence *req_fence)
>> +{
>> +	WARN(true, "Is this required?");
>> +	return true;
>> +}
> Yes, try calling fence_wait() on the fence. :-) This function should call irq_get and add itself to ring->irq_queue.
> See radeon_fence_enable_signaling for an example.

See patch three in the series :). The above warning should really say 
'This should not be required yet.' but I didn't get around to updating it.
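
I.e. for this patch on its own the stub would better read:

        static bool i915_gem_request_enable_signaling(struct fence *req_fence)
        {
                /* Interrupt-driven signaling only arrives in patch 3/4. */
                WARN(true, "This should not be required yet!");
                return true;
        }
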


>> @@ -2557,6 +2596,8 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
>> 		return ret;
>> 	}
>>
>> +	fence_init(&request->fence, &i915_gem_request_fops, &ring->fence_lock, ring->fence_context, request->seqno);
>> +
>> 	/*
>> 	 * Reserve space in the ring buffer for all the commands required to
>> 	 * eventually emit this request. This is to guarantee that the
> Use ring->irq_queue.lock instead of making a new lock? This will make implementing enable_signaling easier too.

Is that definitely safe? It won't cause conflicts or unnecessary 
complications? Indeed, is one supposed to play around with the implicit 
lock inside a wait queue?


>
> ~Maarten
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC, 1/4] drm/i915: Convert requests to use struct fence
  2015-04-07 10:59     ` John Harrison
@ 2015-04-07 11:18       ` Maarten Lankhorst
  2015-04-17 19:22         ` Dave Gordon
  0 siblings, 1 reply; 17+ messages in thread
From: Maarten Lankhorst @ 2015-04-07 11:18 UTC (permalink / raw)
  To: John Harrison, Intel-GFX

Hey,

Op 07-04-15 om 12:59 schreef John Harrison:
> On 07/04/2015 10:18, Maarten Lankhorst wrote:
>> Hey,
>>
>> Op 20-03-15 om 18:48 schreef John.C.Harrison@Intel.com:
>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>
>>> There is a construct in the linux kernel called 'struct fence' that is intended
>>> to keep track of work that is executed on hardware. I.e. it solves the basic
>>> problem that the drivers 'struct drm_i915_gem_request' is trying to address. The
>>> request structure does quite a lot more than simply track the execution progress
>>> so is very definitely still required. However, the basic completion status side
>>> could be updated to use the ready made fence implementation and gain all the
>>> advantages that provides.
>>>
>>> This patch makes the first step of integrating a struct fence into the request.
>>> It replaces the explicit reference count with that of the fence. It also
>>> replaces the 'is completed' test with the fence's equivalent. Currently, that
>>> simply chains on to the original request implementation. A future patch will
>>> improve this.
>>>
>>> For: VIZ-5190
>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>>
>>> ---
>>> drivers/gpu/drm/i915/i915_drv.h         |   37 +++++++++------------
>>>   drivers/gpu/drm/i915/i915_gem.c         |   55 ++++++++++++++++++++++++++++---
>>>   drivers/gpu/drm/i915/intel_lrc.c        |    1 +
>>>   drivers/gpu/drm/i915/intel_ringbuffer.c |    1 +
>>>   drivers/gpu/drm/i915/intel_ringbuffer.h |    3 ++
>>>   5 files changed, 70 insertions(+), 27 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>>> index ce3a536..7dcaf8c 100644
>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>> @@ -50,6 +50,7 @@
>>>   #include <linux/intel-iommu.h>
>>>   #include <linux/kref.h>
>>>   #include <linux/pm_qos.h>
>>> +#include <linux/fence.h>
>>>     /* General customization:
>>>    */
>>> @@ -2048,7 +2049,11 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
>>>    * initial reference taken using kref_init
>>>    */
>>>   struct drm_i915_gem_request {
>>> -    struct kref ref;
>>> +    /** Underlying object for implementing the signal/wait stuff.
>>> +      * NB: Never call fence_later()! Due to lazy allocation, scheduler
>>> +      * re-ordering, pre-emption, etc., there is no guarantee at all
>>> +      * about the validity or sequentiality of the fence's seqno! */
>>> +    struct fence fence;
>> Set fence.context differently for each per-context timeline. :-)
>
> Yeah, I didn't like the way the description for fence_later() says 'returns NULL if both fences are signaled' and then also returns null on a context mismatch. I was also not entirely sure what the fence context thing is meant to be for. AFAICT, the expectation is that there is only supposed to be a finite and small number of contexts as there is no management of them. They are simply an incrementing number with no way to 'release' a previously allocated context. Whereas, the i915 context is per application in an execlist enabled system. Potentially, multiple contexts per application even. So there is an unbounded and large number of them about. That sounds like a bad idea for the fence context implementation!
No memory is allocated for them; they're just numbers. The worst thing that can happen is an integer overflow, and if that ever happened we could bump the type to int64_t. :-)

If you allocated 1000 contexts/second for 50 days you'd hit the overflow; realistically that will never happen, so I wouldn't worry about it.
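(1000 * 86400 * 50 = 4.32e9, i.e. just past the 2^32 ~= 4.29e9 wrap point
of an unsigned int.)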

>
>>> +static bool i915_gem_request_enable_signaling(struct fence *req_fence)
>>> +{
>>> +    WARN(true, "Is this required?");
>>> +    return true;
>>> +}
>> Yes, try calling fence_wait() on the fence. :-) This function should call irq_get and add itself to ring->irq_queue.
>> See for an example radeon_fence_enable_signaling.
>
> See patch three in the series :). The above warning should really say 'This should not be required yet.' but I didn't get around to updating it.
Okay.

>
>>> @@ -2557,6 +2596,8 @@ int i915_gem_request_alloc(struct intel_engine_cs *ring,
>>>         return ret;
>>>     }
>>>
>>> +    fence_init(&request->fence, &i915_gem_request_fops, &ring->fence_lock, ring->fence_context, request->seqno);
>>> +
>>>     /*
>>>      * Reserve space in the ring buffer for all the commands required to
>>>      * eventually emit this request. This is to guarantee that the
>> Use ring->irq_queue.lock instead of making a new lock? This will make implementing enable_signaling easier too.
>
> Is that definitely safe? It won't cause conflicts or unnecessary complications? Indeed, is one supposed to play around with the implicit lock inside a wait queue?
It's your own waitqueue, and it's the only way to add a waiter reliably. The spinlock's not taken unless absolutely needed.
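I.e. (sketch):

        /* Share the waitqueue's internal spinlock with the fence so
         * waiter bookkeeping and fence signaling serialize on one lock. */
        fence_init(&request->fence, &i915_gem_request_fops,
                   &ring->irq_queue.lock, ring->fence_context,
                   request->seqno);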

~Maarten
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC 2/4] drm/i915: Removed now redundant parameter to i915_gem_request_completed()
  2015-03-20 17:48 ` [RFC 2/4] drm/i915: Removed now redundant parameter to i915_gem_request_completed() John.C.Harrison
@ 2015-04-17 18:57   ` Dave Gordon
  0 siblings, 0 replies; 17+ messages in thread
From: Dave Gordon @ 2015-04-17 18:57 UTC (permalink / raw)
  To: John.C.Harrison, Intel-GFX

On 20/03/15 17:48, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The change to the implementation of i915_gem_request_completed() means that the
> lazy coherency flag is no longer used. This can now be removed to simplify the
> interface.
> 
> For: VIZ-5190
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c  |    2 +-
>  drivers/gpu/drm/i915/i915_drv.h      |    3 +--
>  drivers/gpu/drm/i915/i915_gem.c      |   14 +++++++-------
>  drivers/gpu/drm/i915/i915_irq.c      |    2 +-
>  drivers/gpu/drm/i915/intel_display.c |    2 +-
>  5 files changed, 11 insertions(+), 12 deletions(-)

Just to bring this up to date, there are three more instances that have
appeared since this patch was written; here are the additional changes:

> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index 4dd8b41..f4ba6fe 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -6801,7 +6801,7 @@ static void __intel_rps_boost_work(struct work_struct *work)
>  {
>         struct request_boost *boost = container_of(work, struct request_boost, work);
>  
> -       if (!i915_gem_request_completed(boost->rq, true))
> +       if (!i915_gem_request_completed(boost->rq))
>                 gen6_rps_boost(to_i915(boost->rq->ring->dev), NULL);
>  
>         i915_gem_request_unreference__unlocked(boost->rq);
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 7c97005..9d6b8bf 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1186,7 +1186,7 @@ static int __i915_spin_request(struct drm_i915_gem_request *rq)
>  
>         timeout = jiffies + 1;
>         while (!need_resched()) {
> -               if (i915_gem_request_completed(rq, true))
> +               if (i915_gem_request_completed(rq))
>                         return 0;
>  
>                 if (time_after_eq(jiffies, timeout))
> @@ -1194,7 +1194,7 @@ static int __i915_spin_request(struct drm_i915_gem_request *rq)
>  
>                 cpu_relax_lowlatency();
>         }
> -       if (i915_gem_request_completed(rq, false))
> +       if (i915_gem_request_completed(rq))
>                 return 0;
>  
>         return -EAGAIN;

.Dave.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC, 1/4] drm/i915: Convert requests to use struct fence
  2015-04-07 11:18       ` Maarten Lankhorst
@ 2015-04-17 19:22         ` Dave Gordon
  2015-04-20  5:13           ` Maarten Lankhorst
  0 siblings, 1 reply; 17+ messages in thread
From: Dave Gordon @ 2015-04-17 19:22 UTC (permalink / raw)
  To: Maarten Lankhorst, John Harrison, Intel-GFX

On 07/04/15 12:18, Maarten Lankhorst wrote:
> Hey,
> 
> Op 07-04-15 om 12:59 schreef John Harrison:
>> On 07/04/2015 10:18, Maarten Lankhorst wrote:
>>> Hey,
>>>
>>> Op 20-03-15 om 18:48 schreef John.C.Harrison@Intel.com:
>>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>>
>>>> There is a construct in the linux kernel called 'struct fence' that is intended
>>>> to keep track of work that is executed on hardware. I.e. it solves the basic
>>>> problem that the drivers 'struct drm_i915_gem_request' is trying to address. The
>>>> request structure does quite a lot more than simply track the execution progress
>>>> so is very definitely still required. However, the basic completion status side
>>>> could be updated to use the ready made fence implementation and gain all the
>>>> advantages that provides.
>>>>
>>>> This patch makes the first step of integrating a struct fence into the request.
>>>> It replaces the explicit reference count with that of the fence. It also
>>>> replaces the 'is completed' test with the fence's equivalent. Currently, that
>>>> simply chains on to the original request implementation. A future patch will
>>>> improve this.
>>>>
>>>> For: VIZ-5190
>>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>>>
>>>> ---
>>>> drivers/gpu/drm/i915/i915_drv.h         |   37 +++++++++------------
>>>>   drivers/gpu/drm/i915/i915_gem.c         |   55 ++++++++++++++++++++++++++++---
>>>>   drivers/gpu/drm/i915/intel_lrc.c        |    1 +
>>>>   drivers/gpu/drm/i915/intel_ringbuffer.c |    1 +
>>>>   drivers/gpu/drm/i915/intel_ringbuffer.h |    3 ++
>>>>   5 files changed, 70 insertions(+), 27 deletions(-)

Since Maarten provided i915_gem_request_unreference__unlocked() in the
interval since this was first posted, you'll need to convert that too:

> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 4213dfb..97073fe 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2218,14 +2218,8 @@ void i915_gem_request_unreference_irq(struct drm_i915_gem_request *req);
>  static inline void
>  i915_gem_request_unreference__unlocked(struct drm_i915_gem_request *req)
>  {
> -       struct drm_device *dev;
> -
> -       if (!req)
> -               return;
> -
> -       dev = req->ring->dev;
> -       if (kref_put_mutex(&req->ref, i915_gem_request_free, &dev->struct_mutex))
> -               mutex_unlock(&dev->struct_mutex);
> +       if (req)
> +               fence_put_mutex(&req->fence, &req->ring->dev->struct_mutex);
>  }
>  static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,

... and here's the implementation of fence_put_mutex() ...

> diff --git a/include/linux/fence.h b/include/linux/fence.h
> index 39efee1..c4fbb25 100644
> --- a/include/linux/fence.h
> +++ b/include/linux/fence.h
> @@ -218,6 +218,24 @@ static inline void fence_put(struct fence *fence)
>                 kref_put(&fence->refcount, fence_release);
>  }
>  
> +/**
> + * fence_put_mutex - decrement refcount for object.
> + * @fence: object.
> + * @lock: lock to take in release case
> + *
> + * A version of fence_put() that doesn't require the caller to hold the
> + * associated lock already. If the refcount doesn't go to zero, the lock
> + * is not needed; if it does, the lock will automatically be acquired and
> + * released around the call to fence_release().
> + */
> +static inline void fence_put_mutex(struct fence *fence,
> +                                  struct mutex *lock)
> +{
> +       if (fence)
> +               if (kref_put_mutex(&fence->refcount, fence_release, lock))
> +                       mutex_unlock(lock);
> +}
> +
>  int fence_signal(struct fence *fence);
>  int fence_signal_locked(struct fence *fence);
>  signed long fence_default_wait(struct fence *fence, bool intr, signed long timeout);

.Dave.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [RFC, 1/4] drm/i915: Convert requests to use struct fence
  2015-04-17 19:22         ` Dave Gordon
@ 2015-04-20  5:13           ` Maarten Lankhorst
  0 siblings, 0 replies; 17+ messages in thread
From: Maarten Lankhorst @ 2015-04-20  5:13 UTC (permalink / raw)
  To: Dave Gordon, John Harrison, Intel-GFX

Op 17-04-15 om 21:22 schreef Dave Gordon:
> On 07/04/15 12:18, Maarten Lankhorst wrote:
>> Hey,
>>
>> Op 07-04-15 om 12:59 schreef John Harrison:
>>> On 07/04/2015 10:18, Maarten Lankhorst wrote:
>>>> Hey,
>>>>
>>>> Op 20-03-15 om 18:48 schreef John.C.Harrison@Intel.com:
>>>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>>>
>>>>> There is a construct in the linux kernel called 'struct fence' that is intended
>>>>> to keep track of work that is executed on hardware. I.e. it solves the basic
>>>>> problem that the drivers 'struct drm_i915_gem_request' is trying to address. The
>>>>> request structure does quite a lot more than simply track the execution progress
>>>>> so is very definitely still required. However, the basic completion status side
>>>>> could be updated to use the ready made fence implementation and gain all the
>>>>> advantages that provides.
>>>>>
>>>>> This patch makes the first step of integrating a struct fence into the request.
>>>>> It replaces the explicit reference count with that of the fence. It also
>>>>> replaces the 'is completed' test with the fence's equivalent. Currently, that
>>>>> simply chains on to the original request implementation. A future patch will
>>>>> improve this.
>>>>>
>>>>> For: VIZ-5190
>>>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>>>>
>>>>> ---
>>>>> drivers/gpu/drm/i915/i915_drv.h         |   37 +++++++++------------
>>>>>   drivers/gpu/drm/i915/i915_gem.c         |   55 ++++++++++++++++++++++++++++---
>>>>>   drivers/gpu/drm/i915/intel_lrc.c        |    1 +
>>>>>   drivers/gpu/drm/i915/intel_ringbuffer.c |    1 +
>>>>>   drivers/gpu/drm/i915/intel_ringbuffer.h |    3 ++
>>>>>   5 files changed, 70 insertions(+), 27 deletions(-)
> Since Maarten provided i915_gem_request_unreference__unlocked() in the
> interval since this was first posted, you'll need to convert that too:
>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> index 4213dfb..97073fe 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -2218,14 +2218,8 @@ void i915_gem_request_unreference_irq(struct drm_i915_gem_request *req);
>>  static inline void
>>  i915_gem_request_unreference__unlocked(struct drm_i915_gem_request *req)
>>  {
>> -       struct drm_device *dev;
>> -
>> -       if (!req)
>> -               return;
>> -
>> -       dev = req->ring->dev;
>> -       if (kref_put_mutex(&req->ref, i915_gem_request_free, &dev->struct_mutex))
>> -               mutex_unlock(&dev->struct_mutex);
>> +       if (req)
>> +               fence_put_mutex(&req->fence, &req->ring->dev->struct_mutex);
>>  }
>>  static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
> ... and here's the implementation of fence_put_mutex() ...
>
>> diff --git a/include/linux/fence.h b/include/linux/fence.h
>> index 39efee1..c4fbb25 100644
>> --- a/include/linux/fence.h
>> +++ b/include/linux/fence.h
>> @@ -218,6 +218,24 @@ static inline void fence_put(struct fence *fence)
>>                 kref_put(&fence->refcount, fence_release);
>>  }
>>  
>> +/**
>> + * fence_put_mutex - decrement refcount for object.
>> + * @fence: object.
>> + * @lock: lock to take in release case
>> + *
>> + * A version of fence_put() that doesn't require the caller to hold the
>> + * associated lock already. If the refcount doesn't go to zero, the lock
>> + * is not needed; if it does, the lock will automatically be acquired and
>> + * released around the call to fence_release().
>> + */
>> +static inline void fence_put_mutex(struct fence *fence,
>> +                                  struct mutex *lock)
>> +{
>> +       if (fence)
>> +               if (kref_put_mutex(&fence->refcount, fence_release, lock))
>> +                       mutex_unlock(lock);
>> +}
>> +
>>  int fence_signal(struct fence *fence);
>>  int fence_signal_locked(struct fence *fence);
>>  signed long fence_default_wait(struct fence *fence, bool intr, signed long timeout);
>
I think this is wrong: a fence can't assume any particular lock is held by the caller of fence_put(), because the caller might be in a different driver and have no idea what lock to take.
It's fine to have something like this internally for now, with a big TODO that it has to go before cross-device sync, but fence_put_mutex() is broken.
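
E.g. a driver that got the fence via a dma-buf can only do the generic
thing, and then ->release() runs without whatever lock i915 expected
(sketch):

        /* Sketch: a foreign driver holding a reference knows nothing about
         * i915's struct_mutex, so this is all it can do; if it drops the
         * last reference, ->release() runs without the expected lock. */
        static void other_driver_fence_done(struct fence *f)
        {
                fence_put(f);
        }
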

~Maarten
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2015-04-20  5:13 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-20 17:48 [RFC 0/4] Convert requests to use struct fence John.C.Harrison
2015-03-20 17:48 ` [RFC 1/4] drm/i915: " John.C.Harrison
2015-04-07  9:18   ` [RFC, " Maarten Lankhorst
2015-04-07 10:59     ` John Harrison
2015-04-07 11:18       ` Maarten Lankhorst
2015-04-17 19:22         ` Dave Gordon
2015-04-20  5:13           ` Maarten Lankhorst
2015-03-20 17:48 ` [RFC 2/4] drm/i915: Removed now redundant parameter to i915_gem_request_completed() John.C.Harrison
2015-04-17 18:57   ` Dave Gordon
2015-03-20 17:48 ` [RFC 3/4] drm/i915: Interrupt driven fences John.C.Harrison
2015-03-20 21:11   ` Chris Wilson
2015-03-23  9:22     ` Daniel Vetter
2015-03-23 12:13       ` John Harrison
2015-03-26 13:22         ` Daniel Vetter
2015-03-26 17:27           ` Jesse Barnes
2015-03-27  8:24             ` Daniel Vetter
2015-03-20 17:48 ` [RFC 4/4] drm/i915: Updated request structure tracing John.C.Harrison
