* [PATCH v9 0/6] Convert requests to use struct fence
@ 2016-06-01 17:07 John.C.Harrison
  2016-06-01 17:07 ` [PATCH v9 1/6] drm/i915: Add per context timelines for fence objects John.C.Harrison
                   ` (6 more replies)
  0 siblings, 7 replies; 26+ messages in thread
From: John.C.Harrison @ 2016-06-01 17:07 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

There is a construct in the Linux kernel called 'struct fence' that is
intended to keep track of work that is executed on hardware. I.e. it
solves the basic problem that the driver's 'struct
drm_i915_gem_request' is trying to address. The request structure does
quite a lot more than simply track execution progress, so it is very
definitely still required. However, the basic completion status side
could be updated to use the ready-made fence implementation and gain
all the advantages that it provides.

Using the struct fence object also has the advantage that the fence
can be used outside of the i915 driver (by other drivers or by
userland applications). That is the basis of the dma-buf
synchronisation API and allows asynchronous tracking of work
completion. In this case, it allows applications to be signalled
directly when a batch buffer completes without having to make an IOCTL
call into the driver.
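
For illustration only, here is a minimal sketch of what such userland
usage could look like once fences are exposed as file descriptors: the
application waits with a plain poll() rather than an i915 IOCTL. The
sync-fd integration is a separate patch set, so both the way
'fence_fd' is obtained and the helper below are assumptions, not part
of this series.

    /*
     * Hypothetical sketch: wait for a batch buffer to complete by
     * polling a fence exported as a file descriptor. How 'fence_fd'
     * is obtained (e.g. an execbuf extension) is NOT part of this
     * series; the sync-fd integration lands separately.
     */
    #include <errno.h>
    #include <poll.h>

    static int wait_on_fence_fd(int fence_fd, int timeout_ms)
    {
            struct pollfd pfd = { .fd = fence_fd, .events = POLLIN };
            int ret = poll(&pfd, 1, timeout_ms);

            if (ret < 0)
                    return -errno;  /* poll itself failed */
            if (ret == 0)
                    return -ETIME;  /* timed out, not yet signalled */
            return 0;               /* fence signalled */
    }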

Note that in order to allow the full fence API to be used (e.g.
merging multiple fences together), the driver needs to provide an
incrementing timeline for the fence. Currently this timeline is
specific to the fence code as it must be per context. There is future
work planned to make the driver's internal seqno value also be per
context rather than driver global (VIZ-7443). Once this is done the
fence specific timeline code can be dropped in favour of just using
the driver's seqno value.
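
To make the in-order requirement concrete: when two fences on the same
context are merged, the merge can collapse to just the later fence,
which is only valid because seqnos within one context signal in order.
Below is a hedged sketch of that shortcut using the stock fence API
(the helper name is illustrative, not from this series):

    /*
     * Illustrative sketch: on a single in-order timeline, the later
     * seqno signalling implies the earlier one has signalled too, so
     * a merge need only keep the later fence. Without a per-context
     * in-order timeline this collapse would be invalid.
     */
    static struct fence *merge_pick(struct fence *a, struct fence *b)
    {
            if (a->context != b->context)
                    return NULL;    /* must genuinely track both */

            return fence_is_later(a, b) ? a : b;
    }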

This is work that was planned since the conversion of the driver from
being seqno value based to being request structure based. This patch
series does that work.

An IGT test to exercise the fence support from userland is in
progress and will follow. Android already makes extensive use of
fences for display composition. Real-world Linux usage is planned in
the form of Jesse's page table sharing / bufferless execbuf support.
There is also a plan that Wayland (and others) could make use of it in
a similar manner to Android.

v2: Updated for review comments by various people and to add support
for Android style 'native sync'.

v3: Updated from review comments by Tvrtko Ursulin. Also moved sync
framework out of staging and improved request completion handling.

v4: Fixed patch tag (should have been PATCH not RFC). Corrected
ownership of one patch which had passed through many hands before
reaching me. Fixed a bug introduced in v3 and updated for review
comments.

v5: Removed de-staging and further updates to Android sync code. The
de-stage is now being handled by someone else. The sync integration to
the i915 driver will be a separate patch set that can only land after
the external de-stage has been completed.

Assorted changes based on review comments and style checker fixes.
Most significant change is fixing up the fake lost interrupt support
for the 'drv_missed_irq_hang' IGT test and improving the wait request
latency.

v6: Updated to newer nightly and resolved conflicts around updates
to the wait_request optimisations.

v7: Updated to newer nightly and resolved conflicts around massive
ring -> engine rename and interface change to get_seqno(). Also fixed
up a race condition issue with stale request pointers in file client
lists and added a minor optimisation to not acquire spinlocks when a
list is empty and does not need processing.

v8: Updated to yet another nightly and resolved the merge conflicts.
Dropped the 'delay freeing of requests' patch as it is no longer
needed due to changes in request clean up code. Likewise with the deferred
processing of the fence signalling. Also moved the fence timeline
patch to before the fence conversion. It now means the timeline is
initially added with no actual user but also means the fence
conversion patch does not need to add a horrid hack timeline which is
then removed again in a subsequent patch.

Added support for possible RCU usage of fence object (Review comments
by Maarten Lankhorst).

v9: Updated to another newer nightly (changes to context structure
naming).

Moved the request completion processing out of the interrupt handler
and into a worker thread (Chris Wilson).

[Patches against drm-intel-nightly tree fetched 31/05/2016]

John Harrison (6):
  drm/i915: Add per context timelines for fence objects
  drm/i915: Convert requests to use struct fence
  drm/i915: Removed now redundant parameter to i915_gem_request_completed()
  drm/i915: Interrupt driven fences
  drm/i915: Updated request structure tracing
  drm/i915: Cache last IRQ seqno to reduce IRQ overhead

 drivers/gpu/drm/i915/i915_debugfs.c     |   7 +-
 drivers/gpu/drm/i915/i915_dma.c         |   9 +-
 drivers/gpu/drm/i915/i915_drv.h         |  67 +++---
 drivers/gpu/drm/i915/i915_gem.c         | 410 +++++++++++++++++++++++++++++---
 drivers/gpu/drm/i915/i915_gem_context.c |  16 ++
 drivers/gpu/drm/i915/i915_irq.c         |   3 +-
 drivers/gpu/drm/i915/i915_trace.h       |  14 +-
 drivers/gpu/drm/i915/intel_display.c    |   2 +-
 drivers/gpu/drm/i915/intel_lrc.c        |  14 ++
 drivers/gpu/drm/i915/intel_pm.c         |   4 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c |   6 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |   6 +
 12 files changed, 488 insertions(+), 70 deletions(-)

-- 
1.9.1


* [PATCH v9 1/6] drm/i915: Add per context timelines for fence objects
  2016-06-01 17:07 [PATCH v9 0/6] Convert requests to use struct fence John.C.Harrison
@ 2016-06-01 17:07 ` John.C.Harrison
  2016-06-02 10:28   ` Tvrtko Ursulin
  2016-06-07 11:17   ` Maarten Lankhorst
  2016-06-01 17:07 ` [PATCH v9 2/6] drm/i915: Convert requests to use struct fence John.C.Harrison
                   ` (5 subsequent siblings)
  6 siblings, 2 replies; 26+ messages in thread
From: John.C.Harrison @ 2016-06-01 17:07 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The purpose of this patch series is to convert the request structure to
use fence objects for the underlying completion tracking. The fence
object requires a sequence number. The ultimate aim is to use the same
sequence number as for the request itself (or rather, to remove the
request's seqno field and just use the fence's value throughout the
driver). However, this is not currently possible and so this patch
introduces a separate numbering scheme as an intermediate step.

A major advantage of using the fence object is that it can be passed
outside of the i915 driver and used externally. The fence API allows
for various operations such as combining multiple fences. This
requires that fence seqnos within a single fence context be guaranteed
in-order. The GPU scheduler that is coming can re-order request
execution but not within a single GPU context. Thus the fence context
must be tied to the i915 context (and the engine within the context as
each engine runs asynchronously).

On the other hand, the driver as a whole currently only works with
request seqnos that are allocated from a global in-order timeline. It
will require a fair chunk of re-work to allow multiple independent
seqno timelines to be used. Hence the introduction of a temporary,
fence specific timeline. Once the work to update the rest of the
driver has been completed then the request can use the fence seqno
instead.

v2: New patch in series.

v3: Renamed/retyped timeline structure fields after review comments by
Tvrtko Ursulin.

Added context information to the timeline's name string for better
identification in debugfs output.

v5: Line wrapping and other white space fixes to keep style checker
happy.

v7: Updated to newer nightly (lots of ring -> engine renaming).

v8: Moved to earlier in patch series so no longer needs to remove the
quick hack timeline that was being added before.

v9: Updated to another newer nightly (changes to context structure
naming). Also updated commit message to match previous changes.

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         | 14 ++++++++++++
 drivers/gpu/drm/i915/i915_gem.c         | 40 +++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_context.c | 16 +++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c        |  8 +++++++
 4 files changed, 78 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 2a88a46..a5f8ad8 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -831,6 +831,19 @@ struct i915_ctx_hang_stats {
 	bool banned;
 };
 
+struct i915_fence_timeline {
+	char        name[32];
+	unsigned    fence_context;
+	unsigned    next;
+
+	struct i915_gem_context *ctx;
+	struct intel_engine_cs *engine;
+};
+
+int i915_create_fence_timeline(struct drm_device *dev,
+			       struct i915_gem_context *ctx,
+			       struct intel_engine_cs *ring);
+
 /* This must match up with the value previously used for execbuf2.rsvd1. */
 #define DEFAULT_CONTEXT_HANDLE 0
 
@@ -875,6 +888,7 @@ struct i915_gem_context {
 		u64 lrc_desc;
 		int pin_count;
 		bool initialised;
+		struct i915_fence_timeline fence_timeline;
 	} engine[I915_NUM_ENGINES];
 
 	struct list_head link;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 5ffc6fa..57d3593 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2743,6 +2743,46 @@ void i915_gem_request_free(struct kref *req_ref)
 	kmem_cache_free(req->i915->requests, req);
 }
 
+int i915_create_fence_timeline(struct drm_device *dev,
+			       struct i915_gem_context *ctx,
+			       struct intel_engine_cs *engine)
+{
+	struct i915_fence_timeline *timeline;
+
+	timeline = &ctx->engine[engine->id].fence_timeline;
+
+	if (timeline->engine)
+		return 0;
+
+	timeline->fence_context = fence_context_alloc(1);
+
+	/*
+	 * Start the timeline from seqno 0 as this is a special value
+	 * that is reserved for invalid sync points.
+	 */
+	timeline->next       = 1;
+	timeline->ctx        = ctx;
+	timeline->engine     = engine;
+
+	snprintf(timeline->name, sizeof(timeline->name), "%d>%s:%d",
+		 timeline->fence_context, engine->name, ctx->user_handle);
+
+	return 0;
+}
+
+unsigned i915_fence_timeline_get_next_seqno(struct i915_fence_timeline *timeline)
+{
+	unsigned seqno;
+
+	seqno = timeline->next;
+
+	/* Reserve zero for invalid */
+	if (++timeline->next == 0)
+		timeline->next = 1;
+
+	return seqno;
+}
+
 static inline int
 __i915_gem_request_alloc(struct intel_engine_cs *engine,
 			 struct i915_gem_context *ctx,
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index d0e7fc6..07d8c63 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -320,6 +320,22 @@ i915_gem_create_context(struct drm_device *dev,
 	if (IS_ERR(ctx))
 		return ctx;
 
+	if (!i915.enable_execlists) {
+		struct intel_engine_cs *engine;
+
+		/* Create a per context timeline for fences */
+		for_each_engine(engine, to_i915(dev)) {
+			int ret = i915_create_fence_timeline(dev, ctx, engine);
+			if (ret) {
+				DRM_ERROR("Fence timeline creation failed for legacy %s: %p\n",
+					  engine->name, ctx);
+				idr_remove(&file_priv->context_idr, ctx->user_handle);
+				i915_gem_context_unreference(ctx);
+				return ERR_PTR(ret);
+			}
+		}
+	}
+
 	if (USES_FULL_PPGTT(dev)) {
 		struct i915_hw_ppgtt *ppgtt = i915_ppgtt_create(dev, file_priv);
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 5c191a1..14bcfb7 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2496,6 +2496,14 @@ static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
 		goto error_ringbuf;
 	}
 
+	/* Create a per context timeline for fences */
+	ret = i915_create_fence_timeline(ctx->i915->dev, ctx, engine);
+	if (ret) {
+		DRM_ERROR("Fence timeline creation failed for engine %s, ctx %p\n",
+			  engine->name, ctx);
+		goto error_ringbuf;
+	}
+
 	ce->ringbuf = ringbuf;
 	ce->state = ctx_obj;
 	ce->initialised = engine->init_context == NULL;
-- 
1.9.1


* [PATCH v9 2/6] drm/i915: Convert requests to use struct fence
  2016-06-01 17:07 [PATCH v9 0/6] Convert requests to use struct fence John.C.Harrison
  2016-06-01 17:07 ` [PATCH v9 1/6] drm/i915: Add per context timelines for fence objects John.C.Harrison
@ 2016-06-01 17:07 ` John.C.Harrison
  2016-06-02 11:07   ` Tvrtko Ursulin
  2016-06-01 17:07 ` [PATCH v9 3/6] drm/i915: Removed now redundant parameter to i915_gem_request_completed() John.C.Harrison
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 26+ messages in thread
From: John.C.Harrison @ 2016-06-01 17:07 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

There is a construct in the Linux kernel called 'struct fence' that is
intended to keep track of work that is executed on hardware. I.e. it
solves the basic problem that the driver's 'struct
drm_i915_gem_request' is trying to address. The request structure does
quite a lot more than simply track execution progress, so it is very
definitely still required. However, the basic completion status side
could be updated to use the ready-made fence implementation and gain
all the advantages that it provides.

This patch makes the first step of integrating a struct fence into the
request. It replaces the explicit reference count with that of the
fence. It also replaces the 'is completed' test with the fence's
equivalent. Currently, that simply chains on to the original request
implementation. A future patch will improve this.

v3: Updated after review comments by Tvrtko Ursulin. Added fence
context/seqno pair to the debugfs request info. Renamed fence 'driver
name' to just 'i915'. Removed BUG_ONs.

v5: Changed seqno format in debugfs to %x rather than %u as that is
apparently the preferred appearance. Line wrapped some long lines to
keep the style checker happy.

v6: Updated to newer nightly and resolved conflicts. The biggest issue
was with the re-worked busy spin precursor to waiting on a request. In
particular, the addition of a 'request_started' helper function. This
has no corresponding concept within the fence framework. However, it
is only ever used in one place and the whole point of that place is to
always directly read the seqno for the lowest latency possible.
So the simple solution is to just make the seqno test explicit at that
point now rather than later in the series (it was previously being
done anyway when fences become interrupt driven).

v7: Rebased to newer nightly - lots of ring -> engine renaming and
interface change to get_seqno().

v8: Rebased to newer nightly - no longer needs to worry about mutex
locking in the request free code path. Moved to after fence timeline
patch so no longer needs to add a horrid hack timeline.

Removed commented out code block. Added support for possible RCU usage
of fence object (Review comments by Maarten Lankhorst).

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
---
 drivers/gpu/drm/i915/i915_debugfs.c     |   5 +-
 drivers/gpu/drm/i915/i915_drv.h         |  43 +++++---------
 drivers/gpu/drm/i915/i915_gem.c         | 101 +++++++++++++++++++++++++++++---
 drivers/gpu/drm/i915/intel_lrc.c        |   1 +
 drivers/gpu/drm/i915/intel_ringbuffer.c |   1 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |   2 +
 6 files changed, 115 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index ac7e569..844cc4b 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -767,11 +767,12 @@ static int i915_gem_request_info(struct seq_file *m, void *data)
 			task = NULL;
 			if (req->pid)
 				task = pid_task(req->pid, PIDTYPE_PID);
-			seq_printf(m, "    %x @ %d: %s [%d]\n",
+			seq_printf(m, "    %x @ %d: %s [%d], fence = %x:%x\n",
 				   req->seqno,
 				   (int) (jiffies - req->emitted_jiffies),
 				   task ? task->comm : "<unknown>",
-				   task ? task->pid : -1);
+				   task ? task->pid : -1,
+				   req->fence.context, req->fence.seqno);
 			rcu_read_unlock();
 		}
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index a5f8ad8..905feae 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -42,6 +42,7 @@
 #include <linux/kref.h>
 #include <linux/pm_qos.h>
 #include <linux/shmem_fs.h>
+#include <linux/fence.h>
 
 #include <drm/drmP.h>
 #include <drm/intel-gtt.h>
@@ -2353,7 +2354,11 @@ static inline struct scatterlist *__sg_next(struct scatterlist *sg)
  * initial reference taken using kref_init
  */
 struct drm_i915_gem_request {
-	struct kref ref;
+	/**
+	 * Underlying object for implementing the signal/wait stuff.
+	 */
+	struct fence fence;
+	struct rcu_head rcu_head;
 
 	/** On Which ring this request was generated */
 	struct drm_i915_private *i915;
@@ -2455,7 +2460,13 @@ struct drm_i915_gem_request {
 struct drm_i915_gem_request * __must_check
 i915_gem_request_alloc(struct intel_engine_cs *engine,
 		       struct i915_gem_context *ctx);
-void i915_gem_request_free(struct kref *req_ref);
+
+static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
+					      bool lazy_coherency)
+{
+	return fence_is_signaled(&req->fence);
+}
+
 int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
 				   struct drm_file *file);
 
@@ -2475,14 +2486,14 @@ static inline struct drm_i915_gem_request *
 i915_gem_request_reference(struct drm_i915_gem_request *req)
 {
 	if (req)
-		kref_get(&req->ref);
+		fence_get(&req->fence);
 	return req;
 }
 
 static inline void
 i915_gem_request_unreference(struct drm_i915_gem_request *req)
 {
-	kref_put(&req->ref, i915_gem_request_free);
+	fence_put(&req->fence);
 }
 
 static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
@@ -2498,12 +2509,6 @@ static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
 }
 
 /*
- * XXX: i915_gem_request_completed should be here but currently needs the
- * definition of i915_seqno_passed() which is below. It will be moved in
- * a later patch when the call to i915_seqno_passed() is obsoleted...
- */
-
-/*
  * A command that requires special handling by the command parser.
  */
 struct drm_i915_cmd_descriptor {
@@ -3211,24 +3216,6 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
 	return (int32_t)(seq1 - seq2) >= 0;
 }
 
-static inline bool i915_gem_request_started(struct drm_i915_gem_request *req,
-					   bool lazy_coherency)
-{
-	if (!lazy_coherency && req->engine->irq_seqno_barrier)
-		req->engine->irq_seqno_barrier(req->engine);
-	return i915_seqno_passed(req->engine->get_seqno(req->engine),
-				 req->previous_seqno);
-}
-
-static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
-					      bool lazy_coherency)
-{
-	if (!lazy_coherency && req->engine->irq_seqno_barrier)
-		req->engine->irq_seqno_barrier(req->engine);
-	return i915_seqno_passed(req->engine->get_seqno(req->engine),
-				 req->seqno);
-}
-
 int __must_check i915_gem_get_seqno(struct drm_i915_private *dev_priv, u32 *seqno);
 int __must_check i915_gem_set_seqno(struct drm_device *dev, u32 seqno);
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 57d3593..b67fd7c 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1170,6 +1170,7 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
 {
 	unsigned long timeout;
 	unsigned cpu;
+	uint32_t seqno;
 
 	/* When waiting for high frequency requests, e.g. during synchronous
 	 * rendering split between the CPU and GPU, the finite amount of time
@@ -1185,12 +1186,14 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
 		return -EBUSY;
 
 	/* Only spin if we know the GPU is processing this request */
-	if (!i915_gem_request_started(req, true))
+	seqno = req->engine->get_seqno(req->engine);
+	if (!i915_seqno_passed(seqno, req->previous_seqno))
 		return -EAGAIN;
 
 	timeout = local_clock_us(&cpu) + 5;
 	while (!need_resched()) {
-		if (i915_gem_request_completed(req, true))
+		seqno = req->engine->get_seqno(req->engine);
+		if (i915_seqno_passed(seqno, req->seqno))
 			return 0;
 
 		if (signal_pending_state(state, current))
@@ -1202,7 +1205,10 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
 		cpu_relax_lowlatency();
 	}
 
-	if (i915_gem_request_completed(req, false))
+	if (req->engine->irq_seqno_barrier)
+		req->engine->irq_seqno_barrier(req->engine);
+	seqno = req->engine->get_seqno(req->engine);
+	if (i915_seqno_passed(seqno, req->seqno))
 		return 0;
 
 	return -EAGAIN;
@@ -2736,13 +2742,89 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv,
 	}
 }
 
-void i915_gem_request_free(struct kref *req_ref)
+static void i915_gem_request_free_rcu(struct rcu_head *head)
 {
-	struct drm_i915_gem_request *req = container_of(req_ref,
-						 typeof(*req), ref);
+	struct drm_i915_gem_request *req;
+
+	req = container_of(head, typeof(*req), rcu_head);
 	kmem_cache_free(req->i915->requests, req);
 }
 
+static void i915_gem_request_free(struct fence *req_fence)
+{
+	struct drm_i915_gem_request *req;
+
+	req = container_of(req_fence, typeof(*req), fence);
+	call_rcu(&req->rcu_head, i915_gem_request_free_rcu);
+}
+
+static bool i915_gem_request_enable_signaling(struct fence *req_fence)
+{
+	/* Interrupt driven fences are not implemented yet.*/
+	WARN(true, "This should not be called!");
+	return true;
+}
+
+static bool i915_gem_request_is_completed(struct fence *req_fence)
+{
+	struct drm_i915_gem_request *req = container_of(req_fence,
+						 typeof(*req), fence);
+	u32 seqno;
+
+	seqno = req->engine->get_seqno(req->engine);
+
+	return i915_seqno_passed(seqno, req->seqno);
+}
+
+static const char *i915_gem_request_get_driver_name(struct fence *req_fence)
+{
+	return "i915";
+}
+
+static const char *i915_gem_request_get_timeline_name(struct fence *req_fence)
+{
+	struct drm_i915_gem_request *req;
+	struct i915_fence_timeline *timeline;
+
+	req = container_of(req_fence, typeof(*req), fence);
+	timeline = &req->ctx->engine[req->engine->id].fence_timeline;
+
+	return timeline->name;
+}
+
+static void i915_gem_request_timeline_value_str(struct fence *req_fence,
+						char *str, int size)
+{
+	struct drm_i915_gem_request *req;
+
+	req = container_of(req_fence, typeof(*req), fence);
+
+	/* Last signalled timeline value ??? */
+	snprintf(str, size, "? [%d]"/*, timeline->value*/,
+		 req->engine->get_seqno(req->engine));
+}
+
+static void i915_gem_request_fence_value_str(struct fence *req_fence,
+					     char *str, int size)
+{
+	struct drm_i915_gem_request *req;
+
+	req = container_of(req_fence, typeof(*req), fence);
+
+	snprintf(str, size, "%d [%d]", req->fence.seqno, req->seqno);
+}
+
+static const struct fence_ops i915_gem_request_fops = {
+	.enable_signaling	= i915_gem_request_enable_signaling,
+	.signaled		= i915_gem_request_is_completed,
+	.wait			= fence_default_wait,
+	.release		= i915_gem_request_free,
+	.get_driver_name	= i915_gem_request_get_driver_name,
+	.get_timeline_name	= i915_gem_request_get_timeline_name,
+	.fence_value_str	= i915_gem_request_fence_value_str,
+	.timeline_value_str	= i915_gem_request_timeline_value_str,
+};
+
 int i915_create_fence_timeline(struct drm_device *dev,
 			       struct i915_gem_context *ctx,
 			       struct intel_engine_cs *engine)
@@ -2770,7 +2852,7 @@ int i915_create_fence_timeline(struct drm_device *dev,
 	return 0;
 }
 
-unsigned i915_fence_timeline_get_next_seqno(struct i915_fence_timeline *timeline)
+static unsigned i915_fence_timeline_get_next_seqno(struct i915_fence_timeline *timeline)
 {
 	unsigned seqno;
 
@@ -2814,13 +2896,16 @@ __i915_gem_request_alloc(struct intel_engine_cs *engine,
 	if (ret)
 		goto err;
 
-	kref_init(&req->ref);
 	req->i915 = dev_priv;
 	req->engine = engine;
 	req->reset_counter = reset_counter;
 	req->ctx  = ctx;
 	i915_gem_context_reference(req->ctx);
 
+	fence_init(&req->fence, &i915_gem_request_fops, &engine->fence_lock,
+		   ctx->engine[engine->id].fence_timeline.fence_context,
+		   i915_fence_timeline_get_next_seqno(&ctx->engine[engine->id].fence_timeline));
+
 	/*
 	 * Reserve space in the ring buffer for all the commands required to
 	 * eventually emit this request. This is to guarantee that the
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 14bcfb7..f126bcb 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2030,6 +2030,7 @@ logical_ring_setup(struct drm_device *dev, enum intel_engine_id id)
 	INIT_LIST_HEAD(&engine->buffers);
 	INIT_LIST_HEAD(&engine->execlist_queue);
 	spin_lock_init(&engine->execlist_lock);
+	spin_lock_init(&engine->fence_lock);
 
 	tasklet_init(&engine->irq_tasklet,
 		     intel_lrc_irq_handler, (unsigned long)engine);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 8d35a39..fbd3f12 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2254,6 +2254,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	INIT_LIST_HEAD(&engine->request_list);
 	INIT_LIST_HEAD(&engine->execlist_queue);
 	INIT_LIST_HEAD(&engine->buffers);
+	spin_lock_init(&engine->fence_lock);
 	i915_gem_batch_pool_init(dev, &engine->batch_pool);
 	memset(engine->semaphore.sync_seqno, 0,
 	       sizeof(engine->semaphore.sync_seqno));
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index b33c876..3f39daf 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -345,6 +345,8 @@ struct intel_engine_cs {
 	 * to encode the command length in the header).
 	 */
 	u32 (*get_cmd_length_mask)(u32 cmd_header);
+
+	spinlock_t fence_lock;
 };
 
 static inline bool
-- 
1.9.1


* [PATCH v9 3/6] drm/i915: Removed now redundant parameter to i915_gem_request_completed()
  2016-06-01 17:07 [PATCH v9 0/6] Convert requests to use struct fence John.C.Harrison
  2016-06-01 17:07 ` [PATCH v9 1/6] drm/i915: Add per context timelines for fence objects John.C.Harrison
  2016-06-01 17:07 ` [PATCH v9 2/6] drm/i915: Convert requests to use struct fence John.C.Harrison
@ 2016-06-01 17:07 ` John.C.Harrison
  2016-06-07 12:07   ` Maarten Lankhorst
  2016-06-01 17:07 ` [PATCH v9 4/6] drm/i915: Interrupt driven fences John.C.Harrison
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 26+ messages in thread
From: John.C.Harrison @ 2016-06-01 17:07 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The change to the implementation of i915_gem_request_completed() means
that the lazy coherency flag is no longer used. This can now be
removed to simplify the interface.

v6: Updated to newer nightly and resolved conflicts.

v7: Updated to newer nightly (lots of ring -> engine renaming).

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c  |  2 +-
 drivers/gpu/drm/i915/i915_drv.h      |  3 +--
 drivers/gpu/drm/i915/i915_gem.c      | 14 +++++++-------
 drivers/gpu/drm/i915/intel_display.c |  2 +-
 drivers/gpu/drm/i915/intel_pm.c      |  4 ++--
 5 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 844cc4b..923af20 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -663,7 +663,7 @@ static int i915_gem_pageflip_info(struct seq_file *m, void *data)
 					   i915_gem_request_get_seqno(work->flip_queued_req),
 					   dev_priv->next_seqno,
 					   engine->get_seqno(engine),
-					   i915_gem_request_completed(work->flip_queued_req, true));
+					   i915_gem_request_completed(work->flip_queued_req));
 			} else
 				seq_printf(m, "Flip not associated with any ring\n");
 			seq_printf(m, "Flip queued on frame %d, (was ready on frame %d), now %d\n",
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 905feae..69c3412 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2461,8 +2461,7 @@ struct drm_i915_gem_request * __must_check
 i915_gem_request_alloc(struct intel_engine_cs *engine,
 		       struct i915_gem_context *ctx);
 
-static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
-					      bool lazy_coherency)
+static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
 {
 	return fence_is_signaled(&req->fence);
 }
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index b67fd7c..97e3138 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1250,7 +1250,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	if (list_empty(&req->list))
 		return 0;
 
-	if (i915_gem_request_completed(req, true))
+	if (i915_gem_request_completed(req))
 		return 0;
 
 	timeout_expire = 0;
@@ -1301,7 +1301,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 			break;
 		}
 
-		if (i915_gem_request_completed(req, false)) {
+		if (i915_gem_request_completed(req)) {
 			ret = 0;
 			break;
 		}
@@ -2963,7 +2963,7 @@ i915_gem_find_active_request(struct intel_engine_cs *engine)
 	struct drm_i915_gem_request *request;
 
 	list_for_each_entry(request, &engine->request_list, list) {
-		if (i915_gem_request_completed(request, false))
+		if (i915_gem_request_completed(request))
 			continue;
 
 		return request;
@@ -3094,7 +3094,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *engine)
 					   struct drm_i915_gem_request,
 					   list);
 
-		if (!i915_gem_request_completed(request, true))
+		if (!i915_gem_request_completed(request))
 			break;
 
 		i915_gem_request_retire(request);
@@ -3118,7 +3118,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *engine)
 	}
 
 	if (unlikely(engine->trace_irq_req &&
-		     i915_gem_request_completed(engine->trace_irq_req, true))) {
+		     i915_gem_request_completed(engine->trace_irq_req))) {
 		engine->irq_put(engine);
 		i915_gem_request_assign(&engine->trace_irq_req, NULL);
 	}
@@ -3215,7 +3215,7 @@ i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
 		if (req == NULL)
 			continue;
 
-		if (i915_gem_request_completed(req, true))
+		if (i915_gem_request_completed(req))
 			i915_gem_object_retire__read(obj, i);
 	}
 
@@ -3321,7 +3321,7 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
 	if (to == from)
 		return 0;
 
-	if (i915_gem_request_completed(from_req, true))
+	if (i915_gem_request_completed(from_req))
 		return 0;
 
 	if (!i915_semaphore_is_enabled(to_i915(obj->base.dev))) {
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index 5b99a69..21a4c38 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11592,7 +11592,7 @@ static bool __pageflip_stall_check_cs(struct drm_i915_private *dev_priv,
 	vblank = intel_crtc_get_vblank_counter(intel_crtc);
 	if (work->flip_ready_vblank == 0) {
 		if (work->flip_queued_req &&
-		    !i915_gem_request_completed(work->flip_queued_req, true))
+		    !i915_gem_request_completed(work->flip_queued_req))
 			return false;
 
 		work->flip_ready_vblank = vblank;
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index b6dfd02..00eb86b 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -7544,7 +7544,7 @@ static void __intel_rps_boost_work(struct work_struct *work)
 	struct request_boost *boost = container_of(work, struct request_boost, work);
 	struct drm_i915_gem_request *req = boost->req;
 
-	if (!i915_gem_request_completed(req, true))
+	if (!i915_gem_request_completed(req))
 		gen6_rps_boost(req->i915, NULL, req->emitted_jiffies);
 
 	i915_gem_request_unreference(req);
@@ -7558,7 +7558,7 @@ void intel_queue_rps_boost_for_request(struct drm_i915_gem_request *req)
 	if (req == NULL || INTEL_GEN(req->i915) < 6)
 		return;
 
-	if (i915_gem_request_completed(req, true))
+	if (i915_gem_request_completed(req))
 		return;
 
 	boost = kmalloc(sizeof(*boost), GFP_ATOMIC);
-- 
1.9.1


* [PATCH v9 4/6] drm/i915: Interrupt driven fences
  2016-06-01 17:07 [PATCH v9 0/6] Convert requests to use struct fence John.C.Harrison
                   ` (2 preceding siblings ...)
  2016-06-01 17:07 ` [PATCH v9 3/6] drm/i915: Removed now redundant parameter to i915_gem_request_completed() John.C.Harrison
@ 2016-06-01 17:07 ` John.C.Harrison
  2016-06-02 13:25   ` Tvrtko Ursulin
  2016-06-01 17:07 ` [PATCH v9 5/6] drm/i915: Updated request structure tracing John.C.Harrison
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 26+ messages in thread
From: John.C.Harrison @ 2016-06-01 17:07 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The intended usage model for struct fence is that the signalled status
should be set on demand rather than polled. That is, there should not
be a need for a 'signaled' function to be called every time the status
is queried. Instead, 'something' should be done to enable a signal
callback from the hardware which will update the state directly. In
the case of requests, this is the seqno update interrupt. The idea is
that this callback will only be enabled on demand when something
actually tries to wait on the fence.
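
For reference, this is the model the fence framework already exposes
to consumers: the first registration of interest is what arms
signalling and invokes the driver's .enable_signaling hook (for i915
requests, enabling the seqno interrupt). A minimal consumer-side
sketch using the stock fence API follows; the callback body is
illustrative only:

    /*
     * Illustrative consumer of the on-demand model: adding a callback
     * triggers the fence's .enable_signaling hook; until then the
     * fence costs nothing. The callback runs when the seqno lands.
     */
    static void on_done(struct fence *fence, struct fence_cb *cb)
    {
            /* Request completed; typically runs in IRQ context. */
    }

    static int watch_request(struct fence *fence, struct fence_cb *cb)
    {
            int ret = fence_add_callback(fence, cb, on_done);

            /* -ENOENT just means the fence had already signalled. */
            return ret == -ENOENT ? 0 : ret;
    }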

This change removes the polling test and replaces it with the callback
scheme. Each fence is added to a 'please poke me' list at the start of
i915_add_request(). The interrupt handler then scans through the 'poke
me' list when a new seqno pops out and signals any matching
fence/request. The fence is then removed from the list so the entire
request stack does not need to be scanned every time. Note that the
fence is added to the list before the commands to generate the seqno
interrupt are added to the ring. Thus the sequence is guaranteed to be
race free if the interrupt is already enabled.

Note that the interrupt is only enabled on demand (i.e. when
__wait_request() is called). Thus there is still a potential race when
enabling the interrupt as the request may already have completed.
However, this is simply solved by calling the interrupt processing
code immediately after enabling the interrupt and thereby checking for
already completed requests.

Lastly, the ring clean up code has the possibility to cancel
outstanding requests (e.g. because TDR has reset the ring). These
requests will never get signalled and so must be removed from the
signal list manually. This is done by setting a 'cancelled' flag and
then calling the regular notify/retire code path rather than
attempting to duplicate the list manipulatation and clean up code in
multiple places. This also avoid any race condition where the
cancellation request might occur after/during the completion interrupt
actually arriving.

v2: Updated to take advantage of the request unreference no longer
requiring the mutex lock.

v3: Move the signal list processing around to prevent unsubmitted
requests being added to the list. This was occurring on Android
because the native sync implementation calls the
fence->enable_signalling API immediately on fence creation.

Updated after review comments by Tvrtko Ursulin. Renamed list nodes to
'link' instead of 'list'. Added support for returning an error code on
a cancelled fence. Update list processing to be more efficient/safer
with respect to spinlocks.

v5: Made i915_gem_request_submit a static as it is only ever called
from one place.

Fixed up the low latency wait optimisation. The time delay between the
seqno value being written to memory and the driver's ISR running can be
significant, at least for the wait request micro-benchmark. This can
be greatly improved by explicitly checking for seqno updates in the
pre-wait busy poll loop. Also added some documentation comments to the
busy poll code.

Fixed up support for the faking of lost interrupts
(test_irq_rings/missed_irq_rings). That is, there is an IGT test that
tells the driver to lose interrupts deliberately and then check that
everything still works as expected (albeit much slower).

Updates from review comments: use non IRQ-save spinlocking, early exit
on WARN and improved comments (Tvrtko Ursulin).

v6: Updated to newer nightly and resolved conflicts around the
wait_request busy spin optimisation. Also fixed a race condition
between this early exit path and the regular completion path.

v7: Updated to newer nightly - lots of ring -> engine renaming plus an
interface change on get_seqno(). Also added a list_empty() check
before acquiring spinlocks and doing list processing.

v8: Updated to newer nightly - changes to request clean up code mean
none of the deferred free mess is needed any more.

v9: Moved the request completion processing out of the interrupt
handler and into a worker thread (Chris Wilson).

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_dma.c         |   9 +-
 drivers/gpu/drm/i915/i915_drv.h         |  11 ++
 drivers/gpu/drm/i915/i915_gem.c         | 248 +++++++++++++++++++++++++++++---
 drivers/gpu/drm/i915/i915_irq.c         |   2 +
 drivers/gpu/drm/i915/intel_lrc.c        |   5 +
 drivers/gpu/drm/i915/intel_ringbuffer.c |   5 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |   3 +
 7 files changed, 260 insertions(+), 23 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
index 07edaed..f8f60bb 100644
--- a/drivers/gpu/drm/i915/i915_dma.c
+++ b/drivers/gpu/drm/i915/i915_dma.c
@@ -1019,9 +1019,13 @@ static int i915_workqueues_init(struct drm_i915_private *dev_priv)
 	if (dev_priv->wq == NULL)
 		goto out_err;
 
+	dev_priv->req_wq = alloc_ordered_workqueue("i915-rq", 0);
+	if (dev_priv->req_wq == NULL)
+		goto out_free_wq;
+
 	dev_priv->hotplug.dp_wq = alloc_ordered_workqueue("i915-dp", 0);
 	if (dev_priv->hotplug.dp_wq == NULL)
-		goto out_free_wq;
+		goto out_free_req_wq;
 
 	dev_priv->gpu_error.hangcheck_wq =
 		alloc_ordered_workqueue("i915-hangcheck", 0);
@@ -1032,6 +1036,8 @@ static int i915_workqueues_init(struct drm_i915_private *dev_priv)
 
 out_free_dp_wq:
 	destroy_workqueue(dev_priv->hotplug.dp_wq);
+out_free_req_wq:
+	destroy_workqueue(dev_priv->req_wq);
 out_free_wq:
 	destroy_workqueue(dev_priv->wq);
 out_err:
@@ -1044,6 +1050,7 @@ static void i915_workqueues_cleanup(struct drm_i915_private *dev_priv)
 {
 	destroy_workqueue(dev_priv->gpu_error.hangcheck_wq);
 	destroy_workqueue(dev_priv->hotplug.dp_wq);
+	destroy_workqueue(dev_priv->req_wq);
 	destroy_workqueue(dev_priv->wq);
 }
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 69c3412..5a7f256 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -1851,6 +1851,9 @@ struct drm_i915_private {
 	 */
 	struct workqueue_struct *wq;
 
+	/* Work queue for request completion processing */
+	struct workqueue_struct *req_wq;
+
 	/* Display functions */
 	struct drm_i915_display_funcs display;
 
@@ -2359,6 +2362,10 @@ struct drm_i915_gem_request {
 	 */
 	struct fence fence;
 	struct rcu_head rcu_head;
+	struct list_head signal_link;
+	bool cancelled;
+	bool irq_enabled;
+	bool signal_requested;
 
 	/** On Which ring this request was generated */
 	struct drm_i915_private *i915;
@@ -2460,6 +2467,10 @@ struct drm_i915_gem_request {
 struct drm_i915_gem_request * __must_check
 i915_gem_request_alloc(struct intel_engine_cs *engine,
 		       struct i915_gem_context *ctx);
+void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req,
+				       bool fence_locked);
+void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked);
+void i915_gem_request_worker(struct work_struct *work);
 
 static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
 {
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 97e3138..83cf9b0 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -39,6 +39,8 @@
 #include <linux/pci.h>
 #include <linux/dma-buf.h>
 
+static void i915_gem_request_submit(struct drm_i915_gem_request *req);
+
 static void i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj);
 static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj);
 static void
@@ -1237,9 +1239,8 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 {
 	struct intel_engine_cs *engine = i915_gem_request_get_engine(req);
 	struct drm_i915_private *dev_priv = req->i915;
-	const bool irq_test_in_progress =
-		ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) & intel_engine_flag(engine);
 	int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
+	uint32_t seqno;
 	DEFINE_WAIT(wait);
 	unsigned long timeout_expire;
 	s64 before = 0; /* Only to silence a compiler warning. */
@@ -1247,9 +1248,6 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 
 	WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled");
 
-	if (list_empty(&req->list))
-		return 0;
-
 	if (i915_gem_request_completed(req))
 		return 0;
 
@@ -1275,15 +1273,17 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	trace_i915_gem_request_wait_begin(req);
 
 	/* Optimistic spin for the next jiffie before touching IRQs */
-	ret = __i915_spin_request(req, state);
-	if (ret == 0)
-		goto out;
-
-	if (!irq_test_in_progress && WARN_ON(!engine->irq_get(engine))) {
-		ret = -ENODEV;
-		goto out;
+	if (req->seqno) {
+		ret = __i915_spin_request(req, state);
+		if (ret == 0)
+			goto out;
 	}
 
+	/*
+	 * Enable interrupt completion of the request.
+	 */
+	fence_enable_sw_signaling(&req->fence);
+
 	for (;;) {
 		struct timer_list timer;
 
@@ -1306,6 +1306,21 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 			break;
 		}
 
+		if (req->seqno) {
+			/*
+			 * There is quite a lot of latency in the user interrupt
+			 * path. So do an explicit seqno check and potentially
+			 * remove all that delay.
+			 */
+			if (req->engine->irq_seqno_barrier)
+				req->engine->irq_seqno_barrier(req->engine);
+			seqno = engine->get_seqno(engine);
+			if (i915_seqno_passed(seqno, req->seqno)) {
+				ret = 0;
+				break;
+			}
+		}
+
 		if (signal_pending_state(state, current)) {
 			ret = -ERESTARTSYS;
 			break;
@@ -1332,14 +1347,32 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 			destroy_timer_on_stack(&timer);
 		}
 	}
-	if (!irq_test_in_progress)
-		engine->irq_put(engine);
 
 	finish_wait(&engine->irq_queue, &wait);
 
 out:
 	trace_i915_gem_request_wait_end(req);
 
+	if ((ret == 0) && (req->seqno)) {
+		if (req->engine->irq_seqno_barrier)
+			req->engine->irq_seqno_barrier(req->engine);
+		seqno = engine->get_seqno(engine);
+		if (i915_seqno_passed(seqno, req->seqno) &&
+		    !i915_gem_request_completed(req)) {
+			/*
+			 * Make sure the request is marked as completed before
+			 * returning. NB: Need to acquire the spinlock around
+			 * the whole call to avoid a race condition with the
+			 * interrupt handler is running concurrently and could
+			 * cause this invocation to early exit even though the
+			 * request has not actually been fully processed yet.
+			 */
+			spin_lock_irq(&req->engine->fence_lock);
+			i915_gem_request_notify(req->engine, true);
+			spin_unlock_irq(&req->engine->fence_lock);
+		}
+	}
+
 	if (timeout) {
 		s64 tres = *timeout - (ktime_get_raw_ns() - before);
 
@@ -1405,6 +1438,11 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
 {
 	trace_i915_gem_request_retire(request);
 
+	if (request->irq_enabled) {
+		request->engine->irq_put(request->engine);
+		request->irq_enabled = false;
+	}
+
 	/* We know the GPU must have read the request to have
 	 * sent us the seqno + interrupt, so use the position
 	 * of tail of the request to update the last known position
@@ -1418,6 +1456,22 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
 	list_del_init(&request->list);
 	i915_gem_request_remove_from_client(request);
 
+	/*
+	 * In case the request is still in the signal pending list,
+	 * e.g. due to being cancelled by TDR, preemption, etc.
+	 */
+	if (!list_empty(&request->signal_link)) {
+		/*
+		 * The request must be marked as cancelled and the underlying
+		 * fence as failed. NB: There is no explicit fence fail API,
+		 * there is only a manual poke and signal.
+		 */
+		request->cancelled = true;
+		/* How to propagate to any associated sync_fence??? */
+		request->fence.status = -EIO;
+		fence_signal_locked(&request->fence);
+	}
+
 	if (request->previous_context) {
 		if (i915.enable_execlists)
 			intel_lr_context_unpin(request->previous_context,
@@ -2670,6 +2724,12 @@ void __i915_add_request(struct drm_i915_gem_request *request,
 	 */
 	request->postfix = intel_ring_get_tail(ringbuf);
 
+	/*
+	 * Add the fence to the pending list before emitting the commands to
+	 * generate a seqno notification interrupt.
+	 */
+	i915_gem_request_submit(request);
+
 	if (i915.enable_execlists)
 		ret = engine->emit_request(request);
 	else {
@@ -2755,25 +2815,154 @@ static void i915_gem_request_free(struct fence *req_fence)
 	struct drm_i915_gem_request *req;
 
 	req = container_of(req_fence, typeof(*req), fence);
+
+	WARN_ON(req->irq_enabled);
+
 	call_rcu(&req->rcu_head, i915_gem_request_free_rcu);
 }
 
-static bool i915_gem_request_enable_signaling(struct fence *req_fence)
+/*
+ * The request is about to be submitted to the hardware so add the fence to
+ * the list of signalable fences.
+ *
+ * NB: This does not necessarily enable interrupts yet. That only occurs on
+ * demand when the request is actually waited on. However, adding it to the
+ * list early ensures that there is no race condition where the interrupt
+ * could pop out prematurely and thus be completely lost. The race is merely
+ * that the interrupt must be manually checked for after being enabled.
+ */
+static void i915_gem_request_submit(struct drm_i915_gem_request *req)
 {
-	/* Interrupt driven fences are not implemented yet.*/
-	WARN(true, "This should not be called!");
-	return true;
+	/*
+	 * Always enable signal processing for the request's fence object
+	 * before that request is submitted to the hardware. Thus there is no
+	 * race condition whereby the interrupt could pop out before the
+	 * request has been added to the signal list. Hence no need to check
+	 * for completion, undo the list add and return false.
+	 */
+	i915_gem_request_reference(req);
+	spin_lock_irq(&req->engine->fence_lock);
+	WARN_ON(!list_empty(&req->signal_link));
+	list_add_tail(&req->signal_link, &req->engine->fence_signal_list);
+	spin_unlock_irq(&req->engine->fence_lock);
+
+	/*
+	 * NB: Interrupts are only enabled on demand. Thus there is still a
+	 * race where the request could complete before the interrupt has
+	 * been enabled. Thus care must be taken at that point.
+	 */
+
+	/* Have interrupts already been requested? */
+	if (req->signal_requested)
+		i915_gem_request_enable_interrupt(req, false);
+}
+
+/*
+ * The request is being actively waited on, so enable interrupt based
+ * completion signalling.
+ */
+void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req,
+				       bool fence_locked)
+{
+	struct drm_i915_private *dev_priv = req->engine->i915;
+	const bool irq_test_in_progress =
+		ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) &
+						intel_engine_flag(req->engine);
+
+	if (req->irq_enabled)
+		return;
+
+	if (irq_test_in_progress)
+		return;
+
+	if (!WARN_ON(!req->engine->irq_get(req->engine)))
+		req->irq_enabled = true;
+
+	/*
+	 * Because the interrupt is only enabled on demand, there is a race
+	 * where the interrupt can fire before anyone is looking for it. So
+	 * do an explicit check for missed interrupts.
+	 */
+	i915_gem_request_notify(req->engine, fence_locked);
 }
 
-static bool i915_gem_request_is_completed(struct fence *req_fence)
+static bool i915_gem_request_enable_signaling(struct fence *req_fence)
 {
 	struct drm_i915_gem_request *req = container_of(req_fence,
 						 typeof(*req), fence);
+
+	/*
+	 * No need to actually enable interrupt based processing until the
+	 * request has been submitted to the hardware. At which point
+	 * 'i915_gem_request_submit()' is called. So only really enable
+	 * signalling in there. Just set a flag to say that interrupts are
+	 * wanted when the request is eventually submitted. On the other hand
+	 * if the request has already been submitted then interrupts do need
+	 * to be enabled now.
+	 */
+
+	req->signal_requested = true;
+
+	if (!list_empty(&req->signal_link))
+		i915_gem_request_enable_interrupt(req, true);
+
+	return true;
+}
+
+/**
+ * i915_gem_request_worker - request work handler callback.
+ * @work: Work structure
+ * Called in response to a seqno interrupt to process the completed requests.
+ */
+void i915_gem_request_worker(struct work_struct *work)
+{
+	struct intel_engine_cs *engine;
+
+	engine = container_of(work, struct intel_engine_cs, request_work);
+	i915_gem_request_notify(engine, false);
+}
+
+void i915_gem_request_notify(struct intel_engine_cs *engine, bool fence_locked)
+{
+	struct drm_i915_gem_request *req, *req_next;
+	unsigned long flags;
 	u32 seqno;
 
-	seqno = req->engine->get_seqno(req->engine);
+	if (list_empty(&engine->fence_signal_list))
+		return;
+
+	if (!fence_locked)
+		spin_lock_irqsave(&engine->fence_lock, flags);
 
-	return i915_seqno_passed(seqno, req->seqno);
+	if (engine->irq_seqno_barrier)
+		engine->irq_seqno_barrier(engine);
+	seqno = engine->get_seqno(engine);
+
+	list_for_each_entry_safe(req, req_next, &engine->fence_signal_list, signal_link) {
+		if (!req->cancelled) {
+			if (!i915_seqno_passed(seqno, req->seqno))
+				break;
+		}
+
+		/*
+		 * Start by removing the fence from the signal list otherwise
+		 * the retire code can run concurrently and get confused.
+		 */
+		list_del_init(&req->signal_link);
+
+		if (!req->cancelled)
+			fence_signal_locked(&req->fence);
+
+		if (req->irq_enabled) {
+			req->engine->irq_put(req->engine);
+			req->irq_enabled = false;
+		}
+
+		i915_gem_request_unreference(req);
+	}
+
+	if (!fence_locked)
+		spin_unlock_irqrestore(&engine->fence_lock, flags);
 }
 
 static const char *i915_gem_request_get_driver_name(struct fence *req_fence)
@@ -2816,7 +3005,6 @@ static void i915_gem_request_fence_value_str(struct fence *req_fence,
 
 static const struct fence_ops i915_gem_request_fops = {
 	.enable_signaling	= i915_gem_request_enable_signaling,
-	.signaled		= i915_gem_request_is_completed,
 	.wait			= fence_default_wait,
 	.release		= i915_gem_request_free,
 	.get_driver_name	= i915_gem_request_get_driver_name,
@@ -2902,6 +3090,7 @@ __i915_gem_request_alloc(struct intel_engine_cs *engine,
 	req->ctx  = ctx;
 	i915_gem_context_reference(req->ctx);
 
+	INIT_LIST_HEAD(&req->signal_link);
 	fence_init(&req->fence, &i915_gem_request_fops, &engine->fence_lock,
 		   ctx->engine[engine->id].fence_timeline.fence_context,
 		   i915_fence_timeline_get_next_seqno(&ctx->engine[engine->id].fence_timeline));
@@ -3036,6 +3225,13 @@ static void i915_gem_reset_engine_cleanup(struct drm_i915_private *dev_priv,
 		i915_gem_request_retire(request);
 	}
 
+	/*
+	 * Tidy up anything left over. This includes a call to
+	 * i915_gem_request_notify() which will make sure that any requests
+	 * that were on the signal pending list get also cleaned up.
+	 */
+	i915_gem_retire_requests_ring(engine);
+
 	/* Having flushed all requests from all queues, we know that all
 	 * ringbuffers must now be empty. However, since we do not reclaim
 	 * all space when retiring the request (to prevent HEADs colliding
@@ -3082,6 +3278,13 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *engine)
 {
 	WARN_ON(i915_verify_lists(engine->dev));
 
+	/*
+	 * If no-one has waited on a request recently then interrupts will
+	 * not have been enabled and thus no requests will ever be marked as
+	 * completed. So do an interrupt check now.
+	 */
+	i915_gem_request_notify(engine, false);
+
 	/* Retire requests first as we use it above for the early return.
 	 * If we retire requests last, we may use a later seqno and so clear
 	 * the requests lists without clearing the active list, leading to
@@ -5102,6 +5305,7 @@ init_engine_lists(struct intel_engine_cs *engine)
 {
 	INIT_LIST_HEAD(&engine->active_list);
 	INIT_LIST_HEAD(&engine->request_list);
+	INIT_LIST_HEAD(&engine->fence_signal_list);
 }
 
 void
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index f780421..a87a3c5 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -994,6 +994,8 @@ static void notify_ring(struct intel_engine_cs *engine)
 	trace_i915_gem_request_notify(engine);
 	engine->user_interrupts++;
 
+	queue_work(engine->i915->req_wq, &engine->request_work);
+
 	wake_up_all(&engine->irq_queue);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index f126bcb..134759d 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1879,6 +1879,8 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *engine)
 
 	dev_priv = engine->i915;
 
+	cancel_work_sync(&engine->request_work);
+
 	if (engine->buffer) {
 		intel_logical_ring_stop(engine);
 		WARN_ON((I915_READ_MODE(engine) & MODE_IDLE) == 0);
@@ -2027,6 +2029,7 @@ logical_ring_setup(struct drm_device *dev, enum intel_engine_id id)
 
 	INIT_LIST_HEAD(&engine->active_list);
 	INIT_LIST_HEAD(&engine->request_list);
+	INIT_LIST_HEAD(&engine->fence_signal_list);
 	INIT_LIST_HEAD(&engine->buffers);
 	INIT_LIST_HEAD(&engine->execlist_queue);
 	spin_lock_init(&engine->execlist_lock);
@@ -2035,6 +2038,8 @@ logical_ring_setup(struct drm_device *dev, enum intel_engine_id id)
 	tasklet_init(&engine->irq_tasklet,
 		     intel_lrc_irq_handler, (unsigned long)engine);
 
+	INIT_WORK(&engine->request_work, i915_gem_request_worker);
+
 	logical_ring_init_platform_invariants(engine);
 	logical_ring_default_vfuncs(engine);
 	logical_ring_default_irqs(engine, info->irq_shift);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index fbd3f12..1641096 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2254,6 +2254,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	INIT_LIST_HEAD(&engine->request_list);
 	INIT_LIST_HEAD(&engine->execlist_queue);
 	INIT_LIST_HEAD(&engine->buffers);
+	INIT_LIST_HEAD(&engine->fence_signal_list);
 	spin_lock_init(&engine->fence_lock);
 	i915_gem_batch_pool_init(dev, &engine->batch_pool);
 	memset(engine->semaphore.sync_seqno, 0,
@@ -2261,6 +2262,8 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 
 	init_waitqueue_head(&engine->irq_queue);
 
+	INIT_WORK(&engine->request_work, i915_gem_request_worker);
+
 	ringbuf = intel_engine_create_ringbuffer(engine, 32 * PAGE_SIZE);
 	if (IS_ERR(ringbuf)) {
 		ret = PTR_ERR(ringbuf);
@@ -2307,6 +2310,8 @@ void intel_cleanup_engine(struct intel_engine_cs *engine)
 
 	dev_priv = engine->i915;
 
+	cancel_work_sync(&engine->request_work);
+
 	if (engine->buffer) {
 		intel_stop_engine(engine);
 		WARN_ON(!IS_GEN2(dev_priv) && (I915_READ_MODE(engine) & MODE_IDLE) == 0);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 3f39daf..51779b4 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -347,6 +347,9 @@ struct intel_engine_cs {
 	u32 (*get_cmd_length_mask)(u32 cmd_header);
 
 	spinlock_t fence_lock;
+	struct list_head fence_signal_list;
+
+	struct work_struct request_work;
 };
 
 static inline bool
-- 
1.9.1


* [PATCH v9 5/6] drm/i915: Updated request structure tracing
  2016-06-01 17:07 [PATCH v9 0/6] Convert requests to use struct fence John.C.Harrison
                   ` (3 preceding siblings ...)
  2016-06-01 17:07 ` [PATCH v9 4/6] drm/i915: Interrupt driven fences John.C.Harrison
@ 2016-06-01 17:07 ` John.C.Harrison
  2016-06-07 12:15   ` Maarten Lankhorst
  2016-06-01 17:07 ` [PATCH v9 6/6] drm/i915: Cache last IRQ seqno to reduce IRQ overhead John.C.Harrison
  2016-06-02 11:17 ` ✗ Ro.CI.BAT: failure for Convert requests to use struct fence (rev6) Patchwork
  6 siblings, 1 reply; 26+ messages in thread
From: John.C.Harrison @ 2016-06-01 17:07 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Added the '_complete' trace event which occurs when a fence/request is
signaled as complete. Also moved the notify event from the IRQ handler
code to inside the notify function itself.

v3: Added the current ring seqno to the notify trace point.

v5: Line wrapping to keep the style checker happy.

v7: Updated to newer nightly (lots of ring -> engine renaming).

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c   |  9 +++++++--
 drivers/gpu/drm/i915/i915_irq.c   |  1 -
 drivers/gpu/drm/i915/i915_trace.h | 14 +++++++++-----
 3 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 83cf9b0..a8b4887 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2928,8 +2928,10 @@ void i915_gem_request_notify(struct intel_engine_cs *engine, bool fence_locked)
 	unsigned long flags;
 	u32 seqno;
 
-	if (list_empty(&engine->fence_signal_list))
+	if (list_empty(&engine->fence_signal_list)) {
+		trace_i915_gem_request_notify(engine, 0);
 		return;
+	}
 
 	if (!fence_locked)
 		spin_lock_irqsave(&engine->fence_lock, flags);
@@ -2937,6 +2939,7 @@ void i915_gem_request_notify(struct intel_engine_cs *engine, bool fence_locked)
 	if (engine->irq_seqno_barrier)
 		engine->irq_seqno_barrier(engine);
 	seqno = engine->get_seqno(engine);
+	trace_i915_gem_request_notify(engine, seqno);
 
 	list_for_each_entry_safe(req, req_next, &engine->fence_signal_list, signal_link) {
 		if (!req->cancelled) {
@@ -2950,8 +2953,10 @@ void i915_gem_request_notify(struct intel_engine_cs *engine, bool fence_locked)
 		 */
 		list_del_init(&req->signal_link);
 
-		if (!req->cancelled)
+		if (!req->cancelled) {
 			fence_signal_locked(&req->fence);
+			trace_i915_gem_request_complete(req);
+		}
 
 		if (req->irq_enabled) {
 			req->engine->irq_put(req->engine);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index a87a3c5..7a08281 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -991,7 +991,6 @@ static void notify_ring(struct intel_engine_cs *engine)
 	if (!intel_engine_initialized(engine))
 		return;
 
-	trace_i915_gem_request_notify(engine);
 	engine->user_interrupts++;
 
 	queue_work(engine->i915->req_wq, &engine->request_work);
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index 6768db0..409a249 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -546,23 +546,27 @@ DEFINE_EVENT(i915_gem_request, i915_gem_request_add,
 );
 
 TRACE_EVENT(i915_gem_request_notify,
-	    TP_PROTO(struct intel_engine_cs *engine),
-	    TP_ARGS(engine),
+	    TP_PROTO(struct intel_engine_cs *engine, uint32_t seqno),
+	    TP_ARGS(engine, seqno),
 
 	    TP_STRUCT__entry(
 			     __field(u32, dev)
 			     __field(u32, ring)
 			     __field(u32, seqno)
+			     __field(bool, is_empty)
 			     ),
 
 	    TP_fast_assign(
 			   __entry->dev = engine->i915->dev->primary->index;
 			   __entry->ring = engine->id;
-			   __entry->seqno = engine->get_seqno(engine);
+			   __entry->seqno = seqno;
+			   __entry->is_empty =
+					list_empty(&engine->fence_signal_list);
 			   ),
 
-	    TP_printk("dev=%u, ring=%u, seqno=%u",
-		      __entry->dev, __entry->ring, __entry->seqno)
+	    TP_printk("dev=%u, ring=%u, seqno=%u, empty=%d",
+		      __entry->dev, __entry->ring, __entry->seqno,
+		      __entry->is_empty)
 );
 
 DEFINE_EVENT(i915_gem_request, i915_gem_request_retire,
-- 
1.9.1


* [PATCH v9 6/6] drm/i915: Cache last IRQ seqno to reduce IRQ overhead
  2016-06-01 17:07 [PATCH v9 0/6] Convert requests to use struct fence John.C.Harrison
                   ` (4 preceding siblings ...)
  2016-06-01 17:07 ` [PATCH v9 5/6] drm/i915: Updated request structure tracing John.C.Harrison
@ 2016-06-01 17:07 ` John.C.Harrison
  2016-06-07 12:47   ` Maarten Lankhorst
  2016-06-02 11:17 ` ✗ Ro.CI.BAT: failure for Convert requests to use struct fence (rev6) Patchwork
  6 siblings, 1 reply; 26+ messages in thread
From: John.C.Harrison @ 2016-06-01 17:07 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The notify function can be called many times without the seqno
changing. Some of these calls are there to prevent races due to the
requirement of not enabling interrupts until requested. However, even
when interrupts are enabled the IRQ handler can be called multiple
times without the ring's seqno value changing. E.g. if two interrupts
are generated by batch buffers completing in quick succession, the
first call to the handler processes both completions, but the handler
still gets executed a second time. This patch reduces the overhead of
these extra calls by caching the last processed seqno value and early
exiting if it has not changed.
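
In short, the change amounts to the following (condensed sketch; the
actual diff below also handles the seqno barrier, tracing and the
cache invalidation needed after a reset):

	void i915_gem_request_notify(struct intel_engine_cs *engine, ...)
	{
		u32 seqno = engine->get_seqno(engine);

		/* Nothing has completed since the last call, so skip
		 * the spinlock and the signal list walk entirely.
		 */
		if (seqno == engine->last_irq_seqno)
			return;

		engine->last_irq_seqno = seqno;

		/* ... walk engine->fence_signal_list as before ... */
	}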

v3: New patch for series.

v5: Added comment about last_irq_seqno usage due to code review
feedback (Tvrtko Ursulin).

v6: Minor update to resolve a race condition with the wait_request
optimisation.

v7: Updated to newer nightly - lots of ring -> engine renaming plus an
interface change to get_seqno().

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c         | 26 ++++++++++++++++++++++----
 drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index a8b4887..67f65f8 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1368,6 +1368,7 @@ out:
 			 * request has not actually been fully processed yet.
 			 */
 			spin_lock_irq(&req->engine->fence_lock);
+			req->engine->last_irq_seqno = 0;
 			i915_gem_request_notify(req->engine, true);
 			spin_unlock_irq(&req->engine->fence_lock);
 		}
@@ -2599,9 +2600,12 @@ i915_gem_init_seqno(struct drm_i915_private *dev_priv, u32 seqno)
 	i915_gem_retire_requests(dev_priv);
 
 	/* Finally reset hw state */
-	for_each_engine(engine, dev_priv)
+	for_each_engine(engine, dev_priv) {
 		intel_ring_init_seqno(engine, seqno);
 
+		engine->last_irq_seqno = 0;
+	}
+
 	return 0;
 }
 
@@ -2933,13 +2937,24 @@ void i915_gem_request_notify(struct intel_engine_cs *engine, bool fence_locked)
 		return;
 	}
 
-	if (!fence_locked)
-		spin_lock_irqsave(&engine->fence_lock, flags);
-
+	/*
+	 * Check for a new seqno. If it hasn't actually changed then early
+	 * exit without even grabbing the spinlock. Note that this is safe
+	 * because any corruption of last_irq_seqno merely results in doing
+	 * the full processing when there is potentially no work to be done.
+	 * It can never lead to not processing work that does need to happen.
+	 */
 	if (engine->irq_seqno_barrier)
 		engine->irq_seqno_barrier(engine);
 	seqno = engine->get_seqno(engine);
 	trace_i915_gem_request_notify(engine, seqno);
+	if (seqno == engine->last_irq_seqno)
+		return;
+
+	if (!fence_locked)
+		spin_lock_irqsave(&engine->fence_lock, flags);
+
+	engine->last_irq_seqno = seqno;
 
 	list_for_each_entry_safe(req, req_next, &engine->fence_signal_list, signal_link) {
 		if (!req->cancelled) {
@@ -3234,7 +3249,10 @@ static void i915_gem_reset_engine_cleanup(struct drm_i915_private *dev_priv,
 	 * Tidy up anything left over. This includes a call to
 	 * i915_gem_request_notify() which will make sure that any requests
 	 * that were on the signal pending list get also cleaned up.
+	 * NB: The seqno cache must be cleared otherwise the notify call will
+	 * simply return immediately.
 	 */
+	engine->last_irq_seqno = 0;
 	i915_gem_retire_requests_ring(engine);
 
 	/* Having flushed all requests from all queues, we know that all
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 51779b4..90de84e 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -348,6 +348,7 @@ struct intel_engine_cs {
 
 	spinlock_t fence_lock;
 	struct list_head fence_signal_list;
+	uint32_t last_irq_seqno;
 
 	struct work_struct request_work;
 };
-- 
1.9.1


* Re: [PATCH v9 1/6] drm/i915: Add per context timelines for fence objects
  2016-06-01 17:07 ` [PATCH v9 1/6] drm/i915: Add per context timelines for fence objects John.C.Harrison
@ 2016-06-02 10:28   ` Tvrtko Ursulin
  2016-06-09 16:08     ` John Harrison
  2016-06-07 11:17   ` Maarten Lankhorst
  1 sibling, 1 reply; 26+ messages in thread
From: Tvrtko Ursulin @ 2016-06-02 10:28 UTC (permalink / raw)
  To: John.C.Harrison, Intel-GFX


On 01/06/16 18:07, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> The purpose of this patch series is to convert the request structure to
> use fence objects for the underlying completion tracking. The fence
> object requires a sequence number. The ultimate aim is to use the same
> sequence number as for the request itself (or rather, to remove the
> request's seqno field and just use the fence's value throughout the
> driver). However, this is not currently possible and so this patch
> introduces a separate numbering scheme as an intermediate step.
>
> A major advantage of using the fence object is that it can be passed
> outside of the i915 driver and used externally. The fence API allows
> for various operations such as combining multiple fences. This
> requires that fence seqnos within a single fence context be guaranteed
> in-order. The GPU scheduler that is coming can re-order request
> execution but not within a single GPU context. Thus the fence context
> must be tied to the i915 context (and the engine within the context as
> each engine runs asynchronously).
>
> On the other hand, the driver as a whole currently only works with
> request seqnos that are allocated from a global in-order timeline. It
> will require a fair chunk of re-work to allow multiple independent
> seqno timelines to be used. Hence the introduction of a temporary,
> fence specific timeline. Once the work to update the rest of the
> driver has been completed then the request can use the fence seqno
> instead.
>
> v2: New patch in series.
>
> v3: Renamed/retyped timeline structure fields after review comments by
> Tvrtko Ursulin.
>
> Added context information to the timeline's name string for better
> identification in debugfs output.
>
> v5: Line wrapping and other white space fixes to keep style checker
> happy.
>
> v7: Updated to newer nightly (lots of ring -> engine renaming).
>
> v8: Moved to earlier in patch series so no longer needs to remove the
> quick hack timeline that was being added before.
>
> v9: Updated to another newer nightly (changes to context structure
> naming). Also updated commit message to match previous changes.
>
> For: VIZ-5190
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/i915_drv.h         | 14 ++++++++++++
>   drivers/gpu/drm/i915/i915_gem.c         | 40 +++++++++++++++++++++++++++++++++
>   drivers/gpu/drm/i915/i915_gem_context.c | 16 +++++++++++++
>   drivers/gpu/drm/i915/intel_lrc.c        |  8 +++++++
>   4 files changed, 78 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 2a88a46..a5f8ad8 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -831,6 +831,19 @@ struct i915_ctx_hang_stats {
>   	bool banned;
>   };
>
> +struct i915_fence_timeline {
> +	char        name[32];
> +	unsigned    fence_context;
> +	unsigned    next;
> +
> +	struct i915_gem_context *ctx;
> +	struct intel_engine_cs *engine;

Are these backpointers used in the patch series? I did a quick search 
with the "timeline->" string and did not find anything.

> +};
> +
> +int i915_create_fence_timeline(struct drm_device *dev,
> +			       struct i915_gem_context *ctx,
> +			       struct intel_engine_cs *ring);
> +
>   /* This must match up with the value previously used for execbuf2.rsvd1. */
>   #define DEFAULT_CONTEXT_HANDLE 0
>
> @@ -875,6 +888,7 @@ struct i915_gem_context {
>   		u64 lrc_desc;
>   		int pin_count;
>   		bool initialised;
> +		struct i915_fence_timeline fence_timeline;
>   	} engine[I915_NUM_ENGINES];
>
>   	struct list_head link;
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 5ffc6fa..57d3593 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2743,6 +2743,46 @@ void i915_gem_request_free(struct kref *req_ref)
>   	kmem_cache_free(req->i915->requests, req);
>   }
>
> +int i915_create_fence_timeline(struct drm_device *dev,

dev is not used in the function. Maybe it will be in later patches? In 
which case I think dev_priv is the current bkm for i915 
specific/internal code.

> +			       struct i915_gem_context *ctx,
> +			       struct intel_engine_cs *engine)
> +{
> +	struct i915_fence_timeline *timeline;
> +
> +	timeline = &ctx->engine[engine->id].fence_timeline;
> +
> +	if (timeline->engine)
> +		return 0;

Is this an expected case? Unless I am missing something it shouldn't be, 
so maybe a WARN_ON would be warranted?
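
Something like this, say (just a sketch):

	if (WARN_ON(timeline->engine))
		return 0;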

> +
> +	timeline->fence_context = fence_context_alloc(1);
> +
> +	/*
> +	 * Start the timeline from seqno 0 as this is a special value
> +	 * that is reserved for invalid sync points.
> +	 */

The comment and the init to 1 below are in disagreement. Maybe the 
comment should be something like "Start the timeline from seqno 1 as 0 
is a special value..."?

> +	timeline->next       = 1;
> +	timeline->ctx        = ctx;
> +	timeline->engine     = engine;
> +
> +	snprintf(timeline->name, sizeof(timeline->name), "%d>%s:%d",
> +		 timeline->fence_context, engine->name, ctx->user_handle);
> +

For rings like "video enhancement ring" the name is 22 chars on its own, 
leaving only 9 for the two integers. With a lot of contexts or a long 
runtime I suppose that could overflow. It is a bit of a stretch, but 
perhaps 32 is not enough; maybe the available space for the name should 
be defined as the longest ring name (with a comment) plus the maximum 
for two integers.

I think timeline->name is only for debug, but it still feels better to 
make sure it will fit rather than truncate it.

And we should probably just shorten the "video enhancement ring" to 
"vecs ring"...

> +	return 0;
> +}
> +
> +unsigned i915_fence_timeline_get_next_seqno(struct i915_fence_timeline *timeline)

It is strange to add a public function in this patch which is even 
unused, especially since the following patch makes it private.

Would it make more sense for it to be static straight away and maybe 
even called from __i915_gem_request_alloc unconditionally so that the 
patch does not add dead code?

Don't know, not terribly important but would perhaps look more logical 
as a patch series.

> +{
> +	unsigned seqno;
> +
> +	seqno = timeline->next;
> +
> +	/* Reserve zero for invalid */
> +	if (++timeline->next == 0)
> +		timeline->next = 1;
> +
> +	return seqno;
> +}
> +
>   static inline int
>   __i915_gem_request_alloc(struct intel_engine_cs *engine,
>   			 struct i915_gem_context *ctx,
> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
> index d0e7fc6..07d8c63 100644
> --- a/drivers/gpu/drm/i915/i915_gem_context.c
> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
> @@ -320,6 +320,22 @@ i915_gem_create_context(struct drm_device *dev,
>   	if (IS_ERR(ctx))
>   		return ctx;
>
> +	if (!i915.enable_execlists) {
> +		struct intel_engine_cs *engine;
> +
> +		/* Create a per context timeline for fences */
> +		for_each_engine(engine, to_i915(dev)) {
> +			int ret = i915_create_fence_timeline(dev, ctx, engine);
> +			if (ret) {
> +				DRM_ERROR("Fence timeline creation failed for legacy %s: %p\n",
> +					  engine->name, ctx);
> +				idr_remove(&file_priv->context_idr, ctx->user_handle);
> +				i915_gem_context_unreference(ctx);
> +				return ERR_PTR(ret);
> +			}
> +		}
> +	}
> +
>   	if (USES_FULL_PPGTT(dev)) {
>   		struct i915_hw_ppgtt *ppgtt = i915_ppgtt_create(dev, file_priv);
>
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 5c191a1..14bcfb7 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -2496,6 +2496,14 @@ static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
>   		goto error_ringbuf;
>   	}
>
> +	/* Create a per context timeline for fences */
> +	ret = i915_create_fence_timeline(ctx->i915->dev, ctx, engine);
> +	if (ret) {
> +		DRM_ERROR("Fence timeline creation failed for engine %s, ctx %p\n",

"engine %s" will log something like "engine render ring" which will be 
weird.

Also the pointer to the context is not that interesting as a DRM_ERROR 
I think. ctx->user_handle instead? Same in the legacy mode.

> +			  engine->name, ctx);
> +		goto error_ringbuf;
> +	}
> +
>   	ce->ringbuf = ringbuf;
>   	ce->state = ctx_obj;
>   	ce->initialised = engine->init_context == NULL;
>

So in summary just some minor things, otherwise it looks OK I think.

Regards,

Tvrtko



* Re: [PATCH v9 2/6] drm/i915: Convert requests to use struct fence
  2016-06-01 17:07 ` [PATCH v9 2/6] drm/i915: Convert requests to use struct fence John.C.Harrison
@ 2016-06-02 11:07   ` Tvrtko Ursulin
  2016-06-07 11:42     ` Maarten Lankhorst
  0 siblings, 1 reply; 26+ messages in thread
From: Tvrtko Ursulin @ 2016-06-02 11:07 UTC (permalink / raw)
  To: John.C.Harrison, Intel-GFX


On 01/06/16 18:07, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> There is a construct in the linux kernel called 'struct fence' that is
> intended to keep track of work that is executed on hardware. I.e. it
> solves the basic problem that the driver's 'struct
> drm_i915_gem_request' is trying to address. The request structure does
> quite a lot more than simply track the execution progress so is very
> definitely still required. However, the basic completion status side
> could be updated to use the ready made fence implementation and gain
> all the advantages that provides.
>
> This patch makes the first step of integrating a struct fence into the
> request. It replaces the explicit reference count with that of the
> fence. It also replaces the 'is completed' test with the fence's
> equivalent. Currently, that simply chains on to the original request
> implementation. A future patch will improve this.
>
> v3: Updated after review comments by Tvrtko Ursulin. Added fence
> context/seqno pair to the debugfs request info. Renamed fence 'driver
> name' to just 'i915'. Removed BUG_ONs.
>
> v5: Changed seqno format in debugfs to %x rather than %u as that is
> apparently the preferred appearance. Line wrapped some long lines to
> keep the style checker happy.
>
> v6: Updated to newer nightly and resolved conflicts. The biggest issue
> was with the re-worked busy spin precursor to waiting on a request. In
> particular, the addition of a 'request_started' helper function. This
> has no corresponding concept within the fence framework. However, it
> is only ever used in one place and the whole point of that place is to
> always directly read the seqno for absolutely lowest latency possible.
> So the simple solution is to just make the seqno test explicit at that
> point now rather than later in the series (it was previously being
> done anyway when fences become interrupt driven).
>
> v7: Rebased to newer nightly - lots of ring -> engine renaming and
> interface change to get_seqno().
>
> v8: Rebased to newer nightly - no longer needs to worry about mutex
> locking in the request free code path. Moved to after fence timeline
> patch so no longer needs to add a horrid hack timeline.
>
> Removed commented out code block. Added support for possible RCU usage
> of fence object (Review comments by Maarten Lankhorst).
>
> For: VIZ-5190
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>

Was it an r-b or an ack from Jesse? If the former, does it need a "(v?)" 
suffix, depending on the amount of code changes after his r-b?

> ---
>   drivers/gpu/drm/i915/i915_debugfs.c     |   5 +-
>   drivers/gpu/drm/i915/i915_drv.h         |  43 +++++---------
>   drivers/gpu/drm/i915/i915_gem.c         | 101 +++++++++++++++++++++++++++++---
>   drivers/gpu/drm/i915/intel_lrc.c        |   1 +
>   drivers/gpu/drm/i915/intel_ringbuffer.c |   1 +
>   drivers/gpu/drm/i915/intel_ringbuffer.h |   2 +
>   6 files changed, 115 insertions(+), 38 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index ac7e569..844cc4b 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -767,11 +767,12 @@ static int i915_gem_request_info(struct seq_file *m, void *data)
>   			task = NULL;
>   			if (req->pid)
>   				task = pid_task(req->pid, PIDTYPE_PID);
> -			seq_printf(m, "    %x @ %d: %s [%d]\n",
> +			seq_printf(m, "    %x @ %d: %s [%d], fence = %x:%x\n",

In the previous patch the fence context and seqno were %d in the 
timeline->name, so using %d here would probably be more consistent.

>   				   req->seqno,
>   				   (int) (jiffies - req->emitted_jiffies),
>   				   task ? task->comm : "<unknown>",
> -				   task ? task->pid : -1);
> +				   task ? task->pid : -1,
> +				   req->fence.context, req->fence.seqno);
>   			rcu_read_unlock();
>   		}
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index a5f8ad8..905feae 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -42,6 +42,7 @@
>   #include <linux/kref.h>
>   #include <linux/pm_qos.h>
>   #include <linux/shmem_fs.h>
> +#include <linux/fence.h>
>
>   #include <drm/drmP.h>
>   #include <drm/intel-gtt.h>
> @@ -2353,7 +2354,11 @@ static inline struct scatterlist *__sg_next(struct scatterlist *sg)
>    * initial reference taken using kref_init
>    */
>   struct drm_i915_gem_request {
> -	struct kref ref;
> +	/**
> +	 * Underlying object for implementing the signal/wait stuff.
> +	 */
> +	struct fence fence;
> +	struct rcu_head rcu_head;
>
>   	/** On Which ring this request was generated */
>   	struct drm_i915_private *i915;
> @@ -2455,7 +2460,13 @@ struct drm_i915_gem_request {
>   struct drm_i915_gem_request * __must_check
>   i915_gem_request_alloc(struct intel_engine_cs *engine,
>   		       struct i915_gem_context *ctx);
> -void i915_gem_request_free(struct kref *req_ref);
> +
> +static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
> +					      bool lazy_coherency)
> +{
> +	return fence_is_signaled(&req->fence);
> +}

I would squash the following patch into this one; it makes no sense to 
keep a function with an unused parameter. And fewer patches in the 
series makes it less scary to review. :) Of course, only if they are 
also not too big. :D

> +
>   int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
>   				   struct drm_file *file);
>
> @@ -2475,14 +2486,14 @@ static inline struct drm_i915_gem_request *
>   i915_gem_request_reference(struct drm_i915_gem_request *req)
>   {
>   	if (req)
> -		kref_get(&req->ref);
> +		fence_get(&req->fence);
>   	return req;
>   }
>
>   static inline void
>   i915_gem_request_unreference(struct drm_i915_gem_request *req)
>   {
> -	kref_put(&req->ref, i915_gem_request_free);
> +	fence_put(&req->fence);
>   }
>
>   static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
> @@ -2498,12 +2509,6 @@ static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
>   }
>
>   /*
> - * XXX: i915_gem_request_completed should be here but currently needs the
> - * definition of i915_seqno_passed() which is below. It will be moved in
> - * a later patch when the call to i915_seqno_passed() is obsoleted...
> - */
> -
> -/*
>    * A command that requires special handling by the command parser.
>    */
>   struct drm_i915_cmd_descriptor {
> @@ -3211,24 +3216,6 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
>   	return (int32_t)(seq1 - seq2) >= 0;
>   }
>
> -static inline bool i915_gem_request_started(struct drm_i915_gem_request *req,
> -					   bool lazy_coherency)
> -{
> -	if (!lazy_coherency && req->engine->irq_seqno_barrier)
> -		req->engine->irq_seqno_barrier(req->engine);
> -	return i915_seqno_passed(req->engine->get_seqno(req->engine),
> -				 req->previous_seqno);
> -}
> -
> -static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
> -					      bool lazy_coherency)
> -{
> -	if (!lazy_coherency && req->engine->irq_seqno_barrier)
> -		req->engine->irq_seqno_barrier(req->engine);
> -	return i915_seqno_passed(req->engine->get_seqno(req->engine),
> -				 req->seqno);
> -}
> -
>   int __must_check i915_gem_get_seqno(struct drm_i915_private *dev_priv, u32 *seqno);
>   int __must_check i915_gem_set_seqno(struct drm_device *dev, u32 seqno);
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 57d3593..b67fd7c 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1170,6 +1170,7 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
>   {
>   	unsigned long timeout;
>   	unsigned cpu;
> +	uint32_t seqno;
>
>   	/* When waiting for high frequency requests, e.g. during synchronous
>   	 * rendering split between the CPU and GPU, the finite amount of time
> @@ -1185,12 +1186,14 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
>   		return -EBUSY;
>
>   	/* Only spin if we know the GPU is processing this request */
> -	if (!i915_gem_request_started(req, true))
> +	seqno = req->engine->get_seqno(req->engine);
> +	if (!i915_seqno_passed(seqno, req->previous_seqno))
>   		return -EAGAIN;
>
>   	timeout = local_clock_us(&cpu) + 5;
>   	while (!need_resched()) {
> -		if (i915_gem_request_completed(req, true))
> +		seqno = req->engine->get_seqno(req->engine);
> +		if (i915_seqno_passed(seqno, req->seqno))
>   			return 0;
>
>   		if (signal_pending_state(state, current))
> @@ -1202,7 +1205,10 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
>   		cpu_relax_lowlatency();
>   	}
>
> -	if (i915_gem_request_completed(req, false))
> +	if (req->engine->irq_seqno_barrier)
> +		req->engine->irq_seqno_barrier(req->engine);
> +	seqno = req->engine->get_seqno(req->engine);
> +	if (i915_seqno_passed(seqno, req->seqno))
>   		return 0;
>
>   	return -EAGAIN;
> @@ -2736,13 +2742,89 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv,
>   	}
>   }
>
> -void i915_gem_request_free(struct kref *req_ref)
> +static void i915_gem_request_free_rcu(struct rcu_head *head)
>   {
> -	struct drm_i915_gem_request *req = container_of(req_ref,
> -						 typeof(*req), ref);
> +	struct drm_i915_gem_request *req;
> +
> +	req = container_of(head, typeof(*req), rcu_head);
>   	kmem_cache_free(req->i915->requests, req);
>   }
>
> +static void i915_gem_request_free(struct fence *req_fence)
> +{
> +	struct drm_i915_gem_request *req;
> +
> +	req = container_of(req_fence, typeof(*req), fence);
> +	call_rcu(&req->rcu_head, i915_gem_request_free_rcu);
> +}
> +
> +static bool i915_gem_request_enable_signaling(struct fence *req_fence)
> +{
> +	/* Interrupt driven fences are not implemented yet.*/
> +	WARN(true, "This should not be called!");
> +	return true;
> +}
> +
> +static bool i915_gem_request_is_completed(struct fence *req_fence)
> +{
> +	struct drm_i915_gem_request *req = container_of(req_fence,
> +						 typeof(*req), fence);
> +	u32 seqno;
> +
> +	seqno = req->engine->get_seqno(req->engine);
> +
> +	return i915_seqno_passed(seqno, req->seqno);
> +}
> +
> +static const char *i915_gem_request_get_driver_name(struct fence *req_fence)
> +{
> +	return "i915";
> +}
> +
> +static const char *i915_gem_request_get_timeline_name(struct fence *req_fence)
> +{
> +	struct drm_i915_gem_request *req;
> +	struct i915_fence_timeline *timeline;
> +
> +	req = container_of(req_fence, typeof(*req), fence);
> +	timeline = &req->ctx->engine[req->engine->id].fence_timeline;
> +
> +	return timeline->name;
> +}
> +
> +static void i915_gem_request_timeline_value_str(struct fence *req_fence,
> +						char *str, int size)
> +{
> +	struct drm_i915_gem_request *req;
> +
> +	req = container_of(req_fence, typeof(*req), fence);
> +
> +	/* Last signalled timeline value ??? */
> +	snprintf(str, size, "? [%d]"/*, timeline->value*/,

Is the reference to timeline->value a leftover from the past?

Is the string format defined by the API? Asking because "? [%d]" looks 
intriguing.

> +		 req->engine->get_seqno(req->engine));
> +}
> +
> +static void i915_gem_request_fence_value_str(struct fence *req_fence,
> +					     char *str, int size)
> +{
> +	struct drm_i915_gem_request *req;
> +
> +	req = container_of(req_fence, typeof(*req), fence);
> +
> +	snprintf(str, size, "%d [%d]", req->fence.seqno, req->seqno);

Is it OK to put req->seqno in this one? Or is it just for debug anyway, 
so it helps us and the fence framework does not care?

> +}
> +
> +static const struct fence_ops i915_gem_request_fops = {
> +	.enable_signaling	= i915_gem_request_enable_signaling,
> +	.signaled		= i915_gem_request_is_completed,
> +	.wait			= fence_default_wait,
> +	.release		= i915_gem_request_free,
> +	.get_driver_name	= i915_gem_request_get_driver_name,
> +	.get_timeline_name	= i915_gem_request_get_timeline_name,
> +	.fence_value_str	= i915_gem_request_fence_value_str,
> +	.timeline_value_str	= i915_gem_request_timeline_value_str,
> +};
> +
>   int i915_create_fence_timeline(struct drm_device *dev,
>   			       struct i915_gem_context *ctx,
>   			       struct intel_engine_cs *engine)
> @@ -2770,7 +2852,7 @@ int i915_create_fence_timeline(struct drm_device *dev,
>   	return 0;
>   }
>
> -unsigned i915_fence_timeline_get_next_seqno(struct i915_fence_timeline *timeline)
> +static unsigned i915_fence_timeline_get_next_seqno(struct i915_fence_timeline *timeline)
>   {
>   	unsigned seqno;
>
> @@ -2814,13 +2896,16 @@ __i915_gem_request_alloc(struct intel_engine_cs *engine,
>   	if (ret)
>   		goto err;
>
> -	kref_init(&req->ref);
>   	req->i915 = dev_priv;
>   	req->engine = engine;
>   	req->reset_counter = reset_counter;
>   	req->ctx  = ctx;
>   	i915_gem_context_reference(req->ctx);
>
> +	fence_init(&req->fence, &i915_gem_request_fops, &engine->fence_lock,
> +		   ctx->engine[engine->id].fence_timeline.fence_context,
> +		   i915_fence_timeline_get_next_seqno(&ctx->engine[engine->id].fence_timeline));
> +
>   	/*
>   	 * Reserve space in the ring buffer for all the commands required to
>   	 * eventually emit this request. This is to guarantee that the
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 14bcfb7..f126bcb 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -2030,6 +2030,7 @@ logical_ring_setup(struct drm_device *dev, enum intel_engine_id id)
>   	INIT_LIST_HEAD(&engine->buffers);
>   	INIT_LIST_HEAD(&engine->execlist_queue);
>   	spin_lock_init(&engine->execlist_lock);
> +	spin_lock_init(&engine->fence_lock);
>
>   	tasklet_init(&engine->irq_tasklet,
>   		     intel_lrc_irq_handler, (unsigned long)engine);
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index 8d35a39..fbd3f12 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -2254,6 +2254,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
>   	INIT_LIST_HEAD(&engine->request_list);
>   	INIT_LIST_HEAD(&engine->execlist_queue);
>   	INIT_LIST_HEAD(&engine->buffers);
> +	spin_lock_init(&engine->fence_lock);
>   	i915_gem_batch_pool_init(dev, &engine->batch_pool);
>   	memset(engine->semaphore.sync_seqno, 0,
>   	       sizeof(engine->semaphore.sync_seqno));
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index b33c876..3f39daf 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -345,6 +345,8 @@ struct intel_engine_cs {
>   	 * to encode the command length in the header).
>   	 */
>   	u32 (*get_cmd_length_mask)(u32 cmd_header);
> +
> +	spinlock_t fence_lock;

Why is this lock per-engine, and not for example per timeline? Aren't 
fences living completely isolated in their per-context-per-engine 
domains? So presumably there is something somewhere which is shared 
outside that domain to need a lock at the engine level?
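
I.e. roughly (hypothetical; this field does not exist in the patch):

	struct i915_fence_timeline {
		...
		spinlock_t lock;  /* protects just this timeline's fences */
	};

with fence_init() then being given &timeline->lock instead of
&engine->fence_lock.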

>   };
>
>   static inline bool
>

Regards,

Tvrtko

* ✗ Ro.CI.BAT: failure for Convert requests to use struct fence (rev6)
  2016-06-01 17:07 [PATCH v9 0/6] Convert requests to use struct fence John.C.Harrison
                   ` (5 preceding siblings ...)
  2016-06-01 17:07 ` [PATCH v9 6/6] drm/i915: Cache last IRQ seqno to reduce IRQ overhead John.C.Harrison
@ 2016-06-02 11:17 ` Patchwork
  6 siblings, 0 replies; 26+ messages in thread
From: Patchwork @ 2016-06-02 11:17 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: intel-gfx

== Series Details ==

Series: Convert requests to use struct fence (rev6)
URL   : https://patchwork.freedesktop.org/series/1068/
State : failure

== Summary ==

Series 1068v6 Convert requests to use struct fence
http://patchwork.freedesktop.org/api/1.0/series/1068/revisions/6/mbox

Test drv_getparams_basic:
        Subgroup basic-subslice-total:
                dmesg-warn -> PASS       (ro-ivb2-i7-3770)
Test drv_module_reload_basic:
                pass       -> DMESG-WARN (ro-skl-i7-6700hq)
                pass       -> DMESG-WARN (ro-bdw-i5-5250u)
Test gem_busy:
        Subgroup basic-blt:
                pass       -> DMESG-WARN (ro-ivb2-i7-3770)
        Subgroup basic-bsd:
                pass       -> DMESG-WARN (ro-ivb2-i7-3770)
        Subgroup basic-parallel-blt:
                pass       -> DMESG-WARN (ro-ivb2-i7-3770)
        Subgroup basic-parallel-bsd:
                pass       -> DMESG-WARN (ro-ivb2-i7-3770)
        Subgroup basic-parallel-render:
                pass       -> DMESG-WARN (ro-ivb2-i7-3770)
        Subgroup basic-render:
                pass       -> DMESG-WARN (ro-ivb2-i7-3770)
Test gem_close_race:
        Subgroup basic-threads:
                pass       -> DMESG-WARN (ro-skl-i7-6700hq)
Test gem_cs_tlb:
        Subgroup basic-default:
                pass       -> DMESG-WARN (ro-hsw-i7-4770r)
                pass       -> DMESG-WARN (ro-snb-i7-2620M)
Test gem_ctx_switch:
        Subgroup basic-default:
                pass       -> DMESG-WARN (ro-skl-i7-6700hq)
Test gem_exec_basic:
        Subgroup gtt-blt:
                dmesg-warn -> PASS       (ro-skl-i7-6700hq)
        Subgroup readonly-default:
                dmesg-warn -> PASS       (ro-ivb2-i7-3770)
Test gem_exec_flush:
        Subgroup basic-batch-kernel-default-uc:
                pass       -> DMESG-WARN (ro-bdw-i7-5600u)
                pass       -> DMESG-WARN (ro-skl-i7-6700hq)
        Subgroup basic-batch-kernel-default-wb:
                dmesg-warn -> PASS       (ro-skl-i7-6700hq)
                pass       -> DMESG-WARN (ro-hsw-i3-4010u)
        Subgroup basic-uc-ro-default:
                pass       -> DMESG-WARN (ro-ivb-i7-3770)
        Subgroup basic-uc-set-default:
                pass       -> DMESG-WARN (ro-byt-n2820)
        Subgroup basic-wb-ro-default:
                pass       -> DMESG-WARN (ro-skl-i7-6700hq)
        Subgroup basic-wb-set-default:
                dmesg-warn -> PASS       (ro-skl-i7-6700hq)
Test gem_exec_gttfill:
        Subgroup basic:
                pass       -> DMESG-FAIL (ro-byt-n2820)
Test gem_exec_nop:
        Subgroup basic:
                pass       -> DMESG-WARN (ro-byt-n2820)
                pass       -> DMESG-WARN (ro-snb-i7-2620M)
                pass       -> DMESG-FAIL (ro-bdw-i7-5600u)
                pass       -> DMESG-WARN (ro-hsw-i3-4010u)
Test gem_exec_parallel:
        Subgroup basic:
                dmesg-warn -> PASS       (ro-ivb2-i7-3770)
                pass       -> DMESG-WARN (ro-snb-i7-2620M)
                pass       -> DMESG-WARN (ro-hsw-i7-4770r)
Test gem_exec_parse:
        Subgroup basic-allowed:
                pass       -> DMESG-WARN (ro-byt-n2820)
Test gem_exec_store:
        Subgroup basic-all:
                pass       -> DMESG-WARN (ro-bdw-i7-5600u)
                pass       -> DMESG-WARN (ro-hsw-i3-4010u)
        Subgroup basic-bsd:
                dmesg-warn -> PASS       (ro-ivb2-i7-3770)
Test gem_linear_blits:
        Subgroup basic:
                pass       -> DMESG-WARN (ro-skl-i7-6700hq)
                pass       -> DMESG-WARN (ro-ilk1-i5-650)
Test gem_mmap_gtt:
        Subgroup basic-small-copy:
                dmesg-warn -> PASS       (ro-skl-i7-6700hq)
        Subgroup basic-write:
                dmesg-warn -> PASS       (ro-skl-i7-6700hq)
        Subgroup basic-write-gtt-no-prefault:
                dmesg-warn -> PASS       (ro-skl-i7-6700hq)
Test gem_pread:
        Subgroup basic:
                pass       -> DMESG-WARN (ro-skl-i7-6700hq)
Test gem_render_linear_blits:
        Subgroup basic:
                dmesg-warn -> PASS       (ro-ivb2-i7-3770)
                pass       -> DMESG-WARN (ro-bdw-i5-5250u)
Test gem_render_tiled_blits:
        Subgroup basic:
                pass       -> DMESG-WARN (ro-ivb2-i7-3770)
                pass       -> DMESG-WARN (ro-bdw-i7-5600u)
                pass       -> DMESG-WARN (ro-snb-i7-2620M)
                pass       -> DMESG-WARN (ro-hsw-i7-4770r)
Test gem_storedw_loop:
WARNING: Long output truncated
ro-bdw-i7-5557U failed to connect after reboot

Results at /archive/results/CI_IGT_test/RO_Patchwork_1077/

96c5c4d drm-intel-nightly: 2016y-06m-02d-08h-26m-32s UTC integration manifest
f93b682 drm/i915: Cache last IRQ seqno to reduce IRQ overhead
bd5b584 drm/i915: Updated request structure tracing
0996a5c drm/i915: Interrupt driven fences
584a254 drm/i915: Removed now redundant parameter to i915_gem_request_completed()
87e13c4 drm/i915: Convert requests to use struct fence
de97221 drm/i915: Add per context timelines for fence objects


* Re: [PATCH v9 4/6] drm/i915: Interrupt driven fences
  2016-06-01 17:07 ` [PATCH v9 4/6] drm/i915: Interrupt driven fences John.C.Harrison
@ 2016-06-02 13:25   ` Tvrtko Ursulin
  2016-06-07 12:02     ` Maarten Lankhorst
  0 siblings, 1 reply; 26+ messages in thread
From: Tvrtko Ursulin @ 2016-06-02 13:25 UTC (permalink / raw)
  To: John.C.Harrison, Intel-GFX


On 01/06/16 18:07, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> The intended usage model for struct fence is that the signalled status
> should be set on demand rather than polled. That is, there should not
> be a need for a 'signaled' function to be called every time the status
> is queried. Instead, 'something' should be done to enable a signal
> callback from the hardware which will update the state directly. In
> the case of requests, this is the seqno update interrupt. The idea is
> that this callback will only be enabled on demand when something
> actually tries to wait on the fence.
>
> This change removes the polling test and replaces it with the callback
> scheme. Each fence is added to a 'please poke me' list at the start of
> i915_add_request(). The interrupt handler then scans through the 'poke
> me' list when a new seqno pops out and signals any matching
> fence/request. The fence is then removed from the list so the entire
> request stack does not need to be scanned every time. Note that the
> fence is added to the list before the commands to generate the seqno
> interrupt are added to the ring. Thus the sequence is guaranteed to be
> race free if the interrupt is already enabled.
>
> Note that the interrupt is only enabled on demand (i.e. when
> __wait_request() is called). Thus there is still a potential race when
> enabling the interrupt as the request may already have completed.
> However, this is simply solved by calling the interrupt processing
> code immediately after enabling the interrupt and thereby checking for
> already completed requests.
>
> Lastly, the ring clean up code has the possibility to cancel
> outstanding requests (e.g. because TDR has reset the ring). These
> requests will never get signalled and so must be removed from the
> signal list manually. This is done by setting a 'cancelled' flag and
> then calling the regular notify/retire code path rather than
> attempting to duplicate the list manipulation and clean up code in
> multiple places. This also avoids any race condition where the
> cancellation request might occur after/during the completion interrupt
> actually arriving.
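
So the enable path boils down to something like this (condensed sketch
of what the patch below does; the real code also handles the
fake-interrupt test mode):

	if (!req->irq_enabled && req->engine->irq_get(req->engine))
		req->irq_enabled = true;

	/* The seqno may have landed before the interrupt was enabled,
	 * so explicitly check for an already completed request now.
	 */
	i915_gem_request_notify(req->engine, fence_locked);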
>
> v2: Updated to take advantage of the request unreference no longer
> requiring the mutex lock.
>
> v3: Move the signal list processing around to prevent unsubmitted
> requests being added to the list. This was occurring on Android
> because the native sync implementation calls the
> fence->enable_signalling API immediately on fence creation.
>
> Updated after review comments by Tvrtko Ursulin. Renamed list nodes to
> 'link' instead of 'list'. Added support for returning an error code on
> a cancelled fence. Update list processing to be more efficient/safer
> with respect to spinlocks.
>
> v5: Made i915_gem_request_submit a static as it is only ever called
> from one place.
>
> Fixed up the low latency wait optimisation. The time delay between the
> seqno value being to memory and the drive's ISR running can be
> significant, at least for the wait request micro-benchmark. This can
> be greatly improved by explicitly checking for seqno updates in the
> pre-wait busy poll loop. Also added some documentation comments to the
> busy poll code.
>
> Fixed up support for the faking of lost interrupts
> (test_irq_rings/missed_irq_rings). That is, there is an IGT test that
> tells the driver to lose interrupts deliberately and then checks that
> everything still works as expected (albeit much slower).
>
> Updates from review comments: use non IRQ-save spinlocking, early exit
> on WARN and improved comments (Tvrtko Ursulin).
>
> v6: Updated to newer nightly and resolved conflicts around the
> wait_request busy spin optimisation. Also fixed a race condition
> between this early exit path and the regular completion path.
>
> v7: Updated to newer nightly - lots of ring -> engine renaming plus an
> interface change on get_seqno(). Also added a list_empty() check
> before acquring spinlocks and doing list processing.
>
> v8: Updated to newer nightly - changes to request clean up code mean
> none of the deferred free mess is needed any more.
>
> v9: Moved the request completion processing out of the interrupt
> handler and into a worker thread (Chris Wilson).
>
> For: VIZ-5190
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> ---
>   drivers/gpu/drm/i915/i915_dma.c         |   9 +-
>   drivers/gpu/drm/i915/i915_drv.h         |  11 ++
>   drivers/gpu/drm/i915/i915_gem.c         | 248 +++++++++++++++++++++++++++++---
>   drivers/gpu/drm/i915/i915_irq.c         |   2 +
>   drivers/gpu/drm/i915/intel_lrc.c        |   5 +
>   drivers/gpu/drm/i915/intel_ringbuffer.c |   5 +
>   drivers/gpu/drm/i915/intel_ringbuffer.h |   3 +
>   7 files changed, 260 insertions(+), 23 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
> index 07edaed..f8f60bb 100644
> --- a/drivers/gpu/drm/i915/i915_dma.c
> +++ b/drivers/gpu/drm/i915/i915_dma.c
> @@ -1019,9 +1019,13 @@ static int i915_workqueues_init(struct drm_i915_private *dev_priv)
>   	if (dev_priv->wq == NULL)
>   		goto out_err;
>
> +	dev_priv->req_wq = alloc_ordered_workqueue("i915-rq", 0);
> +	if (dev_priv->req_wq == NULL)
> +		goto out_free_wq;
> +

A single (per-device) ordered workqueue will serialize interrupt 
processing across all engines into one thread. Together with the fact 
that the request worker does not seem to need a sleeping context, I am 
thinking that a tasklet per engine would be much better (see 
engine->irq_tasklet for an example).
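
I.e. roughly (sketch; the tasklet field and handler names are made up):

	/* At engine init time: */
	tasklet_init(&engine->request_tasklet,
		     i915_gem_request_notify_tasklet,
		     (unsigned long)engine);

	/* And in notify_ring(), instead of queue_work(): */
	tasklet_hi_schedule(&engine->request_tasklet);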

>   	dev_priv->hotplug.dp_wq = alloc_ordered_workqueue("i915-dp", 0);
>   	if (dev_priv->hotplug.dp_wq == NULL)
> -		goto out_free_wq;
> +		goto out_free_req_wq;
>
>   	dev_priv->gpu_error.hangcheck_wq =
>   		alloc_ordered_workqueue("i915-hangcheck", 0);
> @@ -1032,6 +1036,8 @@ static int i915_workqueues_init(struct drm_i915_private *dev_priv)
>
>   out_free_dp_wq:
>   	destroy_workqueue(dev_priv->hotplug.dp_wq);
> +out_free_req_wq:
> +	destroy_workqueue(dev_priv->req_wq);
>   out_free_wq:
>   	destroy_workqueue(dev_priv->wq);
>   out_err:
> @@ -1044,6 +1050,7 @@ static void i915_workqueues_cleanup(struct drm_i915_private *dev_priv)
>   {
>   	destroy_workqueue(dev_priv->gpu_error.hangcheck_wq);
>   	destroy_workqueue(dev_priv->hotplug.dp_wq);
> +	destroy_workqueue(dev_priv->req_wq);
>   	destroy_workqueue(dev_priv->wq);
>   }
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 69c3412..5a7f256 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -1851,6 +1851,9 @@ struct drm_i915_private {
>   	 */
>   	struct workqueue_struct *wq;
>
> +	/* Work queue for request completion processing */
> +	struct workqueue_struct *req_wq;
> +
>   	/* Display functions */
>   	struct drm_i915_display_funcs display;
>
> @@ -2359,6 +2362,10 @@ struct drm_i915_gem_request {
>   	 */
>   	struct fence fence;
>   	struct rcu_head rcu_head;
> +	struct list_head signal_link;
> +	bool cancelled;
> +	bool irq_enabled;
> +	bool signal_requested;
>
>   	/** On Which ring this request was generated */
>   	struct drm_i915_private *i915;
> @@ -2460,6 +2467,10 @@ struct drm_i915_gem_request {
>   struct drm_i915_gem_request * __must_check
>   i915_gem_request_alloc(struct intel_engine_cs *engine,
>   		       struct i915_gem_context *ctx);
> +void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req,
> +				       bool fence_locked);
> +void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked);
> +void i915_gem_request_worker(struct work_struct *work);
>
>   static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
>   {
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 97e3138..83cf9b0 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -39,6 +39,8 @@
>   #include <linux/pci.h>
>   #include <linux/dma-buf.h>
>
> +static void i915_gem_request_submit(struct drm_i915_gem_request *req);
> +
>   static void i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj);
>   static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj);
>   static void
> @@ -1237,9 +1239,8 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>   {
>   	struct intel_engine_cs *engine = i915_gem_request_get_engine(req);
>   	struct drm_i915_private *dev_priv = req->i915;
> -	const bool irq_test_in_progress =
> -		ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) & intel_engine_flag(engine);
>   	int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
> +	uint32_t seqno;
>   	DEFINE_WAIT(wait);
>   	unsigned long timeout_expire;
>   	s64 before = 0; /* Only to silence a compiler warning. */
> @@ -1247,9 +1248,6 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>
>   	WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled");
>
> -	if (list_empty(&req->list))
> -		return 0;
> -
>   	if (i915_gem_request_completed(req))
>   		return 0;
>
> @@ -1275,15 +1273,17 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>   	trace_i915_gem_request_wait_begin(req);
>
>   	/* Optimistic spin for the next jiffie before touching IRQs */
> -	ret = __i915_spin_request(req, state);
> -	if (ret == 0)
> -		goto out;
> -
> -	if (!irq_test_in_progress && WARN_ON(!engine->irq_get(engine))) {
> -		ret = -ENODEV;
> -		goto out;
> +	if (req->seqno) {

This needs a comment I think, because it is so unusual and new that 
req->seqno == 0 is a special path, to explain why and how it can happen 
here.

> +		ret = __i915_spin_request(req, state);
> +		if (ret == 0)
> +			goto out;
>   	}
>
> +	/*
> +	 * Enable interrupt completion of the request.
> +	 */
> +	fence_enable_sw_signaling(&req->fence);
> +
>   	for (;;) {
>   		struct timer_list timer;
>
> @@ -1306,6 +1306,21 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>   			break;
>   		}
>
> +		if (req->seqno) {
> +			/*
> +			 * There is quite a lot of latency in the user interrupt
> +			 * path. So do an explicit seqno check and potentially
> +			 * remove all that delay.
> +			 */
> +			if (req->engine->irq_seqno_barrier)
> +				req->engine->irq_seqno_barrier(req->engine);
> +			seqno = engine->get_seqno(engine);
> +			if (i915_seqno_passed(seqno, req->seqno)) {
> +				ret = 0;
> +				break;
> +			}
> +		}
> +
>   		if (signal_pending_state(state, current)) {
>   			ret = -ERESTARTSYS;
>   			break;
> @@ -1332,14 +1347,32 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>   			destroy_timer_on_stack(&timer);
>   		}
>   	}
> -	if (!irq_test_in_progress)
> -		engine->irq_put(engine);
>
>   	finish_wait(&engine->irq_queue, &wait);

Hm, I don't understand why our custom waiting remains. Shouldn't 
fence_wait just be called after the optimistic spin, more or less?
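
I.e. something along these lines (sketch; timeout_jiffies standing in
for the existing expiry converted to jiffies):

	long remaining;

	remaining = fence_wait_timeout(&req->fence, interruptible,
				       timeout_jiffies);
	if (remaining < 0)
		ret = remaining;	/* e.g. -ERESTARTSYS */
	else if (remaining == 0)
		ret = -ETIME;
	else
		ret = 0;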

>
>   out:
>   	trace_i915_gem_request_wait_end(req);
>
> +	if ((ret == 0) && (req->seqno)) {
> +		if (req->engine->irq_seqno_barrier)
> +			req->engine->irq_seqno_barrier(req->engine);
> +		seqno = engine->get_seqno(engine);
> +		if (i915_seqno_passed(seqno, req->seqno) &&
> +		    !i915_gem_request_completed(req)) {
> +			/*
> +			 * Make sure the request is marked as completed before
> +			 * returning. NB: Need to acquire the spinlock around
> +			 * the whole call to avoid a race condition with the
> +			 * interrupt handler is running concurrently and could
> +			 * cause this invocation to early exit even though the
> +			 * request has not actually been fully processed yet.
> +			 */
> +			spin_lock_irq(&req->engine->fence_lock);
> +			i915_gem_request_notify(req->engine, true);
> +			spin_unlock_irq(&req->engine->fence_lock);
> +		}
> +	}
> +
>   	if (timeout) {
>   		s64 tres = *timeout - (ktime_get_raw_ns() - before);
>
> @@ -1405,6 +1438,11 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
>   {
>   	trace_i915_gem_request_retire(request);
>
> +	if (request->irq_enabled) {
> +		request->engine->irq_put(request->engine);
> +		request->irq_enabled = false;

What protects request->irq_enabled here, versus the enable_signaling 
path? The latter can be called from external fence users, which would 
take the fence_lock, but here it is not held.

> +	}

> +
>   	/* We know the GPU must have read the request to have
>   	 * sent us the seqno + interrupt, so use the position
>   	 * of tail of the request to update the last known position
> @@ -1418,6 +1456,22 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
>   	list_del_init(&request->list);
>   	i915_gem_request_remove_from_client(request);
>
> +	/*
> +	 * In case the request is still in the signal pending list,
> +	 * e.g. due to being cancelled by TDR, preemption, etc.
> +	 */
> +	if (!list_empty(&request->signal_link)) {

No locking required here?

> +		/*
> +		 * The request must be marked as cancelled and the underlying
> +		 * fence as failed. NB: There is no explicit fence fail API,
> +		 * there is only a manual poke and signal.
> +		 */
> +		request->cancelled = true;
> +		/* How to propagate to any associated sync_fence??? */
> +		request->fence.status = -EIO;
> +		fence_signal_locked(&request->fence);

And here?

> +	}
> +
>   	if (request->previous_context) {
>   		if (i915.enable_execlists)
>   			intel_lr_context_unpin(request->previous_context,
> @@ -2670,6 +2724,12 @@ void __i915_add_request(struct drm_i915_gem_request *request,
>   	 */
>   	request->postfix = intel_ring_get_tail(ringbuf);
>
> +	/*
> +	 * Add the fence to the pending list before emitting the commands to
> +	 * generate a seqno notification interrupt.
> +	 */
> +	i915_gem_request_submit(request);
> +
>   	if (i915.enable_execlists)
>   		ret = engine->emit_request(request);
>   	else {
> @@ -2755,25 +2815,154 @@ static void i915_gem_request_free(struct fence *req_fence)
>   	struct drm_i915_gem_request *req;
>
>   	req = container_of(req_fence, typeof(*req), fence);
> +
> +	WARN_ON(req->irq_enabled);

How useful is this? If it went wrong, engine irq reference counting 
would be bad. Okay, no one would notice, but we could then stick some 
other WARNs here, like !list_empty(req->list) and who knows what, which 
we don't have, so I am just wondering if this one brings any value.

> +
>   	call_rcu(&req->rcu_head, i915_gem_request_free_rcu);
>   }
>
> -static bool i915_gem_request_enable_signaling(struct fence *req_fence)
> +/*
> + * The request is about to be submitted to the hardware so add the fence to
> + * the list of signalable fences.
> + *
> + * NB: This does not necessarily enable interrupts yet. That only occurs on
> + * demand when the request is actually waited on. However, adding it to the
> + * list early ensures that there is no race condition where the interrupt
> + * could pop out prematurely and thus be completely lost. The race is merely
> + * that the interrupt must be manually checked for after being enabled.
> + */
> +static void i915_gem_request_submit(struct drm_i915_gem_request *req)
>   {
> -	/* Interrupt driven fences are not implemented yet.*/
> -	WARN(true, "This should not be called!");
> -	return true;
> +	/*
> +	 * Always enable signal processing for the request's fence object
> +	 * before that request is submitted to the hardware. Thus there is no
> +	 * race condition whereby the interrupt could pop out before the
> +	 * request has been added to the signal list. Hence no need to check
> +	 * for completion, undo the list add and return false.
> +	 */
> +	i915_gem_request_reference(req);
> +	spin_lock_irq(&req->engine->fence_lock);
> +	WARN_ON(!list_empty(&req->signal_link));
> +	list_add_tail(&req->signal_link, &req->engine->fence_signal_list);
> +	spin_unlock_irq(&req->engine->fence_lock);
> +
> +	/*
> +	 * NB: Interrupts are only enabled on demand. Thus there is still a
> +	 * race where the request could complete before the interrupt has
> +	 * been enabled. Thus care must be taken at that point.
> +	 */
> +
> +	/* Have interrupts already been requested? */
> +	if (req->signal_requested)
> +		i915_gem_request_enable_interrupt(req, false);

I am thinking that the fence lock could be held here until the end of 
the function, and in that way i915_gem_request_enable_interrupt would 
not need the fence_locked parameter any more.

It would probably also be safer with regards to accessing 
req->signal_requested. I am not sure that enable_signaling and this 
path otherwise can't race and miss signal_requested getting set?
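
Roughly (untested sketch):

	i915_gem_request_reference(req);
	spin_lock_irq(&req->engine->fence_lock);
	WARN_ON(!list_empty(&req->signal_link));
	list_add_tail(&req->signal_link, &req->engine->fence_signal_list);
	if (req->signal_requested)
		i915_gem_request_enable_interrupt(req);
	spin_unlock_irq(&req->engine->fence_lock);

with i915_gem_request_enable_interrupt then always expecting the caller 
to hold fence_lock.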

> +}
> +
> +/*
> + * The request is being actively waited on, so enable interrupt based
> + * completion signalling.
> + */
> +void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req,
> +				       bool fence_locked)
> +{
> +	struct drm_i915_private *dev_priv = req->engine->i915;
> +	const bool irq_test_in_progress =
> +		ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) &
> +						intel_engine_flag(req->engine);
> +
> +	if (req->irq_enabled)
> +		return;
> +
> +	if (irq_test_in_progress)
> +		return;
> +
> +	if (!WARN_ON(!req->engine->irq_get(req->engine)))
> +		req->irq_enabled = true;

The double negation confused me a bit. It is probably not ideal since 
WARN_ONs go to the out of line section so in a way it is deliberately 
penalising the fast and expected path. I think it would be better to put 
a WARN on the else path.
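
i.e. something like:

	if (req->engine->irq_get(req->engine))
		req->irq_enabled = true;
	else
		WARN(1, "irq_get() failed!\n");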

> +
> +	/*
> +	 * Because the interrupt is only enabled on demand, there is a race
> +	 * where the interrupt can fire before anyone is looking for it. So
> +	 * do an explicit check for missed interrupts.
> +	 */
> +	i915_gem_request_notify(req->engine, fence_locked);
>   }
>
> -static bool i915_gem_request_is_completed(struct fence *req_fence)
> +static bool i915_gem_request_enable_signaling(struct fence *req_fence)
>   {
>   	struct drm_i915_gem_request *req = container_of(req_fence,
>   						 typeof(*req), fence);
> +
> +	/*
> +	 * No need to actually enable interrupt based processing until the
> +	 * request has been submitted to the hardware. At which point
> +	 * 'i915_gem_request_submit()' is called. So only really enable
> +	 * signalling in there. Just set a flag to say that interrupts are
> +	 * wanted when the request is eventually submitted. On the other hand
> +	 * if the request has already been submitted then interrupts do need
> +	 * to be enabled now.
> +	 */
> +
> +	req->signal_requested = true;
> +
> +	if (!list_empty(&req->signal_link))

In what scenarios is the list_empty check needed? Someone can somehow 
enable signalling on a fence not yet submitted?

> +		i915_gem_request_enable_interrupt(req, true);
> +
> +	return true;
> +}
> +
> +/**
> + * i915_gem_request_worker - request work handler callback.
> + * @work: Work structure
> + * Called in response to a seqno interrupt to process the completed requests.
> + */
> +void i915_gem_request_worker(struct work_struct *work)
> +{
> +	struct intel_engine_cs *engine;
> +
> +	engine = container_of(work, struct intel_engine_cs, request_work);
> +	i915_gem_request_notify(engine, false);
> +}
> +
> +void i915_gem_request_notify(struct intel_engine_cs *engine, bool fence_locked)
> +{
> +	struct drm_i915_gem_request *req, *req_next;
> +	unsigned long flags;
>   	u32 seqno;
>
> -	seqno = req->engine->get_seqno(req->engine);
> +	if (list_empty(&engine->fence_signal_list))

Okay, this without the lock still makes me nervous. I'd rather not have 
to think about why it is safe and cannot miss a wakeup.

Also, I would be leaning toward having i915_gem_request_notify and 
i915_gem_request_notify__unlocked. With the enable_interrupts 
simplification I suggested, I think it would look better and be more 
consistent with the rest of the driver.
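
Something like this (sketch), with the locked variant then asserting 
that fence_lock is held:

	void i915_gem_request_notify__unlocked(struct intel_engine_cs *engine)
	{
		spin_lock_irq(&engine->fence_lock);
		i915_gem_request_notify(engine);
		spin_unlock_irq(&engine->fence_lock);
	}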

> +		return;
> +
> +	if (!fence_locked)
> +		spin_lock_irqsave(&engine->fence_lock, flags);

Not called from hard irq context so can be just spin_lock_irq.

But if you agree to go with the tasklet it would then be spin_lock_bh.

>
> -	return i915_seqno_passed(seqno, req->seqno);
> +	if (engine->irq_seqno_barrier)
> +		engine->irq_seqno_barrier(engine);
> +	seqno = engine->get_seqno(engine);
> +
> +	list_for_each_entry_safe(req, req_next, &engine->fence_signal_list, signal_link) {
> +		if (!req->cancelled) {
> +			if (!i915_seqno_passed(seqno, req->seqno))
> +				break;

Merge to one if statement?
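
i.e.:

	if (!req->cancelled && !i915_seqno_passed(seqno, req->seqno))
		break;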

> +		}
> +
> +		/*
> +		 * Start by removing the fence from the signal list otherwise
> +		 * the retire code can run concurrently and get confused.
> +		 */
> +		list_del_init(&req->signal_link);
> +
> +		if (!req->cancelled)
> +			fence_signal_locked(&req->fence);

I forget how signalling errors to userspace works. Does that still work 
for cancelled fences in this series?

> +
> +		if (req->irq_enabled) {
> +			req->engine->irq_put(req->engine);
> +			req->irq_enabled = false;
> +		}
> +
> +		i915_gem_request_unreference(req);
> +	}
> +
> +	if (!fence_locked)
> +		spin_unlock_irqrestore(&engine->fence_lock, flags);
>   }
>
>   static const char *i915_gem_request_get_driver_name(struct fence *req_fence)
> @@ -2816,7 +3005,6 @@ static void i915_gem_request_fence_value_str(struct fence *req_fence,
>
>   static const struct fence_ops i915_gem_request_fops = {
>   	.enable_signaling	= i915_gem_request_enable_signaling,
> -	.signaled		= i915_gem_request_is_completed,
>   	.wait			= fence_default_wait,
>   	.release		= i915_gem_request_free,
>   	.get_driver_name	= i915_gem_request_get_driver_name,
> @@ -2902,6 +3090,7 @@ __i915_gem_request_alloc(struct intel_engine_cs *engine,
>   	req->ctx  = ctx;
>   	i915_gem_context_reference(req->ctx);
>
> +	INIT_LIST_HEAD(&req->signal_link);
>   	fence_init(&req->fence, &i915_gem_request_fops, &engine->fence_lock,
>   		   ctx->engine[engine->id].fence_timeline.fence_context,
>   		   i915_fence_timeline_get_next_seqno(&ctx->engine[engine->id].fence_timeline));
> @@ -3036,6 +3225,13 @@ static void i915_gem_reset_engine_cleanup(struct drm_i915_private *dev_priv,
>   		i915_gem_request_retire(request);
>   	}
>
> +	/*
> +	 * Tidy up anything left over. This includes a call to
> +	 * i915_gem_request_notify() which will make sure that any requests
> +	 * that were on the signal pending list get also cleaned up.
> +	 */
> +	i915_gem_retire_requests_ring(engine);

Hmm.. but this function has just walked the same lists that this will, 
and done the same processing. Why call it from here? It looks bad to 
me; the two are different special cases of a similar thing, so I can't 
see that calling this from here makes sense.

> +
>   	/* Having flushed all requests from all queues, we know that all
>   	 * ringbuffers must now be empty. However, since we do not reclaim
>   	 * all space when retiring the request (to prevent HEADs colliding
> @@ -3082,6 +3278,13 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *engine)
>   {
>   	WARN_ON(i915_verify_lists(engine->dev));
>
> +	/*
> +	 * If no-one has waited on a request recently then interrupts will
> +	 * not have been enabled and thus no requests will ever be marked as
> +	 * completed. So do an interrupt check now.
> +	 */
> +	i915_gem_request_notify(engine, false);

Would it work to signal the fence from the existing loop a bit above in 
this function, which already walks the request list in search of 
completed ones? Or maybe even in i915_gem_request_retire?

I am thinking about doing less list walking and better integration with 
the core GEM. Downside would be more traffic on the fence_lock, hmm.. 
not sure then. It just looks a bit bolted on like this.

I don't see it being a noticeable cost so perhaps it can stay like this 
for now.

> +
>   	/* Retire requests first as we use it above for the early return.
>   	 * If we retire requests last, we may use a later seqno and so clear
>   	 * the requests lists without clearing the active list, leading to
> @@ -5102,6 +5305,7 @@ init_engine_lists(struct intel_engine_cs *engine)
>   {
>   	INIT_LIST_HEAD(&engine->active_list);
>   	INIT_LIST_HEAD(&engine->request_list);
> +	INIT_LIST_HEAD(&engine->fence_signal_list);
>   }
>
>   void
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index f780421..a87a3c5 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -994,6 +994,8 @@ static void notify_ring(struct intel_engine_cs *engine)
>   	trace_i915_gem_request_notify(engine);
>   	engine->user_interrupts++;
>
> +	queue_work(engine->i915->req_wq, &engine->request_work);
> +
>   	wake_up_all(&engine->irq_queue);

Yes, that is the weird part: why does the engine->irq_queue have to 
remain with this patch?

>   }
>
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index f126bcb..134759d 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -1879,6 +1879,8 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *engine)
>
>   	dev_priv = engine->i915;
>
> +	cancel_work_sync(&engine->request_work);
> +
>   	if (engine->buffer) {
>   		intel_logical_ring_stop(engine);
>   		WARN_ON((I915_READ_MODE(engine) & MODE_IDLE) == 0);
> @@ -2027,6 +2029,7 @@ logical_ring_setup(struct drm_device *dev, enum intel_engine_id id)
>
>   	INIT_LIST_HEAD(&engine->active_list);
>   	INIT_LIST_HEAD(&engine->request_list);
> +	INIT_LIST_HEAD(&engine->fence_signal_list);
>   	INIT_LIST_HEAD(&engine->buffers);
>   	INIT_LIST_HEAD(&engine->execlist_queue);
>   	spin_lock_init(&engine->execlist_lock);
> @@ -2035,6 +2038,8 @@ logical_ring_setup(struct drm_device *dev, enum intel_engine_id id)
>   	tasklet_init(&engine->irq_tasklet,
>   		     intel_lrc_irq_handler, (unsigned long)engine);
>
> +	INIT_WORK(&engine->request_work, i915_gem_request_worker);
> +
>   	logical_ring_init_platform_invariants(engine);
>   	logical_ring_default_vfuncs(engine);
>   	logical_ring_default_irqs(engine, info->irq_shift);
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index fbd3f12..1641096 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -2254,6 +2254,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
>   	INIT_LIST_HEAD(&engine->request_list);
>   	INIT_LIST_HEAD(&engine->execlist_queue);
>   	INIT_LIST_HEAD(&engine->buffers);
> +	INIT_LIST_HEAD(&engine->fence_signal_list);
>   	spin_lock_init(&engine->fence_lock);
>   	i915_gem_batch_pool_init(dev, &engine->batch_pool);
>   	memset(engine->semaphore.sync_seqno, 0,
> @@ -2261,6 +2262,8 @@ static int intel_init_ring_buffer(struct drm_device *dev,
>
>   	init_waitqueue_head(&engine->irq_queue);
>
> +	INIT_WORK(&engine->request_work, i915_gem_request_worker);
> +
>   	ringbuf = intel_engine_create_ringbuffer(engine, 32 * PAGE_SIZE);
>   	if (IS_ERR(ringbuf)) {
>   		ret = PTR_ERR(ringbuf);
> @@ -2307,6 +2310,8 @@ void intel_cleanup_engine(struct intel_engine_cs *engine)
>
>   	dev_priv = engine->i915;
>
> +	cancel_work_sync(&engine->request_work);
> +
>   	if (engine->buffer) {
>   		intel_stop_engine(engine);
>   		WARN_ON(!IS_GEN2(dev_priv) && (I915_READ_MODE(engine) & MODE_IDLE) == 0);
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 3f39daf..51779b4 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -347,6 +347,9 @@ struct intel_engine_cs {
>   	u32 (*get_cmd_length_mask)(u32 cmd_header);
>
>   	spinlock_t fence_lock;
> +	struct list_head fence_signal_list;
> +
> +	struct work_struct request_work;
>   };
>
>   static inline bool
>

Regards,

Tvrtko

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v9 1/6] drm/i915: Add per context timelines for fence objects
  2016-06-01 17:07 ` [PATCH v9 1/6] drm/i915: Add per context timelines for fence objects John.C.Harrison
  2016-06-02 10:28   ` Tvrtko Ursulin
@ 2016-06-07 11:17   ` Maarten Lankhorst
  2016-06-09 17:22     ` John Harrison
  1 sibling, 1 reply; 26+ messages in thread
From: Maarten Lankhorst @ 2016-06-07 11:17 UTC (permalink / raw)
  To: John.C.Harrison, Intel-GFX

On 01-06-16 at 19:07, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> The purpose of this patch series is to convert the request structure to
> use fence objects for the underlying completion tracking. The fence
> object requires a sequence number. The ultimate aim is to use the same
> sequence number as for the request itself (or rather, to remove the
> request's seqno field and just use the fence's value throughout the
> driver). However, this is not currently possible and so this patch
> introduces a separate numbering scheme as an intermediate step.
>
> A major advantage of using the fence object is that it can be passed
> outside of the i915 driver and used externally. The fence API allows
> for various operations such as combining multiple fences. This
> requires that fence seqnos within a single fence context be guaranteed
> in-order. The GPU scheduler that is coming can re-order request
> execution but not within a single GPU context. Thus the fence context
> must be tied to the i915 context (and the engine within the context as
> each engine runs asynchronously).
>
> On the other hand, the driver as a whole currently only works with
> request seqnos that are allocated from a global in-order timeline. It
> will require a fair chunk of re-work to allow multiple independent
> seqno timelines to be used. Hence the introduction of a temporary,
> fence specific timeline. Once the work to update the rest of the
> driver has been completed then the request can use the fence seqno
> instead.
>
> v2: New patch in series.
>
> v3: Renamed/retyped timeline structure fields after review comments by
> Tvrtko Ursulin.
>
> Added context information to the timeline's name string for better
> identification in debugfs output.
>
> v5: Line wrapping and other white space fixes to keep style checker
> happy.
>
> v7: Updated to newer nightly (lots of ring -> engine renaming).
>
> v8: Moved to earlier in patch series so no longer needs to remove the
> quick hack timeline that was being added before.
>
> v9: Updated to another newer nightly (changes to context structure
> naming). Also updated commit message to match previous changes.
>
> For: VIZ-5190
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h         | 14 ++++++++++++
>  drivers/gpu/drm/i915/i915_gem.c         | 40 +++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/i915_gem_context.c | 16 +++++++++++++
>  drivers/gpu/drm/i915/intel_lrc.c        |  8 +++++++
>  4 files changed, 78 insertions(+)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 2a88a46..a5f8ad8 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -831,6 +831,19 @@ struct i915_ctx_hang_stats {
>  	bool banned;
>  };
>  
> +struct i915_fence_timeline {
> +	char        name[32];
> +	unsigned    fence_context;
Should be a u64 now, since commit 76bf0db5543976ef50362db7071da367cb118532
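i.e.:

	u64         fence_context;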
> +	unsigned    next;
> +
> +	struct i915_gem_context *ctx;
> +	struct intel_engine_cs *engine;
> +};
> +
> +int i915_create_fence_timeline(struct drm_device *dev,
> +			       struct i915_gem_context *ctx,
> +			       struct intel_engine_cs *ring);
> +
>  /* This must match up with the value previously used for execbuf2.rsvd1. */
>  #define DEFAULT_CONTEXT_HANDLE 0
>  
> @@ -875,6 +888,7 @@ struct i915_gem_context {
>  		u64 lrc_desc;
>  		int pin_count;
>  		bool initialised;
> +		struct i915_fence_timeline fence_timeline;
>  	} engine[I915_NUM_ENGINES];
>  
>  	struct list_head link;
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 5ffc6fa..57d3593 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -2743,6 +2743,46 @@ void i915_gem_request_free(struct kref *req_ref)
>  	kmem_cache_free(req->i915->requests, req);
>  }
>  
> +int i915_create_fence_timeline(struct drm_device *dev,
> +			       struct i915_gem_context *ctx,
> +			       struct intel_engine_cs *engine)
> +{
> +	struct i915_fence_timeline *timeline;
> +
> +	timeline = &ctx->engine[engine->id].fence_timeline;
> +
> +	if (timeline->engine)
> +		return 0;
Do you ever expect a reinit?
> +	timeline->fence_context = fence_context_alloc(1);
> +
> +	/*
> +	 * Start the timeline from seqno 0 as this is a special value
> +	 * that is reserved for invalid sync points.
> +	 */
> +	timeline->next       = 1;
> +	timeline->ctx        = ctx;
> +	timeline->engine     = engine;
> +
> +	snprintf(timeline->name, sizeof(timeline->name), "%d>%s:%d",
> +		 timeline->fence_context, engine->name, ctx->user_handle);
> +
> +	return 0;
> +}
> +
On top of the other comments, you might want to add a TODO comment 
noting that there should be only one timeline per context, with each 
engine getting its own unique fence context within it.
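
Something like:

	/*
	 * TODO: There should be only one fence timeline per context, with
	 * each engine getting its own fence context within it.
	 */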

~Maarten
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v9 2/6] drm/i915: Convert requests to use struct fence
  2016-06-02 11:07   ` Tvrtko Ursulin
@ 2016-06-07 11:42     ` Maarten Lankhorst
  2016-06-07 12:11       ` Tvrtko Ursulin
  2016-06-10 11:26       ` John Harrison
  0 siblings, 2 replies; 26+ messages in thread
From: Maarten Lankhorst @ 2016-06-07 11:42 UTC (permalink / raw)
  To: Tvrtko Ursulin, John.C.Harrison, Intel-GFX

On 02-06-16 at 13:07, Tvrtko Ursulin wrote:
>
> On 01/06/16 18:07, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> There is a construct in the linux kernel called 'struct fence' that is
>> intended to keep track of work that is executed on hardware. I.e. it
>> solves the basic problem that the drivers 'struct
>> drm_i915_gem_request' is trying to address. The request structure does
>> quite a lot more than simply track the execution progress so is very
>> definitely still required. However, the basic completion status side
>> could be updated to use the ready made fence implementation and gain
>> all the advantages that provides.
>>
>> This patch makes the first step of integrating a struct fence into the
>> request. It replaces the explicit reference count with that of the
>> fence. It also replaces the 'is completed' test with the fence's
>> equivalent. Currently, that simply chains on to the original request
>> implementation. A future patch will improve this.
>>
>> v3: Updated after review comments by Tvrtko Ursulin. Added fence
>> context/seqno pair to the debugfs request info. Renamed fence 'driver
>> name' to just 'i915'. Removed BUG_ONs.
>>
>> v5: Changed seqno format in debugfs to %x rather than %u as that is
>> apparently the preferred appearance. Line wrapped some long lines to
>> keep the style checker happy.
>>
>> v6: Updated to newer nightly and resolved conflicts. The biggest issue
>> was with the re-worked busy spin precursor to waiting on a request. In
>> particular, the addition of a 'request_started' helper function. This
>> has no corresponding concept within the fence framework. However, it
>> is only ever used in one place and the whole point of that place is to
>> always directly read the seqno for absolutely lowest latency possible.
>> So the simple solution is to just make the seqno test explicit at that
>> point now rather than later in the series (it was previously being
>> done anyway when fences become interrupt driven).
>>
>> v7: Rebased to newer nightly - lots of ring -> engine renaming and
>> interface change to get_seqno().
>>
>> v8: Rebased to newer nightly - no longer needs to worry about mutex
>> locking in the request free code path. Moved to after fence timeline
>> patch so no longer needs to add a horrid hack timeline.
>>
>> Removed commented out code block. Added support for possible RCU usage
>> of fence object (Review comments by Maarten Lankhorst).
>>
>> For: VIZ-5190
>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
>
> Was it an r-b or an ack from Jesse? If the former, does it need a "(v?)" suffix, depending on the amount of code changed after his r-b?
>
>> ---
>>   drivers/gpu/drm/i915/i915_debugfs.c     |   5 +-
>>   drivers/gpu/drm/i915/i915_drv.h         |  43 +++++---------
>>   drivers/gpu/drm/i915/i915_gem.c         | 101 +++++++++++++++++++++++++++++---
>>   drivers/gpu/drm/i915/intel_lrc.c        |   1 +
>>   drivers/gpu/drm/i915/intel_ringbuffer.c |   1 +
>>   drivers/gpu/drm/i915/intel_ringbuffer.h |   2 +
>>   6 files changed, 115 insertions(+), 38 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
>> index ac7e569..844cc4b 100644
>> --- a/drivers/gpu/drm/i915/i915_debugfs.c
>> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
>> @@ -767,11 +767,12 @@ static int i915_gem_request_info(struct seq_file *m, void *data)
>>               task = NULL;
>>               if (req->pid)
>>                   task = pid_task(req->pid, PIDTYPE_PID);
>> -            seq_printf(m, "    %x @ %d: %s [%d]\n",
>> +            seq_printf(m, "    %x @ %d: %s [%d], fence = %x:%x\n",
>
> In the previous patch, fence context and seqno were %d in the timeline->name, so %d would probably be more consistent here.
>
>>                      req->seqno,
>>                      (int) (jiffies - req->emitted_jiffies),
>>                      task ? task->comm : "<unknown>",
>> -                   task ? task->pid : -1);
>> +                   task ? task->pid : -1,
>> +                   req->fence.context, req->fence.seqno);
req->fence.context is 64-bits, will probably cause a compiler warning.
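It would need "%llu" (or a cast) for the context in the format string.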
>>               rcu_read_unlock();
>>           }
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> index a5f8ad8..905feae 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -42,6 +42,7 @@
>>   #include <linux/kref.h>
>>   #include <linux/pm_qos.h>
>>   #include <linux/shmem_fs.h>
>> +#include <linux/fence.h>
>>
>>   #include <drm/drmP.h>
>>   #include <drm/intel-gtt.h>
>> @@ -2353,7 +2354,11 @@ static inline struct scatterlist *__sg_next(struct scatterlist *sg)
>>    * initial reference taken using kref_init
>>    */
>>   struct drm_i915_gem_request {
>> -    struct kref ref;
>> +    /**
>> +     * Underlying object for implementing the signal/wait stuff.
>> +     */
>> +    struct fence fence;
>> +    struct rcu_head rcu_head;
fence.rcu can be used, no need to duplicate. :)
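i.e.:

	call_rcu(&req->fence.rcu, i915_gem_request_free_rcu);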
>>       /** On Which ring this request was generated */
>>       struct drm_i915_private *i915;
>> @@ -2455,7 +2460,13 @@ struct drm_i915_gem_request {
>>   struct drm_i915_gem_request * __must_check
>>   i915_gem_request_alloc(struct intel_engine_cs *engine,
>>                  struct i915_gem_context *ctx);
>> -void i915_gem_request_free(struct kref *req_ref);
>> +
>> +static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
>> +                          bool lazy_coherency)
>> +{
>> +    return fence_is_signaled(&req->fence);
>> +}
>
> I would squash the following patch into this one, it makes no sense to keep a function with an unused parameter. And fewer patches in the series makes it less scary to review. :) Of course if they are also not too big. :D
It's easier to read with all the function parameter changes in a separate patch.
>> +
>>   int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
>>                      struct drm_file *file);
>>
>> @@ -2475,14 +2486,14 @@ static inline struct drm_i915_gem_request *
>>   i915_gem_request_reference(struct drm_i915_gem_request *req)
>>   {
>>       if (req)
>> -        kref_get(&req->ref);
>> +        fence_get(&req->fence);
>>       return req;
>>   }
>>
>>   static inline void
>>   i915_gem_request_unreference(struct drm_i915_gem_request *req)
>>   {
>> -    kref_put(&req->ref, i915_gem_request_free);
>> +    fence_put(&req->fence);
>>   }
>>
>>   static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
>> @@ -2498,12 +2509,6 @@ static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
>>   }
>>
>>   /*
>> - * XXX: i915_gem_request_completed should be here but currently needs the
>> - * definition of i915_seqno_passed() which is below. It will be moved in
>> - * a later patch when the call to i915_seqno_passed() is obsoleted...
>> - */
>> -
>> -/*
>>    * A command that requires special handling by the command parser.
>>    */
>>   struct drm_i915_cmd_descriptor {
>> @@ -3211,24 +3216,6 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
>>       return (int32_t)(seq1 - seq2) >= 0;
>>   }
>>
>> -static inline bool i915_gem_request_started(struct drm_i915_gem_request *req,
>> -                       bool lazy_coherency)
>> -{
>> -    if (!lazy_coherency && req->engine->irq_seqno_barrier)
>> -        req->engine->irq_seqno_barrier(req->engine);
>> -    return i915_seqno_passed(req->engine->get_seqno(req->engine),
>> -                 req->previous_seqno);
>> -}
>> -
>> -static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
>> -                          bool lazy_coherency)
>> -{
>> -    if (!lazy_coherency && req->engine->irq_seqno_barrier)
>> -        req->engine->irq_seqno_barrier(req->engine);
>> -    return i915_seqno_passed(req->engine->get_seqno(req->engine),
>> -                 req->seqno);
>> -}
>> -
>>   int __must_check i915_gem_get_seqno(struct drm_i915_private *dev_priv, u32 *seqno);
>>   int __must_check i915_gem_set_seqno(struct drm_device *dev, u32 seqno);
>>
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> index 57d3593..b67fd7c 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -1170,6 +1170,7 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
>>   {
>>       unsigned long timeout;
>>       unsigned cpu;
>> +    uint32_t seqno;
>>
>>       /* When waiting for high frequency requests, e.g. during synchronous
>>        * rendering split between the CPU and GPU, the finite amount of time
>> @@ -1185,12 +1186,14 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
>>           return -EBUSY;
>>
>>       /* Only spin if we know the GPU is processing this request */
>> -    if (!i915_gem_request_started(req, true))
>> +    seqno = req->engine->get_seqno(req->engine);
>> +    if (!i915_seqno_passed(seqno, req->previous_seqno))
>>           return -EAGAIN;
>>
>>       timeout = local_clock_us(&cpu) + 5;
>>       while (!need_resched()) {
>> -        if (i915_gem_request_completed(req, true))
>> +        seqno = req->engine->get_seqno(req->engine);
>> +        if (i915_seqno_passed(seqno, req->seqno))
>>               return 0;
>>
>>           if (signal_pending_state(state, current))
>> @@ -1202,7 +1205,10 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
>>           cpu_relax_lowlatency();
>>       }
>>
>> -    if (i915_gem_request_completed(req, false))
>> +    if (req->engine->irq_seqno_barrier)
>> +        req->engine->irq_seqno_barrier(req->engine);
>> +    seqno = req->engine->get_seqno(req->engine);
>> +    if (i915_seqno_passed(seqno, req->seqno))
>>           return 0;
>>
>>       return -EAGAIN;
>> @@ -2736,13 +2742,89 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv,
>>       }
>>   }
>>
>> -void i915_gem_request_free(struct kref *req_ref)
>> +static void i915_gem_request_free_rcu(struct rcu_head *head)
>>   {
>> -    struct drm_i915_gem_request *req = container_of(req_ref,
>> -                         typeof(*req), ref);
>> +    struct drm_i915_gem_request *req;
>> +
>> +    req = container_of(head, typeof(*req), rcu_head);
>>       kmem_cache_free(req->i915->requests, req);
>>   }
>>
>> +static void i915_gem_request_free(struct fence *req_fence)
>> +{
>> +    struct drm_i915_gem_request *req;
>> +
>> +    req = container_of(req_fence, typeof(*req), fence);
>> +    call_rcu(&req->rcu_head, i915_gem_request_free_rcu);
>> +}
>> +
>> +static bool i915_gem_request_enable_signaling(struct fence *req_fence)
>> +{
>> +    /* Interrupt driven fences are not implemented yet.*/
>> +    WARN(true, "This should not be called!");
>> +    return true;
>> +}
>> +
>> +static bool i915_gem_request_is_completed(struct fence *req_fence)
>> +{
>> +    struct drm_i915_gem_request *req = container_of(req_fence,
>> +                         typeof(*req), fence);
>> +    u32 seqno;
>> +
>> +    seqno = req->engine->get_seqno(req->engine);
>> +
>> +    return i915_seqno_passed(seqno, req->seqno);
>> +}
>> +
>> +static const char *i915_gem_request_get_driver_name(struct fence *req_fence)
>> +{
>> +    return "i915";
>> +}
>> +
>> +static const char *i915_gem_request_get_timeline_name(struct fence *req_fence)
>> +{
>> +    struct drm_i915_gem_request *req;
>> +    struct i915_fence_timeline *timeline;
>> +
>> +    req = container_of(req_fence, typeof(*req), fence);
>> +    timeline = &req->ctx->engine[req->engine->id].fence_timeline;
>> +
>> +    return timeline->name;
>> +}
>> +
>> +static void i915_gem_request_timeline_value_str(struct fence *req_fence,
>> +                        char *str, int size)
>> +{
>> +    struct drm_i915_gem_request *req;
>> +
>> +    req = container_of(req_fence, typeof(*req), fence);
>> +
>> +    /* Last signalled timeline value ??? */
>> +    snprintf(str, size, "? [%d]"/*, timeline->value*/,
>
> Reference to timeline->value a leftover from the past?
>
> Is the string format defined by the API? Asking because "? [%d]" looks intriguing.
>
>> +         req->engine->get_seqno(req->engine));
>> +}
>> +
>> +static void i915_gem_request_fence_value_str(struct fence *req_fence,
>> +                         char *str, int size)
>> +{
>> +    struct drm_i915_gem_request *req;
>> +
>> +    req = container_of(req_fence, typeof(*req), fence);
>> +
>> +    snprintf(str, size, "%d [%d]", req->fence.seqno, req->seqno);
>
> Is it OK to put req->seqno in this one? Or is it just for debug anyway, so it helps us and the fence framework does not care?
I think this is used for getting all info from debugfs only, so req->seqno is fine.
>> +}
>> +
>> +static const struct fence_ops i915_gem_request_fops = {
>> +    .enable_signaling    = i915_gem_request_enable_signaling,
>> +    .signaled        = i915_gem_request_is_completed,
>> +    .wait            = fence_default_wait,
>> +    .release        = i915_gem_request_free,
>> +    .get_driver_name    = i915_gem_request_get_driver_name,
>> +    .get_timeline_name    = i915_gem_request_get_timeline_name,
>> +    .fence_value_str    = i915_gem_request_fence_value_str,
>> +    .timeline_value_str    = i915_gem_request_timeline_value_str,
>> +};
>> +
>>   int i915_create_fence_timeline(struct drm_device *dev,
>>                      struct i915_gem_context *ctx,
>>                      struct intel_engine_cs *engine)
>> @@ -2770,7 +2852,7 @@ int i915_create_fence_timeline(struct drm_device *dev,
>>       return 0;
>>   }
>>
>> -unsigned i915_fence_timeline_get_next_seqno(struct i915_fence_timeline *timeline)
>> +static unsigned i915_fence_timeline_get_next_seqno(struct i915_fence_timeline *timeline)
>>   {
>>       unsigned seqno;
>>
>> @@ -2814,13 +2896,16 @@ __i915_gem_request_alloc(struct intel_engine_cs *engine,
>>       if (ret)
>>           goto err;
>>
>> -    kref_init(&req->ref);
>>       req->i915 = dev_priv;
>>       req->engine = engine;
>>       req->reset_counter = reset_counter;
>>       req->ctx  = ctx;
>>       i915_gem_context_reference(req->ctx);
>>
>> +    fence_init(&req->fence, &i915_gem_request_fops, &engine->fence_lock,
>> +           ctx->engine[engine->id].fence_timeline.fence_context,
>> +           i915_fence_timeline_get_next_seqno(&ctx->engine[engine->id].fence_timeline));
>> +
>>       /*
>>        * Reserve space in the ring buffer for all the commands required to
>>        * eventually emit this request. This is to guarantee that the
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>> index 14bcfb7..f126bcb 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -2030,6 +2030,7 @@ logical_ring_setup(struct drm_device *dev, enum intel_engine_id id)
>>       INIT_LIST_HEAD(&engine->buffers);
>>       INIT_LIST_HEAD(&engine->execlist_queue);
>>       spin_lock_init(&engine->execlist_lock);
>> +    spin_lock_init(&engine->fence_lock);
>>
>>       tasklet_init(&engine->irq_tasklet,
>>                intel_lrc_irq_handler, (unsigned long)engine);
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> index 8d35a39..fbd3f12 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> @@ -2254,6 +2254,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
>>       INIT_LIST_HEAD(&engine->request_list);
>>       INIT_LIST_HEAD(&engine->execlist_queue);
>>       INIT_LIST_HEAD(&engine->buffers);
>> +    spin_lock_init(&engine->fence_lock);
>>       i915_gem_batch_pool_init(dev, &engine->batch_pool);
>>       memset(engine->semaphore.sync_seqno, 0,
>>              sizeof(engine->semaphore.sync_seqno));
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
>> index b33c876..3f39daf 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
>> @@ -345,6 +345,8 @@ struct intel_engine_cs {
>>        * to encode the command length in the header).
>>        */
>>       u32 (*get_cmd_length_mask)(u32 cmd_header);
>> +
>> +    spinlock_t fence_lock;
>
> Why is this lock per-engine, and not for example per timeline? Aren't fences living completely isolated in their per-context-per-engine domains? So presumably there is something somewhere which is shared outside that domain to need a lock at the engine level? 
All outstanding requests are added to engine->fence_signal_list in patch 4, which means a per engine lock is required.


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v9 4/6] drm/i915: Interrupt driven fences
  2016-06-02 13:25   ` Tvrtko Ursulin
@ 2016-06-07 12:02     ` Maarten Lankhorst
  2016-06-07 12:19       ` Tvrtko Ursulin
  2016-06-13 15:51       ` John Harrison
  0 siblings, 2 replies; 26+ messages in thread
From: Maarten Lankhorst @ 2016-06-07 12:02 UTC (permalink / raw)
  To: Tvrtko Ursulin, John.C.Harrison, Intel-GFX

On 02-06-16 at 15:25, Tvrtko Ursulin wrote:
>
> On 01/06/16 18:07, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> The intended usage model for struct fence is that the signalled status
>> should be set on demand rather than polled. That is, there should not
>> be a need for a 'signaled' function to be called everytime the status
>> is queried. Instead, 'something' should be done to enable a signal
>> callback from the hardware which will update the state directly. In
>> the case of requests, this is the seqno update interrupt. The idea is
>> that this callback will only be enabled on demand when something
>> actually tries to wait on the fence.
>>
>> This change removes the polling test and replaces it with the callback
>> scheme. Each fence is added to a 'please poke me' list at the start of
>> i915_add_request(). The interrupt handler then scans through the 'poke
>> me' list when a new seqno pops out and signals any matching
>> fence/request. The fence is then removed from the list so the entire
>> request stack does not need to be scanned every time. Note that the
>> fence is added to the list before the commands to generate the seqno
>> interrupt are added to the ring. Thus the sequence is guaranteed to be
>> race free if the interrupt is already enabled.
>>
>> Note that the interrupt is only enabled on demand (i.e. when
>> __wait_request() is called). Thus there is still a potential race when
>> enabling the interrupt as the request may already have completed.
>> However, this is simply solved by calling the interrupt processing
>> code immediately after enabling the interrupt and thereby checking for
>> already completed requests.
>>
>> Lastly, the ring clean up code has the possibility to cancel
>> outstanding requests (e.g. because TDR has reset the ring). These
>> requests will never get signalled and so must be removed from the
>> signal list manually. This is done by setting a 'cancelled' flag and
>> then calling the regular notify/retire code path rather than
>> attempting to duplicate the list manipulatation and clean up code in
>> multiple places. This also avoid any race condition where the
>> cancellation request might occur after/during the completion interrupt
>> actually arriving.
>>
>> v2: Updated to take advantage of the request unreference no longer
>> requiring the mutex lock.
>>
>> v3: Move the signal list processing around to prevent unsubmitted
>> requests being added to the list. This was occurring on Android
>> because the native sync implementation calls the
>> fence->enable_signalling API immediately on fence creation.
>>
>> Updated after review comments by Tvrtko Ursulin. Renamed list nodes to
>> 'link' instead of 'list'. Added support for returning an error code on
>> a cancelled fence. Update list processing to be more efficient/safer
>> with respect to spinlocks.
>>
>> v5: Made i915_gem_request_submit a static as it is only ever called
>> from one place.
>>
>> Fixed up the low latency wait optimisation. The time delay between the
>> seqno value being written to memory and the driver's ISR running can be
>> significant, at least for the wait request micro-benchmark. This can
>> be greatly improved by explicitly checking for seqno updates in the
>> pre-wait busy poll loop. Also added some documentation comments to the
>> busy poll code.
>>
>> Fixed up support for the faking of lost interrupts
>> (test_irq_rings/missed_irq_rings). That is, there is an IGT test that
>> tells the driver to lose interrupts deliberately and then check that
>> everything still works as expected (albeit much slower).
>>
>> Updates from review comments: use non IRQ-save spinlocking, early exit
>> on WARN and improved comments (Tvrtko Ursulin).
>>
>> v6: Updated to newer nightly and resolved conflicts around the
>> wait_request busy spin optimisation. Also fixed a race condition
>> between this early exit path and the regular completion path.
>>
>> v7: Updated to newer nightly - lots of ring -> engine renaming plus an
>> interface change on get_seqno(). Also added a list_empty() check
>> before acquiring spinlocks and doing list processing.
>>
>> v8: Updated to newer nightly - changes to request clean up code mean
>> none of the deferred free mess is needed any more.
>>
>> v9: Moved the request completion processing out of the interrupt
>> handler and into a worker thread (Chris Wilson).
>>
>> For: VIZ-5190
>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_dma.c         |   9 +-
>>   drivers/gpu/drm/i915/i915_drv.h         |  11 ++
>>   drivers/gpu/drm/i915/i915_gem.c         | 248 +++++++++++++++++++++++++++++---
>>   drivers/gpu/drm/i915/i915_irq.c         |   2 +
>>   drivers/gpu/drm/i915/intel_lrc.c        |   5 +
>>   drivers/gpu/drm/i915/intel_ringbuffer.c |   5 +
>>   drivers/gpu/drm/i915/intel_ringbuffer.h |   3 +
>>   7 files changed, 260 insertions(+), 23 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
>> index 07edaed..f8f60bb 100644
>> --- a/drivers/gpu/drm/i915/i915_dma.c
>> +++ b/drivers/gpu/drm/i915/i915_dma.c
>> @@ -1019,9 +1019,13 @@ static int i915_workqueues_init(struct drm_i915_private *dev_priv)
>>       if (dev_priv->wq == NULL)
>>           goto out_err;
>>
>> +    dev_priv->req_wq = alloc_ordered_workqueue("i915-rq", 0);
>> +    if (dev_priv->req_wq == NULL)
>> +        goto out_free_wq;
>> +
>
> Single (per-device) ordered workqueue will serialize interrupt processing across all engines to one thread. Together with the fact that the request worker does not seem to need a sleeping context, I am thinking that a tasklet per engine would be much better (see engine->irq_tasklet for an example).
>
>>       dev_priv->hotplug.dp_wq = alloc_ordered_workqueue("i915-dp", 0);
>>       if (dev_priv->hotplug.dp_wq == NULL)
>> -        goto out_free_wq;
>> +        goto out_free_req_wq;
>>
>>       dev_priv->gpu_error.hangcheck_wq =
>>           alloc_ordered_workqueue("i915-hangcheck", 0);
>> @@ -1032,6 +1036,8 @@ static int i915_workqueues_init(struct drm_i915_private *dev_priv)
>>
>>   out_free_dp_wq:
>>       destroy_workqueue(dev_priv->hotplug.dp_wq);
>> +out_free_req_wq:
>> +    destroy_workqueue(dev_priv->req_wq);
>>   out_free_wq:
>>       destroy_workqueue(dev_priv->wq);
>>   out_err:
>> @@ -1044,6 +1050,7 @@ static void i915_workqueues_cleanup(struct drm_i915_private *dev_priv)
>>   {
>>       destroy_workqueue(dev_priv->gpu_error.hangcheck_wq);
>>       destroy_workqueue(dev_priv->hotplug.dp_wq);
>> +    destroy_workqueue(dev_priv->req_wq);
>>       destroy_workqueue(dev_priv->wq);
>>   }
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> index 69c3412..5a7f256 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -1851,6 +1851,9 @@ struct drm_i915_private {
>>        */
>>       struct workqueue_struct *wq;
>>
>> +    /* Work queue for request completion processing */
>> +    struct workqueue_struct *req_wq;
>> +
>>       /* Display functions */
>>       struct drm_i915_display_funcs display;
>>
>> @@ -2359,6 +2362,10 @@ struct drm_i915_gem_request {
>>        */
>>       struct fence fence;
>>       struct rcu_head rcu_head;
>> +    struct list_head signal_link;
>> +    bool cancelled;
>> +    bool irq_enabled;
>> +    bool signal_requested;
>>
>>       /** On Which ring this request was generated */
>>       struct drm_i915_private *i915;
>> @@ -2460,6 +2467,10 @@ struct drm_i915_gem_request {
>>   struct drm_i915_gem_request * __must_check
>>   i915_gem_request_alloc(struct intel_engine_cs *engine,
>>                  struct i915_gem_context *ctx);
>> +void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req,
>> +                       bool fence_locked);
>> +void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked);
>> +void i915_gem_request_worker(struct work_struct *work);
>>
>>   static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
>>   {
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> index 97e3138..83cf9b0 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -39,6 +39,8 @@
>>   #include <linux/pci.h>
>>   #include <linux/dma-buf.h>
>>
>> +static void i915_gem_request_submit(struct drm_i915_gem_request *req);
>> +
>>   static void i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj);
>>   static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj);
>>   static void
>> @@ -1237,9 +1239,8 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>>   {
>>       struct intel_engine_cs *engine = i915_gem_request_get_engine(req);
>>       struct drm_i915_private *dev_priv = req->i915;
>> -    const bool irq_test_in_progress =
>> -        ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) & intel_engine_flag(engine);
>>       int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
>> +    uint32_t seqno;
>>       DEFINE_WAIT(wait);
>>       unsigned long timeout_expire;
>>       s64 before = 0; /* Only to silence a compiler warning. */
>> @@ -1247,9 +1248,6 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>>
>>       WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled");
>>
>> -    if (list_empty(&req->list))
>> -        return 0;
>> -
>>       if (i915_gem_request_completed(req))
>>           return 0;
>>
>> @@ -1275,15 +1273,17 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>>       trace_i915_gem_request_wait_begin(req);
>>
>>       /* Optimistic spin for the next jiffie before touching IRQs */
>> -    ret = __i915_spin_request(req, state);
>> -    if (ret == 0)
>> -        goto out;
>> -
>> -    if (!irq_test_in_progress && WARN_ON(!engine->irq_get(engine))) {
>> -        ret = -ENODEV;
>> -        goto out;
>> +    if (req->seqno) {
>
> This needs a comment I think because it is so unusual and new that req->seqno == 0 is a special path. To explain why and how it can happen here.
>
>> +        ret = __i915_spin_request(req, state);
>> +        if (ret == 0)
>> +            goto out;
>>       }
>>
>> +    /*
>> +     * Enable interrupt completion of the request.
>> +     */
>> +    fence_enable_sw_signaling(&req->fence);
>> +
>>       for (;;) {
>>           struct timer_list timer;
>>
>> @@ -1306,6 +1306,21 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>>               break;
>>           }
>>
>> +        if (req->seqno) {
>> +            /*
>> +             * There is quite a lot of latency in the user interrupt
>> +             * path. So do an explicit seqno check and potentially
>> +             * remove all that delay.
>> +             */
>> +            if (req->engine->irq_seqno_barrier)
>> +                req->engine->irq_seqno_barrier(req->engine);
>> +            seqno = engine->get_seqno(engine);
>> +            if (i915_seqno_passed(seqno, req->seqno)) {
>> +                ret = 0;
>> +                break;
>> +            }
>> +        }
>> +
>>           if (signal_pending_state(state, current)) {
>>               ret = -ERESTARTSYS;
>>               break;
>> @@ -1332,14 +1347,32 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>>               destroy_timer_on_stack(&timer);
>>           }
>>       }
>> -    if (!irq_test_in_progress)
>> -        engine->irq_put(engine);
>>
>>       finish_wait(&engine->irq_queue, &wait);
>
> Hm I don't understand why our custom waiting remains? Shouldn't fence_wait just be called after the optimistic spin, more or less?
>
>>
>>   out:
>>       trace_i915_gem_request_wait_end(req);
>>
>> +    if ((ret == 0) && (req->seqno)) {
>> +        if (req->engine->irq_seqno_barrier)
>> +            req->engine->irq_seqno_barrier(req->engine);
>> +        seqno = engine->get_seqno(engine);
>> +        if (i915_seqno_passed(seqno, req->seqno) &&
>> +            !i915_gem_request_completed(req)) {
>> +            /*
>> +             * Make sure the request is marked as completed before
>> +             * returning. NB: Need to acquire the spinlock around
>> +             * the whole call to avoid a race condition with the
>> +             * interrupt handler is running concurrently and could
>> +             * cause this invocation to early exit even though the
>> +             * request has not actually been fully processed yet.
>> +             */
>> +            spin_lock_irq(&req->engine->fence_lock);
>> +            i915_gem_request_notify(req->engine, true);
>> +            spin_unlock_irq(&req->engine->fence_lock);
>> +        }
>> +    }
>> +
>>       if (timeout) {
>>           s64 tres = *timeout - (ktime_get_raw_ns() - before);
>>
>> @@ -1405,6 +1438,11 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
>>   {
>>       trace_i915_gem_request_retire(request);
>>
>> +    if (request->irq_enabled) {
>> +        request->engine->irq_put(request->engine);
>> +        request->irq_enabled = false;
>
> What protects request->irq_enabled here versus the enable_signalling path? The latter can be called by external fence users, who would take the fence_lock, but here it is not taken.
>
>> +    }
>
>> +
>>       /* We know the GPU must have read the request to have
>>        * sent us the seqno + interrupt, so use the position
>>        * of tail of the request to update the last known position
>> @@ -1418,6 +1456,22 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
>>       list_del_init(&request->list);
>>       i915_gem_request_remove_from_client(request);
>>
>> +    /*
>> +     * In case the request is still in the signal pending list,
>> +     * e.g. due to being cancelled by TDR, preemption, etc.
>> +     */
>> +    if (!list_empty(&request->signal_link)) {
>
> No locking required here?
Considering the locked variant (fence_signal_locked) is used, I'm assuming this function holds the fence_lock.

If not, something's seriously wrong.
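
Maybe add an assert here to make that explicit:

	assert_spin_locked(&request->engine->fence_lock);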
>> +        /*
>> +         * The request must be marked as cancelled and the underlying
>> +         * fence as failed. NB: There is no explicit fence fail API,
>> +         * there is only a manual poke and signal.
>> +         */
>> +        request->cancelled = true;
>> +        /* How to propagate to any associated sync_fence??? */
^ This way works, so the comment can be removed.

There's deliberately no way to cancel a fence; it's the same path but 
with the status member set.

If you have a fence for another driver, there's really no good way to handle failure. So you have to treat
it as if it succeeded.
>> +        request->fence.status = -EIO;
>> +        fence_signal_locked(&request->fence);
>
> And here?
>
>> +    }
>> +
>>       if (request->previous_context) {
>>           if (i915.enable_execlists)
>>               intel_lr_context_unpin(request->previous_context,
>> @@ -2670,6 +2724,12 @@ void __i915_add_request(struct drm_i915_gem_request *request,
>>        */
>>       request->postfix = intel_ring_get_tail(ringbuf);
>>
>> +    /*
>> +     * Add the fence to the pending list before emitting the commands to
>> +     * generate a seqno notification interrupt.
>> +     */
>> +    i915_gem_request_submit(request);
>> +
>>       if (i915.enable_execlists)
>>           ret = engine->emit_request(request);
>>       else {
>> @@ -2755,25 +2815,154 @@ static void i915_gem_request_free(struct fence *req_fence)
>>       struct drm_i915_gem_request *req;
>>
>>       req = container_of(req_fence, typeof(*req), fence);
>> +
>> +    WARN_ON(req->irq_enabled);
>
> How useful is this? If it went wrong the engine irq reference counting would be bad. Okay, no one would notice, but we could then stick some other WARNs here, like !list_empty(req->list) and who knows what, which we don't have, so I am just wondering if this one brings any value.
>
>> +
>>       call_rcu(&req->rcu_head, i915_gem_request_free_rcu);
>>   }
>>
>> -static bool i915_gem_request_enable_signaling(struct fence *req_fence)
>> +/*
>> + * The request is about to be submitted to the hardware so add the fence to
>> + * the list of signalable fences.
>> + *
>> + * NB: This does not necessarily enable interrupts yet. That only occurs on
>> + * demand when the request is actually waited on. However, adding it to the
>> + * list early ensures that there is no race condition where the interrupt
>> + * could pop out prematurely and thus be completely lost. The race is merely
>> + * that the interrupt must be manually checked for after being enabled.
>> + */
>> +static void i915_gem_request_submit(struct drm_i915_gem_request *req)
>>   {
>> -    /* Interrupt driven fences are not implemented yet.*/
>> -    WARN(true, "This should not be called!");
>> -    return true;
>> +    /*
>> +     * Always enable signal processing for the request's fence object
>> +     * before that request is submitted to the hardware. Thus there is no
>> +     * race condition whereby the interrupt could pop out before the
>> +     * request has been added to the signal list. Hence no need to check
>> +     * for completion, undo the list add and return false.
>> +     */
>> +    i915_gem_request_reference(req);
>> +    spin_lock_irq(&req->engine->fence_lock);
>> +    WARN_ON(!list_empty(&req->signal_link));
>> +    list_add_tail(&req->signal_link, &req->engine->fence_signal_list);
>> +    spin_unlock_irq(&req->engine->fence_lock);
>> +
>> +    /*
>> +     * NB: Interrupts are only enabled on demand. Thus there is still a
>> +     * race where the request could complete before the interrupt has
>> +     * been enabled. Thus care must be taken at that point.
>> +     */
>> +
>> +    /* Have interrupts already been requested? */
>> +    if (req->signal_requested)
>> +        i915_gem_request_enable_interrupt(req, false);
>
> I am thinking that the fence lock could be held here until the end of the function, and in that way i915_gem_request_enable_interrupt would not need the fence_locked parameter any more.
>
> It would probably also be safer with regards to accessing req->signal_requested. I am not sure that enable_signaling and this path otherwise can't race and miss signal_requested getting set?
>
>> +}
>> +
>> +/*
>> + * The request is being actively waited on, so enable interrupt based
>> + * completion signalling.
>> + */
>> +void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req,
>> +                       bool fence_locked)
>> +{
>> +    struct drm_i915_private *dev_priv = req->engine->i915;
>> +    const bool irq_test_in_progress =
>> +        ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) &
>> +                        intel_engine_flag(req->engine);
>> +
>> +    if (req->irq_enabled)
>> +        return;
>> +
>> +    if (irq_test_in_progress)
>> +        return;
>> +
>> +    if (!WARN_ON(!req->engine->irq_get(req->engine)))
>> +        req->irq_enabled = true;
>
> The double negation confused me a bit. It is probably not ideal since WARN_ONs go to the out of line section so in a way it is deliberately penalising the fast and expected path. I think it would be better to put a WARN on the else path.
>
>> +
>> +    /*
>> +     * Because the interrupt is only enabled on demand, there is a race
>> +     * where the interrupt can fire before anyone is looking for it. So
>> +     * do an explicit check for missed interrupts.
>> +     */
>> +    i915_gem_request_notify(req->engine, fence_locked);
>>   }
>>
>> -static bool i915_gem_request_is_completed(struct fence *req_fence)
>> +static bool i915_gem_request_enable_signaling(struct fence *req_fence)
>>   {
>>       struct drm_i915_gem_request *req = container_of(req_fence,
>>                            typeof(*req), fence);
>> +
>> +    /*
>> +     * No need to actually enable interrupt based processing until the
>> +     * request has been submitted to the hardware. At which point
>> +     * 'i915_gem_request_submit()' is called. So only really enable
>> +     * signalling in there. Just set a flag to say that interrupts are
>> +     * wanted when the request is eventually submitted. On the other hand
>> +     * if the request has already been submitted then interrupts do need
>> +     * to be enabled now.
>> +     */
>> +
>> +    req->signal_requested = true;
>> +
>> +    if (!list_empty(&req->signal_link))
>
> In what scenarios is the list_empty check needed? Someone can somehow enable signalling on a fence not yet submitted?
>> +        i915_gem_request_enable_interrupt(req, true);
>> +
>> +    return true;
>> +}
>> +
>> +/**
>> + * i915_gem_request_worker - request work handler callback.
>> + * @work: Work structure
>> + * Called in response to a seqno interrupt to process the completed requests.
>> + */
>> +void i915_gem_request_worker(struct work_struct *work)
>> +{
>> +    struct intel_engine_cs *engine;
>> +
>> +    engine = container_of(work, struct intel_engine_cs, request_work);
>> +    i915_gem_request_notify(engine, false);
>> +}
>> +
>> +void i915_gem_request_notify(struct intel_engine_cs *engine, bool fence_locked)
>> +{
>> +    struct drm_i915_gem_request *req, *req_next;
>> +    unsigned long flags;
>>       u32 seqno;
>>
>> -    seqno = req->engine->get_seqno(req->engine);
>> +    if (list_empty(&engine->fence_signal_list))
>
> Okay, this without the lock still makes me nervous. I'd rather not have to think about why it is safe and can't miss a wakeup.
>
> Also I would be leaning toward having i915_gem_request_notify and i915_gem_request_notify__unlocked. With the enable_interrupts simplification I suggested, I think it would look better and be more consistent with the rest of the driver.
>
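A sketch of what that split could look like, assuming i915_gem_request_notify()
is redefined to require fence_lock to already be held by the caller:

    void i915_gem_request_notify__unlocked(struct intel_engine_cs *engine)
    {
        spin_lock_irq(&engine->fence_lock);
        i915_gem_request_notify(engine);
        spin_unlock_irq(&engine->fence_lock);
    }
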
>> +        return;
>> +
>> +    if (!fence_locked)
>> +        spin_lock_irqsave(&engine->fence_lock, flags);
>
> Not called from hard irq context so can be just spin_lock_irq.
>
> But if you agree to go with the tasklet it would then be spin_lock_bh.
fence is always spin_lock_irq; if this requires _bh then it can't go into the tasklet.
>>
>> -    return i915_seqno_passed(seqno, req->seqno);
>> +    if (engine->irq_seqno_barrier)
>> +        engine->irq_seqno_barrier(engine);
>> +    seqno = engine->get_seqno(engine);
>> +
>> +    list_for_each_entry_safe(req, req_next, &engine->fence_signal_list, signal_link) {
>> +        if (!req->cancelled) {
>> +            if (!i915_seqno_passed(seqno, req->seqno))
>> +                break;
>
> Merge to one if statement?
>
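I.e. something like:

        if (!req->cancelled && !i915_seqno_passed(seqno, req->seqno))
            break;
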
>> +        }
>> +
>> +        /*
>> +         * Start by removing the fence from the signal list otherwise
>> +         * the retire code can run concurrently and get confused.
>> +         */
>> +        list_del_init(&req->signal_link);
>> +
>> +        if (!req->cancelled)
>> +            fence_signal_locked(&req->fence);
>
> I forgot how signalling errors to userspace works. Does that still work for cancelled fences in this series?
>
>> +
>> +        if (req->irq_enabled) {
>> +            req->engine->irq_put(req->engine);
>> +            req->irq_enabled = false;
>> +        }
>> +
>> +        i915_gem_request_unreference(req);
>> +    }
>> +
>> +    if (!fence_locked)
>> +        spin_unlock_irqrestore(&engine->fence_lock, flags);
>>   }
>>
>>   static const char *i915_gem_request_get_driver_name(struct fence *req_fence)
>> @@ -2816,7 +3005,6 @@ static void i915_gem_request_fence_value_str(struct fence *req_fence,
>>
>>   static const struct fence_ops i915_gem_request_fops = {
>>       .enable_signaling    = i915_gem_request_enable_signaling,
>> -    .signaled        = i915_gem_request_is_completed,
>>       .wait            = fence_default_wait,
>>       .release        = i915_gem_request_free,
>>       .get_driver_name    = i915_gem_request_get_driver_name,
>> @@ -2902,6 +3090,7 @@ __i915_gem_request_alloc(struct intel_engine_cs *engine,
>>       req->ctx  = ctx;
>>       i915_gem_context_reference(req->ctx);
>>
>> +    INIT_LIST_HEAD(&req->signal_link);
>>       fence_init(&req->fence, &i915_gem_request_fops, &engine->fence_lock,
>>              ctx->engine[engine->id].fence_timeline.fence_context,
>>              i915_fence_timeline_get_next_seqno(&ctx->engine[engine->id].fence_timeline));
>> @@ -3036,6 +3225,13 @@ static void i915_gem_reset_engine_cleanup(struct drm_i915_private *dev_priv,
>>           i915_gem_request_retire(request);
>>       }
>>
>> +    /*
>> +     * Tidy up anything left over. This includes a call to
>> +     * i915_gem_request_notify() which will make sure that any requests
>> +     * that were on the signal pending list get also cleaned up.
>> +     */
>> +    i915_gem_retire_requests_ring(engine);
>
> Hmm.. but this function has just walked the same lists this one will, and done the same processing. Why call this from here? It looks bad to me; the two are different special cases of a similar thing, so I can't see that calling this from here makes sense.
>
>> +
>>       /* Having flushed all requests from all queues, we know that all
>>        * ringbuffers must now be empty. However, since we do not reclaim
>>        * all space when retiring the request (to prevent HEADs colliding
>> @@ -3082,6 +3278,13 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *engine)
>>   {
>>       WARN_ON(i915_verify_lists(engine->dev));
>>
>> +    /*
>> +     * If no-one has waited on a request recently then interrupts will
>> +     * not have been enabled and thus no requests will ever be marked as
>> +     * completed. So do an interrupt check now.
>> +     */
>> +    i915_gem_request_notify(engine, false);
>
> Would it work to signal the fence from the existing loop a bit above in this function which already walks the request list in search for completed ones? Or maybe even in i915_gem_request_retire?
>
> I am thinking about doing less list walking and better integration with the core GEM. Downside would be more traffic on the fence_lock, hmm.. not sure then. It just looks a bit bolted on like this.
>
> I don't see it being a noticeable cost so perhaps it can stay like this for now.
>
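For reference, a rough sketch of the signal-from-retire alternative (helper
name hypothetical; it only covers the signalling side, not the irq_put and
unreference handling):

    static void i915_gem_request_retire_signal(struct drm_i915_gem_request *req)
    {
        struct intel_engine_cs *engine = req->engine;

        spin_lock_irq(&engine->fence_lock);
        if (!list_empty(&req->signal_link)) {
            list_del_init(&req->signal_link);
            if (!req->cancelled)
                fence_signal_locked(&req->fence);
        }
        spin_unlock_irq(&engine->fence_lock);
    }
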
>> +
>>       /* Retire requests first as we use it above for the early return.
>>        * If we retire requests last, we may use a later seqno and so clear
>>        * the requests lists without clearing the active list, leading to
>> @@ -5102,6 +5305,7 @@ init_engine_lists(struct intel_engine_cs *engine)
>>   {
>>       INIT_LIST_HEAD(&engine->active_list);
>>       INIT_LIST_HEAD(&engine->request_list);
>> +    INIT_LIST_HEAD(&engine->fence_signal_list);
>>   }
>>
>>   void
>> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
>> index f780421..a87a3c5 100644
>> --- a/drivers/gpu/drm/i915/i915_irq.c
>> +++ b/drivers/gpu/drm/i915/i915_irq.c
>> @@ -994,6 +994,8 @@ static void notify_ring(struct intel_engine_cs *engine)
>>       trace_i915_gem_request_notify(engine);
>>       engine->user_interrupts++;
>>
>> +    queue_work(engine->i915->req_wq, &engine->request_work);
>> +
>>       wake_up_all(&engine->irq_queue);
>
> Yes, that is the weird part: why does engine->irq_queue have to remain with this patch?
>
>>   }
>>
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>> index f126bcb..134759d 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -1879,6 +1879,8 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *engine)
>>
>>       dev_priv = engine->i915;
>>
>> +    cancel_work_sync(&engine->request_work);
>> +
>>       if (engine->buffer) {
>>           intel_logical_ring_stop(engine);
>>           WARN_ON((I915_READ_MODE(engine) & MODE_IDLE) == 0);
>> @@ -2027,6 +2029,7 @@ logical_ring_setup(struct drm_device *dev, enum intel_engine_id id)
>>
>>       INIT_LIST_HEAD(&engine->active_list);
>>       INIT_LIST_HEAD(&engine->request_list);
>> +    INIT_LIST_HEAD(&engine->fence_signal_list);
>>       INIT_LIST_HEAD(&engine->buffers);
>>       INIT_LIST_HEAD(&engine->execlist_queue);
>>       spin_lock_init(&engine->execlist_lock);
>> @@ -2035,6 +2038,8 @@ logical_ring_setup(struct drm_device *dev, enum intel_engine_id id)
>>       tasklet_init(&engine->irq_tasklet,
>>                intel_lrc_irq_handler, (unsigned long)engine);
>>
>> +    INIT_WORK(&engine->request_work, i915_gem_request_worker);
>> +
>>       logical_ring_init_platform_invariants(engine);
>>       logical_ring_default_vfuncs(engine);
>>       logical_ring_default_irqs(engine, info->irq_shift);
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> index fbd3f12..1641096 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>> @@ -2254,6 +2254,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
>>       INIT_LIST_HEAD(&engine->request_list);
>>       INIT_LIST_HEAD(&engine->execlist_queue);
>>       INIT_LIST_HEAD(&engine->buffers);
>> +    INIT_LIST_HEAD(&engine->fence_signal_list);
>>       spin_lock_init(&engine->fence_lock);
>>       i915_gem_batch_pool_init(dev, &engine->batch_pool);
>>       memset(engine->semaphore.sync_seqno, 0,
>> @@ -2261,6 +2262,8 @@ static int intel_init_ring_buffer(struct drm_device *dev,
>>
>>       init_waitqueue_head(&engine->irq_queue);
>>
>> +    INIT_WORK(&engine->request_work, i915_gem_request_worker);
>> +
>>       ringbuf = intel_engine_create_ringbuffer(engine, 32 * PAGE_SIZE);
>>       if (IS_ERR(ringbuf)) {
>>           ret = PTR_ERR(ringbuf);
>> @@ -2307,6 +2310,8 @@ void intel_cleanup_engine(struct intel_engine_cs *engine)
>>
>>       dev_priv = engine->i915;
>>
>> +    cancel_work_sync(&engine->request_work);
>> +
>>       if (engine->buffer) {
>>           intel_stop_engine(engine);
>>           WARN_ON(!IS_GEN2(dev_priv) && (I915_READ_MODE(engine) & MODE_IDLE) == 0);
>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
>> index 3f39daf..51779b4 100644
>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
>> @@ -347,6 +347,9 @@ struct intel_engine_cs {
>>       u32 (*get_cmd_length_mask)(u32 cmd_header);
>>
>>       spinlock_t fence_lock;
>> +    struct list_head fence_signal_list;
>> +
>> +    struct work_struct request_work;
>>   };
>>
>>   static inline bool 


* Re: [PATCH v9 3/6] drm/i915: Removed now redundant parameter to i915_gem_request_completed()
  2016-06-01 17:07 ` [PATCH v9 3/6] drm/i915: Removed now redundant parameter to i915_gem_request_completed() John.C.Harrison
@ 2016-06-07 12:07   ` Maarten Lankhorst
  0 siblings, 0 replies; 26+ messages in thread
From: Maarten Lankhorst @ 2016-06-07 12:07 UTC (permalink / raw)
  To: John.C.Harrison, Intel-GFX

Op 01-06-16 om 19:07 schreef John.C.Harrison@Intel.com:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> The change to the implementation of i915_gem_request_completed() means
> that the lazy coherency flag is no longer used. This can now be
> removed to simplify the interface.
>
> v6: Updated to newer nightly and resolved conflicts.
>
> v7: Updated to newer nightly (lots of ring -> engine renaming).
>
> For: VIZ-5190
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c  |  2 +-
>  drivers/gpu/drm/i915/i915_drv.h      |  3 +--
>  drivers/gpu/drm/i915/i915_gem.c      | 14 +++++++-------
>  drivers/gpu/drm/i915/intel_display.c |  2 +-
>  drivers/gpu/drm/i915/intel_pm.c      |  4 ++--
>  5 files changed, 12 insertions(+), 13 deletions(-)
Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>

* Re: [PATCH v9 2/6] drm/i915: Convert requests to use struct fence
  2016-06-07 11:42     ` Maarten Lankhorst
@ 2016-06-07 12:11       ` Tvrtko Ursulin
  2016-06-10 11:26       ` John Harrison
  1 sibling, 0 replies; 26+ messages in thread
From: Tvrtko Ursulin @ 2016-06-07 12:11 UTC (permalink / raw)
  To: Maarten Lankhorst, John.C.Harrison, Intel-GFX


On 07/06/16 12:42, Maarten Lankhorst wrote:
> Op 02-06-16 om 13:07 schreef Tvrtko Ursulin:

[snip]

>>> +static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
>>> +                          bool lazy_coherency)
>>> +{
>>> +    return fence_is_signaled(&req->fence);
>>> +}
>>
>> I would squash the following patch into this one; it makes no sense to keep a function with an unused parameter. And fewer patches in the series make it less scary to review. :) Of course if they are also not too big. :D
> It's easier to read with all the function parameter changes in a separate patch.

I do not think so, but it is not even near a blocking commit so OK.

>>>        u32 (*get_cmd_length_mask)(u32 cmd_header);
>>> +
>>> +    spinlock_t fence_lock;
>>
>> Why is this lock per-engine, and not for example per timeline? Aren't fencees living completely isolated in their per-context-per-engine domains? So presumably there is something somewhere which is shared outside that domain to need a lock at the engine level?
> All outstanding requests are added to engine->fence_signal_list in patch 4, which means a per engine lock is required.

Okay, a comment is required here to describe the lock then: everything it 
protects and how and when it needs to be taken, both from the i915 point 
of view and from the fence API side.

Regards,

Tvrtko

* Re: [PATCH v9 5/6] drm/i915: Updated request structure tracing
  2016-06-01 17:07 ` [PATCH v9 5/6] drm/i915: Updated request structure tracing John.C.Harrison
@ 2016-06-07 12:15   ` Maarten Lankhorst
  0 siblings, 0 replies; 26+ messages in thread
From: Maarten Lankhorst @ 2016-06-07 12:15 UTC (permalink / raw)
  To: John.C.Harrison, Intel-GFX

Op 01-06-16 om 19:07 schreef John.C.Harrison@Intel.com:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> Added the '_complete' trace event which occurs when a fence/request is
> signaled as complete. Also moved the notify event from the IRQ handler
> code to inside the notify function itself.
>
> v3: Added the current ring seqno to the notify trace point.
>
> v5: Line wrapping to keep the style checker happy.
>
> v7: Updated to newer nightly (lots of ring -> engine renaming).
>
> For: VIZ-5190
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>

Reviewed-by: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>


* Re: [PATCH v9 4/6] drm/i915: Interrupt driven fences
  2016-06-07 12:02     ` Maarten Lankhorst
@ 2016-06-07 12:19       ` Tvrtko Ursulin
  2016-06-13 15:51       ` John Harrison
  1 sibling, 0 replies; 26+ messages in thread
From: Tvrtko Ursulin @ 2016-06-07 12:19 UTC (permalink / raw)
  To: Maarten Lankhorst, John.C.Harrison, Intel-GFX


On 07/06/16 13:02, Maarten Lankhorst wrote:
> Op 02-06-16 om 15:25 schreef Tvrtko Ursulin:

[snip]

>>> +        return;
>>> +
>>> +    if (!fence_locked)
>>> +        spin_lock_irqsave(&engine->fence_lock, flags);
>>
>> Not called from hard irq context so can be just spin_lock_irq.
>>
>> But if you agree to go with the tasklet it would then be spin_lock_bh.
> fence is always spin_lock_irq, if this requires _bh then it can't go into the tasklet.

No, if the fence API requires the _irq versions then it is fine; they 
supersede the _bh variants.

Also, it doesn't have to use the tasklet; I was just suggesting it as 
lighter weight for lower latency, since nothing seems to need 
process/sleeping context anyway.

The main thing is that signaling is not serialized across engines by a 
single worker.
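
For reference, a rough sketch of the per-engine tasklet alternative (the
notify_tasklet field and function names are assumed):

    static void i915_request_notify_tasklet(unsigned long data)
    {
        struct intel_engine_cs *engine = (struct intel_engine_cs *)data;

        i915_gem_request_notify(engine, false);
    }

    /* at engine setup, instead of INIT_WORK(): */
    tasklet_init(&engine->notify_tasklet,
                 i915_request_notify_tasklet, (unsigned long)engine);

    /* and in notify_ring(), instead of queue_work(): */
    tasklet_schedule(&engine->notify_tasklet);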

Regards,

Tvrtko

P.S. Please try to put newlines between quoted text and your replies? It 
will be easier to find your comments that way.

* Re: [PATCH v9 6/6] drm/i915: Cache last IRQ seqno to reduce IRQ overhead
  2016-06-01 17:07 ` [PATCH v9 6/6] drm/i915: Cache last IRQ seqno to reduce IRQ overhead John.C.Harrison
@ 2016-06-07 12:47   ` Maarten Lankhorst
  2016-06-16 12:10     ` John Harrison
  0 siblings, 1 reply; 26+ messages in thread
From: Maarten Lankhorst @ 2016-06-07 12:47 UTC (permalink / raw)
  To: John.C.Harrison, Intel-GFX

Op 01-06-16 om 19:07 schreef John.C.Harrison@Intel.com:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> The notify function can be called many times without the seqno
> changing. Some are to prevent races due to the requirement of not
> enabling interrupts until requested. However, when interrupts are
> enabled the IRQ handler can be called multiple times without the
> ring's seqno value changing. E.g. two interrupts are generated by
> batch buffers completing in quick succession; the first call to the
> handler processes both completions but the handler still gets executed
> a second time. This patch reduces the overhead of these extra calls by
> caching the last processed seqno value and early exiting if it has not
> changed.
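
A minimal sketch of the described check (the last_irq_seqno field name is
assumed here; the actual patch may differ):

    /* At the top of i915_gem_request_notify(): */
    u32 seqno = engine->get_seqno(engine);

    /* Nothing new since the last call, so bail out early. */
    if (seqno == engine->last_irq_seqno)
        return;
    engine->last_irq_seqno = seqno;
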
How significant is this overhead?

Patch looks reasonable otherwise.

~Maarten

* Re: [PATCH v9 1/6] drm/i915: Add per context timelines for fence objects
  2016-06-02 10:28   ` Tvrtko Ursulin
@ 2016-06-09 16:08     ` John Harrison
  0 siblings, 0 replies; 26+ messages in thread
From: John Harrison @ 2016-06-09 16:08 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-GFX

On 02/06/2016 11:28, Tvrtko Ursulin wrote:
>
> On 01/06/16 18:07, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> The purpose of this patch series is to convert the request structure to
>> use fence objects for the underlying completion tracking. The fence
>> object requires a sequence number. The ultimate aim is to use the same
>> sequence number as for the request itself (or rather, to remove the
>> request's seqno field and just use the fence's value throughout the
>> driver). However, this is not currently possible and so this patch
>> introduces a separate numbering scheme as an intermediate step.
>>
>> A major advantage of using the fence object is that it can be passed
>> outside of the i915 driver and used externally. The fence API allows
>> for various operations such as combining multiple fences. This
>> requires that fence seqnos within a single fence context be guaranteed
>> in-order. The GPU scheduler that is coming can re-order request
>> execution but not within a single GPU context. Thus the fence context
>> must be tied to the i915 context (and the engine within the context as
>> each engine runs asynchronously).
>>
>> On the other hand, the driver as a whole currently only works with
>> request seqnos that are allocated from a global in-order timeline. It
>> will require a fair chunk of re-work to allow multiple independent
>> seqno timelines to be used. Hence the introduction of a temporary,
>> fence specific timeline. Once the work to update the rest of the
>> driver has been completed then the request can use the fence seqno
>> instead.
>>
>> v2: New patch in series.
>>
>> v3: Renamed/retyped timeline structure fields after review comments by
>> Tvrtko Ursulin.
>>
>> Added context information to the timeline's name string for better
>> identification in debugfs output.
>>
>> v5: Line wrapping and other white space fixes to keep style checker
>> happy.
>>
>> v7: Updated to newer nightly (lots of ring -> engine renaming).
>>
>> v8: Moved to earlier in patch series so no longer needs to remove the
>> quick hack timeline that was being added before.
>>
>> v9: Updated to another newer nightly (changes to context structure
>> naming). Also updated commit message to match previous changes.
>>
>> For: VIZ-5190
>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_drv.h         | 14 ++++++++++++
>>   drivers/gpu/drm/i915/i915_gem.c         | 40 +++++++++++++++++++++++++++++++++
>>   drivers/gpu/drm/i915/i915_gem_context.c | 16 +++++++++++++
>>   drivers/gpu/drm/i915/intel_lrc.c        |  8 +++++++
>>   4 files changed, 78 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> index 2a88a46..a5f8ad8 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -831,6 +831,19 @@ struct i915_ctx_hang_stats {
>>       bool banned;
>>   };
>>
>> +struct i915_fence_timeline {
>> +    char        name[32];
>> +    unsigned    fence_context;
>> +    unsigned    next;
>> +
>> +    struct i915_gem_context *ctx;
>> +    struct intel_engine_cs *engine;
>
> Are these backpointers used in the patch series? I did a quick search 
> with the "timeline->" string and did not find anything.
Hmm, not any more it seems. Will remove them.

>
>> +};
>> +
>> +int i915_create_fence_timeline(struct drm_device *dev,
>> +                   struct i915_gem_context *ctx,
>> +                   struct intel_engine_cs *ring);
>> +
>>   /* This must match up with the value previously used for execbuf2.rsvd1. */
>>   #define DEFAULT_CONTEXT_HANDLE 0
>>
>> @@ -875,6 +888,7 @@ struct i915_gem_context {
>>           u64 lrc_desc;
>>           int pin_count;
>>           bool initialised;
>> +        struct i915_fence_timeline fence_timeline;
>>       } engine[I915_NUM_ENGINES];
>>
>>       struct list_head link;
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> index 5ffc6fa..57d3593 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -2743,6 +2743,46 @@ void i915_gem_request_free(struct kref *req_ref)
>>       kmem_cache_free(req->i915->requests, req);
>>   }
>>
>> +int i915_create_fence_timeline(struct drm_device *dev,
>
> dev is not used in the function. Maybe it will be in later patches? In 
> which case I think dev_priv is the current bkm for i915 
> specific/internal code.
Again, it is left over from a previous implementation and could actually 
be removed.

>
>> +                   struct i915_gem_context *ctx,
>> +                   struct intel_engine_cs *engine)
>> +{
>> +    struct i915_fence_timeline *timeline;
>> +
>> +    timeline = &ctx->engine[engine->id].fence_timeline;
>> +
>> +    if (timeline->engine)
>> +        return 0;
>
> Is this an expected case? Unless I am missing something it shouldn't 
> be, so maybe a WARN_ON would be warranted?
No. Will make it a WARN instead.
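
E.g.:

    if (WARN_ON(timeline->engine))
        return 0;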

>
>> +
>> +    timeline->fence_context = fence_context_alloc(1);
>> +
>> +    /*
>> +     * Start the timeline from seqno 0 as this is a special value
>> +     * that is reserved for invalid sync points.
>> +     */
>
> Comment and init to 1 below look to be in disagreement. Maybe the comment 
> should be something like "Start the timeline from seqno 1 as 0 is a special 
> value.." ?
Hmm, not sure what happened there. Will update the comment so it actually 
makes sense.
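
Something like:

    /*
     * Start the timeline from seqno 1 as 0 is a special value
     * that is reserved for invalid sync points.
     */
    timeline->next = 1;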

>
>> +    timeline->next       = 1;
>> +    timeline->ctx        = ctx;
>> +    timeline->engine     = engine;
>> +
>> +    snprintf(timeline->name, sizeof(timeline->name), "%d>%s:%d",
>> +         timeline->fence_context, engine->name, ctx->user_handle);
>> +
>
> For rings like "video enhancement ring" name is 22 chars on its own, 
> leaving only 9 for the two integers. If a lot of contexts or long 
> runtime I suppose that could overflow. It is a bit of a stretch but 
> perhaps 32 is not enough, maybe available space for the name should be 
> better defined as longest ring name (with a comment) plus maximum for 
> two integers.
>
> I think timeline->name is only for debug but still feels better to 
> make sure it will fit rather than truncate it.
Yes, it is only used for debug purposes (trace events and debugfs state 
dump). There is no 'maximum ring name length' value anywhere, is there? 
And making it a dynamic allocation seems like excessive complication for 
a debug string. I could bump the static size up to 40 or maybe 64?

>
> And we should probably just shorten the "video enhancement ring" to 
> "vecs ring"...
I generally run with a local patch to convert all ring names to a TLA 
(RCS, BCS, etc.) just to make the debug output more readable.

>
>> +    return 0;
>> +}
>> +
>> +unsigned i915_fence_timeline_get_next_seqno(struct i915_fence_timeline *timeline)
>
> It is strange to add a public function in this patch which is even 
> unused, especially since the following patch makes it private.
>
> Would it make more sense for it to be static straight away and maybe 
> even called from __i915_gem_request_alloc unconditionally so that the 
> patch does not add dead code?

It's the old question of how to split a series up into patches that 
don't do everything all at once but still make sense individually. I 
think it makes more sense to include the code in this prep patch as it 
is a rather fundamental part of the timeline code. However, as none of 
it is wired up yet it can't be made static. The next patch hooks up the 
timeline code into various places, at which point this function can 
become static because it is only used in that file. It could be done 
differently but this seemed like the most sensible option to me.

> Don't know, not terribly important but would perhaps look more logical 
> as a patch series.
>
>> +{
>> +    unsigned seqno;
>> +
>> +    seqno = timeline->next;
>> +
>> +    /* Reserve zero for invalid */
>> +    if (++timeline->next == 0)
>> +        timeline->next = 1;
>> +
>> +    return seqno;
>> +}
>> +
>>   static inline int
>>   __i915_gem_request_alloc(struct intel_engine_cs *engine,
>>                struct i915_gem_context *ctx,
>> diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
>> index d0e7fc6..07d8c63 100644
>> --- a/drivers/gpu/drm/i915/i915_gem_context.c
>> +++ b/drivers/gpu/drm/i915/i915_gem_context.c
>> @@ -320,6 +320,22 @@ i915_gem_create_context(struct drm_device *dev,
>>       if (IS_ERR(ctx))
>>           return ctx;
>>
>> +    if (!i915.enable_execlists) {
>> +        struct intel_engine_cs *engine;
>> +
>> +        /* Create a per context timeline for fences */
>> +        for_each_engine(engine, to_i915(dev)) {
>> +            int ret = i915_create_fence_timeline(dev, ctx, engine);
>> +            if (ret) {
>> +                DRM_ERROR("Fence timeline creation failed for legacy %s: %p\n",
>> +                      engine->name, ctx);
>> +                idr_remove(&file_priv->context_idr, ctx->user_handle);
>> +                i915_gem_context_unreference(ctx);
>> +                return ERR_PTR(ret);
>> +            }
>> +        }
>> +    }
>> +
>>       if (USES_FULL_PPGTT(dev)) {
>>           struct i915_hw_ppgtt *ppgtt = i915_ppgtt_create(dev, file_priv);
>>
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>> index 5c191a1..14bcfb7 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -2496,6 +2496,14 @@ static int execlists_context_deferred_alloc(struct i915_gem_context *ctx,
>>           goto error_ringbuf;
>>       }
>>
>> +    /* Create a per context timeline for fences */
>> +    ret = i915_create_fence_timeline(ctx->i915->dev, ctx, engine);
>> +    if (ret) {
>> +        DRM_ERROR("Fence timeline creation failed for engine %s, ctx %p\n",
>
> "engine %s" will log something like "engine render ring" which will be 
> weird.
>
> Also the pointer to the context is not that interesting in a DRM_ERROR, I 
> think. ctx->user_handle instead? Same in the legacy mode.
Will drop the 'engine' and add in user_handle.

>
>> +              engine->name, ctx);
>> +        goto error_ringbuf;
>> +    }
>> +
>>       ce->ringbuf = ringbuf;
>>       ce->state = ctx_obj;
>>       ce->initialised = engine->init_context == NULL;
>>
>
> So in summary just some minor things, otherwise it looks OK I think.
>
> Regards,
>
> Tvrtko
>
>


* Re: [PATCH v9 1/6] drm/i915: Add per context timelines for fence objects
  2016-06-07 11:17   ` Maarten Lankhorst
@ 2016-06-09 17:22     ` John Harrison
  0 siblings, 0 replies; 26+ messages in thread
From: John Harrison @ 2016-06-09 17:22 UTC (permalink / raw)
  To: Maarten Lankhorst, Intel-GFX

On 07/06/2016 12:17, Maarten Lankhorst wrote:
> Op 01-06-16 om 19:07 schreef John.C.Harrison@Intel.com:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> The purpose of this patch series is to convert the request structure to
>> use fence objects for the underlying completion tracking. The fence
>> object requires a sequence number. The ultimate aim is to use the same
>> sequence number as for the request itself (or rather, to remove the
>> request's seqno field and just use the fence's value throughout the
>> driver). However, this is not currently possible and so this patch
>> introduces a separate numbering scheme as an intermediate step.
>>
>> A major advantage of using the fence object is that it can be passed
>> outside of the i915 driver and used externally. The fence API allows
>> for various operations such as combining multiple fences. This
>> requires that fence seqnos within a single fence context be guaranteed
>> in-order. The GPU scheduler that is coming can re-order request
>> execution but not within a single GPU context. Thus the fence context
>> must be tied to the i915 context (and the engine within the context as
>> each engine runs asynchronously).
>>
>> On the other hand, the driver as a whole currently only works with
>> request seqnos that are allocated from a global in-order timeline. It
>> will require a fair chunk of re-work to allow multiple independent
>> seqno timelines to be used. Hence the introduction of a temporary,
>> fence specific timeline. Once the work to update the rest of the
>> driver has been completed then the request can use the fence seqno
>> instead.
>>
>> v2: New patch in series.
>>
>> v3: Renamed/retyped timeline structure fields after review comments by
>> Tvrtko Ursulin.
>>
>> Added context information to the timeline's name string for better
>> identification in debugfs output.
>>
>> v5: Line wrapping and other white space fixes to keep style checker
>> happy.
>>
>> v7: Updated to newer nightly (lots of ring -> engine renaming).
>>
>> v8: Moved to earlier in patch series so no longer needs to remove the
>> quick hack timeline that was being added before.
>>
>> v9: Updated to another newer nightly (changes to context structure
>> naming). Also updated commit message to match previous changes.
>>
>> For: VIZ-5190
>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_drv.h         | 14 ++++++++++++
>>   drivers/gpu/drm/i915/i915_gem.c         | 40 +++++++++++++++++++++++++++++++++
>>   drivers/gpu/drm/i915/i915_gem_context.c | 16 +++++++++++++
>>   drivers/gpu/drm/i915/intel_lrc.c        |  8 +++++++
>>   4 files changed, 78 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> index 2a88a46..a5f8ad8 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -831,6 +831,19 @@ struct i915_ctx_hang_stats {
>>   	bool banned;
>>   };
>>   
>> +struct i915_fence_timeline {
>> +	char        name[32];
>> +	unsigned    fence_context;
> Should be a u64 now, since commit 76bf0db5543976ef50362db7071da367cb118532
Yeah, that's newer than the tree these patches are based on. Will update 
and rebase...
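
That also means the format specifier in the name string changes, e.g.:

    snprintf(timeline->name, sizeof(timeline->name), "%llu>%s:%d",
             timeline->fence_context, engine->name, ctx->user_handle);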

>> +	unsigned    next;
>> +
>> +	struct i915_gem_context *ctx;
>> +	struct intel_engine_cs *engine;
>> +};
>> +
>> +int i915_create_fence_timeline(struct drm_device *dev,
>> +			       struct i915_gem_context *ctx,
>> +			       struct intel_engine_cs *ring);
>> +
>>   /* This must match up with the value previously used for execbuf2.rsvd1. */
>>   #define DEFAULT_CONTEXT_HANDLE 0
>>   
>> @@ -875,6 +888,7 @@ struct i915_gem_context {
>>   		u64 lrc_desc;
>>   		int pin_count;
>>   		bool initialised;
>> +		struct i915_fence_timeline fence_timeline;
>>   	} engine[I915_NUM_ENGINES];
>>   
>>   	struct list_head link;
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> index 5ffc6fa..57d3593 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -2743,6 +2743,46 @@ void i915_gem_request_free(struct kref *req_ref)
>>   	kmem_cache_free(req->i915->requests, req);
>>   }
>>   
>> +int i915_create_fence_timeline(struct drm_device *dev,
>> +			       struct i915_gem_context *ctx,
>> +			       struct intel_engine_cs *engine)
>> +{
>> +	struct i915_fence_timeline *timeline;
>> +
>> +	timeline = &ctx->engine[engine->id].fence_timeline;
>> +
>> +	if (timeline->engine)
>> +		return 0;
> Do you ever expect a reinit?
No. Will change to a WARN_ON as per Tvrtko's comment.

>> +	timeline->fence_context = fence_context_alloc(1);
>> +
>> +	/*
>> +	 * Start the timeline from seqno 0 as this is a special value
>> +	 * that is reserved for invalid sync points.
>> +	 */
>> +	timeline->next       = 1;
>> +	timeline->ctx        = ctx;
>> +	timeline->engine     = engine;
>> +
>> +	snprintf(timeline->name, sizeof(timeline->name), "%d>%s:%d",
>> +		 timeline->fence_context, engine->name, ctx->user_handle);
>> +
>> +	return 0;
>> +}
>> +
> On top of the other comments, you might want to add a TODO comment that there should be only one timeline for each context,
> with each engine having its own unique fence->context.
That is not the plan. This patch implements the safest possible option 
of a timeline per engine context. Chris's idea was that the timeline 
should be per VM (struct i915_address_space) instead. Although to me 
that sounds like it would cause problems with requests from multiple 
contexts within a single VM being reordered by the scheduler, thus 
causing out-of-order completion on a single timeline. So for the moment, 
I am leaving it as is.

>
> ~Maarten


* Re: [PATCH v9 2/6] drm/i915: Convert requests to use struct fence
  2016-06-07 11:42     ` Maarten Lankhorst
  2016-06-07 12:11       ` Tvrtko Ursulin
@ 2016-06-10 11:26       ` John Harrison
  2016-06-13 10:16         ` Maarten Lankhorst
  1 sibling, 1 reply; 26+ messages in thread
From: John Harrison @ 2016-06-10 11:26 UTC (permalink / raw)
  To: Maarten Lankhorst, Tvrtko Ursulin, Intel-GFX

On 07/06/2016 12:42, Maarten Lankhorst wrote:
> Op 02-06-16 om 13:07 schreef Tvrtko Ursulin:
>> On 01/06/16 18:07, John.C.Harrison@Intel.com wrote:
>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>
>>> There is a construct in the linux kernel called 'struct fence' that is
>>> intended to keep track of work that is executed on hardware. I.e. it
>>> solves the basic problem that the drivers 'struct
>>> drm_i915_gem_request' is trying to address. The request structure does
>>> quite a lot more than simply track the execution progress so is very
>>> definitely still required. However, the basic completion status side
>>> could be updated to use the ready made fence implementation and gain
>>> all the advantages that provides.
>>>
>>> This patch makes the first step of integrating a struct fence into the
>>> request. It replaces the explicit reference count with that of the
>>> fence. It also replaces the 'is completed' test with the fence's
>>> equivalent. Currently, that simply chains on to the original request
>>> implementation. A future patch will improve this.
>>>
>>> v3: Updated after review comments by Tvrtko Ursulin. Added fence
>>> context/seqno pair to the debugfs request info. Renamed fence 'driver
>>> name' to just 'i915'. Removed BUG_ONs.
>>>
>>> v5: Changed seqno format in debugfs to %x rather than %u as that is
>>> apparently the preferred appearance. Line wrapped some long lines to
>>> keep the style checker happy.
>>>
>>> v6: Updated to newer nightly and resolved conflicts. The biggest issue
>>> was with the re-worked busy spin precursor to waiting on a request. In
>>> particular, the addition of a 'request_started' helper function. This
>>> has no corresponding concept within the fence framework. However, it
>>> is only ever used in one place and the whole point of that place is to
>>> always directly read the seqno for absolutely lowest latency possible.
>>> So the simple solution is to just make the seqno test explicit at that
>>> point now rather than later in the series (it was previously being
>>> done anyway when fences become interrupt driven).
>>>
>>> v7: Rebased to newer nightly - lots of ring -> engine renaming and
>>> interface change to get_seqno().
>>>
>>> v8: Rebased to newer nightly - no longer needs to worry about mutex
>>> locking in the request free code path. Moved to after fence timeline
>>> patch so no longer needs to add a horrid hack timeline.
>>>
>>> Removed commented out code block. Added support for possible RCU usage
>>> of fence object (Review comments by Maarten Lankhorst).
>>>
>>> For: VIZ-5190
>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>>> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
>> Was it an r-b or an ack from Jesse? If the former, does it need a "(v?)" suffix, depending on the amount of code changes after his r-b?
Going back through the old emails it looks like you are right; it was 
actually an ack on v3. What is the official tag for that?

>>
>>> ---
>>>    drivers/gpu/drm/i915/i915_debugfs.c     |   5 +-
>>>    drivers/gpu/drm/i915/i915_drv.h         |  43 +++++---------
>>>    drivers/gpu/drm/i915/i915_gem.c         | 101 +++++++++++++++++++++++++++++---
>>>    drivers/gpu/drm/i915/intel_lrc.c        |   1 +
>>>    drivers/gpu/drm/i915/intel_ringbuffer.c |   1 +
>>>    drivers/gpu/drm/i915/intel_ringbuffer.h |   2 +
>>>    6 files changed, 115 insertions(+), 38 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
>>> index ac7e569..844cc4b 100644
>>> --- a/drivers/gpu/drm/i915/i915_debugfs.c
>>> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
>>> @@ -767,11 +767,12 @@ static int i915_gem_request_info(struct seq_file *m, void *data)
>>>                task = NULL;
>>>                if (req->pid)
>>>                    task = pid_task(req->pid, PIDTYPE_PID);
>>> -            seq_printf(m, "    %x @ %d: %s [%d]\n",
>>> +            seq_printf(m, "    %x @ %d: %s [%d], fence = %x:%x\n",
>> In the previous patch fence context and seqno were %d in the timeline->name so it would probably be more consistent.
It is trying to be consistent with the surroundings. Requests used to be 
printed as %d but for some reason got changed to be %x recently, whereas 
the fence debug output is still all %d. Currently, the request::seqno 
and fence::seqno are different so maybe it doesn't really matter that 
they are printed differently, but the ultimate aim is to merge the two 
into a single value, at which point all i915 code would be showing the 
value in one format but all fence code in another.

>>
>>>                       req->seqno,
>>>                       (int) (jiffies - req->emitted_jiffies),
>>>                       task ? task->comm : "<unknown>",
>>> -                   task ? task->pid : -1);
>>> +                   task ? task->pid : -1,
>>> +                   req->fence.context, req->fence.seqno);
> req->fence.context is 64 bits, so it will probably cause a compiler warning.
Not at the point the above was written. Have rebased to the newer tree 
and updated to u64 / %llx.

>>>                rcu_read_unlock();
>>>            }
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>>> index a5f8ad8..905feae 100644
>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>> @@ -42,6 +42,7 @@
>>>    #include <linux/kref.h>
>>>    #include <linux/pm_qos.h>
>>>    #include <linux/shmem_fs.h>
>>> +#include <linux/fence.h>
>>>
>>>    #include <drm/drmP.h>
>>>    #include <drm/intel-gtt.h>
>>> @@ -2353,7 +2354,11 @@ static inline struct scatterlist *__sg_next(struct scatterlist *sg)
>>>     * initial reference taken using kref_init
>>>     */
>>>    struct drm_i915_gem_request {
>>> -    struct kref ref;
>>> +    /**
>>> +     * Underlying object for implementing the signal/wait stuff.
>>> +     */
>>> +    struct fence fence;
>>> +    struct rcu_head rcu_head;
> fence.rcu can be used, no need to duplicate. :)
Is that true? Does it not matter if someone else is already using the 
one in the fence structure for some other operation? Or is that one 
specifically there for the creator (us) to use and so guaranteed not to 
be used elsewhere?
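
For reference, reusing fence.rcu would look something like the sketch below;
whether the fence core leaves that rcu_head free for the owner's use here is
exactly the question:

    static void i915_gem_request_free_rcu(struct rcu_head *head)
    {
        struct drm_i915_gem_request *req =
            container_of(head, typeof(*req), fence.rcu);

        kmem_cache_free(req->i915->requests, req);
    }

    static void i915_gem_request_free(struct fence *req_fence)
    {
        struct drm_i915_gem_request *req =
            container_of(req_fence, typeof(*req), fence);

        call_rcu(&req->fence.rcu, i915_gem_request_free_rcu);
    }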

>>>        /** On Which ring this request was generated */
>>>        struct drm_i915_private *i915;
>>> @@ -2455,7 +2460,13 @@ struct drm_i915_gem_request {
>>>    struct drm_i915_gem_request * __must_check
>>>    i915_gem_request_alloc(struct intel_engine_cs *engine,
>>>                   struct i915_gem_context *ctx);
>>> -void i915_gem_request_free(struct kref *req_ref);
>>> +
>>> +static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
>>> +                          bool lazy_coherency)
>>> +{
>>> +    return fence_is_signaled(&req->fence);
>>> +}
>>> I would squash the following patch into this one; it makes no sense to keep a function with an unused parameter. And fewer patches in the series make it less scary to review. :) Of course if they are also not too big. :D
> It's easier to read with all the function parameter changes in a separate patch.
Yeah, the guidance from on high has been that such things should be in 
separate patches.

>>> +
>>>    int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
>>>                       struct drm_file *file);
>>>
>>> @@ -2475,14 +2486,14 @@ static inline struct drm_i915_gem_request *
>>>    i915_gem_request_reference(struct drm_i915_gem_request *req)
>>>    {
>>>        if (req)
>>> -        kref_get(&req->ref);
>>> +        fence_get(&req->fence);
>>>        return req;
>>>    }
>>>
>>>    static inline void
>>>    i915_gem_request_unreference(struct drm_i915_gem_request *req)
>>>    {
>>> -    kref_put(&req->ref, i915_gem_request_free);
>>> +    fence_put(&req->fence);
>>>    }
>>>
>>>    static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
>>> @@ -2498,12 +2509,6 @@ static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
>>>    }
>>>
>>>    /*
>>> - * XXX: i915_gem_request_completed should be here but currently needs the
>>> - * definition of i915_seqno_passed() which is below. It will be moved in
>>> - * a later patch when the call to i915_seqno_passed() is obsoleted...
>>> - */
>>> -
>>> -/*
>>>     * A command that requires special handling by the command parser.
>>>     */
>>>    struct drm_i915_cmd_descriptor {
>>> @@ -3211,24 +3216,6 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
>>>        return (int32_t)(seq1 - seq2) >= 0;
>>>    }
>>>
>>> -static inline bool i915_gem_request_started(struct drm_i915_gem_request *req,
>>> -                       bool lazy_coherency)
>>> -{
>>> -    if (!lazy_coherency && req->engine->irq_seqno_barrier)
>>> -        req->engine->irq_seqno_barrier(req->engine);
>>> -    return i915_seqno_passed(req->engine->get_seqno(req->engine),
>>> -                 req->previous_seqno);
>>> -}
>>> -
>>> -static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
>>> -                          bool lazy_coherency)
>>> -{
>>> -    if (!lazy_coherency && req->engine->irq_seqno_barrier)
>>> -        req->engine->irq_seqno_barrier(req->engine);
>>> -    return i915_seqno_passed(req->engine->get_seqno(req->engine),
>>> -                 req->seqno);
>>> -}
>>> -
>>>    int __must_check i915_gem_get_seqno(struct drm_i915_private *dev_priv, u32 *seqno);
>>>    int __must_check i915_gem_set_seqno(struct drm_device *dev, u32 seqno);
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>> index 57d3593..b67fd7c 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>> @@ -1170,6 +1170,7 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
>>>    {
>>>        unsigned long timeout;
>>>        unsigned cpu;
>>> +    uint32_t seqno;
>>>
>>>        /* When waiting for high frequency requests, e.g. during synchronous
>>>         * rendering split between the CPU and GPU, the finite amount of time
>>> @@ -1185,12 +1186,14 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
>>>            return -EBUSY;
>>>
>>>        /* Only spin if we know the GPU is processing this request */
>>> -    if (!i915_gem_request_started(req, true))
>>> +    seqno = req->engine->get_seqno(req->engine);
>>> +    if (!i915_seqno_passed(seqno, req->previous_seqno))
>>>            return -EAGAIN;
>>>
>>>        timeout = local_clock_us(&cpu) + 5;
>>>        while (!need_resched()) {
>>> -        if (i915_gem_request_completed(req, true))
>>> +        seqno = req->engine->get_seqno(req->engine);
>>> +        if (i915_seqno_passed(seqno, req->seqno))
>>>                return 0;
>>>
>>>            if (signal_pending_state(state, current))
>>> @@ -1202,7 +1205,10 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
>>>            cpu_relax_lowlatency();
>>>        }
>>>
>>> -    if (i915_gem_request_completed(req, false))
>>> +    if (req->engine->irq_seqno_barrier)
>>> +        req->engine->irq_seqno_barrier(req->engine);
>>> +    seqno = req->engine->get_seqno(req->engine);
>>> +    if (i915_seqno_passed(seqno, req->seqno))
>>>            return 0;
>>>
>>>        return -EAGAIN;
>>> @@ -2736,13 +2742,89 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv,
>>>        }
>>>    }
>>>
>>> -void i915_gem_request_free(struct kref *req_ref)
>>> +static void i915_gem_request_free_rcu(struct rcu_head *head)
>>>    {
>>> -    struct drm_i915_gem_request *req = container_of(req_ref,
>>> -                         typeof(*req), ref);
>>> +    struct drm_i915_gem_request *req;
>>> +
>>> +    req = container_of(head, typeof(*req), rcu_head);
>>>        kmem_cache_free(req->i915->requests, req);
>>>    }
>>>
>>> +static void i915_gem_request_free(struct fence *req_fence)
>>> +{
>>> +    struct drm_i915_gem_request *req;
>>> +
>>> +    req = container_of(req_fence, typeof(*req), fence);
>>> +    call_rcu(&req->rcu_head, i915_gem_request_free_rcu);
>>> +}
>>> +
>>> +static bool i915_gem_request_enable_signaling(struct fence *req_fence)
>>> +{
>>> +    /* Interrupt driven fences are not implemented yet.*/
>>> +    WARN(true, "This should not be called!");
>>> +    return true;
>>> +}
>>> +
>>> +static bool i915_gem_request_is_completed(struct fence *req_fence)
>>> +{
>>> +    struct drm_i915_gem_request *req = container_of(req_fence,
>>> +                         typeof(*req), fence);
>>> +    u32 seqno;
>>> +
>>> +    seqno = req->engine->get_seqno(req->engine);
>>> +
>>> +    return i915_seqno_passed(seqno, req->seqno);
>>> +}
>>> +
>>> +static const char *i915_gem_request_get_driver_name(struct fence *req_fence)
>>> +{
>>> +    return "i915";
>>> +}
>>> +
>>> +static const char *i915_gem_request_get_timeline_name(struct fence *req_fence)
>>> +{
>>> +    struct drm_i915_gem_request *req;
>>> +    struct i915_fence_timeline *timeline;
>>> +
>>> +    req = container_of(req_fence, typeof(*req), fence);
>>> +    timeline = &req->ctx->engine[req->engine->id].fence_timeline;
>>> +
>>> +    return timeline->name;
>>> +}
>>> +
>>> +static void i915_gem_request_timeline_value_str(struct fence *req_fence,
>>> +                        char *str, int size)
>>> +{
>>> +    struct drm_i915_gem_request *req;
>>> +
>>> +    req = container_of(req_fence, typeof(*req), fence);
>>> +
>>> +    /* Last signalled timeline value ??? */
>>> +    snprintf(str, size, "? [%d]"/*, timeline->value*/,
>> Reference to timeline->value a leftover from the past?
>>
>> Is the string format defined by the API? Asking because "? [%d]" looks intriguing.
It is basically just a debug string so it can say anything we want. 
The convention is that it tells you where the timeline is up to. 
Unfortunately, there is no actual way to know that at present. The 
original Android implementation had the fence notification done via the 
timeline so engine->get_seqno() would be equivalent to 
timeline->current_value. However, all of that automatic update got 
removed with the switch to the 'official' struct fence instead of the 
Android only one. So now it is up to the timeline implementer do things 
how they see fit. And right now, keeping the timeline updated is 
unnecessary extra complication and overhead. When fence::seqno and 
request::seqno are unified then get_seqno() will be sufficient. Until 
then, it is just a pseudo value - hence the '? [%d]'. I have updated the 
comment in the code to explain it a bit better.


>>
>>> +         req->engine->get_seqno(req->engine));
>>> +}
>>> +
>>> +static void i915_gem_request_fence_value_str(struct fence *req_fence,
>>> +                         char *str, int size)
>>> +{
>>> +    struct drm_i915_gem_request *req;
>>> +
>>> +    req = container_of(req_fence, typeof(*req), fence);
>>> +
>>> +    snprintf(str, size, "%d [%d]", req->fence.seqno, req->seqno);
>> Is it OK to put req->seqno in this one? Or is it just for debug anyway, so it helps us and the fence framework does not care?
> I think this is used for getting all info from debugfs only, so req->seqno is fine.
Yes, it is just for debugfs and trace output. And until the two values 
are unified, it is really useful to have them both present.

>>> +}
>>> +
>>> +static const struct fence_ops i915_gem_request_fops = {
>>> +    .enable_signaling    = i915_gem_request_enable_signaling,
>>> +    .signaled        = i915_gem_request_is_completed,
>>> +    .wait            = fence_default_wait,
>>> +    .release        = i915_gem_request_free,
>>> +    .get_driver_name    = i915_gem_request_get_driver_name,
>>> +    .get_timeline_name    = i915_gem_request_get_timeline_name,
>>> +    .fence_value_str    = i915_gem_request_fence_value_str,
>>> +    .timeline_value_str    = i915_gem_request_timeline_value_str,
>>> +};
>>> +
>>>    int i915_create_fence_timeline(struct drm_device *dev,
>>>                       struct i915_gem_context *ctx,
>>>                       struct intel_engine_cs *engine)
>>> @@ -2770,7 +2852,7 @@ int i915_create_fence_timeline(struct drm_device *dev,
>>>        return 0;
>>>    }
>>>
>>> -unsigned i915_fence_timeline_get_next_seqno(struct i915_fence_timeline *timeline)
>>> +static unsigned i915_fence_timeline_get_next_seqno(struct i915_fence_timeline *timeline)
>>>    {
>>>        unsigned seqno;
>>>
>>> @@ -2814,13 +2896,16 @@ __i915_gem_request_alloc(struct intel_engine_cs *engine,
>>>        if (ret)
>>>            goto err;
>>>
>>> -    kref_init(&req->ref);
>>>        req->i915 = dev_priv;
>>>        req->engine = engine;
>>>        req->reset_counter = reset_counter;
>>>        req->ctx  = ctx;
>>>        i915_gem_context_reference(req->ctx);
>>>
>>> +    fence_init(&req->fence, &i915_gem_request_fops, &engine->fence_lock,
>>> +           ctx->engine[engine->id].fence_timeline.fence_context,
>>> +           i915_fence_timeline_get_next_seqno(&ctx->engine[engine->id].fence_timeline));
>>> +
>>>        /*
>>>         * Reserve space in the ring buffer for all the commands required to
>>>         * eventually emit this request. This is to guarantee that the
>>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>>> index 14bcfb7..f126bcb 100644
>>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>>> @@ -2030,6 +2030,7 @@ logical_ring_setup(struct drm_device *dev, enum intel_engine_id id)
>>>        INIT_LIST_HEAD(&engine->buffers);
>>>        INIT_LIST_HEAD(&engine->execlist_queue);
>>>        spin_lock_init(&engine->execlist_lock);
>>> +    spin_lock_init(&engine->fence_lock);
>>>
>>>        tasklet_init(&engine->irq_tasklet,
>>>                 intel_lrc_irq_handler, (unsigned long)engine);
>>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
>>> index 8d35a39..fbd3f12 100644
>>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>>> @@ -2254,6 +2254,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
>>>        INIT_LIST_HEAD(&engine->request_list);
>>>        INIT_LIST_HEAD(&engine->execlist_queue);
>>>        INIT_LIST_HEAD(&engine->buffers);
>>> +    spin_lock_init(&engine->fence_lock);
>>>        i915_gem_batch_pool_init(dev, &engine->batch_pool);
>>>        memset(engine->semaphore.sync_seqno, 0,
>>>               sizeof(engine->semaphore.sync_seqno));
>>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
>>> index b33c876..3f39daf 100644
>>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
>>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
>>> @@ -345,6 +345,8 @@ struct intel_engine_cs {
>>>         * to encode the command length in the header).
>>>         */
>>>        u32 (*get_cmd_length_mask)(u32 cmd_header);
>>> +
>>> +    spinlock_t fence_lock;
>> Why is this lock per-engine, and not for example per timeline? Aren't fencees living completely isolated in their per-context-per-engine domains? So presumably there is something somewhere which is shared outside that domain to need a lock at the engine level?
> All outstanding requests are added to engine->fence_signal_list in patch 4, which means a per engine lock is required.

 > Okay, a comment is required here to describe the lock then: everything it
 > protects and how and when it needs to be taken, both from the i915 point
 > of view and from the fence API side.

Will add a comment to say that the lock is used for the signal list as 
well as the fence itself.
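
A sketch of such a comment on the struct fields:

    /*
     * fence_lock protects fence_signal_list and is also the lock
     * passed to fence_init() for every request, so the fence core
     * takes it when signalling as well. It is used from interrupt
     * context, hence the _irq lock variants.
     */
    spinlock_t fence_lock;
    struct list_head fence_signal_list;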


>
>


* Re: [PATCH v9 2/6] drm/i915: Convert requests to use struct fence
  2016-06-10 11:26       ` John Harrison
@ 2016-06-13 10:16         ` Maarten Lankhorst
  0 siblings, 0 replies; 26+ messages in thread
From: Maarten Lankhorst @ 2016-06-13 10:16 UTC (permalink / raw)
  To: John Harrison, Tvrtko Ursulin, Intel-GFX

Op 10-06-16 om 13:26 schreef John Harrison:
> On 07/06/2016 12:42, Maarten Lankhorst wrote:
>> Op 02-06-16 om 13:07 schreef Tvrtko Ursulin:
>>> On 01/06/16 18:07, John.C.Harrison@Intel.com wrote:
>>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>>
>>>> There is a construct in the linux kernel called 'struct fence' that is
>>>> intended to keep track of work that is executed on hardware. I.e. it
>>>> solves the basic problem that the drivers 'struct
>>>> drm_i915_gem_request' is trying to address. The request structure does
>>>> quite a lot more than simply track the execution progress so is very
>>>> definitely still required. However, the basic completion status side
>>>> could be updated to use the ready made fence implementation and gain
>>>> all the advantages that provides.
>>>>
>>>> This patch makes the first step of integrating a struct fence into the
>>>> request. It replaces the explicit reference count with that of the
>>>> fence. It also replaces the 'is completed' test with the fence's
>>>> equivalent. Currently, that simply chains on to the original request
>>>> implementation. A future patch will improve this.
>>>>
>>>> v3: Updated after review comments by Tvrtko Ursulin. Added fence
>>>> context/seqno pair to the debugfs request info. Renamed fence 'driver
>>>> name' to just 'i915'. Removed BUG_ONs.
>>>>
>>>> v5: Changed seqno format in debugfs to %x rather than %u as that is
>>>> apparently the preferred appearance. Line wrapped some long lines to
>>>> keep the style checker happy.
>>>>
>>>> v6: Updated to newer nightly and resolved conflicts. The biggest issue
>>>> was with the re-worked busy spin precursor to waiting on a request. In
>>>> particular, the addition of a 'request_started' helper function. This
>>>> has no corresponding concept within the fence framework. However, it
>>>> is only ever used in one place and the whole point of that place is to
>>>> always directly read the seqno for absolutely lowest latency possible.
>>>> So the simple solution is to just make the seqno test explicit at that
>>>> point now rather than later in the series (it was previously being
>>>> done anyway when fences become interrupt driven).
>>>>
>>>> v7: Rebased to newer nightly - lots of ring -> engine renaming and
>>>> interface change to get_seqno().
>>>>
>>>> v8: Rebased to newer nightly - no longer needs to worry about mutex
>>>> locking in the request free code path. Moved to after fence timeline
>>>> patch so no longer needs to add a horrid hack timeline.
>>>>
>>>> Removed commented out code block. Added support for possible RCU usage
>>>> of fence object (Review comments by Maarten Lankhorst).
>>>>
>>>> For: VIZ-5190
>>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>>>> Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
>>> Was it an r-b or an ack from Jesse? If the former, does it need a "(v?)" suffix, depending on the amount of code changes after his r-b?
> Going back through the old emails it looks like you are right, it was actually an ack on v3. What is the official tag for that?
>
>>>
>>>> ---
>>>>    drivers/gpu/drm/i915/i915_debugfs.c     |   5 +-
>>>>    drivers/gpu/drm/i915/i915_drv.h         |  43 +++++---------
>>>>    drivers/gpu/drm/i915/i915_gem.c         | 101 +++++++++++++++++++++++++++++---
>>>>    drivers/gpu/drm/i915/intel_lrc.c        |   1 +
>>>>    drivers/gpu/drm/i915/intel_ringbuffer.c |   1 +
>>>>    drivers/gpu/drm/i915/intel_ringbuffer.h |   2 +
>>>>    6 files changed, 115 insertions(+), 38 deletions(-)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
>>>> index ac7e569..844cc4b 100644
>>>> --- a/drivers/gpu/drm/i915/i915_debugfs.c
>>>> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
>>>> @@ -767,11 +767,12 @@ static int i915_gem_request_info(struct seq_file *m, void *data)
>>>>                task = NULL;
>>>>                if (req->pid)
>>>>                    task = pid_task(req->pid, PIDTYPE_PID);
>>>> -            seq_printf(m, "    %x @ %d: %s [%d]\n",
>>>> +            seq_printf(m, "    %x @ %d: %s [%d], fence = %x:%x\n",
>>> In the previous patch fence context and seqno were %d in the timeline->name so it would probably be more consistent.
> It is trying to be consistent with the surroundings. Requests used to be printed as %d but for some reason got changed to be %x recently. Whereas the fence debug output is all %d still. Currently, the request::seqno and fence::seqno are different so maybe it doesn't really matter that they are printed differently but the ultimate aim is to merge the two into a single value. At which point all i915 code would be showing the value in one format but all fence code in another.
>
>>>
>>>>                       req->seqno,
>>>>                       (int) (jiffies - req->emitted_jiffies),
>>>>                       task ? task->comm : "<unknown>",
>>>> -                   task ? task->pid : -1);
>>>> +                   task ? task->pid : -1,
>>>> +                   req->fence.context, req->fence.seqno);
>> req->fence.context is 64-bits, will probably cause a compiler warning.
> Not at the point the above was written. Have rebased to the newer tree and updated to u64 / %llx.
>
>>>>                rcu_read_unlock();
>>>>            }
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>>>> index a5f8ad8..905feae 100644
>>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>>> @@ -42,6 +42,7 @@
>>>>    #include <linux/kref.h>
>>>>    #include <linux/pm_qos.h>
>>>>    #include <linux/shmem_fs.h>
>>>> +#include <linux/fence.h>
>>>>
>>>>    #include <drm/drmP.h>
>>>>    #include <drm/intel-gtt.h>
>>>> @@ -2353,7 +2354,11 @@ static inline struct scatterlist *__sg_next(struct scatterlist *sg)
>>>>     * initial reference taken using kref_init
>>>>     */
>>>>    struct drm_i915_gem_request {
>>>> -    struct kref ref;
>>>> +    /**
>>>> +     * Underlying object for implementing the signal/wait stuff.
>>>> +     */
>>>> +    struct fence fence;
>>>> +    struct rcu_head rcu_head;
>> fence.rcu can be used, no need to duplicate. :)
> Is that true? Does it not matter if someone else is already using the one in the fence structure for some other operation? Or is that one specifically there for the creator (us) to use and so guaranteed not to be used elsewhere?
Yes, it's unused when you implement your own ->release function and don't call fence_free.
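
For illustration, a sketch of how that would look (assuming ->release
is always provided for these fences, so fence_free(), the only other
user of fence.rcu, is never called on them):

    static void i915_gem_request_free_rcu(struct rcu_head *head)
    {
        struct drm_i915_gem_request *req =
            container_of(head, typeof(*req), fence.rcu);

        kmem_cache_free(req->i915->requests, req);
    }

    static void i915_gem_request_free(struct fence *req_fence)
    {
        struct drm_i915_gem_request *req =
            container_of(req_fence, typeof(*req), fence);

        /* Reuse the rcu_head embedded in struct fence. */
        call_rcu(&req->fence.rcu, i915_gem_request_free_rcu);
    }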
>>>>        /** On Which ring this request was generated */
>>>>        struct drm_i915_private *i915;
>>>> @@ -2455,7 +2460,13 @@ struct drm_i915_gem_request {
>>>>    struct drm_i915_gem_request * __must_check
>>>>    i915_gem_request_alloc(struct intel_engine_cs *engine,
>>>>                   struct i915_gem_context *ctx);
>>>> -void i915_gem_request_free(struct kref *req_ref);
>>>> +
>>>> +static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
>>>> +                          bool lazy_coherency)
>>>> +{
>>>> +    return fence_is_signaled(&req->fence);
>>>> +}
>>> I would squash the following patch into this one, it makes no sense to keep a function with an unused parameter. And fewer patches in the series makes it less scary to review. :) Of course if they are also not too big. :D
>> It's easier to read with all the function parameter changes in a separate patch.
> Yeah, the guidance from on high has been that such things should be in separate patches.
>
>>>> +
>>>>    int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
>>>>                       struct drm_file *file);
>>>>
>>>> @@ -2475,14 +2486,14 @@ static inline struct drm_i915_gem_request *
>>>>    i915_gem_request_reference(struct drm_i915_gem_request *req)
>>>>    {
>>>>        if (req)
>>>> -        kref_get(&req->ref);
>>>> +        fence_get(&req->fence);
>>>>        return req;
>>>>    }
>>>>
>>>>    static inline void
>>>>    i915_gem_request_unreference(struct drm_i915_gem_request *req)
>>>>    {
>>>> -    kref_put(&req->ref, i915_gem_request_free);
>>>> +    fence_put(&req->fence);
>>>>    }
>>>>
>>>>    static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
>>>> @@ -2498,12 +2509,6 @@ static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
>>>>    }
>>>>
>>>>    /*
>>>> - * XXX: i915_gem_request_completed should be here but currently needs the
>>>> - * definition of i915_seqno_passed() which is below. It will be moved in
>>>> - * a later patch when the call to i915_seqno_passed() is obsoleted...
>>>> - */
>>>> -
>>>> -/*
>>>>     * A command that requires special handling by the command parser.
>>>>     */
>>>>    struct drm_i915_cmd_descriptor {
>>>> @@ -3211,24 +3216,6 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
>>>>        return (int32_t)(seq1 - seq2) >= 0;
>>>>    }
>>>>
>>>> -static inline bool i915_gem_request_started(struct drm_i915_gem_request *req,
>>>> -                       bool lazy_coherency)
>>>> -{
>>>> -    if (!lazy_coherency && req->engine->irq_seqno_barrier)
>>>> -        req->engine->irq_seqno_barrier(req->engine);
>>>> -    return i915_seqno_passed(req->engine->get_seqno(req->engine),
>>>> -                 req->previous_seqno);
>>>> -}
>>>> -
>>>> -static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
>>>> -                          bool lazy_coherency)
>>>> -{
>>>> -    if (!lazy_coherency && req->engine->irq_seqno_barrier)
>>>> -        req->engine->irq_seqno_barrier(req->engine);
>>>> -    return i915_seqno_passed(req->engine->get_seqno(req->engine),
>>>> -                 req->seqno);
>>>> -}
>>>> -
>>>>    int __must_check i915_gem_get_seqno(struct drm_i915_private *dev_priv, u32 *seqno);
>>>>    int __must_check i915_gem_set_seqno(struct drm_device *dev, u32 seqno);
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>>> index 57d3593..b67fd7c 100644
>>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>>> @@ -1170,6 +1170,7 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
>>>>    {
>>>>        unsigned long timeout;
>>>>        unsigned cpu;
>>>> +    uint32_t seqno;
>>>>
>>>>        /* When waiting for high frequency requests, e.g. during synchronous
>>>>         * rendering split between the CPU and GPU, the finite amount of time
>>>> @@ -1185,12 +1186,14 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
>>>>            return -EBUSY;
>>>>
>>>>        /* Only spin if we know the GPU is processing this request */
>>>> -    if (!i915_gem_request_started(req, true))
>>>> +    seqno = req->engine->get_seqno(req->engine);
>>>> +    if (!i915_seqno_passed(seqno, req->previous_seqno))
>>>>            return -EAGAIN;
>>>>
>>>>        timeout = local_clock_us(&cpu) + 5;
>>>>        while (!need_resched()) {
>>>> -        if (i915_gem_request_completed(req, true))
>>>> +        seqno = req->engine->get_seqno(req->engine);
>>>> +        if (i915_seqno_passed(seqno, req->seqno))
>>>>                return 0;
>>>>
>>>>            if (signal_pending_state(state, current))
>>>> @@ -1202,7 +1205,10 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
>>>>            cpu_relax_lowlatency();
>>>>        }
>>>>
>>>> -    if (i915_gem_request_completed(req, false))
>>>> +    if (req->engine->irq_seqno_barrier)
>>>> +        req->engine->irq_seqno_barrier(req->engine);
>>>> +    seqno = req->engine->get_seqno(req->engine);
>>>> +    if (i915_seqno_passed(seqno, req->seqno))
>>>>            return 0;
>>>>
>>>>        return -EAGAIN;
>>>> @@ -2736,13 +2742,89 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv,
>>>>        }
>>>>    }
>>>>
>>>> -void i915_gem_request_free(struct kref *req_ref)
>>>> +static void i915_gem_request_free_rcu(struct rcu_head *head)
>>>>    {
>>>> -    struct drm_i915_gem_request *req = container_of(req_ref,
>>>> -                         typeof(*req), ref);
>>>> +    struct drm_i915_gem_request *req;
>>>> +
>>>> +    req = container_of(head, typeof(*req), rcu_head);
>>>>        kmem_cache_free(req->i915->requests, req);
>>>>    }
>>>>
>>>> +static void i915_gem_request_free(struct fence *req_fence)
>>>> +{
>>>> +    struct drm_i915_gem_request *req;
>>>> +
>>>> +    req = container_of(req_fence, typeof(*req), fence);
>>>> +    call_rcu(&req->rcu_head, i915_gem_request_free_rcu);
>>>> +}
>>>> +
>>>> +static bool i915_gem_request_enable_signaling(struct fence *req_fence)
>>>> +{
>>>> +    /* Interrupt driven fences are not implemented yet.*/
>>>> +    WARN(true, "This should not be called!");
>>>> +    return true;
>>>> +}
>>>> +
>>>> +static bool i915_gem_request_is_completed(struct fence *req_fence)
>>>> +{
>>>> +    struct drm_i915_gem_request *req = container_of(req_fence,
>>>> +                         typeof(*req), fence);
>>>> +    u32 seqno;
>>>> +
>>>> +    seqno = req->engine->get_seqno(req->engine);
>>>> +
>>>> +    return i915_seqno_passed(seqno, req->seqno);
>>>> +}
>>>> +
>>>> +static const char *i915_gem_request_get_driver_name(struct fence *req_fence)
>>>> +{
>>>> +    return "i915";
>>>> +}
>>>> +
>>>> +static const char *i915_gem_request_get_timeline_name(struct fence *req_fence)
>>>> +{
>>>> +    struct drm_i915_gem_request *req;
>>>> +    struct i915_fence_timeline *timeline;
>>>> +
>>>> +    req = container_of(req_fence, typeof(*req), fence);
>>>> +    timeline = &req->ctx->engine[req->engine->id].fence_timeline;
>>>> +
>>>> +    return timeline->name;
>>>> +}
>>>> +
>>>> +static void i915_gem_request_timeline_value_str(struct fence *req_fence,
>>>> +                        char *str, int size)
>>>> +{
>>>> +    struct drm_i915_gem_request *req;
>>>> +
>>>> +    req = container_of(req_fence, typeof(*req), fence);
>>>> +
>>>> +    /* Last signalled timeline value ??? */
>>>> +    snprintf(str, size, "? [%d]"/*, timeline->value*/,
>>> Reference to timeline->value a leftover from the past?
>>>
>>> Is the string format defined by the API? Asking because "? [%d]" looks intriguing.
> It is basically just a debug string so can say anything we want. Convention is that it tells you where the timeline is up to. Unfortunately, there is no actual way to know that at present. The original Android implementation had the fence notification done via the timeline so engine->get_seqno() would be equivalent to timeline->current_value. However, all of that automatic update got removed with the switch to the 'official' struct fence instead of the Android-only one. So now it is up to the timeline implementer to do things how they see fit. And right now, keeping the timeline updated is unnecessary extra complication and overhead. When fence::seqno and request::seqno are unified then get_seqno() will be sufficient. Until then, it is just a pseudo value - hence the '? [%d]'. I have updated the comment in the code to explain it a bit better.
>
>
>>>
>>>> +         req->engine->get_seqno(req->engine));
>>>> +}
>>>> +
>>>> +static void i915_gem_request_fence_value_str(struct fence *req_fence,
>>>> +                         char *str, int size)
>>>> +{
>>>> +    struct drm_i915_gem_request *req;
>>>> +
>>>> +    req = container_of(req_fence, typeof(*req), fence);
>>>> +
>>>> +    snprintf(str, size, "%d [%d]", req->fence.seqno, req->seqno);
>>> Is it OK to put req->seqno in this one? OR it is just for debug anyway so it helps us and fence framework does not care?
>> I think this is used for getting all info from debugfs only, so req->seqno is fine.
> Yes, it is just for debugfs and trace output. And until the two values are unified, it is really useful to have them both present.
>
>>>> +}
>>>> +
>>>> +static const struct fence_ops i915_gem_request_fops = {
>>>> +    .enable_signaling    = i915_gem_request_enable_signaling,
>>>> +    .signaled        = i915_gem_request_is_completed,
>>>> +    .wait            = fence_default_wait,
>>>> +    .release        = i915_gem_request_free,
>>>> +    .get_driver_name    = i915_gem_request_get_driver_name,
>>>> +    .get_timeline_name    = i915_gem_request_get_timeline_name,
>>>> +    .fence_value_str    = i915_gem_request_fence_value_str,
>>>> +    .timeline_value_str    = i915_gem_request_timeline_value_str,
>>>> +};
>>>> +
>>>>    int i915_create_fence_timeline(struct drm_device *dev,
>>>>                       struct i915_gem_context *ctx,
>>>>                       struct intel_engine_cs *engine)
>>>> @@ -2770,7 +2852,7 @@ int i915_create_fence_timeline(struct drm_device *dev,
>>>>        return 0;
>>>>    }
>>>>
>>>> -unsigned i915_fence_timeline_get_next_seqno(struct i915_fence_timeline *timeline)
>>>> +static unsigned i915_fence_timeline_get_next_seqno(struct i915_fence_timeline *timeline)
>>>>    {
>>>>        unsigned seqno;
>>>>
>>>> @@ -2814,13 +2896,16 @@ __i915_gem_request_alloc(struct intel_engine_cs *engine,
>>>>        if (ret)
>>>>            goto err;
>>>>
>>>> -    kref_init(&req->ref);
>>>>        req->i915 = dev_priv;
>>>>        req->engine = engine;
>>>>        req->reset_counter = reset_counter;
>>>>        req->ctx  = ctx;
>>>>        i915_gem_context_reference(req->ctx);
>>>>
>>>> +    fence_init(&req->fence, &i915_gem_request_fops, &engine->fence_lock,
>>>> +           ctx->engine[engine->id].fence_timeline.fence_context,
>>>> +           i915_fence_timeline_get_next_seqno(&ctx->engine[engine->id].fence_timeline));
>>>> +
>>>>        /*
>>>>         * Reserve space in the ring buffer for all the commands required to
>>>>         * eventually emit this request. This is to guarantee that the
>>>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>>>> index 14bcfb7..f126bcb 100644
>>>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>>>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>>>> @@ -2030,6 +2030,7 @@ logical_ring_setup(struct drm_device *dev, enum intel_engine_id id)
>>>>        INIT_LIST_HEAD(&engine->buffers);
>>>>        INIT_LIST_HEAD(&engine->execlist_queue);
>>>>        spin_lock_init(&engine->execlist_lock);
>>>> +    spin_lock_init(&engine->fence_lock);
>>>>
>>>>        tasklet_init(&engine->irq_tasklet,
>>>>                 intel_lrc_irq_handler, (unsigned long)engine);
>>>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
>>>> index 8d35a39..fbd3f12 100644
>>>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>>>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>>>> @@ -2254,6 +2254,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
>>>>        INIT_LIST_HEAD(&engine->request_list);
>>>>        INIT_LIST_HEAD(&engine->execlist_queue);
>>>>        INIT_LIST_HEAD(&engine->buffers);
>>>> +    spin_lock_init(&engine->fence_lock);
>>>>        i915_gem_batch_pool_init(dev, &engine->batch_pool);
>>>>        memset(engine->semaphore.sync_seqno, 0,
>>>>               sizeof(engine->semaphore.sync_seqno));
>>>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
>>>> index b33c876..3f39daf 100644
>>>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
>>>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
>>>> @@ -345,6 +345,8 @@ struct intel_engine_cs {
>>>>         * to encode the command length in the header).
>>>>         */
>>>>        u32 (*get_cmd_length_mask)(u32 cmd_header);
>>>> +
>>>> +    spinlock_t fence_lock;
>>> Why is this lock per-engine, and not for example per timeline? Aren't fences living completely isolated in their per-context-per-engine domains? So presumably there is something somewhere which is shared outside that domain that needs a lock at the engine level?
>> All outstanding requests are added to engine->fence_signal_list in patch 4, which means a per-engine lock is required.
>
> > Okay, a comment is required here to describe the lock then: all that it
> > protects and how and when it needs to be taken, both from the i915
> > point of view and from the fence API side.
>
> Will add a comment to say that the lock is used for the signal list as well as the fence itself.


* Re: [PATCH v9 4/6] drm/i915: Interrupt driven fences
  2016-06-07 12:02     ` Maarten Lankhorst
  2016-06-07 12:19       ` Tvrtko Ursulin
@ 2016-06-13 15:51       ` John Harrison
  2016-06-14 11:35         ` Tvrtko Ursulin
  1 sibling, 1 reply; 26+ messages in thread
From: John Harrison @ 2016-06-13 15:51 UTC (permalink / raw)
  To: Maarten Lankhorst, Tvrtko Ursulin, Intel-GFX

On 07/06/2016 13:02, Maarten Lankhorst wrote:
> On 02-06-16 at 15:25, Tvrtko Ursulin wrote:
>> On 01/06/16 18:07, John.C.Harrison@Intel.com wrote:
>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>
>>> The intended usage model for struct fence is that the signalled status
>>> should be set on demand rather than polled. That is, there should not
>>> be a need for a 'signaled' function to be called every time the status
>>> is queried. Instead, 'something' should be done to enable a signal
>>> callback from the hardware which will update the state directly. In
>>> the case of requests, this is the seqno update interrupt. The idea is
>>> that this callback will only be enabled on demand when something
>>> actually tries to wait on the fence.
>>>
>>> This change removes the polling test and replaces it with the callback
>>> scheme. Each fence is added to a 'please poke me' list at the start of
>>> i915_add_request(). The interrupt handler then scans through the 'poke
>>> me' list when a new seqno pops out and signals any matching
>>> fence/request. The fence is then removed from the list so the entire
>>> request stack does not need to be scanned every time. Note that the
>>> fence is added to the list before the commands to generate the seqno
>>> interrupt are added to the ring. Thus the sequence is guaranteed to be
>>> race free if the interrupt is already enabled.
>>>
>>> Note that the interrupt is only enabled on demand (i.e. when
>>> __wait_request() is called). Thus there is still a potential race when
>>> enabling the interrupt as the request may already have completed.
>>> However, this is simply solved by calling the interrupt processing
>>> code immediately after enabling the interrupt and thereby checking for
>>> already completed requests.
>>>
>>> Lastly, the ring clean up code has the possibility to cancel
>>> outstanding requests (e.g. because TDR has reset the ring). These
>>> requests will never get signalled and so must be removed from the
>>> signal list manually. This is done by setting a 'cancelled' flag and
>>> then calling the regular notify/retire code path rather than
>>> attempting to duplicate the list manipulation and clean up code in
>>> multiple places. This also avoids any race condition where the
>>> cancellation request might occur after/during the completion interrupt
>>> actually arriving.
>>>
>>> v2: Updated to take advantage of the request unreference no longer
>>> requiring the mutex lock.
>>>
>>> v3: Move the signal list processing around to prevent unsubmitted
>>> requests being added to the list. This was occurring on Android
>>> because the native sync implementation calls the
>>> fence->enable_signalling API immediately on fence creation.
>>>
>>> Updated after review comments by Tvrtko Ursulin. Renamed list nodes to
>>> 'link' instead of 'list'. Added support for returning an error code on
>>> a cancelled fence. Update list processing to be more efficient/safer
>>> with respect to spinlocks.
>>>
>>> v5: Made i915_gem_request_submit a static as it is only ever called
>>> from one place.
>>>
>>> Fixed up the low latency wait optimisation. The time delay between the
>>> seqno value being written to memory and the driver's ISR running can be
>>> significant, at least for the wait request micro-benchmark. This can
>>> be greatly improved by explicitly checking for seqno updates in the
>>> pre-wait busy poll loop. Also added some documentation comments to the
>>> busy poll code.
>>>
>>> Fixed up support for the faking of lost interrupts
>>> (test_irq_rings/missed_irq_rings). That is, there is an IGT test that
>>> tells the driver to lose interrupts deliberately and then check that
>>> everything still works as expected (albeit much slower).
>>>
>>> Updates from review comments: use non IRQ-save spinlocking, early exit
>>> on WARN and improved comments (Tvrtko Ursulin).
>>>
>>> v6: Updated to newer nightly and resolved conflicts around the
>>> wait_request busy spin optimisation. Also fixed a race condition
>>> between this early exit path and the regular completion path.
>>>
>>> v7: Updated to newer nightly - lots of ring -> engine renaming plus an
>>> interface change on get_seqno(). Also added a list_empty() check
>>> before acquiring spinlocks and doing list processing.
>>>
>>> v8: Updated to newer nightly - changes to request clean up code mean
>>> none of the deferred free mess is needed any more.
>>>
>>> v9: Moved the request completion processing out of the interrupt
>>> handler and into a worker thread (Chris Wilson).
>>>
>>> For: VIZ-5190
>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>>> ---
>>>    drivers/gpu/drm/i915/i915_dma.c         |   9 +-
>>>    drivers/gpu/drm/i915/i915_drv.h         |  11 ++
>>>    drivers/gpu/drm/i915/i915_gem.c         | 248 +++++++++++++++++++++++++++++---
>>>    drivers/gpu/drm/i915/i915_irq.c         |   2 +
>>>    drivers/gpu/drm/i915/intel_lrc.c        |   5 +
>>>    drivers/gpu/drm/i915/intel_ringbuffer.c |   5 +
>>>    drivers/gpu/drm/i915/intel_ringbuffer.h |   3 +
>>>    7 files changed, 260 insertions(+), 23 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_dma.c b/drivers/gpu/drm/i915/i915_dma.c
>>> index 07edaed..f8f60bb 100644
>>> --- a/drivers/gpu/drm/i915/i915_dma.c
>>> +++ b/drivers/gpu/drm/i915/i915_dma.c
>>> @@ -1019,9 +1019,13 @@ static int i915_workqueues_init(struct drm_i915_private *dev_priv)
>>>        if (dev_priv->wq == NULL)
>>>            goto out_err;
>>>
>>> +    dev_priv->req_wq = alloc_ordered_workqueue("i915-rq", 0);
>>> +    if (dev_priv->req_wq == NULL)
>>> +        goto out_free_wq;
>>> +
>> Single (per-device) ordered workqueue will serialize interrupt processing across all engines to one thread. Together with the fact request worker does not seem to need the sleeping context, I am thinking that a tasklet per engine would be much better (see engine->irq_tasklet for an example).
Other conversations have stated that tasklets are not the best option. I 
did think about having a work queue per engine but that seemed 
excessive. Plus any subsequent work triggered by the fence completion 
will almost certainly require grabbing the driver mutex lock (because 
everything requires the mutex lock) so serialisation of engines doesn't 
sound like much of an issue.

>>
>>>        dev_priv->hotplug.dp_wq = alloc_ordered_workqueue("i915-dp", 0);
>>>        if (dev_priv->hotplug.dp_wq == NULL)
>>> -        goto out_free_wq;
>>> +        goto out_free_req_wq;
>>>
>>>        dev_priv->gpu_error.hangcheck_wq =
>>>            alloc_ordered_workqueue("i915-hangcheck", 0);
>>> @@ -1032,6 +1036,8 @@ static int i915_workqueues_init(struct drm_i915_private *dev_priv)
>>>
>>>    out_free_dp_wq:
>>>        destroy_workqueue(dev_priv->hotplug.dp_wq);
>>> +out_free_req_wq:
>>> +    destroy_workqueue(dev_priv->req_wq);
>>>    out_free_wq:
>>>        destroy_workqueue(dev_priv->wq);
>>>    out_err:
>>> @@ -1044,6 +1050,7 @@ static void i915_workqueues_cleanup(struct drm_i915_private *dev_priv)
>>>    {
>>>        destroy_workqueue(dev_priv->gpu_error.hangcheck_wq);
>>>        destroy_workqueue(dev_priv->hotplug.dp_wq);
>>> +    destroy_workqueue(dev_priv->req_wq);
>>>        destroy_workqueue(dev_priv->wq);
>>>    }
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>>> index 69c3412..5a7f256 100644
>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>> @@ -1851,6 +1851,9 @@ struct drm_i915_private {
>>>         */
>>>        struct workqueue_struct *wq;
>>>
>>> +    /* Work queue for request completion processing */
>>> +    struct workqueue_struct *req_wq;
>>> +
>>>        /* Display functions */
>>>        struct drm_i915_display_funcs display;
>>>
>>> @@ -2359,6 +2362,10 @@ struct drm_i915_gem_request {
>>>         */
>>>        struct fence fence;
>>>        struct rcu_head rcu_head;
>>> +    struct list_head signal_link;
>>> +    bool cancelled;
>>> +    bool irq_enabled;
>>> +    bool signal_requested;
>>>
>>>        /** On Which ring this request was generated */
>>>        struct drm_i915_private *i915;
>>> @@ -2460,6 +2467,10 @@ struct drm_i915_gem_request {
>>>    struct drm_i915_gem_request * __must_check
>>>    i915_gem_request_alloc(struct intel_engine_cs *engine,
>>>                   struct i915_gem_context *ctx);
>>> +void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req,
>>> +                       bool fence_locked);
>>> +void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked);
>>> +void i915_gem_request_worker(struct work_struct *work);
>>>
>>>    static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
>>>    {
>>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>>> index 97e3138..83cf9b0 100644
>>> --- a/drivers/gpu/drm/i915/i915_gem.c
>>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>>> @@ -39,6 +39,8 @@
>>>    #include <linux/pci.h>
>>>    #include <linux/dma-buf.h>
>>>
>>> +static void i915_gem_request_submit(struct drm_i915_gem_request *req);
>>> +
>>>    static void i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj);
>>>    static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj);
>>>    static void
>>> @@ -1237,9 +1239,8 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>>>    {
>>>        struct intel_engine_cs *engine = i915_gem_request_get_engine(req);
>>>        struct drm_i915_private *dev_priv = req->i915;
>>> -    const bool irq_test_in_progress =
>>> -        ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) & intel_engine_flag(engine);
>>>        int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
>>> +    uint32_t seqno;
>>>        DEFINE_WAIT(wait);
>>>        unsigned long timeout_expire;
>>>        s64 before = 0; /* Only to silence a compiler warning. */
>>> @@ -1247,9 +1248,6 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>>>
>>>        WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled");
>>>
>>> -    if (list_empty(&req->list))
>>> -        return 0;
>>> -
>>>        if (i915_gem_request_completed(req))
>>>            return 0;
>>>
>>> @@ -1275,15 +1273,17 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>>>        trace_i915_gem_request_wait_begin(req);
>>>
>>>        /* Optimistic spin for the next jiffie before touching IRQs */
>>> -    ret = __i915_spin_request(req, state);
>>> -    if (ret == 0)
>>> -        goto out;
>>> -
>>> -    if (!irq_test_in_progress && WARN_ON(!engine->irq_get(engine))) {
>>> -        ret = -ENODEV;
>>> -        goto out;
>>> +    if (req->seqno) {
>> This needs a comment I think because it is so unusual and new that req->seqno == 0 is a special path. To explain why and how it can happen here.
Hmm, I think that is left over from an earlier re-org of the patches. 
Invalid seqnos only come in with the scheduler as it requires being able 
to dynamically change the seqno, e.g. due to pre-empting a request. 
Although that would disappear again with the move away from global 
seqnos to the per context/engine seqno of the fence. I'll drop the test 
and leave the spin unconditional.
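
I.e. the hunk reduces back to (sketch):

    /* Optimistic spin for the next jiffie before touching IRQs */
    ret = __i915_spin_request(req, state);
    if (ret == 0)
        goto out;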

>>
>>> +        ret = __i915_spin_request(req, state);
>>> +        if (ret == 0)
>>> +            goto out;
>>>        }
>>>
>>> +    /*
>>> +     * Enable interrupt completion of the request.
>>> +     */
>>> +    fence_enable_sw_signaling(&req->fence);
>>> +
>>>        for (;;) {
>>>            struct timer_list timer;
>>>
>>> @@ -1306,6 +1306,21 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>>>                break;
>>>            }
>>>
>>> +        if (req->seqno) {
>>> +            /*
>>> +             * There is quite a lot of latency in the user interrupt
>>> +             * path. So do an explicit seqno check and potentially
>>> +             * remove all that delay.
>>> +             */
>>> +            if (req->engine->irq_seqno_barrier)
>>> +                req->engine->irq_seqno_barrier(req->engine);
>>> +            seqno = engine->get_seqno(engine);
>>> +            if (i915_seqno_passed(seqno, req->seqno)) {
>>> +                ret = 0;
>>> +                break;
>>> +            }
>>> +        }
>>> +
>>>            if (signal_pending_state(state, current)) {
>>>                ret = -ERESTARTSYS;
>>>                break;
>>> @@ -1332,14 +1347,32 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>>>                destroy_timer_on_stack(&timer);
>>>            }
>>>        }
>>> -    if (!irq_test_in_progress)
>>> -        engine->irq_put(engine);
>>>
>>>        finish_wait(&engine->irq_queue, &wait);
>> Hm I don't understand why our custom waiting remains? Shouldn't fence_wait just be called after the optimistic spin, more or less?
That would solve the 'thundering herd' problem if we could. In theory, the 
entire wait function should just be a call to 'fence_wait(&req->fence)'. 
Unfortunately, the wait function goes to sleep holding the mutex lock 
and requires having a bail out option on the wait which is not currently 
part of the fence API. I have a work-in-progress patch that almost 
solves the issues but it isn't quite there yet (and I haven't had much 
chance to work on it for a while).
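
For illustration, the eventual end state would be roughly the following
(hypothetical; it cannot work today because the caller sleeps holding
struct_mutex and fence_default_wait() has no bail-out for GPU reset):

    long remaining;

    remaining = fence_wait_timeout(&req->fence, interruptible,
                                   timeout ? nsecs_to_jiffies(*timeout)
                                           : MAX_SCHEDULE_TIMEOUT);
    if (remaining < 0)
        ret = remaining;        /* e.g. -ERESTARTSYS */
    else
        ret = remaining ? 0 : -ETIME;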

>>
>>>    out:
>>>        trace_i915_gem_request_wait_end(req);
>>>
>>> +    if ((ret == 0) && (req->seqno)) {
>>> +        if (req->engine->irq_seqno_barrier)
>>> +            req->engine->irq_seqno_barrier(req->engine);
>>> +        seqno = engine->get_seqno(engine);
>>> +        if (i915_seqno_passed(seqno, req->seqno) &&
>>> +            !i915_gem_request_completed(req)) {
>>> +            /*
>>> +             * Make sure the request is marked as completed before
>>> +             * returning. NB: Need to acquire the spinlock around
>>> +             * the whole call to avoid a race condition with the
>>> +             * interrupt handler is running concurrently and could
>>> +             * cause this invocation to early exit even though the
>>> +             * request has not actually been fully processed yet.
>>> +             */
>>> +            spin_lock_irq(&req->engine->fence_lock);
>>> +            i915_gem_request_notify(req->engine, true);
>>> +            spin_unlock_irq(&req->engine->fence_lock);
>>> +        }
>>> +    }
>>> +
>>>        if (timeout) {
>>>            s64 tres = *timeout - (ktime_get_raw_ns() - before);
>>>
>>> @@ -1405,6 +1438,11 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
>>>    {
>>>        trace_i915_gem_request_retire(request);
>>>
>>> +    if (request->irq_enabled) {
>>> +        request->engine->irq_put(request->engine);
>>> +        request->irq_enabled = false;
>> What protects request->irq_enabled? Here versus enable_signalling bit? It can be called from the external fence users which would take the fence_lock, but here it does not.
The flag can only be set when enabling interrupt driven completion 
(which can only happen once and only if the fence is not already 
signalled). The flag can only be cleared when the fence is signalled or 
when the request is retired. And retire without signal can only happen 
if the request is being cancelled in some way (e.g. GPU reset) and thus 
will not ever be signalled. So if we get here then none of the other 
paths are possible anymore.
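
That lifecycle could be captured as a comment on the field, e.g.
(illustrative wording):

    /*
     * irq_enabled: set at most once, under fence_lock, when interrupt
     * driven completion is enabled for a not-yet-signalled fence.
     * Cleared either by the notify path when the fence signals, or by
     * retire for requests that can no longer signal (e.g. cancelled
     * by a GPU reset). The two clearing paths are mutually exclusive
     * for any given request.
     */
    bool irq_enabled;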

>>
>>> +    }
>>> +
>>>        /* We know the GPU must have read the request to have
>>>         * sent us the seqno + interrupt, so use the position
>>>         * of tail of the request to update the last known position
>>> @@ -1418,6 +1456,22 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
>>>        list_del_init(&request->list);
>>>        i915_gem_request_remove_from_client(request);
>>>
>>> +    /*
>>> +     * In case the request is still in the signal pending list,
>>> +     * e.g. due to being cancelled by TDR, preemption, etc.
>>> +     */
>>> +    if (!list_empty(&request->signal_link)) {
>> No locking required here?
> Considering the locked function is used, I'm assuming this function holds the fence_lock.
>
> If not, something's seriously wrong.
No. If a request is being retired that still has the ability to be 
signaled then something is seriously wrong. So nothing can be modifying 
request->signal_link at this point. The code below however is wrong and 
should not be assuming the fence_lock is held because it won't be. Not 
sure how that crept in!
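
A possible corrected shape, taking the lock explicitly (sketch only;
the list_empty() check is safe outside the lock for the reason above):

    if (!list_empty(&request->signal_link)) {
        spin_lock_irq(&request->engine->fence_lock);
        list_del_init(&request->signal_link);
        request->cancelled = true;
        /* No explicit fence-fail API: set status, then signal. */
        request->fence.status = -EIO;
        fence_signal_locked(&request->fence);
        spin_unlock_irq(&request->engine->fence_lock);
    }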

>>> +        /*
>>> +         * The request must be marked as cancelled and the underlying
>>> +         * fence as failed. NB: There is no explicit fence fail API,
>>> +         * there is only a manual poke and signal.
>>> +         */
>>> +        request->cancelled = true;
>>> +        /* How to propagate to any associated sync_fence??? */
> ^This way works, so comment can be removed.
>
> There's deliberately no way to cancel a fence, it's the same path but with status member set.
>
> If you have a fence for another driver, there's really no good way to handle failure. So you have to treat
> it as if it succeeded.
Yeah, it didn't use to be like that. There was a halfway house before 
with Android sync points built on top of struct fence and both had a 
signalled state (with the Android one also carrying error information). 
The comment is left over from then. I'll remove it.

>>> +        request->fence.status = -EIO;
>>> +        fence_signal_locked(&request->fence);
>> And here?
See comment above.

>>
>>> +    }
>>> +
>>>        if (request->previous_context) {
>>>            if (i915.enable_execlists)
>>>                intel_lr_context_unpin(request->previous_context,
>>> @@ -2670,6 +2724,12 @@ void __i915_add_request(struct drm_i915_gem_request *request,
>>>         */
>>>        request->postfix = intel_ring_get_tail(ringbuf);
>>>
>>> +    /*
>>> +     * Add the fence to the pending list before emitting the commands to
>>> +     * generate a seqno notification interrupt.
>>> +     */
>>> +    i915_gem_request_submit(request);
>>> +
>>>        if (i915.enable_execlists)
>>>            ret = engine->emit_request(request);
>>>        else {
>>> @@ -2755,25 +2815,154 @@ static void i915_gem_request_free(struct fence *req_fence)
>>>        struct drm_i915_gem_request *req;
>>>
>>>        req = container_of(req_fence, typeof(*req), fence);
>>> +
>>> +    WARN_ON(req->irq_enabled);
>> How useful is this? If it went wrong, engine irq reference counting would be bad. Okay, no one would notice, but we could then stick some other warns here, like !list_empty(req->list) and who knows what, which we don't have, so I am just wondering if this one brings any value.
It could be removed. It was specifically added as part of the debug/test 
cycle for writing this patch. Now that I'm pretty sure there are no 
logic errors, I guess it can be dropped.

>>
>>> +
>>>        call_rcu(&req->rcu_head, i915_gem_request_free_rcu);
>>>    }
>>>
>>> -static bool i915_gem_request_enable_signaling(struct fence *req_fence)
>>> +/*
>>> + * The request is about to be submitted to the hardware so add the fence to
>>> + * the list of signalable fences.
>>> + *
>>> + * NB: This does not necessarily enable interrupts yet. That only occurs on
>>> + * demand when the request is actually waited on. However, adding it to the
>>> + * list early ensures that there is no race condition where the interrupt
>>> + * could pop out prematurely and thus be completely lost. The race is merely
>>> + * that the interrupt must be manually checked for after being enabled.
>>> + */
>>> +static void i915_gem_request_submit(struct drm_i915_gem_request *req)
>>>    {
>>> -    /* Interrupt driven fences are not implemented yet.*/
>>> -    WARN(true, "This should not be called!");
>>> -    return true;
>>> +    /*
>>> +     * Always enable signal processing for the request's fence object
>>> +     * before that request is submitted to the hardware. Thus there is no
>>> +     * race condition whereby the interrupt could pop out before the
>>> +     * request has been added to the signal list. Hence no need to check
>>> +     * for completion, undo the list add and return false.
>>> +     */
>>> +    i915_gem_request_reference(req);
>>> +    spin_lock_irq(&req->engine->fence_lock);
>>> +    WARN_ON(!list_empty(&req->signal_link));
>>> +    list_add_tail(&req->signal_link, &req->engine->fence_signal_list);
>>> +    spin_unlock_irq(&req->engine->fence_lock);
>>> +
>>> +    /*
>>> +     * NB: Interrupts are only enabled on demand. Thus there is still a
>>> +     * race where the request could complete before the interrupt has
>>> +     * been enabled. Thus care must be taken at that point.
>>> +     */
>>> +
>>> +    /* Have interrupts already been requested? */
>>> +    if (req->signal_requested)
>>> +        i915_gem_request_enable_interrupt(req, false);
>> I am thinking that the fence lock could be held here until the end of the function, and that way i915_gem_request_enable_interrupt would not need the fence_locked parameter any more.
>>
>> It would probably also be safer with regards to accessing req->signal_requested. I am not sure that enable_signaling and this path otherwise can't race and miss signal_requested getting set?
Yeah, makes sense to move the enable_interrupt inside the spin lock. 
Will do.
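
I.e. roughly (sketch of the reworked submit):

    static void i915_gem_request_submit(struct drm_i915_gem_request *req)
    {
        i915_gem_request_reference(req);

        spin_lock_irq(&req->engine->fence_lock);
        WARN_ON(!list_empty(&req->signal_link));
        list_add_tail(&req->signal_link,
                      &req->engine->fence_signal_list);

        /* Checked under the lock so a concurrent enable_signaling()
         * cannot set the flag without us seeing it. */
        if (req->signal_requested)
            i915_gem_request_enable_interrupt(req, true);
        spin_unlock_irq(&req->engine->fence_lock);
    }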

>>> +}
>>> +
>>> +/*
>>> + * The request is being actively waited on, so enable interrupt based
>>> + * completion signalling.
>>> + */
>>> +void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req,
>>> +                       bool fence_locked)
>>> +{
>>> +    struct drm_i915_private *dev_priv = req->engine->i915;
>>> +    const bool irq_test_in_progress =
>>> +        ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) &
>>> +                        intel_engine_flag(req->engine);
>>> +
>>> +    if (req->irq_enabled)
>>> +        return;
>>> +
>>> +    if (irq_test_in_progress)
>>> +        return;
>>> +
>>> +    if (!WARN_ON(!req->engine->irq_get(req->engine)))
>>> +        req->irq_enabled = true;
>> The double negation confused me a bit. It is probably not ideal since WARN_ONs go to the out-of-line section, so in a way it is deliberately penalising the fast and expected path. I think it would be better to put a WARN on the else path.
Will do.
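
Something like (sketch):

    if (req->engine->irq_get(req->engine))
        req->irq_enabled = true;
    else
        WARN(1, "irq_get() failed!\n");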

>>> +
>>> +    /*
>>> +     * Because the interrupt is only enabled on demand, there is a race
>>> +     * where the interrupt can fire before anyone is looking for it. So
>>> +     * do an explicit check for missed interrupts.
>>> +     */
>>> +    i915_gem_request_notify(req->engine, fence_locked);
>>>    }
>>>
>>> -static bool i915_gem_request_is_completed(struct fence *req_fence)
>>> +static bool i915_gem_request_enable_signaling(struct fence *req_fence)
>>>    {
>>>        struct drm_i915_gem_request *req = container_of(req_fence,
>>>                             typeof(*req), fence);
>>> +
>>> +    /*
>>> +     * No need to actually enable interrupt based processing until the
>>> +     * request has been submitted to the hardware. At which point
>>> +     * 'i915_gem_request_submit()' is called. So only really enable
>>> +     * signalling in there. Just set a flag to say that interrupts are
>>> +     * wanted when the request is eventually submitted. On the other hand
>>> +     * if the request has already been submitted then interrupts do need
>>> +     * to be enabled now.
>>> +     */
>>> +
>>> +    req->signal_requested = true;
>>> +
>>> +    if (!list_empty(&req->signal_link))
>> In what scenarios is the list_empty check needed? Someone can somehow enable signalling on a fence not yet submitted?
Yes. It is guaranteed to happen in Android. And when the scheduler 
arrives (which this series is officially prep work for) and submission 
is deferred, it becomes a lot easier in a non-Android system. I don't 
think there is currently a code path that will hit this case, but as one 
is definitely coming, I would rather include the facility from the 
start.

>>> +        i915_gem_request_enable_interrupt(req, true);
>>> +
>>> +    return true;
>>> +}
>>> +
>>> +/**
>>> + * i915_gem_request_worker - request work handler callback.
>>> + * @work: Work structure
>>> + * Called in response to a seqno interrupt to process the completed requests.
>>> + */
>>> +void i915_gem_request_worker(struct work_struct *work)
>>> +{
>>> +    struct intel_engine_cs *engine;
>>> +
>>> +    engine = container_of(work, struct intel_engine_cs, request_work);
>>> +    i915_gem_request_notify(engine, false);
>>> +}
>>> +
>>> +void i915_gem_request_notify(struct intel_engine_cs *engine, bool fence_locked)
>>> +{
>>> +    struct drm_i915_gem_request *req, *req_next;
>>> +    unsigned long flags;
>>>        u32 seqno;
>>>
>>> -    seqno = req->engine->get_seqno(req->engine);
>>> +    if (list_empty(&engine->fence_signal_list))
>> Okay this without the lock still makes me nervous. I'd rather not having to think about why it is safe and can't miss a wakeup.
I don't see how list_empty() can return a false negative. Even if the 
implementation was such that it could see a partially updated state 
across multiple memory accesses, that will just lead to it thinking 
not-empty which is fine. Any update which takes it from empty to 
not-empty is guaranteed to occur before the act of enabling interrupts 
and thus before notify() can be called. So while it could potentially do 
the full processing when an early exit was fine, it can never early exit 
when it needs to do something.

>> Also I would be leaning toward having i915_gem_request_notify and i915_gem_request_notify__unlocked. With the enable_interrupts simplification I suggested, I think it would look better and be more consistent with the rest of the driver.
>>
>>> +        return;
>>> +
>>> +    if (!fence_locked)
>>> +        spin_lock_irqsave(&engine->fence_lock, flags);
>> Not called from hard irq context so can be just spin_lock_irq.
>>
>> But if you agree to go with the tasklet it would then be spin_lock_bh.
> The fence lock is always spin_lock_irq; if this requires _bh then it can't go into the tasklet.
Updated.

>>> -    return i915_seqno_passed(seqno, req->seqno);
>>> +    if (engine->irq_seqno_barrier)
>>> +        engine->irq_seqno_barrier(engine);
>>> +    seqno = engine->get_seqno(engine);
>>> +
>>> +    list_for_each_entry_safe(req, req_next, &engine->fence_signal_list, signal_link) {
>>> +        if (!req->cancelled) {
>>> +            if (!i915_seqno_passed(seqno, req->seqno))
>>> +                break;
>> Merge to one if statement?
Will do.
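
I.e. (sketch):

    if (!req->cancelled && !i915_seqno_passed(seqno, req->seqno))
        break;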

>>
>>> +        }
>>> +
>>> +        /*
>>> +         * Start by removing the fence from the signal list otherwise
>>> +         * the retire code can run concurrently and get confused.
>>> +         */
>>> +        list_del_init(&req->signal_link);
>>> +
>>> +        if (!req->cancelled)
>>> +            fence_signal_locked(&req->fence);
>> I forgot how signalling errors to userspace works? Does that still work for cancelled fences in this series?
Yes, as Maarten mentioned there is no explicit error signal. You just 
set the status to an error code and signal the fence as normal. The 
signal here is only for genuinely completed fences. The cancelled case 
is handled further up.

>>> +
>>> +        if (req->irq_enabled) {
>>> +            req->engine->irq_put(req->engine);
>>> +            req->irq_enabled = false;
>>> +        }
>>> +
>>> +        i915_gem_request_unreference(req);
>>> +    }
>>> +
>>> +    if (!fence_locked)
>>> +        spin_unlock_irqrestore(&engine->fence_lock, flags);
>>>    }
>>>
>>>    static const char *i915_gem_request_get_driver_name(struct fence *req_fence)
>>> @@ -2816,7 +3005,6 @@ static void i915_gem_request_fence_value_str(struct fence *req_fence,
>>>
>>>    static const struct fence_ops i915_gem_request_fops = {
>>>        .enable_signaling    = i915_gem_request_enable_signaling,
>>> -    .signaled        = i915_gem_request_is_completed,
>>>        .wait            = fence_default_wait,
>>>        .release        = i915_gem_request_free,
>>>        .get_driver_name    = i915_gem_request_get_driver_name,
>>> @@ -2902,6 +3090,7 @@ __i915_gem_request_alloc(struct intel_engine_cs *engine,
>>>        req->ctx  = ctx;
>>>        i915_gem_context_reference(req->ctx);
>>>
>>> +    INIT_LIST_HEAD(&req->signal_link);
>>>        fence_init(&req->fence, &i915_gem_request_fops, &engine->fence_lock,
>>>               ctx->engine[engine->id].fence_timeline.fence_context,
>>>               i915_fence_timeline_get_next_seqno(&ctx->engine[engine->id].fence_timeline));
>>> @@ -3036,6 +3225,13 @@ static void i915_gem_reset_engine_cleanup(struct drm_i915_private *dev_priv,
>>>            i915_gem_request_retire(request);
>>>        }
>>>
>>> +    /*
>>> +     * Tidy up anything left over. This includes a call to
>>> +     * i915_gem_request_notify() which will make sure that any requests
>>> +     * that were on the signal pending list get also cleaned up.
>>> +     */
>>> +    i915_gem_retire_requests_ring(engine);
>> Hmm.. but this function has just walked the same lists as this will, and done the same processing. Why call this from here? It looks bad to me; the two are different special cases of a similar thing, so I can't see that calling this from here makes sense.
Hmm, not sure. Possibly left over from an earlier version that was 
slightly different. Will tidy this up.

>>> +
>>>        /* Having flushed all requests from all queues, we know that all
>>>         * ringbuffers must now be empty. However, since we do not reclaim
>>>         * all space when retiring the request (to prevent HEADs colliding
>>> @@ -3082,6 +3278,13 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *engine)
>>>    {
>>>        WARN_ON(i915_verify_lists(engine->dev));
>>>
>>> +    /*
>>> +     * If no-one has waited on a request recently then interrupts will
>>> +     * not have been enabled and thus no requests will ever be marked as
>>> +     * completed. So do an interrupt check now.
>>> +     */
>>> +    i915_gem_request_notify(engine, false);
>> Would it work to signal the fence from the existing loop a bit above in this function which already walks the request list in search of completed ones? Or maybe even in i915_gem_request_retire?
>>
>> I am thinking about doing less list walking and better integration with the core GEM. Downside would be more traffic on the fence_lock, hmm.. not sure then. It just looks a bit bolted on like this.
>>
>> I don't see it being a noticeable cost so perhaps it can stay like this for now.
>>
>>> +
>>>        /* Retire requests first as we use it above for the early return.
>>>         * If we retire requests last, we may use a later seqno and so clear
>>>         * the requests lists without clearing the active list, leading to
>>> @@ -5102,6 +5305,7 @@ init_engine_lists(struct intel_engine_cs *engine)
>>>    {
>>>        INIT_LIST_HEAD(&engine->active_list);
>>>        INIT_LIST_HEAD(&engine->request_list);
>>> +    INIT_LIST_HEAD(&engine->fence_signal_list);
>>>    }
>>>
>>>    void
>>> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
>>> index f780421..a87a3c5 100644
>>> --- a/drivers/gpu/drm/i915/i915_irq.c
>>> +++ b/drivers/gpu/drm/i915/i915_irq.c
>>> @@ -994,6 +994,8 @@ static void notify_ring(struct intel_engine_cs *engine)
>>>        trace_i915_gem_request_notify(engine);
>>>        engine->user_interrupts++;
>>>
>>> +    queue_work(engine->i915->req_wq, &engine->request_work);
>>> +
>>>        wake_up_all(&engine->irq_queue);
>> Yes, that is the weird part: why does the engine->irq_queue have to remain with this patch?
See earlier comments about not being able to use the wait-on-fence 
interface instead due to complications with GPU reset and such. 
Certainly the aim is to get rid of the wake_up_all() eventually, but 
that requires a lot more work over and above this patch.

>>
>>>    }
>>>
>>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>>> index f126bcb..134759d 100644
>>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>>> @@ -1879,6 +1879,8 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *engine)
>>>
>>>        dev_priv = engine->i915;
>>>
>>> +    cancel_work_sync(&engine->request_work);
>>> +
>>>        if (engine->buffer) {
>>>            intel_logical_ring_stop(engine);
>>>            WARN_ON((I915_READ_MODE(engine) & MODE_IDLE) == 0);
>>> @@ -2027,6 +2029,7 @@ logical_ring_setup(struct drm_device *dev, enum intel_engine_id id)
>>>
>>>        INIT_LIST_HEAD(&engine->active_list);
>>>        INIT_LIST_HEAD(&engine->request_list);
>>> +    INIT_LIST_HEAD(&engine->fence_signal_list);
>>>        INIT_LIST_HEAD(&engine->buffers);
>>>        INIT_LIST_HEAD(&engine->execlist_queue);
>>>        spin_lock_init(&engine->execlist_lock);
>>> @@ -2035,6 +2038,8 @@ logical_ring_setup(struct drm_device *dev, enum intel_engine_id id)
>>>        tasklet_init(&engine->irq_tasklet,
>>>                 intel_lrc_irq_handler, (unsigned long)engine);
>>>
>>> +    INIT_WORK(&engine->request_work, i915_gem_request_worker);
>>> +
>>>        logical_ring_init_platform_invariants(engine);
>>>        logical_ring_default_vfuncs(engine);
>>>        logical_ring_default_irqs(engine, info->irq_shift);
>>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
>>> index fbd3f12..1641096 100644
>>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
>>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
>>> @@ -2254,6 +2254,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
>>>        INIT_LIST_HEAD(&engine->request_list);
>>>        INIT_LIST_HEAD(&engine->execlist_queue);
>>>        INIT_LIST_HEAD(&engine->buffers);
>>> +    INIT_LIST_HEAD(&engine->fence_signal_list);
>>>        spin_lock_init(&engine->fence_lock);
>>>        i915_gem_batch_pool_init(dev, &engine->batch_pool);
>>>        memset(engine->semaphore.sync_seqno, 0,
>>> @@ -2261,6 +2262,8 @@ static int intel_init_ring_buffer(struct drm_device *dev,
>>>
>>>        init_waitqueue_head(&engine->irq_queue);
>>>
>>> +    INIT_WORK(&engine->request_work, i915_gem_request_worker);
>>> +
>>>        ringbuf = intel_engine_create_ringbuffer(engine, 32 * PAGE_SIZE);
>>>        if (IS_ERR(ringbuf)) {
>>>            ret = PTR_ERR(ringbuf);
>>> @@ -2307,6 +2310,8 @@ void intel_cleanup_engine(struct intel_engine_cs *engine)
>>>
>>>        dev_priv = engine->i915;
>>>
>>> +    cancel_work_sync(&engine->request_work);
>>> +
>>>        if (engine->buffer) {
>>>            intel_stop_engine(engine);
>>>            WARN_ON(!IS_GEN2(dev_priv) && (I915_READ_MODE(engine) & MODE_IDLE) == 0);
>>> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
>>> index 3f39daf..51779b4 100644
>>> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
>>> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
>>> @@ -347,6 +347,9 @@ struct intel_engine_cs {
>>>        u32 (*get_cmd_length_mask)(u32 cmd_header);
>>>
>>>        spinlock_t fence_lock;
>>> +    struct list_head fence_signal_list;
>>> +
>>> +    struct work_struct request_work;
>>>    };
>>>
>>>    static inline bool


* Re: [PATCH v9 4/6] drm/i915: Interrupt driven fences
  2016-06-13 15:51       ` John Harrison
@ 2016-06-14 11:35         ` Tvrtko Ursulin
  0 siblings, 0 replies; 26+ messages in thread
From: Tvrtko Ursulin @ 2016-06-14 11:35 UTC (permalink / raw)
  To: John Harrison, Maarten Lankhorst, Intel-GFX


On 13/06/16 16:51, John Harrison wrote:

[snip]

>>>> diff --git a/drivers/gpu/drm/i915/i915_dma.c
>>>> b/drivers/gpu/drm/i915/i915_dma.c
>>>> index 07edaed..f8f60bb 100644
>>>> --- a/drivers/gpu/drm/i915/i915_dma.c
>>>> +++ b/drivers/gpu/drm/i915/i915_dma.c
>>>> @@ -1019,9 +1019,13 @@ static int i915_workqueues_init(struct
>>>> drm_i915_private *dev_priv)
>>>>        if (dev_priv->wq == NULL)
>>>>            goto out_err;
>>>>
>>>> +    dev_priv->req_wq = alloc_ordered_workqueue("i915-rq", 0);
>>>> +    if (dev_priv->req_wq == NULL)
>>>> +        goto out_free_wq;
>>>> +
>>> Single (per-device) ordered workqueue will serialize interrupt
>>> processing across all engines to one thread. Together with the fact
>>> request worker does not seem to need the sleeping context, I am
>>> thinking that a tasklet per engine would be much better (see
>>> engine->irq_tasklet for an example).
> Other conversations have stated that tasklets are not the best option. I
> did think about having a work queue per engine but that seemed
> excessive. Plus any subsequent work triggered by the fence completion
> will almost certainly require grabbing the driver mutex lock (because
> everything requires the mutex lock) so serialisation of engines doesn't
> sound like much of an issue.

In this patch, AFAICS, i915_gem_request_worker calls 
i915_gem_request_notify on the engine, which only takes the 
engine->fence_lock. So if you had a per-engine wq, all engines could do 
that processing in parallel.

It is only once fence_signal_locked wakes up the waiters that they
might go and hammer on struct_mutex; but for any work they want to do
between waking up and submitting new work, serializing via a single wq
will be bad for latency.
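
Just as an illustration, what I have in mind is roughly this (a sketch
only; the per-engine req_wq field and the error label are hypothetical):

    /* One ordered wq per engine, so that request processing for
     * different engines does not serialize against one another. */
    for_each_engine(engine, dev_priv) {
        engine->req_wq = alloc_ordered_workqueue("i915-rq/%s", 0,
                                                 engine->name);
        if (engine->req_wq == NULL)
            goto out_free;
    }

    /* ...with the interrupt handler then queueing the work item on
     * the engine's own wq: */
    queue_work(engine->req_wq, &engine->request_work);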

>>>
>>>> +        ret = __i915_spin_request(req, state);
>>>> +        if (ret == 0)
>>>> +            goto out;
>>>>        }
>>>>
>>>> +    /*
>>>> +     * Enable interrupt completion of the request.
>>>> +     */
>>>> +    fence_enable_sw_signaling(&req->fence);
>>>> +
>>>>        for (;;) {
>>>>            struct timer_list timer;
>>>>
>>>> @@ -1306,6 +1306,21 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>>>>                break;
>>>>            }
>>>>
>>>> +        if (req->seqno) {
>>>> +            /*
>>>> +             * There is quite a lot of latency in the user interrupt
>>>> +             * path. So do an explicit seqno check and potentially
>>>> +             * remove all that delay.
>>>> +             */
>>>> +            if (req->engine->irq_seqno_barrier)
>>>> +                req->engine->irq_seqno_barrier(req->engine);
>>>> +            seqno = engine->get_seqno(engine);
>>>> +            if (i915_seqno_passed(seqno, req->seqno)) {
>>>> +                ret = 0;
>>>> +                break;
>>>> +            }
>>>> +        }
>>>> +
>>>>            if (signal_pending_state(state, current)) {
>>>>                ret = -ERESTARTSYS;
>>>>                break;
>>>> @@ -1332,14 +1347,32 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>>>>                destroy_timer_on_stack(&timer);
>>>>            }
>>>>        }
>>>> -    if (!irq_test_in_progress)
>>>> -        engine->irq_put(engine);
>>>>
>>>>        finish_wait(&engine->irq_queue, &wait);
>>> Hm I don't understand why our custom waiting remains? Shouldn't
>>> fence_wait just be called after the optimistic spin, more or less?
> That would solve the 'thundering herd' problem if we could. In theory,
> the entire wait function should just be a call to 'fence_wait(&req->fence)'.
> Unfortunately, the wait function goes to sleep holding the mutex lock
> and requires having a bail-out option on the wait, which is not currently
> part of the fence API. I have a work-in-progress patch that almost
> solves the issues but it isn't quite there yet (and I haven't had much
> chance to work on it for a while).

[Btw, does it also mean i915 cannot handle incoming 3rd-party fences?]

So the userspace waiters can wait either on a dma-buf fence or on the
i915 wait queue, depending on which ioctl they have called. Then, on an
interrupt, i915 waiters are woken directly, while the fence API ones go
via a worker.
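
Schematically, the interrupt side then looks something like this
(paraphrasing the patch; the handler name is illustrative, not the
literal code):

    static void notify_ring(struct intel_engine_cs *engine)
    {
        /* Wake i915's own ioctl waiters sleeping in
         * __i915_wait_request directly... */
        wake_up_all(&engine->irq_queue);

        /* ...while fence API waiters are signalled later, from
         * process context, by the worker calling
         * i915_gem_request_notify(). */
        queue_work(engine->i915->req_wq, &engine->request_work);
    }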

i915 waiters also do a second wake-up of the fence API waiters by...

>>>
>>>>    out:
>>>>        trace_i915_gem_request_wait_end(req);
>>>>
>>>> +    if ((ret == 0) && (req->seqno)) {
>>>> +        if (req->engine->irq_seqno_barrier)
>>>> +            req->engine->irq_seqno_barrier(req->engine);
>>>> +        seqno = engine->get_seqno(engine);
>>>> +        if (i915_seqno_passed(seqno, req->seqno) &&
>>>> +            !i915_gem_request_completed(req)) {
>>>> +            /*
>>>> +             * Make sure the request is marked as completed before
>>>> +             * returning. NB: Need to acquire the spinlock around
>>>> +             * the whole call to avoid a race where the interrupt
>>>> +             * handler runs concurrently and causes this invocation
>>>> +             * to exit early even though the request has not
>>>> +             * actually been fully processed yet.
>>>> +             */
>>>> +            spin_lock_irq(&req->engine->fence_lock);
>>>> +            i915_gem_request_notify(req->engine, true);

... this call here.

If there are both classes of waiters, one of those calls will be a waste
of cycles (spin lock, irq toggle, list iteration, coherent seqno read),
correct?

>>>> +            spin_unlock_irq(&req->engine->fence_lock);
>>>> +        }
>>>> +    }
>>>> +
>>>>        if (timeout) {
>>>>            s64 tres = *timeout - (ktime_get_raw_ns() - before);
>>>>
>>>> @@ -1405,6 +1438,11 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
>>>>    {
>>>>        trace_i915_gem_request_retire(request);
>>>>
>>>> +    if (request->irq_enabled) {
>>>> +        request->engine->irq_put(request->engine);
>>>> +        request->irq_enabled = false;
>>> What protects request->irq_enabled here, versus the enable_signalling
>>> path? Enabling can be called from the external fence users, which would
>>> take the fence_lock, but here the lock is not taken.
> The flag can only be set when enabling interrupt-driven completion
> (which can only happen once, and only if the fence is not already
> signalled). The flag can only be cleared when the fence is signalled or
> when the request is retired. And retire without signal can only happen
> if the request is being cancelled in some way (e.g. a GPU reset) and
> thus will never be signalled. So if we get here, then none of the other
> paths are possible any more.
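
If I have followed correctly, the disable side is effectively this
(a sketch with a hypothetical helper, not the actual patch code):

    static void request_disable_interrupt(struct drm_i915_gem_request *req)
    {
        /* Only safe without fence_lock if no other path can still
         * set the flag: either the fence has already signalled, or
         * the request is being cancelled and will never signal. */
        if (req->irq_enabled) {
            req->engine->irq_put(req->engine);
            req->irq_enabled = false;
        }
    }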

Couldn't a retire without signal also happen via the execbuf paths
calling i915_gem_retire_requests_ring()?

>>>> +        i915_gem_request_enable_interrupt(req, true);
>>>> +
>>>> +    return true;
>>>> +}
>>>> +
>>>> +/**
>>>> + * i915_gem_request_worker - request work handler callback.
>>>> + * @work: Work structure
>>>> + * Called in response to a seqno interrupt to process the completed requests.
>>>> + */
>>>> +void i915_gem_request_worker(struct work_struct *work)
>>>> +{
>>>> +    struct intel_engine_cs *engine;
>>>> +
>>>> +    engine = container_of(work, struct intel_engine_cs, request_work);
>>>> +    i915_gem_request_notify(engine, false);
>>>> +}
>>>> +
>>>> +void i915_gem_request_notify(struct intel_engine_cs *engine, bool fence_locked)
>>>> +{
>>>> +    struct drm_i915_gem_request *req, *req_next;
>>>> +    unsigned long flags;
>>>>        u32 seqno;
>>>>
>>>> -    seqno = req->engine->get_seqno(req->engine);
>>>> +    if (list_empty(&engine->fence_signal_list))
>>> Okay, this without the lock still makes me nervous. I'd rather not
>>> have to think about why it is safe and can't miss a wakeup.
> I don't see how list_empty() can return a false negative. Even if the
> implementation were such that it could see a partially updated state
> across multiple memory accesses, that would just lead to it thinking
> not-empty, which is fine. Any update which takes the list from empty to
> not-empty is guaranteed to occur before interrupts are enabled, and
> thus before notify() can be called. So while it could potentially do
> the full processing when an early exit would have been fine, it can
> never early-exit when it needs to do something.

Something like that sounds like a good comment to put above then! :)
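
E.g. something along these lines (wording only, adjust as you see fit):

        /*
         * Peeking at fence_signal_list without holding fence_lock is
         * safe: a stale not-empty result merely causes a redundant
         * pass through the list, while any empty -> not-empty
         * transition strictly precedes enabling interrupts, and hence
         * precedes any call to this function, so a required wakeup
         * can never be missed.
         */
        if (list_empty(&engine->fence_signal_list))
            return;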

Regards,

Tvrtko


* Re: [PATCH v9 6/6] drm/i915: Cache last IRQ seqno to reduce IRQ overhead
  2016-06-07 12:47   ` Maarten Lankhorst
@ 2016-06-16 12:10     ` John Harrison
  0 siblings, 0 replies; 26+ messages in thread
From: John Harrison @ 2016-06-16 12:10 UTC (permalink / raw)
  To: Maarten Lankhorst, Intel-GFX

On 07/06/2016 13:47, Maarten Lankhorst wrote:
> On 01-06-16 at 19:07, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> The notify function can be called many times without the seqno
>> changing. Some of these calls are there to prevent races arising from
>> the requirement of not enabling interrupts until requested. However,
>> when interrupts are enabled the IRQ handler can still be called
>> multiple times without the ring's seqno value changing. E.g. if two
>> interrupts are generated by batch buffers completing in quick
>> succession, the first call to the handler processes both completions
>> but the handler still gets executed a second time. This patch reduces
>> the overhead of these extra calls by caching the last processed seqno
>> value and exiting early if it has not changed.
> How significant is this overhead?
Doing the cache check hits the early exit approx 98% of the time when
running GLBenchmark. The vast majority of the duplicate calls come from
having to call the notify function from i915_gem_retire_requests_ring(),
which is called at least once for every execbuf IOCTL (and possibly
multiple times). I have just made a couple of tweaks to further reduce
the number of these calls and their impact, but there are still a lot of
them.
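
The check itself is cheap; schematically it is just the following (the
cache field name is illustrative rather than the exact patch code):

    void i915_gem_request_notify(struct intel_engine_cs *engine,
                                 bool fence_locked)
    {
        u32 seqno = engine->get_seqno(engine);

        /* Nothing has completed since the last pass, so early out. */
        if (seqno == engine->last_irq_seqno)
            return;
        engine->last_irq_seqno = seqno;

        /* ...otherwise walk fence_signal_list as before... */
    }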

>
> Patch looks reasonable otherwise.
>
> ~Maarten

