* [PATCH v8 0/6] Convert requests to use struct fence
From: John.C.Harrison @ 2016-05-12 21:06 UTC
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

There is a construct in the Linux kernel called 'struct fence' that is
intended to keep track of work that is executed on hardware. I.e. it
solves the basic problem that the driver's 'struct
drm_i915_gem_request' is trying to address. The request structure does
quite a lot more than simply track the execution progress so is very
definitely still required. However, the basic completion status side
could be updated to use the ready-made fence implementation and gain
all the advantages that provides.

Using the struct fence object also has the advantage that the fence
can be used outside of the i915 driver (by other drivers or by
userland applications). That is the basis of the dma-buf
synchronisation API and allows asynchronous tracking of work
completion. In this case, it allows applications to be signalled
directly when a batch buffer completes without having to make an IOCTL
call into the driver.
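
For illustration only (not code from this series): once such a fence is
exported to userland as a file descriptor, waiting for batch completion
reduces to a plain poll() on that fd rather than a driver IOCTL. A
minimal sketch, assuming a hypothetical 'fence_fd' handed out by the
driver; the exact export mechanism is the separate sync integration
work mentioned below:

    #include <poll.h>

    /* Wait for the batch buffer behind 'fence_fd' to complete.
     * Returns 1 when signalled, 0 on timeout, -1 on error. */
    static int wait_batch_complete(int fence_fd, int timeout_ms)
    {
            struct pollfd pfd = { .fd = fence_fd, .events = POLLIN };
            int ret = poll(&pfd, 1, timeout_ms);

            if (ret < 0)
                    return -1;      /* poll error */
            if (ret == 0)
                    return 0;       /* timed out, still running */
            return 1;               /* fence signalled, batch done */
    }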

Note that in order to allow the full fence API to be used (e.g.
merging multiple fences together), the driver needs to provide an
incrementing timeline for the fence. Currently this timeline is
specific to the fence code as it must be per context. There is future
work planned to make the driver's internal seqno value also be per
context rather than driver global (VIZ-7443). Once this is done the
fence specific timeline code can be dropped in favour of just using
the driver's seqno value.
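
As a rough illustration of why merging needs the timeline (a sketch,
not code from this series): two fences on the same fence context can be
collapsed to whichever has the later seqno, which is only valid if
seqnos on that context are handed out in submission order. Fences on
different contexts carry no such ordering guarantee and need a
composite fence instead. This assumes the fence_is_later() helper from
linux/fence.h:

    /* Sketch: merging two fences, relying on in-order seqnos. */
    static struct fence *merged_fence(struct fence *a, struct fence *b)
    {
            if (a->context == b->context)
                    return fence_is_later(a, b) ? a : b; /* later one suffices */

            /* Different timelines: no ordering guarantee, so a
             * composite fence waiting on both is required instead. */
            return NULL;
    }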

This is work that was planned since the conversion of the driver from
being seqno value based to being request structure based. This patch
series does that work.

An IGT test to exercise the fence support from userland is in
progress and will follow. Android already makes extensive use of
fences for display composition. Real-world Linux usage is planned in
the form of Jesse's page table sharing / bufferless execbuf support.
There is also a plan that Wayland (and others) could make use of it in
a similar manner to Android.

v2: Updated for review comments by various people and to add support
for Android style 'native sync'.

v3: Updated from review comments by Tvrtko Ursulin. Also moved sync
framework out of staging and improved request completion handling.

v4: Fixed patch tag (should have been PATCH not RFC). Corrected
ownership of one patch which had passed through many hands before
reaching me. Fixed a bug introduced in v3 and updated for review
comments.

v5: Removed de-staging and further updates to Android sync code. The
de-stage is now being handled by someone else. The sync integration to
the i915 driver will be a separate patch set that can only land after
the external de-stage has been completed.

Assorted changes based on review comments and style checker fixes.
Most significant change is fixing up the fake lost interrupt support
for the 'drv_missed_irq_hang' IGT test and improving the wait request
latency.

v6: Updated to newer nightly and resolved conflicts around updates
to the wait_request optimisations.

v7: Updated to newer nightly and resolved conflicts around massive
ring -> engine rename and interface change to get_seqno(). Also fixed
up a race condition issue with stale request pointers in file client
lists and added a minor optimisation to not acquire spinlocks when a
list is empty and does not need processing.

v8: Updated to yet another nightly and resolved the merge conflicts.
Dropped the 'delay freeing of requests' patch as it is no longer
needed due to changes in the request clean up code. Likewise the
deferred processing of the fence signalling. Also moved the fence
timeline patch to before the fence conversion. This means the timeline
is initially added with no actual user, but it also means the fence
conversion patch does not need to add a horrid hack timeline which is
then removed again in a subsequent patch.

Added support for possible RCU usage of fence object (Review comments
by Maarten Lankhorst).

[Patches against drm-intel-nightly tree fetched 05/05/2016]

John Harrison (6):
  drm/i915: Add per context timelines to fence object
  drm/i915: Convert requests to use struct fence
  drm/i915: Removed now redundant parameter to i915_gem_request_completed()
  drm/i915: Interrupt driven fences
  drm/i915: Updated request structure tracing
  drm/i915: Cache last IRQ seqno to reduce IRQ overhead

 drivers/gpu/drm/i915/i915_debugfs.c     |   7 +-
 drivers/gpu/drm/i915/i915_drv.h         |  63 ++---
 drivers/gpu/drm/i915/i915_gem.c         | 397 +++++++++++++++++++++++++++++---
 drivers/gpu/drm/i915/i915_gem_context.c |  14 ++
 drivers/gpu/drm/i915/i915_irq.c         |   3 +-
 drivers/gpu/drm/i915/i915_trace.h       |  14 +-
 drivers/gpu/drm/i915/intel_display.c    |   2 +-
 drivers/gpu/drm/i915/intel_lrc.c        |  10 +
 drivers/gpu/drm/i915/intel_pm.c         |   4 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c |   2 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |   4 +
 11 files changed, 451 insertions(+), 69 deletions(-)

-- 
1.9.1


* [PATCH v8 1/6] drm/i915: Add per context timelines to fence object
From: John.C.Harrison @ 2016-05-12 21:06 UTC
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The fence object used inside the request structure requires a sequence
number. Although this is not used by the i915 driver itself, it could
potentially be used by non-i915 code if the fence is passed outside of
the driver. This is the intention as it allows external kernel drivers
and user applications to wait on batch buffer completion
asynchronously via the dma-buf fence API.

To ensure that such external users are not confused by strange things
happening with the seqno, this patch adds a per-context timeline
that can provide a guaranteed in-order seqno value for the fence. This
is safe because the scheduler will not re-order batch buffers within a
context - they are considered to be mutually dependent.

v2: New patch in series.

v3: Renamed/retyped timeline structure fields after review comments by
Tvrtko Ursulin.

Added context information to the timeline's name string for better
identification in debugfs output.

v5: Line wrapping and other white space fixes to keep style checker
happy.

v7: Updated to newer nightly (lots of ring -> engine renaming).

v8: Moved to earlier in patch series so no longer needs to remove the
quick hack timeline that was being added before.

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         | 14 ++++++++++++
 drivers/gpu/drm/i915/i915_gem.c         | 40 +++++++++++++++++++++++++++++++++
 drivers/gpu/drm/i915/i915_gem_context.c | 14 ++++++++++++
 drivers/gpu/drm/i915/intel_lrc.c        |  8 +++++++
 4 files changed, 76 insertions(+)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index d5496ab..520480b 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -818,6 +818,19 @@ struct i915_ctx_hang_stats {
 	bool banned;
 };
 
+struct i915_fence_timeline {
+	char        name[32];
+	unsigned    fence_context;
+	unsigned    next;
+
+	struct intel_context *ctx;
+	struct intel_engine_cs *engine;
+};
+
+int i915_create_fence_timeline(struct drm_device *dev,
+			       struct intel_context *ctx,
+			       struct intel_engine_cs *ring);
+
 /* This must match up with the value previously used for execbuf2.rsvd1. */
 #define DEFAULT_CONTEXT_HANDLE 0
 
@@ -869,6 +882,7 @@ struct intel_context {
 		u64 lrc_desc;
 		uint32_t *lrc_reg_state;
 		bool initialised;
+		struct i915_fence_timeline fence_timeline;
 	} engine[I915_NUM_ENGINES];
 
 	struct list_head link;
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index c2548bd..99bd470 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2723,6 +2723,46 @@ void i915_gem_request_free(struct kref *req_ref)
 	kmem_cache_free(req->i915->requests, req);
 }
 
+int i915_create_fence_timeline(struct drm_device *dev,
+			       struct intel_context *ctx,
+			       struct intel_engine_cs *engine)
+{
+	struct i915_fence_timeline *timeline;
+
+	timeline = &ctx->engine[engine->id].fence_timeline;
+
+	if (timeline->engine)
+		return 0;
+
+	timeline->fence_context = fence_context_alloc(1);
+
+	/*
+	 * Start the timeline from seqno 0 as this is a special value
+	 * that is reserved for invalid sync points.
+	 */
+	timeline->next       = 1;
+	timeline->ctx        = ctx;
+	timeline->engine     = engine;
+
+	snprintf(timeline->name, sizeof(timeline->name), "%d>%s:%d",
+		 timeline->fence_context, engine->name, ctx->user_handle);
+
+	return 0;
+}
+
+unsigned i915_fence_timeline_get_next_seqno(struct i915_fence_timeline *timeline)
+{
+	unsigned seqno;
+
+	seqno = timeline->next;
+
+	/* Reserve zero for invalid */
+	if (++timeline->next == 0)
+		timeline->next = 1;
+
+	return seqno;
+}
+
 static inline int
 __i915_gem_request_alloc(struct intel_engine_cs *engine,
 			 struct intel_context *ctx,
diff --git a/drivers/gpu/drm/i915/i915_gem_context.c b/drivers/gpu/drm/i915/i915_gem_context.c
index de8d878..e90762e 100644
--- a/drivers/gpu/drm/i915/i915_gem_context.c
+++ b/drivers/gpu/drm/i915/i915_gem_context.c
@@ -311,6 +311,20 @@ i915_gem_create_context(struct drm_device *dev,
 	if (IS_ERR(ctx))
 		return ctx;
 
+	if (!i915.enable_execlists) {
+		struct intel_engine_cs *engine;
+
+		/* Create a per context timeline for fences */
+		for_each_engine(engine, to_i915(dev)) {
+			ret = i915_create_fence_timeline(dev, ctx, engine);
+			if (ret) {
+				DRM_ERROR("Fence timeline creation failed for legacy %s: %p\n",
+					  engine->name, ctx);
+				goto err_destroy;
+			}
+		}
+	}
+
 	if (is_global_default_ctx && ctx->legacy_hw_ctx.rcs_state) {
 		/* We may need to do things with the shrinker which
 		 * require us to immediately switch back to the default
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index d876352..5e15fbe 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -2526,6 +2526,14 @@ static int execlists_context_deferred_alloc(struct intel_context *ctx,
 		goto error_ringbuf;
 	}
 
+	/* Create a per context timeline for fences */
+	ret = i915_create_fence_timeline(dev, ctx, engine);
+	if (ret) {
+		DRM_ERROR("Fence timeline creation failed for engine %s, ctx %p\n",
+			  engine->name, ctx);
+		goto error_ringbuf;
+	}
+
 	ctx->engine[engine->id].ringbuf = ringbuf;
 	ctx->engine[engine->id].state = ctx_obj;
 	ctx->engine[engine->id].initialised = engine->init_context == NULL;
-- 
1.9.1


* [PATCH v8 2/6] drm/i915: Convert requests to use struct fence
From: John.C.Harrison @ 2016-05-12 21:06 UTC
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

There is a construct in the Linux kernel called 'struct fence' that is
intended to keep track of work that is executed on hardware. I.e. it
solves the basic problem that the driver's 'struct
drm_i915_gem_request' is trying to address. The request structure does
quite a lot more than simply track the execution progress so is very
definitely still required. However, the basic completion status side
could be updated to use the ready-made fence implementation and gain
all the advantages that provides.

This patch makes the first step of integrating a struct fence into the
request. It replaces the explicit reference count with that of the
fence. It also replaces the 'is completed' test with the fence's
equivalent. Currently, that simply chains on to the original request
implementation. A future patch will improve this.

v3: Updated after review comments by Tvrtko Ursulin. Added fence
context/seqno pair to the debugfs request info. Renamed fence 'driver
name' to just 'i915'. Removed BUG_ONs.

v5: Changed seqno format in debugfs to %x rather than %u as that is
apparently the preferred appearance. Line wrapped some long lines to
keep the style checker happy.

v6: Updated to newer nightly and resolved conflicts. The biggest issue
was with the re-worked busy spin precursor to waiting on a request. In
particular, the addition of a 'request_started' helper function. This
has no corresponding concept within the fence framework. However, it
is only ever used in one place and the whole point of that place is to
always read the seqno directly for the lowest possible latency. So the
simple solution is to just make the seqno test explicit at that point
now rather than later in the series (it was previously being done
anyway when fences become interrupt driven).

v7: Rebased to newer nightly - lots of ring -> engine renaming and
interface change to get_seqno().

v8: Rebased to newer nightly - no longer needs to worry about mutex
locking in the request free code path. Moved to after fence timeline
patch so no longer needs to add a horrid hack timeline.

Removed commented out code block. Added support for possible RCU usage
of fence object (Review comments by Maarten Lankhorst).

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
Reviewed-by: Jesse Barnes <jbarnes@virtuousgeek.org>
---
 drivers/gpu/drm/i915/i915_debugfs.c     |   5 +-
 drivers/gpu/drm/i915/i915_drv.h         |  43 +++++---------
 drivers/gpu/drm/i915/i915_gem.c         | 101 +++++++++++++++++++++++++++++---
 drivers/gpu/drm/i915/intel_lrc.c        |   1 +
 drivers/gpu/drm/i915/intel_ringbuffer.c |   1 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |   2 +
 6 files changed, 115 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 6ad008c..6129743 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -740,11 +740,12 @@ static int i915_gem_request_info(struct seq_file *m, void *data)
 			task = NULL;
 			if (req->pid)
 				task = pid_task(req->pid, PIDTYPE_PID);
-			seq_printf(m, "    %x @ %d: %s [%d]\n",
+			seq_printf(m, "    %x @ %d: %s [%d], fence = %x:%x\n",
 				   req->seqno,
 				   (int) (jiffies - req->emitted_jiffies),
 				   task ? task->comm : "<unknown>",
-				   task ? task->pid : -1);
+				   task ? task->pid : -1,
+				   req->fence.context, req->fence.seqno);
 			rcu_read_unlock();
 		}
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 520480b..b25886e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -42,6 +42,7 @@
 #include <linux/kref.h>
 #include <linux/pm_qos.h>
 #include <linux/shmem_fs.h>
+#include <linux/fence.h>
 
 #include <drm/drmP.h>
 #include <drm/intel-gtt.h>
@@ -2272,7 +2273,11 @@ void i915_gem_track_fb(struct drm_i915_gem_object *old,
  * initial reference taken using kref_init
  */
 struct drm_i915_gem_request {
-	struct kref ref;
+	/**
+	 * Underlying object for implementing the signal/wait stuff.
+	 */
+	struct fence fence;
+	struct rcu_head rcu_head;
 
 	/** On Which ring this request was generated */
 	struct drm_i915_private *i915;
@@ -2374,7 +2379,13 @@ struct drm_i915_gem_request {
 struct drm_i915_gem_request * __must_check
 i915_gem_request_alloc(struct intel_engine_cs *engine,
 		       struct intel_context *ctx);
-void i915_gem_request_free(struct kref *req_ref);
+
+static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
+					      bool lazy_coherency)
+{
+	return fence_is_signaled(&req->fence);
+}
+
 int i915_gem_request_add_to_client(struct drm_i915_gem_request *req,
 				   struct drm_file *file);
 
@@ -2394,14 +2405,14 @@ static inline struct drm_i915_gem_request *
 i915_gem_request_reference(struct drm_i915_gem_request *req)
 {
 	if (req)
-		kref_get(&req->ref);
+		fence_get(&req->fence);
 	return req;
 }
 
 static inline void
 i915_gem_request_unreference(struct drm_i915_gem_request *req)
 {
-	kref_put(&req->ref, i915_gem_request_free);
+	fence_put(&req->fence);
 }
 
 static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
@@ -2417,12 +2428,6 @@ static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
 }
 
 /*
- * XXX: i915_gem_request_completed should be here but currently needs the
- * definition of i915_seqno_passed() which is below. It will be moved in
- * a later patch when the call to i915_seqno_passed() is obsoleted...
- */
-
-/*
  * A command that requires special handling by the command parser.
  */
 struct drm_i915_cmd_descriptor {
@@ -3093,24 +3098,6 @@ i915_seqno_passed(uint32_t seq1, uint32_t seq2)
 	return (int32_t)(seq1 - seq2) >= 0;
 }
 
-static inline bool i915_gem_request_started(struct drm_i915_gem_request *req,
-					   bool lazy_coherency)
-{
-	if (!lazy_coherency && req->engine->irq_seqno_barrier)
-		req->engine->irq_seqno_barrier(req->engine);
-	return i915_seqno_passed(req->engine->get_seqno(req->engine),
-				 req->previous_seqno);
-}
-
-static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
-					      bool lazy_coherency)
-{
-	if (!lazy_coherency && req->engine->irq_seqno_barrier)
-		req->engine->irq_seqno_barrier(req->engine);
-	return i915_seqno_passed(req->engine->get_seqno(req->engine),
-				 req->seqno);
-}
-
 int __must_check i915_gem_get_seqno(struct drm_device *dev, u32 *seqno);
 int __must_check i915_gem_set_seqno(struct drm_device *dev, u32 seqno);
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 99bd470..e7c24b8 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1170,6 +1170,7 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
 {
 	unsigned long timeout;
 	unsigned cpu;
+	uint32_t seqno;
 
 	/* When waiting for high frequency requests, e.g. during synchronous
 	 * rendering split between the CPU and GPU, the finite amount of time
@@ -1185,12 +1186,14 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
 		return -EBUSY;
 
 	/* Only spin if we know the GPU is processing this request */
-	if (!i915_gem_request_started(req, true))
+	seqno = req->engine->get_seqno(req->engine);
+	if (!i915_seqno_passed(seqno, req->previous_seqno))
 		return -EAGAIN;
 
 	timeout = local_clock_us(&cpu) + 5;
 	while (!need_resched()) {
-		if (i915_gem_request_completed(req, true))
+		seqno = req->engine->get_seqno(req->engine);
+		if (i915_seqno_passed(seqno, req->seqno))
 			return 0;
 
 		if (signal_pending_state(state, current))
@@ -1202,7 +1205,10 @@ static int __i915_spin_request(struct drm_i915_gem_request *req, int state)
 		cpu_relax_lowlatency();
 	}
 
-	if (i915_gem_request_completed(req, false))
+	if (req->engine->irq_seqno_barrier)
+		req->engine->irq_seqno_barrier(req->engine);
+	seqno = req->engine->get_seqno(req->engine);
+	if (i915_seqno_passed(seqno, req->seqno))
 		return 0;
 
 	return -EAGAIN;
@@ -2716,13 +2722,89 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv,
 	}
 }
 
-void i915_gem_request_free(struct kref *req_ref)
+static void i915_gem_request_free_rcu(struct rcu_head *head)
 {
-	struct drm_i915_gem_request *req = container_of(req_ref,
-						 typeof(*req), ref);
+	struct drm_i915_gem_request *req;
+
+	req = container_of(head, typeof(*req), rcu_head);
 	kmem_cache_free(req->i915->requests, req);
 }
 
+static void i915_gem_request_free(struct fence *req_fence)
+{
+	struct drm_i915_gem_request *req;
+
+	req = container_of(req_fence, typeof(*req), fence);
+	call_rcu(&req->rcu_head, i915_gem_request_free_rcu);
+}
+
+static bool i915_gem_request_enable_signaling(struct fence *req_fence)
+{
+	/* Interrupt driven fences are not implemented yet. */
+	WARN(true, "This should not be called!");
+	return true;
+}
+
+static bool i915_gem_request_is_completed(struct fence *req_fence)
+{
+	struct drm_i915_gem_request *req = container_of(req_fence,
+						 typeof(*req), fence);
+	u32 seqno;
+
+	seqno = req->engine->get_seqno(req->engine);
+
+	return i915_seqno_passed(seqno, req->seqno);
+}
+
+static const char *i915_gem_request_get_driver_name(struct fence *req_fence)
+{
+	return "i915";
+}
+
+static const char *i915_gem_request_get_timeline_name(struct fence *req_fence)
+{
+	struct drm_i915_gem_request *req;
+	struct i915_fence_timeline *timeline;
+
+	req = container_of(req_fence, typeof(*req), fence);
+	timeline = &req->ctx->engine[req->engine->id].fence_timeline;
+
+	return timeline->name;
+}
+
+static void i915_gem_request_timeline_value_str(struct fence *req_fence,
+						char *str, int size)
+{
+	struct drm_i915_gem_request *req;
+
+	req = container_of(req_fence, typeof(*req), fence);
+
+	/* Last signalled timeline value ??? */
+	snprintf(str, size, "? [%d]"/*, timeline->value*/,
+		 req->engine->get_seqno(req->engine));
+}
+
+static void i915_gem_request_fence_value_str(struct fence *req_fence,
+					     char *str, int size)
+{
+	struct drm_i915_gem_request *req;
+
+	req = container_of(req_fence, typeof(*req), fence);
+
+	snprintf(str, size, "%d [%d]", req->fence.seqno, req->seqno);
+}
+
+static const struct fence_ops i915_gem_request_fops = {
+	.enable_signaling	= i915_gem_request_enable_signaling,
+	.signaled		= i915_gem_request_is_completed,
+	.wait			= fence_default_wait,
+	.release		= i915_gem_request_free,
+	.get_driver_name	= i915_gem_request_get_driver_name,
+	.get_timeline_name	= i915_gem_request_get_timeline_name,
+	.fence_value_str	= i915_gem_request_fence_value_str,
+	.timeline_value_str	= i915_gem_request_timeline_value_str,
+};
+
 int i915_create_fence_timeline(struct drm_device *dev,
 			       struct intel_context *ctx,
 			       struct intel_engine_cs *engine)
@@ -2750,7 +2832,7 @@ int i915_create_fence_timeline(struct drm_device *dev,
 	return 0;
 }
 
-unsigned i915_fence_timeline_get_next_seqno(struct i915_fence_timeline *timeline)
+static unsigned i915_fence_timeline_get_next_seqno(struct i915_fence_timeline *timeline)
 {
 	unsigned seqno;
 
@@ -2794,13 +2876,16 @@ __i915_gem_request_alloc(struct intel_engine_cs *engine,
 	if (ret)
 		goto err;
 
-	kref_init(&req->ref);
 	req->i915 = dev_priv;
 	req->engine = engine;
 	req->reset_counter = reset_counter;
 	req->ctx  = ctx;
 	i915_gem_context_reference(req->ctx);
 
+	fence_init(&req->fence, &i915_gem_request_fops, &engine->fence_lock,
+		   ctx->engine[engine->id].fence_timeline.fence_context,
+		   i915_fence_timeline_get_next_seqno(&ctx->engine[engine->id].fence_timeline));
+
 	/*
 	 * Reserve space in the ring buffer for all the commands required to
 	 * eventually emit this request. This is to guarantee that the
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 5e15fbe..199aa6e 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1975,6 +1975,7 @@ logical_ring_init(struct drm_device *dev, struct intel_engine_cs *engine)
 	engine->dev = dev;
 	INIT_LIST_HEAD(&engine->active_list);
 	INIT_LIST_HEAD(&engine->request_list);
+	spin_lock_init(&engine->fence_lock);
 	i915_gem_batch_pool_init(dev, &engine->batch_pool);
 	init_waitqueue_head(&engine->irq_queue);
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 8f3eb30..671e49c 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2287,6 +2287,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	INIT_LIST_HEAD(&engine->request_list);
 	INIT_LIST_HEAD(&engine->execlist_queue);
 	INIT_LIST_HEAD(&engine->buffers);
+	spin_lock_init(&engine->fence_lock);
 	i915_gem_batch_pool_init(dev, &engine->batch_pool);
 	memset(engine->semaphore.sync_seqno, 0,
 	       sizeof(engine->semaphore.sync_seqno));
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 723ff61..510ed58 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -345,6 +345,8 @@ struct  intel_engine_cs {
 	 * to encode the command length in the header).
 	 */
 	u32 (*get_cmd_length_mask)(u32 cmd_header);
+
+	spinlock_t fence_lock;
 };
 
 static inline bool
-- 
1.9.1


* [PATCH v8 3/6] drm/i915: Removed now redundant parameter to i915_gem_request_completed()
From: John.C.Harrison @ 2016-05-12 21:06 UTC
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The change to the implementation of i915_gem_request_completed() means
that the lazy coherency flag is no longer used. This can now be
removed to simplify the interface.

v6: Updated to newer nightly and resolved conflicts.

v7: Updated to newer nightly (lots of ring -> engine renaming).

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c  |  2 +-
 drivers/gpu/drm/i915/i915_drv.h      |  3 +--
 drivers/gpu/drm/i915/i915_gem.c      | 14 +++++++-------
 drivers/gpu/drm/i915/intel_display.c |  2 +-
 drivers/gpu/drm/i915/intel_pm.c      |  4 ++--
 5 files changed, 12 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 6129743..029f002 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -632,7 +632,7 @@ static int i915_gem_pageflip_info(struct seq_file *m, void *data)
 					   i915_gem_request_get_seqno(work->flip_queued_req),
 					   dev_priv->next_seqno,
 					   engine->get_seqno(engine),
-					   i915_gem_request_completed(work->flip_queued_req, true));
+					   i915_gem_request_completed(work->flip_queued_req));
 			} else
 				seq_printf(m, "Flip not associated with any ring\n");
 			seq_printf(m, "Flip queued on frame %d, (was ready on frame %d), now %d\n",
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index b25886e..433d99a 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2380,8 +2380,7 @@ struct drm_i915_gem_request * __must_check
 i915_gem_request_alloc(struct intel_engine_cs *engine,
 		       struct intel_context *ctx);
 
-static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req,
-					      bool lazy_coherency)
+static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
 {
 	return fence_is_signaled(&req->fence);
 }
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index e7c24b8..958edcb 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1251,7 +1251,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	if (list_empty(&req->list))
 		return 0;
 
-	if (i915_gem_request_completed(req, true))
+	if (i915_gem_request_completed(req))
 		return 0;
 
 	timeout_expire = 0;
@@ -1302,7 +1302,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 			break;
 		}
 
-		if (i915_gem_request_completed(req, false)) {
+		if (i915_gem_request_completed(req)) {
 			ret = 0;
 			break;
 		}
@@ -2943,7 +2943,7 @@ i915_gem_find_active_request(struct intel_engine_cs *engine)
 	struct drm_i915_gem_request *request;
 
 	list_for_each_entry(request, &engine->request_list, list) {
-		if (i915_gem_request_completed(request, false))
+		if (i915_gem_request_completed(request))
 			continue;
 
 		return request;
@@ -3074,7 +3074,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *engine)
 					   struct drm_i915_gem_request,
 					   list);
 
-		if (!i915_gem_request_completed(request, true))
+		if (!i915_gem_request_completed(request))
 			break;
 
 		i915_gem_request_retire(request);
@@ -3098,7 +3098,7 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *engine)
 	}
 
 	if (unlikely(engine->trace_irq_req &&
-		     i915_gem_request_completed(engine->trace_irq_req, true))) {
+		     i915_gem_request_completed(engine->trace_irq_req))) {
 		engine->irq_put(engine);
 		i915_gem_request_assign(&engine->trace_irq_req, NULL);
 	}
@@ -3199,7 +3199,7 @@ i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
 		if (list_empty(&req->list))
 			goto retire;
 
-		if (i915_gem_request_completed(req, true)) {
+		if (i915_gem_request_completed(req)) {
 			__i915_gem_request_retire__upto(req);
 retire:
 			i915_gem_object_retire__read(obj, i);
@@ -3308,7 +3308,7 @@ __i915_gem_object_sync(struct drm_i915_gem_object *obj,
 	if (to == from)
 		return 0;
 
-	if (i915_gem_request_completed(from_req, true))
+	if (i915_gem_request_completed(from_req))
 		return 0;
 
 	if (!i915_semaphore_is_enabled(obj->base.dev)) {
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index ddec3dd..ab9c12a 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11456,7 +11456,7 @@ static bool __intel_pageflip_stall_check(struct drm_device *dev,
 
 	if (work->flip_ready_vblank == 0) {
 		if (work->flip_queued_req &&
-		    !i915_gem_request_completed(work->flip_queued_req, true))
+		    !i915_gem_request_completed(work->flip_queued_req))
 			return false;
 
 		work->flip_ready_vblank = drm_crtc_vblank_count(crtc);
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 017c431..1446779 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -7404,7 +7404,7 @@ static void __intel_rps_boost_work(struct work_struct *work)
 	struct request_boost *boost = container_of(work, struct request_boost, work);
 	struct drm_i915_gem_request *req = boost->req;
 
-	if (!i915_gem_request_completed(req, true))
+	if (!i915_gem_request_completed(req))
 		gen6_rps_boost(to_i915(req->engine->dev), NULL,
 			       req->emitted_jiffies);
 
@@ -7420,7 +7420,7 @@ void intel_queue_rps_boost_for_request(struct drm_device *dev,
 	if (req == NULL || INTEL_INFO(dev)->gen < 6)
 		return;
 
-	if (i915_gem_request_completed(req, true))
+	if (i915_gem_request_completed(req))
 		return;
 
 	boost = kmalloc(sizeof(*boost), GFP_ATOMIC);
-- 
1.9.1


* [PATCH v8 4/6] drm/i915: Interrupt driven fences
From: John.C.Harrison @ 2016-05-12 21:06 UTC
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The intended usage model for struct fence is that the signalled status
should be set on demand rather than polled. That is, there should not
be a need for a 'signaled' function to be called every time the status
is queried. Instead, 'something' should be done to enable a signal
callback from the hardware which will update the state directly. In
the case of requests, this is the seqno update interrupt. The idea is
that this callback will only be enabled on demand when something
actually tries to wait on the fence.
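
A minimal sketch of that contract (illustrative only; the hardware
hook below is hypothetical, the real i915 version follows in the diff
further down): .enable_signaling is invoked the first time a waiter
appears, turns on the interrupt, and promises that fence_signal() will
eventually be called from the interrupt path:

    static bool sketch_enable_signaling(struct fence *f)
    {
            /* Hypothetical helper: unmask the hardware completion
             * (seqno update) interrupt. */
            sketch_hw_enable_completion_irq();

            /* Returning true promises fence_signal() will be called
             * later, e.g. from the interrupt handler. */
            return true;
    }

    static void sketch_completion_irq(struct fence *f)
    {
            fence_signal(f);   /* marks signalled, wakes any waiters */
    }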

This change removes the polling test and replaces it with the callback
scheme. Each fence is added to a 'please poke me' list at the start of
i915_add_request(). The interrupt handler then scans through the 'poke
me' list when a new seqno pops out and signals any matching
fence/request. The fence is then removed from the list so the entire
request stack does not need to be scanned every time. Note that the
fence is added to the list before the commands to generate the seqno
interrupt are added to the ring. Thus the sequence is guaranteed to be
race free if the interrupt is already enabled.

Note that the interrupt is only enabled on demand (i.e. when
__wait_request() is called). Thus there is still a potential race when
enabling the interrupt as the request may already have completed.
However, this is simply solved by calling the interrupt processing
code immediately after enabling the interrupt and thereby checking for
already completed requests.

Lastly, the ring clean up code has the possibility to cancel
outstanding requests (e.g. because TDR has reset the ring). These
requests will never get signalled and so must be removed from the
signal list manually. This is done by setting a 'cancelled' flag and
then calling the regular notify/retire code path rather than
attempting to duplicate the list manipulation and clean up code in
multiple places. This also avoids any race condition where the
cancellation request might occur after/during the completion interrupt
actually arriving.

v2: Updated to take advantage of the request unreference no longer
requiring the mutex lock.

v3: Moved the signal list processing around to prevent unsubmitted
requests being added to the list. This was occurring on Android
because the native sync implementation calls the
fence->enable_signaling API immediately on fence creation.

Updated after review comments by Tvrtko Ursulin. Renamed list nodes to
'link' instead of 'list'. Added support for returning an error code on
a cancelled fence. Updated list processing to be more efficient/safer
with respect to spinlocks.

v5: Made i915_gem_request_submit a static as it is only ever called
from one place.

Fixed up the low latency wait optimisation. The time delay between the
seqno value being written to memory and the driver's ISR running can be
significant, at least for the wait request micro-benchmark. This can
be greatly improved by explicitly checking for seqno updates in the
pre-wait busy poll loop. Also added some documentation comments to the
busy poll code.

Fixed up support for the faking of lost interrupts
(test_irq_rings/missed_irq_rings). That is, there is an IGT test that
tells the driver to lose interrupts deliberately and then checks that
everything still works as expected (albeit much slower).

Updates from review comments: use non-IRQ-save spinlocking, early exit
on WARN and improved comments (Tvrtko Ursulin).

v6: Updated to newer nightly and resolved conflicts around the
wait_request busy spin optimisation. Also fixed a race condition
between this early exit path and the regular completion path.

v7: Updated to newer nightly - lots of ring -> engine renaming plus an
interface change on get_seqno(). Also added a list_empty() check
before acquiring spinlocks and doing list processing.

v8: Updated to newer nightly - changes to request clean up code mean
none of the deferred free mess is needed any more.

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         |   7 +
 drivers/gpu/drm/i915/i915_gem.c         | 235 +++++++++++++++++++++++++++++---
 drivers/gpu/drm/i915/i915_irq.c         |   2 +
 drivers/gpu/drm/i915/intel_lrc.c        |   1 +
 drivers/gpu/drm/i915/intel_ringbuffer.c |   1 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |   1 +
 6 files changed, 225 insertions(+), 22 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 433d99a..3adac59 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2278,6 +2278,10 @@ struct drm_i915_gem_request {
 	 */
 	struct fence fence;
 	struct rcu_head rcu_head;
+	struct list_head signal_link;
+	bool cancelled;
+	bool irq_enabled;
+	bool signal_requested;
 
 	/** On Which ring this request was generated */
 	struct drm_i915_private *i915;
@@ -2379,6 +2383,9 @@ struct drm_i915_gem_request {
 struct drm_i915_gem_request * __must_check
 i915_gem_request_alloc(struct intel_engine_cs *engine,
 		       struct intel_context *ctx);
+void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req,
+				       bool fence_locked);
+void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked);
 
 static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
 {
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 958edcb..6669508 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -39,6 +39,8 @@
 #include <linux/pci.h>
 #include <linux/dma-buf.h>
 
+static void i915_gem_request_submit(struct drm_i915_gem_request *req);
+
 static void i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj);
 static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj);
 static void
@@ -1238,9 +1240,8 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	struct intel_engine_cs *engine = i915_gem_request_get_engine(req);
 	struct drm_device *dev = engine->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	const bool irq_test_in_progress =
-		ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) & intel_engine_flag(engine);
 	int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
+	uint32_t seqno;
 	DEFINE_WAIT(wait);
 	unsigned long timeout_expire;
 	s64 before = 0; /* Only to silence a compiler warning. */
@@ -1248,9 +1249,6 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 
 	WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled");
 
-	if (list_empty(&req->list))
-		return 0;
-
 	if (i915_gem_request_completed(req))
 		return 0;
 
@@ -1276,15 +1274,17 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	trace_i915_gem_request_wait_begin(req);
 
 	/* Optimistic spin for the next jiffie before touching IRQs */
-	ret = __i915_spin_request(req, state);
-	if (ret == 0)
-		goto out;
-
-	if (!irq_test_in_progress && WARN_ON(!engine->irq_get(engine))) {
-		ret = -ENODEV;
-		goto out;
+	if (req->seqno) {
+		ret = __i915_spin_request(req, state);
+		if (ret == 0)
+			goto out;
 	}
 
+	/*
+	 * Enable interrupt completion of the request.
+	 */
+	fence_enable_sw_signaling(&req->fence);
+
 	for (;;) {
 		struct timer_list timer;
 
@@ -1307,6 +1307,21 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 			break;
 		}
 
+		if (req->seqno) {
+			/*
+			 * There is quite a lot of latency in the user interrupt
+			 * path. So do an explicit seqno check and potentially
+			 * remove all that delay.
+			 */
+			if (req->engine->irq_seqno_barrier)
+				req->engine->irq_seqno_barrier(req->engine);
+			seqno = engine->get_seqno(engine);
+			if (i915_seqno_passed(seqno, req->seqno)) {
+				ret = 0;
+				break;
+			}
+		}
+
 		if (signal_pending_state(state, current)) {
 			ret = -ERESTARTSYS;
 			break;
@@ -1333,14 +1348,32 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 			destroy_timer_on_stack(&timer);
 		}
 	}
-	if (!irq_test_in_progress)
-		engine->irq_put(engine);
 
 	finish_wait(&engine->irq_queue, &wait);
 
 out:
 	trace_i915_gem_request_wait_end(req);
 
+	if ((ret == 0) && (req->seqno)) {
+		if (req->engine->irq_seqno_barrier)
+			req->engine->irq_seqno_barrier(req->engine);
+		seqno = engine->get_seqno(engine);
+		if (i915_seqno_passed(seqno, req->seqno) &&
+		    !i915_gem_request_completed(req)) {
+			/*
+			 * Make sure the request is marked as completed before
+			 * returning. NB: Need to acquire the spinlock around
+			 * the whole call to avoid a race condition where the
+			 * interrupt handler is running concurrently and could
+			 * cause this invocation to exit early even though the
+			 * request has not actually been fully processed yet.
+			 */
+			spin_lock_irq(&req->engine->fence_lock);
+			i915_gem_request_notify(req->engine, true);
+			spin_unlock_irq(&req->engine->fence_lock);
+		}
+	}
+
 	if (timeout) {
 		s64 tres = *timeout - (ktime_get_raw_ns() - before);
 
@@ -1406,6 +1439,11 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
 {
 	trace_i915_gem_request_retire(request);
 
+	if (request->irq_enabled) {
+		request->engine->irq_put(request->engine);
+		request->irq_enabled = false;
+	}
+
 	/* We know the GPU must have read the request to have
 	 * sent us the seqno + interrupt, so use the position
 	 * of tail of the request to update the last known position
@@ -1419,6 +1457,22 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
 	list_del_init(&request->list);
 	i915_gem_request_remove_from_client(request);
 
+	/*
+	 * In case the request is still in the signal pending list,
+	 * e.g. due to being cancelled by TDR, preemption, etc.
+	 */
+	if (!list_empty(&request->signal_link)) {
+		/*
+		 * The request must be marked as cancelled and the underlying
+		 * fence as failed. NB: There is no explicit fence fail API,
+		 * there is only a manual poke and signal.
+		 */
+		request->cancelled = true;
+		/* How to propagate to any associated sync_fence??? */
+		request->fence.status = -EIO;
+		fence_signal_locked(&request->fence);
+	}
+
 	if (request->previous_context) {
 		if (i915.enable_execlists)
 			intel_lr_context_unpin(request->previous_context,
@@ -2650,6 +2704,12 @@ void __i915_add_request(struct drm_i915_gem_request *request,
 	 */
 	request->postfix = intel_ring_get_tail(ringbuf);
 
+	/*
+	 * Add the fence to the pending list before emitting the commands to
+	 * generate a seqno notification interrupt.
+	 */
+	i915_gem_request_submit(request);
+
 	if (i915.enable_execlists)
 		ret = engine->emit_request(request);
 	else {
@@ -2735,25 +2795,141 @@ static void i915_gem_request_free(struct fence *req_fence)
 	struct drm_i915_gem_request *req;
 
 	req = container_of(req_fence, typeof(*req), fence);
+
+	WARN_ON(req->irq_enabled);
+
 	call_rcu(&req->rcu_head, i915_gem_request_free_rcu);
 }
 
-static bool i915_gem_request_enable_signaling(struct fence *req_fence)
+/*
+ * The request is about to be submitted to the hardware so add the fence to
+ * the list of signalable fences.
+ *
+ * NB: This does not necessarily enable interrupts yet. That only occurs on
+ * demand when the request is actually waited on. However, adding it to the
+ * list early ensures that there is no race condition where the interrupt
+ * could pop out prematurely and thus be completely lost. The race is merely
+ * that the interrupt must be manually checked for after being enabled.
+ */
+static void i915_gem_request_submit(struct drm_i915_gem_request *req)
 {
-	/* Interrupt driven fences are not implemented yet.*/
-	WARN(true, "This should not be called!");
-	return true;
+	/*
+	 * Always enable signal processing for the request's fence object
+	 * before that request is submitted to the hardware. Thus there is no
+	 * race condition whereby the interrupt could pop out before the
+	 * request has been added to the signal list. Hence no need to check
+	 * for completion, undo the list add and return false.
+	 */
+	i915_gem_request_reference(req);
+	spin_lock_irq(&req->engine->fence_lock);
+	WARN_ON(!list_empty(&req->signal_link));
+	list_add_tail(&req->signal_link, &req->engine->fence_signal_list);
+	spin_unlock_irq(&req->engine->fence_lock);
+
+	/*
+	 * NB: Interrupts are only enabled on demand. Thus there is still a
+	 * race where the request could complete before the interrupt has
+	 * been enabled. Thus care must be taken at that point.
+	 */
+
+	/* Have interrupts already been requested? */
+	if (req->signal_requested)
+		i915_gem_request_enable_interrupt(req, false);
+}
+
+/*
+ * The request is being actively waited on, so enable interrupt based
+ * completion signalling.
+ */
+void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req,
+				       bool fence_locked)
+{
+	struct drm_i915_private *dev_priv = to_i915(req->engine->dev);
+	const bool irq_test_in_progress =
+		ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) &
+						intel_engine_flag(req->engine);
+
+	if (req->irq_enabled)
+		return;
+
+	if (irq_test_in_progress)
+		return;
+
+	if (!WARN_ON(!req->engine->irq_get(req->engine)))
+		req->irq_enabled = true;
+
+	/*
+	 * Because the interrupt is only enabled on demand, there is a race
+	 * where the interrupt can fire before anyone is looking for it. So
+	 * do an explicit check for missed interrupts.
+	 */
+	i915_gem_request_notify(req->engine, fence_locked);
 }
 
-static bool i915_gem_request_is_completed(struct fence *req_fence)
+static bool i915_gem_request_enable_signaling(struct fence *req_fence)
 {
 	struct drm_i915_gem_request *req = container_of(req_fence,
 						 typeof(*req), fence);
+
+	/*
+	 * No need to actually enable interrupt based processing until the
+	 * request has been submitted to the hardware. At which point
+	 * 'i915_gem_request_submit()' is called. So only really enable
+	 * signalling in there. Just set a flag to say that interrupts are
+	 * wanted when the request is eventually submitted. On the other hand
+	 * if the request has already been submitted then interrupts do need
+	 * to be enabled now.
+	 */
+
+	req->signal_requested = true;
+
+	if (!list_empty(&req->signal_link))
+		i915_gem_request_enable_interrupt(req, true);
+
+	return true;
+}
+
+void i915_gem_request_notify(struct intel_engine_cs *engine, bool fence_locked)
+{
+	struct drm_i915_gem_request *req, *req_next;
+	unsigned long flags;
 	u32 seqno;
 
-	seqno = req->engine->get_seqno(req->engine);
+	if (list_empty(&engine->fence_signal_list))
+		return;
+
+	if (!fence_locked)
+		spin_lock_irqsave(&engine->fence_lock, flags);
+
+	if (engine->irq_seqno_barrier)
+		engine->irq_seqno_barrier(engine);
+	seqno = engine->get_seqno(engine);
+
+	list_for_each_entry_safe(req, req_next, &engine->fence_signal_list, signal_link) {
+		if (!req->cancelled) {
+			if (!i915_seqno_passed(seqno, req->seqno))
+				break;
+		}
+
+		/*
+		 * Start by removing the fence from the signal list otherwise
+		 * the retire code can run concurrently and get confused.
+		 */
+		list_del_init(&req->signal_link);
 
-	return i915_seqno_passed(seqno, req->seqno);
+		if (!req->cancelled)
+			fence_signal_locked(&req->fence);
+
+		if (req->irq_enabled) {
+			req->engine->irq_put(req->engine);
+			req->irq_enabled = false;
+		}
+
+		i915_gem_request_unreference(req);
+	}
+
+	if (!fence_locked)
+		spin_unlock_irqrestore(&engine->fence_lock, flags);
 }
 
 static const char *i915_gem_request_get_driver_name(struct fence *req_fence)
@@ -2796,7 +2972,6 @@ static void i915_gem_request_fence_value_str(struct fence *req_fence,
 
 static const struct fence_ops i915_gem_request_fops = {
 	.enable_signaling	= i915_gem_request_enable_signaling,
-	.signaled		= i915_gem_request_is_completed,
 	.wait			= fence_default_wait,
 	.release		= i915_gem_request_free,
 	.get_driver_name	= i915_gem_request_get_driver_name,
@@ -2882,6 +3057,7 @@ __i915_gem_request_alloc(struct intel_engine_cs *engine,
 	req->ctx  = ctx;
 	i915_gem_context_reference(req->ctx);
 
+	INIT_LIST_HEAD(&req->signal_link);
 	fence_init(&req->fence, &i915_gem_request_fops, &engine->fence_lock,
 		   ctx->engine[engine->id].fence_timeline.fence_context,
 		   i915_fence_timeline_get_next_seqno(&ctx->engine[engine->id].fence_timeline));
@@ -3016,6 +3192,13 @@ static void i915_gem_reset_engine_cleanup(struct drm_i915_private *dev_priv,
 		i915_gem_request_retire(request);
 	}
 
+	/*
+	 * Tidy up anything left over. This includes a call to
+	 * i915_gem_request_notify() which will make sure that any requests
+	 * that were on the signal pending list get also cleaned up.
+	 */
+	i915_gem_retire_requests_ring(engine);
+
 	/* Having flushed all requests from all queues, we know that all
 	 * ringbuffers must now be empty. However, since we do not reclaim
 	 * all space when retiring the request (to prevent HEADs colliding
@@ -3062,6 +3245,13 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *engine)
 {
 	WARN_ON(i915_verify_lists(engine->dev));
 
+	/*
+	 * If no-one has waited on a request recently then interrupts will
+	 * not have been enabled and thus no requests will ever be marked as
+	 * completed. So do an interrupt check now.
+	 */
+	i915_gem_request_notify(engine, false);
+
 	/* Retire requests first as we use it above for the early return.
 	 * If we retire requests last, we may use a later seqno and so clear
 	 * the requests lists without clearing the active list, leading to
@@ -5098,6 +5288,7 @@ init_engine_lists(struct intel_engine_cs *engine)
 {
 	INIT_LIST_HEAD(&engine->active_list);
 	INIT_LIST_HEAD(&engine->request_list);
+	INIT_LIST_HEAD(&engine->fence_signal_list);
 }
 
 void
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 7a07987..7cdd076 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1002,6 +1002,8 @@ static void notify_ring(struct intel_engine_cs *engine)
 	trace_i915_gem_request_notify(engine);
 	engine->user_interrupts++;
 
+	i915_gem_request_notify(engine, false);
+
 	wake_up_all(&engine->irq_queue);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 199aa6e..7858521 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1975,6 +1975,7 @@ logical_ring_init(struct drm_device *dev, struct intel_engine_cs *engine)
 	engine->dev = dev;
 	INIT_LIST_HEAD(&engine->active_list);
 	INIT_LIST_HEAD(&engine->request_list);
+	INIT_LIST_HEAD(&engine->fence_signal_list);
 	spin_lock_init(&engine->fence_lock);
 	i915_gem_batch_pool_init(dev, &engine->batch_pool);
 	init_waitqueue_head(&engine->irq_queue);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 671e49c..a2cfd10 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -2287,6 +2287,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 	INIT_LIST_HEAD(&engine->request_list);
 	INIT_LIST_HEAD(&engine->execlist_queue);
 	INIT_LIST_HEAD(&engine->buffers);
+	INIT_LIST_HEAD(&engine->fence_signal_list);
 	spin_lock_init(&engine->fence_lock);
 	i915_gem_batch_pool_init(dev, &engine->batch_pool);
 	memset(engine->semaphore.sync_seqno, 0,
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 510ed58..113646c 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -347,6 +347,7 @@ struct  intel_engine_cs {
 	u32 (*get_cmd_length_mask)(u32 cmd_header);
 
 	spinlock_t fence_lock;
+	struct list_head fence_signal_list;
 };
 
 static inline bool
-- 
1.9.1


* [PATCH v8 5/6] drm/i915: Updated request structure tracing
From: John.C.Harrison @ 2016-05-12 21:06 UTC
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

Added the '_complete' trace event which occurs when a fence/request is
signaled as complete. Also moved the notify event from the IRQ handler
code to inside the notify function itself.

v3: Added the current ring seqno to the notify trace point.

v5: Line wrapping to keep the style checker happy.

v7: Updated to newer nightly (lots of ring -> engine renaming).

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c   |  9 +++++++--
 drivers/gpu/drm/i915/i915_irq.c   |  1 -
 drivers/gpu/drm/i915/i915_trace.h | 14 +++++++++-----
 3 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 6669508..4f4e445 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2895,8 +2895,10 @@ void i915_gem_request_notify(struct intel_engine_cs *engine, bool fence_locked)
 	unsigned long flags;
 	u32 seqno;
 
-	if (list_empty(&engine->fence_signal_list))
+	if (list_empty(&engine->fence_signal_list)) {
+		trace_i915_gem_request_notify(engine, 0);
 		return;
+	}
 
 	if (!fence_locked)
 		spin_lock_irqsave(&engine->fence_lock, flags);
@@ -2904,6 +2906,7 @@ void i915_gem_request_notify(struct intel_engine_cs *engine, bool fence_locked)
 	if (engine->irq_seqno_barrier)
 		engine->irq_seqno_barrier(engine);
 	seqno = engine->get_seqno(engine);
+	trace_i915_gem_request_notify(engine, seqno);
 
 	list_for_each_entry_safe(req, req_next, &engine->fence_signal_list, signal_link) {
 		if (!req->cancelled) {
@@ -2917,8 +2920,10 @@ void i915_gem_request_notify(struct intel_engine_cs *engine, bool fence_locked)
 		 */
 		list_del_init(&req->signal_link);
 
-		if (!req->cancelled)
+		if (!req->cancelled) {
 			fence_signal_locked(&req->fence);
+			trace_i915_gem_request_complete(req);
+		}
 
 		if (req->irq_enabled) {
 			req->engine->irq_put(req->engine);
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 7cdd076..175e454 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -999,7 +999,6 @@ static void notify_ring(struct intel_engine_cs *engine)
 	if (!intel_engine_initialized(engine))
 		return;
 
-	trace_i915_gem_request_notify(engine);
 	engine->user_interrupts++;
 
 	i915_gem_request_notify(engine, false);
diff --git a/drivers/gpu/drm/i915/i915_trace.h b/drivers/gpu/drm/i915/i915_trace.h
index dc0def2..b7c7031 100644
--- a/drivers/gpu/drm/i915/i915_trace.h
+++ b/drivers/gpu/drm/i915/i915_trace.h
@@ -550,23 +550,27 @@ DEFINE_EVENT(i915_gem_request, i915_gem_request_add,
 );
 
 TRACE_EVENT(i915_gem_request_notify,
-	    TP_PROTO(struct intel_engine_cs *engine),
-	    TP_ARGS(engine),
+	    TP_PROTO(struct intel_engine_cs *engine, uint32_t seqno),
+	    TP_ARGS(engine, seqno),
 
 	    TP_STRUCT__entry(
 			     __field(u32, dev)
 			     __field(u32, ring)
 			     __field(u32, seqno)
+			     __field(bool, is_empty)
 			     ),
 
 	    TP_fast_assign(
 			   __entry->dev = engine->dev->primary->index;
 			   __entry->ring = engine->id;
-			   __entry->seqno = engine->get_seqno(engine);
+			   __entry->seqno = seqno;
+			   __entry->is_empty =
+					list_empty(&engine->fence_signal_list);
 			   ),
 
-	    TP_printk("dev=%u, ring=%u, seqno=%u",
-		      __entry->dev, __entry->ring, __entry->seqno)
+	    TP_printk("dev=%u, ring=%u, seqno=%u, empty=%d",
+		      __entry->dev, __entry->ring, __entry->seqno,
+		      __entry->is_empty)
 );
 
 DEFINE_EVENT(i915_gem_request, i915_gem_request_retire,
-- 
1.9.1


* [PATCH v8 6/6] drm/i915: Cache last IRQ seqno to reduce IRQ overhead
  2016-05-12 21:06 [PATCH v8 0/6] Convert requests to use struct fence John.C.Harrison
                   ` (4 preceding siblings ...)
  2016-05-12 21:06 ` [PATCH v8 5/6] drm/i915: Updated request structure tracing John.C.Harrison
@ 2016-05-12 21:06 ` John.C.Harrison
  2016-05-13  7:41   ` Chris Wilson
  2016-05-12 21:10 ` ✗ Ro.CI.BAT: failure for Convert requests to use struct fence (rev5) Patchwork
  2016-05-13  7:44 ` [PATCH v8 0/6] Convert requests to use struct fence Chris Wilson
  7 siblings, 1 reply; 20+ messages in thread
From: John.C.Harrison @ 2016-05-12 21:06 UTC (permalink / raw)
  To: Intel-GFX

From: John Harrison <John.C.Harrison@Intel.com>

The notify function can be called many times without the seqno
changing. Some of those calls are there to prevent races caused by the
requirement that interrupts not be enabled until requested. However,
when interrupts are enabled the IRQ handler can be called multiple
times without the ring's seqno value changing. E.g. when two interrupts
are generated by batch buffers completing in quick succession, the
first call to the handler processes both completions but the handler
still gets executed a second time. This patch reduces the overhead of
these extra calls by caching the last processed seqno value and exiting
early if it has not changed.
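
In essence the change is this (a minimal sketch of the idea only - the
real diff below also handles the seqno barrier, the locking and the
places where the cached value must be reset):

	/* Sketch: skip all signal list processing when the hardware
	 * seqno has not advanced since the last notify call.
	 * last_irq_seqno is the cache this patch adds to the engine.
	 */
	static bool seqno_has_advanced(struct intel_engine_cs *engine)
	{
		u32 seqno = engine->get_seqno(engine);

		if (seqno == engine->last_irq_seqno)
			return false;	/* spurious call, nothing new */

		engine->last_irq_seqno = seqno;
		return true;
	}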

v3: New patch for series.

v5: Added comment about last_irq_seqno usage due to code review
feedback (Tvrtko Ursulin).

v6: Minor update to resolve a race condition with the wait_request
optimisation.

v7: Updated to newer nightly - lots of ring -> engine renaming plus an
interface change to get_seqno().

For: VIZ-5190
Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c         | 26 ++++++++++++++++++++++----
 drivers/gpu/drm/i915/intel_ringbuffer.h |  1 +
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 4f4e445..9ae6148 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1369,6 +1369,7 @@ out:
 			 * request has not actually been fully processed yet.
 			 */
 			spin_lock_irq(&req->engine->fence_lock);
+			req->engine->last_irq_seqno = 0;
 			i915_gem_request_notify(req->engine, true);
 			spin_unlock_irq(&req->engine->fence_lock);
 		}
@@ -2577,9 +2578,12 @@ i915_gem_init_seqno(struct drm_device *dev, u32 seqno)
 	i915_gem_retire_requests(dev);
 
 	/* Finally reset hw state */
-	for_each_engine(engine, dev_priv)
+	for_each_engine(engine, dev_priv) {
 		intel_ring_init_seqno(engine, seqno);
 
+		engine->last_irq_seqno = 0;
+	}
+
 	return 0;
 }
 
@@ -2900,13 +2904,24 @@ void i915_gem_request_notify(struct intel_engine_cs *engine, bool fence_locked)
 		return;
 	}
 
-	if (!fence_locked)
-		spin_lock_irqsave(&engine->fence_lock, flags);
-
+	/*
+	 * Check for a new seqno. If it hasn't actually changed then early
+	 * exit without even grabbing the spinlock. Note that this is safe
+	 * because any corruption of last_irq_seqno merely results in doing
+	 * the full processing when there is potentially no work to be done.
+	 * It can never lead to not processing work that does need to happen.
+	 */
 	if (engine->irq_seqno_barrier)
 		engine->irq_seqno_barrier(engine);
 	seqno = engine->get_seqno(engine);
 	trace_i915_gem_request_notify(engine, seqno);
+	if (seqno == engine->last_irq_seqno)
+		return;
+
+	if (!fence_locked)
+		spin_lock_irqsave(&engine->fence_lock, flags);
+
+	engine->last_irq_seqno = seqno;
 
 	list_for_each_entry_safe(req, req_next, &engine->fence_signal_list, signal_link) {
 		if (!req->cancelled) {
@@ -3201,7 +3216,10 @@ static void i915_gem_reset_engine_cleanup(struct drm_i915_private *dev_priv,
 	 * Tidy up anything left over. This includes a call to
 	 * i915_gem_request_notify() which will make sure that any requests
 	 * that were on the signal pending list get also cleaned up.
+	 * NB: The seqno cache must be cleared otherwise the notify call will
+	 * simply return immediately.
 	 */
+	engine->last_irq_seqno = 0;
 	i915_gem_retire_requests_ring(engine);
 
 	/* Having flushed all requests from all queues, we know that all
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 113646c..1381e52 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -348,6 +348,7 @@ struct  intel_engine_cs {
 
 	spinlock_t fence_lock;
 	struct list_head fence_signal_list;
+	uint32_t last_irq_seqno;
 };
 
 static inline bool
-- 
1.9.1


* ✗ Ro.CI.BAT: failure for Convert requests to use struct fence (rev5)
  2016-05-12 21:06 [PATCH v8 0/6] Convert requests to use struct fence John.C.Harrison
                   ` (5 preceding siblings ...)
  2016-05-12 21:06 ` [PATCH v8 6/6] drm/i915: Cache last IRQ seqno to reduce IRQ overhead John.C.Harrison
@ 2016-05-12 21:10 ` Patchwork
  2016-05-13  7:44 ` [PATCH v8 0/6] Convert requests to use struct fence Chris Wilson
  7 siblings, 0 replies; 20+ messages in thread
From: Patchwork @ 2016-05-12 21:10 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: intel-gfx

== Series Details ==

Series: Convert requests to use struct fence (rev5)
URL   : https://patchwork.freedesktop.org/series/1068/
State : failure

== Summary ==

Applying: drm/i915: Add per context timelines to fence object
Applying: drm/i915: Convert requests to use struct fence
Patch failed at 0002 drm/i915: Convert requests to use struct fence
The copy of the patch that failed is found in: .git/rebase-apply/patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".


* Re: [PATCH v8 4/6] drm/i915: Interrupt driven fences
  2016-05-12 21:06 ` [PATCH v8 4/6] drm/i915: Interrupt driven fences John.C.Harrison
@ 2016-05-13  7:27   ` Chris Wilson
  2016-05-13  9:19     ` John Harrison
  2016-05-19  9:24   ` Maarten Lankhorst
  1 sibling, 1 reply; 20+ messages in thread
From: Chris Wilson @ 2016-05-13  7:27 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, May 12, 2016 at 10:06:34PM +0100, John.C.Harrison@Intel.com wrote:
> +void i915_gem_request_notify(struct intel_engine_cs *engine, bool fence_locked)
> +{
> +	struct drm_i915_gem_request *req, *req_next;
> +	unsigned long flags;
>  	u32 seqno;
>  
> -	seqno = req->engine->get_seqno(req->engine);
> +	if (list_empty(&engine->fence_signal_list))
> +		return;
> +
> +	if (!fence_locked)
> +		spin_lock_irqsave(&engine->fence_lock, flags);
> +
> +	if (engine->irq_seqno_barrier)
> +		engine->irq_seqno_barrier(engine);
> +	seqno = engine->get_seqno(engine);
> +
> +	list_for_each_entry_safe(req, req_next, &engine->fence_signal_list, signal_link) {

NO, NO, NO. As we said the very first time, you cannot do this from an
irq handler.

The current code is already bad enough; this makes it a large constant
+ N times worse. Please do look at how to do signal driven fences in O(1)
that I posted many months ago and several times since.
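
For reference, the general shape of such an O(1) scheme - purely a
sketch with invented names (struct intel_wait, engine->first_waiter),
not the actual posted patches - is to keep waiters sorted by seqno and
have the interrupt wake only the oldest waiter, which then signals
completions and wakes its successor from process context:

	/* Hypothetical: the irq handler touches exactly one waiter,
	 * never the whole list of outstanding requests.
	 */
	static void notify_ring_o1(struct intel_engine_cs *engine)
	{
		struct intel_wait *w = READ_ONCE(engine->first_waiter);

		if (w)
			wake_up_process(w->task);
	}
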
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH v8 1/6] drm/i915: Add per context timelines to fence object
  2016-05-12 21:06 ` [PATCH v8 1/6] drm/i915: Add per context timelines to fence object John.C.Harrison
@ 2016-05-13  7:39   ` Chris Wilson
  2016-05-13  9:16     ` John Harrison
  0 siblings, 1 reply; 20+ messages in thread
From: Chris Wilson @ 2016-05-13  7:39 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, May 12, 2016 at 10:06:31PM +0100, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The fence object used inside the request structure requires a sequence
> number. Although this is not used by the i915 driver itself, it could
> potentially be used by non-i915 code if the fence is passed outside of
> the driver. This is the intention as it allows external kernel drivers
> and user applications to wait on batch buffer completion
> asynchronously via the dma-buf fence API.
> 
> To ensure that such external users are not confused by strange things
> happening with the seqno, this patch adds in a per context timeline
> that can provide a guaranteed in-order seqno value for the fence. This
> is safe because the scheduler will not re-order batch buffers within a
> context - they are considered to be mutually dependent.
> 
> v2: New patch in series.
> 
> v3: Renamed/retyped timeline structure fields after review comments by
> Tvrtko Ursulin.
> 
> Added context information to the timeline's name string for better
> identification in debugfs output.
> 
> v5: Line wrapping and other white space fixes to keep style checker
> happy.
> 
> v7: Updated to newer nightly (lots of ring -> engine renaming).
> 
> v8: Moved to earlier in patch series so no longer needs to remove the
> quick hack timeline that was being added before.
> 
> For: VIZ-5190
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h         | 14 ++++++++++++
>  drivers/gpu/drm/i915/i915_gem.c         | 40 +++++++++++++++++++++++++++++++++
>  drivers/gpu/drm/i915/i915_gem_context.c | 14 ++++++++++++
>  drivers/gpu/drm/i915/intel_lrc.c        |  8 +++++++
>  4 files changed, 76 insertions(+)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index d5496ab..520480b 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -818,6 +818,19 @@ struct i915_ctx_hang_stats {
>  	bool banned;
>  };
>  
> +struct i915_fence_timeline {
> +	char        name[32];
> +	unsigned    fence_context;
> +	unsigned    next;
> +
> +	struct intel_context *ctx;
> +	struct intel_engine_cs *engine;
> +};
> +
> +int i915_create_fence_timeline(struct drm_device *dev,
> +			       struct intel_context *ctx,
> +			       struct intel_engine_cs *ring);
> +
>  /* This must match up with the value previously used for execbuf2.rsvd1. */
>  #define DEFAULT_CONTEXT_HANDLE 0
>  
> @@ -869,6 +882,7 @@ struct intel_context {
>  		u64 lrc_desc;
>  		uint32_t *lrc_reg_state;
>  		bool initialised;
> +		struct i915_fence_timeline fence_timeline;

This is still fundamentally wrong. This i915_fence_timeline is both the
fence context and timeline. Our timelines are singular per vm, with a
fence context under each timeline per engine.

Please complete the legacy/execlists unification first so that we can
have timelines work correctly for the whole driver.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH v8 6/6] drm/i915: Cache last IRQ seqno to reduce IRQ overhead
  2016-05-12 21:06 ` [PATCH v8 6/6] drm/i915: Cache last IRQ seqno to reduce IRQ overhead John.C.Harrison
@ 2016-05-13  7:41   ` Chris Wilson
  0 siblings, 0 replies; 20+ messages in thread
From: Chris Wilson @ 2016-05-13  7:41 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, May 12, 2016 at 10:06:36PM +0100, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
> 
> The notify function can be called many times without the seqno
> changing. Some of those calls are there to prevent races caused by the
> requirement that interrupts not be enabled until requested. However,
> when interrupts are enabled the IRQ handler can be called multiple
> times without the ring's seqno value changing. E.g. when two
> interrupts are generated by batch buffers completing in quick
> succession, the first call to the handler processes both completions
> but the handler still gets executed a second time. This patch reduces
> the overhead of these extra calls by caching the last processed seqno
> value and exiting early if it has not changed.

The idea is not to cache the last seqno (since that value is already
cached!) but to post new irq events.

Compare and contrast
https://patchwork.freedesktop.org/patch/85664/
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH v8 0/6] Convert requests to use struct fence
  2016-05-12 21:06 [PATCH v8 0/6] Convert requests to use struct fence John.C.Harrison
                   ` (6 preceding siblings ...)
  2016-05-12 21:10 ` ✗ Ro.CI.BAT: failure for Convert requests to use struct fence (rev5) Patchwork
@ 2016-05-13  7:44 ` Chris Wilson
  2016-05-13 10:11   ` John Harrison
  7 siblings, 1 reply; 20+ messages in thread
From: Chris Wilson @ 2016-05-13  7:44 UTC (permalink / raw)
  To: John.C.Harrison; +Cc: Intel-GFX

On Thu, May 12, 2016 at 10:06:30PM +0100, John.C.Harrison@Intel.com wrote:
> Added support for possible RCU usage of fence object (Review comments
> by Maarten Lankhorst).

You mean like
https://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=tasklet&id=001f3468762ca49137b5658f2aa58507dbfa4a05
?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH v8 1/6] drm/i915: Add per context timelines to fence object
  2016-05-13  7:39   ` Chris Wilson
@ 2016-05-13  9:16     ` John Harrison
  2016-05-18 12:22       ` Maarten Lankhorst
  0 siblings, 1 reply; 20+ messages in thread
From: John Harrison @ 2016-05-13  9:16 UTC (permalink / raw)
  To: Chris Wilson, Intel-GFX

On 13/05/2016 08:39, Chris Wilson wrote:
> On Thu, May 12, 2016 at 10:06:31PM +0100, John.C.Harrison@Intel.com wrote:
>> From: John Harrison <John.C.Harrison@Intel.com>
>>
>> The fence object used inside the request structure requires a sequence
>> number. Although this is not used by the i915 driver itself, it could
>> potentially be used by non-i915 code if the fence is passed outside of
>> the driver. This is the intention as it allows external kernel drivers
>> and user applications to wait on batch buffer completion
>> asynchronously via the dma-buf fence API.
>>
>> To ensure that such external users are not confused by strange things
>> happening with the seqno, this patch adds in a per context timeline
>> that can provide a guaranteed in-order seqno value for the fence. This
>> is safe because the scheduler will not re-order batch buffers within a
>> context - they are considered to be mutually dependent.
>>
>> v2: New patch in series.
>>
>> v3: Renamed/retyped timeline structure fields after review comments by
>> Tvrtko Ursulin.
>>
>> Added context information to the timeline's name string for better
>> identification in debugfs output.
>>
>> v5: Line wrapping and other white space fixes to keep style checker
>> happy.
>>
>> v7: Updated to newer nightly (lots of ring -> engine renaming).
>>
>> v8: Moved to earlier in patch series so no longer needs to remove the
>> quick hack timeline that was being added before.
>>
>> For: VIZ-5190
>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>> ---
>>   drivers/gpu/drm/i915/i915_drv.h         | 14 ++++++++++++
>>   drivers/gpu/drm/i915/i915_gem.c         | 40 +++++++++++++++++++++++++++++++++
>>   drivers/gpu/drm/i915/i915_gem_context.c | 14 ++++++++++++
>>   drivers/gpu/drm/i915/intel_lrc.c        |  8 +++++++
>>   4 files changed, 76 insertions(+)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> index d5496ab..520480b 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -818,6 +818,19 @@ struct i915_ctx_hang_stats {
>>   	bool banned;
>>   };
>>   
>> +struct i915_fence_timeline {
>> +	char        name[32];
>> +	unsigned    fence_context;
>> +	unsigned    next;
>> +
>> +	struct intel_context *ctx;
>> +	struct intel_engine_cs *engine;
>> +};
>> +
>> +int i915_create_fence_timeline(struct drm_device *dev,
>> +			       struct intel_context *ctx,
>> +			       struct intel_engine_cs *ring);
>> +
>>   /* This must match up with the value previously used for execbuf2.rsvd1. */
>>   #define DEFAULT_CONTEXT_HANDLE 0
>>   
>> @@ -869,6 +882,7 @@ struct intel_context {
>>   		u64 lrc_desc;
>>   		uint32_t *lrc_reg_state;
>>   		bool initialised;
>> +		struct i915_fence_timeline fence_timeline;
> This is still fundamentally wrong. This i915_fence_timeline is both the
> fence context and timeline. Our timelines are singular per vm, with a
> fence context under each timeline per engine.
As I said last time you mentioned 'per vm', please explain. Where does a 
virtual machine come in? Or is that not what you mean? What is the 
purpose of having multiple fence contexts within a single timeline? It 
will stop external users attempting to combine fences but it won't stop 
them from actually being out of order. The timeline needs to be per 
hardware context, not simply per engine, because the whole point is that 
requests should not be re-ordered once they have been allocated a point 
on a timeline. However, the purpose of the scheduler (which is what this 
series is preparation for) is to re-order requests between contexts. 
Thus the timeline must be per context to prevent requests ending up with 
out of order completions.
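
Concretely, the rule the fence API assumes - sketched here in terms of 
the fence->context and fence->seqno fields, not as new driver code - is 
that two points are only comparable when they share a context, and that 
within a context seqno order must be completion order:

	static bool point_supersedes(struct fence *later,
				     struct fence *earlier)
	{
		if (later->context != earlier->context)
			return false;	/* different timelines, keep both */

		return fence_is_later(later, earlier);
	}

A scheduler that re-orders requests across contexts can only keep that 
invariant if each context has its own timeline.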


> Please complete the legacy/execlists unification first so that we can
> have timelines work correctly for the whole driver.
No. That is a much larger task that people are working towards. However, 
we urgently need a scheduler for specific customers to use right now. 
That means we need to get something working right now, not in some 
random number of years' time. If an intermediate step is less than 
perfect but still functional, that is better than not having anything 
at all.

> -Chris
>


* Re: [PATCH v8 4/6] drm/i915: Interrupt driven fences
  2016-05-13  7:27   ` Chris Wilson
@ 2016-05-13  9:19     ` John Harrison
  2016-05-19  9:47       ` Chris Wilson
  0 siblings, 1 reply; 20+ messages in thread
From: John Harrison @ 2016-05-13  9:19 UTC (permalink / raw)
  To: Chris Wilson, Intel-GFX

On 13/05/2016 08:27, Chris Wilson wrote:
> On Thu, May 12, 2016 at 10:06:34PM +0100, John.C.Harrison@Intel.com wrote:
>> +void i915_gem_request_notify(struct intel_engine_cs *engine, bool fence_locked)
>> +{
>> +	struct drm_i915_gem_request *req, *req_next;
>> +	unsigned long flags;
>>   	u32 seqno;
>>   
>> -	seqno = req->engine->get_seqno(req->engine);
>> +	if (list_empty(&engine->fence_signal_list))
>> +		return;
>> +
>> +	if (!fence_locked)
>> +		spin_lock_irqsave(&engine->fence_lock, flags);
>> +
>> +	if (engine->irq_seqno_barrier)
>> +		engine->irq_seqno_barrier(engine);
>> +	seqno = engine->get_seqno(engine);
>> +
>> +	list_for_each_entry_safe(req, req_next, &engine->fence_signal_list, signal_link) {
> NO, NO, NO. As we said the very first time, you cannot do this from an
> irq handler.
>
> The current code is already bad enough; this makes it a large constant
> + N times worse. Please do look at how to do signal driven fences in O(1)
> that I posted many months ago and several times since.

If you have a better solution available then please post it in a form 
that can be merged and get it reviewed and accepted.

> -Chris
>


* Re: [PATCH v8 0/6] Convert requests to use struct fence
  2016-05-13  7:44 ` [PATCH v8 0/6] Convert requests to use struct fence Chris Wilson
@ 2016-05-13 10:11   ` John Harrison
  0 siblings, 0 replies; 20+ messages in thread
From: John Harrison @ 2016-05-13 10:11 UTC (permalink / raw)
  To: Chris Wilson, Intel-GFX

On 13/05/2016 08:44, Chris Wilson wrote:
> On Thu, May 12, 2016 at 10:06:30PM +0100, John.C.Harrison@Intel.com wrote:
>> Added support for possible RCU usage of fence object (Review comments
>> by Maarten Lankhorst).
> You mean like
> https://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=tasklet&id=001f3468762ca49137b5658f2aa58507dbfa4a05
> ?

Except that that patch is 113/190 and is a lot more complex, with 
different aims. The purpose of this small patch series is to add struct 
fence support to the driver to allow external fence usage and 
scheduling, not to add RCU support to i915 for faster request list 
processing.

> -Chris
>


* Re: [PATCH v8 1/6] drm/i915: Add per context timelines to fence object
  2016-05-13  9:16     ` John Harrison
@ 2016-05-18 12:22       ` Maarten Lankhorst
  2016-05-18 12:49         ` John Harrison
  0 siblings, 1 reply; 20+ messages in thread
From: Maarten Lankhorst @ 2016-05-18 12:22 UTC (permalink / raw)
  To: John Harrison, Chris Wilson, Intel-GFX

On 13-05-16 at 11:16, John Harrison wrote:
> On 13/05/2016 08:39, Chris Wilson wrote:
>> On Thu, May 12, 2016 at 10:06:31PM +0100, John.C.Harrison@Intel.com wrote:
>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>
>>> The fence object used inside the request structure requires a sequence
>>> number. Although this is not used by the i915 driver itself, it could
>>> potentially be used by non-i915 code if the fence is passed outside of
>>> the driver. This is the intention as it allows external kernel drivers
>>> and user applications to wait on batch buffer completion
>>> asynchronously via the dma-buf fence API.
>>>
>>> To ensure that such external users are not confused by strange things
>>> happening with the seqno, this patch adds in a per context timeline
>>> that can provide a guaranteed in-order seqno value for the fence. This
>>> is safe because the scheduler will not re-order batch buffers within a
>>> context - they are considered to be mutually dependent.
>>>
>>> v2: New patch in series.
>>>
>>> v3: Renamed/retyped timeline structure fields after review comments by
>>> Tvrtko Ursulin.
>>>
>>> Added context information to the timeline's name string for better
>>> identification in debugfs output.
>>>
>>> v5: Line wrapping and other white space fixes to keep style checker
>>> happy.
>>>
>>> v7: Updated to newer nightly (lots of ring -> engine renaming).
>>>
>>> v8: Moved to earlier in patch series so no longer needs to remove the
>>> quick hack timeline that was being added before.
>>>
>>> For: VIZ-5190
>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/i915_drv.h         | 14 ++++++++++++
>>>   drivers/gpu/drm/i915/i915_gem.c         | 40 +++++++++++++++++++++++++++++++++
>>>   drivers/gpu/drm/i915/i915_gem_context.c | 14 ++++++++++++
>>>   drivers/gpu/drm/i915/intel_lrc.c        |  8 +++++++
>>>   4 files changed, 76 insertions(+)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>>> index d5496ab..520480b 100644
>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>> @@ -818,6 +818,19 @@ struct i915_ctx_hang_stats {
>>>       bool banned;
>>>   };
>>>   +struct i915_fence_timeline {
>>> +    char        name[32];
>>> +    unsigned    fence_context;
>>> +    unsigned    next;
>>> +
>>> +    struct intel_context *ctx;
>>> +    struct intel_engine_cs *engine;
>>> +};
>>> +
>>> +int i915_create_fence_timeline(struct drm_device *dev,
>>> +                   struct intel_context *ctx,
>>> +                   struct intel_engine_cs *ring);
>>> +
>>>   /* This must match up with the value previously used for execbuf2.rsvd1. */
>>>   #define DEFAULT_CONTEXT_HANDLE 0
>>>   @@ -869,6 +882,7 @@ struct intel_context {
>>>           u64 lrc_desc;
>>>           uint32_t *lrc_reg_state;
>>>           bool initialised;
>>> +        struct i915_fence_timeline fence_timeline;
>> This is still fundamentally wrong. This i915_fence_timeline is both the
>> fence context and timeline. Our timelines are singular per vm, with a
>> fence context under each timeline per engine.
> As I said last time you mentioned 'per vm', please explain. Where does a virtual machine come in? Or is that not what you mean? What is the purpose of having multiple fence contexts within a single timeline? It will stop external users attempting to combine fences but it won't stop them from actually being out of order. The timeline needs to be per hardware context, not simply per engine, because the whole point is that requests should not be re-ordered once they have been allocated a point on a timeline. However, the purpose of the scheduler (which is what this series is preparation for) is to re-order requests between contexts. Thus the timeline must be per context to prevent requests ending up with out of order completions.

Virtual memory, not machine. Guessing i915_address_space :-)

Fence contexts may not have to be bound to a single timeline; let's say you have 2 engines, running commands for Com1 and Com2.

ENG1: Com1.1
ENG2: Com2.1 Com1.2 Com1.3

then there's no harm if you end up with multiple fence contexts since they're independently executing.

>
>> Please complete the legacy/execlists unification first so that we can
>> have timelines work correctly for the whole driver.
> No. That is a much larger task that people are working towards. However, we urgently need a scheduler for specific customers to use right now. That means we need to get something working right now, not in some random number of years' time. If an intermediate step is less than perfect but still functional, that is better than not having anything at all.
>
>> -Chris
>>
>



* Re: [PATCH v8 1/6] drm/i915: Add per context timelines to fence object
  2016-05-18 12:22       ` Maarten Lankhorst
@ 2016-05-18 12:49         ` John Harrison
  2016-05-18 13:20           ` Maarten Lankhorst
  0 siblings, 1 reply; 20+ messages in thread
From: John Harrison @ 2016-05-18 12:49 UTC (permalink / raw)
  To: Maarten Lankhorst, Chris Wilson, Intel-GFX

On 18/05/2016 13:22, Maarten Lankhorst wrote:
> On 13-05-16 at 11:16, John Harrison wrote:
>> On 13/05/2016 08:39, Chris Wilson wrote:
>>> On Thu, May 12, 2016 at 10:06:31PM +0100, John.C.Harrison@Intel.com wrote:
>>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>>
>>>> The fence object used inside the request structure requires a sequence
>>>> number. Although this is not used by the i915 driver itself, it could
>>>> potentially be used by non-i915 code if the fence is passed outside of
>>>> the driver. This is the intention as it allows external kernel drivers
>>>> and user applications to wait on batch buffer completion
>>>> asynchronously via the dma-buf fence API.
>>>>
>>>> To ensure that such external users are not confused by strange things
>>>> happening with the seqno, this patch adds in a per context timeline
>>>> that can provide a guaranteed in-order seqno value for the fence. This
>>>> is safe because the scheduler will not re-order batch buffers within a
>>>> context - they are considered to be mutually dependent.
>>>>
>>>> v2: New patch in series.
>>>>
>>>> v3: Renamed/retyped timeline structure fields after review comments by
>>>> Tvrtko Ursulin.
>>>>
>>>> Added context information to the timeline's name string for better
>>>> identification in debugfs output.
>>>>
>>>> v5: Line wrapping and other white space fixes to keep style checker
>>>> happy.
>>>>
>>>> v7: Updated to newer nightly (lots of ring -> engine renaming).
>>>>
>>>> v8: Moved to earlier in patch series so no longer needs to remove the
>>>> quick hack timeline that was being added before.
>>>>
>>>> For: VIZ-5190
>>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>>>> ---
>>>>    drivers/gpu/drm/i915/i915_drv.h         | 14 ++++++++++++
>>>>    drivers/gpu/drm/i915/i915_gem.c         | 40 +++++++++++++++++++++++++++++++++
>>>>    drivers/gpu/drm/i915/i915_gem_context.c | 14 ++++++++++++
>>>>    drivers/gpu/drm/i915/intel_lrc.c        |  8 +++++++
>>>>    4 files changed, 76 insertions(+)
>>>>
>>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>>>> index d5496ab..520480b 100644
>>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>>> @@ -818,6 +818,19 @@ struct i915_ctx_hang_stats {
>>>>        bool banned;
>>>>    };
>>>>    +struct i915_fence_timeline {
>>>> +    char        name[32];
>>>> +    unsigned    fence_context;
>>>> +    unsigned    next;
>>>> +
>>>> +    struct intel_context *ctx;
>>>> +    struct intel_engine_cs *engine;
>>>> +};
>>>> +
>>>> +int i915_create_fence_timeline(struct drm_device *dev,
>>>> +                   struct intel_context *ctx,
>>>> +                   struct intel_engine_cs *ring);
>>>> +
>>>>    /* This must match up with the value previously used for execbuf2.rsvd1. */
>>>>    #define DEFAULT_CONTEXT_HANDLE 0
>>>>    @@ -869,6 +882,7 @@ struct intel_context {
>>>>            u64 lrc_desc;
>>>>            uint32_t *lrc_reg_state;
>>>>            bool initialised;
>>>> +        struct i915_fence_timeline fence_timeline;
>>> This is still fundamentally wrong. This i915_fence_timeline is both the
>>> fence context and timeline. Our timelines are singular per vm, with a
>>> fence context under each timeline per engine.
>> As I said last time you mentioned 'per vm', please explain. Where does a virtual machine come in? Or is that not what you mean? What is the purpose of having multiple fence contexts within a single timeline? It will stop external users attempting to combine fences but it won't stop them from actually being out of order. The timeline needs to be per hardware context, not simply per engine, because the whole point is that requests should not be re-ordered once they have been allocated a point on a timeline. However, the purpose of the scheduler (which is what this series is preparation for) is to re-order requests between contexts. Thus the timeline must be per context to prevent requests ending up with out of order completions.
> Virtual memory, not machine. Guessing i915_address_space :-)
How do you have a timeline per 'virtual memory'? Guessing is pointless 
when trying to get patches written and accepted. A design is not 
complete when it is based on guesses about what the important 
stakeholders mean.

> Fence contexts may not have to be bound to a single timeline,
Fence contexts absolutely have to be bound to a single timeline. The 
context is used to say whether any two fences can be combined by 
numerical comparison of their timeline points. If they share a context 
but have points generated from different timelines then any such 
comparison is broken. The fence context is the timeline!

>   let's say you have 2 engines, running commands for Com1 and Com2.
>
> ENG1: Com1.1
> ENG2: Com2.1 Com1.2 Com1.3
>
> then there's no harm if you end up with multiple fence contexts since they're independently executing.
Not sure what you mean here. If Com1.1 and Com1.2 are sharing fence 
context '1' and have timeline values '1' and '2' then the system is 
broken. The fence framework could merge the two by simply discarding 
Com1.1, because '2' is greater than '1' (timeline value) and '1' is 
equal to '1' (context). Thus a user would end up waiting on only 
Com1.2. However, being executed concurrently on different engines means 
that Com1.2 might actually complete before Com1.1. Hence the user is 
broken, as it now thinks everything is finished while Engine1 is still 
running its work.
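
Put in code form (a sketch of the failure mode only, along the lines of 
what merge logic that collapses points sharing a context does):

	/* If Com1.1 and Com1.2 wrongly share fence context '1', a
	 * merge keeps only the later point and silently drops the
	 * earlier one, even though they run on different engines.
	 */
	static struct fence *merged_point(struct fence *a, struct fence *b)
	{
		if (a->context != b->context)
			return NULL;	/* not comparable, keep both */

		return fence_is_later(a, b) ? a : b;
	}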

>>> Please complete the legacy/execlists unification first so that we can
>>> have timelines work correctly for the whole driver.
>> No. That is a much larger task that people are working towards. However, we urgently need a scheduler for specific customers to use right now. That means we need to get something working right now, not in some random number of years' time. If an intermediate step is less than perfect but still functional, that is better than not having anything at all.
>>
>>> -Chris
>>>


* Re: [PATCH v8 1/6] drm/i915: Add per context timelines to fence object
  2016-05-18 12:49         ` John Harrison
@ 2016-05-18 13:20           ` Maarten Lankhorst
  0 siblings, 0 replies; 20+ messages in thread
From: Maarten Lankhorst @ 2016-05-18 13:20 UTC (permalink / raw)
  To: John Harrison, Chris Wilson, Intel-GFX

On 18-05-16 at 14:49, John Harrison wrote:
> On 18/05/2016 13:22, Maarten Lankhorst wrote:
>> On 13-05-16 at 11:16, John Harrison wrote:
>>> On 13/05/2016 08:39, Chris Wilson wrote:
>>>> On Thu, May 12, 2016 at 10:06:31PM +0100, John.C.Harrison@Intel.com wrote:
>>>>> From: John Harrison <John.C.Harrison@Intel.com>
>>>>>
>>>>> The fence object used inside the request structure requires a sequence
>>>>> number. Although this is not used by the i915 driver itself, it could
>>>>> potentially be used by non-i915 code if the fence is passed outside of
>>>>> the driver. This is the intention as it allows external kernel drivers
>>>>> and user applications to wait on batch buffer completion
>>>>> asynchronously via the dma-buf fence API.
>>>>>
>>>>> To ensure that such external users are not confused by strange things
>>>>> happening with the seqno, this patch adds in a per context timeline
>>>>> that can provide a guaranteed in-order seqno value for the fence. This
>>>>> is safe because the scheduler will not re-order batch buffers within a
>>>>> context - they are considered to be mutually dependent.
>>>>>
>>>>> v2: New patch in series.
>>>>>
>>>>> v3: Renamed/retyped timeline structure fields after review comments by
>>>>> Tvrtko Ursulin.
>>>>>
>>>>> Added context information to the timeline's name string for better
>>>>> identification in debugfs output.
>>>>>
>>>>> v5: Line wrapping and other white space fixes to keep style checker
>>>>> happy.
>>>>>
>>>>> v7: Updated to newer nightly (lots of ring -> engine renaming).
>>>>>
>>>>> v8: Moved to earlier in patch series so no longer needs to remove the
>>>>> quick hack timeline that was being added before.
>>>>>
>>>>> For: VIZ-5190
>>>>> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
>>>>> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
>>>>> ---
>>>>>    drivers/gpu/drm/i915/i915_drv.h         | 14 ++++++++++++
>>>>>    drivers/gpu/drm/i915/i915_gem.c         | 40 +++++++++++++++++++++++++++++++++
>>>>>    drivers/gpu/drm/i915/i915_gem_context.c | 14 ++++++++++++
>>>>>    drivers/gpu/drm/i915/intel_lrc.c        |  8 +++++++
>>>>>    4 files changed, 76 insertions(+)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>>>>> index d5496ab..520480b 100644
>>>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>>>> @@ -818,6 +818,19 @@ struct i915_ctx_hang_stats {
>>>>>        bool banned;
>>>>>    };
>>>>>    +struct i915_fence_timeline {
>>>>> +    char        name[32];
>>>>> +    unsigned    fence_context;
>>>>> +    unsigned    next;
>>>>> +
>>>>> +    struct intel_context *ctx;
>>>>> +    struct intel_engine_cs *engine;
>>>>> +};
>>>>> +
>>>>> +int i915_create_fence_timeline(struct drm_device *dev,
>>>>> +                   struct intel_context *ctx,
>>>>> +                   struct intel_engine_cs *ring);
>>>>> +
>>>>>    /* This must match up with the value previously used for execbuf2.rsvd1. */
>>>>>    #define DEFAULT_CONTEXT_HANDLE 0
>>>>>    @@ -869,6 +882,7 @@ struct intel_context {
>>>>>            u64 lrc_desc;
>>>>>            uint32_t *lrc_reg_state;
>>>>>            bool initialised;
>>>>> +        struct i915_fence_timeline fence_timeline;
>>>> This is still fundamentally wrong. This i915_fence_timeline is both the
>>>> fence context and timeline. Our timelines are singular per vm, with a
>>>> fence context under each timeline per engine.
>>> As I said last time you mentioned 'per vm', please explain. Where does a virtual machine come in? Or is that not what you mean? What is the purpose of having multiple fence contexts within a single timeline? It will stop external users attempting to combine fences but it won't stop them from actually being out of order. The timeline needs to be per hardware context, not simply per engine, because the whole point is that requests should not be re-ordered once they have been allocated a point on a timeline. However, the purpose of the scheduler (which is what this series is preparation for) is to re-order requests between contexts. Thus the timeline must be per context to prevent requests ending up with out of order completions.
>> Virtual memory, not machine. Guessing i915_address_space :-)
> How do you have a timeline per 'virtual memory'? Guessing is pointless when trying to get patches written and accepted. A design is not complete when it is based on guesses about what the important stakeholders mean.
>
>> Fence contexts may not have to be bound to a single timeline,
> Fence contexts absolutely have to be bound to a single timeline. The context is used to say whether any two fences can be combined by numerical comparison of their timeline points. If they share a context but have points generated from different timelines then any such comparison is broken. The fence context is the timeline!
>
>>   let's say you have 2 engines, running commands for Com1 and Com2.
>>
>> ENG1: Com1.1
>> ENG2: Com2.1 Com1.2 Com1.3
>>
>> then there's no harm if you end up with multiple fence contexts since they're independently executing.
> Not sure what you mean here. If Com1.1 and Com1.2 are sharing fence context '1' and have timeline values '1' and '2' then the system is broken. The fence framework could merge the two by simply discarding Com1.1, because '2' is greater than '1' (timeline value) and '1' is equal to '1' (context). Thus a user would end up waiting on only Com1.2. However, being executed concurrently on different engines means that Com1.2 might actually complete before Com1.1. Hence the user is broken, as it now thinks everything is finished while Engine1 is still running its work. 
Indeed. Not completely sure what the original comment meant here.

* Re: [PATCH v8 4/6] drm/i915: Interrupt driven fences
  2016-05-12 21:06 ` [PATCH v8 4/6] drm/i915: Interrupt driven fences John.C.Harrison
  2016-05-13  7:27   ` Chris Wilson
@ 2016-05-19  9:24   ` Maarten Lankhorst
  1 sibling, 0 replies; 20+ messages in thread
From: Maarten Lankhorst @ 2016-05-19  9:24 UTC (permalink / raw)
  To: John.C.Harrison, Intel-GFX

On 12-05-16 at 23:06, John.C.Harrison@Intel.com wrote:
> From: John Harrison <John.C.Harrison@Intel.com>
>
> The intended usage model for struct fence is that the signalled status
> should be set on demand rather than polled. That is, there should not
> be a need for a 'signaled' function to be called everytime the status
> is queried. Instead, 'something' should be done to enable a signal
> callback from the hardware which will update the state directly. In
> the case of requests, this is the seqno update interrupt. The idea is
> that this callback will only be enabled on demand when something
> actually tries to wait on the fence.
>
> This change removes the polling test and replaces it with the callback
> scheme. Each fence is added to a 'please poke me' list at the start of
> i915_add_request(). The interrupt handler then scans through the 'poke
> me' list when a new seqno pops out and signals any matching
> fence/request. The fence is then removed from the list so the entire
> request stack does not need to be scanned every time. Note that the
> fence is added to the list before the commands to generate the seqno
> interrupt are added to the ring. Thus the sequence is guaranteed to be
> race free if the interrupt is already enabled.
>
> Note that the interrupt is only enabled on demand (i.e. when
> __wait_request() is called). Thus there is still a potential race when
> enabling the interrupt as the request may already have completed.
> However, this is simply solved by calling the interrupt processing
> code immediately after enabling the interrupt and thereby checking for
> already completed requests.
>
> Lastly, the ring clean up code has the possibility to cancel
> outstanding requests (e.g. because TDR has reset the ring). These
> requests will never get signalled and so must be removed from the
> signal list manually. This is done by setting a 'cancelled' flag and
> then calling the regular notify/retire code path rather than
> attempting to duplicate the list manipulation and clean-up code in
> multiple places. This also avoids any race condition where the
> cancellation request might occur after/during the completion interrupt
> actually arriving.
>
> v2: Updated to take advantage of the request unreference no longer
> requiring the mutex lock.
>
> v3: Move the signal list processing around to prevent unsubmitted
> requests being added to the list. This was occurring on Android
> because the native sync implementation calls the
> fence->enable_signaling API immediately on fence creation.
>
> Updated after review comments by Tvrtko Ursulin. Renamed list nodes to
> 'link' instead of 'list'. Added support for returning an error code on
> a cancelled fence. Update list processing to be more efficient/safer
> with respect to spinlocks.
>
> v5: Made i915_gem_request_submit a static as it is only ever called
> from one place.
>
> Fixed up the low latency wait optimisation. The time delay between the
> seqno value being written to memory and the driver's ISR running can be
> significant, at least for the wait request micro-benchmark. This can
> be greatly improved by explicitly checking for seqno updates in the
> pre-wait busy poll loop. Also added some documentation comments to the
> busy poll code.
>
> Fixed up support for the faking of lost interrupts
> (test_irq_rings/missed_irq_rings). That is, there is an IGT test that
> tells the driver to lose interrupts deliberately and then checks that
> everything still works as expected (albeit much slower).
>
> Updates from review comments: use non IRQ-save spinlocking, early exit
> on WARN and improved comments (Tvrtko Ursulin).
>
> v6: Updated to newer nightly and resolved conflicts around the
> wait_request busy spin optimisation. Also fixed a race condition
> between this early exit path and the regular completion path.
>
> v7: Updated to newer nightly - lots of ring -> engine renaming plus an
> interface change on get_seqno(). Also added a list_empty() check
> before acquiring spinlocks and doing list processing.
>
> v8: Updated to newer nightly - changes to request clean up code mean
> none of the deferred free mess is needed any more.
>
> For: VIZ-5190
> Signed-off-by: John Harrison <John.C.Harrison@Intel.com>
> Cc: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Maarten Lankhorst <maarten.lankhorst@linux.intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h         |   7 +
>  drivers/gpu/drm/i915/i915_gem.c         | 235 +++++++++++++++++++++++++++++---
>  drivers/gpu/drm/i915/i915_irq.c         |   2 +
>  drivers/gpu/drm/i915/intel_lrc.c        |   1 +
>  drivers/gpu/drm/i915/intel_ringbuffer.c |   1 +
>  drivers/gpu/drm/i915/intel_ringbuffer.h |   1 +
>  6 files changed, 225 insertions(+), 22 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 433d99a..3adac59 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2278,6 +2278,10 @@ struct drm_i915_gem_request {
>  	 */
>  	struct fence fence;
>  	struct rcu_head rcu_head;
> +	struct list_head signal_link;
> +	bool cancelled;
> +	bool irq_enabled;
> +	bool signal_requested;
>  
>  	/** On Which ring this request was generated */
>  	struct drm_i915_private *i915;
> @@ -2379,6 +2383,9 @@ struct drm_i915_gem_request {
>  struct drm_i915_gem_request * __must_check
>  i915_gem_request_alloc(struct intel_engine_cs *engine,
>  		       struct intel_context *ctx);
> +void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req,
> +				       bool fence_locked);
> +void i915_gem_request_notify(struct intel_engine_cs *ring, bool fence_locked);
>  
>  static inline bool i915_gem_request_completed(struct drm_i915_gem_request *req)
>  {
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 958edcb..6669508 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -39,6 +39,8 @@
>  #include <linux/pci.h>
>  #include <linux/dma-buf.h>
>  
> +static void i915_gem_request_submit(struct drm_i915_gem_request *req);
> +
>  static void i915_gem_object_flush_gtt_write_domain(struct drm_i915_gem_object *obj);
>  static void i915_gem_object_flush_cpu_write_domain(struct drm_i915_gem_object *obj);
>  static void
> @@ -1238,9 +1240,8 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>  	struct intel_engine_cs *engine = i915_gem_request_get_engine(req);
>  	struct drm_device *dev = engine->dev;
>  	struct drm_i915_private *dev_priv = dev->dev_private;
> -	const bool irq_test_in_progress =
> -		ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) & intel_engine_flag(engine);
>  	int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
> +	uint32_t seqno;
>  	DEFINE_WAIT(wait);
>  	unsigned long timeout_expire;
>  	s64 before = 0; /* Only to silence a compiler warning. */
> @@ -1248,9 +1249,6 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>  
>  	WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled");
>  
> -	if (list_empty(&req->list))
> -		return 0;
> -
>  	if (i915_gem_request_completed(req))
>  		return 0;
>  
> @@ -1276,15 +1274,17 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>  	trace_i915_gem_request_wait_begin(req);
>  
>  	/* Optimistic spin for the next jiffie before touching IRQs */
> -	ret = __i915_spin_request(req, state);
> -	if (ret == 0)
> -		goto out;
> -
> -	if (!irq_test_in_progress && WARN_ON(!engine->irq_get(engine))) {
> -		ret = -ENODEV;
> -		goto out;
> +	if (req->seqno) {
> +		ret = __i915_spin_request(req, state);
> +		if (ret == 0)
> +			goto out;
>  	}
>  
> +	/*
> +	 * Enable interrupt completion of the request.
> +	 */
> +	fence_enable_sw_signaling(&req->fence);
> +
>  	for (;;) {
>  		struct timer_list timer;
>  
> @@ -1307,6 +1307,21 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>  			break;
>  		}
>  
> +		if (req->seqno) {
> +			/*
> +			 * There is quite a lot of latency in the user interrupt
> +			 * path. So do an explicit seqno check and potentially
> +			 * remove all that delay.
> +			 */
> +			if (req->engine->irq_seqno_barrier)
> +				req->engine->irq_seqno_barrier(req->engine);
> +			seqno = engine->get_seqno(engine);
> +			if (i915_seqno_passed(seqno, req->seqno)) {
> +				ret = 0;
> +				break;
> +			}
> +		}
> +
>  		if (signal_pending_state(state, current)) {
>  			ret = -ERESTARTSYS;
>  			break;
> @@ -1333,14 +1348,32 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
>  			destroy_timer_on_stack(&timer);
>  		}
>  	}
> -	if (!irq_test_in_progress)
> -		engine->irq_put(engine);
>  
>  	finish_wait(&engine->irq_queue, &wait);
>  
>  out:
>  	trace_i915_gem_request_wait_end(req);
>  
> +	if ((ret == 0) && (req->seqno)) {
> +		if (req->engine->irq_seqno_barrier)
> +			req->engine->irq_seqno_barrier(req->engine);
> +		seqno = engine->get_seqno(engine);
> +		if (i915_seqno_passed(seqno, req->seqno) &&
> +		    !i915_gem_request_completed(req)) {
> +			/*
> +			 * Make sure the request is marked as completed before
> +			 * returning. NB: Need to acquire the spinlock around
> +			 * the whole call to avoid a race condition with the
> +			 * interrupt handler running concurrently, which could
> +			 * cause this invocation to exit early even though the
> +			 * request has not actually been fully processed yet.
> +			 */
> +			spin_lock_irq(&req->engine->fence_lock);
> +			i915_gem_request_notify(req->engine, true);
> +			spin_unlock_irq(&req->engine->fence_lock);
> +		}
> +	}
> +
>  	if (timeout) {
>  		s64 tres = *timeout - (ktime_get_raw_ns() - before);
>  
> @@ -1406,6 +1439,11 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
>  {
>  	trace_i915_gem_request_retire(request);
>  
> +	if (request->irq_enabled) {
> +		request->engine->irq_put(request->engine);
> +		request->irq_enabled = false;
> +	}
> +
>  	/* We know the GPU must have read the request to have
>  	 * sent us the seqno + interrupt, so use the position
>  	 * of tail of the request to update the last known position
> @@ -1419,6 +1457,22 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
>  	list_del_init(&request->list);
>  	i915_gem_request_remove_from_client(request);
>  
> +	/*
> +	 * In case the request is still in the signal pending list,
> +	 * e.g. due to being cancelled by TDR, preemption, etc.
> +	 */
> +	if (!list_empty(&request->signal_link)) {
> +		/*
> +		 * The request must be marked as cancelled and the underlying
> +		 * fence as failed. NB: There is no explicit fence fail API,
> +		 * there is only a manual poke and signal.
> +		 */
> +		request->cancelled = true;
> +		/* How to propagate to any associated sync_fence??? */
> +		request->fence.status = -EIO;
> +		fence_signal_locked(&request->fence);
> +	}
> +
>  	if (request->previous_context) {
>  		if (i915.enable_execlists)
>  			intel_lr_context_unpin(request->previous_context,
> @@ -2650,6 +2704,12 @@ void __i915_add_request(struct drm_i915_gem_request *request,
>  	 */
>  	request->postfix = intel_ring_get_tail(ringbuf);
>  
> +	/*
> +	 * Add the fence to the pending list before emitting the commands to
> +	 * generate a seqno notification interrupt.
> +	 */
> +	i915_gem_request_submit(request);
> +
>  	if (i915.enable_execlists)
>  		ret = engine->emit_request(request);
>  	else {
> @@ -2735,25 +2795,141 @@ static void i915_gem_request_free(struct fence *req_fence)
>  	struct drm_i915_gem_request *req;
>  
>  	req = container_of(req_fence, typeof(*req), fence);
> +
> +	WARN_ON(req->irq_enabled);
> +
>  	call_rcu(&req->rcu_head, i915_gem_request_free_rcu);
>  }
>  
> -static bool i915_gem_request_enable_signaling(struct fence *req_fence)
> +/*
> + * The request is about to be submitted to the hardware so add the fence to
> + * the list of signalable fences.
> + *
> + * NB: This does not necessarily enable interrupts yet. That only occurs on
> + * demand when the request is actually waited on. However, adding it to the
> + * list early ensures that there is no race condition where the interrupt
> + * could pop out prematurely and thus be completely lost. The race is merely
> + * that the interrupt must be manually checked for after being enabled.
> + */
> +static void i915_gem_request_submit(struct drm_i915_gem_request *req)
>  {
> -	/* Interrupt driven fences are not implemented yet.*/
> -	WARN(true, "This should not be called!");
> -	return true;
> +	/*
> +	 * Always enable signal processing for the request's fence object
> +	 * before that request is submitted to the hardware. Thus there is no
> +	 * race condition whereby the interrupt could pop out before the
> +	 * request has been added to the signal list. Hence no need to check
> +	 * for completion, undo the list add and return false.
> +	 */
> +	i915_gem_request_reference(req);
> +	spin_lock_irq(&req->engine->fence_lock);
> +	WARN_ON(!list_empty(&req->signal_link));
> +	list_add_tail(&req->signal_link, &req->engine->fence_signal_list);
> +	spin_unlock_irq(&req->engine->fence_lock);
> +
> +	/*
> +	 * NB: Interrupts are only enabled on demand. Thus there is still a
> +	 * race where the request could complete before the interrupt has
> +	 * been enabled. Thus care must be taken at that point.
> +	 */
> +
> +	/* Have interrupts already been requested? */
> +	if (req->signal_requested)
> +		i915_gem_request_enable_interrupt(req, false);
> +}
> +
> +/*
> + * The request is being actively waited on, so enable interrupt based
> + * completion signalling.
> + */
> +void i915_gem_request_enable_interrupt(struct drm_i915_gem_request *req,
> +				       bool fence_locked)
> +{
> +	struct drm_i915_private *dev_priv = to_i915(req->engine->dev);
> +	const bool irq_test_in_progress =
> +		ACCESS_ONCE(dev_priv->gpu_error.test_irq_rings) &
> +						intel_engine_flag(req->engine);
> +
> +	if (req->irq_enabled)
> +		return;
> +
> +	if (irq_test_in_progress)
> +		return;
> +
> +	if (!WARN_ON(!req->engine->irq_get(req->engine)))
> +		req->irq_enabled = true;
> +
> +	/*
> +	 * Because the interrupt is only enabled on demand, there is a race
> +	 * where the interrupt can fire before anyone is looking for it. So
> +	 * do an explicit check for missed interrupts.
> +	 */
> +	i915_gem_request_notify(req->engine, fence_locked);
>  }
>  
> -static bool i915_gem_request_is_completed(struct fence *req_fence)
> +static bool i915_gem_request_enable_signaling(struct fence *req_fence)
>  {
>  	struct drm_i915_gem_request *req = container_of(req_fence,
>  						 typeof(*req), fence);
> +
> +	/*
> +	 * There is no need to actually enable interrupt based processing
> +	 * until the request has been submitted to the hardware, at which
> +	 * point 'i915_gem_request_submit()' is called. So signalling is only
> +	 * really enabled in there; here, just set a flag to say that
> +	 * interrupts are wanted when the request is eventually submitted.
> +	 * On the other hand, if the request has already been submitted then
> +	 * interrupts do need to be enabled now.
> +	 */
> +
> +	req->signal_requested = true;
> +
> +	if (!list_empty(&req->signal_link))
> +		i915_gem_request_enable_interrupt(req, true);
> +
> +	return true;
> +}
> +
> +void i915_gem_request_notify(struct intel_engine_cs *engine, bool fence_locked)
> +{
> +	struct drm_i915_gem_request *req, *req_next;
> +	unsigned long flags;
>  	u32 seqno;
>  
> -	seqno = req->engine->get_seqno(req->engine);
> +	if (list_empty(&engine->fence_signal_list))
> +		return;
> +
If you don't hold the lock then you should probably use list_empty_careful.

~Maarten
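
For reference, list_empty_careful() (from linux/list.h) checks both the
->next and ->prev pointers, so unlike plain list_empty() it tolerates a
concurrent list_del_init() on another CPU. A minimal sketch of the
suggested lockless early-out, reusing the fence_signal_list and
fence_lock fields from the patch above:

	void i915_gem_request_notify(struct intel_engine_cs *engine,
				     bool fence_locked)
	{
		unsigned long flags;

		/* Unlocked peek: safe against a concurrent list_del_init(). */
		if (list_empty_careful(&engine->fence_signal_list))
			return;

		if (!fence_locked)
			spin_lock_irqsave(&engine->fence_lock, flags);

		/* ... walk and signal completed requests as before ... */

		if (!fence_locked)
			spin_unlock_irqrestore(&engine->fence_lock, flags);
	}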

* Re: [PATCH v8 4/6] drm/i915: Interrupt driven fences
  2016-05-13  9:19     ` John Harrison
@ 2016-05-19  9:47       ` Chris Wilson
  0 siblings, 0 replies; 20+ messages in thread
From: Chris Wilson @ 2016-05-19  9:47 UTC (permalink / raw)
  To: John Harrison; +Cc: Intel-GFX

On Fri, May 13, 2016 at 10:19:11AM +0100, John Harrison wrote:
> On 13/05/2016 08:27, Chris Wilson wrote:
> >On Thu, May 12, 2016 at 10:06:34PM +0100, John.C.Harrison@Intel.com wrote:
> >>+void i915_gem_request_notify(struct intel_engine_cs *engine, bool fence_locked)
> >>+{
> >>+	struct drm_i915_gem_request *req, *req_next;
> >>+	unsigned long flags;
> >>  	u32 seqno;
> >>-	seqno = req->engine->get_seqno(req->engine);
> >>+	if (list_empty(&engine->fence_signal_list))
> >>+		return;
> >>+
> >>+	if (!fence_locked)
> >>+		spin_lock_irqsave(&engine->fence_lock, flags);
> >>+
> >>+	if (engine->irq_seqno_barrier)
> >>+		engine->irq_seqno_barrier(engine);
> >>+	seqno = engine->get_seqno(engine);
> >>+
> >>+	list_for_each_entry_safe(req, req_next, &engine->fence_signal_list, signal_link) {
> >NO, NO, NO. As we said the very first time, you cannot do this from an
> >irq handler.
> >
> >The current code is already bad enough; this makes it a large constant
> >+ N times worse. Please do look at the approach to doing signal driven
> >fences in O(1) that I posted many months ago and several times since.
> 
> If you have a better solution available then please post it in a
> form that can be merged and get it reviewed and accepted.

Recently reposted as
https://lists.freedesktop.org/archives/intel-gfx/2016-May/095040.html

It is currently blocking the overhaul of request + vma tracking needed
to fix our leaks, as well as the regression fixes that depend upon them.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
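
To make the complexity point concrete: requests join fence_signal_list
in submission order and an engine completes seqnos in that same order,
so the walk in i915_gem_request_notify() can at least stop at the first
incomplete request, doing work proportional to the number of newly
completed requests rather than to every request outstanding. Below is a
hypothetical sketch under that ordering assumption, run with
engine->fence_lock held and assuming that is also the lock the fences
were initialised with. It only illustrates the bounded-walk idea; it is
not the approach of the series Chris links above, which moves
signalling off the interrupt path entirely:

	list_for_each_entry_safe(req, req_next,
				 &engine->fence_signal_list, signal_link) {
		/* In-order list: the first incomplete request ends the scan. */
		if (!i915_seqno_passed(seqno, req->seqno))
			break;

		fence_signal_locked(&req->fence);
		list_del_init(&req->signal_link);
		i915_gem_request_unreference(req);
	}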
