* [PATCH v3 0/7] Misc cleanups and locking fixes
@ 2016-01-11 14:08 Tvrtko Ursulin
  2016-01-11 14:08 ` [PATCH 1/7] drm/i915/bdw+: Replace list_del+list_add_tail with list_move_tail Tvrtko Ursulin
                   ` (6 more replies)
  0 siblings, 7 replies; 48+ messages in thread
From: Tvrtko Ursulin @ 2016-01-11 14:08 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Mostly fixes for callers which call the GEM API without holding the
struct_mutex, plus a few small code cleanups.

Tvrtko Ursulin (7):
  drm/i915/bdw+: Replace list_del+list_add_tail with list_move_tail
  drm/i915: Do not call API requiring struct_mutex where it is not
    available
  drm/i915: Cache ringbuffer GTT address
  drm/i915: Cache LRC state page in the context
  drm/i915: Don't need a timer to wake us up
  drm/i915: Only grab timestamps when needed
  drm/i915: GEM operations need to be done under the big lock

 drivers/gpu/drm/i915/i915_debugfs.c     |  15 ++--
 drivers/gpu/drm/i915/i915_drv.h         |   3 +
 drivers/gpu/drm/i915/i915_gem.c         |  40 ++++------
 drivers/gpu/drm/i915/i915_gem_stolen.c  |   3 +
 drivers/gpu/drm/i915/intel_display.c    |   8 +-
 drivers/gpu/drm/i915/intel_lrc.c        | 125 ++++++++++++++++----------------
 drivers/gpu/drm/i915/intel_lrc.h        |   4 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c |   3 +
 drivers/gpu/drm/i915/intel_ringbuffer.h |   3 +
 9 files changed, 103 insertions(+), 101 deletions(-)

-- 
1.9.1


* [PATCH 1/7] drm/i915/bdw+: Replace list_del+list_add_tail with list_move_tail
  2016-01-11 14:08 [PATCH v3 0/7] Misc cleanups and locking fixes Tvrtko Ursulin
@ 2016-01-11 14:08 ` Tvrtko Ursulin
  2016-01-11 14:08 ` [PATCH 2/7] drm/i915: Do not call API requiring struct_mutex where it is not available Tvrtko Ursulin
                   ` (5 subsequent siblings)
  6 siblings, 0 replies; 48+ messages in thread
From: Tvrtko Ursulin @ 2016-01-11 14:08 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

Same effect for slightly less source code and resulting binary.
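
For reference, list_move_tail() in include/linux/list.h is roughly the
two open-coded calls folded into one helper, so the change is purely a
simplification (paraphrased from the generic list code, not part of this
diff):

	static inline void list_move_tail(struct list_head *list,
					  struct list_head *head)
	{
		__list_del_entry(list);    /* unlink from the current list */
		list_add_tail(list, head); /* append at the new tail */
	}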

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/intel_lrc.c | 15 ++++++---------
 1 file changed, 6 insertions(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index f7fac5f3b5ce..ab344e0b878c 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -431,9 +431,8 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring)
 			/* Same ctx: ignore first request, as second request
 			 * will update tail past first request's workload */
 			cursor->elsp_submitted = req0->elsp_submitted;
-			list_del(&req0->execlist_link);
-			list_add_tail(&req0->execlist_link,
-				&ring->execlist_retired_req_list);
+			list_move_tail(&req0->execlist_link,
+				       &ring->execlist_retired_req_list);
 			req0 = cursor;
 		} else {
 			req1 = cursor;
@@ -485,9 +484,8 @@ static bool execlists_check_remove_request(struct intel_engine_cs *ring,
 			     "Never submitted head request\n");
 
 			if (--head_req->elsp_submitted <= 0) {
-				list_del(&head_req->execlist_link);
-				list_add_tail(&head_req->execlist_link,
-					&ring->execlist_retired_req_list);
+				list_move_tail(&head_req->execlist_link,
+					       &ring->execlist_retired_req_list);
 				return true;
 			}
 		}
@@ -608,9 +606,8 @@ static int execlists_context_queue(struct drm_i915_gem_request *request)
 		if (request->ctx == tail_req->ctx) {
 			WARN(tail_req->elsp_submitted != 0,
 				"More than 2 already-submitted reqs queued\n");
-			list_del(&tail_req->execlist_link);
-			list_add_tail(&tail_req->execlist_link,
-				&ring->execlist_retired_req_list);
+			list_move_tail(&tail_req->execlist_link,
+				       &ring->execlist_retired_req_list);
 		}
 	}
 
-- 
1.9.1


* [PATCH 2/7] drm/i915: Do not call API requiring struct_mutex where it is not available
  2016-01-11 14:08 [PATCH v3 0/7] Misc cleanups and locking fixes Tvrtko Ursulin
  2016-01-11 14:08 ` [PATCH 1/7] drm/i915/bdw+: Replace list_del+list_add_tail with list_move_tail Tvrtko Ursulin
@ 2016-01-11 14:08 ` Tvrtko Ursulin
  2016-01-11 14:35   ` Chris Wilson
  2016-01-12 11:41   ` [PATCH v2 " Tvrtko Ursulin
  2016-01-11 14:08 ` [PATCH 3/7] drm/i915: Cache ringbuffer GTT address Tvrtko Ursulin
                   ` (4 subsequent siblings)
  6 siblings, 2 replies; 48+ messages in thread
From: Tvrtko Ursulin @ 2016-01-11 14:08 UTC (permalink / raw)
  To: Intel-gfx; +Cc: Daniel Vetter

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

The LRC code was calling GEM APIs such as i915_gem_obj_ggtt_offset
from places where the struct_mutex cannot be grabbed (irq handlers).

To avoid that, this patch caches the interesting offsets and
values in the engine and context structures.

Some usages are also removed where they are not needed, such as a
few asserts which are either impossible or have already been
checked during engine initialization.

A side benefit is that interrupt handlers and command submission
stop evaluating invariant conditionals, such as which Gen we are
running on, on every interrupt and every command submitted.

This patch deals with logical ring context id and descriptors
while subsequent patches will deal with the remaining issues.
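
In other words the descriptor and LRCA offset become write-once values,
set while pinning the context under struct_mutex, with the interrupt path
only reading them back. A minimal sketch of the idea (condensed from the
diff below, not the exact code):

	/* at context pin time, under struct_mutex */
	ctx->engine[ring->id].lrc_offset = i915_gem_obj_ggtt_offset(ctx_obj) +
					   LRC_PPHWSP_PN * PAGE_SIZE;
	ctx->engine[ring->id].lrc_desc = ring->ctx_desc_template |
					 ctx->engine[ring->id].lrc_offset |
					 ((u64)intel_execlists_ctx_id(ctx, ring) <<
					  GEN8_CTX_ID_SHIFT);

	/* interrupt / submission path: plain reads of invariant values */
	u32 id = ctx->engine[ring->id].lrc_offset >> 12;
	u64 desc = ctx->engine[ring->id].lrc_desc;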

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_debugfs.c     | 15 +++---
 drivers/gpu/drm/i915/i915_drv.h         |  2 +
 drivers/gpu/drm/i915/intel_lrc.c        | 90 ++++++++++++++++-----------------
 drivers/gpu/drm/i915/intel_lrc.h        |  4 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h |  2 +
 5 files changed, 56 insertions(+), 57 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index e3377abc0d4d..0b3550f05026 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1994,12 +1994,13 @@ static int i915_context_status(struct seq_file *m, void *unused)
 }
 
 static void i915_dump_lrc_obj(struct seq_file *m,
-			      struct intel_engine_cs *ring,
-			      struct drm_i915_gem_object *ctx_obj)
+			      struct intel_context *ctx,
+			      struct intel_engine_cs *ring)
 {
 	struct page *page;
 	uint32_t *reg_state;
 	int j;
+	struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
 	unsigned long ggtt_offset = 0;
 
 	if (ctx_obj == NULL) {
@@ -2009,7 +2010,7 @@ static void i915_dump_lrc_obj(struct seq_file *m,
 	}
 
 	seq_printf(m, "CONTEXT: %s %u\n", ring->name,
-		   intel_execlists_ctx_id(ctx_obj));
+		   intel_execlists_ctx_id(ctx, ring));
 
 	if (!i915_gem_obj_ggtt_bound(ctx_obj))
 		seq_puts(m, "\tNot bound in GGTT\n");
@@ -2058,8 +2059,7 @@ static int i915_dump_lrc(struct seq_file *m, void *unused)
 	list_for_each_entry(ctx, &dev_priv->context_list, link) {
 		for_each_ring(ring, dev_priv, i) {
 			if (ring->default_context != ctx)
-				i915_dump_lrc_obj(m, ring,
-						  ctx->engine[i].state);
+				i915_dump_lrc_obj(m, ctx, ring);
 		}
 	}
 
@@ -2133,11 +2133,8 @@ static int i915_execlists(struct seq_file *m, void *data)
 
 		seq_printf(m, "\t%d requests in queue\n", count);
 		if (head_req) {
-			struct drm_i915_gem_object *ctx_obj;
-
-			ctx_obj = head_req->ctx->engine[ring_id].state;
 			seq_printf(m, "\tHead request id: %u\n",
-				   intel_execlists_ctx_id(ctx_obj));
+				   intel_execlists_ctx_id(head_req->ctx, ring));
 			seq_printf(m, "\tHead request tail: %u\n",
 				   head_req->tail);
 		}
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 747d2d84a18c..4b0932634cc9 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -883,6 +883,8 @@ struct intel_context {
 		struct drm_i915_gem_object *state;
 		struct intel_ringbuffer *ringbuf;
 		int pin_count;
+		u32 lrc_offset;
+		u64 lrc_desc;
 	} engine[I915_NUM_RINGS];
 
 	struct list_head link;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index ab344e0b878c..066b3b8cae35 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -265,7 +265,8 @@ int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists
 
 /**
  * intel_execlists_ctx_id() - get the Execlists Context ID
- * @ctx_obj: Logical Ring Context backing object.
+ * @ctx: Context to get the ID for
+ * @ring: Engine to get the ID for
  *
  * Do not confuse with ctx->id! Unfortunately we have a name overload
  * here: the old context ID we pass to userspace as a handler so that
@@ -275,14 +276,12 @@ int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists
  *
  * Return: 20-bits globally unique context ID.
  */
-u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj)
+u32 intel_execlists_ctx_id(struct intel_context *ctx,
+			   struct intel_engine_cs *ring)
 {
-	u32 lrca = i915_gem_obj_ggtt_offset(ctx_obj) +
-			LRC_PPHWSP_PN * PAGE_SIZE;
-
 	/* LRCA is required to be 4K aligned so the more significant 20 bits
 	 * are globally unique */
-	return lrca >> 12;
+	return ctx->engine[ring->id].lrc_offset >> 12;
 }
 
 static bool disable_lite_restore_wa(struct intel_engine_cs *ring)
@@ -297,31 +296,7 @@ static bool disable_lite_restore_wa(struct intel_engine_cs *ring)
 uint64_t intel_lr_context_descriptor(struct intel_context *ctx,
 				     struct intel_engine_cs *ring)
 {
-	struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
-	uint64_t desc;
-	uint64_t lrca = i915_gem_obj_ggtt_offset(ctx_obj) +
-			LRC_PPHWSP_PN * PAGE_SIZE;
-
-	WARN_ON(lrca & 0xFFFFFFFF00000FFFULL);
-
-	desc = GEN8_CTX_VALID;
-	desc |= GEN8_CTX_ADDRESSING_MODE(dev) << GEN8_CTX_ADDRESSING_MODE_SHIFT;
-	if (IS_GEN8(ctx_obj->base.dev))
-		desc |= GEN8_CTX_L3LLC_COHERENT;
-	desc |= GEN8_CTX_PRIVILEGE;
-	desc |= lrca;
-	desc |= (u64)intel_execlists_ctx_id(ctx_obj) << GEN8_CTX_ID_SHIFT;
-
-	/* TODO: WaDisableLiteRestore when we start using semaphore
-	 * signalling between Command Streamers */
-	/* desc |= GEN8_CTX_FORCE_RESTORE; */
-
-	/* WaEnableForceRestoreInCtxtDescForVCS:skl */
-	/* WaEnableForceRestoreInCtxtDescForVCS:bxt */
-	if (disable_lite_restore_wa(ring))
-		desc |= GEN8_CTX_FORCE_RESTORE;
-
-	return desc;
+	return ctx->engine[ring->id].lrc_desc;
 }
 
 static void execlists_elsp_write(struct drm_i915_gem_request *rq0,
@@ -369,8 +344,6 @@ static int execlists_update_context(struct drm_i915_gem_request *rq)
 	uint32_t *reg_state;
 
 	BUG_ON(!ctx_obj);
-	WARN_ON(!i915_gem_obj_is_pinned(ctx_obj));
-	WARN_ON(!i915_gem_obj_is_pinned(rb_obj));
 
 	page = i915_gem_object_get_dirty_page(ctx_obj, LRC_STATE_PN);
 	reg_state = kmap_atomic(page);
@@ -477,9 +450,7 @@ static bool execlists_check_remove_request(struct intel_engine_cs *ring,
 					    execlist_link);
 
 	if (head_req != NULL) {
-		struct drm_i915_gem_object *ctx_obj =
-				head_req->ctx->engine[ring->id].state;
-		if (intel_execlists_ctx_id(ctx_obj) == request_id) {
+		if (intel_execlists_ctx_id(head_req->ctx, ring) == request_id) {
 			WARN(head_req->elsp_submitted == 0,
 			     "Never submitted head request\n");
 
@@ -556,7 +527,7 @@ void intel_lrc_irq_handler(struct intel_engine_cs *ring)
 		}
 	}
 
-	if (disable_lite_restore_wa(ring)) {
+	if (ring->disable_lite_restore_wa) {
 		/* Prevent a ctx to preempt itself */
 		if ((status & GEN8_CTX_STATUS_ACTIVE_IDLE) &&
 		    (submit_contexts != 0))
@@ -1039,14 +1010,16 @@ int logical_ring_flush_all_caches(struct drm_i915_gem_request *req)
 }
 
 static int intel_lr_context_do_pin(struct intel_engine_cs *ring,
-		struct drm_i915_gem_object *ctx_obj,
-		struct intel_ringbuffer *ringbuf)
+				   struct intel_context *ctx)
 {
 	struct drm_device *dev = ring->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	int ret = 0;
+	struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
+	struct intel_ringbuffer *ringbuf = ctx->engine[ring->id].ringbuf;
+	int ret;
 
 	WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
+
 	ret = i915_gem_obj_ggtt_pin(ctx_obj, GEN8_LR_CONTEXT_ALIGN,
 			PIN_OFFSET_BIAS | GUC_WOPCM_TOP);
 	if (ret)
@@ -1056,6 +1029,11 @@ static int intel_lr_context_do_pin(struct intel_engine_cs *ring,
 	if (ret)
 		goto unpin_ctx_obj;
 
+	ctx->engine[ring->id].lrc_offset = i915_gem_obj_ggtt_offset(ctx_obj) +
+					   LRC_PPHWSP_PN * PAGE_SIZE;
+	ctx->engine[ring->id].lrc_desc = ring->ctx_desc_template |
+					 ctx->engine[ring->id].lrc_offset |
+					 ((u64)intel_execlists_ctx_id(ctx, ring) << GEN8_CTX_ID_SHIFT);
 	ctx_obj->dirty = true;
 
 	/* Invalidate GuC TLB. */
@@ -1074,11 +1052,9 @@ static int intel_lr_context_pin(struct drm_i915_gem_request *rq)
 {
 	int ret = 0;
 	struct intel_engine_cs *ring = rq->ring;
-	struct drm_i915_gem_object *ctx_obj = rq->ctx->engine[ring->id].state;
-	struct intel_ringbuffer *ringbuf = rq->ringbuf;
 
 	if (rq->ctx->engine[ring->id].pin_count++ == 0) {
-		ret = intel_lr_context_do_pin(ring, ctx_obj, ringbuf);
+		ret = intel_lr_context_do_pin(ring, rq->ctx);
 		if (ret)
 			goto reset_pin_count;
 	}
@@ -1100,6 +1076,8 @@ void intel_lr_context_unpin(struct drm_i915_gem_request *rq)
 		if (--rq->ctx->engine[ring->id].pin_count == 0) {
 			intel_unpin_ringbuffer_obj(ringbuf);
 			i915_gem_object_ggtt_unpin(ctx_obj);
+			rq->ctx->engine[ring->id].lrc_offset = 0;
+			rq->ctx->engine[ring->id].lrc_desc = 0;
 		}
 	}
 }
@@ -1938,6 +1916,9 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 		ring->status_page.obj = NULL;
 	}
 
+	ring->disable_lite_restore_wa = false;
+	ring->ctx_desc_template = 0;
+
 	lrc_destroy_wa_ctx_obj(ring);
 	ring->dev = NULL;
 }
@@ -1960,6 +1941,24 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 	INIT_LIST_HEAD(&ring->execlist_retired_req_list);
 	spin_lock_init(&ring->execlist_lock);
 
+	ring->disable_lite_restore_wa = disable_lite_restore_wa(ring);
+
+	ring->ctx_desc_template = GEN8_CTX_VALID;
+	ring->ctx_desc_template |= GEN8_CTX_ADDRESSING_MODE(dev) <<
+				   GEN8_CTX_ADDRESSING_MODE_SHIFT;
+	if (IS_GEN8(dev))
+		ring->ctx_desc_template |= GEN8_CTX_L3LLC_COHERENT;
+	ring->ctx_desc_template |= GEN8_CTX_PRIVILEGE;
+
+	/* TODO: WaDisableLiteRestore when we start using semaphore
+	 * signalling between Command Streamers */
+	/* ring->ctx_desc_template |= GEN8_CTX_FORCE_RESTORE; */
+
+	/* WaEnableForceRestoreInCtxtDescForVCS:skl */
+	/* WaEnableForceRestoreInCtxtDescForVCS:bxt */
+	if (ring->disable_lite_restore_wa)
+		ring->ctx_desc_template |= GEN8_CTX_FORCE_RESTORE;
+
 	ret = i915_cmd_parser_init_ring(ring);
 	if (ret)
 		goto error;
@@ -1969,10 +1968,7 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 		goto error;
 
 	/* As this is the default context, always pin it */
-	ret = intel_lr_context_do_pin(
-			ring,
-			ring->default_context->engine[ring->id].state,
-			ring->default_context->engine[ring->id].ringbuf);
+	ret = intel_lr_context_do_pin(ring, ring->default_context);
 	if (ret) {
 		DRM_ERROR(
 			"Failed to pin and map ringbuffer %s: %d\n",
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index de41ad6cd63d..49af638f6213 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -107,13 +107,15 @@ void intel_lr_context_reset(struct drm_device *dev,
 uint64_t intel_lr_context_descriptor(struct intel_context *ctx,
 				     struct intel_engine_cs *ring);
 
+u32 intel_execlists_ctx_id(struct intel_context *ctx,
+			   struct intel_engine_cs *ring);
+
 /* Execlists */
 int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists);
 struct i915_execbuffer_params;
 int intel_execlists_submission(struct i915_execbuffer_params *params,
 			       struct drm_i915_gem_execbuffer2 *args,
 			       struct list_head *vmas);
-u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj);
 
 void intel_lrc_irq_handler(struct intel_engine_cs *ring);
 void intel_execlists_retire_requests(struct intel_engine_cs *ring);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 7349d9258191..85ce2272f92c 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -269,6 +269,8 @@ struct  intel_engine_cs {
 	struct list_head execlist_queue;
 	struct list_head execlist_retired_req_list;
 	u8 next_context_status_buffer;
+	bool disable_lite_restore_wa;
+	u32 ctx_desc_template;
 	u32             irq_keep_mask; /* bitmask for interrupts that should not be masked */
 	int		(*emit_request)(struct drm_i915_gem_request *request);
 	int		(*emit_flush)(struct drm_i915_gem_request *request,
-- 
1.9.1


* [PATCH 3/7] drm/i915: Cache ringbuffer GTT address
  2016-01-11 14:08 [PATCH v3 0/7] Misc cleanups and locking fixes Tvrtko Ursulin
  2016-01-11 14:08 ` [PATCH 1/7] drm/i915/bdw+: Replace list_del+list_add_tail with list_move_tail Tvrtko Ursulin
  2016-01-11 14:08 ` [PATCH 2/7] drm/i915: Do not call API requiring struct_mutex where it is not available Tvrtko Ursulin
@ 2016-01-11 14:08 ` Tvrtko Ursulin
  2016-01-11 14:30   ` Chris Wilson
  2016-01-11 14:31   ` Chris Wilson
  2016-01-11 14:08 ` [PATCH 4/7] drm/i915: Cache LRC state page in the context Tvrtko Ursulin
                   ` (3 subsequent siblings)
  6 siblings, 2 replies; 48+ messages in thread
From: Tvrtko Ursulin @ 2016-01-11 14:08 UTC (permalink / raw)
  To: Intel-gfx; +Cc: Daniel Vetter

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

The purpose is to avoid calling i915_gem_obj_ggtt_offset from the
interrupt context without the big lock held.

v2: Renamed gtt_start to gtt_offset. (Daniel Vetter)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/intel_lrc.c        | 3 +--
 drivers/gpu/drm/i915/intel_ringbuffer.c | 3 +++
 drivers/gpu/drm/i915/intel_ringbuffer.h | 1 +
 3 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 066b3b8cae35..321674d5e1f9 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -339,7 +339,6 @@ static int execlists_update_context(struct drm_i915_gem_request *rq)
 	struct intel_engine_cs *ring = rq->ring;
 	struct i915_hw_ppgtt *ppgtt = rq->ctx->ppgtt;
 	struct drm_i915_gem_object *ctx_obj = rq->ctx->engine[ring->id].state;
-	struct drm_i915_gem_object *rb_obj = rq->ringbuf->obj;
 	struct page *page;
 	uint32_t *reg_state;
 
@@ -349,7 +348,7 @@ static int execlists_update_context(struct drm_i915_gem_request *rq)
 	reg_state = kmap_atomic(page);
 
 	reg_state[CTX_RING_TAIL+1] = rq->tail;
-	reg_state[CTX_RING_BUFFER_START+1] = i915_gem_obj_ggtt_offset(rb_obj);
+	reg_state[CTX_RING_BUFFER_START+1] = rq->ringbuf->gtt_offset;
 
 	if (ppgtt && !USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
 		/* True 32b PPGTT with dynamic page allocation: update PDP
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index 339701d7a9a5..48b64fd28644 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1988,6 +1988,7 @@ void intel_unpin_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
 	else
 		iounmap(ringbuf->virtual_start);
 	ringbuf->virtual_start = NULL;
+	ringbuf->gtt_offset = 0;
 	i915_gem_object_ggtt_unpin(ringbuf->obj);
 }
 
@@ -2054,6 +2055,8 @@ int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
 		}
 	}
 
+	ringbuf->gtt_offset = i915_gem_obj_ggtt_offset(obj);
+
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 85ce2272f92c..4fa46eb62607 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -99,6 +99,7 @@ struct intel_ring_hangcheck {
 struct intel_ringbuffer {
 	struct drm_i915_gem_object *obj;
 	void __iomem *virtual_start;
+	u64 gtt_offset;
 
 	struct intel_engine_cs *ring;
 	struct list_head link;
-- 
1.9.1


* [PATCH 4/7] drm/i915: Cache LRC state page in the context
  2016-01-11 14:08 [PATCH v3 0/7] Misc cleanups and locking fixes Tvrtko Ursulin
                   ` (2 preceding siblings ...)
  2016-01-11 14:08 ` [PATCH 3/7] drm/i915: Cache ringbuffer GTT address Tvrtko Ursulin
@ 2016-01-11 14:08 ` Tvrtko Ursulin
  2016-01-11 14:29   ` Chris Wilson
  2016-01-11 14:08 ` [PATCH 5/7] drm/i915: Don't need a timer to wake us up Tvrtko Ursulin
                   ` (2 subsequent siblings)
  6 siblings, 1 reply; 48+ messages in thread
From: Tvrtko Ursulin @ 2016-01-11 14:08 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

LRC lifetime is well defined, so we can cache the page pointing to
the object backing store in the context. This avoids walking over
the object SG page list from the interrupt context without the big
lock held.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h  |  1 +
 drivers/gpu/drm/i915/intel_lrc.c | 17 +++++++++++------
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 4b0932634cc9..377cdcb10ad5 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -885,6 +885,7 @@ struct intel_context {
 		int pin_count;
 		u32 lrc_offset;
 		u64 lrc_desc;
+		struct page *lrc_state_page;
 	} engine[I915_NUM_RINGS];
 
 	struct list_head link;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 321674d5e1f9..02ef9413700d 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -338,14 +338,10 @@ static int execlists_update_context(struct drm_i915_gem_request *rq)
 {
 	struct intel_engine_cs *ring = rq->ring;
 	struct i915_hw_ppgtt *ppgtt = rq->ctx->ppgtt;
-	struct drm_i915_gem_object *ctx_obj = rq->ctx->engine[ring->id].state;
-	struct page *page;
 	uint32_t *reg_state;
 
-	BUG_ON(!ctx_obj);
-
-	page = i915_gem_object_get_dirty_page(ctx_obj, LRC_STATE_PN);
-	reg_state = kmap_atomic(page);
+	BUG_ON(!rq->ctx->engine[ring->id].lrc_state_page);
+	reg_state = kmap_atomic(rq->ctx->engine[ring->id].lrc_state_page);
 
 	reg_state[CTX_RING_TAIL+1] = rq->tail;
 	reg_state[CTX_RING_BUFFER_START+1] = rq->ringbuf->gtt_offset;
@@ -1015,6 +1011,7 @@ static int intel_lr_context_do_pin(struct intel_engine_cs *ring,
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
 	struct intel_ringbuffer *ringbuf = ctx->engine[ring->id].ringbuf;
+	struct page *lrc_state_page;
 	int ret;
 
 	WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
@@ -1024,6 +1021,12 @@ static int intel_lr_context_do_pin(struct intel_engine_cs *ring,
 	if (ret)
 		return ret;
 
+	lrc_state_page = i915_gem_object_get_dirty_page(ctx_obj, LRC_STATE_PN);
+	if (WARN_ON(!lrc_state_page)) {
+		ret = -ENODEV;
+		goto unpin_ctx_obj;
+	}
+
 	ret = intel_pin_and_map_ringbuffer_obj(ring->dev, ringbuf);
 	if (ret)
 		goto unpin_ctx_obj;
@@ -1033,6 +1036,7 @@ static int intel_lr_context_do_pin(struct intel_engine_cs *ring,
 	ctx->engine[ring->id].lrc_desc = ring->ctx_desc_template |
 					 ctx->engine[ring->id].lrc_offset |
 					 ((u64)intel_execlists_ctx_id(ctx, ring) << GEN8_CTX_ID_SHIFT);
+	ctx->engine[ring->id].lrc_state_page = lrc_state_page;
 	ctx_obj->dirty = true;
 
 	/* Invalidate GuC TLB. */
@@ -1077,6 +1081,7 @@ void intel_lr_context_unpin(struct drm_i915_gem_request *rq)
 			i915_gem_object_ggtt_unpin(ctx_obj);
 			rq->ctx->engine[ring->id].lrc_offset = 0;
 			rq->ctx->engine[ring->id].lrc_desc = 0;
+			rq->ctx->engine[ring->id].lrc_state_page = NULL;
 		}
 	}
 }
-- 
1.9.1


* [PATCH 5/7] drm/i915: Don't need a timer to wake us up
  2016-01-11 14:08 [PATCH v3 0/7] Misc cleanups and locking fixes Tvrtko Ursulin
                   ` (3 preceding siblings ...)
  2016-01-11 14:08 ` [PATCH 4/7] drm/i915: Cache LRC state page in the context Tvrtko Ursulin
@ 2016-01-11 14:08 ` Tvrtko Ursulin
  2016-01-11 14:33   ` Chris Wilson
  2016-01-11 14:08 ` [PATCH 6/7] drm/i915: Only grab timestamps when needed Tvrtko Ursulin
  2016-01-11 14:08 ` [PATCH 7/7] drm/i915: GEM operations need to be done under the big lock Tvrtko Ursulin
  6 siblings, 1 reply; 48+ messages in thread
From: Tvrtko Ursulin @ 2016-01-11 14:08 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

We can avoid open-coding the schedule wake-up since

   commit 9cff8adeaa34b5d2802f03f89803da57856b3b72
   Author: NeilBrown <neilb@suse.de>
   Date:   Fri Feb 13 15:49:17 2015 +1100

       sched: Prevent recursion in io_schedule()

exported the io_schedule_timeout function which we can now use
to simplify the code in __i915_wait_request.

v2: New commit message. (Daniel Vetter)
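
For context, after that commit io_schedule() itself is just a thin
wrapper, so a single io_schedule_timeout() call can serve both the
bounded and the unbounded wait (roughly, from include/linux/sched.h of
that era; shown for illustration only, not part of this patch):

	static inline void io_schedule(void)
	{
		io_schedule_timeout(MAX_SCHEDULE_TIMEOUT);
	}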

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem.c | 28 ++++++++--------------------
 1 file changed, 8 insertions(+), 20 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 6c60e04fc09c..de98dc41fb9f 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1135,11 +1135,6 @@ i915_gem_check_wedge(struct i915_gpu_error *error,
 	return 0;
 }
 
-static void fake_irq(unsigned long data)
-{
-	wake_up_process((struct task_struct *)data);
-}
-
 static bool missed_irq(struct drm_i915_private *dev_priv,
 		       struct intel_engine_cs *ring)
 {
@@ -1291,7 +1286,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	}
 
 	for (;;) {
-		struct timer_list timer;
+		long sched_timeout;
 
 		prepare_to_wait(&ring->irq_queue, &wait, state);
 
@@ -1321,21 +1316,14 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 			break;
 		}
 
-		timer.function = NULL;
-		if (timeout || missed_irq(dev_priv, ring)) {
-			unsigned long expire;
-
-			setup_timer_on_stack(&timer, fake_irq, (unsigned long)current);
-			expire = missed_irq(dev_priv, ring) ? jiffies + 1 : timeout_expire;
-			mod_timer(&timer, expire);
-		}
-
-		io_schedule();
+		if (timeout)
+			sched_timeout = timeout_expire - jiffies;
+		else if (missed_irq(dev_priv, ring))
+			sched_timeout = 1;
+		else
+			sched_timeout = MAX_SCHEDULE_TIMEOUT;
 
-		if (timer.function) {
-			del_singleshot_timer_sync(&timer);
-			destroy_timer_on_stack(&timer);
-		}
+		io_schedule_timeout(sched_timeout);
 	}
 	if (!irq_test_in_progress)
 		ring->irq_put(ring);
-- 
1.9.1


* [PATCH 6/7] drm/i915: Only grab timestamps when needed
  2016-01-11 14:08 [PATCH v3 0/7] Misc cleanups and locking fixes Tvrtko Ursulin
                   ` (4 preceding siblings ...)
  2016-01-11 14:08 ` [PATCH 5/7] drm/i915: Don't need a timer to wake us up Tvrtko Ursulin
@ 2016-01-11 14:08 ` Tvrtko Ursulin
  2016-01-11 14:36   ` Chris Wilson
  2016-01-11 14:08 ` [PATCH 7/7] drm/i915: GEM operations need to be done under the big lock Tvrtko Ursulin
  6 siblings, 1 reply; 48+ messages in thread
From: Tvrtko Ursulin @ 2016-01-11 14:08 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

There is no need to call ktime_get_raw_ns twice per unlimited wait,
and we can also eliminate a local variable.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index de98dc41fb9f..c4f69579eb7a 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1246,7 +1246,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
 	DEFINE_WAIT(wait);
 	unsigned long timeout_expire;
-	s64 before, now;
+	s64 before = 0;
 	int ret;
 
 	WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled");
@@ -1266,14 +1266,17 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 			return -ETIME;
 
 		timeout_expire = jiffies + nsecs_to_jiffies_timeout(*timeout);
+
+		/*
+		 * Record current time in case interrupted by signal, or wedged.
+		 */
+		before = ktime_get_raw_ns();
 	}
 
 	if (INTEL_INFO(dev_priv)->gen >= 6)
 		gen6_rps_boost(dev_priv, rps, req->emitted_jiffies);
 
-	/* Record current time in case interrupted by signal, or wedged */
 	trace_i915_gem_request_wait_begin(req);
-	before = ktime_get_raw_ns();
 
 	/* Optimistic spin for the next jiffie before touching IRQs */
 	ret = __i915_spin_request(req, state);
@@ -1331,11 +1334,10 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	finish_wait(&ring->irq_queue, &wait);
 
 out:
-	now = ktime_get_raw_ns();
 	trace_i915_gem_request_wait_end(req);
 
 	if (timeout) {
-		s64 tres = *timeout - (now - before);
+		s64 tres = *timeout - (ktime_get_raw_ns() - before);
 
 		*timeout = tres < 0 ? 0 : tres;
 
-- 
1.9.1


* [PATCH 7/7] drm/i915: GEM operations need to be done under the big lock
  2016-01-11 14:08 [PATCH v3 0/7] Misc cleanups and locking fixes Tvrtko Ursulin
                   ` (5 preceding siblings ...)
  2016-01-11 14:08 ` [PATCH 6/7] drm/i915: Only grab timestamps when needed Tvrtko Ursulin
@ 2016-01-11 14:08 ` Tvrtko Ursulin
  2016-01-11 14:38   ` Chris Wilson
  6 siblings, 1 reply; 48+ messages in thread
From: Tvrtko Ursulin @ 2016-01-11 14:08 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

VMA creation and GEM list management need the big lock.

v2:

The mutex unlock ended up on the wrong path somehow. (0-day, Julia Lawall)

Not to mention that drm_gem_object_unreference was already there in
the existing code with no mutex held.

v3:

Some callers of i915_gem_object_create_stolen_for_preallocated
already hold the lock, so move the mutex into the other caller
as well.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_gem_stolen.c | 3 +++
 drivers/gpu/drm/i915/intel_display.c   | 8 ++++++--
 2 files changed, 9 insertions(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem_stolen.c b/drivers/gpu/drm/i915/i915_gem_stolen.c
index c384dc9c8a63..a97a5e762c0f 100644
--- a/drivers/gpu/drm/i915/i915_gem_stolen.c
+++ b/drivers/gpu/drm/i915/i915_gem_stolen.c
@@ -635,6 +635,9 @@ i915_gem_object_create_stolen_for_preallocated(struct drm_device *dev,
 	if (!drm_mm_initialized(&dev_priv->mm.stolen))
 		return NULL;
 
+	if (WARN_ON_ONCE(!mutex_is_locked(&dev->struct_mutex)))
+		return NULL;
+
 	DRM_DEBUG_KMS("creating preallocated stolen object: stolen_offset=%x, gtt_offset=%x, size=%x\n",
 			stolen_offset, gtt_offset, size);
 
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index b4cf9ce16155..073d6c99406b 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -2534,12 +2534,16 @@ intel_alloc_initial_plane_obj(struct intel_crtc *crtc,
 	if (size_aligned * 2 > dev_priv->gtt.stolen_usable_size)
 		return false;
 
+	mutex_lock(&dev->struct_mutex);
+
 	obj = i915_gem_object_create_stolen_for_preallocated(dev,
 							     base_aligned,
 							     base_aligned,
 							     size_aligned);
-	if (!obj)
+	if (!obj) {
+		mutex_unlock(&dev->struct_mutex);
 		return false;
+	}
 
 	obj->tiling_mode = plane_config->tiling;
 	if (obj->tiling_mode == I915_TILING_X)
@@ -2552,12 +2556,12 @@ intel_alloc_initial_plane_obj(struct intel_crtc *crtc,
 	mode_cmd.modifier[0] = fb->modifier[0];
 	mode_cmd.flags = DRM_MODE_FB_MODIFIERS;
 
-	mutex_lock(&dev->struct_mutex);
 	if (intel_framebuffer_init(dev, to_intel_framebuffer(fb),
 				   &mode_cmd, obj)) {
 		DRM_DEBUG_KMS("intel fb init failed\n");
 		goto out_unref_obj;
 	}
+
 	mutex_unlock(&dev->struct_mutex);
 
 	DRM_DEBUG_KMS("initial plane fb obj %p\n", obj);
-- 
1.9.1


* Re: [PATCH 4/7] drm/i915: Cache LRC state page in the context
  2016-01-11 14:08 ` [PATCH 4/7] drm/i915: Cache LRC state page in the context Tvrtko Ursulin
@ 2016-01-11 14:29   ` Chris Wilson
  2016-01-11 15:07     ` Tvrtko Ursulin
                       ` (2 more replies)
  0 siblings, 3 replies; 48+ messages in thread
From: Chris Wilson @ 2016-01-11 14:29 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-gfx

On Mon, Jan 11, 2016 at 02:08:38PM +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> LRC lifetime is well defined so we can cache the page pointing
> to the object backing store in the context in order to avoid
> walking over the object SG page list from the interrupt context
> without the big lock held.

You can keep kmap around for the pinned lifetime of the context.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
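
What that suggestion might look like, as a rough sketch only
(lrc_reg_state is a hypothetical field here and this is not what the
posted patch does; the idea is to map the state page once at pin time
and drop the mapping at unpin time):

	/* at context pin time, under struct_mutex */
	lrc_state_page = i915_gem_object_get_dirty_page(ctx_obj, LRC_STATE_PN);
	/* hypothetical uint32_t * field in the per-engine context state */
	ctx->engine[ring->id].lrc_reg_state = kmap(lrc_state_page);

	/* interrupt path: write through the long-lived mapping, no kmap_atomic */
	uint32_t *reg_state = ctx->engine[ring->id].lrc_reg_state;
	reg_state[CTX_RING_TAIL+1] = rq->tail;

	/* at context unpin time */
	kunmap(ctx->engine[ring->id].lrc_state_page);
	ctx->engine[ring->id].lrc_reg_state = NULL;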

* Re: [PATCH 3/7] drm/i915: Cache ringbuffer GTT address
  2016-01-11 14:08 ` [PATCH 3/7] drm/i915: Cache ringbuffer GTT address Tvrtko Ursulin
@ 2016-01-11 14:30   ` Chris Wilson
  2016-01-11 14:31   ` Chris Wilson
  1 sibling, 0 replies; 48+ messages in thread
From: Chris Wilson @ 2016-01-11 14:30 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Daniel Vetter, Intel-gfx

On Mon, Jan 11, 2016 at 02:08:37PM +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Purpose is to avoid calling i915_gem_obj_ggtt_offset from the
> interrupt context without the big lock held.
> 
> v2: Renamed gtt_start to gtt_offset. (Daniel Vetter)

No, we want to track the VMA.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 3/7] drm/i915: Cache ringbuffer GTT address
  2016-01-11 14:08 ` [PATCH 3/7] drm/i915: Cache ringbuffer GTT address Tvrtko Ursulin
  2016-01-11 14:30   ` Chris Wilson
@ 2016-01-11 14:31   ` Chris Wilson
  2016-01-11 15:06     ` Tvrtko Ursulin
  1 sibling, 1 reply; 48+ messages in thread
From: Chris Wilson @ 2016-01-11 14:31 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Daniel Vetter, Intel-gfx

On Mon, Jan 11, 2016 at 02:08:37PM +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> Purpose is to avoid calling i915_gem_obj_ggtt_offset from the
> interrupt context without the big lock held.

No. Tracking the pinned VMA has none of these problems.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
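
A rough sketch of the suggested alternative, caching the pinned VMA
itself rather than a raw offset (ringbuf->vma is a hypothetical field;
i915_gem_obj_to_ggtt() and vma->node.start are assumed to expose the
GGTT VMA and its offset; this is not a posted patch):

	/* in intel_pin_and_map_ringbuffer_obj(), after the GGTT pin succeeds */
	ringbuf->vma = i915_gem_obj_to_ggtt(ringbuf->obj);

	/* in execlists_update_context(), instead of a cached u64 offset */
	reg_state[CTX_RING_BUFFER_START+1] = ringbuf->vma->node.start;

	/* in intel_unpin_ringbuffer_obj(), before unpinning */
	ringbuf->vma = NULL;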

* Re: [PATCH 5/7] drm/i915: Don't need a timer to wake us up
  2016-01-11 14:08 ` [PATCH 5/7] drm/i915: Don't need a timer to wake us up Tvrtko Ursulin
@ 2016-01-11 14:33   ` Chris Wilson
  2016-01-12 10:20     ` Tvrtko Ursulin
  0 siblings, 1 reply; 48+ messages in thread
From: Chris Wilson @ 2016-01-11 14:33 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-gfx

On Mon, Jan 11, 2016 at 02:08:39PM +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> We can avoid open-coding the schedule wake-up since
> 
>    commit 9cff8adeaa34b5d2802f03f89803da57856b3b72
>    Author: NeilBrown <neilb@suse.de>
>    Date:   Fri Feb 13 15:49:17 2015 +1100
> 
>        sched: Prevent recursion in io_schedule()
> 
> exported the io_schedule_timeout function which we can now use
> to simplify the code in __i915_wait_request.
> 
> v2: New commit message. (Daniel Vetter)
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>

I don't like this because the timer concept still turns out to be useful
when we separate the timer from the bottom-half. And so we have a patch
to remove it and then we add it back again because it serves a purpose.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 2/7] drm/i915: Do not call API requiring struct_mutex where it is not available
  2016-01-11 14:08 ` [PATCH 2/7] drm/i915: Do not call API requiring struct_mutex where it is not available Tvrtko Ursulin
@ 2016-01-11 14:35   ` Chris Wilson
  2016-01-12 11:41   ` [PATCH v2 " Tvrtko Ursulin
  1 sibling, 0 replies; 48+ messages in thread
From: Chris Wilson @ 2016-01-11 14:35 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Daniel Vetter, Intel-gfx

On Mon, Jan 11, 2016 at 02:08:36PM +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> LRC code was calling GEM API like i915_gem_obj_ggtt_offset from
> places where the struct_mutex cannot be grabbed (irq handlers).

A more general point is that we have to move it out of the interrupt
handler because the current code allows for userspace to trivially panic
the machine...
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 6/7] drm/i915: Only grab timestamps when needed
  2016-01-11 14:08 ` [PATCH 6/7] drm/i915: Only grab timestamps when needed Tvrtko Ursulin
@ 2016-01-11 14:36   ` Chris Wilson
  2016-01-11 15:04     ` Tvrtko Ursulin
  0 siblings, 1 reply; 48+ messages in thread
From: Chris Wilson @ 2016-01-11 14:36 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-gfx

On Mon, Jan 11, 2016 at 02:08:40PM +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> No need to call ktime_get_raw_ns twice per unlimited wait and can
> also elimate a local variable.

But we could eliminate both, and the unsightly pointless assignment only
required to shut gcc up.

Still preferring my patch.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 7/7] drm/i915: GEM operations need to be done under the big lock
  2016-01-11 14:08 ` [PATCH 7/7] drm/i915: GEM operations need to be done under the big lock Tvrtko Ursulin
@ 2016-01-11 14:38   ` Chris Wilson
  2016-01-11 14:47     ` Tvrtko Ursulin
  0 siblings, 1 reply; 48+ messages in thread
From: Chris Wilson @ 2016-01-11 14:38 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-gfx

On Mon, Jan 11, 2016 at 02:08:41PM +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> VMA creation and GEM list management need the big lock.
> 
> v2:
> 
> Mutex unlock ended on the wrong path somehow. (0-day, Julia Lawall)
> 
> Not to mention drm_gem_object_unreference was there in existing
> code with no mutex held.
> 
> v3:
> 
> Some callers of i915_gem_object_create_stolen_for_preallocated
> already hold the lock so move the mutex into the other caller
> as well.

intel_pm.c valleyview_setup_pctx?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 7/7] drm/i915: GEM operations need to be done under the big lock
  2016-01-11 14:38   ` Chris Wilson
@ 2016-01-11 14:47     ` Tvrtko Ursulin
  2016-01-11 15:00       ` Chris Wilson
  0 siblings, 1 reply; 48+ messages in thread
From: Tvrtko Ursulin @ 2016-01-11 14:47 UTC (permalink / raw)
  To: Chris Wilson, Intel-gfx


On 11/01/16 14:38, Chris Wilson wrote:
> On Mon, Jan 11, 2016 at 02:08:41PM +0000, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> VMA creation and GEM list management need the big lock.
>>
>> v2:
>>
>> Mutex unlock ended on the wrong path somehow. (0-day, Julia Lawall)
>>
>> Not to mention drm_gem_object_unreference was there in existing
>> code with no mutex held.
>>
>> v3:
>>
>> Some callers of i915_gem_object_create_stolen_for_preallocated
>> already hold the lock so move the mutex into the other caller
>> as well.
>
> intel_pm.c valleyview_setup_pctx?

It already holds it, as witnessed by the WARN_ON at its top.

Regards,

Tvrtko

* Re: [PATCH 7/7] drm/i915: GEM operations need to be done under the big lock
  2016-01-11 14:47     ` Tvrtko Ursulin
@ 2016-01-11 15:00       ` Chris Wilson
  2016-01-11 15:04         ` Chris Wilson
  0 siblings, 1 reply; 48+ messages in thread
From: Chris Wilson @ 2016-01-11 15:00 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-gfx

On Mon, Jan 11, 2016 at 02:47:17PM +0000, Tvrtko Ursulin wrote:
> 
> On 11/01/16 14:38, Chris Wilson wrote:
> >On Mon, Jan 11, 2016 at 02:08:41PM +0000, Tvrtko Ursulin wrote:
> >>From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>
> >>VMA creation and GEM list management need the big lock.
> >>
> >>v2:
> >>
> >>Mutex unlock ended on the wrong path somehow. (0-day, Julia Lawall)
> >>
> >>Not to mention drm_gem_object_unreference was there in existing
> >>code with no mutex held.
> >>
> >>v3:
> >>
> >>Some callers of i915_gem_object_create_stolen_for_preallocated
> >>already hold the lock so move the mutex into the other caller
> >>as well.
> >
> >intel_pm.c valleyview_setup_pctx?
> 
> Already holds it traced by the WARN_ON at its top.

Which is a nice little mutex inversion of its own. :|
i.e. rpm vs struct_mutex bug
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 6/7] drm/i915: Only grab timestamps when needed
  2016-01-11 14:36   ` Chris Wilson
@ 2016-01-11 15:04     ` Tvrtko Ursulin
  2016-01-12 15:52       ` Dave Gordon
  0 siblings, 1 reply; 48+ messages in thread
From: Tvrtko Ursulin @ 2016-01-11 15:04 UTC (permalink / raw)
  To: Chris Wilson, Intel-gfx


On 11/01/16 14:36, Chris Wilson wrote:
> On Mon, Jan 11, 2016 at 02:08:40PM +0000, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> No need to call ktime_get_raw_ns twice per unlimited wait and can
>> also elimate a local variable.
>
> But we could eliminate both, and the unsightly pointless assignment only
> required to shut gcc up.
>
> Still preferring my patch.

Ah, I remember it now... you were storing it in the pointer provided by
the caller. I think that is significantly worse; sorry, I cannot approve that.

Regards,

Tvrtko

* Re: [PATCH 7/7] drm/i915: GEM operations need to be done under the big lock
  2016-01-11 15:00       ` Chris Wilson
@ 2016-01-11 15:04         ` Chris Wilson
  2016-01-11 15:16           ` Tvrtko Ursulin
  0 siblings, 1 reply; 48+ messages in thread
From: Chris Wilson @ 2016-01-11 15:04 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-gfx

On Mon, Jan 11, 2016 at 03:00:20PM +0000, Chris Wilson wrote:
> On Mon, Jan 11, 2016 at 02:47:17PM +0000, Tvrtko Ursulin wrote:
> > 
> > On 11/01/16 14:38, Chris Wilson wrote:
> > >On Mon, Jan 11, 2016 at 02:08:41PM +0000, Tvrtko Ursulin wrote:
> > >>From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> > >>
> > >>VMA creation and GEM list management need the big lock.
> > >>
> > >>v2:
> > >>
> > >>Mutex unlock ended on the wrong path somehow. (0-day, Julia Lawall)
> > >>
> > >>Not to mention drm_gem_object_unreference was there in existing
> > >>code with no mutex held.
> > >>
> > >>v3:
> > >>
> > >>Some callers of i915_gem_object_create_stolen_for_preallocated
> > >>already hold the lock so move the mutex into the other caller
> > >>as well.
> > >
> > >intel_pm.c valleyview_setup_pctx?
> > 
> > Already holds it traced by the WARN_ON at its top.
> 
> Which is a nice little mutex inversion of its own. :|
> i.e. rpm vs struct_mutex bug

diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index b1fb43fcfeea..c8f684f8799c 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -16018,9 +16018,7 @@ void intel_modeset_gem_init(struct drm_device *dev)
        struct drm_crtc *c;
        struct drm_i915_gem_object *obj;
 
-       mutex_lock(&dev->struct_mutex);
        intel_init_gt_powersave(dev);
-       mutex_unlock(&dev->struct_mutex);
 
        intel_modeset_init_hw(dev);
 
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index a082b4577599..90e5bf7a2402 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -5224,8 +5224,6 @@ static void cherryview_setup_pctx(struct drm_device *dev)
        u32 pcbr;
        int pctx_size = 32*1024;
 
-       WARN_ON(!mutex_is_locked(&dev->struct_mutex));
-
        pcbr = I915_READ(VLV_PCBR);
        if ((pcbr >> VLV_PCBR_ADDR_SHIFT) == 0) {
                DRM_DEBUG_DRIVER("BIOS didn't set up PCBR, fixing up\n");
@@ -5247,8 +5245,7 @@ static void valleyview_setup_pctx(struct drm_device *dev)
        u32 pcbr;
        int pctx_size = 24*1024;
 
-       WARN_ON(!mutex_is_locked(&dev->struct_mutex));
-
+       mutex_lock(&dev->struct_mutex);
        pcbr = I915_READ(VLV_PCBR);
        if (pcbr) {
                /* BIOS set it up already, grab the pre-alloc'd space */
@@ -5275,7 +5272,7 @@ static void valleyview_setup_pctx(struct drm_device *dev)
        pctx = i915_gem_object_create_stolen(dev, pctx_size);
        if (!pctx) {
                DRM_DEBUG("not enough stolen space for PCTX, disabling\n");
-               return;
+               goto out;
        }
 
        pctx_paddr = dev_priv->mm.stolen_base + pctx->stolen->start;
@@ -5284,6 +5281,7 @@ static void valleyview_setup_pctx(struct drm_device *dev)
 out:
        DRM_DEBUG_DRIVER("PCBR: 0x%08x\n", I915_READ(VLV_PCBR));
        dev_priv->vlv_pctx = pctx;
+       mutex_unlock(&dev->struct_mutex);
 }
 
 static void valleyview_cleanup_pctx(struct drm_device *dev)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [PATCH 3/7] drm/i915: Cache ringbuffer GTT address
  2016-01-11 14:31   ` Chris Wilson
@ 2016-01-11 15:06     ` Tvrtko Ursulin
  0 siblings, 0 replies; 48+ messages in thread
From: Tvrtko Ursulin @ 2016-01-11 15:06 UTC (permalink / raw)
  To: Chris Wilson, Intel-gfx, Daniel Vetter


On 11/01/16 14:31, Chris Wilson wrote:
> On Mon, Jan 11, 2016 at 02:08:37PM +0000, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> Purpose is to avoid calling i915_gem_obj_ggtt_offset from the
>> interrupt context without the big lock held.
>
> No. Tracking the pinned VMA has none of these problems.

Does this patch have problems which VMA tracking does not? I think the
address is simpler for the time being and good enough. It is a subset of
the VMA and the lifetime is the same.

Regards,

Tvrtko

* Re: [PATCH 4/7] drm/i915: Cache LRC state page in the context
  2016-01-11 14:29   ` Chris Wilson
@ 2016-01-11 15:07     ` Tvrtko Ursulin
  2016-01-12 11:43     ` [PATCH v2 " Tvrtko Ursulin
  2016-01-12 11:56     ` [PATCH v3 " Tvrtko Ursulin
  2 siblings, 0 replies; 48+ messages in thread
From: Tvrtko Ursulin @ 2016-01-11 15:07 UTC (permalink / raw)
  To: Chris Wilson, Intel-gfx


On 11/01/16 14:29, Chris Wilson wrote:
> On Mon, Jan 11, 2016 at 02:08:38PM +0000, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> LRC lifetime is well defined so we can cache the page pointing
>> to the object backing store in the context in order to avoid
>> walking over the object SG page list from the interrupt context
>> without the big lock held.
>
> You can keep kmap around for the pinned lifetime of the context.

Baby steps. :)

Regards,

Tvrtko

* Re: [PATCH 7/7] drm/i915: GEM operations need to be done under the big lock
  2016-01-11 15:04         ` Chris Wilson
@ 2016-01-11 15:16           ` Tvrtko Ursulin
  2016-01-11 15:36             ` Ville Syrjälä
  0 siblings, 1 reply; 48+ messages in thread
From: Tvrtko Ursulin @ 2016-01-11 15:16 UTC (permalink / raw)
  To: Chris Wilson, Intel-gfx



On 11/01/16 15:04, Chris Wilson wrote:
> On Mon, Jan 11, 2016 at 03:00:20PM +0000, Chris Wilson wrote:
>> On Mon, Jan 11, 2016 at 02:47:17PM +0000, Tvrtko Ursulin wrote:
>>>
>>> On 11/01/16 14:38, Chris Wilson wrote:
>>>> On Mon, Jan 11, 2016 at 02:08:41PM +0000, Tvrtko Ursulin wrote:
>>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>
>>>>> VMA creation and GEM list management need the big lock.
>>>>>
>>>>> v2:
>>>>>
>>>>> Mutex unlock ended on the wrong path somehow. (0-day, Julia Lawall)
>>>>>
>>>>> Not to mention drm_gem_object_unreference was there in existing
>>>>> code with no mutex held.
>>>>>
>>>>> v3:
>>>>>
>>>>> Some callers of i915_gem_object_create_stolen_for_preallocated
>>>>> already hold the lock so move the mutex into the other caller
>>>>> as well.
>>>>
>>>> intel_pm.c valleyview_setup_pctx?
>>>
>>> Already holds it traced by the WARN_ON at its top.
>>
>> Which is a nice little mutex inversion of its own. :|
>> i.e. rpm vs struct_mutex bug
>
> diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> index b1fb43fcfeea..c8f684f8799c 100644
> --- a/drivers/gpu/drm/i915/intel_display.c
> +++ b/drivers/gpu/drm/i915/intel_display.c
> @@ -16018,9 +16018,7 @@ void intel_modeset_gem_init(struct drm_device *dev)
>          struct drm_crtc *c;
>          struct drm_i915_gem_object *obj;
>
> -       mutex_lock(&dev->struct_mutex);
>          intel_init_gt_powersave(dev);
> -       mutex_unlock(&dev->struct_mutex);
>
>          intel_modeset_init_hw(dev);
>
> diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> index a082b4577599..90e5bf7a2402 100644
> --- a/drivers/gpu/drm/i915/intel_pm.c
> +++ b/drivers/gpu/drm/i915/intel_pm.c
> @@ -5224,8 +5224,6 @@ static void cherryview_setup_pctx(struct drm_device *dev)
>          u32 pcbr;
>          int pctx_size = 32*1024;
>
> -       WARN_ON(!mutex_is_locked(&dev->struct_mutex));
> -
>          pcbr = I915_READ(VLV_PCBR);
>          if ((pcbr >> VLV_PCBR_ADDR_SHIFT) == 0) {
>                  DRM_DEBUG_DRIVER("BIOS didn't set up PCBR, fixing up\n");
> @@ -5247,8 +5245,7 @@ static void valleyview_setup_pctx(struct drm_device *dev)
>          u32 pcbr;
>          int pctx_size = 24*1024;
>
> -       WARN_ON(!mutex_is_locked(&dev->struct_mutex));
> -
> +       mutex_lock(&dev->struct_mutex);
>          pcbr = I915_READ(VLV_PCBR);
>          if (pcbr) {
>                  /* BIOS set it up already, grab the pre-alloc'd space */
> @@ -5275,7 +5272,7 @@ static void valleyview_setup_pctx(struct drm_device *dev)
>          pctx = i915_gem_object_create_stolen(dev, pctx_size);
>          if (!pctx) {
>                  DRM_DEBUG("not enough stolen space for PCTX, disabling\n");
> -               return;
> +               goto out;
>          }
>
>          pctx_paddr = dev_priv->mm.stolen_base + pctx->stolen->start;
> @@ -5284,6 +5281,7 @@ static void valleyview_setup_pctx(struct drm_device *dev)
>   out:
>          DRM_DEBUG_DRIVER("PCBR: 0x%08x\n", I915_READ(VLV_PCBR));
>          dev_priv->vlv_pctx = pctx;
> +       mutex_unlock(&dev->struct_mutex);
>   }
>
>   static void valleyview_cleanup_pctx(struct drm_device *dev)

Don't know, I'll leave this one to whoever grabbed the lock around 
intel_init_gt_powersave in the first place. Maybe there was a special 
reason.. after git blame of intel_display.c eventually completed, adding 
Imre and Ville to cc.

Regards,

Tvrtko

* Re: [PATCH 7/7] drm/i915: GEM operations need to be done under the big lock
  2016-01-11 15:16           ` Tvrtko Ursulin
@ 2016-01-11 15:36             ` Ville Syrjälä
  2016-01-11 16:56               ` Chris Wilson
  0 siblings, 1 reply; 48+ messages in thread
From: Ville Syrjälä @ 2016-01-11 15:36 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-gfx

On Mon, Jan 11, 2016 at 03:16:16PM +0000, Tvrtko Ursulin wrote:
> 
> 
> On 11/01/16 15:04, Chris Wilson wrote:
> > On Mon, Jan 11, 2016 at 03:00:20PM +0000, Chris Wilson wrote:
> >> On Mon, Jan 11, 2016 at 02:47:17PM +0000, Tvrtko Ursulin wrote:
> >>>
> >>> On 11/01/16 14:38, Chris Wilson wrote:
> >>>> On Mon, Jan 11, 2016 at 02:08:41PM +0000, Tvrtko Ursulin wrote:
> >>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>>>
> >>>>> VMA creation and GEM list management need the big lock.
> >>>>>
> >>>>> v2:
> >>>>>
> >>>>> Mutex unlock ended on the wrong path somehow. (0-day, Julia Lawall)
> >>>>>
> >>>>> Not to mention drm_gem_object_unreference was there in existing
> >>>>> code with no mutex held.
> >>>>>
> >>>>> v3:
> >>>>>
> >>>>> Some callers of i915_gem_object_create_stolen_for_preallocated
> >>>>> already hold the lock so move the mutex into the other caller
> >>>>> as well.
> >>>>
> >>>> intel_pm.c valleyview_setup_pctx?
> >>>
> >>> Already holds it traced by the WARN_ON at its top.
> >>
> >> Which is a nice little mutex inversion of its own. :|
> >> i.e. rpm vs struct_mutex bug
> >
> > diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
> > index b1fb43fcfeea..c8f684f8799c 100644
> > --- a/drivers/gpu/drm/i915/intel_display.c
> > +++ b/drivers/gpu/drm/i915/intel_display.c
> > @@ -16018,9 +16018,7 @@ void intel_modeset_gem_init(struct drm_device *dev)
> >          struct drm_crtc *c;
> >          struct drm_i915_gem_object *obj;
> >
> > -       mutex_lock(&dev->struct_mutex);
> >          intel_init_gt_powersave(dev);
> > -       mutex_unlock(&dev->struct_mutex);
> >
> >          intel_modeset_init_hw(dev);
> >
> > diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
> > index a082b4577599..90e5bf7a2402 100644
> > --- a/drivers/gpu/drm/i915/intel_pm.c
> > +++ b/drivers/gpu/drm/i915/intel_pm.c
> > @@ -5224,8 +5224,6 @@ static void cherryview_setup_pctx(struct drm_device *dev)
> >          u32 pcbr;
> >          int pctx_size = 32*1024;
> >
> > -       WARN_ON(!mutex_is_locked(&dev->struct_mutex));
> > -
> >          pcbr = I915_READ(VLV_PCBR);
> >          if ((pcbr >> VLV_PCBR_ADDR_SHIFT) == 0) {
> >                  DRM_DEBUG_DRIVER("BIOS didn't set up PCBR, fixing up\n");
> > @@ -5247,8 +5245,7 @@ static void valleyview_setup_pctx(struct drm_device *dev)
> >          u32 pcbr;
> >          int pctx_size = 24*1024;
> >
> > -       WARN_ON(!mutex_is_locked(&dev->struct_mutex));
> > -
> > +       mutex_lock(&dev->struct_mutex);
> >          pcbr = I915_READ(VLV_PCBR);
> >          if (pcbr) {
> >                  /* BIOS set it up already, grab the pre-alloc'd space */
> > @@ -5275,7 +5272,7 @@ static void valleyview_setup_pctx(struct drm_device *dev)
> >          pctx = i915_gem_object_create_stolen(dev, pctx_size);
> >          if (!pctx) {
> >                  DRM_DEBUG("not enough stolen space for PCTX, disabling\n");
> > -               return;
> > +               goto out;
> >          }
> >
> >          pctx_paddr = dev_priv->mm.stolen_base + pctx->stolen->start;
> > @@ -5284,6 +5281,7 @@ static void valleyview_setup_pctx(struct drm_device *dev)
> >   out:
> >          DRM_DEBUG_DRIVER("PCBR: 0x%08x\n", I915_READ(VLV_PCBR));
> >          dev_priv->vlv_pctx = pctx;
> > +       mutex_unlock(&dev->struct_mutex);
> >   }
> >
> >   static void valleyview_cleanup_pctx(struct drm_device *dev)
> 
> Don't know, I leave this one to whoever grabbed the lock around 
> intel_init_gt_powersave in the first place. Maybe there was a special 
> reason.. after git blame of intel_display.c eventually completed, adding 
> Imre and Ville to cc.

Hmm. I don't recall the details anymore, but looking at the code pushing
the locking down to valleyview_setup_pctx() looks entirely reasonable to
me. And yeah, doesn't look like it's really protecting anything in the
chv function, so can just be dropped there.

The cleanup path could use the same treatment, I think.
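
Something along these lines is what I have in mind - untested sketch, and
assuming the cleanup really only has to drop the vlv_pctx reference:

static void valleyview_cleanup_pctx(struct drm_device *dev)
{
	struct drm_i915_private *dev_priv = dev->dev_private;

	if (WARN_ON(!dev_priv->vlv_pctx))
		return;

	mutex_lock(&dev->struct_mutex);
	drm_gem_object_unreference(&dev_priv->vlv_pctx->base);
	dev_priv->vlv_pctx = NULL;
	mutex_unlock(&dev->struct_mutex);
}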

-- 
Ville Syrjälä
Intel OTC
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 7/7] drm/i915: GEM operations need to be done under the big lock
  2016-01-11 15:36             ` Ville Syrjälä
@ 2016-01-11 16:56               ` Chris Wilson
  2016-01-13 12:46                 ` Tvrtko Ursulin
  0 siblings, 1 reply; 48+ messages in thread
From: Chris Wilson @ 2016-01-11 16:56 UTC (permalink / raw)
  To: Ville Syrjälä; +Cc: Intel-gfx

On Mon, Jan 11, 2016 at 05:36:46PM +0200, Ville Syrjälä wrote:
> On Mon, Jan 11, 2016 at 03:16:16PM +0000, Tvrtko Ursulin wrote:
> > Don't know, I leave this one to whoever grabbed the lock around 
> > intel_init_gt_powersave in the first place. Maybe there was a special 
> > reason.. after git blame of intel_display.c eventually completed, adding 
> > Imre and Ville to cc.
> 
> Hmm. I don't recall the details anymore, but looking at the code pushing
> the locking down to valleyview_setup_pctx() looks entirely reasonable to
> me.

iirc, this locking only exists to keep the WARN() at bay. But it is
pedagogical, I guess.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 5/7] drm/i915: Don't need a timer to wake us up
  2016-01-11 14:33   ` Chris Wilson
@ 2016-01-12 10:20     ` Tvrtko Ursulin
  0 siblings, 0 replies; 48+ messages in thread
From: Tvrtko Ursulin @ 2016-01-12 10:20 UTC (permalink / raw)
  To: Chris Wilson, Intel-gfx, Tvrtko Ursulin


On 11/01/16 14:33, Chris Wilson wrote:
> On Mon, Jan 11, 2016 at 02:08:39PM +0000, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> We can avoid open-coding the schedule wake-up since
>>
>>     commit 9cff8adeaa34b5d2802f03f89803da57856b3b72
>>     Author: NeilBrown <neilb@suse.de>
>>     Date:   Fri Feb 13 15:49:17 2015 +1100
>>
>>         sched: Prevent recursion in io_schedule()
>>
>> exported the io_schedule_timeout function which we can now use
>> to simplify the code in __i915_wait_request.
>>
>> v2: New commit message. (Daniel Vetter)
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> Reviewed-by: Daniel Vetter <daniel.vetter@ffwll.ch>
>
> I don't like this because the timer concept still turns out to be useful
> when we separate the timer from the bottom-half. And so we have a patch
> to remove it and then we add it back again because it serves a purpose.

-EWHATEVER

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v2 2/7] drm/i915: Do not call API requiring struct_mutex where it is not available
  2016-01-11 14:08 ` [PATCH 2/7] drm/i915: Do not call API requiring struct_mutex where it is not available Tvrtko Ursulin
  2016-01-11 14:35   ` Chris Wilson
@ 2016-01-12 11:41   ` Tvrtko Ursulin
  2016-01-12 15:47     ` Dave Gordon
  1 sibling, 1 reply; 48+ messages in thread
From: Tvrtko Ursulin @ 2016-01-12 11:41 UTC (permalink / raw)
  To: Intel-gfx; +Cc: Daniel Vetter

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

LRC code was calling GEM API like i915_gem_obj_ggtt_offset from
places where the struct_mutex cannot be grabbed (irq handlers).

To avoid that this patch caches some interesting bits and values
in the engine and context structures.

Some usages are also removed where they are not needed like a
few asserts which are either impossible or have been checked
already during engine initialization.

Side benefit is also that interrupt handlers and command
submission stop evaluating invariant conditionals, like what
Gen we are running on, on every interrupt and every command
submitted.

This patch deals with logical ring context id and descriptors
while subsequent patches will deal with the remaining issues.

v2:
 * Cache the VMA instead of the address. (Chris Wilson)
 * Incorporate Dave Gordon's good comments and function name.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Gordon <david.s.gordon@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c     |  15 ++--
 drivers/gpu/drm/i915/i915_drv.h         |   2 +
 drivers/gpu/drm/i915/i915_gem_gtt.h     |   1 -
 drivers/gpu/drm/i915/intel_lrc.c        | 126 +++++++++++++++++++-------------
 drivers/gpu/drm/i915/intel_lrc.h        |   4 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h |   2 +
 6 files changed, 90 insertions(+), 60 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index e3377abc0d4d..0b3550f05026 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1994,12 +1994,13 @@ static int i915_context_status(struct seq_file *m, void *unused)
 }
 
 static void i915_dump_lrc_obj(struct seq_file *m,
-			      struct intel_engine_cs *ring,
-			      struct drm_i915_gem_object *ctx_obj)
+			      struct intel_context *ctx,
+			      struct intel_engine_cs *ring)
 {
 	struct page *page;
 	uint32_t *reg_state;
 	int j;
+	struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
 	unsigned long ggtt_offset = 0;
 
 	if (ctx_obj == NULL) {
@@ -2009,7 +2010,7 @@ static void i915_dump_lrc_obj(struct seq_file *m,
 	}
 
 	seq_printf(m, "CONTEXT: %s %u\n", ring->name,
-		   intel_execlists_ctx_id(ctx_obj));
+		   intel_execlists_ctx_id(ctx, ring));
 
 	if (!i915_gem_obj_ggtt_bound(ctx_obj))
 		seq_puts(m, "\tNot bound in GGTT\n");
@@ -2058,8 +2059,7 @@ static int i915_dump_lrc(struct seq_file *m, void *unused)
 	list_for_each_entry(ctx, &dev_priv->context_list, link) {
 		for_each_ring(ring, dev_priv, i) {
 			if (ring->default_context != ctx)
-				i915_dump_lrc_obj(m, ring,
-						  ctx->engine[i].state);
+				i915_dump_lrc_obj(m, ctx, ring);
 		}
 	}
 
@@ -2133,11 +2133,8 @@ static int i915_execlists(struct seq_file *m, void *data)
 
 		seq_printf(m, "\t%d requests in queue\n", count);
 		if (head_req) {
-			struct drm_i915_gem_object *ctx_obj;
-
-			ctx_obj = head_req->ctx->engine[ring_id].state;
 			seq_printf(m, "\tHead request id: %u\n",
-				   intel_execlists_ctx_id(ctx_obj));
+				   intel_execlists_ctx_id(head_req->ctx, ring));
 			seq_printf(m, "\tHead request tail: %u\n",
 				   head_req->tail);
 		}
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 747d2d84a18c..79bb3671a15e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -883,6 +883,8 @@ struct intel_context {
 		struct drm_i915_gem_object *state;
 		struct intel_ringbuffer *ringbuf;
 		int pin_count;
+		struct i915_vma *lrc_vma;
+		u64 lrc_desc;
 	} engine[I915_NUM_RINGS];
 
 	struct list_head link;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index b448ad832dcf..e5737963ab79 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -44,7 +44,6 @@ typedef uint64_t gen8_ppgtt_pml4e_t;
 
 #define gtt_total_entries(gtt) ((gtt).base.total >> PAGE_SHIFT)
 
-
 /* gen6-hsw has bit 11-4 for physical addr bit 39-32 */
 #define GEN6_GTT_ADDR_ENCODE(addr)	((addr) | (((addr) >> 28) & 0xff0))
 #define GEN6_PTE_ADDR_ENCODE(addr)	GEN6_GTT_ADDR_ENCODE(addr)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index ab344e0b878c..504030ce7f93 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -265,7 +265,8 @@ int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists
 
 /**
  * intel_execlists_ctx_id() - get the Execlists Context ID
- * @ctx_obj: Logical Ring Context backing object.
+ * @ctx: Context to get the ID for
+ * @ring: Engine to get the ID for
  *
  * Do not confuse with ctx->id! Unfortunately we have a name overload
  * here: the old context ID we pass to userspace as a handler so that
@@ -273,16 +274,15 @@ int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists
  * ELSP so that the GPU can inform us of the context status via
  * interrupts.
  *
+ * The context ID is a portion of the context descriptor, so we can
+ * just extract the required part from the cached descriptor.
+ *
  * Return: 20-bits globally unique context ID.
  */
-u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj)
+u32 intel_execlists_ctx_id(struct intel_context *ctx,
+			   struct intel_engine_cs *ring)
 {
-	u32 lrca = i915_gem_obj_ggtt_offset(ctx_obj) +
-			LRC_PPHWSP_PN * PAGE_SIZE;
-
-	/* LRCA is required to be 4K aligned so the more significant 20 bits
-	 * are globally unique */
-	return lrca >> 12;
+	return ctx->engine[ring->id].lrc_desc >> GEN8_CTX_ID_SHIFT;
 }
 
 static bool disable_lite_restore_wa(struct intel_engine_cs *ring)
@@ -297,31 +297,7 @@ static bool disable_lite_restore_wa(struct intel_engine_cs *ring)
 uint64_t intel_lr_context_descriptor(struct intel_context *ctx,
 				     struct intel_engine_cs *ring)
 {
-	struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
-	uint64_t desc;
-	uint64_t lrca = i915_gem_obj_ggtt_offset(ctx_obj) +
-			LRC_PPHWSP_PN * PAGE_SIZE;
-
-	WARN_ON(lrca & 0xFFFFFFFF00000FFFULL);
-
-	desc = GEN8_CTX_VALID;
-	desc |= GEN8_CTX_ADDRESSING_MODE(dev) << GEN8_CTX_ADDRESSING_MODE_SHIFT;
-	if (IS_GEN8(ctx_obj->base.dev))
-		desc |= GEN8_CTX_L3LLC_COHERENT;
-	desc |= GEN8_CTX_PRIVILEGE;
-	desc |= lrca;
-	desc |= (u64)intel_execlists_ctx_id(ctx_obj) << GEN8_CTX_ID_SHIFT;
-
-	/* TODO: WaDisableLiteRestore when we start using semaphore
-	 * signalling between Command Streamers */
-	/* desc |= GEN8_CTX_FORCE_RESTORE; */
-
-	/* WaEnableForceRestoreInCtxtDescForVCS:skl */
-	/* WaEnableForceRestoreInCtxtDescForVCS:bxt */
-	if (disable_lite_restore_wa(ring))
-		desc |= GEN8_CTX_FORCE_RESTORE;
-
-	return desc;
+	return ctx->engine[ring->id].lrc_desc;
 }
 
 static void execlists_elsp_write(struct drm_i915_gem_request *rq0,
@@ -369,8 +345,6 @@ static int execlists_update_context(struct drm_i915_gem_request *rq)
 	uint32_t *reg_state;
 
 	BUG_ON(!ctx_obj);
-	WARN_ON(!i915_gem_obj_is_pinned(ctx_obj));
-	WARN_ON(!i915_gem_obj_is_pinned(rb_obj));
 
 	page = i915_gem_object_get_dirty_page(ctx_obj, LRC_STATE_PN);
 	reg_state = kmap_atomic(page);
@@ -477,9 +451,7 @@ static bool execlists_check_remove_request(struct intel_engine_cs *ring,
 					    execlist_link);
 
 	if (head_req != NULL) {
-		struct drm_i915_gem_object *ctx_obj =
-				head_req->ctx->engine[ring->id].state;
-		if (intel_execlists_ctx_id(ctx_obj) == request_id) {
+		if (intel_execlists_ctx_id(head_req->ctx, ring) == request_id) {
 			WARN(head_req->elsp_submitted == 0,
 			     "Never submitted head request\n");
 
@@ -556,7 +528,7 @@ void intel_lrc_irq_handler(struct intel_engine_cs *ring)
 		}
 	}
 
-	if (disable_lite_restore_wa(ring)) {
+	if (ring->disable_lite_restore_wa) {
 		/* Prevent a ctx to preempt itself */
 		if ((status & GEN8_CTX_STATUS_ACTIVE_IDLE) &&
 		    (submit_contexts != 0))
@@ -1038,15 +1010,51 @@ int logical_ring_flush_all_caches(struct drm_i915_gem_request *req)
 	return 0;
 }
 
+/**
+ * intel_lr_context_descriptor_update() - calculate & cache the
+ *                      descriptor for a pinned
+ *                       context
+ * @ctx: Context to work on
+ * @ring: Engine the descriptor will be used with
+ *
+ * The context descriptor encodes various attributes of a context,
+ * including its GTT address and some flags. Because it's fairly
+ * expensive to calculate, we'll just do it once and cache the result,
+ * which remains valid until the context is unpinned.
+ *
+ * This is what a descriptor looks like, from LSB to MSB:
+ *    bits 0-11:    flags, GEN8_CTX_* (cached in ctx_desc_template)
+ *    bits 12-31:    LRCA, GTT address of (the HWSP of) this context
+ *    bits 32-51:    ctx ID, a globally unique tag (the LRCA again!)
+ *    bits 52-63:    reserved, may encode the engine ID (for GuC)
+ */
+static void
+intel_lr_context_descriptor_update(struct intel_context *ctx,
+				   struct intel_engine_cs *ring)
+{
+	uint64_t lrca, desc;
+
+	lrca = ctx->engine[ring->id].lrc_vma->node.start +
+	       LRC_PPHWSP_PN * PAGE_SIZE;
+
+	desc = ring->ctx_desc_template;			   /* bits  0-11 */
+	desc |= lrca;					   /* bits 12-31 */
+	desc |= (lrca >> PAGE_SHIFT) << GEN8_CTX_ID_SHIFT; /* bits 32-51 */
+
+	ctx->engine[ring->id].lrc_desc = desc;
+}
+
 static int intel_lr_context_do_pin(struct intel_engine_cs *ring,
-		struct drm_i915_gem_object *ctx_obj,
-		struct intel_ringbuffer *ringbuf)
+				   struct intel_context *ctx)
 {
 	struct drm_device *dev = ring->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	int ret = 0;
+	struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
+	struct intel_ringbuffer *ringbuf = ctx->engine[ring->id].ringbuf;
+	int ret;
 
 	WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
+
 	ret = i915_gem_obj_ggtt_pin(ctx_obj, GEN8_LR_CONTEXT_ALIGN,
 			PIN_OFFSET_BIAS | GUC_WOPCM_TOP);
 	if (ret)
@@ -1056,6 +1064,8 @@ static int intel_lr_context_do_pin(struct intel_engine_cs *ring,
 	if (ret)
 		goto unpin_ctx_obj;
 
+	ctx->engine[ring->id].lrc_vma = i915_gem_obj_to_ggtt(ctx_obj);
+	intel_lr_context_descriptor_update(ctx, ring);
 	ctx_obj->dirty = true;
 
 	/* Invalidate GuC TLB. */
@@ -1074,11 +1084,9 @@ static int intel_lr_context_pin(struct drm_i915_gem_request *rq)
 {
 	int ret = 0;
 	struct intel_engine_cs *ring = rq->ring;
-	struct drm_i915_gem_object *ctx_obj = rq->ctx->engine[ring->id].state;
-	struct intel_ringbuffer *ringbuf = rq->ringbuf;
 
 	if (rq->ctx->engine[ring->id].pin_count++ == 0) {
-		ret = intel_lr_context_do_pin(ring, ctx_obj, ringbuf);
+		ret = intel_lr_context_do_pin(ring, rq->ctx);
 		if (ret)
 			goto reset_pin_count;
 	}
@@ -1100,6 +1108,8 @@ void intel_lr_context_unpin(struct drm_i915_gem_request *rq)
 		if (--rq->ctx->engine[ring->id].pin_count == 0) {
 			intel_unpin_ringbuffer_obj(ringbuf);
 			i915_gem_object_ggtt_unpin(ctx_obj);
+			rq->ctx->engine[ring->id].lrc_vma = NULL;
+			rq->ctx->engine[ring->id].lrc_desc = 0;
 		}
 	}
 }
@@ -1938,6 +1948,9 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 		ring->status_page.obj = NULL;
 	}
 
+	ring->disable_lite_restore_wa = false;
+	ring->ctx_desc_template = 0;
+
 	lrc_destroy_wa_ctx_obj(ring);
 	ring->dev = NULL;
 }
@@ -1960,6 +1973,24 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 	INIT_LIST_HEAD(&ring->execlist_retired_req_list);
 	spin_lock_init(&ring->execlist_lock);
 
+	ring->disable_lite_restore_wa = disable_lite_restore_wa(ring);
+
+	ring->ctx_desc_template = GEN8_CTX_VALID;
+	ring->ctx_desc_template |= GEN8_CTX_ADDRESSING_MODE(dev) <<
+				   GEN8_CTX_ADDRESSING_MODE_SHIFT;
+	if (IS_GEN8(dev))
+		ring->ctx_desc_template |= GEN8_CTX_L3LLC_COHERENT;
+	ring->ctx_desc_template |= GEN8_CTX_PRIVILEGE;
+
+	/* TODO: WaDisableLiteRestore when we start using semaphore
+	 * signalling between Command Streamers */
+	/* ring->ctx_desc_template |= GEN8_CTX_FORCE_RESTORE; */
+
+	/* WaEnableForceRestoreInCtxtDescForVCS:skl */
+	/* WaEnableForceRestoreInCtxtDescForVCS:bxt */
+	if (ring->disable_lite_restore_wa)
+		ring->ctx_desc_template |= GEN8_CTX_FORCE_RESTORE;
+
 	ret = i915_cmd_parser_init_ring(ring);
 	if (ret)
 		goto error;
@@ -1969,10 +2000,7 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 		goto error;
 
 	/* As this is the default context, always pin it */
-	ret = intel_lr_context_do_pin(
-			ring,
-			ring->default_context->engine[ring->id].state,
-			ring->default_context->engine[ring->id].ringbuf);
+	ret = intel_lr_context_do_pin(ring, ring->default_context);
 	if (ret) {
 		DRM_ERROR(
 			"Failed to pin and map ringbuffer %s: %d\n",
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index de41ad6cd63d..49af638f6213 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -107,13 +107,15 @@ void intel_lr_context_reset(struct drm_device *dev,
 uint64_t intel_lr_context_descriptor(struct intel_context *ctx,
 				     struct intel_engine_cs *ring);
 
+u32 intel_execlists_ctx_id(struct intel_context *ctx,
+			   struct intel_engine_cs *ring);
+
 /* Execlists */
 int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists);
 struct i915_execbuffer_params;
 int intel_execlists_submission(struct i915_execbuffer_params *params,
 			       struct drm_i915_gem_execbuffer2 *args,
 			       struct list_head *vmas);
-u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj);
 
 void intel_lrc_irq_handler(struct intel_engine_cs *ring);
 void intel_execlists_retire_requests(struct intel_engine_cs *ring);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 7349d9258191..85ce2272f92c 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -269,6 +269,8 @@ struct  intel_engine_cs {
 	struct list_head execlist_queue;
 	struct list_head execlist_retired_req_list;
 	u8 next_context_status_buffer;
+	bool disable_lite_restore_wa;
+	u32 ctx_desc_template;
 	u32             irq_keep_mask; /* bitmask for interrupts that should not be masked */
 	int		(*emit_request)(struct drm_i915_gem_request *request);
 	int		(*emit_flush)(struct drm_i915_gem_request *request,
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v2 4/7] drm/i915: Cache LRC state page in the context
  2016-01-11 14:29   ` Chris Wilson
  2016-01-11 15:07     ` Tvrtko Ursulin
@ 2016-01-12 11:43     ` Tvrtko Ursulin
  2016-01-12 11:56     ` [PATCH v3 " Tvrtko Ursulin
  2 siblings, 0 replies; 48+ messages in thread
From: Tvrtko Ursulin @ 2016-01-12 11:43 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

LRC lifetime is well defined so we can cache the page pointing
to the object backing store in the context in order to avoid
walking over the object SG page list from the interrupt context
without the big lock held.

v2: Also cache the mapping. (Chris Wilson)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h  |  2 ++
 drivers/gpu/drm/i915/intel_lrc.c | 30 ++++++++++++++++++++----------
 2 files changed, 22 insertions(+), 10 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 79bb3671a15e..0c6a274a2150 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -885,6 +885,8 @@ struct intel_context {
 		int pin_count;
 		struct i915_vma *lrc_vma;
 		u64 lrc_desc;
+		struct page *lrc_state_page;
+		uint32_t *lrc_reg_state;
 	} engine[I915_NUM_RINGS];
 
 	struct list_head link;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 94314b344f38..9bd20422cfbf 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -339,14 +339,7 @@ static int execlists_update_context(struct drm_i915_gem_request *rq)
 {
 	struct intel_engine_cs *ring = rq->ring;
 	struct i915_hw_ppgtt *ppgtt = rq->ctx->ppgtt;
-	struct drm_i915_gem_object *ctx_obj = rq->ctx->engine[ring->id].state;
-	struct page *page;
-	uint32_t *reg_state;
-
-	BUG_ON(!ctx_obj);
-
-	page = i915_gem_object_get_dirty_page(ctx_obj, LRC_STATE_PN);
-	reg_state = kmap_atomic(page);
+	uint32_t *reg_state = rq->ctx->engine[ring->id].lrc_reg_state;
 
 	reg_state[CTX_RING_TAIL+1] = rq->tail;
 	reg_state[CTX_RING_BUFFER_START+1] = rq->ringbuf->vma->node.start;
@@ -363,8 +356,6 @@ static int execlists_update_context(struct drm_i915_gem_request *rq)
 		ASSIGN_CTX_PDP(ppgtt, reg_state, 0);
 	}
 
-	kunmap_atomic(reg_state);
-
 	return 0;
 }
 
@@ -1050,6 +1041,8 @@ static int intel_lr_context_do_pin(struct intel_engine_cs *ring,
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
 	struct intel_ringbuffer *ringbuf = ctx->engine[ring->id].ringbuf;
+	struct page *lrc_state_page;
+	uint32_t *lrc_reg_state;
 	int ret;
 
 	WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
@@ -1059,12 +1052,26 @@ static int intel_lr_context_do_pin(struct intel_engine_cs *ring,
 	if (ret)
 		return ret;
 
+	lrc_state_page = i915_gem_object_get_dirty_page(ctx_obj, LRC_STATE_PN);
+	if (WARN_ON(!lrc_state_page)) {
+		ret = -ENODEV;
+		goto unpin_ctx_obj;
+	}
+
+	lrc_reg_state = kmap(lrc_state_page);
+	if (!lrc_reg_state) {
+		ret = -ENOMEM;
+		goto unpin_ctx_obj;
+	}
+
 	ret = intel_pin_and_map_ringbuffer_obj(ring->dev, ringbuf);
 	if (ret)
 		goto unpin_ctx_obj;
 
 	ctx->engine[ring->id].lrc_vma = i915_gem_obj_to_ggtt(ctx_obj);
 	intel_lr_context_descriptor_update(ctx, ring);
+	ctx->engine[ring->id].lrc_state_page = lrc_state_page;
+	ctx->engine[ring->id].lrc_reg_state = lrc_reg_state;
 	ctx_obj->dirty = true;
 
 	/* Invalidate GuC TLB. */
@@ -1105,10 +1112,13 @@ void intel_lr_context_unpin(struct drm_i915_gem_request *rq)
 	if (ctx_obj) {
 		WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
 		if (--rq->ctx->engine[ring->id].pin_count == 0) {
+			kunmap(rq->ctx->engine[ring->id].lrc_state_page);
 			intel_unpin_ringbuffer_obj(ringbuf);
 			i915_gem_object_ggtt_unpin(ctx_obj);
 			rq->ctx->engine[ring->id].lrc_vma = NULL;
 			rq->ctx->engine[ring->id].lrc_desc = 0;
+			rq->ctx->engine[ring->id].lrc_state_page = NULL;
+			rq->ctx->engine[ring->id].lrc_reg_state = NULL;
 		}
 	}
 }
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* [PATCH v3 4/7] drm/i915: Cache LRC state page in the context
  2016-01-11 14:29   ` Chris Wilson
  2016-01-11 15:07     ` Tvrtko Ursulin
  2016-01-12 11:43     ` [PATCH v2 " Tvrtko Ursulin
@ 2016-01-12 11:56     ` Tvrtko Ursulin
  2016-01-12 12:12       ` Chris Wilson
  2 siblings, 1 reply; 48+ messages in thread
From: Tvrtko Ursulin @ 2016-01-12 11:56 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

LRC lifetime is well defined so we can cache the page pointing
to the object backing store in the context in order to avoid
walking over the object SG page list from the interrupt context
without the big lock held.

v2: Also cache the mapping. (Chris Wilson)
v3: Unmap on the error path.

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h  |  2 ++
 drivers/gpu/drm/i915/intel_lrc.c | 34 +++++++++++++++++++++++-----------
 2 files changed, 25 insertions(+), 11 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 79bb3671a15e..0c6a274a2150 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -885,6 +885,8 @@ struct intel_context {
 		int pin_count;
 		struct i915_vma *lrc_vma;
 		u64 lrc_desc;
+		struct page *lrc_state_page;
+		uint32_t *lrc_reg_state;
 	} engine[I915_NUM_RINGS];
 
 	struct list_head link;
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 94314b344f38..213c4b789683 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -339,14 +339,7 @@ static int execlists_update_context(struct drm_i915_gem_request *rq)
 {
 	struct intel_engine_cs *ring = rq->ring;
 	struct i915_hw_ppgtt *ppgtt = rq->ctx->ppgtt;
-	struct drm_i915_gem_object *ctx_obj = rq->ctx->engine[ring->id].state;
-	struct page *page;
-	uint32_t *reg_state;
-
-	BUG_ON(!ctx_obj);
-
-	page = i915_gem_object_get_dirty_page(ctx_obj, LRC_STATE_PN);
-	reg_state = kmap_atomic(page);
+	uint32_t *reg_state = rq->ctx->engine[ring->id].lrc_reg_state;
 
 	reg_state[CTX_RING_TAIL+1] = rq->tail;
 	reg_state[CTX_RING_BUFFER_START+1] = rq->ringbuf->vma->node.start;
@@ -363,8 +356,6 @@ static int execlists_update_context(struct drm_i915_gem_request *rq)
 		ASSIGN_CTX_PDP(ppgtt, reg_state, 0);
 	}
 
-	kunmap_atomic(reg_state);
-
 	return 0;
 }
 
@@ -1050,6 +1041,8 @@ static int intel_lr_context_do_pin(struct intel_engine_cs *ring,
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
 	struct intel_ringbuffer *ringbuf = ctx->engine[ring->id].ringbuf;
+	struct page *lrc_state_page;
+	uint32_t *lrc_reg_state;
 	int ret;
 
 	WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
@@ -1059,12 +1052,26 @@ static int intel_lr_context_do_pin(struct intel_engine_cs *ring,
 	if (ret)
 		return ret;
 
+	lrc_state_page = i915_gem_object_get_dirty_page(ctx_obj, LRC_STATE_PN);
+	if (WARN_ON(!lrc_state_page)) {
+		ret = -ENODEV;
+		goto unpin_ctx_obj;
+	}
+
+	lrc_reg_state = kmap(lrc_state_page);
+	if (!lrc_reg_state) {
+		ret = -ENOMEM;
+		goto unpin_ctx_obj;
+	}
+
 	ret = intel_pin_and_map_ringbuffer_obj(ring->dev, ringbuf);
 	if (ret)
-		goto unpin_ctx_obj;
+		goto unmap_state_page;
 
 	ctx->engine[ring->id].lrc_vma = i915_gem_obj_to_ggtt(ctx_obj);
 	intel_lr_context_descriptor_update(ctx, ring);
+	ctx->engine[ring->id].lrc_state_page = lrc_state_page;
+	ctx->engine[ring->id].lrc_reg_state = lrc_reg_state;
 	ctx_obj->dirty = true;
 
 	/* Invalidate GuC TLB. */
@@ -1073,6 +1080,8 @@ static int intel_lr_context_do_pin(struct intel_engine_cs *ring,
 
 	return ret;
 
+unmap_state_page:
+	kunmap(lrc_state_page);
 unpin_ctx_obj:
 	i915_gem_object_ggtt_unpin(ctx_obj);
 
@@ -1105,10 +1114,13 @@ void intel_lr_context_unpin(struct drm_i915_gem_request *rq)
 	if (ctx_obj) {
 		WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
 		if (--rq->ctx->engine[ring->id].pin_count == 0) {
+			kunmap(rq->ctx->engine[ring->id].lrc_state_page);
 			intel_unpin_ringbuffer_obj(ringbuf);
 			i915_gem_object_ggtt_unpin(ctx_obj);
 			rq->ctx->engine[ring->id].lrc_vma = NULL;
 			rq->ctx->engine[ring->id].lrc_desc = 0;
+			rq->ctx->engine[ring->id].lrc_state_page = NULL;
+			rq->ctx->engine[ring->id].lrc_reg_state = NULL;
 		}
 	}
 }
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 4/7] drm/i915: Cache LRC state page in the context
  2016-01-12 11:56     ` [PATCH v3 " Tvrtko Ursulin
@ 2016-01-12 12:12       ` Chris Wilson
  2016-01-12 12:54         ` Tvrtko Ursulin
  0 siblings, 1 reply; 48+ messages in thread
From: Chris Wilson @ 2016-01-12 12:12 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-gfx

On Tue, Jan 12, 2016 at 11:56:11AM +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> LRC lifetime is well defined so we can cache the page pointing
> to the object backing store in the context in order to avoid
> walking over the object SG page list from the interrupt context
> without the big lock held.
> 
> v2: Also cache the mapping. (Chris Wilson)
> v3: Unmap on the error path.

Then we only use the lrc_state_page in the unmapping, hardly worth the
cost of saving it.

The reg_state is better associated with the ring (since it basically
contains the analog of the RING_HEAD and friends).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 4/7] drm/i915: Cache LRC state page in the context
  2016-01-12 12:12       ` Chris Wilson
@ 2016-01-12 12:54         ` Tvrtko Ursulin
  2016-01-12 13:11           ` Chris Wilson
  0 siblings, 1 reply; 48+ messages in thread
From: Tvrtko Ursulin @ 2016-01-12 12:54 UTC (permalink / raw)
  To: Chris Wilson, Intel-gfx, Tvrtko Ursulin


On 12/01/16 12:12, Chris Wilson wrote:
> On Tue, Jan 12, 2016 at 11:56:11AM +0000, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> LRC lifetime is well defined so we can cache the page pointing
>> to the object backing store in the context in order to avoid
>> walking over the object SG page list from the interrupt context
>> without the big lock held.
>>
>> v2: Also cache the mapping. (Chris Wilson)
>> v3: Unmap on the error path.
>
> Then we only use the lrc_state_page in the unmapping, hardly worth the
> cost of saving it.

Ok.

Do you also know if this would now require any flushing or something if 
previously kunmap_atomic might have done something under the covers?

> The reg_state is better associated with the ring (since it basically
> contains the analog of the RING_HEAD and friends).

Hm, not sure. The page belongs to the object from that anonymous struct 
in intel_context so I think it is best to keep them together.

Regards,

Tvrtko


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 4/7] drm/i915: Cache LRC state page in the context
  2016-01-12 12:54         ` Tvrtko Ursulin
@ 2016-01-12 13:11           ` Chris Wilson
  2016-01-12 16:45             ` Dave Gordon
  0 siblings, 1 reply; 48+ messages in thread
From: Chris Wilson @ 2016-01-12 13:11 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-gfx

On Tue, Jan 12, 2016 at 12:54:25PM +0000, Tvrtko Ursulin wrote:
> 
> On 12/01/16 12:12, Chris Wilson wrote:
> >On Tue, Jan 12, 2016 at 11:56:11AM +0000, Tvrtko Ursulin wrote:
> >>From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>
> >>LRC lifetime is well defined so we can cache the page pointing
> >>to the object backing store in the context in order to avoid
> >>walking over the object SG page list from the interrupt context
> >>without the big lock held.
> >>
> >>v2: Also cache the mapping. (Chris Wilson)
> >>v3: Unmap on the error path.
> >
> >Then we only use the lrc_state_page in the unmapping, hardly worth the
> >cost of saving it.
> 
> Ok.
> 
> Do you also know if this would now require any flushing or something
> if previously kunmap_atomic might have done something under the
> covers?

kmap_atomic only changes the PTE and the unmap would flush the TLB. In
terms of our pointer access, using kmap/kmap_atomic is equivalent. (Just
kmap_atomic is meant to be cheaper to set up and a limited resource
which can only be used without preemption.)
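
For our access pattern the difference is only in how the mapping is set up
and torn down, e.g. (just a sketch, not tied to any particular caller):

	/* atomic flavour: per-cpu slot, no sleeping between map and unmap */
	reg_state = kmap_atomic(page);
	reg_state[CTX_RING_TAIL+1] = tail;
	kunmap_atomic(reg_state);

	/* plain kmap: the mapping persists, so it can be cached across calls */
	reg_state = kmap(page);
	reg_state[CTX_RING_TAIL+1] = tail;
	kunmap(page);	/* takes the page, not the pointer */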

> >The reg_state is better associated with the ring (since it basically
> >contains the analog of the RING_HEAD and friends).
> 
> Hm, not sure. The page belongs to the object from that anonymous
> struct in intel_context so I think it is best to keep them together.

The page does sure, but the state does not.

The result is that we get:

ring->registers[CTX_RING_BUFFER_START+1] = ring->vma->node.start;
ASSIGN_CTX_PDP(ppgtt, ring->registers, 3);
ASSIGN_CTX_PDP(ppgtt, ring->registers, 2);
ASSIGN_CTX_PDP(ppgtt, ring->registers, 1);
ASSIGN_CTX_PDP(ppgtt, ring->registers, 0);
ring->registers[CTX_RING_TAIL+1] = req->tail;

which makes a lot more sense, to me, when viewed against the underlying
architecture of the hardware and comparing against the legacy ringbuffer.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v2 2/7] drm/i915: Do not call API requiring struct_mutex where it is not available
  2016-01-12 11:41   ` [PATCH v2 " Tvrtko Ursulin
@ 2016-01-12 15:47     ` Dave Gordon
  2016-01-13 16:16       ` [PATCH v3] " Tvrtko Ursulin
  0 siblings, 1 reply; 48+ messages in thread
From: Dave Gordon @ 2016-01-12 15:47 UTC (permalink / raw)
  To: Tvrtko Ursulin, Intel-gfx; +Cc: Daniel Vetter

On 12/01/16 11:41, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>
> LRC code was calling GEM API like i915_gem_obj_ggtt_offset from
> places where the struct_mutex cannot be grabbed (irq handlers).
>
> To avoid that this patch caches some interesting bits and values
> in the engine and context structures.
>
> Some usages are also removed where they are not needed like a
> few asserts which are either impossible or have been checked
> already during engine initialization.
>
> Side benefit is also that interrupt handlers and command
> submission stop evaluating invariant conditionals, like what
> Gen we are running on, on every interrupt and every command
> submitted.
>
> This patch deals with logical ring context id and descriptors
> while subsequent patches will deal with the remaining issues.
>
> v2:
>   * Cache the VMA instead of the address. (Chris Wilson)
>   * Incorporate Dave Gordon's good comments and function name.
>
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Dave Gordon <david.s.gordon@intel.com>
> ---
>   drivers/gpu/drm/i915/i915_debugfs.c     |  15 ++--
>   drivers/gpu/drm/i915/i915_drv.h         |   2 +
>   drivers/gpu/drm/i915/i915_gem_gtt.h     |   1 -
>   drivers/gpu/drm/i915/intel_lrc.c        | 126 +++++++++++++++++++-------------
>   drivers/gpu/drm/i915/intel_lrc.h        |   4 +-
>   drivers/gpu/drm/i915/intel_ringbuffer.h |   2 +
>   6 files changed, 90 insertions(+), 60 deletions(-)

I would like the new descriptor-related code to be organised as follows, 
rather than scattered around:

1.	a per-ring descriptor-template-setup function, essentially the
	block of code currently embedded in logical_ring_init(), that
	used to be part of intel_lr_context_descriptor(). Its purpose
	is to calculate and cache the invariant parts of all context
	descriptors for this engine.
2.	the per-context-pinning intel_lr_context_descriptor_update()
	should be next, because it uses the result from the above and
	calculates and caches a derived value.
3.	the trivial accessor intel_lr_context_descriptor() which returns
	the value calculated above.
4.	the nearly-trivial intel_execlists_ctx_id(), which ideally
	should call intel_lr_context_descriptor() rather than repeat the
	same set of []-> operations.

Keeping all these together in the source file makes it easier to check 
that definitions and assumptions in one match those made in the others, 
and means you can have just one block of comments relating to all of 
them. The whole block can go wherever you think fit, but probably near 
the top of the file is better, because these are leaf functions.
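
To illustrate (3) and (4) - a sketch based only on the code already in this
patch, so the two cannot drift apart:

uint64_t intel_lr_context_descriptor(struct intel_context *ctx,
				     struct intel_engine_cs *ring)
{
	return ctx->engine[ring->id].lrc_desc;
}

u32 intel_execlists_ctx_id(struct intel_context *ctx,
			   struct intel_engine_cs *ring)
{
	return intel_lr_context_descriptor(ctx, ring) >> GEN8_CTX_ID_SHIFT;
}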

Anyway, this is mostly a matter of style & maintainability. The code 
looks correct anyway, so even if you don't reorganise it, it gets:

Reviewed-by: Dave Gordon <david.s.gordon@intel.com>

and if you do reorganise it as described, you can keep the R-B too :)

> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index ab344e0b878c..504030ce7f93 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -265,7 +265,8 @@ int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists
>
>   /**
>    * intel_execlists_ctx_id() - get the Execlists Context ID
> - * @ctx_obj: Logical Ring Context backing object.
> + * @ctx: Context to get the ID for
> + * @ring: Engine to get the ID for
>    *
>    * Do not confuse with ctx->id! Unfortunately we have a name overload
>    * here: the old context ID we pass to userspace as a handler so that
> @@ -273,16 +274,15 @@ int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists
>    * ELSP so that the GPU can inform us of the context status via
>    * interrupts.
>    *
> + * The context ID is a portion of the context descriptor, so we can
> + * just extract the required part from the cached descriptor.
> + *
>    * Return: 20-bits globally unique context ID.
>    */
> -u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj)
> +u32 intel_execlists_ctx_id(struct intel_context *ctx,
> +			   struct intel_engine_cs *ring)
>   {
> -	u32 lrca = i915_gem_obj_ggtt_offset(ctx_obj) +
> -			LRC_PPHWSP_PN * PAGE_SIZE;
> -
> -	/* LRCA is required to be 4K aligned so the more significant 20 bits
> -	 * are globally unique */
> -	return lrca >> 12;
> +	return ctx->engine[ring->id].lrc_desc >> GEN8_CTX_ID_SHIFT;
>   }
>
>   static bool disable_lite_restore_wa(struct intel_engine_cs *ring)
> @@ -297,31 +297,7 @@ static bool disable_lite_restore_wa(struct intel_engine_cs *ring)
>   uint64_t intel_lr_context_descriptor(struct intel_context *ctx,
>   				     struct intel_engine_cs *ring)
>   {
> -	struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
> -	uint64_t desc;
> -	uint64_t lrca = i915_gem_obj_ggtt_offset(ctx_obj) +
> -			LRC_PPHWSP_PN * PAGE_SIZE;
> -
> -	WARN_ON(lrca & 0xFFFFFFFF00000FFFULL);
> -
> -	desc = GEN8_CTX_VALID;
> -	desc |= GEN8_CTX_ADDRESSING_MODE(dev) << GEN8_CTX_ADDRESSING_MODE_SHIFT;
> -	if (IS_GEN8(ctx_obj->base.dev))
> -		desc |= GEN8_CTX_L3LLC_COHERENT;
> -	desc |= GEN8_CTX_PRIVILEGE;
> -	desc |= lrca;
> -	desc |= (u64)intel_execlists_ctx_id(ctx_obj) << GEN8_CTX_ID_SHIFT;
> -
> -	/* TODO: WaDisableLiteRestore when we start using semaphore
> -	 * signalling between Command Streamers */
> -	/* desc |= GEN8_CTX_FORCE_RESTORE; */
> -
> -	/* WaEnableForceRestoreInCtxtDescForVCS:skl */
> -	/* WaEnableForceRestoreInCtxtDescForVCS:bxt */
> -	if (disable_lite_restore_wa(ring))
> -		desc |= GEN8_CTX_FORCE_RESTORE;
> -
> -	return desc;
> +	return ctx->engine[ring->id].lrc_desc;
>   }
>
>   static void execlists_elsp_write(struct drm_i915_gem_request *rq0,
> @@ -369,8 +345,6 @@ static int execlists_update_context(struct drm_i915_gem_request *rq)
>   	uint32_t *reg_state;
>
>   	BUG_ON(!ctx_obj);
> -	WARN_ON(!i915_gem_obj_is_pinned(ctx_obj));
> -	WARN_ON(!i915_gem_obj_is_pinned(rb_obj));
>
>   	page = i915_gem_object_get_dirty_page(ctx_obj, LRC_STATE_PN);
>   	reg_state = kmap_atomic(page);
> @@ -477,9 +451,7 @@ static bool execlists_check_remove_request(struct intel_engine_cs *ring,
>   					    execlist_link);
>
>   	if (head_req != NULL) {
> -		struct drm_i915_gem_object *ctx_obj =
> -				head_req->ctx->engine[ring->id].state;
> -		if (intel_execlists_ctx_id(ctx_obj) == request_id) {
> +		if (intel_execlists_ctx_id(head_req->ctx, ring) == request_id) {
>   			WARN(head_req->elsp_submitted == 0,
>   			     "Never submitted head request\n");
>
> @@ -556,7 +528,7 @@ void intel_lrc_irq_handler(struct intel_engine_cs *ring)
>   		}
>   	}
>
> -	if (disable_lite_restore_wa(ring)) {
> +	if (ring->disable_lite_restore_wa) {
>   		/* Prevent a ctx to preempt itself */
>   		if ((status & GEN8_CTX_STATUS_ACTIVE_IDLE) &&
>   		    (submit_contexts != 0))
> @@ -1038,15 +1010,51 @@ int logical_ring_flush_all_caches(struct drm_i915_gem_request *req)
>   	return 0;
>   }
>
> +/**
> + * intel_lr_context_descriptor_update() - calculate & cache the
> + *                      descriptor for a pinned
> + *                       context
> + * @ctx: Context to work on
> + * @ring: Engine the descriptor will be used with
> + *
> + * The context descriptor encodes various attributes of a context,
> + * including its GTT address and some flags. Because it's fairly
> + * expensive to calculate, we'll just do it once and cache the result,
> + * which remains valid until the context is unpinned.
> + *
> + * This is what a descriptor looks like, from LSB to MSB:
> + *    bits 0-11:    flags, GEN8_CTX_* (cached in ctx_desc_template)
> + *    bits 12-31:    LRCA, GTT address of (the HWSP of) this context
> + *    bits 32-51:    ctx ID, a globally unique tag (the LRCA again!)
> + *    bits 52-63:    reserved, may encode the engine ID (for GuC)
> + */
> +static void
> +intel_lr_context_descriptor_update(struct intel_context *ctx,
> +				   struct intel_engine_cs *ring)
> +{
> +	uint64_t lrca, desc;
> +
> +	lrca = ctx->engine[ring->id].lrc_vma->node.start +
> +	       LRC_PPHWSP_PN * PAGE_SIZE;
> +
> +	desc = ring->ctx_desc_template;			   /* bits  0-11 */
> +	desc |= lrca;					   /* bits 12-31 */
> +	desc |= (lrca >> PAGE_SHIFT) << GEN8_CTX_ID_SHIFT; /* bits 32-51 */
> +
> +	ctx->engine[ring->id].lrc_desc = desc;
> +}
> +
>   static int intel_lr_context_do_pin(struct intel_engine_cs *ring,
> -		struct drm_i915_gem_object *ctx_obj,
> -		struct intel_ringbuffer *ringbuf)
> +				   struct intel_context *ctx)
>   {
>   	struct drm_device *dev = ring->dev;
>   	struct drm_i915_private *dev_priv = dev->dev_private;
> -	int ret = 0;
> +	struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
> +	struct intel_ringbuffer *ringbuf = ctx->engine[ring->id].ringbuf;
> +	int ret;
>
>   	WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
> +
>   	ret = i915_gem_obj_ggtt_pin(ctx_obj, GEN8_LR_CONTEXT_ALIGN,
>   			PIN_OFFSET_BIAS | GUC_WOPCM_TOP);
>   	if (ret)
> @@ -1056,6 +1064,8 @@ static int intel_lr_context_do_pin(struct intel_engine_cs *ring,
>   	if (ret)
>   		goto unpin_ctx_obj;
>
> +	ctx->engine[ring->id].lrc_vma = i915_gem_obj_to_ggtt(ctx_obj);
> +	intel_lr_context_descriptor_update(ctx, ring);
>   	ctx_obj->dirty = true;
>
>   	/* Invalidate GuC TLB. */
> @@ -1074,11 +1084,9 @@ static int intel_lr_context_pin(struct drm_i915_gem_request *rq)
>   {
>   	int ret = 0;
>   	struct intel_engine_cs *ring = rq->ring;
> -	struct drm_i915_gem_object *ctx_obj = rq->ctx->engine[ring->id].state;
> -	struct intel_ringbuffer *ringbuf = rq->ringbuf;
>
>   	if (rq->ctx->engine[ring->id].pin_count++ == 0) {
> -		ret = intel_lr_context_do_pin(ring, ctx_obj, ringbuf);
> +		ret = intel_lr_context_do_pin(ring, rq->ctx);
>   		if (ret)
>   			goto reset_pin_count;
>   	}
> @@ -1100,6 +1108,8 @@ void intel_lr_context_unpin(struct drm_i915_gem_request *rq)
>   		if (--rq->ctx->engine[ring->id].pin_count == 0) {
>   			intel_unpin_ringbuffer_obj(ringbuf);
>   			i915_gem_object_ggtt_unpin(ctx_obj);
> +			rq->ctx->engine[ring->id].lrc_vma = NULL;
> +			rq->ctx->engine[ring->id].lrc_desc = 0;
>   		}
>   	}
>   }
> @@ -1938,6 +1948,9 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
>   		ring->status_page.obj = NULL;
>   	}
>
> +	ring->disable_lite_restore_wa = false;
> +	ring->ctx_desc_template = 0;
> +
>   	lrc_destroy_wa_ctx_obj(ring);
>   	ring->dev = NULL;
>   }
> @@ -1960,6 +1973,24 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
>   	INIT_LIST_HEAD(&ring->execlist_retired_req_list);
>   	spin_lock_init(&ring->execlist_lock);
>
> +	ring->disable_lite_restore_wa = disable_lite_restore_wa(ring);
> +
> +	ring->ctx_desc_template = GEN8_CTX_VALID;
> +	ring->ctx_desc_template |= GEN8_CTX_ADDRESSING_MODE(dev) <<
> +				   GEN8_CTX_ADDRESSING_MODE_SHIFT;
> +	if (IS_GEN8(dev))
> +		ring->ctx_desc_template |= GEN8_CTX_L3LLC_COHERENT;
> +	ring->ctx_desc_template |= GEN8_CTX_PRIVILEGE;
> +
> +	/* TODO: WaDisableLiteRestore when we start using semaphore
> +	 * signalling between Command Streamers */
> +	/* ring->ctx_desc_template |= GEN8_CTX_FORCE_RESTORE; */
> +
> +	/* WaEnableForceRestoreInCtxtDescForVCS:skl */
> +	/* WaEnableForceRestoreInCtxtDescForVCS:bxt */
> +	if (ring->disable_lite_restore_wa)
> +		ring->ctx_desc_template |= GEN8_CTX_FORCE_RESTORE;
> +
>   	ret = i915_cmd_parser_init_ring(ring);
>   	if (ret)
>   		goto error;
> @@ -1969,10 +2000,7 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
>   		goto error;
>
>   	/* As this is the default context, always pin it */
> -	ret = intel_lr_context_do_pin(
> -			ring,
> -			ring->default_context->engine[ring->id].state,
> -			ring->default_context->engine[ring->id].ringbuf);
> +	ret = intel_lr_context_do_pin(ring, ring->default_context);
>   	if (ret) {
>   		DRM_ERROR(
>   			"Failed to pin and map ringbuffer %s: %d\n",
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> index de41ad6cd63d..49af638f6213 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -107,13 +107,15 @@ void intel_lr_context_reset(struct drm_device *dev,
>   uint64_t intel_lr_context_descriptor(struct intel_context *ctx,
>   				     struct intel_engine_cs *ring);
>
> +u32 intel_execlists_ctx_id(struct intel_context *ctx,
> +			   struct intel_engine_cs *ring);
> +
>   /* Execlists */
>   int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists);
>   struct i915_execbuffer_params;
>   int intel_execlists_submission(struct i915_execbuffer_params *params,
>   			       struct drm_i915_gem_execbuffer2 *args,
>   			       struct list_head *vmas);
> -u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj);
>
>   void intel_lrc_irq_handler(struct intel_engine_cs *ring);
>   void intel_execlists_retire_requests(struct intel_engine_cs *ring);
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 7349d9258191..85ce2272f92c 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -269,6 +269,8 @@ struct  intel_engine_cs {
>   	struct list_head execlist_queue;
>   	struct list_head execlist_retired_req_list;
>   	u8 next_context_status_buffer;
> +	bool disable_lite_restore_wa;
> +	u32 ctx_desc_template;
>   	u32             irq_keep_mask; /* bitmask for interrupts that should not be masked */
>   	int		(*emit_request)(struct drm_i915_gem_request *request);
>   	int		(*emit_flush)(struct drm_i915_gem_request *request,
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 6/7] drm/i915: Only grab timestamps when needed
  2016-01-11 15:04     ` Tvrtko Ursulin
@ 2016-01-12 15:52       ` Dave Gordon
  2016-01-12 17:14         ` Daniel Vetter
  0 siblings, 1 reply; 48+ messages in thread
From: Dave Gordon @ 2016-01-12 15:52 UTC (permalink / raw)
  To: Tvrtko Ursulin, Chris Wilson, Intel-gfx

On 11/01/16 15:04, Tvrtko Ursulin wrote:
>
> On 11/01/16 14:36, Chris Wilson wrote:
>> On Mon, Jan 11, 2016 at 02:08:40PM +0000, Tvrtko Ursulin wrote:
>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>
>>> No need to call ktime_get_raw_ns twice per unlimited wait and can
>>> also eliminate a local variable.
>>
>> But we could eliminate both, and the unsightly pointless assignment only
>> required to shut gcc up.
>>
>> Still preferring my patch.
>
> Ah I remember it now.. you were storing it in the pointer provided by
> the caller. I think that is significantly worse, sorry cannot approve that.
>
> Regards,
>
> Tvrtko

Local variable good, pointer indirection through parameter bad.

Reviewed-by: Dave Gordon <david.s.gordon@intel.com>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 4/7] drm/i915: Cache LRC state page in the context
  2016-01-12 13:11           ` Chris Wilson
@ 2016-01-12 16:45             ` Dave Gordon
  2016-01-13  1:37               ` Chris Wilson
  2016-01-13 13:49               ` Tvrtko Ursulin
  0 siblings, 2 replies; 48+ messages in thread
From: Dave Gordon @ 2016-01-12 16:45 UTC (permalink / raw)
  To: Chris Wilson, Tvrtko Ursulin, Intel-gfx, Tvrtko Ursulin

On 12/01/16 13:11, Chris Wilson wrote:
> On Tue, Jan 12, 2016 at 12:54:25PM +0000, Tvrtko Ursulin wrote:
>>
>> On 12/01/16 12:12, Chris Wilson wrote:
>>> On Tue, Jan 12, 2016 at 11:56:11AM +0000, Tvrtko Ursulin wrote:
>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>
>>>> LRC lifetime is well defined so we can cache the page pointing
>>>> to the object backing store in the context in order to avoid
>>>> walking over the object SG page list from the interrupt context
>>>> without the big lock held.
>>>>
>>>> v2: Also cache the mapping. (Chris Wilson)
>>>> v3: Unmap on the error path.
>>>
>>> Then we only use the lrc_state_page in the unmapping, hardly worth the
>>> cost of saving it.
>>
>> Ok.
>>
>> Do you also know if this would now require any flushing or something
>> if previously kunmap_atomic might have done something under the
>> covers?
>
> kmap_atomic only changes the PTE and the unmap would flush the TLB. In
> terms of our pointer access, using kmap/kmap_atomic is equivalent. (Just
> kmap_atomic is meant to be cheaper to set up and a limited resource
> which can only be used without preemption.)
>
>>> The reg_state is better associated with the ring (since it basically
>>> contains the analog of the RING_HEAD and friends).
>>
>> Hm, not sure. The page belongs to the object from that anonymous
>> struct in intel_context so I think it is best to keep them together.
>
> The page does sure, but the state does not.
>
> The result is that we get:
>
> ring->registers[CTX_RING_BUFFER_START+1] = ring->vma->node.start;
> ASSIGN_CTX_PDP(ppgtt, ring->registers, 3);
> ASSIGN_CTX_PDP(ppgtt, ring->registers, 2);
> ASSIGN_CTX_PDP(ppgtt, ring->registers, 1);
> ASSIGN_CTX_PDP(ppgtt, ring->registers, 0);
> ring->registers[CTX_RING_TAIL+1] = req->tail;
>
> which makes a lot more sense, to me, when viewed against the underlying
> architecture of the hardware and comparing against the legacy ringbuffer.
> -Chris

When you say "ring", do you mean "engine" or "ringbuffer"?

The register state can't be associated with the engine /per se/, because 
it has to be per-context as well as per-engine. It doesn't really belong 
with the ringbuffer; in particular I've seen code for allocating the 
ringbuffer in stolen memory and dropping it during hibernation, whereas 
this register state shouldn't be lost across hibernation. So the 
lifetime of a register state image and a ringbuffer are different, 
therefore they don't belong together.

The register state is pretty much the /definition/ of a context, in 
hardware terms. OK, it's got an HWSP and other bits, but the register 
save area is what the h/w really cares about. So it makes sense that the 
kmapping for that area is also part of (the CPU's idea of) the context. Yes,

	ctx.engine[engine_id].registers[regno] = ...

is a bit clumsy, but I'd expect to cache the register-image pointer in a 
local here, along the lines of:

	uint32_t *registers = ctx.engine[engine_id].registers;
	registers[CTX_RING_TAIL+1] = req->tail;

etc.

[aside]
Some of this would be much less clumsy if the anonymous engine[]
struct had a name, say, "engine_info", so we could just do
	struct engine_info *eip = &ctx.engine[engine->id];
once at the beginning of any per-engine function and then use
eip->ringbuf, eip->state, eip->registers, etc without continually 
repeating the 'ctx.' and 'engine->id' bits over and over ...
[/aside]
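
With a name, the anonymous struct from this patch would read something like
this (the "engine_info" name is purely hypothetical):

	struct engine_info {
		struct drm_i915_gem_object *state;
		struct intel_ringbuffer *ringbuf;
		int pin_count;
		struct i915_vma *lrc_vma;
		u64 lrc_desc;
		struct page *lrc_state_page;
		uint32_t *lrc_reg_state;
	} engine[I915_NUM_RINGS];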

Apart from that, I think Tvrtko's patch has lost the dirtying from:

 > -	page = i915_gem_object_get_dirty_page(ctx_obj, LRC_STATE_PN);

in execlists_update_context(), so should add

	set_page_dirty(lrc_state_page)

instead (and that's the use for it, so you /do/ want to cache it).
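
i.e. roughly this shape in execlists_update_context() (sketch only, exact
placement up to you):

	uint32_t *reg_state = rq->ctx->engine[ring->id].lrc_reg_state;

	/* about to scribble on the state page, so keep it marked dirty */
	set_page_dirty(rq->ctx->engine[ring->id].lrc_state_page);

	reg_state[CTX_RING_TAIL+1] = rq->tail;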

.Dave.
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 6/7] drm/i915: Only grab timestamps when needed
  2016-01-12 15:52       ` Dave Gordon
@ 2016-01-12 17:14         ` Daniel Vetter
  2016-01-13 13:54           ` [PATCH v2] " Tvrtko Ursulin
  0 siblings, 1 reply; 48+ messages in thread
From: Daniel Vetter @ 2016-01-12 17:14 UTC (permalink / raw)
  To: Dave Gordon; +Cc: Intel-gfx

On Tue, Jan 12, 2016 at 03:52:36PM +0000, Dave Gordon wrote:
> On 11/01/16 15:04, Tvrtko Ursulin wrote:
> >
> >On 11/01/16 14:36, Chris Wilson wrote:
> >>On Mon, Jan 11, 2016 at 02:08:40PM +0000, Tvrtko Ursulin wrote:
> >>>From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>
> >>>No need to call ktime_get_raw_ns twice per unlimited wait and can
> >>>also eliminate a local variable.
> >>
> >>But we could eliminate both, and the unsightly pointless assignment only
> >>required to shut gcc up.
> >>
> >>Still preferring my patch.
> >
> >Ah I remember it now.. you were storing it in the pointer provided by
> >the caller. I think that is significantly worse, sorry cannot approve that.
> >
> >Regards,
> >
> >Tvrtko
> 
> Local variable good, pointer indirection through parameter bad.
> 
> Reviewed-by: Dave Gordon <david.s.gordon@intel.com>

Needs a comment like now = 0; /* shut up dense gcc */, with that acked.
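
I.e. just the shape of it, variable names from memory rather than the
exact driver code:

	s64 now = 0; /* shut up dense gcc - only read when timeout is set */

	if (timeout)
		now = ktime_get_raw_ns();

	/* ... wait for the request ... */

	if (timeout)
		*timeout -= ktime_get_raw_ns() - now;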
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 4/7] drm/i915: Cache LRC state page in the context
  2016-01-12 16:45             ` Dave Gordon
@ 2016-01-13  1:37               ` Chris Wilson
  2016-01-13 13:49               ` Tvrtko Ursulin
  1 sibling, 0 replies; 48+ messages in thread
From: Chris Wilson @ 2016-01-13  1:37 UTC (permalink / raw)
  To: Dave Gordon; +Cc: Intel-gfx

On Tue, Jan 12, 2016 at 04:45:02PM +0000, Dave Gordon wrote:
> On 12/01/16 13:11, Chris Wilson wrote:
> >On Tue, Jan 12, 2016 at 12:54:25PM +0000, Tvrtko Ursulin wrote:
> >>
> >>On 12/01/16 12:12, Chris Wilson wrote:
> >>>On Tue, Jan 12, 2016 at 11:56:11AM +0000, Tvrtko Ursulin wrote:
> >>>>From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>>>
> >>>>LRC lifetime is well defined so we can cache the page pointing
> >>>>to the object backing store in the context in order to avoid
> >>>>walking over the object SG page list from the interrupt context
> >>>>without the big lock held.
> >>>>
> >>>>v2: Also cache the mapping. (Chris Wilson)
> >>>>v3: Unmap on the error path.
> >>>
> >>>Then we only use the lrc_state_page in the unmapping, hardly worth the
> >>>cost of saving it.
> >>
> >>Ok.
> >>
> >>Do you also know if this would now require any flushing or something
> >>if previously kunmap_atomic might have done something under the
> >>covers?
> >
> >kmap_atomic only changes the PTE and the unmap would flush the TLB. In
> >terms of our pointer access, using kmap/kmap_atomic is equivalent. (Just
> >kmap_atomic is meant to be cheaper to set up and a limited resource
> >which can only be used without preemption.)
> >
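
For reference, the generic pattern being compared is roughly this (toy
example, nothing i915-specific about it):

	static u32 peek_reg(struct page *page, int regno)
	{
		u32 *vaddr, val;

		/* kmap(): may sleep, mapping stays valid until kunmap(page) */
		vaddr = kmap(page);
		val = vaddr[regno];
		kunmap(page);

		/* kmap_atomic(): cheaper to set up, but no sleeping allowed
		 * between map and unmap, and kunmap_atomic() takes the vaddr */
		vaddr = kmap_atomic(page);
		val = vaddr[regno];
		kunmap_atomic(vaddr);

		return val;
	}
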
> >>>The reg_state is better associated with the ring (since it basically
> >>>contains the analog of the RING_HEAD and friends).
> >>
> >>Hm, not sure. The page belongs to the object from that anonymous
> >>struct in intel_context so I think it is best to keep them together.
> >
> >The page does sure, but the state does not.
> >
> >The result is that we get:
> >
> >ring->registers[CTX_RING_BUFFER_STATE+1] = ring->vma->node.state;
> >ASSIGN_CTX_PDP(ppgtt, ring->registers, 3);
> >ASSIGN_CTX_PDP(ppgtt, ring->registers, 2);
> >ASSIGN_CTX_PDP(ppgtt, ring->registers, 1);
> >ASSIGN_CTX_PDP(ppgtt, ring->registers, 0);
> >ring->registers[CTX_RING_TAIL+1] = req->tail;
> >
> >which makes a lot more sense, to me, when viewed against the underlying
> >architecture of the hardware and comparing against the legacy ringbuffer.
> >-Chris
> 
> When you say "ring", do you mean "engine" or "ringbuffer"?

The ring; the buffer object, its ggtt assignment, its
virtual address, the cpu head/tail, its registers - pretty much
everything we update and use between pinning the context for a request
and submitting that request to the engine (via any intermediate). I also
included the ctx_descriptor on the ring for example.
 
> The register state can't be associated with the engine /per se/,
> because it has to be per-context as well as per-engine. It doesn't
> really belong with the ringbuffer; in particular I've seen code for
> allocating the ringbuffer in stolen memory and dropping it during
> hibernation, whereas this register state shouldn't be lost across
> hibernation. So the lifetime of a register state image and a
> ringbuffer are different, therefore they don't belong together.

Indeed, we are not talking about the backing storage for the GPU ring
buffer. Just the register kmap and associated transient state.

struct intel_ring, if you like, is the part of struct intel_context that
we use to build individual requests.

> The register state is pretty much the /definition/ of a context, in
> hardware terms. OK, it's got an HWSP and other bits, but the
> register save area is what the h/w really cares about. So it makes
> sense that the kmapping for that area is also part of (the CPU's
> idea of) the context.

The ring is part of the context, too. The entire view of a driver for a
particular process/state is the context.  The logical state for the GPU
is just a part of that.

> Yes,
> 
> 	ctx.engine[engine_id].registers[regno] = ...
> 
> is a bit clumsy, but I'd expect to cache the register-image pointer
> in a local here, along the lines of:
> 
> 	uint32_t *registers = ctx.engine[engine_id].registers;
> 	registers[CTX_RING_TAIL+1] = req->tail;

I liked ring->registers precisely because I felt it was more descriptive
here:

req->ring->registers[CTX_RING_TAIL+1] = req->tail; where req->ring is the
intel_ring and not the intel_engine_cs.

So instead of adding more to struct intel_context_engine that is only
to be used for request construction, I have been putting that same
information into struct intel_ring, so that the two methods of
constructing requests use the same structs, look roughly the same and
share more code.
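
To make that concrete, the sort of shape I have in mind is (hypothetical
sketch, none of these fields exist under these names today):

	struct intel_ring {
		struct drm_i915_gem_object *obj;	/* backing storage */
		struct i915_vma *vma;			/* GGTT binding */
		void *virtual_start;			/* CPU mapping of the ring */
		u32 head;
		u32 tail;

		/* execlists only: CPU mapping of the context register state
		 * and the cached descriptor we write into the ELSP */
		u32 *registers;
		u64 context_descriptor;
	};

so that request construction only ever needs to follow req->ring.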

> etc.
> 
> [aside]
> Some of this would be much less clumsy if the anonymous engine[]
> struct had a name, say, "engine_info", so we could just do

Many times I have said the same thing and tried to get it changed.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 7/7] drm/i915: GEM operations need to be done under the big lock
  2016-01-11 16:56               ` Chris Wilson
@ 2016-01-13 12:46                 ` Tvrtko Ursulin
  2016-01-13 13:36                   ` Imre Deak
  0 siblings, 1 reply; 48+ messages in thread
From: Tvrtko Ursulin @ 2016-01-13 12:46 UTC (permalink / raw)
  To: Chris Wilson, Ville Syrjälä, Intel-gfx, Imre Deak

On 11/01/16 16:56, Chris Wilson wrote:
> On Mon, Jan 11, 2016 at 05:36:46PM +0200, Ville Syrjälä wrote:
>> On Mon, Jan 11, 2016 at 03:16:16PM +0000, Tvrtko Ursulin wrote:
>>> Don't know, I leave this one to whoever grabbed the lock around
>>> intel_init_gt_powersave in the first place. Maybe there was a special
>>> reason.. after git blame of intel_display.c eventually completed, adding
>>> Imre and Ville to cc.
>>
>> Hmm. I don't recall the details anymore, but looking at the code pushing
>> the locking down to valleyview_setup_pctx() looks entirely reasonable to
>> me.
>
> iirc, this locking only exists to keep the WARN() at bay. But it is
> pedagogical, I guess.

Don't really know this area, but what about the 
intel_gen6_powersave_work->valleyview_enable_rps->valleyview_check_pctx 
which dereferences the dev_priv->vlv_pctx, which is set/cleared in 
valleyview_setup_pctx/valleyview_cleanup_pctx, which would now be 
outside both struct_mutex and the rps lock?

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 7/7] drm/i915: GEM operations need to be done under the big lock
  2016-01-13 12:46                 ` Tvrtko Ursulin
@ 2016-01-13 13:36                   ` Imre Deak
  2016-01-13 14:11                     ` Tvrtko Ursulin
  0 siblings, 1 reply; 48+ messages in thread
From: Imre Deak @ 2016-01-13 13:36 UTC (permalink / raw)
  To: Tvrtko Ursulin, Chris Wilson, Ville Syrjälä, Intel-gfx

On ke, 2016-01-13 at 12:46 +0000, Tvrtko Ursulin wrote:
> On 11/01/16 16:56, Chris Wilson wrote:
> > On Mon, Jan 11, 2016 at 05:36:46PM +0200, Ville Syrjälä wrote:
> > > On Mon, Jan 11, 2016 at 03:16:16PM +0000, Tvrtko Ursulin wrote:
> > > > Don't know, I leave this one to whoever grabbed the lock around
> > > > intel_init_gt_powersave in the first place. Maybe there was a
> > > > special
> > > > reason.. after git blame of intel_display.c eventually
> > > > completed, adding
> > > > Imre and Ville to cc.
> > > 
> > > Hmm. I don't recall the details anymore, but looking at the code
> > > pushing
> > > the locking down to valleyview_setup_pctx() looks entirely
> > > reasonable to
> > > me.
> > 
> > iirc, this locking only exists to keep the WARN() at bay. But it is
> > pedagogical, I guess.
> 
> Don't really know this area, but what about the 
> intel_gen6_powersave_work->valleyview_enable_rps-
> >valleyview_check_pctx 
> which dereferences the dev_priv->vlv_pctx, which is set/cleared in 
> valleyview_setup_pctx/valleyview_cleanup_pctx, which would now be 
> outside both struct_mutex and the rps lock?

dev_priv->vlv_pctx is not protected on the premise that the driver
init/cleanup functions can't race and gen6_powersave_work() is
scheduled only after intel_init_gt_powersave() and flushed
before intel_cleanup_gt_powersave().

rps_lock protects the RPS HW accesses themselves and struct_mutex was
taken for the GEM allocation. Taking it at high level around 
intel_init_gt_powersave() was kind of a copy&paste in the commit you
found, there is more on that in 79f5b2c75992.

--Imre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH v3 4/7] drm/i915: Cache LRC state page in the context
  2016-01-12 16:45             ` Dave Gordon
  2016-01-13  1:37               ` Chris Wilson
@ 2016-01-13 13:49               ` Tvrtko Ursulin
  1 sibling, 0 replies; 48+ messages in thread
From: Tvrtko Ursulin @ 2016-01-13 13:49 UTC (permalink / raw)
  To: Dave Gordon, Chris Wilson, Intel-gfx, Tvrtko Ursulin


On 12/01/16 16:45, Dave Gordon wrote:
> On 12/01/16 13:11, Chris Wilson wrote:
>> On Tue, Jan 12, 2016 at 12:54:25PM +0000, Tvrtko Ursulin wrote:
>>>
>>> On 12/01/16 12:12, Chris Wilson wrote:
>>>> On Tue, Jan 12, 2016 at 11:56:11AM +0000, Tvrtko Ursulin wrote:
>>>>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>>>>
>>>>> LRC lifetime is well defined so we can cache the page pointing
>>>>> to the object backing store in the context in order to avoid
>>>>> walking over the object SG page list from the interrupt context
>>>>> without the big lock held.
>>>>>
>>>>> v2: Also cache the mapping. (Chris Wilson)
>>>>> v3: Unmap on the error path.
>>>>
>>>> Then we only use the lrc_state_page in the unmapping, hardly worth the
>>>> cost of saving it.
>>>
>>> Ok.
>>>
>>> Do you also know if this would now require any flushing or something
>>> if previously kunmap_atomic might have done something under the
>>> covers?
>>
>> kmap_atomic only changes the PTE and the unmap would flush the TLB. In
>> terms of our pointer access, using kmap/kmap_atomic is equivalent. (Just
>> kmap_atomic is meant to be cheaper to set up and a limited resource
>> which can only be used without preemption.)
>>
>>>> The reg_state is better associated with the ring (since it basically
>>>> contains the analog of the RING_HEAD and friends).
>>>
>>> Hm, not sure. The page belongs to the object from that anonymous
>>> struct in intel_context so I think it is best to keep them together.
>>
>> The page does sure, but the state does not.
>>
>> The result is that we get:
>>
>> ring->registers[CTX_RING_BUFFER_STATE+1] = ring->vma->node.state;
>> ASSIGN_CTX_PDP(ppgtt, ring->registers, 3);
>> ASSIGN_CTX_PDP(ppgtt, ring->registers, 2);
>> ASSIGN_CTX_PDP(ppgtt, ring->registers, 1);
>> ASSIGN_CTX_PDP(ppgtt, ring->registers, 0);
>> ring->registers[CTX_RING_TAIL+1] = req->tail;
>>
>> which makes a lot more sense, to me, when viewed against the underlying
>> architecture of the hardware and comparing against the legacy ringbuffer.
>> -Chris
>
> When you say "ring", do you mean "engine" or "ringbuffer"?
>
> The register state can't be associated with the engine /per se/, because
> it has to be per-context as well as per-engine. It doesn't really belong
> with the ringbuffer; in particular I've seen code for allocating the
> ringbuffer in stolen memory and dropping it during hibernation, whereas
> this register state shouldn't be lost across hibernation. So the
> lifetime of a register state image and a ringbuffer are different,
> therefore they don't belong together.
>
> The register state is pretty much the /definition/ of a context, in
> hardware terms. OK, it's got an HWSP and other bits, but the register
> save area is what the h/w really cares about. So it makes sense that the
> kmapping for that area is also part of (the CPU's idea of) the context.
> Yes,
>
>      ctx.engine[engine_id].registers[regno] = ...
>
> is a bit clumsy, but I'd expect to cache the register-image pointer in a
> local here, along the lines of:
>
>      uint32_t *registers = ctx.engine[engine_id].registers;
>      registers[CTX_RING_TAIL+1] = req->tail;
>
> etc.
>
> [aside]
> Some of this would be much less clumsy if the anonymous engine[]
> struct had a name, say, "engine_info", so we could just do
>      struct engine_info *eip = &ctx.engines[engine->id];
> once at the beginning of any per-engine function and then use
> eip->ringbuf, eip->state, eip->registers, etc without continually
> repeating the 'ctx.' and 'engine->id' bits over and over ...
> [/aside]
>
> Apart from that, I think Tvrtko's patch has lost the dirtying from:
>
>  > -    page = i915_gem_object_get_dirty_page(ctx_obj, LRC_STATE_PN);
>
> in execlists_update_context(), so should add
>
>      set_page_dirty(lrc_state_page)
>
> instead (and that's the use for it, so you /do/ want to cache it).

It is still there (i915_gem_object_get_dirty_page) but at the point of 
pinning, not at each ctx update. Do you think that is a problem? I thought 
no one can flush and clear the dirty page while the ctx is pinned.
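
That is, the dirtying now happens once, at pin time, along these lines
(rough sketch, the field name is illustrative):

	/* in intel_lr_context_do_pin(), under struct_mutex */
	page = i915_gem_object_get_dirty_page(ctx_obj, LRC_STATE_PN);
	ctx->engine[ring->id].lrc_reg_state = kmap(page);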

On a more general level, can we agree to let this fix go in with the
pointer cached where it currently is (in the context), and leave the
anonymous struct - which we all dislike, and which I'd add should
probably be defined in intel_lrc.h - for a later cleanup?

If so I only need to respin with the struct page * caching removed.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v2] drm/i915: Only grab timestamps when needed
  2016-01-12 17:14         ` Daniel Vetter
@ 2016-01-13 13:54           ` Tvrtko Ursulin
  0 siblings, 0 replies; 48+ messages in thread
From: Tvrtko Ursulin @ 2016-01-13 13:54 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

No need to call ktime_get_raw_ns twice per unlimited wait and can
also eliminate a local variable.

v2: Added comment about silencing the compiler warning. (Daniel Vetter)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Reviewed-by: Dave Gordon <david.s.gordon@intel.com>
Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch>
---
 drivers/gpu/drm/i915/i915_gem.c | 12 +++++++-----
 1 file changed, 7 insertions(+), 5 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index ddc21d4b388d..6b0102da859c 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1251,7 +1251,7 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	int state = interruptible ? TASK_INTERRUPTIBLE : TASK_UNINTERRUPTIBLE;
 	DEFINE_WAIT(wait);
 	unsigned long timeout_expire;
-	s64 before, now;
+	s64 before = 0; /* Only to silence a compiler warning. */
 	int ret;
 
 	WARN(!intel_irqs_enabled(dev_priv), "IRQs disabled");
@@ -1271,14 +1271,17 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 			return -ETIME;
 
 		timeout_expire = jiffies + nsecs_to_jiffies_timeout(*timeout);
+
+		/*
+		 * Record current time in case interrupted by signal, or wedged.
+		 */
+		before = ktime_get_raw_ns();
 	}
 
 	if (INTEL_INFO(dev_priv)->gen >= 6)
 		gen6_rps_boost(dev_priv, rps, req->emitted_jiffies);
 
-	/* Record current time in case interrupted by signal, or wedged */
 	trace_i915_gem_request_wait_begin(req);
-	before = ktime_get_raw_ns();
 
 	/* Optimistic spin for the next jiffie before touching IRQs */
 	ret = __i915_spin_request(req, state);
@@ -1343,11 +1346,10 @@ int __i915_wait_request(struct drm_i915_gem_request *req,
 	finish_wait(&ring->irq_queue, &wait);
 
 out:
-	now = ktime_get_raw_ns();
 	trace_i915_gem_request_wait_end(req);
 
 	if (timeout) {
-		s64 tres = *timeout - (now - before);
+		s64 tres = *timeout - (ktime_get_raw_ns() - before);
 
 		*timeout = tres < 0 ? 0 : tres;
 
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH 7/7] drm/i915: GEM operations need to be done under the big lock
  2016-01-13 13:36                   ` Imre Deak
@ 2016-01-13 14:11                     ` Tvrtko Ursulin
  2016-01-13 14:32                       ` Chris Wilson
  0 siblings, 1 reply; 48+ messages in thread
From: Tvrtko Ursulin @ 2016-01-13 14:11 UTC (permalink / raw)
  To: imre.deak, Chris Wilson, Ville Syrjälä, Intel-gfx


On 13/01/16 13:36, Imre Deak wrote:
> On ke, 2016-01-13 at 12:46 +0000, Tvrtko Ursulin wrote:
>> On 11/01/16 16:56, Chris Wilson wrote:
>>> On Mon, Jan 11, 2016 at 05:36:46PM +0200, Ville Syrjälä wrote:
>>>> On Mon, Jan 11, 2016 at 03:16:16PM +0000, Tvrtko Ursulin wrote:
>>>>> Don't know, I leave this one to whoever grabbed the lock around
>>>>> intel_init_gt_powersave in the first place. Maybe there was a
>>>>> special
>>>>> reason.. after git blame of intel_display.c eventually
>>>>> completed, adding
>>>>> Imre and Ville to cc.
>>>>
>>>> Hmm. I don't recall the details anymore, but looking at the code
>>>> pushing
>>>> the locking down to valleyview_setup_pctx() looks entirely
>>>> reasonable to
>>>> me.
>>>
>>> iirc, this locking only exists to keep the WARN() at bay. But it is
>>> pedagogical, I guess.
>>
>> Don't really know this area, but what about the
>> intel_gen6_powersave_work->valleyview_enable_rps-
>>> valleyview_check_pctx
>> which dereferences the dev_priv->vlv_pctx, which is set/cleared in
>> valleyview_setup_pctx/valleyview_cleanup_pctx, which would now be
>> outside both struct_mutex and the rps lock?
>
> dev_priv->vlv_pctx is not protected on the premise that the driver
> init/cleanup functions can't race and gen6_powersave_work() is
> scheduled only after intel_init_gt_powersave() and flushed
> before intel_cleanup_gt_powersave().
>
> rps_lock protects the RPS HW accesses themselves and struct_mutex was
> taken for the GEM allocation. Taking it at high level around
> intel_init_gt_powersave() was kind of a copy&paste in the commit you
> found, there is more on that in 79f5b2c75992.

Thanks for digging this out.

It is more involved than what Chris pasted then, and while I hoped to be 
able to quickly shove that into a patch, I cannot allow the time for 
a more extensive analysis.

So my patch fixes the issue of struct_mutex not being held in 
intel_alloc_initial_plane_obj while calling 
i915_gem_object_create_stolen_for_preallocated and I'll leave 
struct_mutex/rps.lock locking inversion in RPM to someone else.

Regards,

Tvrtko
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 7/7] drm/i915: GEM operations need to be done under the big lock
  2016-01-13 14:11                     ` Tvrtko Ursulin
@ 2016-01-13 14:32                       ` Chris Wilson
  2016-01-13 14:41                         ` Imre Deak
  0 siblings, 1 reply; 48+ messages in thread
From: Chris Wilson @ 2016-01-13 14:32 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-gfx

On Wed, Jan 13, 2016 at 02:11:42PM +0000, Tvrtko Ursulin wrote:
> 
> On 13/01/16 13:36, Imre Deak wrote:
> >On ke, 2016-01-13 at 12:46 +0000, Tvrtko Ursulin wrote:
> >>On 11/01/16 16:56, Chris Wilson wrote:
> >>>On Mon, Jan 11, 2016 at 05:36:46PM +0200, Ville Syrjälä wrote:
> >>>>On Mon, Jan 11, 2016 at 03:16:16PM +0000, Tvrtko Ursulin wrote:
> >>>>>Don't know, I leave this one to whoever grabbed the lock around
> >>>>>intel_init_gt_powersave in the first place. Maybe there was a
> >>>>>special
> >>>>>reason.. after git blame of intel_display.c eventually
> >>>>>completed, adding
> >>>>>Imre and Ville to cc.
> >>>>
> >>>>Hmm. I don't recall the details anymore, but looking at the code
> >>>>pushing
> >>>>the locking down to valleyview_setup_pctx() looks entirely
> >>>>reasonable to
> >>>>me.
> >>>
> >>>iirc, this locking only exists to keep the WARN() at bay. But it is
> >>>pedagogical, I guess.
> >>
> >>Don't really know this area, but what about the
> >>intel_gen6_powersave_work->valleyview_enable_rps-
> >>>valleyview_check_pctx
> >>which dereferences the dev_priv->vlv_pctx, which is set/cleared in
> >>valleyview_setup_pctx/valleyview_cleanup_pctx, which would now be
> >>outside both struct_mutex and the rps lock?
> >
> >dev_priv->vlv_pctx is not protected on the premise that the driver
> >init/cleanup functions can't race and gen6_powersave_work() is
> >scheduled only after intel_init_gt_powersave() and flushed
> >before intel_cleanup_gt_powersave().
> >
> >rps_lock protects the RPS HW accesses themselves and struct_mutex was
> >taken for the GEM allocation. Taking it at high level around
> >intel_init_gt_powersave() was kind of a copy&paste in the commit you
> >found, there is more on that in 79f5b2c75992.
> 
> Thanks for digging this out.
> 
> It is more involved than what Chris pasted then, and while I hoped
> to be able to quickly shove that into a patch, I cannot allow the
> time for a more extensive analysis.

Imre just confirmed that struct_mutex is only for the GEM
allocations in the vlv_setup_ctx, right Imre?
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 7/7] drm/i915: GEM operations need to be done under the big lock
  2016-01-13 14:32                       ` Chris Wilson
@ 2016-01-13 14:41                         ` Imre Deak
  2016-01-13 14:53                           ` Tvrtko Ursulin
  0 siblings, 1 reply; 48+ messages in thread
From: Imre Deak @ 2016-01-13 14:41 UTC (permalink / raw)
  To: Chris Wilson, Tvrtko Ursulin; +Cc: Intel-gfx

On ke, 2016-01-13 at 14:32 +0000, Chris Wilson wrote:
> On Wed, Jan 13, 2016 at 02:11:42PM +0000, Tvrtko Ursulin wrote:
> > 
> > On 13/01/16 13:36, Imre Deak wrote:
> > > On ke, 2016-01-13 at 12:46 +0000, Tvrtko Ursulin wrote:
> > > > On 11/01/16 16:56, Chris Wilson wrote:
> > > > > On Mon, Jan 11, 2016 at 05:36:46PM +0200, Ville Syrjälä
> > > > > wrote:
> > > > > > On Mon, Jan 11, 2016 at 03:16:16PM +0000, Tvrtko Ursulin
> > > > > > wrote:
> > > > > > > Don't know, I leave this one to whoever grabbed the lock
> > > > > > > around
> > > > > > > intel_init_gt_powersave in the first place. Maybe there
> > > > > > > was a
> > > > > > > special
> > > > > > > reason.. after git blame of intel_display.c
> > > > > > > completed, adding
> > > > > > > Imre and Ville to cc.
> > > > > > 
> > > > > > Hmm. I don't recall the details anymore, but looking at the
> > > > > > code
> > > > > > pushing
> > > > > > the locking down to valleyview_setup_pctx() looks entirely
> > > > > > reasonable to
> > > > > > me.
> > > > > 
> > > > > iirc, this locking only exists to keep the WARN() at bay. But
> > > > > it is
> > > > > pedagogical, I guess.
> > > > 
> > > > Don't really know this area, but what about the
> > > > intel_gen6_powersave_work->valleyview_enable_rps-
> > > > > valleyview_check_pctx
> > > > which dereferences the dev_priv->vlv_pctx, which is set/cleared
> > > > in
> > > > valleyview_setup_pctx/valleyview_cleanup_pctx, which would now
> > > > be
> > > > outside both struct_mutex and the rps lock?
> > > 
> > > dev_priv->vlv_pctx is not protected on the premise that the
> > > driver
> > > init/cleanup functions can't race and gen6_powersave_work() is
> > > scheduled only after intel_init_gt_powersave() and flushed
> > > before intel_cleanup_gt_powersave().
> > > 
> > > rps_lock protects the RPS HW accesses themselves and struct_mutex
> > > was
> > > taken for the GEM allocation. Taking it at high level around
> > > intel_init_gt_powersave() was kind of a copy&paste in the commit
> > > you
> > > found, there is more on that in 79f5b2c75992.
> > 
> > Thanks for digging this out.
> > 
> > It is more involved than what Chris pasted then, and while I hoped
> > to be able to quickly shove that into a patch, I cannot allow the
> > time for a more extensive analysis.
> 
> Imre just confirmed that struct_mutex is only for the GEM
> allocations in the vlv_setup_ctx, right Imre?

Yes, that's my understanding.

--Imre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 7/7] drm/i915: GEM operations need to be done under the big lock
  2016-01-13 14:41                         ` Imre Deak
@ 2016-01-13 14:53                           ` Tvrtko Ursulin
  2016-01-13 15:25                             ` Imre Deak
  0 siblings, 1 reply; 48+ messages in thread
From: Tvrtko Ursulin @ 2016-01-13 14:53 UTC (permalink / raw)
  To: imre.deak, Chris Wilson; +Cc: Intel-gfx


On 13/01/16 14:41, Imre Deak wrote:
> On ke, 2016-01-13 at 14:32 +0000, Chris Wilson wrote:
>> On Wed, Jan 13, 2016 at 02:11:42PM +0000, Tvrtko Ursulin wrote:
>>>
>>> On 13/01/16 13:36, Imre Deak wrote:
>>>> On ke, 2016-01-13 at 12:46 +0000, Tvrtko Ursulin wrote:
>>>>> On 11/01/16 16:56, Chris Wilson wrote:
>>>>>> On Mon, Jan 11, 2016 at 05:36:46PM +0200, Ville Syrjälä
>>>>>> wrote:
>>>>>>> On Mon, Jan 11, 2016 at 03:16:16PM +0000, Tvrtko Ursulin
>>>>>>> wrote:
>>>>>>>> Don't know, I leave this one to whoever grabbed the lock
>>>>>>>> around
>>>>>>>> intel_init_gt_powersave in the first place. Maybe there
>>>>>>>> was a
>>>>>>>> special
>>>>>>>> reason.. after git blame of intel_display.c eventually
>>>>>>>> completed, adding
>>>>>>>> Imre and Ville to cc.
>>>>>>>
>>>>>>> Hmm. I don't recall the details anymore, but looking at the
>>>>>>> code
>>>>>>> pushing
>>>>>>> the locking down to valleyview_setup_pctx() looks entirely
>>>>>>> reasonable to
>>>>>>> me.
>>>>>>
>>>>>> iirc, this locking only exists to keep the WARN() at bay. But
>>>>>> it is
>>>>>> pedagogical, I guess.
>>>>>
>>>>> Don't really know this area, but what about the
>>>>> intel_gen6_powersave_work->valleyview_enable_rps-
>>>>>> valleyview_check_pctx
>>>>> which dereferences the dev_priv->vlv_pctx, which is set/cleared
>>>>> in
>>>>> valleyview_setup_pctx/valleyview_cleanup_pctx, which would now
>>>>> be
>>>>> outside both struct_mutex and the rps lock?
>>>>
>>>> dev_priv->vlv_pctx is not protected on the premise that the
>>>> driver
>>>> init/cleanup functions can't race and gen6_powersave_work() is
>>>> scheduled only after intel_init_gt_powersave() and flushed
>>>> before intel_cleanup_gt_powersave().
>>>>
>>>> rps_lock protects the RPS HW accesses themselves and struct_mutex
>>>> was
>>>> taken for the GEM allocation. Taking it at high level around
>>>> intel_init_gt_powersave() was kind of a copy&paste in the commit
>>>> you
>>>> found, there is more on that in 79f5b2c75992.
>>>
>>> Thanks for digging this out.
>>>
>>> It is more involved than what Chris pasted then, and while I hoped
>>> to be able to quickly shove that into a patch, I cannot allow the
>>> time for more the extensive analysis.
>>
>> Imre just confirmed that struct_mutex is only for the GEM
>> allocations in the vlv_setup_ctx, right Imre?
>
> Yes, that's my understanding.

I glanced at 79f5b2c75992 and saw it pulled out a bunch of locking from 
lower down to the top level so it sounded like doing the reverse would 
not be straightforward. Perhaps I overestimated it.

Regards,

Tvrtko





_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 7/7] drm/i915: GEM operations need to be done under the big lock
  2016-01-13 14:53                           ` Tvrtko Ursulin
@ 2016-01-13 15:25                             ` Imre Deak
  2016-01-13 15:55                               ` Daniel Vetter
  0 siblings, 1 reply; 48+ messages in thread
From: Imre Deak @ 2016-01-13 15:25 UTC (permalink / raw)
  To: Tvrtko Ursulin, Chris Wilson; +Cc: Intel-gfx

On ke, 2016-01-13 at 14:53 +0000, Tvrtko Ursulin wrote:
> On 13/01/16 14:41, Imre Deak wrote:
> > On ke, 2016-01-13 at 14:32 +0000, Chris Wilson wrote:
> > > On Wed, Jan 13, 2016 at 02:11:42PM +0000, Tvrtko Ursulin wrote:
> > > > 
> > > > On 13/01/16 13:36, Imre Deak wrote:
> > > > > On ke, 2016-01-13 at 12:46 +0000, Tvrtko Ursulin wrote:
> > > > > > On 11/01/16 16:56, Chris Wilson wrote:
> > > > > > > On Mon, Jan 11, 2016 at 05:36:46PM +0200, Ville Syrjälä
> > > > > > > wrote:
> > > > > > > > On Mon, Jan 11, 2016 at 03:16:16PM +0000, Tvrtko
> > > > > > > > Ursulin
> > > > > > > > wrote:
> > > > > > > > > Don't know, I leave this one to whoever grabbed the
> > > > > > > > > lock
> > > > > > > > > around
> > > > > > > > > intel_init_gt_powersave in the first place. Maybe
> > > > > > > > > there
> > > > > > > > > was a
> > > > > > > > > special
> > > > > > > > > reason.. after git blame of intel_display.c
> > > > > > > > > eventually
> > > > > > > > > completed, adding
> > > > > > > > > Imre and Ville to cc.
> > > > > > > > 
> > > > > > > > Hmm. I don't recall the details anymore, but looking at
> > > > > > > > the
> > > > > > > > code
> > > > > > > > pushing
> > > > > > > > the locking down to valleyview_setup_pctx() looks
> > > > > > > > entirely
> > > > > > > > reasonable to
> > > > > > > > me.
> > > > > > > 
> > > > > > > iirc, this locking only exists to keep the WARN() at bay.
> > > > > > > But
> > > > > > > it is
> > > > > > > pedagogical, I guess.
> > > > > > 
> > > > > > Don't really know this area, but what about the
> > > > > > intel_gen6_powersave_work->valleyview_enable_rps-
> > > > > > > valleyview_check_pctx
> > > > > > which dereferences the dev_priv->vlv_pctx, which is
> > > > > > set/cleared
> > > > > > in
> > > > > > valleyview_setup_pctx/valleyview_cleanup_pctx, which would
> > > > > > now
> > > > > > be
> > > > > > outside both struct_mutex and the rps lock?
> > > > > 
> > > > > dev_priv->vlv_pctx is not protected on the premise that the
> > > > > driver
> > > > > init/cleanup functions can't race and gen6_powersave_work()
> > > > > is
> > > > > scheduled only after intel_init_gt_powersave() and flushed
> > > > > before intel_cleanup_gt_powersave().
> > > > > 
> > > > > rps_lock protects the RPS HW accesses themselves and
> > > > > struct_mutex
> > > > > was
> > > > > taken for the GEM allocation. Taking it at high level around
> > > > > intel_init_gt_powersave() was kind of a copy&paste in the
> > > > > commit
> > > > > you
> > > > > found, there is more on that in 79f5b2c75992.
> > > > 
> > > > Thanks for digging this out.
> > > > 
> > > > It is more involved than what Chris pasted then, and while I
> > > > hoped
> > > > to be able to quickly shove that into a patch, I cannot allow
> > > > the
> > > > time for a more extensive analysis.
> > > 
> > > Imre just confirmed that struct_mutex is only for the GEM
> > > allocations in the vlv_setup_ctx, right Imre?
> > 
> > Yes, that's my understanding.
> 
> I glanced at 79f5b2c75992 and saw it pulled out a bunch of locking
> from 
> lower down to the top level so it sounded like doing the reverse
> would 
> not be straightforward. Perhaps I overestimated it.

Ok, some more background info :)

Initially the HW access was also protected by struct_mutex but
4fc688ce7 added rps_lock for just the HW access. So as I understand
we don't need struct_mutex for anything else besides the GEM
allocation. So feel free to add my ACK on the change moving the lock to
vlv_setup_ctx().
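
I.e. roughly this shape (sketch only, untested, and it skips the
BIOS-preallocated PCBR path):

	static void valleyview_setup_pctx(struct drm_device *dev)
	{
		struct drm_i915_private *dev_priv = dev->dev_private;
		struct drm_i915_gem_object *pctx;

		/* Only the GEM/stolen allocation needs the big lock, so take
		 * it here rather than around intel_init_gt_powersave(). */
		mutex_lock(&dev->struct_mutex);
		pctx = i915_gem_object_create_stolen(dev, 24 * 1024);
		mutex_unlock(&dev->struct_mutex);
		if (!pctx) {
			DRM_DEBUG("not enough stolen space for PCTX, disabling\n");
			return;
		}

		dev_priv->vlv_pctx = pctx;
		/* ... then program VLV_PCBR with the pctx address as before ... */
	}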

Btw, we could also remove struct_mutex from intel_enable_gt_powersave()
where it's taken for old platforms and rely on mchdev_lock or take
rps_lock instead, but someone needs to double-check this.

--Imre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 48+ messages in thread

* Re: [PATCH 7/7] drm/i915: GEM operations need to be done under the big lock
  2016-01-13 15:25                             ` Imre Deak
@ 2016-01-13 15:55                               ` Daniel Vetter
  0 siblings, 0 replies; 48+ messages in thread
From: Daniel Vetter @ 2016-01-13 15:55 UTC (permalink / raw)
  To: Imre Deak; +Cc: Intel-gfx

On Wed, Jan 13, 2016 at 05:25:09PM +0200, Imre Deak wrote:
> On ke, 2016-01-13 at 14:53 +0000, Tvrtko Ursulin wrote:
> > On 13/01/16 14:41, Imre Deak wrote:
> > > On ke, 2016-01-13 at 14:32 +0000, Chris Wilson wrote:
> > > > On Wed, Jan 13, 2016 at 02:11:42PM +0000, Tvrtko Ursulin wrote:
> > > > > 
> > > > > On 13/01/16 13:36, Imre Deak wrote:
> > > > > > On ke, 2016-01-13 at 12:46 +0000, Tvrtko Ursulin wrote:
> > > > > > > On 11/01/16 16:56, Chris Wilson wrote:
> > > > > > > > On Mon, Jan 11, 2016 at 05:36:46PM +0200, Ville Syrjälä
> > > > > > > > wrote:
> > > > > > > > > On Mon, Jan 11, 2016 at 03:16:16PM +0000, Tvrtko
> > > > > > > > > Ursulin
> > > > > > > > > wrote:
> > > > > > > > > > Don't know, I leave this one to whoever grabbed the
> > > > > > > > > > lock
> > > > > > > > > > around
> > > > > > > > > > intel_init_gt_powersave in the first place. Maybe
> > > > > > > > > > there
> > > > > > > > > > was a
> > > > > > > > > > special
> > > > > > > > > > reason.. after git blame of intel_display.c
> > > > > > > > > > eventually
> > > > > > > > > > completed, adding
> > > > > > > > > > Imre and Ville to cc.
> > > > > > > > > 
> > > > > > > > > Hmm. I don't recall the details anymore, but looking at
> > > > > > > > > the
> > > > > > > > > code
> > > > > > > > > pushing
> > > > > > > > > the locking down to valleyview_setup_pctx() looks
> > > > > > > > > entirely
> > > > > > > > > reasonable to
> > > > > > > > > me.
> > > > > > > > 
> > > > > > > > iirc, this locking only exists to keep the WARN() at bay.
> > > > > > > > But
> > > > > > > > it is
> > > > > > > > pedagogical, I guess.
> > > > > > > 
> > > > > > > Don't really know this area, but what about the
> > > > > > > intel_gen6_powersave_work->valleyview_enable_rps-
> > > > > > > > valleyview_check_pctx
> > > > > > > which dereferences the dev_priv->vlv_pctx, which is
> > > > > > > set/cleared
> > > > > > > in
> > > > > > > valleyview_setup_pctx/valleyview_cleanup_pctx, which would
> > > > > > > now
> > > > > > > be
> > > > > > > outside both struct_mutex and the rps lock?
> > > > > > 
> > > > > > dev_priv->vlv_pctx is not protected on the premise that the
> > > > > > driver
> > > > > > init/cleanup functions can't race and gen6_powersave_work()
> > > > > > is
> > > > > > scheduled only after intel_init_gt_powersave() and flushed
> > > > > > before intel_cleanup_gt_powersave().
> > > > > > 
> > > > > > rps_lock protects the RPS HW accesses themselves and
> > > > > > struct_mutex
> > > > > > was
> > > > > > taken for the GEM allocation. Taking it at high level around
> > > > > > intel_init_gt_powersave() was kind of a copy&paste in the
> > > > > > commit
> > > > > > you
> > > > > > found, there is more on that in 79f5b2c75992.
> > > > > 
> > > > > Thanks for digging this out.
> > > > > 
> > > > > It is more involved than what Chris pasted then, and while I
> > > > > hoped
> > > > > to be able to quickly shove that into a patch, I cannot allow
> > > > > the
> > > > > time for a more extensive analysis.
> > > > 
> > > > Imre just confirmed that struct_mutex is only for the GEM
> > > > allocations in the vlv_setup_ctx, right Imre?
> > > 
> > > Yes, that's my understanding.
> > 
> > I glanced at 79f5b2c75992 and saw it pulled out a bunch of locking
> > from 
> > lower down to the top level so it sounded like doing the reverse
> > would 
> > not be straightforward. Perhaps I overestimated it.
> 
> Ok, some more background info :)
> 
> Initially the HW access was also protected by struct_mutex but
> 4fc688ce7 added rps_lock for just the HW access. So as I understand
> we don't need struct_mutex for anything else besides the GEM
> allocation. So feel free to add my ACK on the change moving the lock to
> vlv_setup_ctx().
> 
> Btw, we could also remove struct_mutex from intel_enable_gt_powersave()
> where it's taken for old platforms and rely on mchdev_lock or take
> rps_lock instead, but someone needs to double-check this.

For more context we needed dev->struct_mutex for ilk rc6 support, which
needed a powerctx allocated from gem (not just stolen). That support died
in fire in

commit a561165493e5fec2f74bd3ae0577ed659e44ab7f
Author: John Harrison <John.C.Harrison@Intel.com>
Date:   Thu Mar 5 14:03:03 2015 +0000

    drm/i915: Remove ironlake rc6 support

Maybe reference that too, on top of the commit Imre dug out?

Thanks, Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 48+ messages in thread

* [PATCH v3] drm/i915: Do not call API requiring struct_mutex where it is not available
  2016-01-12 15:47     ` Dave Gordon
@ 2016-01-13 16:16       ` Tvrtko Ursulin
  2016-01-13 19:34         ` Chris Wilson
  0 siblings, 1 reply; 48+ messages in thread
From: Tvrtko Ursulin @ 2016-01-13 16:16 UTC (permalink / raw)
  To: Intel-gfx; +Cc: Daniel Vetter

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

LRC code was calling GEM API like i915_gem_obj_ggtt_offset from
places where the struct_mutex cannot be grabbed (irq handlers).

To avoid that this patch caches some interesting bits and values
in the engine and context structures.

Some usages are also removed where they are not needed like a
few asserts which are either impossible or have been checked
already during engine initialization.

A side benefit is that interrupt handlers and command
submission stop evaluating invariant conditionals, like what
Gen we are running on, on every interrupt and every command
submitted.

This patch deals with logical ring context id and descriptors
while subsequent patches will deal with the remaining issues.

v2:
 * Cache the VMA instead of the address. (Chris Wilson)
 * Incorporate Dave Gordon's good comments and function name.

v3:
 * Extract ctx descriptor template to a function and group
   functions dealing with ctx descriptor & co together near
   top of the file. (Dave Gordon)

Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
Cc: Dave Gordon <david.s.gordon@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c     |  15 ++--
 drivers/gpu/drm/i915/i915_drv.h         |   2 +
 drivers/gpu/drm/i915/i915_gem_gtt.h     |   1 -
 drivers/gpu/drm/i915/intel_lrc.c        | 151 +++++++++++++++++++-------------
 drivers/gpu/drm/i915/intel_lrc.h        |   4 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h |   2 +
 6 files changed, 103 insertions(+), 72 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index e3377abc0d4d..0b3550f05026 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1994,12 +1994,13 @@ static int i915_context_status(struct seq_file *m, void *unused)
 }
 
 static void i915_dump_lrc_obj(struct seq_file *m,
-			      struct intel_engine_cs *ring,
-			      struct drm_i915_gem_object *ctx_obj)
+			      struct intel_context *ctx,
+			      struct intel_engine_cs *ring)
 {
 	struct page *page;
 	uint32_t *reg_state;
 	int j;
+	struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
 	unsigned long ggtt_offset = 0;
 
 	if (ctx_obj == NULL) {
@@ -2009,7 +2010,7 @@ static void i915_dump_lrc_obj(struct seq_file *m,
 	}
 
 	seq_printf(m, "CONTEXT: %s %u\n", ring->name,
-		   intel_execlists_ctx_id(ctx_obj));
+		   intel_execlists_ctx_id(ctx, ring));
 
 	if (!i915_gem_obj_ggtt_bound(ctx_obj))
 		seq_puts(m, "\tNot bound in GGTT\n");
@@ -2058,8 +2059,7 @@ static int i915_dump_lrc(struct seq_file *m, void *unused)
 	list_for_each_entry(ctx, &dev_priv->context_list, link) {
 		for_each_ring(ring, dev_priv, i) {
 			if (ring->default_context != ctx)
-				i915_dump_lrc_obj(m, ring,
-						  ctx->engine[i].state);
+				i915_dump_lrc_obj(m, ctx, ring);
 		}
 	}
 
@@ -2133,11 +2133,8 @@ static int i915_execlists(struct seq_file *m, void *data)
 
 		seq_printf(m, "\t%d requests in queue\n", count);
 		if (head_req) {
-			struct drm_i915_gem_object *ctx_obj;
-
-			ctx_obj = head_req->ctx->engine[ring_id].state;
 			seq_printf(m, "\tHead request id: %u\n",
-				   intel_execlists_ctx_id(ctx_obj));
+				   intel_execlists_ctx_id(head_req->ctx, ring));
 			seq_printf(m, "\tHead request tail: %u\n",
 				   head_req->tail);
 		}
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 104bd1809936..c900db272b96 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -883,6 +883,8 @@ struct intel_context {
 		struct drm_i915_gem_object *state;
 		struct intel_ringbuffer *ringbuf;
 		int pin_count;
+		struct i915_vma *lrc_vma;
+		u64 lrc_desc;
 	} engine[I915_NUM_RINGS];
 
 	struct list_head link;
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.h b/drivers/gpu/drm/i915/i915_gem_gtt.h
index b448ad832dcf..e5737963ab79 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.h
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.h
@@ -44,7 +44,6 @@ typedef uint64_t gen8_ppgtt_pml4e_t;
 
 #define gtt_total_entries(gtt) ((gtt).base.total >> PAGE_SHIFT)
 
-
 /* gen6-hsw has bit 11-4 for physical addr bit 39-32 */
 #define GEN6_GTT_ADDR_ENCODE(addr)	((addr) | (((addr) >> 28) & 0xff0))
 #define GEN6_PTE_ADDR_ENCODE(addr)	GEN6_GTT_ADDR_ENCODE(addr)
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 5027699c5291..1a0d85325072 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -263,65 +263,92 @@ int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists
 	return 0;
 }
 
+static void
+logical_ring_init_platform_invariants(struct intel_engine_cs *ring)
+{
+	struct drm_device *dev = ring->dev;
+
+	ring->disable_lite_restore_wa = (IS_SKL_REVID(dev, 0, SKL_REVID_B0) ||
+					IS_BXT_REVID(dev, 0, BXT_REVID_A1)) &&
+					(ring->id == VCS || ring->id == VCS2);
+
+	ring->ctx_desc_template = GEN8_CTX_VALID;
+	ring->ctx_desc_template |= GEN8_CTX_ADDRESSING_MODE(dev) <<
+				   GEN8_CTX_ADDRESSING_MODE_SHIFT;
+	if (IS_GEN8(dev))
+		ring->ctx_desc_template |= GEN8_CTX_L3LLC_COHERENT;
+	ring->ctx_desc_template |= GEN8_CTX_PRIVILEGE;
+
+	/* TODO: WaDisableLiteRestore when we start using semaphore
+	 * signalling between Command Streamers */
+	/* ring->ctx_desc_template |= GEN8_CTX_FORCE_RESTORE; */
+
+	/* WaEnableForceRestoreInCtxtDescForVCS:skl */
+	/* WaEnableForceRestoreInCtxtDescForVCS:bxt */
+	if (ring->disable_lite_restore_wa)
+		ring->ctx_desc_template |= GEN8_CTX_FORCE_RESTORE;
+}
+
 /**
- * intel_execlists_ctx_id() - get the Execlists Context ID
- * @ctx_obj: Logical Ring Context backing object.
+ * intel_lr_context_descriptor_update() - calculate & cache the descriptor
+ * 					  for a pinned context
  *
- * Do not confuse with ctx->id! Unfortunately we have a name overload
- * here: the old context ID we pass to userspace as a handler so that
- * they can refer to a context, and the new context ID we pass to the
- * ELSP so that the GPU can inform us of the context status via
- * interrupts.
+ * @ctx: Context to work on
+ * @ring: Engine the descriptor will be used with
  *
- * Return: 20-bits globally unique context ID.
+ * The context descriptor encodes various attributes of a context,
+ * including its GTT address and some flags. Because it's fairly
+ * expensive to calculate, we'll just do it once and cache the result,
+ * which remains valid until the context is unpinned.
+ *
+ * This is what a descriptor looks like, from LSB to MSB:
+ *    bits 0-11:    flags, GEN8_CTX_* (cached in ctx_desc_template)
+ *    bits 12-31:    LRCA, GTT address of (the HWSP of) this context
+ *    bits 32-51:    ctx ID, a globally unique tag (the LRCA again!)
+ *    bits 52-63:    reserved, may encode the engine ID (for GuC)
  */
-u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj)
+static void
+intel_lr_context_descriptor_update(struct intel_context *ctx,
+				   struct intel_engine_cs *ring)
 {
-	u32 lrca = i915_gem_obj_ggtt_offset(ctx_obj) +
-			LRC_PPHWSP_PN * PAGE_SIZE;
+	uint64_t lrca, desc;
 
-	/* LRCA is required to be 4K aligned so the more significant 20 bits
-	 * are globally unique */
-	return lrca >> 12;
-}
+	lrca = ctx->engine[ring->id].lrc_vma->node.start +
+	       LRC_PPHWSP_PN * PAGE_SIZE;
 
-static bool disable_lite_restore_wa(struct intel_engine_cs *ring)
-{
-	struct drm_device *dev = ring->dev;
+	desc = ring->ctx_desc_template;			   /* bits  0-11 */
+	desc |= lrca;					   /* bits 12-31 */
+	desc |= (lrca >> PAGE_SHIFT) << GEN8_CTX_ID_SHIFT; /* bits 32-51 */
 
-	return (IS_SKL_REVID(dev, 0, SKL_REVID_B0) ||
-		IS_BXT_REVID(dev, 0, BXT_REVID_A1)) &&
-	       (ring->id == VCS || ring->id == VCS2);
+	ctx->engine[ring->id].lrc_desc = desc;
 }
 
 uint64_t intel_lr_context_descriptor(struct intel_context *ctx,
 				     struct intel_engine_cs *ring)
 {
-	struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
-	uint64_t desc;
-	uint64_t lrca = i915_gem_obj_ggtt_offset(ctx_obj) +
-			LRC_PPHWSP_PN * PAGE_SIZE;
-
-	WARN_ON(lrca & 0xFFFFFFFF00000FFFULL);
-
-	desc = GEN8_CTX_VALID;
-	desc |= GEN8_CTX_ADDRESSING_MODE(dev) << GEN8_CTX_ADDRESSING_MODE_SHIFT;
-	if (IS_GEN8(ctx_obj->base.dev))
-		desc |= GEN8_CTX_L3LLC_COHERENT;
-	desc |= GEN8_CTX_PRIVILEGE;
-	desc |= lrca;
-	desc |= (u64)intel_execlists_ctx_id(ctx_obj) << GEN8_CTX_ID_SHIFT;
-
-	/* TODO: WaDisableLiteRestore when we start using semaphore
-	 * signalling between Command Streamers */
-	/* desc |= GEN8_CTX_FORCE_RESTORE; */
-
-	/* WaEnableForceRestoreInCtxtDescForVCS:skl */
-	/* WaEnableForceRestoreInCtxtDescForVCS:bxt */
-	if (disable_lite_restore_wa(ring))
-		desc |= GEN8_CTX_FORCE_RESTORE;
+	return ctx->engine[ring->id].lrc_desc;
+}
 
-	return desc;
+/**
+ * intel_execlists_ctx_id() - get the Execlists Context ID
+ * @ctx: Context to get the ID for
+ * @ring: Engine to get the ID for
+ *
+ * Do not confuse with ctx->id! Unfortunately we have a name overload
+ * here: the old context ID we pass to userspace as a handler so that
+ * they can refer to a context, and the new context ID we pass to the
+ * ELSP so that the GPU can inform us of the context status via
+ * interrupts.
+ *
+ * The context ID is a portion of the context descriptor, so we can
+ * just extract the required part from the cached descriptor.
+ *
+ * Return: 20-bits globally unique context ID.
+ */
+u32 intel_execlists_ctx_id(struct intel_context *ctx,
+			   struct intel_engine_cs *ring)
+{
+	return intel_lr_context_descriptor(ctx, ring) >> GEN8_CTX_ID_SHIFT;
 }
 
 static void execlists_elsp_write(struct drm_i915_gem_request *rq0,
@@ -369,8 +396,6 @@ static int execlists_update_context(struct drm_i915_gem_request *rq)
 	uint32_t *reg_state;
 
 	BUG_ON(!ctx_obj);
-	WARN_ON(!i915_gem_obj_is_pinned(ctx_obj));
-	WARN_ON(!i915_gem_obj_is_pinned(rb_obj));
 
 	page = i915_gem_object_get_dirty_page(ctx_obj, LRC_STATE_PN);
 	reg_state = kmap_atomic(page);
@@ -477,9 +502,7 @@ static bool execlists_check_remove_request(struct intel_engine_cs *ring,
 					    execlist_link);
 
 	if (head_req != NULL) {
-		struct drm_i915_gem_object *ctx_obj =
-				head_req->ctx->engine[ring->id].state;
-		if (intel_execlists_ctx_id(ctx_obj) == request_id) {
+		if (intel_execlists_ctx_id(head_req->ctx, ring) == request_id) {
 			WARN(head_req->elsp_submitted == 0,
 			     "Never submitted head request\n");
 
@@ -556,7 +579,7 @@ void intel_lrc_irq_handler(struct intel_engine_cs *ring)
 		}
 	}
 
-	if (disable_lite_restore_wa(ring)) {
+	if (ring->disable_lite_restore_wa) {
 		/* Prevent a ctx to preempt itself */
 		if ((status & GEN8_CTX_STATUS_ACTIVE_IDLE) &&
 		    (submit_contexts != 0))
@@ -1039,14 +1062,16 @@ int logical_ring_flush_all_caches(struct drm_i915_gem_request *req)
 }
 
 static int intel_lr_context_do_pin(struct intel_engine_cs *ring,
-		struct drm_i915_gem_object *ctx_obj,
-		struct intel_ringbuffer *ringbuf)
+				   struct intel_context *ctx)
 {
 	struct drm_device *dev = ring->dev;
 	struct drm_i915_private *dev_priv = dev->dev_private;
-	int ret = 0;
+	struct drm_i915_gem_object *ctx_obj = ctx->engine[ring->id].state;
+	struct intel_ringbuffer *ringbuf = ctx->engine[ring->id].ringbuf;
+	int ret;
 
 	WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
+
 	ret = i915_gem_obj_ggtt_pin(ctx_obj, GEN8_LR_CONTEXT_ALIGN,
 			PIN_OFFSET_BIAS | GUC_WOPCM_TOP);
 	if (ret)
@@ -1056,6 +1081,8 @@ static int intel_lr_context_do_pin(struct intel_engine_cs *ring,
 	if (ret)
 		goto unpin_ctx_obj;
 
+	ctx->engine[ring->id].lrc_vma = i915_gem_obj_to_ggtt(ctx_obj);
+	intel_lr_context_descriptor_update(ctx, ring);
 	ctx_obj->dirty = true;
 
 	/* Invalidate GuC TLB. */
@@ -1074,11 +1101,9 @@ static int intel_lr_context_pin(struct drm_i915_gem_request *rq)
 {
 	int ret = 0;
 	struct intel_engine_cs *ring = rq->ring;
-	struct drm_i915_gem_object *ctx_obj = rq->ctx->engine[ring->id].state;
-	struct intel_ringbuffer *ringbuf = rq->ringbuf;
 
 	if (rq->ctx->engine[ring->id].pin_count++ == 0) {
-		ret = intel_lr_context_do_pin(ring, ctx_obj, ringbuf);
+		ret = intel_lr_context_do_pin(ring, rq->ctx);
 		if (ret)
 			goto reset_pin_count;
 	}
@@ -1100,6 +1125,8 @@ void intel_lr_context_unpin(struct drm_i915_gem_request *rq)
 		if (--rq->ctx->engine[ring->id].pin_count == 0) {
 			intel_unpin_ringbuffer_obj(ringbuf);
 			i915_gem_object_ggtt_unpin(ctx_obj);
+			rq->ctx->engine[ring->id].lrc_vma = NULL;
+			rq->ctx->engine[ring->id].lrc_desc = 0;
 		}
 	}
 }
@@ -1938,6 +1965,9 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 		ring->status_page.obj = NULL;
 	}
 
+	ring->disable_lite_restore_wa = false;
+	ring->ctx_desc_template = 0;
+
 	lrc_destroy_wa_ctx_obj(ring);
 	ring->dev = NULL;
 }
@@ -1988,6 +2018,8 @@ logical_ring_init(struct drm_device *dev, struct intel_engine_cs *ring)
 	INIT_LIST_HEAD(&ring->execlist_retired_req_list);
 	spin_lock_init(&ring->execlist_lock);
 
+	logical_ring_init_platform_invariants(ring);
+
 	ret = i915_cmd_parser_init_ring(ring);
 	if (ret)
 		goto error;
@@ -1997,10 +2029,7 @@ logical_ring_init(struct drm_device *dev, struct intel_engine_cs *ring)
 		goto error;
 
 	/* As this is the default context, always pin it */
-	ret = intel_lr_context_do_pin(
-			ring,
-			ring->default_context->engine[ring->id].state,
-			ring->default_context->engine[ring->id].ringbuf);
+	ret = intel_lr_context_do_pin(ring, ring->default_context);
 	if (ret) {
 		DRM_ERROR(
 			"Failed to pin and map ringbuffer %s: %d\n",
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index de41ad6cd63d..49af638f6213 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -107,13 +107,15 @@ void intel_lr_context_reset(struct drm_device *dev,
 uint64_t intel_lr_context_descriptor(struct intel_context *ctx,
 				     struct intel_engine_cs *ring);
 
+u32 intel_execlists_ctx_id(struct intel_context *ctx,
+			   struct intel_engine_cs *ring);
+
 /* Execlists */
 int intel_sanitize_enable_execlists(struct drm_device *dev, int enable_execlists);
 struct i915_execbuffer_params;
 int intel_execlists_submission(struct i915_execbuffer_params *params,
 			       struct drm_i915_gem_execbuffer2 *args,
 			       struct list_head *vmas);
-u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj);
 
 void intel_lrc_irq_handler(struct intel_engine_cs *ring);
 void intel_execlists_retire_requests(struct intel_engine_cs *ring);
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 7349d9258191..85ce2272f92c 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -269,6 +269,8 @@ struct  intel_engine_cs {
 	struct list_head execlist_queue;
 	struct list_head execlist_retired_req_list;
 	u8 next_context_status_buffer;
+	bool disable_lite_restore_wa;
+	u32 ctx_desc_template;
 	u32             irq_keep_mask; /* bitmask for interrupts that should not be masked */
 	int		(*emit_request)(struct drm_i915_gem_request *request);
 	int		(*emit_flush)(struct drm_i915_gem_request *request,
-- 
1.9.1

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply related	[flat|nested] 48+ messages in thread

* Re: [PATCH v3] drm/i915: Do not call API requiring struct_mutex where it is not available
  2016-01-13 16:16       ` [PATCH v3] " Tvrtko Ursulin
@ 2016-01-13 19:34         ` Chris Wilson
  0 siblings, 0 replies; 48+ messages in thread
From: Chris Wilson @ 2016-01-13 19:34 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Daniel Vetter, Intel-gfx

On Wed, Jan 13, 2016 at 04:16:21PM +0000, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> LRC code was calling GEM API like i915_gem_obj_ggtt_offset from
> places where the struct_mutex cannot be grabbed (irq handlers).
> 
> To avoid that this patch caches some interesting bits and values
> in the engine and context structures.
> 
> Some usages are also removed where they are not needed like a
> few asserts which are either impossible or have been checked
> already during engine initialization.
> 
> A side benefit is that interrupt handlers and command
> submission stop evaluating invariant conditionals, like what
> Gen we are running on, on every interrupt and every command
> submitted.
> 
> This patch deals with logical ring context id and descriptors
> while subsequent patches will deal with the remaining issues.
> 
> v2:
>  * Cache the VMA instead of the address. (Chris Wilson)
>  * Incorporate Dave Gordon's good comments and function name.
> 
> v3:
>  * Extract ctx descriptor template to a function and group
>    functions dealing with ctx descriptor & co together near
>    top of the file. (Dave Gordon)
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> Cc: Daniel Vetter <daniel.vetter@ffwll.ch>
> Cc: Dave Gordon <david.s.gordon@intel.com>

Considering the current state of affairs, this is not bad. I have a
different shade of slightly-off-white that I like but that applies to a
world where execlists is more request focused, e.g.:

static u32 execlists_request_write_tail(struct drm_i915_gem_request *req)
{
        struct intel_ring *ring = req->ring;
        struct i915_hw_ppgtt *ppgtt = req->ctx->ppgtt;

        if (ppgtt && !USES_FULL_48BIT_PPGTT(req->i915)) {
                /* True 32b PPGTT with dynamic page allocation: update PDP
                 * registers and point the unallocated PDPs to scratch page.
                 * PML4 is allocated during ppgtt init, so this is not needed
                 * in 48-bit mode.
                 */
                if (ppgtt->pd_dirty_rings & intel_engine_flag(req->engine)) {
                        ASSIGN_CTX_PDP(ppgtt, ring->registers, 3);
                        ASSIGN_CTX_PDP(ppgtt, ring->registers, 2);
                        ASSIGN_CTX_PDP(ppgtt, ring->registers, 1);
                        ASSIGN_CTX_PDP(ppgtt, ring->registers, 0);
                        ppgtt->pd_dirty_rings &= ~intel_engine_flag(req->engine);
                }
        }

        ring->registers[CTX_RING_TAIL+1] = req->tail;
        return ring->context_descriptor;
}

where the req->ring is much more readily available than
	struct intel_context_engine *ce = &req->ctx->engine[req->engine->id];
though a req->context_engine would shut me up entirely. 


> +/**
> + * intel_execlists_ctx_id() - get the Execlists Context ID
> + * @ctx: Context to get the ID for
> + * @ring: Engine to get the ID for
> + *
> + * Do not confuse with ctx->id! Unfortunately we have a name overload
> + * here: the old context ID we pass to userspace as a handler so that
> + * they can refer to a context, and the new context ID we pass to the
> + * ELSP so that the GPU can inform us of the context status via
> + * interrupts.
> + *
> + * The context ID is a portion of the context descriptor, so we can
> + * just extract the required part from the cached descriptor.
> + *
> + * Return: 20-bits globally unique context ID.
> + */
> +u32 intel_execlists_ctx_id(struct intel_context *ctx,
> +			   struct intel_engine_cs *ring)

I would suggest intel_execlists_status_tag (or at least offer it as a
starting point for a non-conflicting name).

> +{
> +	return intel_lr_context_descriptor(ctx, ring) >> GEN8_CTX_ID_SHIFT;
>  }
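
That cached descriptor is also what buys the lockless interrupt path
the commit message talks about: matching a context against the ID
reported in the context status buffer becomes a plain compare. A
sketch of the idea (not code from the patch):

/* Sketch only: compare the ctx id reported by the CSB against the
 * cached descriptor, with no GEM API and no struct_mutex involved. */
static bool request_matches_csb(struct drm_i915_gem_request *req,
				u32 csb_ctx_id)
{
	return intel_execlists_ctx_id(req->ctx, req->ring) == csb_ctx_id;
}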

Next up on the hit list is struct intel_context_engine!

> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 7349d9258191..85ce2272f92c 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -269,6 +269,8 @@ struct  intel_engine_cs {
>  	struct list_head execlist_queue;
>  	struct list_head execlist_retired_req_list;
>  	u8 next_context_status_buffer;
> +	bool disable_lite_restore_wa;

The holes, the holes!
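
To spell out the complaint: a one byte flag dropped between wider
members leaves pahole-visible padding behind. A toy layout, assuming a
64-bit build, with field names borrowed from the hunk purely for
flavour (the offsets are for this struct only, not intel_engine_cs):

#include <linux/types.h>

struct padding_example {
	struct list_head queue;		/* offset  0, size 16 */
	bool disable_lite_restore_wa;	/* offset 16, size  1 */
					/* 7 bytes of padding (hole) */
	void *status_page;		/* offset 24, size  8 */
	u32 ctx_desc_template;		/* offset 32, size  4 */
					/* 4 bytes of tail padding */
};					/* total size 40 */

Packing the flag next to an existing u8 or u32 (or folding it into an
existing word as a bit) keeps the structure 8 bytes smaller.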

> +	u32 ctx_desc_template;

I would suggest this is better named something like
execlist_context_base_descriptor, since we have already been using the
execlist_ prefix to distinguish lrc-specific state, and being verbose
doesn't really hurt when we only use it twice.

Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre