* [PATCH 0/4] lrc lifecycle cleanups
@ 2015-10-06 14:52 Nick Hoath
  2015-10-06 14:52 ` [PATCH 1/4] drm/i915: Unify execlist and legacy request life-cycles Nick Hoath
                   ` (3 more replies)
  0 siblings, 4 replies; 24+ messages in thread
From: Nick Hoath @ 2015-10-06 14:52 UTC (permalink / raw)
  To: intel-gfx

These changes are a result of the requests made in VIZ-4277.
Make the lrc path more like the legacy submission path.
Attach the CPU mappings to vma (un)bind, so that the shrinker
also cleans those up.
Pin the CPU mappings while the context is busy (has pending bbs), so
that the mappings aren't repeatedly created and released, as this is
an expensive process.

Nick Hoath (4):
  drm/i915: Unify execlist and legacy request life-cycles
  drm/i915: Improve dynamic management/eviction of lrc backing objects
  drm/i915: Add the CPU mapping of the hw context to the pinned items.
  drm/i915: Only update ringbuf address when necessary

 drivers/gpu/drm/i915/i915_debugfs.c     |  14 ++--
 drivers/gpu/drm/i915/i915_drv.h         |  14 +++-
 drivers/gpu/drm/i915/i915_gem.c         |  70 +++++++++++++----
 drivers/gpu/drm/i915/i915_gem_gtt.c     |   8 ++
 drivers/gpu/drm/i915/i915_irq.c         |  81 +++++++++++++-------
 drivers/gpu/drm/i915/intel_lrc.c        | 131 ++++++++++++++++++--------------
 drivers/gpu/drm/i915/intel_lrc.h        |   2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.c |  71 +++++++++--------
 drivers/gpu/drm/i915/intel_ringbuffer.h |   4 -
 9 files changed, 250 insertions(+), 145 deletions(-)

-- 
1.9.1


* [PATCH 1/4] drm/i915: Unify execlist and legacy request life-cycles
  2015-10-06 14:52 [PATCH 0/4] lrc lifecycle cleanups Nick Hoath
@ 2015-10-06 14:52 ` Nick Hoath
  2015-10-07 16:03   ` Daniel Vetter
  2015-10-08 12:32   ` Chris Wilson
  2015-10-06 14:52 ` [PATCH 2/4] drm/i915: Improve dynamic management/eviction of lrc backing objects Nick Hoath
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 24+ messages in thread
From: Nick Hoath @ 2015-10-06 14:52 UTC (permalink / raw)
  To: intel-gfx

There is a desire to simplify the i915 driver by reducing the number of
different code paths introduced by the LRC / execlists support.  As the
execlists request is now part of the gem request it is possible and
desirable to unify the request life-cycles for execlist and legacy
requests.

Added a context complete flag to a request which gets set during the
context switch interrupt.

Added a function i915_gem_request_retireable().  A request is considered
retireable if its seqno passed (i.e. the request has completed) and either
it was never submitted to the ELSP or its context completed.  This ensures
that context save is carried out before the last request for a context is
considered retireable.  retire_requests_ring() now uses
i915_gem_request_retireable() rather than request_complete() when deciding
which requests to retire. Requests that were not waiting for a context
switch interrupt (either as a result of being merged into a following
request or by being a legacy request) will be considered retireable as
soon as their seqno has passed.

Removed the extra request reference held for the execlist request.

Removed intel_execlists_retire_requests() and all references to
intel_engine_cs.execlist_retired_req_list.

Moved context unpinning into retire_requests_ring() for now.  Further work
is pending for the context pinning - this patch should allow us to use the
active list to track context and ring buffer objects later.

Changed gen8_cs_irq_handler() so that notify_ring() is called when
contexts complete as well as when a user interrupt occurs so that
notification happens when a request is complete and context save has
finished.

v2: Rebase over the read-read optimisation changes

v3: Reworked IRQ handler after removing IRQ handler cleanup patch

v4: Fixed various pin leaks

Issue: VIZ-4277
Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         |  6 +++
 drivers/gpu/drm/i915/i915_gem.c         | 67 +++++++++++++++++++++------
 drivers/gpu/drm/i915/i915_irq.c         | 81 +++++++++++++++++++++------------
 drivers/gpu/drm/i915/intel_lrc.c        | 43 +++--------------
 drivers/gpu/drm/i915/intel_lrc.h        |  2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h |  1 -
 6 files changed, 118 insertions(+), 82 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index fbf0ae9..3d217f9 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2262,6 +2262,12 @@ struct drm_i915_gem_request {
 	/** Execlists no. of times this request has been sent to the ELSP */
 	int elsp_submitted;
 
+	/**
+	 * Execlists: whether this request's context has completed after
+	 * submission to the ELSP
+	 */
+	bool ctx_complete;
+
 };
 
 int i915_gem_request_alloc(struct intel_engine_cs *ring,
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 52642af..fc82171 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1386,6 +1386,24 @@ __i915_gem_request_retire__upto(struct drm_i915_gem_request *req)
 				       typeof(*tmp), list);
 
 		i915_gem_request_retire(tmp);
+
+		if (i915.enable_execlists) {
+			struct intel_context *ctx = tmp->ctx;
+			struct drm_i915_private *dev_priv =
+				engine->dev->dev_private;
+			unsigned long flags;
+			struct drm_i915_gem_object *ctx_obj =
+				ctx->engine[engine->id].state;
+
+			spin_lock_irqsave(&engine->execlist_lock, flags);
+
+			if (ctx_obj && (ctx != engine->default_context))
+				intel_lr_context_unpin(tmp);
+
+			intel_runtime_pm_put(dev_priv);
+			spin_unlock_irqrestore(&engine->execlist_lock, flags);
+		}
+
 	} while (tmp != req);
 
 	WARN_ON(i915_verify_lists(engine->dev));
@@ -2359,6 +2377,12 @@ void i915_vma_move_to_active(struct i915_vma *vma,
 	list_move_tail(&vma->mm_list, &vma->vm->active_list);
 }
 
+static bool i915_gem_request_retireable(struct drm_i915_gem_request *req)
+{
+	return (i915_gem_request_completed(req, true) &&
+		(!req->elsp_submitted || req->ctx_complete));
+}
+
 static void
 i915_gem_object_retire__write(struct drm_i915_gem_object *obj)
 {
@@ -2829,10 +2853,28 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 					   struct drm_i915_gem_request,
 					   list);
 
-		if (!i915_gem_request_completed(request, true))
+		if (!i915_gem_request_retireable(request))
 			break;
 
 		i915_gem_request_retire(request);
+
+		if (i915.enable_execlists) {
+			struct intel_context *ctx = request->ctx;
+			struct drm_i915_private *dev_priv =
+				ring->dev->dev_private;
+			unsigned long flags;
+			struct drm_i915_gem_object *ctx_obj =
+				ctx->engine[ring->id].state;
+
+			spin_lock_irqsave(&ring->execlist_lock, flags);
+
+			if (ctx_obj && (ctx != ring->default_context))
+				intel_lr_context_unpin(request);
+
+			intel_runtime_pm_put(dev_priv);
+			spin_unlock_irqrestore(&ring->execlist_lock, flags);
+		}
+
 	}
 
 	/* Move any buffers on the active list that are no longer referenced
@@ -2848,12 +2890,14 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
 
 		if (!list_empty(&obj->last_read_req[ring->id]->list))
 			break;
+		if (!i915_gem_request_retireable(obj->last_read_req[ring->id]))
+			break;
 
 		i915_gem_object_retire__read(obj, ring->id);
 	}
 
 	if (unlikely(ring->trace_irq_req &&
-		     i915_gem_request_completed(ring->trace_irq_req, true))) {
+		     i915_gem_request_retireable(ring->trace_irq_req))) {
 		ring->irq_put(ring);
 		i915_gem_request_assign(&ring->trace_irq_req, NULL);
 	}
@@ -2872,15 +2916,6 @@ i915_gem_retire_requests(struct drm_device *dev)
 	for_each_ring(ring, dev_priv, i) {
 		i915_gem_retire_requests_ring(ring);
 		idle &= list_empty(&ring->request_list);
-		if (i915.enable_execlists) {
-			unsigned long flags;
-
-			spin_lock_irqsave(&ring->execlist_lock, flags);
-			idle &= list_empty(&ring->execlist_queue);
-			spin_unlock_irqrestore(&ring->execlist_lock, flags);
-
-			intel_execlists_retire_requests(ring);
-		}
 	}
 
 	if (idle)
@@ -2956,12 +2991,14 @@ i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
 		if (req == NULL)
 			continue;
 
-		if (list_empty(&req->list))
-			goto retire;
+		if (list_empty(&req->list)) {
+			if (i915_gem_request_retireable(req))
+				i915_gem_object_retire__read(obj, i);
+			continue;
+		}
 
-		if (i915_gem_request_completed(req, true)) {
+		if (i915_gem_request_retireable(req)) {
 			__i915_gem_request_retire__upto(req);
-retire:
 			i915_gem_object_retire__read(obj, i);
 		}
 	}
diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
index 8ca772d..ebbafac 100644
--- a/drivers/gpu/drm/i915/i915_irq.c
+++ b/drivers/gpu/drm/i915/i915_irq.c
@@ -1291,66 +1291,91 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_i915_private *dev_priv,
 				       u32 master_ctl)
 {
 	irqreturn_t ret = IRQ_NONE;
+	bool need_notify;
 
 	if (master_ctl & (GEN8_GT_RCS_IRQ | GEN8_GT_BCS_IRQ)) {
-		u32 tmp = I915_READ_FW(GEN8_GT_IIR(0));
-		if (tmp) {
-			I915_WRITE_FW(GEN8_GT_IIR(0), tmp);
+		u32 iir = I915_READ_FW(GEN8_GT_IIR(0));
+
+		if (iir) {
+			I915_WRITE_FW(GEN8_GT_IIR(0), iir);
 			ret = IRQ_HANDLED;
 
-			if (tmp & (GT_CONTEXT_SWITCH_INTERRUPT << GEN8_RCS_IRQ_SHIFT))
-				intel_lrc_irq_handler(&dev_priv->ring[RCS]);
-			if (tmp & (GT_RENDER_USER_INTERRUPT << GEN8_RCS_IRQ_SHIFT))
+			need_notify = false;
+			if (iir & (GT_CONTEXT_SWITCH_INTERRUPT <<
+					GEN8_RCS_IRQ_SHIFT))
+				need_notify = intel_lrc_irq_handler(
+						&dev_priv->ring[RCS]);
+			if (iir & (GT_RENDER_USER_INTERRUPT <<
+					GEN8_RCS_IRQ_SHIFT) || need_notify)
 				notify_ring(&dev_priv->ring[RCS]);
 
-			if (tmp & (GT_CONTEXT_SWITCH_INTERRUPT << GEN8_BCS_IRQ_SHIFT))
-				intel_lrc_irq_handler(&dev_priv->ring[BCS]);
-			if (tmp & (GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT))
+			need_notify = false;
+			if (iir & (GT_CONTEXT_SWITCH_INTERRUPT <<
+					GEN8_BCS_IRQ_SHIFT))
+				need_notify = intel_lrc_irq_handler(
+						&dev_priv->ring[BCS]);
+			if (iir & (GT_RENDER_USER_INTERRUPT <<
+					GEN8_BCS_IRQ_SHIFT) || need_notify)
 				notify_ring(&dev_priv->ring[BCS]);
 		} else
 			DRM_ERROR("The master control interrupt lied (GT0)!\n");
 	}
 
 	if (master_ctl & (GEN8_GT_VCS1_IRQ | GEN8_GT_VCS2_IRQ)) {
-		u32 tmp = I915_READ_FW(GEN8_GT_IIR(1));
-		if (tmp) {
-			I915_WRITE_FW(GEN8_GT_IIR(1), tmp);
+		u32 iir = I915_READ_FW(GEN8_GT_IIR(1));
+
+		if (iir) {
+			I915_WRITE_FW(GEN8_GT_IIR(1), iir);
 			ret = IRQ_HANDLED;
 
-			if (tmp & (GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS1_IRQ_SHIFT))
-				intel_lrc_irq_handler(&dev_priv->ring[VCS]);
-			if (tmp & (GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT))
+			need_notify = false;
+			if (iir & (GT_CONTEXT_SWITCH_INTERRUPT <<
+					GEN8_VCS1_IRQ_SHIFT))
+				need_notify = intel_lrc_irq_handler(
+						&dev_priv->ring[VCS]);
+			if (iir & (GT_RENDER_USER_INTERRUPT <<
+					GEN8_VCS1_IRQ_SHIFT) || need_notify)
 				notify_ring(&dev_priv->ring[VCS]);
 
-			if (tmp & (GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS2_IRQ_SHIFT))
-				intel_lrc_irq_handler(&dev_priv->ring[VCS2]);
-			if (tmp & (GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT))
+			need_notify = false;
+			if (iir & (GT_CONTEXT_SWITCH_INTERRUPT <<
+					GEN8_VCS2_IRQ_SHIFT))
+				need_notify = intel_lrc_irq_handler(
+						&dev_priv->ring[VCS2]);
+			if (iir & (GT_RENDER_USER_INTERRUPT <<
+					GEN8_VCS2_IRQ_SHIFT) || need_notify)
 				notify_ring(&dev_priv->ring[VCS2]);
 		} else
 			DRM_ERROR("The master control interrupt lied (GT1)!\n");
 	}
 
 	if (master_ctl & GEN8_GT_VECS_IRQ) {
-		u32 tmp = I915_READ_FW(GEN8_GT_IIR(3));
-		if (tmp) {
-			I915_WRITE_FW(GEN8_GT_IIR(3), tmp);
+		u32 iir = I915_READ_FW(GEN8_GT_IIR(3));
+
+		if (iir) {
+			I915_WRITE_FW(GEN8_GT_IIR(3), iir);
 			ret = IRQ_HANDLED;
 
-			if (tmp & (GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VECS_IRQ_SHIFT))
-				intel_lrc_irq_handler(&dev_priv->ring[VECS]);
-			if (tmp & (GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT))
+			need_notify = false;
+			if (iir & (GT_CONTEXT_SWITCH_INTERRUPT <<
+					GEN8_VECS_IRQ_SHIFT))
+				need_notify = intel_lrc_irq_handler(
+						&dev_priv->ring[VECS]);
+			if (iir & (GT_RENDER_USER_INTERRUPT <<
+					GEN8_VECS_IRQ_SHIFT) || need_notify)
 				notify_ring(&dev_priv->ring[VECS]);
 		} else
 			DRM_ERROR("The master control interrupt lied (GT3)!\n");
 	}
 
 	if (master_ctl & GEN8_GT_PM_IRQ) {
-		u32 tmp = I915_READ_FW(GEN8_GT_IIR(2));
-		if (tmp & dev_priv->pm_rps_events) {
+		u32 iir = I915_READ_FW(GEN8_GT_IIR(2));
+
+		if (iir & dev_priv->pm_rps_events) {
 			I915_WRITE_FW(GEN8_GT_IIR(2),
-				      tmp & dev_priv->pm_rps_events);
+				      iir & dev_priv->pm_rps_events);
 			ret = IRQ_HANDLED;
-			gen6_rps_irq_handler(dev_priv, tmp);
+			gen6_rps_irq_handler(dev_priv, iir);
 		} else
 			DRM_ERROR("The master control interrupt lied (PM)!\n");
 	}
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 825fa7a..e8f5b6c 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -426,9 +426,8 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring)
 			/* Same ctx: ignore first request, as second request
 			 * will update tail past first request's workload */
 			cursor->elsp_submitted = req0->elsp_submitted;
+			req0->elsp_submitted = 0;
 			list_del(&req0->execlist_link);
-			list_add_tail(&req0->execlist_link,
-				&ring->execlist_retired_req_list);
 			req0 = cursor;
 		} else {
 			req1 = cursor;
@@ -478,11 +477,9 @@ static bool execlists_check_remove_request(struct intel_engine_cs *ring,
 		if (intel_execlists_ctx_id(ctx_obj) == request_id) {
 			WARN(head_req->elsp_submitted == 0,
 			     "Never submitted head request\n");
-
 			if (--head_req->elsp_submitted <= 0) {
+				head_req->ctx_complete = 1;
 				list_del(&head_req->execlist_link);
-				list_add_tail(&head_req->execlist_link,
-					&ring->execlist_retired_req_list);
 				return true;
 			}
 		}
@@ -497,8 +494,9 @@ static bool execlists_check_remove_request(struct intel_engine_cs *ring,
  *
  * Check the unread Context Status Buffers and manage the submission of new
  * contexts to the ELSP accordingly.
+ * @return whether a context completed
  */
-void intel_lrc_irq_handler(struct intel_engine_cs *ring)
+bool intel_lrc_irq_handler(struct intel_engine_cs *ring)
 {
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
 	u32 status_pointer;
@@ -558,6 +556,8 @@ void intel_lrc_irq_handler(struct intel_engine_cs *ring)
 		   _MASKED_FIELD(GEN8_CSB_PTR_MASK << 8,
 				 ((u32)ring->next_context_status_buffer &
 				  GEN8_CSB_PTR_MASK) << 8));
+
+	return (submit_contexts != 0);
 }
 
 static int execlists_context_queue(struct drm_i915_gem_request *request)
@@ -569,8 +569,6 @@ static int execlists_context_queue(struct drm_i915_gem_request *request)
 	if (request->ctx != ring->default_context)
 		intel_lr_context_pin(request);
 
-	i915_gem_request_reference(request);
-
 	spin_lock_irq(&ring->execlist_lock);
 
 	list_for_each_entry(cursor, &ring->execlist_queue, execlist_link)
@@ -588,8 +586,6 @@ static int execlists_context_queue(struct drm_i915_gem_request *request)
 			WARN(tail_req->elsp_submitted != 0,
 				"More than 2 already-submitted reqs queued\n");
 			list_del(&tail_req->execlist_link);
-			list_add_tail(&tail_req->execlist_link,
-				&ring->execlist_retired_req_list);
 		}
 	}
 
@@ -958,32 +954,6 @@ int intel_execlists_submission(struct i915_execbuffer_params *params,
 	return 0;
 }
 
-void intel_execlists_retire_requests(struct intel_engine_cs *ring)
-{
-	struct drm_i915_gem_request *req, *tmp;
-	struct list_head retired_list;
-
-	WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
-	if (list_empty(&ring->execlist_retired_req_list))
-		return;
-
-	INIT_LIST_HEAD(&retired_list);
-	spin_lock_irq(&ring->execlist_lock);
-	list_replace_init(&ring->execlist_retired_req_list, &retired_list);
-	spin_unlock_irq(&ring->execlist_lock);
-
-	list_for_each_entry_safe(req, tmp, &retired_list, execlist_link) {
-		struct intel_context *ctx = req->ctx;
-		struct drm_i915_gem_object *ctx_obj =
-				ctx->engine[ring->id].state;
-
-		if (ctx_obj && (ctx != ring->default_context))
-			intel_lr_context_unpin(req);
-		list_del(&req->execlist_link);
-		i915_gem_request_unreference(req);
-	}
-}
-
 void intel_logical_ring_stop(struct intel_engine_cs *ring)
 {
 	struct drm_i915_private *dev_priv = ring->dev->dev_private;
@@ -1938,7 +1908,6 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 	init_waitqueue_head(&ring->irq_queue);
 
 	INIT_LIST_HEAD(&ring->execlist_queue);
-	INIT_LIST_HEAD(&ring->execlist_retired_req_list);
 	spin_lock_init(&ring->execlist_lock);
 
 	ret = i915_cmd_parser_init_ring(ring);
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 4e60d54..e6a4900 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -95,7 +95,7 @@ int intel_execlists_submission(struct i915_execbuffer_params *params,
 			       struct list_head *vmas);
 u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj);
 
-void intel_lrc_irq_handler(struct intel_engine_cs *ring);
+bool intel_lrc_irq_handler(struct intel_engine_cs *ring);
 void intel_execlists_retire_requests(struct intel_engine_cs *ring);
 
 #endif /* _INTEL_LRC_H_ */
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 49fa41d..d99b167 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -264,7 +264,6 @@ struct  intel_engine_cs {
 	/* Execlists */
 	spinlock_t execlist_lock;
 	struct list_head execlist_queue;
-	struct list_head execlist_retired_req_list;
 	u8 next_context_status_buffer;
 	u32             irq_keep_mask; /* bitmask for interrupts that should not be masked */
 	int		(*emit_request)(struct drm_i915_gem_request *request);
-- 
1.9.1


* [PATCH 2/4] drm/i915: Improve dynamic management/eviction of lrc backing objects
  2015-10-06 14:52 [PATCH 0/4] lrc lifecycle cleanups Nick Hoath
  2015-10-06 14:52 ` [PATCH 1/4] drm/i915: Unify execlist and legacy request life-cycles Nick Hoath
@ 2015-10-06 14:52 ` Nick Hoath
  2015-10-07 16:05   ` Daniel Vetter
  2015-10-06 14:52 ` [PATCH 3/4] drm/i915: Add the CPU mapping of the hw context to the pinned items Nick Hoath
  2015-10-06 14:52 ` [PATCH 4/4] drm/i915: Only update ringbuf address when necessary Nick Hoath
  3 siblings, 1 reply; 24+ messages in thread
From: Nick Hoath @ 2015-10-06 14:52 UTC (permalink / raw)
  To: intel-gfx

Shovel all context related objects through the active queue and obj
management.

- Added callback in vma_(un)bind to add CPU (un)mapping at same time
  if desired
- Inserted LRC hw context & ringbuf to vma active list

Issue: VIZ-4277
Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h         |  4 ++
 drivers/gpu/drm/i915/i915_gem.c         |  3 ++
 drivers/gpu/drm/i915/i915_gem_gtt.c     |  8 ++++
 drivers/gpu/drm/i915/intel_lrc.c        | 28 +++++++++++--
 drivers/gpu/drm/i915/intel_ringbuffer.c | 71 ++++++++++++++++++---------------
 drivers/gpu/drm/i915/intel_ringbuffer.h |  3 --
 6 files changed, 79 insertions(+), 38 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index 3d217f9..d660ee3 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2169,6 +2169,10 @@ struct drm_i915_gem_object {
 			struct work_struct *work;
 		} userptr;
 	};
+
+	/** Support for automatic CPU side mapping of object */
+	int (*mmap)(struct drm_i915_gem_object *obj, bool unmap);
+	void *mappable;
 };
 #define to_intel_bo(x) container_of(x, struct drm_i915_gem_object, base)
 
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index fc82171..56e0e00 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3262,6 +3262,9 @@ static int __i915_vma_unbind(struct i915_vma *vma, bool wait)
 	if (vma->pin_count)
 		return -EBUSY;
 
+	if (obj->mmap)
+		obj->mmap(obj, true);
+
 	BUG_ON(obj->pages == NULL);
 
 	if (wait) {
diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
index 620d57e..786ec4b 100644
--- a/drivers/gpu/drm/i915/i915_gem_gtt.c
+++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
@@ -3495,6 +3495,14 @@ int i915_vma_bind(struct i915_vma *vma, enum i915_cache_level cache_level,
 
 	vma->bound |= bind_flags;
 
+	if (vma->obj->mmap) {
+		ret = vma->obj->mmap(vma->obj, false);
+		if (ret) {
+			i915_vma_unbind(vma);
+			return ret;
+		}
+	}
+
 	return 0;
 }
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index e8f5b6c..b807928 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -723,6 +723,18 @@ intel_logical_ring_advance_and_submit(struct drm_i915_gem_request *request)
 
 	intel_logical_ring_advance(request->ringbuf);
 
+	/* Push the hw context on to the active list */
+	i915_vma_move_to_active(
+			i915_gem_obj_to_ggtt(
+				request->ctx->engine[ring->id].state),
+			request);
+
+	/* Push the ringbuf on to the active list */
+	i915_vma_move_to_active(
+			i915_gem_obj_to_ggtt(
+			request->ctx->engine[ring->id].ringbuf->obj),
+			request);
+
 	request->tail = request->ringbuf->tail;
 
 	if (intel_ring_stopped(ring))
@@ -1006,10 +1018,15 @@ static int intel_lr_context_do_pin(struct intel_engine_cs *ring,
 	if (ret)
 		return ret;
 
-	ret = intel_pin_and_map_ringbuffer_obj(ring->dev, ringbuf);
+	ret = i915_gem_obj_ggtt_pin(ringbuf->obj, PAGE_SIZE,
+			PIN_MAPPABLE);
 	if (ret)
 		goto unpin_ctx_obj;
 
+	ret = i915_gem_object_set_to_gtt_domain(ringbuf->obj, true);
+	if (ret)
+		goto unpin_rb_obj;
+
 	ctx_obj->dirty = true;
 
 	/* Invalidate GuC TLB. */
@@ -1018,6 +1035,8 @@ static int intel_lr_context_do_pin(struct intel_engine_cs *ring,
 
 	return ret;
 
+unpin_rb_obj:
+	i915_gem_object_ggtt_unpin(ringbuf->obj);
 unpin_ctx_obj:
 	i915_gem_object_ggtt_unpin(ctx_obj);
 
@@ -1052,7 +1071,7 @@ void intel_lr_context_unpin(struct drm_i915_gem_request *rq)
 	if (ctx_obj) {
 		WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
 		if (--rq->ctx->engine[ring->id].pin_count == 0) {
-			intel_unpin_ringbuffer_obj(ringbuf);
+			i915_gem_object_ggtt_unpin(ringbuf->obj);
 			i915_gem_object_ggtt_unpin(ctx_obj);
 		}
 	}
@@ -2369,7 +2388,7 @@ void intel_lr_context_free(struct intel_context *ctx)
 			struct intel_engine_cs *ring = ringbuf->ring;
 
 			if (ctx == ring->default_context) {
-				intel_unpin_ringbuffer_obj(ringbuf);
+				i915_gem_object_ggtt_unpin(ringbuf->obj);
 				i915_gem_object_ggtt_unpin(ctx_obj);
 			}
 			WARN_ON(ctx->engine[ring->id].pin_count);
@@ -2536,5 +2555,8 @@ void intel_lr_context_reset(struct drm_device *dev,
 
 		ringbuf->head = 0;
 		ringbuf->tail = 0;
+
+		i915_gem_object_ggtt_unpin(
+				ctx->engine[ring->id].state);
 	}
 }
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
index c82c74c..79df8ca 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.c
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
@@ -1958,38 +1958,35 @@ static int init_phys_status_page(struct intel_engine_cs *ring)
 	return 0;
 }
 
-void intel_unpin_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
+static int intel_mmap_ringbuffer_obj(struct drm_i915_gem_object *obj,
+		bool unmap)
 {
-	iounmap(ringbuf->virtual_start);
-	ringbuf->virtual_start = NULL;
-	i915_gem_object_ggtt_unpin(ringbuf->obj);
-}
-
-int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
-				     struct intel_ringbuffer *ringbuf)
-{
-	struct drm_i915_private *dev_priv = to_i915(dev);
-	struct drm_i915_gem_object *obj = ringbuf->obj;
-	int ret;
-
-	ret = i915_gem_obj_ggtt_pin(obj, PAGE_SIZE, PIN_MAPPABLE);
-	if (ret)
-		return ret;
-
-	ret = i915_gem_object_set_to_gtt_domain(obj, true);
-	if (ret) {
-		i915_gem_object_ggtt_unpin(obj);
-		return ret;
-	}
-
-	ringbuf->virtual_start = ioremap_wc(dev_priv->gtt.mappable_base +
-			i915_gem_obj_ggtt_offset(obj), ringbuf->size);
-	if (ringbuf->virtual_start == NULL) {
-		i915_gem_object_ggtt_unpin(obj);
-		return -EINVAL;
+	int ret = 0;
+	struct intel_ringbuffer *ringbuf =
+	(struct intel_ringbuffer *)obj->mappable;
+
+	if (!unmap) {
+		struct drm_device *dev = ringbuf->ring->dev;
+		struct drm_i915_private *dev_priv = to_i915(dev);
+
+		WARN_ON(ringbuf->virtual_start != NULL);
+		if (ringbuf->virtual_start == NULL) {
+			ringbuf->virtual_start = ioremap_wc(
+					dev_priv->gtt.mappable_base +
+					i915_gem_obj_ggtt_offset(obj),
+					ringbuf->size);
+			if (ringbuf->virtual_start == NULL) {
+				i915_gem_object_ggtt_unpin(obj);
+				return -EINVAL;
+			}
+		}
+	} else {
+		if (!i915_gem_obj_is_pinned(ringbuf->obj)) {
+			iounmap(ringbuf->virtual_start);
+			ringbuf->virtual_start = NULL;
+		}
 	}
-
-	return 0;
+	return ret;
 }
 
 static void intel_destroy_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
@@ -2016,6 +2013,9 @@ static int intel_alloc_ringbuffer_obj(struct drm_device *dev,
 
 	ringbuf->obj = obj;
 
+	obj->mmap = intel_mmap_ringbuffer_obj;
+	obj->mappable = ringbuf;
+
 	return 0;
 }
 
@@ -2094,7 +2094,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 			goto error;
 	}
 
-	ret = intel_pin_and_map_ringbuffer_obj(dev, ringbuf);
+	ret = i915_gem_obj_ggtt_pin(ringbuf->obj, PAGE_SIZE, PIN_MAPPABLE);
 	if (ret) {
 		DRM_ERROR("Failed to pin and map ringbuffer %s: %d\n",
 				ring->name, ret);
@@ -2102,12 +2102,19 @@ static int intel_init_ring_buffer(struct drm_device *dev,
 		goto error;
 	}
 
+	ret = i915_gem_object_set_to_gtt_domain(ringbuf->obj, true);
+	if (ret)
+		goto error_unpin;
+
 	ret = i915_cmd_parser_init_ring(ring);
 	if (ret)
 		goto error;
 
 	return 0;
 
+error_unpin:
+	i915_gem_object_ggtt_unpin(ringbuf->obj);
+	intel_destroy_ringbuffer_obj(ringbuf);
 error:
 	intel_ringbuffer_free(ringbuf);
 	ring->buffer = NULL;
@@ -2126,7 +2133,7 @@ void intel_cleanup_ring_buffer(struct intel_engine_cs *ring)
 	intel_stop_ring_buffer(ring);
 	WARN_ON(!IS_GEN2(ring->dev) && (I915_READ_MODE(ring) & MODE_IDLE) == 0);
 
-	intel_unpin_ringbuffer_obj(ring->buffer);
+	i915_gem_object_ggtt_unpin(ring->buffer->obj);
 	intel_ringbuffer_free(ring->buffer);
 	ring->buffer = NULL;
 
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index d99b167..8daaf99 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -421,9 +421,6 @@ intel_write_status_page(struct intel_engine_cs *ring,
 
 struct intel_ringbuffer *
 intel_engine_create_ringbuffer(struct intel_engine_cs *engine, int size);
-int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
-				     struct intel_ringbuffer *ringbuf);
-void intel_unpin_ringbuffer_obj(struct intel_ringbuffer *ringbuf);
 void intel_ringbuffer_free(struct intel_ringbuffer *ring);
 
 void intel_stop_ring_buffer(struct intel_engine_cs *ring);
-- 
1.9.1


* [PATCH 3/4] drm/i915: Add the CPU mapping of the hw context to the pinned items.
  2015-10-06 14:52 [PATCH 0/4] lrc lifecycle cleanups Nick Hoath
  2015-10-06 14:52 ` [PATCH 1/4] drm/i915: Unify execlist and legacy request life-cycles Nick Hoath
  2015-10-06 14:52 ` [PATCH 2/4] drm/i915: Improve dynamic management/eviction of lrc backing objects Nick Hoath
@ 2015-10-06 14:52 ` Nick Hoath
  2015-10-07 16:08   ` Daniel Vetter
  2015-10-06 14:52 ` [PATCH 4/4] drm/i915: Only update ringbuf address when necessary Nick Hoath
  3 siblings, 1 reply; 24+ messages in thread
From: Nick Hoath @ 2015-10-06 14:52 UTC (permalink / raw)
  To: intel-gfx

Pin the hw ctx mapping so that it is not mapped/unmapped per bb
when doing GuC submission.

Issue: VIZ-4277
Cc: David Gordon <david.s.gordon@intel.com>
Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
---
 drivers/gpu/drm/i915/i915_debugfs.c | 14 ++++------
 drivers/gpu/drm/i915/i915_drv.h     |  4 ++-
 drivers/gpu/drm/i915/intel_lrc.c    | 56 +++++++++++++++++++++++++++----------
 3 files changed, 50 insertions(+), 24 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
index 3f2a7a7..e68cf5fa 100644
--- a/drivers/gpu/drm/i915/i915_debugfs.c
+++ b/drivers/gpu/drm/i915/i915_debugfs.c
@@ -1970,10 +1970,9 @@ static int i915_context_status(struct seq_file *m, void *unused)
 
 static void i915_dump_lrc_obj(struct seq_file *m,
 			      struct intel_engine_cs *ring,
-			      struct drm_i915_gem_object *ctx_obj)
+			      struct drm_i915_gem_object *ctx_obj,
+			      uint32_t *reg_state)
 {
-	struct page *page;
-	uint32_t *reg_state;
 	int j;
 	unsigned long ggtt_offset = 0;
 
@@ -1996,17 +1995,13 @@ static void i915_dump_lrc_obj(struct seq_file *m,
 		return;
 	}
 
-	page = i915_gem_object_get_page(ctx_obj, LRC_STATE_PN);
-	if (!WARN_ON(page == NULL)) {
-		reg_state = kmap_atomic(page);
-
+	if (!WARN_ON(reg_state == NULL)) {
 		for (j = 0; j < 0x600 / sizeof(u32) / 4; j += 4) {
 			seq_printf(m, "\t[0x%08lx] 0x%08x 0x%08x 0x%08x 0x%08x\n",
 				   ggtt_offset + 4096 + (j * 4),
 				   reg_state[j], reg_state[j + 1],
 				   reg_state[j + 2], reg_state[j + 3]);
 		}
-		kunmap_atomic(reg_state);
 	}
 
 	seq_putc(m, '\n');
@@ -2034,7 +2029,8 @@ static int i915_dump_lrc(struct seq_file *m, void *unused)
 		for_each_ring(ring, dev_priv, i) {
 			if (ring->default_context != ctx)
 				i915_dump_lrc_obj(m, ring,
-						  ctx->engine[i].state);
+						  ctx->engine[i].state,
+						  ctx->engine[i].reg_state);
 		}
 	}
 
diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index d660ee3..b49fd12 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -879,8 +879,10 @@ struct intel_context {
 	} legacy_hw_ctx;
 
 	/* Execlists */
-	struct {
+	struct intel_context_engine {
 		struct drm_i915_gem_object *state;
+		uint32_t *reg_state;
+		struct page *page;
 		struct intel_ringbuffer *ringbuf;
 		int pin_count;
 	} engine[I915_NUM_RINGS];
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index b807928..55a4de56 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -360,16 +360,13 @@ static int execlists_update_context(struct drm_i915_gem_request *rq)
 	struct i915_hw_ppgtt *ppgtt = rq->ctx->ppgtt;
 	struct drm_i915_gem_object *ctx_obj = rq->ctx->engine[ring->id].state;
 	struct drm_i915_gem_object *rb_obj = rq->ringbuf->obj;
-	struct page *page;
-	uint32_t *reg_state;
+	uint32_t *reg_state = rq->ctx->engine[ring->id].reg_state;
 
 	BUG_ON(!ctx_obj);
+	WARN_ON(!reg_state);
 	WARN_ON(!i915_gem_obj_is_pinned(ctx_obj));
 	WARN_ON(!i915_gem_obj_is_pinned(rb_obj));
 
-	page = i915_gem_object_get_page(ctx_obj, LRC_STATE_PN);
-	reg_state = kmap_atomic(page);
-
 	reg_state[CTX_RING_TAIL+1] = rq->tail;
 	reg_state[CTX_RING_BUFFER_START+1] = i915_gem_obj_ggtt_offset(rb_obj);
 
@@ -385,8 +382,6 @@ static int execlists_update_context(struct drm_i915_gem_request *rq)
 		ASSIGN_CTX_PDP(ppgtt, reg_state, 0);
 	}
 
-	kunmap_atomic(reg_state);
-
 	return 0;
 }
 
@@ -1004,7 +999,31 @@ int logical_ring_flush_all_caches(struct drm_i915_gem_request *req)
 	return 0;
 }
 
-static int intel_lr_context_do_pin(struct intel_engine_cs *ring,
+static int intel_mmap_hw_context(struct drm_i915_gem_object *obj,
+		bool unmap)
+{
+	int ret = 0;
+	struct intel_context_engine *ice =
+			(struct intel_context_engine *)obj->mappable;
+	struct page *page;
+	uint32_t *reg_state;
+
+	if (unmap) {
+		kunmap(ice->page);
+		ice->reg_state = NULL;
+		ice->page = NULL;
+	} else {
+		page = i915_gem_object_get_page(obj, LRC_STATE_PN);
+		reg_state = kmap(page);
+		ice->reg_state = reg_state;
+		ice->page = page;
+	}
+	return ret;
+}
+
+static int intel_lr_context_do_pin(
+		struct intel_context *ctx,
+		struct intel_engine_cs *ring,
 		struct drm_i915_gem_object *ctx_obj,
 		struct intel_ringbuffer *ringbuf)
 {
@@ -1051,7 +1070,7 @@ static int intel_lr_context_pin(struct drm_i915_gem_request *rq)
 	struct intel_ringbuffer *ringbuf = rq->ringbuf;
 
 	if (rq->ctx->engine[ring->id].pin_count++ == 0) {
-		ret = intel_lr_context_do_pin(ring, ctx_obj, ringbuf);
+		ret = intel_lr_context_do_pin(rq->ctx, ring, ctx_obj, ringbuf);
 		if (ret)
 			goto reset_pin_count;
 	}
@@ -1915,6 +1934,8 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
 
 static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *ring)
 {
+	struct page *page;
+	uint32_t *reg_state;
 	int ret;
 
 	/* Intentionally left blank. */
@@ -1939,6 +1960,7 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 
 	/* As this is the default context, always pin it */
 	ret = intel_lr_context_do_pin(
+			ring->default_context,
 			ring,
 			ring->default_context->engine[ring->id].state,
 			ring->default_context->engine[ring->id].ringbuf);
@@ -1949,6 +1971,13 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
 		return ret;
 	}
 
+	page = i915_gem_object_get_page(
+			ring->default_context->engine[ring->id].state,
+			LRC_STATE_PN);
+	reg_state = kmap(page);
+	ring->default_context->engine[ring->id].reg_state = reg_state;
+	ring->default_context->engine[ring->id].page = page;
+
 	return ret;
 }
 
@@ -2388,6 +2417,7 @@ void intel_lr_context_free(struct intel_context *ctx)
 			struct intel_engine_cs *ring = ringbuf->ring;
 
 			if (ctx == ring->default_context) {
+				kunmap(ctx->engine[ring->id].page);
 				i915_gem_object_ggtt_unpin(ringbuf->obj);
 				i915_gem_object_ggtt_unpin(ctx_obj);
 			}
@@ -2489,6 +2519,8 @@ int intel_lr_context_deferred_alloc(struct intel_context *ctx,
 		goto error_ringbuf;
 	}
 
+	ctx_obj->mmap = intel_mmap_hw_context;
+	ctx_obj->mappable = &(ctx->engine[ring->id]);
 	ctx->engine[ring->id].ringbuf = ringbuf;
 	ctx->engine[ring->id].state = ctx_obj;
 
@@ -2536,7 +2568,6 @@ void intel_lr_context_reset(struct drm_device *dev,
 		struct intel_ringbuffer *ringbuf =
 				ctx->engine[ring->id].ringbuf;
 		uint32_t *reg_state;
-		struct page *page;
 
 		if (!ctx_obj)
 			continue;
@@ -2545,14 +2576,11 @@ void intel_lr_context_reset(struct drm_device *dev,
 			WARN(1, "Failed get_pages for context obj\n");
 			continue;
 		}
-		page = i915_gem_object_get_page(ctx_obj, LRC_STATE_PN);
-		reg_state = kmap_atomic(page);
+		reg_state = ctx->engine[ring->id].reg_state;
 
 		reg_state[CTX_RING_HEAD+1] = 0;
 		reg_state[CTX_RING_TAIL+1] = 0;
 
-		kunmap_atomic(reg_state);
-
 		ringbuf->head = 0;
 		ringbuf->tail = 0;
 
-- 
1.9.1


* [PATCH 4/4] drm/i915: Only update ringbuf address when necessary
  2015-10-06 14:52 [PATCH 0/4] lrc lifecycle cleanups Nick Hoath
                   ` (2 preceding siblings ...)
  2015-10-06 14:52 ` [PATCH 3/4] drm/i915: Add the CPU mapping of the hw context to the pinned items Nick Hoath
@ 2015-10-06 14:52 ` Nick Hoath
  3 siblings, 0 replies; 24+ messages in thread
From: Nick Hoath @ 2015-10-06 14:52 UTC (permalink / raw)
  To: intel-gfx

We now only need to update the address of the ringbuf object in the
hw context when it is pinned and the hw context is first CPU mapped.

Issue: VIZ-4277
Cc: David Gordon <david.s.gordon@intel.com>
Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 55a4de56..92a0ece 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -368,7 +368,6 @@ static int execlists_update_context(struct drm_i915_gem_request *rq)
 	WARN_ON(!i915_gem_obj_is_pinned(rb_obj));
 
 	reg_state[CTX_RING_TAIL+1] = rq->tail;
-	reg_state[CTX_RING_BUFFER_START+1] = i915_gem_obj_ggtt_offset(rb_obj);
 
 	if (ppgtt && !USES_FULL_48BIT_PPGTT(ppgtt->base.dev)) {
 		/* True 32b PPGTT with dynamic page allocation: update PDP
@@ -1046,6 +1045,9 @@ static int intel_lr_context_do_pin(
 	if (ret)
 		goto unpin_rb_obj;
 
+	ctx->engine[ring->id].reg_state[CTX_RING_BUFFER_START+1] =
+			i915_gem_obj_ggtt_offset(ringbuf->obj);
+
 	ctx_obj->dirty = true;
 
 	/* Invalidate GuC TLB. */
-- 
1.9.1


* Re: [PATCH 1/4] drm/i915: Unify execlist and legacy request life-cycles
  2015-10-06 14:52 ` [PATCH 1/4] drm/i915: Unify execlist and legacy request life-cycles Nick Hoath
@ 2015-10-07 16:03   ` Daniel Vetter
  2015-10-07 16:05     ` Chris Wilson
  2015-10-08 12:32   ` Chris Wilson
  1 sibling, 1 reply; 24+ messages in thread
From: Daniel Vetter @ 2015-10-07 16:03 UTC (permalink / raw)
  To: Nick Hoath; +Cc: intel-gfx

On Tue, Oct 06, 2015 at 03:52:01PM +0100, Nick Hoath wrote:
> There is a desire to simplify the i915 driver by reducing the number of
> different code paths introduced by the LRC / execlists support.  As the
> execlists request is now part of the gem request it is possible and
> desirable to unify the request life-cycles for execlist and legacy
> requests.
> 
> Added a context complete flag to a request which gets set during the
> context switch interrupt.
> 
> Added a function i915_gem_request_retireable().  A request is considered
> retireable if its seqno passed (i.e. the request has completed) and either
> it was never submitted to the ELSP or its context completed.  This ensures
> that context save is carried out before the last request for a context is
> considered retireable.  retire_requests_ring() now uses
> i915_gem_request_retireable() rather than request_complete() when deciding
> which requests to retire. Requests that were not waiting for a context
> switch interrupt (either as a result of being merged into a following
> request or by being a legacy request) will be considered retireable as
> soon as their seqno has passed.
> 
> Removed the extra request reference held for the execlist request.
> 
> Removed intel_execlists_retire_requests() and all references to
> intel_engine_cs.execlist_retired_req_list.
> 
> Moved context unpinning into retire_requests_ring() for now.  Further work
> is pending for the context pinning - this patch should allow us to use the
> active list to track context and ring buffer objects later.
> 
> Changed gen8_cs_irq_handler() so that notify_ring() is called when
> contexts complete as well as when a user interrupt occurs so that
> notification happens when a request is complete and context save has
> finished.
> 
> v2: Rebase over the read-read optimisation changes
> 
> v3: Reworked IRQ handler after removing IRQ handler cleanup patch
> 
> v4: Fixed various pin leaks
> 
> Issue: VIZ-4277
> Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
> Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h         |  6 +++
>  drivers/gpu/drm/i915/i915_gem.c         | 67 +++++++++++++++++++++------
>  drivers/gpu/drm/i915/i915_irq.c         | 81 +++++++++++++++++++++------------
>  drivers/gpu/drm/i915/intel_lrc.c        | 43 +++--------------
>  drivers/gpu/drm/i915/intel_lrc.h        |  2 +-
>  drivers/gpu/drm/i915/intel_ringbuffer.h |  1 -
>  6 files changed, 118 insertions(+), 82 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index fbf0ae9..3d217f9 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2262,6 +2262,12 @@ struct drm_i915_gem_request {
>  	/** Execlists no. of times this request has been sent to the ELSP */
>  	int elsp_submitted;
>  
> +	/**
> +	 * Execlists: whether this request's context has completed after
> +	 * submission to the ELSP
> +	 */
> +	bool ctx_complete;
> +
>  };
>  
>  int i915_gem_request_alloc(struct intel_engine_cs *ring,
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 52642af..fc82171 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1386,6 +1386,24 @@ __i915_gem_request_retire__upto(struct drm_i915_gem_request *req)
>  				       typeof(*tmp), list);
>  
>  		i915_gem_request_retire(tmp);
> +
> +		if (i915.enable_execlists) {
> +			struct intel_context *ctx = tmp->ctx;
> +			struct drm_i915_private *dev_priv =
> +				engine->dev->dev_private;
> +			unsigned long flags;
> +			struct drm_i915_gem_object *ctx_obj =
> +				ctx->engine[engine->id].state;
> +
> +			spin_lock_irqsave(&engine->execlist_lock, flags);
> +
> +			if (ctx_obj && (ctx != engine->default_context))
> +				intel_lr_context_unpin(tmp);
> +
> +			intel_runtime_pm_put(dev_priv);
> +			spin_unlock_irqrestore(&engine->execlist_lock, flags);
> +		}
> +
>  	} while (tmp != req);
>  
>  	WARN_ON(i915_verify_lists(engine->dev));
> @@ -2359,6 +2377,12 @@ void i915_vma_move_to_active(struct i915_vma *vma,
>  	list_move_tail(&vma->mm_list, &vma->vm->active_list);
>  }
>  
> +static bool i915_gem_request_retireable(struct drm_i915_gem_request *req)
> +{
> +	return (i915_gem_request_completed(req, true) &&
> +		(!req->elsp_submitted || req->ctx_complete));
> +}
> +
>  static void
>  i915_gem_object_retire__write(struct drm_i915_gem_object *obj)
>  {
> @@ -2829,10 +2853,28 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
>  					   struct drm_i915_gem_request,
>  					   list);
>  
> -		if (!i915_gem_request_completed(request, true))
> +		if (!i915_gem_request_retireable(request))
>  			break;
>  
>  		i915_gem_request_retire(request);
> +
> +		if (i915.enable_execlists) {
> +			struct intel_context *ctx = request->ctx;
> +			struct drm_i915_private *dev_priv =
> +				ring->dev->dev_private;
> +			unsigned long flags;
> +			struct drm_i915_gem_object *ctx_obj =
> +				ctx->engine[ring->id].state;
> +
> +			spin_lock_irqsave(&ring->execlist_lock, flags);
> +
> +			if (ctx_obj && (ctx != ring->default_context))
> +				intel_lr_context_unpin(request);
> +
> +			intel_runtime_pm_put(dev_priv);
> +			spin_unlock_irqrestore(&ring->execlist_lock, flags);
> +		}
> +
>  	}
>  
>  	/* Move any buffers on the active list that are no longer referenced
> @@ -2848,12 +2890,14 @@ i915_gem_retire_requests_ring(struct intel_engine_cs *ring)
>  
>  		if (!list_empty(&obj->last_read_req[ring->id]->list))
>  			break;
> +		if (!i915_gem_request_retireable(obj->last_read_req[ring->id]))
> +			break;
>  
>  		i915_gem_object_retire__read(obj, ring->id);
>  	}
>  
>  	if (unlikely(ring->trace_irq_req &&
> -		     i915_gem_request_completed(ring->trace_irq_req, true))) {
> +		     i915_gem_request_retireable(ring->trace_irq_req))) {
>  		ring->irq_put(ring);
>  		i915_gem_request_assign(&ring->trace_irq_req, NULL);
>  	}
> @@ -2872,15 +2916,6 @@ i915_gem_retire_requests(struct drm_device *dev)
>  	for_each_ring(ring, dev_priv, i) {
>  		i915_gem_retire_requests_ring(ring);
>  		idle &= list_empty(&ring->request_list);
> -		if (i915.enable_execlists) {
> -			unsigned long flags;
> -
> -			spin_lock_irqsave(&ring->execlist_lock, flags);
> -			idle &= list_empty(&ring->execlist_queue);
> -			spin_unlock_irqrestore(&ring->execlist_lock, flags);
> -
> -			intel_execlists_retire_requests(ring);
> -		}
>  	}
>  
>  	if (idle)
> @@ -2956,12 +2991,14 @@ i915_gem_object_flush_active(struct drm_i915_gem_object *obj)
>  		if (req == NULL)
>  			continue;
>  
> -		if (list_empty(&req->list))
> -			goto retire;
> +		if (list_empty(&req->list)) {
> +			if (i915_gem_request_retireable(req))
> +				i915_gem_object_retire__read(obj, i);
> +			continue;
> +		}
>  
> -		if (i915_gem_request_completed(req, true)) {
> +		if (i915_gem_request_retireable(req)) {
>  			__i915_gem_request_retire__upto(req);
> -retire:
>  			i915_gem_object_retire__read(obj, i);
>  		}
>  	}
> diff --git a/drivers/gpu/drm/i915/i915_irq.c b/drivers/gpu/drm/i915/i915_irq.c
> index 8ca772d..ebbafac 100644
> --- a/drivers/gpu/drm/i915/i915_irq.c
> +++ b/drivers/gpu/drm/i915/i915_irq.c
> @@ -1291,66 +1291,91 @@ static irqreturn_t gen8_gt_irq_handler(struct drm_i915_private *dev_priv,
>  				       u32 master_ctl)
>  {
>  	irqreturn_t ret = IRQ_NONE;
> +	bool need_notify;
>  
>  	if (master_ctl & (GEN8_GT_RCS_IRQ | GEN8_GT_BCS_IRQ)) {
> -		u32 tmp = I915_READ_FW(GEN8_GT_IIR(0));
> -		if (tmp) {
> -			I915_WRITE_FW(GEN8_GT_IIR(0), tmp);
> +		u32 iir = I915_READ_FW(GEN8_GT_IIR(0));
> +
> +		if (iir) {
> +			I915_WRITE_FW(GEN8_GT_IIR(0), iir);
>  			ret = IRQ_HANDLED;
>  
> -			if (tmp & (GT_CONTEXT_SWITCH_INTERRUPT << GEN8_RCS_IRQ_SHIFT))
> -				intel_lrc_irq_handler(&dev_priv->ring[RCS]);
> -			if (tmp & (GT_RENDER_USER_INTERRUPT << GEN8_RCS_IRQ_SHIFT))
> +			need_notify = false;
> +			if (iir & (GT_CONTEXT_SWITCH_INTERRUPT <<
> +					GEN8_RCS_IRQ_SHIFT))
> +				need_notify = intel_lrc_irq_handler(
> +						&dev_priv->ring[RCS]);
> +			if (iir & (GT_RENDER_USER_INTERRUPT <<
> +					GEN8_RCS_IRQ_SHIFT) || need_notify)
>  				notify_ring(&dev_priv->ring[RCS]);
>  
> -			if (tmp & (GT_CONTEXT_SWITCH_INTERRUPT << GEN8_BCS_IRQ_SHIFT))
> -				intel_lrc_irq_handler(&dev_priv->ring[BCS]);
> -			if (tmp & (GT_RENDER_USER_INTERRUPT << GEN8_BCS_IRQ_SHIFT))
> +			need_notify = false;
> +			if (iir & (GT_CONTEXT_SWITCH_INTERRUPT <<
> +					GEN8_BCS_IRQ_SHIFT))
> +				need_notify = intel_lrc_irq_handler(
> +						&dev_priv->ring[BCS]);
> +			if (iir & (GT_RENDER_USER_INTERRUPT <<
> +					GEN8_BCS_IRQ_SHIFT) || need_notify)
>  				notify_ring(&dev_priv->ring[BCS]);
>  		} else
>  			DRM_ERROR("The master control interrupt lied (GT0)!\n");
>  	}
>  
>  	if (master_ctl & (GEN8_GT_VCS1_IRQ | GEN8_GT_VCS2_IRQ)) {
> -		u32 tmp = I915_READ_FW(GEN8_GT_IIR(1));
> -		if (tmp) {
> -			I915_WRITE_FW(GEN8_GT_IIR(1), tmp);
> +		u32 iir = I915_READ_FW(GEN8_GT_IIR(1));
> +
> +		if (iir) {
> +			I915_WRITE_FW(GEN8_GT_IIR(1), iir);
>  			ret = IRQ_HANDLED;
>  
> -			if (tmp & (GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS1_IRQ_SHIFT))
> -				intel_lrc_irq_handler(&dev_priv->ring[VCS]);
> -			if (tmp & (GT_RENDER_USER_INTERRUPT << GEN8_VCS1_IRQ_SHIFT))
> +			need_notify = false;
> +			if (iir & (GT_CONTEXT_SWITCH_INTERRUPT <<
> +					GEN8_VCS1_IRQ_SHIFT))
> +				need_notify = intel_lrc_irq_handler(
> +						&dev_priv->ring[VCS]);
> +			if (iir & (GT_RENDER_USER_INTERRUPT <<
> +					GEN8_VCS1_IRQ_SHIFT) || need_notify)
>  				notify_ring(&dev_priv->ring[VCS]);
>  
> -			if (tmp & (GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VCS2_IRQ_SHIFT))
> -				intel_lrc_irq_handler(&dev_priv->ring[VCS2]);
> -			if (tmp & (GT_RENDER_USER_INTERRUPT << GEN8_VCS2_IRQ_SHIFT))
> +			need_notify = false;
> +			if (iir & (GT_CONTEXT_SWITCH_INTERRUPT <<
> +					GEN8_VCS2_IRQ_SHIFT))
> +				need_notify = intel_lrc_irq_handler(
> +						&dev_priv->ring[VCS2]);
> +			if (iir & (GT_RENDER_USER_INTERRUPT <<
> +					GEN8_VCS2_IRQ_SHIFT) || need_notify)
>  				notify_ring(&dev_priv->ring[VCS2]);
>  		} else
>  			DRM_ERROR("The master control interrupt lied (GT1)!\n");
>  	}
>  
>  	if (master_ctl & GEN8_GT_VECS_IRQ) {
> -		u32 tmp = I915_READ_FW(GEN8_GT_IIR(3));
> -		if (tmp) {
> -			I915_WRITE_FW(GEN8_GT_IIR(3), tmp);
> +		u32 iir = I915_READ_FW(GEN8_GT_IIR(3));
> +
> +		if (iir) {
> +			I915_WRITE_FW(GEN8_GT_IIR(3), iir);
>  			ret = IRQ_HANDLED;
>  
> -			if (tmp & (GT_CONTEXT_SWITCH_INTERRUPT << GEN8_VECS_IRQ_SHIFT))
> -				intel_lrc_irq_handler(&dev_priv->ring[VECS]);
> -			if (tmp & (GT_RENDER_USER_INTERRUPT << GEN8_VECS_IRQ_SHIFT))
> +			need_notify = false;
> +			if (iir & (GT_CONTEXT_SWITCH_INTERRUPT <<
> +					GEN8_VECS_IRQ_SHIFT))
> +				need_notify = intel_lrc_irq_handler(
> +						&dev_priv->ring[VECS]);
> +			if (iir & (GT_RENDER_USER_INTERRUPT <<
> +					GEN8_VECS_IRQ_SHIFT) || need_notify)
>  				notify_ring(&dev_priv->ring[VECS]);
>  		} else
>  			DRM_ERROR("The master control interrupt lied (GT3)!\n");
>  	}
>  
>  	if (master_ctl & GEN8_GT_PM_IRQ) {
> -		u32 tmp = I915_READ_FW(GEN8_GT_IIR(2));
> -		if (tmp & dev_priv->pm_rps_events) {
> +		u32 iir = I915_READ_FW(GEN8_GT_IIR(2));
> +
> +		if (iir & dev_priv->pm_rps_events) {
>  			I915_WRITE_FW(GEN8_GT_IIR(2),
> -				      tmp & dev_priv->pm_rps_events);
> +				      iir & dev_priv->pm_rps_events);
>  			ret = IRQ_HANDLED;
> -			gen6_rps_irq_handler(dev_priv, tmp);
> +			gen6_rps_irq_handler(dev_priv, iir);
>  		} else
>  			DRM_ERROR("The master control interrupt lied (PM)!\n");
>  	}

I had a really hard time reading this patch until I realized what you've
done. And this is definitely getting too complicated. I think it'd be good
to first extract a little helper which takes the iir and the shift and
then calls both intel_lrc_irq_handler() and notify_ring().
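
Roughly something like this (completely untested sketch; the helper name
gen8_cs_irq_handler and the exact plumbing are just illustrative):

/*
 * Per-engine helper: run the execlist CSB handling on a context switch
 * interrupt and wake waiters both for user interrupts and for completed
 * context switches.
 */
static void gen8_cs_irq_handler(struct intel_engine_cs *ring,
				u32 iir, int test_shift)
{
	bool need_notify = false;

	if (iir & (GT_CONTEXT_SWITCH_INTERRUPT << test_shift))
		need_notify = intel_lrc_irq_handler(ring);
	if (iir & (GT_RENDER_USER_INTERRUPT << test_shift) || need_notify)
		notify_ring(ring);
}

so that each block in gen8_gt_irq_handler() collapses to e.g.

	gen8_cs_irq_handler(&dev_priv->ring[RCS], iir, GEN8_RCS_IRQ_SHIFT);
	gen8_cs_irq_handler(&dev_priv->ring[BCS], iir, GEN8_BCS_IRQ_SHIFT);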

Also if you do large-scale replacement like s/tmp/iir/ that needs to be a
separate patch.

With that technicality out of the way let's look at the larger picture.
Let me first describe in more detail how the legacy ctx retiring works,
since it doesn't work like what you've done here really. Let's look at the
case where we have some batch running on ctx A and then switch to a batch
running on ctx B:

| currently ctx A active on the hw
| MI_BB_START/flushes/...
| seqno write + interrupt for the batch just submitted, tracked in request 1
<- ctx A is still used by the hardware and not flushed out
| MI_SET_CTX to switch to B
| batch/flush
| seqno write/irq for request 2
v

What you seem to be doing is changing the logical retire point for request
1 to include the MI_SET_CTX (but in execlist terms, i.e. when the hw has
confirmed the ctx switch to the next lrc in the elsp).

But what the legacy code does is wait until request 2 before retiring the
ctx object for ctx A. So what happens in that context switch function
(which is called synchronously from execbuf ioctl code) is that we
directly unpin ctx A and pin ctx B. And then we throw the ctx A object
onto the active list with request 2 attached as a write fence to make sure
the eviction code waits for request 2 to complete, which necessarily means
the ctx switch has completed.

Of course this isn't entirely perfect since we could evict ctx A even
before request 2 has completed, but since throwing out active objects is
really a last resort thing that doesn't really matter.

Now we could do the exact same design for execlist since we never reorder
execlist ctx in upstream. And that's what I originally suggested. But that
would seriously piss off John with his scheduler patches.

Instead I think the proper design would be to create a new, separate
request object every time we submit a new batch for a different context
(we don't want to do this unconditionally for overheads) which will
complete exactly when the ctx switch completes. Since we have full-blown
request abstraction nowadays that should be possible without touching
generic code or adding special-cases to request or active object tracking
code.
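
Something like (very rough pseudocode, not actual API; last_submitted_ctx
is made up):

	/* in the execlist submission path */
	if (req->ctx != ring->last_submitted_ctx) {
		/*
		 * Allocate an extra request for the *previous* context; it
		 * completes when the CSB tells us the hw has switched away
		 * from that context, and the old ctx/ringbuf objects go on
		 * the active list against it so eviction waits for the
		 * context save.
		 */
		...
		ring->last_submitted_ctx = req->ctx;
	}
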
-Daniel


> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 825fa7a..e8f5b6c 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -426,9 +426,8 @@ static void execlists_context_unqueue(struct intel_engine_cs *ring)
>  			/* Same ctx: ignore first request, as second request
>  			 * will update tail past first request's workload */
>  			cursor->elsp_submitted = req0->elsp_submitted;
> +			req0->elsp_submitted = 0;
>  			list_del(&req0->execlist_link);
> -			list_add_tail(&req0->execlist_link,
> -				&ring->execlist_retired_req_list);
>  			req0 = cursor;
>  		} else {
>  			req1 = cursor;
> @@ -478,11 +477,9 @@ static bool execlists_check_remove_request(struct intel_engine_cs *ring,
>  		if (intel_execlists_ctx_id(ctx_obj) == request_id) {
>  			WARN(head_req->elsp_submitted == 0,
>  			     "Never submitted head request\n");
> -
>  			if (--head_req->elsp_submitted <= 0) {
> +				head_req->ctx_complete = 1;
>  				list_del(&head_req->execlist_link);
> -				list_add_tail(&head_req->execlist_link,
> -					&ring->execlist_retired_req_list);
>  				return true;
>  			}
>  		}
> @@ -497,8 +494,9 @@ static bool execlists_check_remove_request(struct intel_engine_cs *ring,
>   *
>   * Check the unread Context Status Buffers and manage the submission of new
>   * contexts to the ELSP accordingly.
> + * @return whether a context completed
>   */
> -void intel_lrc_irq_handler(struct intel_engine_cs *ring)
> +bool intel_lrc_irq_handler(struct intel_engine_cs *ring)
>  {
>  	struct drm_i915_private *dev_priv = ring->dev->dev_private;
>  	u32 status_pointer;
> @@ -558,6 +556,8 @@ void intel_lrc_irq_handler(struct intel_engine_cs *ring)
>  		   _MASKED_FIELD(GEN8_CSB_PTR_MASK << 8,
>  				 ((u32)ring->next_context_status_buffer &
>  				  GEN8_CSB_PTR_MASK) << 8));
> +
> +	return (submit_contexts != 0);
>  }
>  
>  static int execlists_context_queue(struct drm_i915_gem_request *request)
> @@ -569,8 +569,6 @@ static int execlists_context_queue(struct drm_i915_gem_request *request)
>  	if (request->ctx != ring->default_context)
>  		intel_lr_context_pin(request);
>  
> -	i915_gem_request_reference(request);
> -
>  	spin_lock_irq(&ring->execlist_lock);
>  
>  	list_for_each_entry(cursor, &ring->execlist_queue, execlist_link)
> @@ -588,8 +586,6 @@ static int execlists_context_queue(struct drm_i915_gem_request *request)
>  			WARN(tail_req->elsp_submitted != 0,
>  				"More than 2 already-submitted reqs queued\n");
>  			list_del(&tail_req->execlist_link);
> -			list_add_tail(&tail_req->execlist_link,
> -				&ring->execlist_retired_req_list);
>  		}
>  	}
>  
> @@ -958,32 +954,6 @@ int intel_execlists_submission(struct i915_execbuffer_params *params,
>  	return 0;
>  }
>  
> -void intel_execlists_retire_requests(struct intel_engine_cs *ring)
> -{
> -	struct drm_i915_gem_request *req, *tmp;
> -	struct list_head retired_list;
> -
> -	WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
> -	if (list_empty(&ring->execlist_retired_req_list))
> -		return;
> -
> -	INIT_LIST_HEAD(&retired_list);
> -	spin_lock_irq(&ring->execlist_lock);
> -	list_replace_init(&ring->execlist_retired_req_list, &retired_list);
> -	spin_unlock_irq(&ring->execlist_lock);
> -
> -	list_for_each_entry_safe(req, tmp, &retired_list, execlist_link) {
> -		struct intel_context *ctx = req->ctx;
> -		struct drm_i915_gem_object *ctx_obj =
> -				ctx->engine[ring->id].state;
> -
> -		if (ctx_obj && (ctx != ring->default_context))
> -			intel_lr_context_unpin(req);
> -		list_del(&req->execlist_link);
> -		i915_gem_request_unreference(req);
> -	}
> -}
> -
>  void intel_logical_ring_stop(struct intel_engine_cs *ring)
>  {
>  	struct drm_i915_private *dev_priv = ring->dev->dev_private;
> @@ -1938,7 +1908,6 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
>  	init_waitqueue_head(&ring->irq_queue);
>  
>  	INIT_LIST_HEAD(&ring->execlist_queue);
> -	INIT_LIST_HEAD(&ring->execlist_retired_req_list);
>  	spin_lock_init(&ring->execlist_lock);
>  
>  	ret = i915_cmd_parser_init_ring(ring);
> diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
> index 4e60d54..e6a4900 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.h
> +++ b/drivers/gpu/drm/i915/intel_lrc.h
> @@ -95,7 +95,7 @@ int intel_execlists_submission(struct i915_execbuffer_params *params,
>  			       struct list_head *vmas);
>  u32 intel_execlists_ctx_id(struct drm_i915_gem_object *ctx_obj);
>  
> -void intel_lrc_irq_handler(struct intel_engine_cs *ring);
> +bool intel_lrc_irq_handler(struct intel_engine_cs *ring);
>  void intel_execlists_retire_requests(struct intel_engine_cs *ring);
>  
>  #endif /* _INTEL_LRC_H_ */
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index 49fa41d..d99b167 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -264,7 +264,6 @@ struct  intel_engine_cs {
>  	/* Execlists */
>  	spinlock_t execlist_lock;
>  	struct list_head execlist_queue;
> -	struct list_head execlist_retired_req_list;
>  	u8 next_context_status_buffer;
>  	u32             irq_keep_mask; /* bitmask for interrupts that should not be masked */
>  	int		(*emit_request)(struct drm_i915_gem_request *request);
> -- 
> 1.9.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] drm/i915: Improve dynamic management/eviction of lrc backing objects
  2015-10-06 14:52 ` [PATCH 2/4] drm/i915: Improve dynamic management/eviction of lrc backing objects Nick Hoath
@ 2015-10-07 16:05   ` Daniel Vetter
  2015-10-08 13:35     ` Chris Wilson
  0 siblings, 1 reply; 24+ messages in thread
From: Daniel Vetter @ 2015-10-07 16:05 UTC (permalink / raw)
  To: Nick Hoath; +Cc: intel-gfx

On Tue, Oct 06, 2015 at 03:52:02PM +0100, Nick Hoath wrote:
> Shovel all context related objects through the active queue and obj
> management.
> 
> - Added callback in vma_(un)bind to add CPU (un)mapping at same time
>   if desired
> - Inserted LRC hw context & ringbuf to vma active list
> 
> Issue: VIZ-4277
> Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h         |  4 ++
>  drivers/gpu/drm/i915/i915_gem.c         |  3 ++
>  drivers/gpu/drm/i915/i915_gem_gtt.c     |  8 ++++
>  drivers/gpu/drm/i915/intel_lrc.c        | 28 +++++++++++--
>  drivers/gpu/drm/i915/intel_ringbuffer.c | 71 ++++++++++++++++++---------------
>  drivers/gpu/drm/i915/intel_ringbuffer.h |  3 --
>  6 files changed, 79 insertions(+), 38 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index 3d217f9..d660ee3 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2169,6 +2169,10 @@ struct drm_i915_gem_object {
>  			struct work_struct *work;
>  		} userptr;
>  	};
> +
> +	/** Support for automatic CPU side mapping of object */
> +	int (*mmap)(struct drm_i915_gem_object *obj, bool unmap);

I don't think we need a map hook, that can still be done (if not done so
already) by the callers. Also it's better to rename this to vma_unbind
(and it should be at the vma level I think) since there are other potential
users. So explicit mapping, lazy unmapping for the kmaps we need. That's
the same design we're using for binding objects into gpu address spaces.
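
Roughly what I have in mind, as a sketch only (the hook name below is made
up and the exact call site is hand-waved):

	/* sketch: a per-vma teardown hook instead of obj->mmap(obj, true) */
	void (*unbind_vma)(struct i915_vma *vma);	/* new field in i915_vma */

	/* and in __i915_vma_unbind(), once the vma is idle: */
	if (vma->unbind_vma)
		vma->unbind_vma(vma);	/* e.g. iounmap the ring, kunmap the ctx page */

The map side stays explicit in the pin paths, only the teardown is lazy.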

Also Chris Wilson has something similar, please align with him on the
precise design of this callback.

Thanks, Daniel

> +	void *mappable;
>  };
>  #define to_intel_bo(x) container_of(x, struct drm_i915_gem_object, base)
>  
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index fc82171..56e0e00 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -3262,6 +3262,9 @@ static int __i915_vma_unbind(struct i915_vma *vma, bool wait)
>  	if (vma->pin_count)
>  		return -EBUSY;
>  
> +	if (obj->mmap)
> +		obj->mmap(obj, true);
> +
>  	BUG_ON(obj->pages == NULL);
>  
>  	if (wait) {
> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c b/drivers/gpu/drm/i915/i915_gem_gtt.c
> index 620d57e..786ec4b 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -3495,6 +3495,14 @@ int i915_vma_bind(struct i915_vma *vma, enum i915_cache_level cache_level,
>  
>  	vma->bound |= bind_flags;
>  
> +	if (vma->obj->mmap) {
> +		ret = vma->obj->mmap(vma->obj, false);
> +		if (ret) {
> +			i915_vma_unbind(vma);
> +			return ret;
> +		}
> +	}
> +
>  	return 0;
>  }
>  
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index e8f5b6c..b807928 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -723,6 +723,18 @@ intel_logical_ring_advance_and_submit(struct drm_i915_gem_request *request)
>  
>  	intel_logical_ring_advance(request->ringbuf);
>  
> +	/* Push the hw context on to the active list */
> +	i915_vma_move_to_active(
> +			i915_gem_obj_to_ggtt(
> +				request->ctx->engine[ring->id].state),
> +			request);
> +
> +	/* Push the ringbuf on to the active list */
> +	i915_vma_move_to_active(
> +			i915_gem_obj_to_ggtt(
> +			request->ctx->engine[ring->id].ringbuf->obj),
> +			request);
> +
>  	request->tail = request->ringbuf->tail;
>  
>  	if (intel_ring_stopped(ring))
> @@ -1006,10 +1018,15 @@ static int intel_lr_context_do_pin(struct intel_engine_cs *ring,
>  	if (ret)
>  		return ret;
>  
> -	ret = intel_pin_and_map_ringbuffer_obj(ring->dev, ringbuf);
> +	ret = i915_gem_obj_ggtt_pin(ringbuf->obj, PAGE_SIZE,
> +			PIN_MAPPABLE);
>  	if (ret)
>  		goto unpin_ctx_obj;
>  
> +	ret = i915_gem_object_set_to_gtt_domain(ringbuf->obj, true);
> +	if (ret)
> +		goto unpin_rb_obj;
> +
>  	ctx_obj->dirty = true;
>  
>  	/* Invalidate GuC TLB. */
> @@ -1018,6 +1035,8 @@ static int intel_lr_context_do_pin(struct intel_engine_cs *ring,
>  
>  	return ret;
>  
> +unpin_rb_obj:
> +	i915_gem_object_ggtt_unpin(ringbuf->obj);
>  unpin_ctx_obj:
>  	i915_gem_object_ggtt_unpin(ctx_obj);
>  
> @@ -1052,7 +1071,7 @@ void intel_lr_context_unpin(struct drm_i915_gem_request *rq)
>  	if (ctx_obj) {
>  		WARN_ON(!mutex_is_locked(&ring->dev->struct_mutex));
>  		if (--rq->ctx->engine[ring->id].pin_count == 0) {
> -			intel_unpin_ringbuffer_obj(ringbuf);
> +			i915_gem_object_ggtt_unpin(ringbuf->obj);
>  			i915_gem_object_ggtt_unpin(ctx_obj);
>  		}
>  	}
> @@ -2369,7 +2388,7 @@ void intel_lr_context_free(struct intel_context *ctx)
>  			struct intel_engine_cs *ring = ringbuf->ring;
>  
>  			if (ctx == ring->default_context) {
> -				intel_unpin_ringbuffer_obj(ringbuf);
> +				i915_gem_object_ggtt_unpin(ringbuf->obj);
>  				i915_gem_object_ggtt_unpin(ctx_obj);
>  			}
>  			WARN_ON(ctx->engine[ring->id].pin_count);
> @@ -2536,5 +2555,8 @@ void intel_lr_context_reset(struct drm_device *dev,
>  
>  		ringbuf->head = 0;
>  		ringbuf->tail = 0;
> +
> +		i915_gem_object_ggtt_unpin(
> +				ctx->engine[ring->id].state);
>  	}
>  }
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.c b/drivers/gpu/drm/i915/intel_ringbuffer.c
> index c82c74c..79df8ca 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.c
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.c
> @@ -1958,38 +1958,35 @@ static int init_phys_status_page(struct intel_engine_cs *ring)
>  	return 0;
>  }
>  
> -void intel_unpin_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
> +static int intel_mmap_ringbuffer_obj(struct drm_i915_gem_object *obj,
> +		bool unmap)
>  {
> -	iounmap(ringbuf->virtual_start);
> -	ringbuf->virtual_start = NULL;
> -	i915_gem_object_ggtt_unpin(ringbuf->obj);
> -}
> -
> -int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
> -				     struct intel_ringbuffer *ringbuf)
> -{
> -	struct drm_i915_private *dev_priv = to_i915(dev);
> -	struct drm_i915_gem_object *obj = ringbuf->obj;
> -	int ret;
> -
> -	ret = i915_gem_obj_ggtt_pin(obj, PAGE_SIZE, PIN_MAPPABLE);
> -	if (ret)
> -		return ret;
> -
> -	ret = i915_gem_object_set_to_gtt_domain(obj, true);
> -	if (ret) {
> -		i915_gem_object_ggtt_unpin(obj);
> -		return ret;
> -	}
> -
> -	ringbuf->virtual_start = ioremap_wc(dev_priv->gtt.mappable_base +
> -			i915_gem_obj_ggtt_offset(obj), ringbuf->size);
> -	if (ringbuf->virtual_start == NULL) {
> -		i915_gem_object_ggtt_unpin(obj);
> -		return -EINVAL;
> +	int ret = 0;
> +	struct intel_ringbuffer *ringbuf =
> +	(struct intel_ringbuffer *)obj->mappable;
> +
> +	if (!unmap) {
> +		struct drm_device *dev = ringbuf->ring->dev;
> +		struct drm_i915_private *dev_priv = to_i915(dev);
> +
> +		WARN_ON(ringbuf->virtual_start != NULL);
> +		if (ringbuf->virtual_start == NULL) {
> +			ringbuf->virtual_start = ioremap_wc(
> +					dev_priv->gtt.mappable_base +
> +					i915_gem_obj_ggtt_offset(obj),
> +					ringbuf->size);
> +			if (ringbuf->virtual_start == NULL) {
> +				i915_gem_object_ggtt_unpin(obj);
> +				return -EINVAL;
> +			}
> +		}
> +	} else {
> +		if (!i915_gem_obj_is_pinned(ringbuf->obj)) {
> +			iounmap(ringbuf->virtual_start);
> +			ringbuf->virtual_start = NULL;
> +		}
>  	}
> -
> -	return 0;
> +	return ret;
>  }
>  
>  static void intel_destroy_ringbuffer_obj(struct intel_ringbuffer *ringbuf)
> @@ -2016,6 +2013,9 @@ static int intel_alloc_ringbuffer_obj(struct drm_device *dev,
>  
>  	ringbuf->obj = obj;
>  
> +	obj->mmap = intel_mmap_ringbuffer_obj;
> +	obj->mappable = ringbuf;
> +
>  	return 0;
>  }
>  
> @@ -2094,7 +2094,7 @@ static int intel_init_ring_buffer(struct drm_device *dev,
>  			goto error;
>  	}
>  
> -	ret = intel_pin_and_map_ringbuffer_obj(dev, ringbuf);
> +	ret = i915_gem_obj_ggtt_pin(ringbuf->obj, PAGE_SIZE, PIN_MAPPABLE);
>  	if (ret) {
>  		DRM_ERROR("Failed to pin and map ringbuffer %s: %d\n",
>  				ring->name, ret);
> @@ -2102,12 +2102,19 @@ static int intel_init_ring_buffer(struct drm_device *dev,
>  		goto error;
>  	}
>  
> +	ret = i915_gem_object_set_to_gtt_domain(ringbuf->obj, true);
> +	if (ret)
> +		goto error_unpin;
> +
>  	ret = i915_cmd_parser_init_ring(ring);
>  	if (ret)
>  		goto error;
>  
>  	return 0;
>  
> +error_unpin:
> +	i915_gem_object_ggtt_unpin(ringbuf->obj);
> +	intel_destroy_ringbuffer_obj(ringbuf);
>  error:
>  	intel_ringbuffer_free(ringbuf);
>  	ring->buffer = NULL;
> @@ -2126,7 +2133,7 @@ void intel_cleanup_ring_buffer(struct intel_engine_cs *ring)
>  	intel_stop_ring_buffer(ring);
>  	WARN_ON(!IS_GEN2(ring->dev) && (I915_READ_MODE(ring) & MODE_IDLE) == 0);
>  
> -	intel_unpin_ringbuffer_obj(ring->buffer);
> +	i915_gem_object_ggtt_unpin(ring->buffer->obj);
>  	intel_ringbuffer_free(ring->buffer);
>  	ring->buffer = NULL;
>  
> diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
> index d99b167..8daaf99 100644
> --- a/drivers/gpu/drm/i915/intel_ringbuffer.h
> +++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
> @@ -421,9 +421,6 @@ intel_write_status_page(struct intel_engine_cs *ring,
>  
>  struct intel_ringbuffer *
>  intel_engine_create_ringbuffer(struct intel_engine_cs *engine, int size);
> -int intel_pin_and_map_ringbuffer_obj(struct drm_device *dev,
> -				     struct intel_ringbuffer *ringbuf);
> -void intel_unpin_ringbuffer_obj(struct intel_ringbuffer *ringbuf);
>  void intel_ringbuffer_free(struct intel_ringbuffer *ring);
>  
>  void intel_stop_ring_buffer(struct intel_engine_cs *ring);
> -- 
> 1.9.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/4] drm/i915: Unify execlist and legacy request life-cycles
  2015-10-07 16:03   ` Daniel Vetter
@ 2015-10-07 16:05     ` Chris Wilson
  0 siblings, 0 replies; 24+ messages in thread
From: Chris Wilson @ 2015-10-07 16:05 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Wed, Oct 07, 2015 at 06:03:48PM +0200, Daniel Vetter wrote:
> On Tue, Oct 06, 2015 at 03:52:01PM +0100, Nick Hoath wrote:
> > There is a desire to simplify the i915 driver by reducing the number of
> > different code paths introduced by the LRC / execlists support.  As the
> > execlists request is now part of the gem request it is possible and
> > desirable to unify the request life-cycles for execlist and legacy
> > requests.
> > 
> > Added a context complete flag to a request which gets set during the
> > context switch interrupt.
> > 
> > Added a function i915_gem_request_retireable().  A request is considered
> > retireable if its seqno passed (i.e. the request has completed) and either
> > it was never submitted to the ELSP or its context completed.  This ensures
> > that context save is carried out before the last request for a context is
> > considered retireable.  retire_requests_ring() now uses
> > i915_gem_request_retireable() rather than request_complete() when deciding
> > which requests to retire. Requests that were not waiting for a context
> > switch interrupt (either as a result of being merged into a following
> > request or by being a legacy request) will be considered retireable as
> > soon as their seqno has passed.
> > 
> > Removed the extra request reference held for the execlist request.
> > 
> > Removed intel_execlists_retire_requests() and all references to
> > intel_engine_cs.execlist_retired_req_list.
> > 
> > Moved context unpinning into retire_requests_ring() for now.  Further work
> > is pending for the context pinning - this patch should allow us to use the
> > active list to track context and ring buffer objects later.
> > 
> > Changed gen8_cs_irq_handler() so that notify_ring() is called when
> > contexts complete as well as when a user interrupt occurs so that
> > notification happens when a request is complete and context save has
> > finished.
> > 
> > v2: Rebase over the read-read optimisation changes
> > 
> > v3: Reworked IRQ handler after removing IRQ handler cleanup patch
> > 
> > v4: Fixed various pin leaks
> > 
> > Issue: VIZ-4277
> > Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
> > Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_drv.h         |  6 +++
> >  drivers/gpu/drm/i915/i915_gem.c         | 67 +++++++++++++++++++++------
> >  drivers/gpu/drm/i915/i915_irq.c         | 81 +++++++++++++++++++++------------
> >  drivers/gpu/drm/i915/intel_lrc.c        | 43 +++--------------
> >  drivers/gpu/drm/i915/intel_lrc.h        |  2 +-
> >  drivers/gpu/drm/i915/intel_ringbuffer.h |  1 -
> >  6 files changed, 118 insertions(+), 82 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index fbf0ae9..3d217f9 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -2262,6 +2262,12 @@ struct drm_i915_gem_request {
> >  	/** Execlists no. of times this request has been sent to the ELSP */
> >  	int elsp_submitted;
> >  
> > +	/**
> > +	 * Execlists: whether this requests's context has completed after
> > +	 * submission to the ELSP
> > +	 */
> > +	bool ctx_complete;
> > +
> >  };
> >  
> >  int i915_gem_request_alloc(struct intel_engine_cs *ring,
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 52642af..fc82171 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -1386,6 +1386,24 @@ __i915_gem_request_retire__upto(struct drm_i915_gem_request *req)
> >  				       typeof(*tmp), list);
> >  
> >  		i915_gem_request_retire(tmp);
> > +
> > +		if (i915.enable_execlists) {
> > +			struct intel_context *ctx = tmp->ctx;
> > +			struct drm_i915_private *dev_priv =
> > +				engine->dev->dev_private;
> > +			unsigned long flags;
> > +			struct drm_i915_gem_object *ctx_obj =
> > +				ctx->engine[engine->id].state;
> > +
> > +			spin_lock_irqsave(&engine->execlist_lock, flags);
> > +
> > +			if (ctx_obj && (ctx != engine->default_context))
> > +				intel_lr_context_unpin(tmp);
> > +
> > +			intel_runtime_pm_put(dev_priv);
> > +			spin_unlock_irqrestore(&engine->execlist_lock, flags);
> > +		}
> > +
> >  	} while (tmp != req);

And NAK anyway.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/4] drm/i915: Add the CPU mapping of the hw context to the pinned items.
  2015-10-06 14:52 ` [PATCH 3/4] drm/i915: Add the CPU mapping of the hw context to the pinned items Nick Hoath
@ 2015-10-07 16:08   ` Daniel Vetter
  0 siblings, 0 replies; 24+ messages in thread
From: Daniel Vetter @ 2015-10-07 16:08 UTC (permalink / raw)
  To: Nick Hoath; +Cc: intel-gfx

On Tue, Oct 06, 2015 at 03:52:03PM +0100, Nick Hoath wrote:
> Pin the hw ctx mapping so that it is not mapped/unmapped per bb
> when doing GuC submission.
> 
> Issue: VIZ-4277
> Cc: David Gordon <david.s.gordon@intel.com>
> Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_debugfs.c | 14 ++++------
>  drivers/gpu/drm/i915/i915_drv.h     |  4 ++-
>  drivers/gpu/drm/i915/intel_lrc.c    | 56 +++++++++++++++++++++++++++----------
>  3 files changed, 50 insertions(+), 24 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_debugfs.c b/drivers/gpu/drm/i915/i915_debugfs.c
> index 3f2a7a7..e68cf5fa 100644
> --- a/drivers/gpu/drm/i915/i915_debugfs.c
> +++ b/drivers/gpu/drm/i915/i915_debugfs.c
> @@ -1970,10 +1970,9 @@ static int i915_context_status(struct seq_file *m, void *unused)
>  
>  static void i915_dump_lrc_obj(struct seq_file *m,
>  			      struct intel_engine_cs *ring,
> -			      struct drm_i915_gem_object *ctx_obj)
> +			      struct drm_i915_gem_object *ctx_obj,
> +			      uint32_t *reg_state)
>  {
> -	struct page *page;
> -	uint32_t *reg_state;
>  	int j;
>  	unsigned long ggtt_offset = 0;
>  
> @@ -1996,17 +1995,13 @@ static void i915_dump_lrc_obj(struct seq_file *m,
>  		return;
>  	}
>  
> -	page = i915_gem_object_get_page(ctx_obj, LRC_STATE_PN);
> -	if (!WARN_ON(page == NULL)) {
> -		reg_state = kmap_atomic(page);
> -
> +	if (!WARN_ON(reg_state == NULL)) {
>  		for (j = 0; j < 0x600 / sizeof(u32) / 4; j += 4) {
>  			seq_printf(m, "\t[0x%08lx] 0x%08x 0x%08x 0x%08x 0x%08x\n",
>  				   ggtt_offset + 4096 + (j * 4),
>  				   reg_state[j], reg_state[j + 1],
>  				   reg_state[j + 2], reg_state[j + 3]);
>  		}
> -		kunmap_atomic(reg_state);
>  	}
>  
>  	seq_putc(m, '\n');
> @@ -2034,7 +2029,8 @@ static int i915_dump_lrc(struct seq_file *m, void *unused)
>  		for_each_ring(ring, dev_priv, i) {
>  			if (ring->default_context != ctx)
>  				i915_dump_lrc_obj(m, ring,
> -						  ctx->engine[i].state);
> +						  ctx->engine[i].state,
> +						  ctx->engine[i].reg_state);
>  		}
>  	}
>  
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index d660ee3..b49fd12 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -879,8 +879,10 @@ struct intel_context {
>  	} legacy_hw_ctx;
>  
>  	/* Execlists */
> -	struct {
> +	struct intel_context_engine {
>  		struct drm_i915_gem_object *state;
> +		uint32_t *reg_state;
> +		struct page *page;
>  		struct intel_ringbuffer *ringbuf;
>  		int pin_count;
>  	} engine[I915_NUM_RINGS];
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index b807928..55a4de56 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -360,16 +360,13 @@ static int execlists_update_context(struct drm_i915_gem_request *rq)
>  	struct i915_hw_ppgtt *ppgtt = rq->ctx->ppgtt;
>  	struct drm_i915_gem_object *ctx_obj = rq->ctx->engine[ring->id].state;
>  	struct drm_i915_gem_object *rb_obj = rq->ringbuf->obj;
> -	struct page *page;
> -	uint32_t *reg_state;
> +	uint32_t *reg_state = rq->ctx->engine[ring->id].reg_state;
>  
>  	BUG_ON(!ctx_obj);
> +	WARN_ON(!reg_state);
>  	WARN_ON(!i915_gem_obj_is_pinned(ctx_obj));
>  	WARN_ON(!i915_gem_obj_is_pinned(rb_obj));
>  
> -	page = i915_gem_object_get_page(ctx_obj, LRC_STATE_PN);
> -	reg_state = kmap_atomic(page);
> -
>  	reg_state[CTX_RING_TAIL+1] = rq->tail;
>  	reg_state[CTX_RING_BUFFER_START+1] = i915_gem_obj_ggtt_offset(rb_obj);
>  
> @@ -385,8 +382,6 @@ static int execlists_update_context(struct drm_i915_gem_request *rq)
>  		ASSIGN_CTX_PDP(ppgtt, reg_state, 0);
>  	}
>  
> -	kunmap_atomic(reg_state);
> -
>  	return 0;
>  }
>  
> @@ -1004,7 +999,31 @@ int logical_ring_flush_all_caches(struct drm_i915_gem_request *req)
>  	return 0;
>  }
>  
> -static int intel_lr_context_do_pin(struct intel_engine_cs *ring,
> +static int intel_mmap_hw_context(struct drm_i915_gem_object *obj,
> +		bool unmap)
> +{
> +	int ret = 0;
> +	struct intel_context_engine *ice =
> +			(struct intel_context_engine *)obj->mappable;
> +	struct page *page;
> +	uint32_t *reg_state;
> +
> +	if (unmap) {
> +		kunmap(ice->page);
> +		ice->reg_state = NULL;
> +		ice->page = NULL;
> +	} else {
> +		page = i915_gem_object_get_page(obj, LRC_STATE_PN);
> +		reg_state = kmap(page);
> +		ice->reg_state = reg_state;
> +		ice->page = page;
> +	}
> +	return ret;
> +}
> +
> +static int intel_lr_context_do_pin(
> +		struct intel_context *ctx,
> +		struct intel_engine_cs *ring,
>  		struct drm_i915_gem_object *ctx_obj,
>  		struct intel_ringbuffer *ringbuf)
>  {
> @@ -1051,7 +1070,7 @@ static int intel_lr_context_pin(struct drm_i915_gem_request *rq)
>  	struct intel_ringbuffer *ringbuf = rq->ringbuf;
>  
>  	if (rq->ctx->engine[ring->id].pin_count++ == 0) {
> -		ret = intel_lr_context_do_pin(ring, ctx_obj, ringbuf);
> +		ret = intel_lr_context_do_pin(rq->ctx, ring, ctx_obj, ringbuf);
>  		if (ret)
>  			goto reset_pin_count;
>  	}
> @@ -1915,6 +1934,8 @@ void intel_logical_ring_cleanup(struct intel_engine_cs *ring)
>  
>  static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *ring)
>  {
> +	struct page *page;
> +	uint32_t *reg_state;
>  	int ret;
>  
>  	/* Intentionally left blank. */
> @@ -1939,6 +1960,7 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
>  
>  	/* As this is the default context, always pin it */
>  	ret = intel_lr_context_do_pin(
> +			ring->default_context,
>  			ring,
>  			ring->default_context->engine[ring->id].state,
>  			ring->default_context->engine[ring->id].ringbuf);
> @@ -1949,6 +1971,13 @@ static int logical_ring_init(struct drm_device *dev, struct intel_engine_cs *rin
>  		return ret;
>  	}
>  
> +	page = i915_gem_object_get_page(
> +			ring->default_context->engine[ring->id].state,
> +			LRC_STATE_PN);
> +	reg_state = kmap(page);
> +	ring->default_context->engine[ring->id].reg_state = reg_state;
> +	ring->default_context->engine[ring->id].page = page;
> +
>  	return ret;
>  }
>  
> @@ -2388,6 +2417,7 @@ void intel_lr_context_free(struct intel_context *ctx)
>  			struct intel_engine_cs *ring = ringbuf->ring;
>  
>  			if (ctx == ring->default_context) {
> +				kunmap(ctx->engine[ring->id].page);

This shouldn't be necessary, the generic vma unbind hook should take care
of this like for other objects too. Or did I miss something?
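
I.e. (sketch, reusing the names from patch 2) the default context teardown
should be able to get away with just:

	if (ctx == ring->default_context) {
		i915_gem_object_ggtt_unpin(ringbuf->obj);
		i915_gem_object_ggtt_unpin(ctx_obj);
		/* no explicit kunmap: the obj->mmap(obj, true) hook does that
		 * once the vma is eventually unbound */
	}
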
-Daniel

>  				i915_gem_object_ggtt_unpin(ringbuf->obj);
>  				i915_gem_object_ggtt_unpin(ctx_obj);
>  			}
> @@ -2489,6 +2519,8 @@ int intel_lr_context_deferred_alloc(struct intel_context *ctx,
>  		goto error_ringbuf;
>  	}
>  
> +	ctx_obj->mmap = intel_mmap_hw_context;
> +	ctx_obj->mappable = &(ctx->engine[ring->id]);
>  	ctx->engine[ring->id].ringbuf = ringbuf;
>  	ctx->engine[ring->id].state = ctx_obj;
>  
> @@ -2536,7 +2568,6 @@ void intel_lr_context_reset(struct drm_device *dev,
>  		struct intel_ringbuffer *ringbuf =
>  				ctx->engine[ring->id].ringbuf;
>  		uint32_t *reg_state;
> -		struct page *page;
>  
>  		if (!ctx_obj)
>  			continue;
> @@ -2545,14 +2576,11 @@ void intel_lr_context_reset(struct drm_device *dev,
>  			WARN(1, "Failed get_pages for context obj\n");
>  			continue;
>  		}
> -		page = i915_gem_object_get_page(ctx_obj, LRC_STATE_PN);
> -		reg_state = kmap_atomic(page);
> +		reg_state = ctx->engine[ring->id].reg_state;
>  
>  		reg_state[CTX_RING_HEAD+1] = 0;
>  		reg_state[CTX_RING_TAIL+1] = 0;
>  
> -		kunmap_atomic(reg_state);
> -
>  		ringbuf->head = 0;
>  		ringbuf->tail = 0;
>  
> -- 
> 1.9.1
> 
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/4] drm/i915: Unify execlist and legacy request life-cycles
  2015-10-06 14:52 ` [PATCH 1/4] drm/i915: Unify execlist and legacy request life-cycles Nick Hoath
  2015-10-07 16:03   ` Daniel Vetter
@ 2015-10-08 12:32   ` Chris Wilson
  2015-10-09  7:58     ` Daniel Vetter
  1 sibling, 1 reply; 24+ messages in thread
From: Chris Wilson @ 2015-10-08 12:32 UTC (permalink / raw)
  To: Nick Hoath; +Cc: intel-gfx

On Tue, Oct 06, 2015 at 03:52:01PM +0100, Nick Hoath wrote:
> There is a desire to simplify the i915 driver by reducing the number of
> different code paths introduced by the LRC / execlists support.  As the
> execlists request is now part of the gem request it is possible and
> desirable to unify the request life-cycles for execlist and legacy
> requests.
> 
> Added a context complete flag to a request which gets set during the
> context switch interrupt.

Wrong approach. Legacy uses the request itself as the indicator for when
the context is complete; if you were to take the same approach for LRC
we would not need to add yet more execlist specific state.

Lines of code are reduced by keeping the GEM semantics the same and just
having execlists hold an independent ref on the request for its async
submission approach.
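
I.e. keep the pairing we have today (sketch, condensed from the code this
patch deletes):

	/* submission: execlists takes its own ref for its async queue */
	i915_gem_request_reference(request);
	list_add_tail(&request->execlist_link, &ring->execlist_queue);

	/* retirement: the ref is dropped only once the hw has switched away,
	 * amortised under execlist_lock instead of per-request in GEM */
	list_del(&req->execlist_link);
	i915_gem_request_unreference(req);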
 
> Added a function i915_gem_request_retireable().  A request is considered
> retireable if its seqno passed (i.e. the request has completed) and either
> it was never submitted to the ELSP or its context completed.  This ensures
> that context save is carried out before the last request for a context is
> considered retireable.  retire_requests_ring() now uses
> i915_gem_request_retireable() rather than request_complete() when deciding
> which requests to retire. Requests that were not waiting for a context
> switch interrupt (either as a result of being merged into a following
> request or by being a legacy request) will be considered retireable as
> soon as their seqno has passed.
> 
> Removed the extra request reference held for the execlist request.
> 
> Removed intel_execlists_retire_requests() and all references to
> intel_engine_cs.execlist_retired_req_list.
> 
> Moved context unpinning into retire_requests_ring() for now.  Further work
> is pending for the context pinning - this patch should allow us to use the
> active list to track context and ring buffer objects later.
> 
> Changed gen8_cs_irq_handler() so that notify_ring() is called when
> contexts complete as well as when a user interrupt occurs so that
> notification happens when a request is complete and context save has
> finished.
> 
> v2: Rebase over the read-read optimisation changes
> 
> v3: Reworked IRQ handler after removing IRQ handler cleanup patch
> 
> v4: Fixed various pin leaks
> 
> Issue: VIZ-4277
> Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
> Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
> ---
>  drivers/gpu/drm/i915/i915_drv.h         |  6 +++
>  drivers/gpu/drm/i915/i915_gem.c         | 67 +++++++++++++++++++++------
>  drivers/gpu/drm/i915/i915_irq.c         | 81 +++++++++++++++++++++------------
>  drivers/gpu/drm/i915/intel_lrc.c        | 43 +++--------------
>  drivers/gpu/drm/i915/intel_lrc.h        |  2 +-
>  drivers/gpu/drm/i915/intel_ringbuffer.h |  1 -
>  6 files changed, 118 insertions(+), 82 deletions(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index fbf0ae9..3d217f9 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2262,6 +2262,12 @@ struct drm_i915_gem_request {
>  	/** Execlists no. of times this request has been sent to the ELSP */
>  	int elsp_submitted;
>  
> +	/**
> +	 * Execlists: whether this requests's context has completed after
> +	 * submission to the ELSP
> +	 */
> +	bool ctx_complete;
> +
>  };
>  
>  int i915_gem_request_alloc(struct intel_engine_cs *ring,
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 52642af..fc82171 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1386,6 +1386,24 @@ __i915_gem_request_retire__upto(struct drm_i915_gem_request *req)
>  				       typeof(*tmp), list);
>  
>  		i915_gem_request_retire(tmp);
> +
> +		if (i915.enable_execlists) {
> +			struct intel_context *ctx = tmp->ctx;
> +			struct drm_i915_private *dev_priv =
> +				engine->dev->dev_private;
> +			unsigned long flags;
> +			struct drm_i915_gem_object *ctx_obj =
> +				ctx->engine[engine->id].state;
> +
> +			spin_lock_irqsave(&engine->execlist_lock, flags);
> +
> +			if (ctx_obj && (ctx != engine->default_context))
> +				intel_lr_context_unpin(tmp);
> +
> +			intel_runtime_pm_put(dev_priv);
> +			spin_unlock_irqrestore(&engine->execlist_lock, flags);
> +		}

Why here? Surely you intended this to be called by
i915_gem_request_retire(). The runtime pm interaction is wrong, that is
handled by GPU busyness tracking. But by doing this at this juncture you
now increase the frequency at which we have to recreate the iomapping,
most noticeable on bsw+ and take more spinlocks unnecessarily.

Also since you no longer do reference tracking for the
execlists_queue, tmp has just been freed.

>  	} while (tmp != req);
>  
>  	WARN_ON(i915_verify_lists(engine->dev));
> @@ -2359,6 +2377,12 @@ void i915_vma_move_to_active(struct i915_vma *vma,
>  	list_move_tail(&vma->mm_list, &vma->vm->active_list);
>  }
>  
> +static bool i915_gem_request_retireable(struct drm_i915_gem_request *req)
> +{
> +	return (i915_gem_request_completed(req, true) &&
> +		(!req->elsp_submitted || req->ctx_complete));

I disagree with this completely. A request must only be considered complete
when its seqno is passed. The context should be tracking the request
and not the other way around (like legacy does).
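
I.e. retirement should need nothing more than the plain seqno check
(sketch):

	/* in i915_gem_retire_requests_ring(): */
	if (!i915_gem_request_completed(request, true))
		break;

with the context switch being the context's problem to track, not the
request's.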

The approach is just wrong. The lifecycle of the request is not the
bloat in execlists, and keeping an extra reference on the request struct
itself until execlists is able to reap them amortizes the cost of
the spinlock and atomic operations.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] drm/i915: Improve dynamic management/eviction of lrc backing objects
  2015-10-07 16:05   ` Daniel Vetter
@ 2015-10-08 13:35     ` Chris Wilson
  2015-10-16 14:42       ` Nick Hoath
  0 siblings, 1 reply; 24+ messages in thread
From: Chris Wilson @ 2015-10-08 13:35 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Wed, Oct 07, 2015 at 06:05:46PM +0200, Daniel Vetter wrote:
> On Tue, Oct 06, 2015 at 03:52:02PM +0100, Nick Hoath wrote:
> > Shovel all context related objects through the active queue and obj
> > management.
> > 
> > - Added callback in vma_(un)bind to add CPU (un)mapping at same time
> >   if desired
> > - Inserted LRC hw context & ringbuf to vma active list
> > 
> > Issue: VIZ-4277
> > Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_drv.h         |  4 ++
> >  drivers/gpu/drm/i915/i915_gem.c         |  3 ++
> >  drivers/gpu/drm/i915/i915_gem_gtt.c     |  8 ++++
> >  drivers/gpu/drm/i915/intel_lrc.c        | 28 +++++++++++--
> >  drivers/gpu/drm/i915/intel_ringbuffer.c | 71 ++++++++++++++++++---------------
> >  drivers/gpu/drm/i915/intel_ringbuffer.h |  3 --
> >  6 files changed, 79 insertions(+), 38 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index 3d217f9..d660ee3 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -2169,6 +2169,10 @@ struct drm_i915_gem_object {
> >  			struct work_struct *work;
> >  		} userptr;
> >  	};
> > +
> > +	/** Support for automatic CPU side mapping of object */
> > +	int (*mmap)(struct drm_i915_gem_object *obj, bool unmap);
> 
> I don't think we need a map hook, that can still be done (if not done so
> already) by the callers. Also it's better to rename this to vma_unbind
> (and it should be at the vma level I think) since there's other potential
> users. So explicit maping, lazy unmapping for the kmaps we need. That's
> the same design we're using for binding objects into gpu address spaces.
> 
> Also Chris Wilson has something similar, please align with him on the
> precise design of this callback.

We need the unbind hook because of the movement in the first patch (it
is a separate issue, the code should work without it albeit having to
remap the ring/context state more often). The changelog in this patch
simply explains the i915_vma_move_to_active() additions. But to get the
shrink accurate we do need the context unpin on retirement and to do the
pin_count check in i915_vma_unbind() after waiting (rather than before,
as we currently do). However, the eviction code will not inspect the
active context objects yet (as it will continue to skip over the
ggtt->pin_count on them). The way I allowed ctx objects to be evicted was
to only keep the ctx->state pinned for the duration of the request
construction.
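
I.e. roughly (sketch of the reordering in __i915_vma_unbind()):

	if (wait) {
		/* existing wait-for-rendering block, unchanged */
	}

	/* only now reject pinned vmas: a context that was merely active has
	 * had its pin dropped on retirement by this point, so it can be
	 * evicted instead of returning -EBUSY */
	if (vma->pin_count)
		return -EBUSY;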

Note that I think it should be a vma->unbind hook not an object level
one (it is i915_vma_unbind, with only a modicum of object level state
being modified in that function).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/4] drm/i915: Unify execlist and legacy request life-cycles
  2015-10-08 12:32   ` Chris Wilson
@ 2015-10-09  7:58     ` Daniel Vetter
  2015-10-09  8:36       ` Chris Wilson
  0 siblings, 1 reply; 24+ messages in thread
From: Daniel Vetter @ 2015-10-09  7:58 UTC (permalink / raw)
  To: Chris Wilson, Nick Hoath, intel-gfx

On Thu, Oct 08, 2015 at 01:32:07PM +0100, Chris Wilson wrote:
> On Tue, Oct 06, 2015 at 03:52:01PM +0100, Nick Hoath wrote:
> > There is a desire to simplify the i915 driver by reducing the number of
> > different code paths introduced by the LRC / execlists support.  As the
> > execlists request is now part of the gem request it is possible and
> > desirable to unify the request life-cycles for execlist and legacy
> > requests.
> > 
> > Added a context complete flag to a request which gets set during the
> > context switch interrupt.
> 
> Wrong approach. Legacy uses the request itself as the indicator for when
> the context is complete; if you were to take the same approach for LRC
> we would not need to add yet more execlist specific state.
> 
> Lines of code are reduced by keeping the GEM semantics the same and just
> having execlists hold an independent ref on the request for its async
> submission approach.
>  
> > Added a function i915_gem_request_retireable().  A request is considered
> > retireable if its seqno passed (i.e. the request has completed) and either
> > it was never submitted to the ELSP or its context completed.  This ensures
> > that context save is carried out before the last request for a context is
> > considered retireable.  retire_requests_ring() now uses
> > i915_gem_request_retireable() rather than request_complete() when deciding
> > which requests to retire. Requests that were not waiting for a context
> > switch interrupt (either as a result of being merged into a following
> > request or by being a legacy request) will be considered retireable as
> > soon as their seqno has passed.
> > 
> > Removed the extra request reference held for the execlist request.
> > 
> > Removed intel_execlists_retire_requests() and all references to
> > intel_engine_cs.execlist_retired_req_list.
> > 
> > Moved context unpinning into retire_requests_ring() for now.  Further work
> > is pending for the context pinning - this patch should allow us to use the
> > active list to track context and ring buffer objects later.
> > 
> > Changed gen8_cs_irq_handler() so that notify_ring() is called when
> > contexts complete as well as when a user interrupt occurs so that
> > notification happens when a request is complete and context save has
> > finished.
> > 
> > v2: Rebase over the read-read optimisation changes
> > 
> > v3: Reworked IRQ handler after removing IRQ handler cleanup patch
> > 
> > v4: Fixed various pin leaks
> > 
> > Issue: VIZ-4277
> > Signed-off-by: Thomas Daniel <thomas.daniel@intel.com>
> > Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
> > ---
> >  drivers/gpu/drm/i915/i915_drv.h         |  6 +++
> >  drivers/gpu/drm/i915/i915_gem.c         | 67 +++++++++++++++++++++------
> >  drivers/gpu/drm/i915/i915_irq.c         | 81 +++++++++++++++++++++------------
> >  drivers/gpu/drm/i915/intel_lrc.c        | 43 +++--------------
> >  drivers/gpu/drm/i915/intel_lrc.h        |  2 +-
> >  drivers/gpu/drm/i915/intel_ringbuffer.h |  1 -
> >  6 files changed, 118 insertions(+), 82 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> > index fbf0ae9..3d217f9 100644
> > --- a/drivers/gpu/drm/i915/i915_drv.h
> > +++ b/drivers/gpu/drm/i915/i915_drv.h
> > @@ -2262,6 +2262,12 @@ struct drm_i915_gem_request {
> >  	/** Execlists no. of times this request has been sent to the ELSP */
> >  	int elsp_submitted;
> >  
> > +	/**
> > +	 * Execlists: whether this requests's context has completed after
> > +	 * submission to the ELSP
> > +	 */
> > +	bool ctx_complete;
> > +
> >  };
> >  
> >  int i915_gem_request_alloc(struct intel_engine_cs *ring,
> > diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> > index 52642af..fc82171 100644
> > --- a/drivers/gpu/drm/i915/i915_gem.c
> > +++ b/drivers/gpu/drm/i915/i915_gem.c
> > @@ -1386,6 +1386,24 @@ __i915_gem_request_retire__upto(struct drm_i915_gem_request *req)
> >  				       typeof(*tmp), list);
> >  
> >  		i915_gem_request_retire(tmp);
> > +
> > +		if (i915.enable_execlists) {
> > +			struct intel_context *ctx = tmp->ctx;
> > +			struct drm_i915_private *dev_priv =
> > +				engine->dev->dev_private;
> > +			unsigned long flags;
> > +			struct drm_i915_gem_object *ctx_obj =
> > +				ctx->engine[engine->id].state;
> > +
> > +			spin_lock_irqsave(&engine->execlist_lock, flags);
> > +
> > +			if (ctx_obj && (ctx != engine->default_context))
> > +				intel_lr_context_unpin(tmp);
> > +
> > +			intel_runtime_pm_put(dev_priv);
> > +			spin_unlock_irqrestore(&engine->execlist_lock, flags);
> > +		}
> 
> Why here? Surely you intended this to be called by
> i915_gem_request_retire(). The runtime pm interaction is wrong, that is
> handled by GPU busyness tracking. But by doing this at this juncture you
> now increase the frequency at which we have to recreate the iomapping,
> most noticeable on bsw+ and take more spinlocks unnecessarily.
> 
> Also since you no longer do reference tracking for the
> execlists_queue, tmp has just been freed.
> 
> >  	} while (tmp != req);
> >  
> >  	WARN_ON(i915_verify_lists(engine->dev));
> > @@ -2359,6 +2377,12 @@ void i915_vma_move_to_active(struct i915_vma *vma,
> >  	list_move_tail(&vma->mm_list, &vma->vm->active_list);
> >  }
> >  
> > +static bool i915_gem_request_retireable(struct drm_i915_gem_request *req)
> > +{
> > +	return (i915_gem_request_completed(req, true) &&
> > +		(!req->elsp_submitted || req->ctx_complete));
> 
> I disagree with this completely. A request must only be considered complete
> when its seqno is passed. The context should be tracking the request
> and not the other way around (like legacy does).
> 
> The approach is just wrong. The lifecycle of the request is not the
> bloat in execlists, and keeping an extra reference on the request struct
> itself until execlists is able to reap them amortizes the cost of
> the spinlock and atomic operations.

Hm, that's slightly different approach than what I suggested with just
creating special ctx-switch requests (since with the scheduler we'll
reorder so can't do the same trick as with legacy ctx switching any more).
This trick here otoh extends the lifetime of a request until it's kicked
out, so that the shrinker doesn't evict the request object too early.

How does holding an additional ref on the request prevent this? I mean I
don't see how that would prevent special casing in the eviction code ...
Or is your plan to put the final waiting for the request the ctx to idle
into the ->vma_unbind hook? That seems again a bit a quirk ...
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/4] drm/i915: Unify execlist and legacy request life-cycles
  2015-10-09  7:58     ` Daniel Vetter
@ 2015-10-09  8:36       ` Chris Wilson
  2015-10-09  9:15         ` Daniel Vetter
  0 siblings, 1 reply; 24+ messages in thread
From: Chris Wilson @ 2015-10-09  8:36 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Fri, Oct 09, 2015 at 09:58:51AM +0200, Daniel Vetter wrote:
> > > +static bool i915_gem_request_retireable(struct drm_i915_gem_request *req)
> > > +{
> > > +	return (i915_gem_request_completed(req, true) &&
> > > +		(!req->elsp_submitted || req->ctx_complete));
> > 
> > I disagree with this completely. A request must only be considered complete
> > when its seqno is passed. The context should be tracking the request
> > and not the other way around (like legacy does).
> > 
> > The approach is just wrong. The lifecycle of the request is not the
> > bloat in execlists, and keeping an extra reference on the request struct
> > itself until execlists is able to reap them amortizes the cost of
> > the spinlock and atomic operations.
> 
> Hm, that's slightly different approach than what I suggested with just
> creating special ctx-switch requests (since with the scheduler we'll
> reorder so can't do the same trick as with legacy ctx switching any more).
> This trick here otoh extends the lifetime of a request until it's kicked
> out, so that the shrinker doesn't evict the request object too early.

Note what you are considering here is how to use requests to manage
contexts. execlists is different enough from legacy in how contexts are
only active for as long as the request is, that we don't need anything
more than the current requests to manage their active cycles + lifetimes.

> How does holding an additional ref on the request prevent this? I mean I
> don't see how that would prevent special casing in the eviction code ...
> Or is your plan to put the final waiting for the request the ctx to idle
> into the ->vma_unbind hook? That seems again a bit a quirk ...

The reference is simply that the submission is asynchronous and not
under the purview of GEM, i.e. execlists has a list of requests with
which to submit and so needs a reference - especially as the ordering
between request completion and the interrupt is not guaranteed, and the
requests can be completed asynchronously. The reference is not to
prevent retiring, but simply to keep the request struct alive until
removed from the execlists.  Similarly for anything more exotic than
the FIFO scheduler.

The basic idea I have here is simply that requests are complete when the
GPU finishes executing them, that is when we retire them from the GEM lists.
I want to keep it that simple as then we can build upon requests for
tracking GPU activity at an abstract GEM level. So context lifecycles
depend on requests and not the other way around (which is how we do it
for legacy, pin until switch (whilst the hw references the ctx) then active
until the switch away is complete.)) I think underpinning this is that we
have to keep the idea of GPU activity (requests) and object life cycles
as two separate but interrelated ideas/mechanisms.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/4] drm/i915: Unify execlist and legacy request life-cycles
  2015-10-09  8:36       ` Chris Wilson
@ 2015-10-09  9:15         ` Daniel Vetter
  2015-10-09  9:45           ` Chris Wilson
  0 siblings, 1 reply; 24+ messages in thread
From: Daniel Vetter @ 2015-10-09  9:15 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, Nick Hoath, intel-gfx

On Fri, Oct 09, 2015 at 09:36:58AM +0100, Chris Wilson wrote:
> On Fri, Oct 09, 2015 at 09:58:51AM +0200, Daniel Vetter wrote:
> > > > +static bool i915_gem_request_retireable(struct drm_i915_gem_request *req)
> > > > +{
> > > > +	return (i915_gem_request_completed(req, true) &&
> > > > +		(!req->elsp_submitted || req->ctx_complete));
> > > 
> > > I disagree with this completely. A request must only be considered complete
> > > when its seqno is passed. The context should be tracking the request
> > > and not the other way around (like legacy does).
> > > 
> > > The approach is just wrong. The lifecycle of the request is not the
> > > bloat in execlists, and keeping an extra reference on the request struct
> > > itself until execlists is able to reap them amortizes the cost of
> > > the spinlock and atomic operations.
> > 
> > Hm, that's slightly different approach than what I suggested with just
> > creating special ctx-switch requests (since with the scheduler we'll
> > reorder so can't do the same trick as with legacy ctx switching any more).
> > This trick here otoh extends the lifetime of a request until it's kicked
> > out, so that the shrinker doesn't evict the request object too early.
> 
> Note what you are considering here is how to use requests to manage
> contexts. execlists is different enough from legacy in how contexts are
> only active for as long as the request is, that we don't need anything
> more than the current requests to manage their active cycles + lifetimes.
> 
> > How does holding an additional ref on the request prevent this? I mean I
> > don't see how that would prevent special casing in the eviction code ...
> > Or is your plan to put the final waiting for the request the ctx to idle
> > into the ->vma_unbind hook? That seems again a bit a quirk ...
> 
> The reference is simply that the submission is asynchronous and not
> under the purview of GEM, i.e. execlists has a list of requests with
> which to submit and so needs a reference - especially as the ordering
> between request completion and the interrupt is not guaranteed, and the
> requests can be completed asynchronously. The reference is not to
> prevent retiring, but simply to keep the request struct alive until
> removed from the execlists.  Similarly for anything more exotic than
> the FIFO scheduler.

Hm yeah I thought we've had that already ... at least there's an
i915_gem_request_unreference in intel_execlists_retire_requests, separate
from the core request retiring.

> The basic idea I have here is simply that requests are complete when the
> GPU finishes executing them, that is when we retire them from the GEM lists.
> I want to keep it that simple as then we can build upon requests for
> tracking GPU activity at an abstract GEM level. So context lifecycles
> depend on requests and not the other way around (which is how we do it
> for legacy, pin until switch (whilst the hw references the ctx) then active
> until the switch away is complete.)) I think underpinning this is that we
> have to keep the idea of GPU activity (requests) and object life cycles
> as two separate but interrelated ideas/mechanisms.

Yup agreed on that concept. The difference is in how exactly we track when
the context is no longer in use by the hardware. With execlist we have

1. MI_BB_START
2. seqno write/irq, which completes the requests that we currently have.
3. hw/guc retires the hw context and signals that with another, dedicated
irq (CTX_SWITCH_IRQ).

Nick's approach is to delay the request completion with ctx_complete from
2 to 3, but open-coded in places (which breaks the shrinker).

You seem to suggest something similar, but I'm not entirely clear.

My idea was to create a new request for 3. which gets signalled by the
scheduler in intel_lrc_irq_handler. My idea was that we'd only create
these when a ctx switch might occur to avoid overhead, but I guess if we
just outright delay all requests a notch if needed that might work too. But
I'm really not sure on the implications of that (i.e. does the hardware
really unload the ctx if it's idle?), and whether that would still fly with
the scheduler.

But figuring this one out here seems to be the cornerstone of this reorg.
Without it we can't just throw contexts onto the active list.
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/4] drm/i915: Unify execlist and legacy request life-cycles
  2015-10-09  9:15         ` Daniel Vetter
@ 2015-10-09  9:45           ` Chris Wilson
  2015-10-09 17:18             ` Daniel Vetter
  0 siblings, 1 reply; 24+ messages in thread
From: Chris Wilson @ 2015-10-09  9:45 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Fri, Oct 09, 2015 at 11:15:08AM +0200, Daniel Vetter wrote:
> My idea was to create a new request for 3. which gets signalled by the
> scheduler in intel_lrc_irq_handler. My idea was that we'd only create
> these when a ctx switch might occur to avoid overhead, but I guess if we
> just outright delay all requests a notch if needed that might work too. But
> I'm really not sure on the implications of that (i.e. does the hardware
> really unload the ctx if it's idle?), and whether that would still fly with
> the scheduler.
>
> But figuring this one out here seems to be the cornerstone of this reorg.
> Without it we can't just throw contexts onto the active list.

(Let me see if I understand it correctly)

Basically the problem is that we can't trust the context object to be
synchronized until after the status interrupt. The way we handled that
for legacy is to track the currently bound context and keep the
vma->pin_count asserted until the request containing the switch away.
Doing the same for execlists would trivially fix the issue and if done
smartly allows us to share more code (been there, done that).
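
In code that would look roughly like the legacy do_switch() bookkeeping
(sketch only to illustrate the ordering, function name invented, not a real
patch):

	static void lr_track_context(struct intel_engine_cs *ring,
				     struct drm_i915_gem_request *req)
	{
		struct intel_context *to = req->ctx;
		struct intel_context *from = ring->last_context;

		if (from == to)
			return;

		if (from) {
			struct drm_i915_gem_object *state =
				from->engine[ring->id].state;

			/* keep the old lrc state alive until this request
			 * (the one switching away from it) has retired */
			i915_vma_move_to_active(i915_gem_obj_to_ggtt(state), req);
			i915_gem_object_ggtt_unpin(state);
			i915_gem_context_unreference(from);
		}

		/* 'to' was pinned when the request was constructed; it is now
		 * the context the hw holds on to */
		i915_gem_context_reference(to);
		ring->last_context = to;
	}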

That satisfies me for keeping requests as a basic fence in the GPU
timeline and should keep everyone happy that the context can't vanish
until after it is complete. The only caveat is that we cannot evict the
most recent context. For legacy, we do a switch back to the always
pinned default context. For execlists we don't, but it still means we
should only have one context which cannot be evicted (like legacy). But
it does leave us with the issue that i915_gpu_idle() returns early and
i915_gem_context_fini() must keep the explicit gpu reset to be
absolutely sure that the pending context writes are completed before the
final context is unbound.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/4] drm/i915: Unify execlist and legacy request life-cycles
  2015-10-09  9:45           ` Chris Wilson
@ 2015-10-09 17:18             ` Daniel Vetter
  2015-10-09 17:23               ` Chris Wilson
  0 siblings, 1 reply; 24+ messages in thread
From: Daniel Vetter @ 2015-10-09 17:18 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, Nick Hoath, intel-gfx

On Fri, Oct 09, 2015 at 10:45:35AM +0100, Chris Wilson wrote:
> On Fri, Oct 09, 2015 at 11:15:08AM +0200, Daniel Vetter wrote:
> > My idea was to create a new request for 3. which gets signalled by the
> > scheduler in intel_lrc_irq_handler. My idea was that we'd only create
> > these when a ctx switch might occur to avoid overhead, but I guess if we
> > just outright delay all requests a notch if needed that might work too. But
> > I'm really not sure on the implications of that (i.e. does the hardware
> > really unload the ctx if it's idle?), and whether that would still fly with
> > the scheduler.
> >
> > But figuring this one out here seems to be the cornerstone of this reorg.
> > Without it we can't just throw contexts onto the active list.
> 
> (Let me see if I understand it correctly)
> 
> Basically the problem is that we can't trust the context object to be
> synchronized until after the status interrupt. The way we handled that
> for legacy is to track the currently bound context and keep the
> vma->pin_count asserted until the request containing the switch away.
> Doing the same for execlists would trivially fix the issue and if done
> smartly allows us to share more code (been there, done that).
> 
> That satisfies me for keeping requests as a basic fence in the GPU
> timeline and should keep everyone happy that the context can't vanish
> until after it is complete. The only caveat is that we cannot evict the
> most recent context. For legacy, we do a switch back to the always
> pinned default context. For execlists we don't, but it still means we
> should only have one context which cannot be evicted (like legacy). But
> it does leave us with the issue that i915_gpu_idle() returns early and
> i915_gem_context_fini() must keep the explicit gpu reset to be
> absolutely sure that the pending context writes are completed before the
> final context is unbound.

Yes, and that was what I originally had in mind. Meanwhile the scheduler
(will) happen and that means we won't have FIFO ordering. Which means when
we switch contexts (as opposed to just adding more to the ringbuffer of
the current one) we won't have any idea which context will be the next
one. Which also means we don't know which request to pick to retire the
old context. Hence why I think we need to be better.

Of course we can first implement the legacy ctx scheme and then let John
Harrison deal with the mess again, but he might not like that too much ;-)

The other upside of tracking the real ctx-no-longer-in-use with the ctx
itself is that we don't need to pin anything ever (I think), at least
conceptually. But decidedly less sure about that ...
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/4] drm/i915: Unify execlist and legacy request life-cycles
  2015-10-09 17:18             ` Daniel Vetter
@ 2015-10-09 17:23               ` Chris Wilson
  2015-10-13 11:29                 ` Daniel Vetter
  0 siblings, 1 reply; 24+ messages in thread
From: Chris Wilson @ 2015-10-09 17:23 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Fri, Oct 09, 2015 at 07:18:21PM +0200, Daniel Vetter wrote:
> On Fri, Oct 09, 2015 at 10:45:35AM +0100, Chris Wilson wrote:
> > On Fri, Oct 09, 2015 at 11:15:08AM +0200, Daniel Vetter wrote:
> > > My idea was to create a new request for 3. which gets signalled by the
> > > scheduler in intel_lrc_irq_handler. My idea was that we'd only create
> > > these when a ctx switch might occur to avoid overhead, but I guess if we
> > > just outright delay all requests a notch if needed that might work too. But
> > > I'm really not sure on the implications of that (i.e. does the hardware
> > > really unload the ctx if it's idle?), and whether that would still fly with
> > > the scheduler.
> > >
> > > But figuring this one out here seems to be the cornerstone of this reorg.
> > > Without it we can't just throw contexts onto the active list.
> > 
> > (Let me see if I understand it correctly)
> > 
> > Basically the problem is that we can't trust the context object to be
> > synchronized until after the status interrupt. The way we handled that
> > for legacy is to track the currently bound context and keep the
> > vma->pin_count asserted until the request containing the switch away.
> > Doing the same for execlists would trivially fix the issue and if done
> > smartly allows us to share more code (been there, done that).
> > 
> > That satisfies me for keeping requests as a basic fence in the GPU
> > timeline and should keep everyone happy that the context can't vanish
> > until after it is complete. The only caveat is that we cannot evict the
> > most recent context. For legacy, we do a switch back to the always
> > pinned default context. For execlists we don't, but it still means we
> > should only have one context which cannot be evicted (like legacy). But
> > it does leave us with the issue that i915_gpu_idle() returns early and
> > i915_gem_context_fini() must keep the explicit gpu reset to be
> > absolutely sure that the pending context writes are completed before the
> > final context is unbound.
> 
> Yes, and that was what I originally had in mind. Meanwhile the scheduler
> (will) happen and that means we won't have FIFO ordering. Which means when
> we switch contexts (as opposed to just adding more to the ringbuffer of
> the current one) we won't have any idea which context will be the next
> one. Which also means we don't know which request to pick to retire the
> old context. Hence why I think we need to be better.

But the scheduler does - it is also in charge of making sure the
retirement queue is in order. The essence is that we only actually pin
engine->last_context, which is chosen as we submit stuff to the hw.
 
> Of course we can first implement the legacy ctx scheme and then let John
> Harrison deal with the mess again, but he might not like that too much ;-)
> 
> The other upside of tracking the real ctx-no-longer-in-use with the ctx
> itself is that we don't need to pin anything ever (I think), at least
> conceptually. But decidedly less sure about that ...

Right. There's still the reservation phase, but after that the pin just
tracks the hw.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/4] drm/i915: Unify execlist and legacy request life-cycles
  2015-10-09 17:23               ` Chris Wilson
@ 2015-10-13 11:29                 ` Daniel Vetter
  2015-10-13 11:36                   ` Chris Wilson
  0 siblings, 1 reply; 24+ messages in thread
From: Daniel Vetter @ 2015-10-13 11:29 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter, Nick Hoath, intel-gfx

On Fri, Oct 09, 2015 at 06:23:50PM +0100, Chris Wilson wrote:
> On Fri, Oct 09, 2015 at 07:18:21PM +0200, Daniel Vetter wrote:
> > On Fri, Oct 09, 2015 at 10:45:35AM +0100, Chris Wilson wrote:
> > > On Fri, Oct 09, 2015 at 11:15:08AM +0200, Daniel Vetter wrote:
> > > > My idea was to create a new request for 3. which gets signalled by the
> > > > scheduler in intel_lrc_irq_handler. My idea was that we'd only create
> > > > these when a ctx switch might occur to avoid overhead, but I guess if we
> > > > just outright delay all requests a notch if need that might work too. But
> > > > I'm really not sure on the implications of that (i.e. does the hardware
> > > > really unlod the ctx if it's idle?), and whether that would fly still with
> > > > the scheduler.
> > > >
> > > > But figuring this one out here seems to be the cornestone of this reorg.
> > > > Without it we can't just throw contexts onto the active list.
> > > 
> > > (Let me see if I understand it correctly)
> > > 
> > > Basically the problem is that we can't trust the context object to be
> > > synchronized until after the status interrupt. The way we handled that
> > > for legacy is to track the currently bound context and keep the
> > > vma->pin_count asserted until the request containing the switch away.
> > > Doing the same for execlists would trivially fix the issue and if done
> > > smartly allows us to share more code (been there, done that).
> > > 
> > > That satisfies me for keeping requests as a basic fence in the GPU
> > > timeline and should keep everyone happy that the context can't vanish
> > > until after it is complete. The only caveat is that we cannot evict the
> > > most recent context. For legacy, we do a switch back to the always
> > > pinned default context. For execlists we don't, but it still means we
> > > should only have one context which cannot be evicted (like legacy). But
> > > it does leave us with the issue that i915_gpu_idle() returns early and
> > > i915_gem_context_fini() must keep the explicit gpu reset to be
> > > absolutely sure that the pending context writes are completed before the
> > > final context is unbound.
> > 
> > Yes, and that was what I originally had in mind. Meanwhile the scheduler
> > (will) happen and that means we won't have FIFO ordering. Which means when
> > we switch contexts (as opposed to just adding more to the ringbuffer of
> > the current one) we won't have any idea which context will be the next
> > one. Which also means we don't know which request to pick to retire the
> > old context. Hence why I think we need to be better.
> 
> But the scheduler does - it is also in charge of making sure the
> retirement queue is in order. The essence is that we only actually pin
> engine->last_context, which is chosen as we submit stuff to the hw.

Well I'm not sure how much it will reorder, but I'd expect it wants to
reorder stuff pretty freely. And as soon as it reorders contexts (ofc they
can't depend on each other) then the legacy hw ctx tracking won't work.

I think at least ...
-Daniel
-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/4] drm/i915: Unify execlist and legacy request life-cycles
  2015-10-13 11:29                 ` Daniel Vetter
@ 2015-10-13 11:36                   ` Chris Wilson
  2015-10-14 14:42                     ` Dave Gordon
  0 siblings, 1 reply; 24+ messages in thread
From: Chris Wilson @ 2015-10-13 11:36 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On Tue, Oct 13, 2015 at 01:29:56PM +0200, Daniel Vetter wrote:
> On Fri, Oct 09, 2015 at 06:23:50PM +0100, Chris Wilson wrote:
> > On Fri, Oct 09, 2015 at 07:18:21PM +0200, Daniel Vetter wrote:
> > > On Fri, Oct 09, 2015 at 10:45:35AM +0100, Chris Wilson wrote:
> > > > On Fri, Oct 09, 2015 at 11:15:08AM +0200, Daniel Vetter wrote:
> > > > > My idea was to create a new request for 3. which gets signalled by the
> > > > > scheduler in intel_lrc_irq_handler. My idea was that we'd only create
> > > > > these when a ctx switch might occur to avoid overhead, but I guess if we
> > > > > just outright delay all requests a notch if need that might work too. But
> > > > > I'm really not sure on the implications of that (i.e. does the hardware
> > > > > really unlod the ctx if it's idle?), and whether that would fly still with
> > > > > the scheduler.
> > > > >
> > > > > But figuring this one out here seems to be the cornestone of this reorg.
> > > > > Without it we can't just throw contexts onto the active list.
> > > > 
> > > > (Let me see if I understand it correctly)
> > > > 
> > > > Basically the problem is that we can't trust the context object to be
> > > > synchronized until after the status interrupt. The way we handled that
> > > > for legacy is to track the currently bound context and keep the
> > > > vma->pin_count asserted until the request containing the switch away.
> > > > Doing the same for execlists would trivially fix the issue and if done
> > > > smartly allows us to share more code (been there, done that).
> > > > 
> > > > That satisfies me for keeping requests as a basic fence in the GPU
> > > > timeline and should keep everyone happy that the context can't vanish
> > > > until after it is complete. The only caveat is that we cannot evict the
> > > > most recent context. For legacy, we do a switch back to the always
> > > > pinned default context. For execlists we don't, but it still means we
> > > > should only have one context which cannot be evicted (like legacy). But
> > > > it does leave us with the issue that i915_gpu_idle() returns early and
> > > > i915_gem_context_fini() must keep the explicit gpu reset to be
> > > > absolutely sure that the pending context writes are completed before the
> > > > final context is unbound.
> > > 
> > > Yes, and that was what I originally had in mind. Meanwhile the scheduler
> > > (will) happen and that means we won't have FIFO ordering. Which means when
> > > we switch contexts (as opposed to just adding more to the ringbuffer of
> > > the current one) we won't have any idea which context will be the next
> > > one. Which also means we don't know which request to pick to retire the
> > > old context. Hence why I think we need to be better.
> > 
> > But the scheduler does - it is also in charge of making sure the
> > retirement queue is in order. The essence is that we only actually pin
> > engine->last_context, which is chosen as we submit stuff to the hw.
> 
> Well I'm not sure how much it will reorder, but I'd expect it wants to
> reorder stuff pretty freely. And as soon as it reorders context (ofc they
> can't depend on each another) then the legacy hw ctx tracking won't work.
> 
> I think at least ...

Not the way it is written today, but the principle behind it still
stands. The last_context submitted to the hardware is pinned until a new
one is submitted (such that it remains bound in the GGTT until after the
context switch is complete due to the active reference). Instead of
doing the context tracking at the start of the execbuffer, the context
tracking needs to be pushed down to the submission backend/middleman.
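
For illustration, a rough sketch of that submission-side tracking (the
pin/unpin helpers here are placeholders, not the actual i915 functions):

/* Sketch only: helper names are illustrative. */
static void backend_track_context(struct intel_engine_cs *engine,
				  struct intel_context *to)
{
	if (engine->last_context == to)
		return;	/* lite restore: same context, nothing changes */

	/* Keep the incoming context bound in the GGTT while the hw owns it. */
	pin_last_context(engine, to);

	/*
	 * The outgoing context stays bound via the active reference of the
	 * request that switches away from it, so dropping our pin here is
	 * safe: eviction still has to wait for that request to retire.
	 */
	if (engine->last_context)
		unpin_last_context(engine, engine->last_context);

	engine->last_context = to;
}
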
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/4] drm/i915: Unify execlist and legacy request life-cycles
  2015-10-13 11:36                   ` Chris Wilson
@ 2015-10-14 14:42                     ` Dave Gordon
  2015-10-14 16:19                       ` Nick Hoath
  0 siblings, 1 reply; 24+ messages in thread
From: Dave Gordon @ 2015-10-14 14:42 UTC (permalink / raw)
  To: intel-gfx

On 13/10/15 12:36, Chris Wilson wrote:
> On Tue, Oct 13, 2015 at 01:29:56PM +0200, Daniel Vetter wrote:
>> On Fri, Oct 09, 2015 at 06:23:50PM +0100, Chris Wilson wrote:
>>> On Fri, Oct 09, 2015 at 07:18:21PM +0200, Daniel Vetter wrote:
>>>> On Fri, Oct 09, 2015 at 10:45:35AM +0100, Chris Wilson wrote:
>>>>> On Fri, Oct 09, 2015 at 11:15:08AM +0200, Daniel Vetter wrote:
>>>>>> My idea was to create a new request for 3. which gets signalled by the
>>>>>> scheduler in intel_lrc_irq_handler. My idea was that we'd only create
>>>>>> these when a ctx switch might occur to avoid overhead, but I guess if we
>>>>>> just outright delay all requests a notch if need that might work too. But
>>>>>> I'm really not sure on the implications of that (i.e. does the hardware
>>>>>> really unlod the ctx if it's idle?), and whether that would fly still with
>>>>>> the scheduler.
>>>>>>
>>>>>> But figuring this one out here seems to be the cornestone of this reorg.
>>>>>> Without it we can't just throw contexts onto the active list.
>>>>>
>>>>> (Let me see if I understand it correctly)
>>>>>
>>>>> Basically the problem is that we can't trust the context object to be
>>>>> synchronized until after the status interrupt. The way we handled that
>>>>> for legacy is to track the currently bound context and keep the
>>>>> vma->pin_count asserted until the request containing the switch away.
>>>>> Doing the same for execlists would trivially fix the issue and if done
>>>>> smartly allows us to share more code (been there, done that).
>>>>>
>>>>> That satisfies me for keeping requests as a basic fence in the GPU
>>>>> timeline and should keep everyone happy that the context can't vanish
>>>>> until after it is complete. The only caveat is that we cannot evict the
>>>>> most recent context. For legacy, we do a switch back to the always
>>>>> pinned default context. For execlists we don't, but it still means we
>>>>> should only have one context which cannot be evicted (like legacy). But
>>>>> it does leave us with the issue that i915_gpu_idle() returns early and
>>>>> i915_gem_context_fini() must keep the explicit gpu reset to be
>>>>> absolutely sure that the pending context writes are completed before the
>>>>> final context is unbound.
>>>>
>>>> Yes, and that was what I originally had in mind. Meanwhile the scheduler
>>>> (will) happen and that means we won't have FIFO ordering. Which means when
>>>> we switch contexts (as opposed to just adding more to the ringbuffer of
>>>> the current one) we won't have any idea which context will be the next
>>>> one. Which also means we don't know which request to pick to retire the
>>>> old context. Hence why I think we need to be better.
>>>
>>> But the scheduler does - it is also in charge of making sure the
>>> retirement queue is in order. The essence is that we only actually pin
>>> engine->last_context, which is chosen as we submit stuff to the hw.
>>
>> Well I'm not sure how much it will reorder, but I'd expect it wants to
>> reorder stuff pretty freely. And as soon as it reorders context (ofc they
>> can't depend on each another) then the legacy hw ctx tracking won't work.
>>
>> I think at least ...
>
> Not the way it is written today, but the principle behind it still
> stands. The last_context submitted to the hardware is pinned until a new
> one is submitted (such that it remains bound in the GGTT until after the
> context switch is complete due to the active reference). Instead of
> doing the context tracking at the start of the execbuffer, the context
> tracking needs to be pushed down to the submission backend/middleman.
> -Chris

Does anyone actually know what guarantees (if any) the GPU provides 
w.r.t. access to context images vs. USER_INTERRUPTs and CSB-update 
interrupts? Does 'active->idle' really mean that the context has been 
fully updated in memory (and can therefore be unmapped), or just that 
the engine has stopped processing (but the context might not be saved 
until it's known that it isn't going to be reactivated)?

For example, it could implement this:

(End of last batch in current context)
	1.	Update seqno
	2.	Generate USER_INTERRUPT
	3.	Engine finishes work
		(HEAD == TAIL and no further contexts queued in ELSP)
	4.	Save all per-context registers to context image
	5.	Flush to memory and invalidate
	6.	Update CSB
	7.	Flush to memory
	8.	Generate CSB-update interrupt.

(New batch in same context submitted via ELSP)
	9.	Reload entire context image from memory
	10.	Update CSB
	11.	Generate CSB-update interrupt

Or this:
	1. Update seqno
	2. Generate USER_INTERRUPT
	3. Engine finishes work
		(HEAD == TAIL and no further contexts queued in ELSP)
	4. Update CSB
	5. Generate CSB-update interrupt.

(New batch in DIFFERENT context submitted via ELSP)
	6. Save all per-context registers to old context image
	7. Load entire context image from new image
	8. Update CSB
	9. Generate CSB-update interrupt

The former is synchronous and relatively easy to model, the latter is 
more like the way legacy mode works. Various other permutations are 
possible (sync save vs async save vs deferred save, full reload vs 
lite-restore, etc). So I think we either need to know what really 
happens (and assume future chips will work the same way), or make only 
minimal assumptions and code something that will work no matter how the 
hardware actually behaves. That probably precludes any attempt at 
tracking individual context-switches at the CSB level, which in any case 
aren't passed to the CPU in GuC submission mode.

.Dave.

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/4] drm/i915: Unify execlist and legacy request life-cycles
  2015-10-14 14:42                     ` Dave Gordon
@ 2015-10-14 16:19                       ` Nick Hoath
  0 siblings, 0 replies; 24+ messages in thread
From: Nick Hoath @ 2015-10-14 16:19 UTC (permalink / raw)
  To: intel-gfx, Daniel Vetter

On 14/10/2015 15:42, Dave Gordon wrote:
> On 13/10/15 12:36, Chris Wilson wrote:
>> On Tue, Oct 13, 2015 at 01:29:56PM +0200, Daniel Vetter wrote:
>>> On Fri, Oct 09, 2015 at 06:23:50PM +0100, Chris Wilson wrote:
>>>> On Fri, Oct 09, 2015 at 07:18:21PM +0200, Daniel Vetter wrote:
>>>>> On Fri, Oct 09, 2015 at 10:45:35AM +0100, Chris Wilson wrote:
>>>>>> On Fri, Oct 09, 2015 at 11:15:08AM +0200, Daniel Vetter wrote:
>>>>>>> My idea was to create a new request for 3. which gets signalled by the
>>>>>>> scheduler in intel_lrc_irq_handler. My idea was that we'd only create
>>>>>>> these when a ctx switch might occur to avoid overhead, but I guess if we
>>>>>>> just outright delay all requests a notch if need that might work too. But
>>>>>>> I'm really not sure on the implications of that (i.e. does the hardware
>>>>>>> really unlod the ctx if it's idle?), and whether that would fly still with
>>>>>>> the scheduler.
>>>>>>>
>>>>>>> But figuring this one out here seems to be the cornestone of this reorg.
>>>>>>> Without it we can't just throw contexts onto the active list.
>>>>>>
>>>>>> (Let me see if I understand it correctly)
>>>>>>
>>>>>> Basically the problem is that we can't trust the context object to be
>>>>>> synchronized until after the status interrupt. The way we handled that
>>>>>> for legacy is to track the currently bound context and keep the
>>>>>> vma->pin_count asserted until the request containing the switch away.
>>>>>> Doing the same for execlists would trivially fix the issue and if done
>>>>>> smartly allows us to share more code (been there, done that).
>>>>>>
>>>>>> That satisfies me for keeping requests as a basic fence in the GPU
>>>>>> timeline and should keep everyone happy that the context can't vanish
>>>>>> until after it is complete. The only caveat is that we cannot evict the
>>>>>> most recent context. For legacy, we do a switch back to the always
>>>>>> pinned default context. For execlists we don't, but it still means we
>>>>>> should only have one context which cannot be evicted (like legacy). But
>>>>>> it does leave us with the issue that i915_gpu_idle() returns early and
>>>>>> i915_gem_context_fini() must keep the explicit gpu reset to be
>>>>>> absolutely sure that the pending context writes are completed before the
>>>>>> final context is unbound.
>>>>>
>>>>> Yes, and that was what I originally had in mind. Meanwhile the scheduler
>>>>> (will) happen and that means we won't have FIFO ordering. Which means when
>>>>> we switch contexts (as opposed to just adding more to the ringbuffer of
>>>>> the current one) we won't have any idea which context will be the next
>>>>> one. Which also means we don't know which request to pick to retire the
>>>>> old context. Hence why I think we need to be better.
>>>>
>>>> But the scheduler does - it is also in charge of making sure the
>>>> retirement queue is in order. The essence is that we only actually pin
>>>> engine->last_context, which is chosen as we submit stuff to the hw.
>>>
>>> Well I'm not sure how much it will reorder, but I'd expect it wants to
>>> reorder stuff pretty freely. And as soon as it reorders context (ofc they
>>> can't depend on each another) then the legacy hw ctx tracking won't work.
>>>
>>> I think at least ...
>>
>> Not the way it is written today, but the principle behind it still
>> stands. The last_context submitted to the hardware is pinned until a new
>> one is submitted (such that it remains bound in the GGTT until after the
>> context switch is complete due to the active reference). Instead of
>> doing the context tracking at the start of the execbuffer, the context
>> tracking needs to be pushed down to the submission backend/middleman.
>> -Chris
>
> Does anyone actually know what guarantees (if any) the GPU provides
> w.r.t access to context images vs. USER_INTERRUPTs and CSB-updated
> interrupts? Does 'active->idle' really mean that the context has been
> fully updated in memory (and can therefore be unmapped), or just that
> the engine has stopped processing (but the context might not be saved
> until it's known that it isn't going to be reactivated).
>
> For example, it could implement this:
>
> (End of last batch in current context)
> 	1.	Update seqno
> 	2.	Generate USER_INTERRUPT
> 	3.	Engine finishes work
> 		(HEAD == TAIL and no further contexts queued in ELSP)
> 	4.	Save all per-context registers to context image
> 	5.	Flush to memory and invalidate
> 	6.	Update CSB
> 	7.	Flush to memory
> 	8.	Generate CSB-update interrupt.
>
> (New batch in same context submitted via ELSP)
> 	9.	Reload entire context image from memory
> 	10.	Update CSB
> 	11.	Generate CSB-update interrupt
>
> Or this:
> 	1. Update seqno
> 	2. Generate USER_INTERRUPT
> 	3. Engine finishes work
> 		(HEAD == TAIL and no further contexts queued in ELSP)
> 	4. Update CSB
> 	5. Generate CSB-update interrupt.
>
> (New batch in DIFFERENT context submitted via ELSP)
> 	6. Save all per-context registers to old context image
> 	7. Load entire context image from new image
> 	8. Update CSB
> 	9. Generate CSB-update interrupt
>
> The former is synchronous and relatively easy to model, the latter is
> more like the way legacy mode works. Any various other permutations are
> possible (sync save vs async save vs deferred save, full reload vs
> lite-restore, etc). So I think we either need to know what really
> happens (and assume future chips will work the same way), or make only
> minimal assumptions and code something that will work no matter how the
> hardware actually behaves. That probably precludes any attempt at
> tracking individual context-switches at the CSB level, which in any case
> aren't passed to the CPU in GuC submission mode.
>
> .Dave.
>
> _______________________________________________
> Intel-gfx mailing list
> Intel-gfx@lists.freedesktop.org
> http://lists.freedesktop.org/mailman/listinfo/intel-gfx
>

Tracking context via last_context at request retirement (a rough C sketch follows after this outline).

In LRC/ELSP mode:
	At startup:
		- Double refcount default context
		- Set last_context to default context

	When a request is complete
		- If last_context == current_context
			- queue request for cleanup
		- If last_context != current_context
			- unref last_context
			- update last_context to current_context
			- queue request for cleanup

What this achieves:
	Make the code path closer to legacy submission
	Can now use active_list tracking for contexts & ringbufs

Additional work 1:
	- When there is no work pending on an engine, at some point:
		- Send a nop request on the default context
			This moves last_context to be default context,
			allowing previous last_context to be unref'd

Additional work 2:
	- Change legacy mode to use last_context post request completion
		This will allow us to unify the code paths.
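
A rough C sketch of the retirement-side tracking above (names are
placeholders, not the final API; it assumes last_context was seeded at
init with a referenced default context, per the startup step):

/* Sketch only; illustrative names. */
static void retire_request_context(struct intel_engine_cs *engine,
				   struct drm_i915_gem_request *req)
{
	struct intel_context *ctx = req->ctx;

	if (engine->last_context != ctx) {
		/*
		 * The hw has moved past the previously tracked context,
		 * so it drops its extra reference and becomes evictable
		 * again; the new context takes over as last_context.
		 */
		if (engine->last_context)
			i915_gem_context_unreference(engine->last_context);
		i915_gem_context_reference(ctx);
		engine->last_context = ctx;
	}

	/* Either way the request itself is queued for cleanup. */
}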



_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] drm/i915: Improve dynamic management/eviction of lrc backing objects
  2015-10-08 13:35     ` Chris Wilson
@ 2015-10-16 14:42       ` Nick Hoath
  2015-10-19  9:48         ` Daniel Vetter
  0 siblings, 1 reply; 24+ messages in thread
From: Nick Hoath @ 2015-10-16 14:42 UTC (permalink / raw)
  To: Chris Wilson, Daniel Vetter; +Cc: intel-gfx

On 08/10/2015 14:35, Chris Wilson wrote:
> On Wed, Oct 07, 2015 at 06:05:46PM +0200, Daniel Vetter wrote:
>> On Tue, Oct 06, 2015 at 03:52:02PM +0100, Nick Hoath wrote:
>>> Shovel all context related objects through the active queue and obj
>>> management.
>>>
>>> - Added callback in vma_(un)bind to add CPU (un)mapping at same time
>>>    if desired
>>> - Inserted LRC hw context & ringbuf to vma active list
>>>
>>> Issue: VIZ-4277
>>> Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
>>> ---
>>>   drivers/gpu/drm/i915/i915_drv.h         |  4 ++
>>>   drivers/gpu/drm/i915/i915_gem.c         |  3 ++
>>>   drivers/gpu/drm/i915/i915_gem_gtt.c     |  8 ++++
>>>   drivers/gpu/drm/i915/intel_lrc.c        | 28 +++++++++++--
>>>   drivers/gpu/drm/i915/intel_ringbuffer.c | 71 ++++++++++++++++++---------------
>>>   drivers/gpu/drm/i915/intel_ringbuffer.h |  3 --
>>>   6 files changed, 79 insertions(+), 38 deletions(-)
>>>
>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>>> index 3d217f9..d660ee3 100644
>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>> @@ -2169,6 +2169,10 @@ struct drm_i915_gem_object {
>>>   			struct work_struct *work;
>>>   		} userptr;
>>>   	};
>>> +
>>> +	/** Support for automatic CPU side mapping of object */
>>> +	int (*mmap)(struct drm_i915_gem_object *obj, bool unmap);
>>
>> I don't think we need a map hook, that can still be done (if not done so
I disagree - this keeps the interface symmetrical. Searching for the 
do/undo code paths and finding they are in different places, called via 
different routes, makes code harder to follow.
>> already) by the callers. Also it's better to rename this to vma_unbind
>> (and it should be at the vma level I think) since there's other potential
Nope - the obj is created first, at a point where the map/unmap function 
can be known. Moving the map/unmap to the vma would mean having a 
callback path to the object just to set up the callback path when the 
vma is created anonymously at some later point. (A short sketch of the 
symmetric arrangement follows at the end of this mail.)
>> users. So explicit maping, lazy unmapping for the kmaps we need. That's
>> the same design we're using for binding objects into gpu address spaces.
>>
>> Also Chris Wilson has something similar, please align with him on the
>> precise design of this callback.
>
> We need the unbind hook because of the movement in the first patch (it
> is a separate issue, the code should work without it albeit having to
> remap the ring/context state more often). The changelog in this patch
> simply explains the i915_vma_move_to_active() additions. But to get the
> shrink accurate we do need the context unpin on retirement and to do the
> pin_count check in i915_vma_unbind() after waiting (rather than before,
> as we currently do). However, the eviction code will not inspect the
> active contexts objects yet (as it will continue to skip over the
> ggtt->pin_count on them). The way I allowed ctx objects to be evicted was
> to only keep the ctx->state pinned for the duration of the request
> construction.
>
> Note that I think it should be a vma->unbind hook not an object level
> one (it is i915_vma_unbind, without only a modicum of object level state
> being modified in that function).
> -Chris
>
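
To illustrate the symmetry being argued for above: with the hook on the
object (signature as in the diff), both directions are driven from the
same place in vma (un)bind. A hedged sketch; the wrapper names and call
sites are invented here:

/*
 * Sketch only: the hook comes from the patch, the wrappers are
 * illustrative, not the real bind/unbind paths.
 */
static int vma_bind_cpu_mapping(struct i915_vma *vma)
{
	/* bind path: create the CPU mapping alongside the GGTT binding */
	return vma->obj->mmap ? vma->obj->mmap(vma->obj, false) : 0;
}

static void vma_unbind_cpu_mapping(struct i915_vma *vma)
{
	/* unbind path (shrinker/eviction): tear the same mapping down */
	if (vma->obj->mmap)
		vma->obj->mmap(vma->obj, true);
}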

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] drm/i915: Improve dynamic management/eviction of lrc backing objects
  2015-10-16 14:42       ` Nick Hoath
@ 2015-10-19  9:48         ` Daniel Vetter
  2015-10-19 10:54           ` Nick Hoath
  0 siblings, 1 reply; 24+ messages in thread
From: Daniel Vetter @ 2015-10-19  9:48 UTC (permalink / raw)
  To: Nick Hoath; +Cc: intel-gfx

On Fri, Oct 16, 2015 at 03:42:53PM +0100, Nick Hoath wrote:
> On 08/10/2015 14:35, Chris Wilson wrote:
> >On Wed, Oct 07, 2015 at 06:05:46PM +0200, Daniel Vetter wrote:
> >>On Tue, Oct 06, 2015 at 03:52:02PM +0100, Nick Hoath wrote:
> >>>Shovel all context related objects through the active queue and obj
> >>>management.
> >>>
> >>>- Added callback in vma_(un)bind to add CPU (un)mapping at same time
> >>>   if desired
> >>>- Inserted LRC hw context & ringbuf to vma active list
> >>>
> >>>Issue: VIZ-4277
> >>>Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
> >>>---
> >>>  drivers/gpu/drm/i915/i915_drv.h         |  4 ++
> >>>  drivers/gpu/drm/i915/i915_gem.c         |  3 ++
> >>>  drivers/gpu/drm/i915/i915_gem_gtt.c     |  8 ++++
> >>>  drivers/gpu/drm/i915/intel_lrc.c        | 28 +++++++++++--
> >>>  drivers/gpu/drm/i915/intel_ringbuffer.c | 71 ++++++++++++++++++---------------
> >>>  drivers/gpu/drm/i915/intel_ringbuffer.h |  3 --
> >>>  6 files changed, 79 insertions(+), 38 deletions(-)
> >>>
> >>>diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> >>>index 3d217f9..d660ee3 100644
> >>>--- a/drivers/gpu/drm/i915/i915_drv.h
> >>>+++ b/drivers/gpu/drm/i915/i915_drv.h
> >>>@@ -2169,6 +2169,10 @@ struct drm_i915_gem_object {
> >>>  			struct work_struct *work;
> >>>  		} userptr;
> >>>  	};
> >>>+
> >>>+	/** Support for automatic CPU side mapping of object */
> >>>+	int (*mmap)(struct drm_i915_gem_object *obj, bool unmap);
> >>
> >>I don't think we need a map hook, that can still be done (if not done so
> I disagree - this keeps the interface symmetrical. Searching for the do/undo
> code paths and finding they are in difference places, called via different
> routes makes code harder to follow.
> >>already) by the callers. Also it's better to rename this to vma_unbind
> >>(and it should be at the vma level I think) since there's other potential
> Nope - the obj is created first, at a point where the map/unamp function can
> be known. Moving the map/unmap to the vma would mean having a callback path
> to the object just to set up the callback path when the vma is created
> anonymously at some later point.

One of the plans for this is to also use it for to-be-unpinned
framebuffers (4k buffers are huge ...). And in that case an unmap hook
only, placed on the vma, is the design we want. And since it also seems to
accommodate all the other users I do think it's the right choice.

Like I said, explicit setup and lazy, implicit cleanup is kinda how a lot
of things in gem work.
-Daniel

> >>users. So explicit maping, lazy unmapping for the kmaps we need. That's
> >>the same design we're using for binding objects into gpu address spaces.
> >>
> >>Also Chris Wilson has something similar, please align with him on the
> >>precise design of this callback.
> >
> >We need the unbind hook because of the movement in the first patch (it
> >is a separate issue, the code should work without it albeit having to
> >remap the ring/context state more often). The changelog in this patch
> >simply explains the i915_vma_move_to_active() additions. But to get the
> >shrink accurate we do need the context unpin on retirement and to do the
> >pin_count check in i915_vma_unbind() after waiting (rather than before,
> >as we currently do). However, the eviction code will not inspect the
> >active contexts objects yet (as it will continue to skip over the
> >ggtt->pin_count on them). The way I allowed ctx objects to be evicted was
> >to only keep the ctx->state pinned for the duration of the request
> >construction.
> >
> >Note that I think it should be a vma->unbind hook not an object level
> >one (it is i915_vma_unbind, without only a modicum of object level state
> >being modified in that function).
> >-Chris
> >
> 

-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/4] drm/i915: Improve dynamic management/eviction of lrc backing objects
  2015-10-19  9:48         ` Daniel Vetter
@ 2015-10-19 10:54           ` Nick Hoath
  0 siblings, 0 replies; 24+ messages in thread
From: Nick Hoath @ 2015-10-19 10:54 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: intel-gfx

On 19/10/2015 10:48, Daniel Vetter wrote:
> On Fri, Oct 16, 2015 at 03:42:53PM +0100, Nick Hoath wrote:
>> On 08/10/2015 14:35, Chris Wilson wrote:
>>> On Wed, Oct 07, 2015 at 06:05:46PM +0200, Daniel Vetter wrote:
>>>> On Tue, Oct 06, 2015 at 03:52:02PM +0100, Nick Hoath wrote:
>>>>> Shovel all context related objects through the active queue and obj
>>>>> management.
>>>>>
>>>>> - Added callback in vma_(un)bind to add CPU (un)mapping at same time
>>>>>    if desired
>>>>> - Inserted LRC hw context & ringbuf to vma active list
>>>>>
>>>>> Issue: VIZ-4277
>>>>> Signed-off-by: Nick Hoath <nicholas.hoath@intel.com>
>>>>> ---
>>>>>   drivers/gpu/drm/i915/i915_drv.h         |  4 ++
>>>>>   drivers/gpu/drm/i915/i915_gem.c         |  3 ++
>>>>>   drivers/gpu/drm/i915/i915_gem_gtt.c     |  8 ++++
>>>>>   drivers/gpu/drm/i915/intel_lrc.c        | 28 +++++++++++--
>>>>>   drivers/gpu/drm/i915/intel_ringbuffer.c | 71 ++++++++++++++++++---------------
>>>>>   drivers/gpu/drm/i915/intel_ringbuffer.h |  3 --
>>>>>   6 files changed, 79 insertions(+), 38 deletions(-)
>>>>>
>>>>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>>>>> index 3d217f9..d660ee3 100644
>>>>> --- a/drivers/gpu/drm/i915/i915_drv.h
>>>>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>>>>> @@ -2169,6 +2169,10 @@ struct drm_i915_gem_object {
>>>>>   			struct work_struct *work;
>>>>>   		} userptr;
>>>>>   	};
>>>>> +
>>>>> +	/** Support for automatic CPU side mapping of object */
>>>>> +	int (*mmap)(struct drm_i915_gem_object *obj, bool unmap);
>>>>
>>>> I don't think we need a map hook, that can still be done (if not done so
>> I disagree - this keeps the interface symmetrical. Searching for the do/undo
>> code paths and finding they are in difference places, called via different
>> routes makes code harder to follow.
>>>> already) by the callers. Also it's better to rename this to vma_unbind
>>>> (and it should be at the vma level I think) since there's other potential
>> Nope - the obj is created first, at a point where the map/unamp function can
>> be known. Moving the map/unmap to the vma would mean having a callback path
>> to the object just to set up the callback path when the vma is created
>> anonymously at some later point.
>
> One of the plans for this is to also use it for to-be-unpinned
> framebuffers (4k buffers are huge ...). And in that case the unmap hook
> only, and on the vma is the design we want. And since it also seems to
> accomodate all the other users I do think it's the right choice.

I refer you to these words found on the mailing list. They may be familiar:

As a rule of thumb for refactoring and shared infrastructure we use the 
following recipe in drm:
- first driver implements things as straightforward as possible
- 2nd user copypastes
- 3rd one has the duty to figure out whether some refactoring is in order
   or not.

The code as I have written it works best and is simplest for my use case. 
If someone else wants to refactor it differently to shoehorn in their 
use case, that's up to them.


>
> Like I said, explicit setup and lazy, implicit cleanup is kinda how a lot
> of things in gem work.

The most dangerous phrase in the language is ‘we’ve always done it this 
way.’ - Grace Hopper

> -Daniel
>
>>>> users. So explicit maping, lazy unmapping for the kmaps we need. That's
>>>> the same design we're using for binding objects into gpu address spaces.
>>>>
>>>> Also Chris Wilson has something similar, please align with him on the
>>>> precise design of this callback.
>>>
>>> We need the unbind hook because of the movement in the first patch (it
>>> is a separate issue, the code should work without it albeit having to
>>> remap the ring/context state more often). The changelog in this patch
>>> simply explains the i915_vma_move_to_active() additions. But to get the
>>> shrink accurate we do need the context unpin on retirement and to do the
>>> pin_count check in i915_vma_unbind() after waiting (rather than before,
>>> as we currently do). However, the eviction code will not inspect the
>>> active contexts objects yet (as it will continue to skip over the
>>> ggtt->pin_count on them). The way I allowed ctx objects to be evicted was
>>> to only keep the ctx->state pinned for the duration of the request
>>> construction.
>>>
>>> Note that I think it should be a vma->unbind hook not an object level
>>> one (it is i915_vma_unbind, without only a modicum of object level state
>>> being modified in that function).
>>> -Chris
>>>
>>
>

_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
http://lists.freedesktop.org/mailman/listinfo/intel-gfx

^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2015-10-19 10:54 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-10-06 14:52 [PATCH 0/4] lrc lifecycle cleanups Nick Hoath
2015-10-06 14:52 ` [PATCH 1/4] drm/i915: Unify execlist and legacy request life-cycles Nick Hoath
2015-10-07 16:03   ` Daniel Vetter
2015-10-07 16:05     ` Chris Wilson
2015-10-08 12:32   ` Chris Wilson
2015-10-09  7:58     ` Daniel Vetter
2015-10-09  8:36       ` Chris Wilson
2015-10-09  9:15         ` Daniel Vetter
2015-10-09  9:45           ` Chris Wilson
2015-10-09 17:18             ` Daniel Vetter
2015-10-09 17:23               ` Chris Wilson
2015-10-13 11:29                 ` Daniel Vetter
2015-10-13 11:36                   ` Chris Wilson
2015-10-14 14:42                     ` Dave Gordon
2015-10-14 16:19                       ` Nick Hoath
2015-10-06 14:52 ` [PATCH 2/4] drm/i915: Improve dynamic management/eviction of lrc backing objects Nick Hoath
2015-10-07 16:05   ` Daniel Vetter
2015-10-08 13:35     ` Chris Wilson
2015-10-16 14:42       ` Nick Hoath
2015-10-19  9:48         ` Daniel Vetter
2015-10-19 10:54           ` Nick Hoath
2015-10-06 14:52 ` [PATCH 3/4] drm/i915: Add the CPU mapping of the hw context to the pinned items Nick Hoath
2015-10-07 16:08   ` Daniel Vetter
2015-10-06 14:52 ` [PATCH 4/4] drm/i915: Only update ringbuf address when necessary Nick Hoath
