* [RFC 0/4] Eliminating the execlist retired request queue
@ 2016-04-08 13:54 Tvrtko Ursulin
  2016-04-08 13:54 ` [RFC 1/4] drm/i915: Move releasing of the GEM request from free to retire/cancel Tvrtko Ursulin
                   ` (5 more replies)
  0 siblings, 6 replies; 20+ messages in thread
From: Tvrtko Ursulin @ 2016-04-08 13:54 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

The retired request queue, coupled with the retired work handler, is a
scalability concern on some workloads, of which one dramatic example is
gem_close_race.

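For context, the pattern this series removes is roughly the following (a
sketch of the current execlists code, not an exact quote):

	/* Completed requests are merely moved onto a retired list... */
	list_move_tail(&req->execlist_link,
		       &engine->execlist_retired_req_list);

	/* ...and their reference and context pin are only dropped later,
	 * under struct_mutex, by the retire work handler, so cleanup cost
	 * accumulates and serialises behind the mutex:
	 */
	list_for_each_entry_safe(req, tmp, &retired_list, execlist_link) {
		if (req->ctx != req->i915->kernel_context)
			intel_lr_context_unpin(req->ctx, engine);
		list_del(&req->execlist_link);
		i915_gem_request_unreference(req);
	}
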
This series depends on the i915_gem_object_pin_map series by Chris Wilson.

Brief outline of patches:

 1/4) Allow lockless request unreference - needed for 4/4.
 2/4) Pin the LRC until the following request is completed - needed for
      GuC mode on its own and also enables 4/4.
 3/4) Use i915_gem_object_pin_map in execlists to alleviate increase in
      LRC pin/unpin this series adds.
 4/4) Removes retired_req_queue.

Lightly tested, so RFC for now. I did not spot any performance difference
in my usual benchmarks, but have not looked very closely yet. There may be
wins in CPU usage. TBD.

The easiest way to see the change is gem_close_race, which finishes in
10-20 seconds with this series but otherwise takes between ten minutes and
half an hour.

TODO:

1) Patches 2 and 3 should probably swap order.
2) More benchmarking including CPU usage and power.
3) Need a better cover letter as well.

Chris Wilson (1):
  drm/i915: Move releasing of the GEM request from free to retire/cancel

Tvrtko Ursulin (3):
  drm/i915/guc: Keep the previous context pinned until the next one has
    been completed
  drm/i915: Use new i915_gem_object_pin_map for LRC
  drm/i915: Stop tracking execlists retired requests

 drivers/gpu/drm/i915/i915_drv.h         | 18 ++------
 drivers/gpu/drm/i915/i915_gem.c         | 63 +++++++++++++-------------
 drivers/gpu/drm/i915/intel_display.c    |  2 +-
 drivers/gpu/drm/i915/intel_lrc.c        | 78 ++++++++++++++++-----------------
 drivers/gpu/drm/i915/intel_lrc.h        |  2 +-
 drivers/gpu/drm/i915/intel_pm.c         |  2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h |  1 -
 7 files changed, 74 insertions(+), 92 deletions(-)

-- 
1.9.1


* [RFC 1/4] drm/i915: Move releasing of the GEM request from free to retire/cancel
  2016-04-08 13:54 [RFC 0/4] Eliminating the execlist retired request queue Tvrtko Ursulin
@ 2016-04-08 13:54 ` Tvrtko Ursulin
  2016-04-08 15:01   ` Chris Wilson
  2016-04-08 13:54 ` [RFC 2/4] drm/i915/guc: Keep the previous context pinned until the next one has been completed Tvrtko Ursulin
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: Tvrtko Ursulin @ 2016-04-08 13:54 UTC (permalink / raw)
  To: Intel-gfx

From: Chris Wilson <chris@chris-wilson.co.uk>

If we move the release of the GEM request (i.e. decoupling it from the
various lists used for client and context tracking) after it is complete
(either by the GPU retiring the request, or by the caller cancelling the
request), we can remove the requirement that the final unreference of
the GEM request needs to be under the struct_mutex.

v2: Execlists, as always, is badly asymmetric and year-old patches still
haven't landed to fix it up.

v3: Extracted, rebased and fixed for GuC. (Tvrtko Ursulin)

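As a sketch, this is the shape of the change for callers (condensed from
the diff below):

	/* Before: the final unreference had to take struct_mutex. */
	static inline void
	i915_gem_request_unreference__unlocked(struct drm_i915_gem_request *req)
	{
		struct drm_device *dev;

		if (!req)
			return;

		dev = req->engine->dev;
		if (kref_put_mutex(&req->ref, i915_gem_request_free,
				   &dev->struct_mutex))
			mutex_unlock(&dev->struct_mutex);
	}

	/* After: list decoupling happens at retire/cancel time, so the
	 * free callback only returns memory and any caller may drop the
	 * last reference without holding struct_mutex.
	 */
	static inline void
	i915_gem_request_unreference(struct drm_i915_gem_request *req)
	{
		kref_put(&req->ref, i915_gem_request_free);
	}
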
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/i915_drv.h      | 14 ----------
 drivers/gpu/drm/i915/i915_gem.c      | 50 ++++++++++++++++++------------------
 drivers/gpu/drm/i915/intel_display.c |  2 +-
 drivers/gpu/drm/i915/intel_pm.c      |  2 +-
 4 files changed, 27 insertions(+), 41 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index da403f4fc679..fa199b7e51b5 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2352,23 +2352,9 @@ i915_gem_request_reference(struct drm_i915_gem_request *req)
 static inline void
 i915_gem_request_unreference(struct drm_i915_gem_request *req)
 {
-	WARN_ON(!mutex_is_locked(&req->engine->dev->struct_mutex));
 	kref_put(&req->ref, i915_gem_request_free);
 }
 
-static inline void
-i915_gem_request_unreference__unlocked(struct drm_i915_gem_request *req)
-{
-	struct drm_device *dev;
-
-	if (!req)
-		return;
-
-	dev = req->engine->dev;
-	if (kref_put_mutex(&req->ref, i915_gem_request_free, &dev->struct_mutex))
-		mutex_unlock(&dev->struct_mutex);
-}
-
 static inline void i915_gem_request_assign(struct drm_i915_gem_request **pdst,
 					   struct drm_i915_gem_request *src)
 {
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 239452933a89..c1df696f9002 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1410,10 +1410,31 @@ i915_gem_request_remove_from_client(struct drm_i915_gem_request *request)
 	request->pid = NULL;
 }
 
+static void __i915_gem_request_release(struct drm_i915_gem_request *request)
+{
+	if (i915.enable_execlists) {
+		if (request->ctx != request->i915->kernel_context)
+			intel_lr_context_unpin(request->ctx, request->engine);
+	}
+
+	i915_gem_request_remove_from_client(request);
+
+	i915_gem_context_unreference(request->ctx);
+	i915_gem_request_unreference(request);
+}
+
+void i915_gem_request_cancel(struct drm_i915_gem_request *req)
+{
+	intel_ring_reserved_space_cancel(req->ringbuf);
+	__i915_gem_request_release(req);
+}
+
 static void i915_gem_request_retire(struct drm_i915_gem_request *request)
 {
 	trace_i915_gem_request_retire(request);
 
+	list_del_init(&request->list);
+
 	/* We know the GPU must have read the request to have
 	 * sent us the seqno + interrupt, so use the position
 	 * of tail of the request to update the last known position
@@ -1424,10 +1445,7 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
 	 */
 	request->ringbuf->last_retired_head = request->postfix;
 
-	list_del_init(&request->list);
-	i915_gem_request_remove_from_client(request);
-
-	i915_gem_request_unreference(request);
+	__i915_gem_request_release(request);
 }
 
 static void
@@ -2723,18 +2741,7 @@ static void i915_set_reset_status(struct drm_i915_private *dev_priv,
 void i915_gem_request_free(struct kref *req_ref)
 {
 	struct drm_i915_gem_request *req = container_of(req_ref,
-						 typeof(*req), ref);
-	struct intel_context *ctx = req->ctx;
-
-	if (req->file_priv)
-		i915_gem_request_remove_from_client(req);
-
-	if (ctx) {
-		if (i915.enable_execlists && ctx != req->i915->kernel_context)
-			intel_lr_context_unpin(ctx, req->engine);
-
-		i915_gem_context_unreference(ctx);
-	}
+							typeof(*req), ref);
 
 	kmem_cache_free(req->i915->requests, req);
 }
@@ -2830,13 +2837,6 @@ i915_gem_request_alloc(struct intel_engine_cs *engine,
 	return err ? ERR_PTR(err) : req;
 }
 
-void i915_gem_request_cancel(struct drm_i915_gem_request *req)
-{
-	intel_ring_reserved_space_cancel(req->ringbuf);
-
-	i915_gem_request_unreference(req);
-}
-
 struct drm_i915_gem_request *
 i915_gem_find_active_request(struct intel_engine_cs *engine)
 {
@@ -3194,7 +3194,7 @@ i915_gem_wait_ioctl(struct drm_device *dev, void *data, struct drm_file *file)
 			ret = __i915_wait_request(req[i], reset_counter, true,
 						  args->timeout_ns > 0 ? &args->timeout_ns : NULL,
 						  to_rps_client(file));
-		i915_gem_request_unreference__unlocked(req[i]);
+		i915_gem_request_unreference(req[i]);
 	}
 	return ret;
 
@@ -4216,7 +4216,7 @@ i915_gem_ring_throttle(struct drm_device *dev, struct drm_file *file)
 	if (ret == 0)
 		queue_delayed_work(dev_priv->wq, &dev_priv->mm.retire_work, 0);
 
-	i915_gem_request_unreference__unlocked(target);
+	i915_gem_request_unreference(target);
 
 	return ret;
 }
diff --git a/drivers/gpu/drm/i915/intel_display.c b/drivers/gpu/drm/i915/intel_display.c
index feb7028341b8..f2ce7310e2b6 100644
--- a/drivers/gpu/drm/i915/intel_display.c
+++ b/drivers/gpu/drm/i915/intel_display.c
@@ -11362,7 +11362,7 @@ static void intel_mmio_flip_work_func(struct work_struct *work)
 					    mmio_flip->crtc->reset_counter,
 					    false, NULL,
 					    &mmio_flip->i915->rps.mmioflips));
-		i915_gem_request_unreference__unlocked(mmio_flip->req);
+		i915_gem_request_unreference(mmio_flip->req);
 	}
 
 	/* For framebuffer backed by dmabuf, wait for fence */
diff --git a/drivers/gpu/drm/i915/intel_pm.c b/drivers/gpu/drm/i915/intel_pm.c
index 43b24a1f5ee6..346e353b4fe8 100644
--- a/drivers/gpu/drm/i915/intel_pm.c
+++ b/drivers/gpu/drm/i915/intel_pm.c
@@ -7378,7 +7378,7 @@ static void __intel_rps_boost_work(struct work_struct *work)
 		gen6_rps_boost(to_i915(req->engine->dev), NULL,
 			       req->emitted_jiffies);
 
-	i915_gem_request_unreference__unlocked(req);
+	i915_gem_request_unreference(req);
 	kfree(boost);
 }
 
-- 
1.9.1


* [RFC 2/4] drm/i915/guc: Keep the previous context pinned until the next one has been completed
  2016-04-08 13:54 [RFC 0/4] Eliminating the execlist retired request queue Tvrtko Ursulin
  2016-04-08 13:54 ` [RFC 1/4] drm/i915: Move releasing of the GEM request from free to retire/cancel Tvrtko Ursulin
@ 2016-04-08 13:54 ` Tvrtko Ursulin
  2016-04-08 14:37   ` Chris Wilson
  2016-04-08 13:54 ` [RFC 3/4] drm/i915: Use new i915_gem_object_pin_map for LRC Tvrtko Ursulin
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: Tvrtko Ursulin @ 2016-04-08 13:54 UTC (permalink / raw)
  To: Intel-gfx; +Cc: Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko@ursulin.net>

In GuC mode, submitting a request only places it into the
GuC queue, with the actual submission to hardware following at
some future point. This makes the per-engine last-context
tracking insufficient for closing the premature context
unpin race.

Instead we need to make requests track and pin the previous
context on the same engine, and only unpin them when they
themselves are retired. This will ensure contexts remain pinned
in all cases until they are fully saved by the GPU.

v2: Only track contexts.

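As a sketch, the pairing this patch introduces (names as in the diff
below):

	/* At submission: the context last submitted on this engine may
	 * still be in flight in GuC mode, so take an extra pin on it and
	 * remember it in the new request.
	 */
	request->prev_ctx = engine->last_context;
	intel_lr_context_pin(request->prev_ctx, engine);

	/* At retire: the GPU has completed this request, which implies it
	 * has switched out of the previous context, so the extra pin can
	 * safely be dropped.
	 */
	intel_lr_context_unpin(request->prev_ctx, request->engine);
	request->prev_ctx = NULL;
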
Signed-off-by: Tvrtko Ursulin <tvrtko@ursulin.net>
Cc: Alex Dai <yu.dai@intel.com>
Cc: Dave Gordon <david.s.gordon@intel.com>
Cc: Nick Hoath <nicholas.hoath@intel.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_drv.h  |  4 +++-
 drivers/gpu/drm/i915/i915_gem.c  |  5 +++++
 drivers/gpu/drm/i915/intel_lrc.c | 17 +++++++++++++++++
 3 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
index fa199b7e51b5..216b2297ae3e 100644
--- a/drivers/gpu/drm/i915/i915_drv.h
+++ b/drivers/gpu/drm/i915/i915_drv.h
@@ -2283,6 +2283,9 @@ struct drm_i915_gem_request {
 	struct intel_context *ctx;
 	struct intel_ringbuffer *ringbuf;
 
+	/** Context preceding this one on the same engine. Can be NULL. */
+	struct intel_context *prev_ctx;
+
 	/** Batch buffer related to this request if any (used for
 	    error state dump only) */
 	struct drm_i915_gem_object *batch_obj;
@@ -2318,7 +2321,6 @@ struct drm_i915_gem_request {
 
 	/** Execlists no. of times this request has been sent to the ELSP */
 	int elsp_submitted;
-
 };
 
 struct drm_i915_gem_request * __must_check
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index c1df696f9002..118911ce7231 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -1445,6 +1445,11 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
 	 */
 	request->ringbuf->last_retired_head = request->postfix;
 
+	if (request->prev_ctx) {
+		intel_lr_context_unpin(request->prev_ctx, request->engine);
+		request->prev_ctx = NULL;
+	}
+
 	__i915_gem_request_release(request);
 }
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 31445aa3429b..3d08dac0a9b0 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -798,6 +798,23 @@ intel_logical_ring_advance_and_submit(struct drm_i915_gem_request *request)
 	if (intel_engine_stopped(engine))
 		return 0;
 
+	/*
+	 * Pin the previous context on this engine to ensure it is not
+	 * prematurely unpinned in GuC mode.
+	 * Previous context will be unpinned when this request is retired,
+	 * ensuring the GPU has switched out from the previous context and into
+	 * a new context at that point.
+	 */
+	if (engine->last_context) {
+		if (engine->last_context != request->i915->kernel_context) {
+			intel_lr_context_pin(engine->last_context, engine);
+			request->prev_ctx = engine->last_context;
+		}
+	}
+
+	/*
+	 * Track and pin the last context submitted on an engine.
+	 */
 	if (engine->last_context != request->ctx) {
 		if (engine->last_context)
 			intel_lr_context_unpin(engine->last_context, engine);
-- 
1.9.1


* [RFC 3/4] drm/i915: Use new i915_gem_object_pin_map for LRC
  2016-04-08 13:54 [RFC 0/4] Eliminating the execlist retired request queue Tvrtko Ursulin
  2016-04-08 13:54 ` [RFC 1/4] drm/i915: Move releasing of the GEM request from free to retire/cancel Tvrtko Ursulin
  2016-04-08 13:54 ` [RFC 2/4] drm/i915/guc: Keep the previous context pinned until the next one has been completed Tvrtko Ursulin
@ 2016-04-08 13:54 ` Tvrtko Ursulin
  2016-04-08 14:40   ` Chris Wilson
  2016-04-08 13:54 ` [RFC 4/4] drm/i915: Stop tracking execlists retired requests Tvrtko Ursulin
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 20+ messages in thread
From: Tvrtko Ursulin @ 2016-04-08 13:54 UTC (permalink / raw)
  To: Intel-gfx

From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>

We can use the new pin/lazy unpin API for more performance
in the execlist submission paths.

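Roughly, assuming the pin_map API from the prerequisite series (a sketch,
not the full diff):

	/* Before: kmap a single page of the context object on every
	 * context pin, and kunmap it again on every unpin.
	 */
	lrc_reg_state = kmap(i915_gem_object_get_dirty_page(ctx_obj,
							    LRC_STATE_PN));

	/* After: take a whole-object vmap which the object caches and
	 * releases lazily, making repeated pin/unpin cycles cheap.
	 */
	obj_addr = i915_gem_object_pin_map(ctx_obj);
	lrc_reg_state = obj_addr + LRC_STATE_PN * PAGE_SIZE;

	/* ... and on the final unpin: */
	i915_gem_object_unpin_map(ctx_obj);
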
Signed-off-by: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
---
 drivers/gpu/drm/i915/intel_lrc.c | 15 +++++++--------
 1 file changed, 7 insertions(+), 8 deletions(-)

diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 3d08dac0a9b0..5e967943ce49 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -1111,8 +1111,8 @@ static int intel_lr_context_do_pin(struct intel_context *ctx,
 	struct drm_i915_private *dev_priv = dev->dev_private;
 	struct drm_i915_gem_object *ctx_obj = ctx->engine[engine->id].state;
 	struct intel_ringbuffer *ringbuf = ctx->engine[engine->id].ringbuf;
-	struct page *lrc_state_page;
-	uint32_t *lrc_reg_state;
+	void *obj_addr;
+	u32 *lrc_reg_state;
 	int ret;
 
 	WARN_ON(!mutex_is_locked(&engine->dev->struct_mutex));
@@ -1122,11 +1122,11 @@ static int intel_lr_context_do_pin(struct intel_context *ctx,
 	if (ret)
 		return ret;
 
-	lrc_state_page = i915_gem_object_get_dirty_page(ctx_obj, LRC_STATE_PN);
-	if (WARN_ON(!lrc_state_page)) {
-		ret = -ENODEV;
+	obj_addr = i915_gem_object_pin_map(ctx_obj);
+	if (!obj_addr)
 		goto unpin_ctx_obj;
-	}
+
+	lrc_reg_state = obj_addr + LRC_STATE_PN * PAGE_SIZE;
 
 	ret = intel_pin_and_map_ringbuffer_obj(engine->dev, ringbuf);
 	if (ret)
@@ -1134,7 +1134,6 @@ static int intel_lr_context_do_pin(struct intel_context *ctx,
 
 	ctx->engine[engine->id].lrc_vma = i915_gem_obj_to_ggtt(ctx_obj);
 	intel_lr_context_descriptor_update(ctx, engine);
-	lrc_reg_state = kmap(lrc_state_page);
 	lrc_reg_state[CTX_RING_BUFFER_START+1] = ringbuf->vma->node.start;
 	ctx->engine[engine->id].lrc_reg_state = lrc_reg_state;
 	ctx_obj->dirty = true;
@@ -1177,7 +1176,7 @@ void intel_lr_context_unpin(struct intel_context *ctx,
 
 	WARN_ON(!mutex_is_locked(&ctx->i915->dev->struct_mutex));
 	if (--ctx->engine[engine->id].pin_count == 0) {
-		kunmap(kmap_to_page(ctx->engine[engine->id].lrc_reg_state));
+		i915_gem_object_unpin_map(ctx_obj);
 		intel_unpin_ringbuffer_obj(ctx->engine[engine->id].ringbuf);
 		i915_gem_object_ggtt_unpin(ctx_obj);
 		ctx->engine[engine->id].lrc_vma = NULL;
-- 
1.9.1


* [RFC 4/4] drm/i915: Stop tracking execlists retired requests
  2016-04-08 13:54 [RFC 0/4] Eliminating the execlist retired request queue Tvrtko Ursulin
                   ` (2 preceding siblings ...)
  2016-04-08 13:54 ` [RFC 3/4] drm/i915: Use new i915_gem_object_pin_map for LRC Tvrtko Ursulin
@ 2016-04-08 13:54 ` Tvrtko Ursulin
  2016-04-08 14:57   ` Chris Wilson
  2016-04-08 14:08 ` ✗ Fi.CI.BAT: failure for Eliminating the execlist retired request queue Patchwork
  2016-04-09  8:03 ` [RFC 0/4] " Chris Wilson
  5 siblings, 1 reply; 20+ messages in thread
From: Tvrtko Ursulin @ 2016-04-08 13:54 UTC (permalink / raw)
  To: Intel-gfx; +Cc: Tvrtko Ursulin

From: Tvrtko Ursulin <tvrtko@ursulin.net>

With the previous patch having extended the pinned lifetime of
contexts by referencing the previous context from the current
request until the latter is retired (completed by the GPU),
we can now remove usage of the execlist retired queue entirely.

This is because the above now guarantees that all execlist
object access requirements are satisfied by this new tracking,
and we can stop taking additional references and stop keeping
requests on the execlists retired queue.

The latter was a source of significant scalability issues in
the driver, causing performance hits on some tests. The most
dramatic of these was igt/gem_close_race, whose run time of
tens of minutes is now reduced to tens of seconds.

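As a sketch, the request lifecycle on the execlist queue becomes:

	/* Queue: take a reference when the request joins execlist_queue. */
	i915_gem_request_reference(request);
	list_add_tail(&request->execlist_link, &engine->execlist_queue);

	/* Completion, coalescing or cancel: drop the reference as soon as
	 * the request leaves the queue, instead of parking it on a retired
	 * list for the retire worker to clean up later.
	 */
	list_del(&req->execlist_link);
	i915_gem_request_unreference(req);
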
Signed-off-by: Tvrtko Ursulin <tvrtko@ursulin.net>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
---
 drivers/gpu/drm/i915/i915_gem.c         | 10 +------
 drivers/gpu/drm/i915/intel_lrc.c        | 46 ++++++++++-----------------------
 drivers/gpu/drm/i915/intel_lrc.h        |  2 +-
 drivers/gpu/drm/i915/intel_ringbuffer.h |  1 -
 4 files changed, 16 insertions(+), 43 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 118911ce7231..2f4bfe93b20c 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -2901,13 +2901,7 @@ static void i915_gem_reset_engine_cleanup(struct drm_i915_private *dev_priv,
 		/* Ensure irq handler finishes or is cancelled. */
 		tasklet_kill(&engine->irq_tasklet);
 
-		spin_lock_bh(&engine->execlist_lock);
-		/* list_splice_tail_init checks for empty lists */
-		list_splice_tail_init(&engine->execlist_queue,
-				      &engine->execlist_retired_req_list);
-		spin_unlock_bh(&engine->execlist_lock);
-
-		intel_execlists_retire_requests(engine);
+		intel_execlists_cancel_requests(engine);
 	}
 
 	/*
@@ -3029,8 +3023,6 @@ i915_gem_retire_requests(struct drm_device *dev)
 			spin_lock_bh(&engine->execlist_lock);
 			idle &= list_empty(&engine->execlist_queue);
 			spin_unlock_bh(&engine->execlist_lock);
-
-			intel_execlists_retire_requests(engine);
 		}
 	}
 
diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
index 5e967943ce49..f241cf6e4227 100644
--- a/drivers/gpu/drm/i915/intel_lrc.c
+++ b/drivers/gpu/drm/i915/intel_lrc.c
@@ -456,8 +456,8 @@ static void execlists_context_unqueue(struct intel_engine_cs *engine)
 			/* Same ctx: ignore first request, as second request
 			 * will update tail past first request's workload */
 			cursor->elsp_submitted = req0->elsp_submitted;
-			list_move_tail(&req0->execlist_link,
-				       &engine->execlist_retired_req_list);
+			list_del(&req0->execlist_link);
+			i915_gem_request_unreference(req0);
 			req0 = cursor;
 		} else {
 			req1 = cursor;
@@ -499,19 +499,17 @@ execlists_check_remove_request(struct intel_engine_cs *engine, u32 request_id)
 					    struct drm_i915_gem_request,
 					    execlist_link);
 
-	if (!head_req)
-		return 0;
-
-	if (unlikely(intel_execlists_ctx_id(head_req->ctx, engine) != request_id))
-		return 0;
+	if (WARN_ON(!head_req ||
+	    (intel_execlists_ctx_id(head_req->ctx, engine) != request_id)))
+		return 0;
 
 	WARN(head_req->elsp_submitted == 0, "Never submitted head request\n");
 
 	if (--head_req->elsp_submitted > 0)
 		return 0;
 
-	list_move_tail(&head_req->execlist_link,
-		       &engine->execlist_retired_req_list);
+	list_del(&head_req->execlist_link);
+	i915_gem_request_unreference(head_req);
 
 	return 1;
 }
@@ -615,11 +613,6 @@ static void execlists_context_queue(struct drm_i915_gem_request *request)
 	struct drm_i915_gem_request *cursor;
 	int num_elements = 0;
 
-	if (request->ctx != request->i915->kernel_context)
-		intel_lr_context_pin(request->ctx, engine);
-
-	i915_gem_request_reference(request);
-
 	spin_lock_bh(&engine->execlist_lock);
 
 	list_for_each_entry(cursor, &engine->execlist_queue, execlist_link)
@@ -636,11 +629,12 @@ static void execlists_context_queue(struct drm_i915_gem_request *request)
 		if (request->ctx == tail_req->ctx) {
 			WARN(tail_req->elsp_submitted != 0,
 				"More than 2 already-submitted reqs queued\n");
-			list_move_tail(&tail_req->execlist_link,
-				       &engine->execlist_retired_req_list);
+			list_del(&tail_req->execlist_link);
+			i915_gem_request_unreference(tail_req);
 		}
 	}
 
+	i915_gem_request_reference(request);
 	list_add_tail(&request->execlist_link, &engine->execlist_queue);
 	if (num_elements == 0)
 		execlists_context_unqueue(engine);
@@ -1039,28 +1033,18 @@ int intel_execlists_submission(struct i915_execbuffer_params *params,
 	return 0;
 }
 
-void intel_execlists_retire_requests(struct intel_engine_cs *engine)
+void intel_execlists_cancel_requests(struct intel_engine_cs *engine)
 {
 	struct drm_i915_gem_request *req, *tmp;
-	struct list_head retired_list;
+	LIST_HEAD(cancel_list);
 
 	WARN_ON(!mutex_is_locked(&engine->dev->struct_mutex));
-	if (list_empty(&engine->execlist_retired_req_list))
-		return;
 
-	INIT_LIST_HEAD(&retired_list);
 	spin_lock_bh(&engine->execlist_lock);
-	list_replace_init(&engine->execlist_retired_req_list, &retired_list);
+	list_replace_init(&engine->execlist_queue, &cancel_list);
 	spin_unlock_bh(&engine->execlist_lock);
 
-	list_for_each_entry_safe(req, tmp, &retired_list, execlist_link) {
-		struct intel_context *ctx = req->ctx;
-		struct drm_i915_gem_object *ctx_obj =
-				ctx->engine[engine->id].state;
-
-		if (ctx_obj && (ctx != req->i915->kernel_context))
-			intel_lr_context_unpin(ctx, engine);
-
+	list_for_each_entry_safe(req, tmp, &cancel_list, execlist_link) {
 		list_del(&req->execlist_link);
 		i915_gem_request_unreference(req);
 	}
@@ -1180,7 +1164,6 @@ void intel_lr_context_unpin(struct intel_context *ctx,
 		intel_unpin_ringbuffer_obj(ctx->engine[engine->id].ringbuf);
 		i915_gem_object_ggtt_unpin(ctx_obj);
 		ctx->engine[engine->id].lrc_vma = NULL;
-		ctx->engine[engine->id].lrc_desc = 0;
 		ctx->engine[engine->id].lrc_reg_state = NULL;
 
 		i915_gem_context_unreference(ctx);
@@ -2110,7 +2093,6 @@ logical_ring_init(struct drm_device *dev, struct intel_engine_cs *engine)
 
 	INIT_LIST_HEAD(&engine->buffers);
 	INIT_LIST_HEAD(&engine->execlist_queue);
-	INIT_LIST_HEAD(&engine->execlist_retired_req_list);
 	spin_lock_init(&engine->execlist_lock);
 
 	tasklet_init(&engine->irq_tasklet,
diff --git a/drivers/gpu/drm/i915/intel_lrc.h b/drivers/gpu/drm/i915/intel_lrc.h
index 8de1ea536ad4..04cee14f2bb2 100644
--- a/drivers/gpu/drm/i915/intel_lrc.h
+++ b/drivers/gpu/drm/i915/intel_lrc.h
@@ -119,6 +119,6 @@ int intel_execlists_submission(struct i915_execbuffer_params *params,
 			       struct drm_i915_gem_execbuffer2 *args,
 			       struct list_head *vmas);
 
-void intel_execlists_retire_requests(struct intel_engine_cs *engine);
+void intel_execlists_cancel_requests(struct intel_engine_cs *engine);
 
 #endif /* _INTEL_LRC_H_ */
diff --git a/drivers/gpu/drm/i915/intel_ringbuffer.h b/drivers/gpu/drm/i915/intel_ringbuffer.h
index 3a11705222fc..712c754bbd4e 100644
--- a/drivers/gpu/drm/i915/intel_ringbuffer.h
+++ b/drivers/gpu/drm/i915/intel_ringbuffer.h
@@ -269,7 +269,6 @@ struct  intel_engine_cs {
 	struct tasklet_struct irq_tasklet;
 	spinlock_t execlist_lock; /* used inside tasklet, use spin_lock_bh */
 	struct list_head execlist_queue;
-	struct list_head execlist_retired_req_list;
 	unsigned int fw_domains;
 	unsigned int next_context_status_buffer;
 	unsigned int idle_lite_restore_wa;
-- 
1.9.1


* ✗ Fi.CI.BAT: failure for Eliminating the execlist retired request queue
  2016-04-08 13:54 [RFC 0/4] Eliminating the execlist retired request queue Tvrtko Ursulin
                   ` (3 preceding siblings ...)
  2016-04-08 13:54 ` [RFC 4/4] drm/i915: Stop tracking execlists retired requests Tvrtko Ursulin
@ 2016-04-08 14:08 ` Patchwork
  2016-04-09  8:03 ` [RFC 0/4] " Chris Wilson
  5 siblings, 0 replies; 20+ messages in thread
From: Patchwork @ 2016-04-08 14:08 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: intel-gfx

== Series Details ==

Series: Eliminating the execlist retired request queue
URL   : https://patchwork.freedesktop.org/series/5459/
State : failure

== Summary ==

  CC [M]  drivers/gpu/drm/i915/intel_dpll_mgr.o
  CC [M]  drivers/gpu/drm/i915/intel_fbc.o
  CC [M]  drivers/gpu/drm/i915/intel_fifo_underrun.o
  CC [M]  drivers/gpu/drm/i915/intel_frontbuffer.o
  CC [M]  drivers/gpu/drm/i915/intel_hotplug.o
  CC [M]  drivers/gpu/drm/i915/intel_modes.o
  CC [M]  drivers/gpu/drm/i915/intel_overlay.o
  CC [M]  drivers/gpu/drm/i915/intel_psr.o
  CC [M]  drivers/gpu/drm/i915/intel_sideband.o
  LD      drivers/usb/host/xhci-hcd.o
  LD      drivers/usb/host/built-in.o
  CC [M]  drivers/gpu/drm/i915/intel_sprite.o
  LD      drivers/usb/built-in.o
  CC [M]  drivers/gpu/drm/i915/intel_acpi.o
  CC [M]  drivers/gpu/drm/i915/intel_opregion.o
  CC [M]  drivers/gpu/drm/i915/dvo_ch7017.o
  CC [M]  drivers/gpu/drm/i915/intel_fbdev.o
  CC [M]  drivers/gpu/drm/i915/dvo_ch7xxx.o
cc1: some warnings being treated as errors
scripts/Makefile.build:291: recipe for target 'drivers/gpu/drm/i915/intel_lrc.o' failed
make[4]: *** [drivers/gpu/drm/i915/intel_lrc.o] Error 1
make[4]: *** Waiting for unfinished jobs....
scripts/Makefile.build:440: recipe for target 'drivers/gpu/drm/i915' failed
make[3]: *** [drivers/gpu/drm/i915] Error 2
scripts/Makefile.build:440: recipe for target 'drivers/gpu/drm' failed
make[2]: *** [drivers/gpu/drm] Error 2
scripts/Makefile.build:440: recipe for target 'drivers/gpu' failed
make[1]: *** [drivers/gpu] Error 2
Makefile:962: recipe for target 'drivers' failed
make: *** [drivers] Error 2


* Re: [RFC 2/4] drm/i915/guc: Keep the previous context pinned until the next one has been completed
  2016-04-08 13:54 ` [RFC 2/4] drm/i915/guc: Keep the previous context pinned until the next one has been completed Tvrtko Ursulin
@ 2016-04-08 14:37   ` Chris Wilson
  2016-04-11  8:58     ` Tvrtko Ursulin
  0 siblings, 1 reply; 20+ messages in thread
From: Chris Wilson @ 2016-04-08 14:37 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Tvrtko Ursulin, Intel-gfx

On Fri, Apr 08, 2016 at 02:54:56PM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko@ursulin.net>
> 
> In GuC mode, submitting a request only places it into the
> GuC queue, with the actual submission to hardware following at
> some future point. This makes the per-engine last-context
> tracking insufficient for closing the premature context
> unpin race.
> 
> Instead we need to make requests track and pin the previous
> context on the same engine, and only unpin them when they
> themselves are retired. This will ensure contexts remain pinned
> in all cases until they are fully saved by the GPU.
> 
> v2: Only track contexts.
> 
> Signed-off-by: Tvrtko Ursulin <tvrtko@ursulin.net>
> Cc: Alex Dai <yu.dai@intel.com>
> Cc: Dave Gordon <david.s.gordon@intel.com>
> Cc: Nick Hoath <nicholas.hoath@intel.com>
> Cc: Chris Wilson <chris@chris-wilson.co.uk>
> ---
>  drivers/gpu/drm/i915/i915_drv.h  |  4 +++-
>  drivers/gpu/drm/i915/i915_gem.c  |  5 +++++
>  drivers/gpu/drm/i915/intel_lrc.c | 17 +++++++++++++++++
>  3 files changed, 25 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
> index fa199b7e51b5..216b2297ae3e 100644
> --- a/drivers/gpu/drm/i915/i915_drv.h
> +++ b/drivers/gpu/drm/i915/i915_drv.h
> @@ -2283,6 +2283,9 @@ struct drm_i915_gem_request {
>  	struct intel_context *ctx;
>  	struct intel_ringbuffer *ringbuf;
>  
> +	/** Context preceding this one on the same engine. Can be NULL. */
> +	struct intel_context *prev_ctx;
> +
>  	/** Batch buffer related to this request if any (used for
>  	    error state dump only) */
>  	struct drm_i915_gem_object *batch_obj;
> @@ -2318,7 +2321,6 @@ struct drm_i915_gem_request {
>  
>  	/** Execlists no. of times this request has been sent to the ELSP */
>  	int elsp_submitted;
> -
>  };
>  
>  struct drm_i915_gem_request * __must_check
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index c1df696f9002..118911ce7231 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -1445,6 +1445,11 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
>  	 */
>  	request->ringbuf->last_retired_head = request->postfix;
>  
> +	if (request->prev_ctx) {
> +		intel_lr_context_unpin(request->prev_ctx, request->engine);
> +		request->prev_ctx = NULL;
> +	}
> +
>  	__i915_gem_request_release(request);
>  }
>  
> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
> index 31445aa3429b..3d08dac0a9b0 100644
> --- a/drivers/gpu/drm/i915/intel_lrc.c
> +++ b/drivers/gpu/drm/i915/intel_lrc.c
> @@ -798,6 +798,23 @@ intel_logical_ring_advance_and_submit(struct drm_i915_gem_request *request)
>  	if (intel_engine_stopped(engine))
>  		return 0;
>  
> +	/*
> +	 * Pin the previous context on this engine to ensure it is not
> +	 * prematurely unpinned in GuC mode.
> +	 * Previous context will be unpinned when this request is retired,
> +	 * ensuring the GPU has switched out from the previous context and into
> +	 * a new context at that point.
> +	 */
> +	if (engine->last_context) {
> +		if (engine->last_context != request->i915->kernel_context) {
> +			intel_lr_context_pin(engine->last_context, engine);
> +			request->prev_ctx = engine->last_context;
> +		}
> +	}
> +
> +	/*
> +	 * Track and pin the last context submitted on an engine.
> +	 */
>  	if (engine->last_context != request->ctx) {
>  		if (engine->last_context)
>  			intel_lr_context_unpin(engine->last_context, engine);

We can use pinned_context to reduce the code complexity, not make it
worse.

If you first apply the execlists default context fix, adding
pinned_context is a reduction in code.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [RFC 3/4] drm/i915: Use new i915_gem_object_pin_map for LRC
  2016-04-08 13:54 ` [RFC 3/4] drm/i915: Use new i915_gem_object_pin_map for LRC Tvrtko Ursulin
@ 2016-04-08 14:40   ` Chris Wilson
  2016-04-11  9:05     ` Tvrtko Ursulin
  0 siblings, 1 reply; 20+ messages in thread
From: Chris Wilson @ 2016-04-08 14:40 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-gfx

On Fri, Apr 08, 2016 at 02:54:57PM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> We can use the new pin/lazy unpin API for more performance
> in the execlist submission paths.

The sticking point for me was the ringbuffer and Braswell having to
re-ioremap it every context pin. I hadn't noticed the kmap of the
context, but this does seem like a valid use for pin_map so I'm not
complaining... (Just saying this is half the solution ;)

Oh, and you don't have pin_map for your ringbuffer either yet.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [RFC 4/4] drm/i915: Stop tracking execlists retired requests
  2016-04-08 13:54 ` [RFC 4/4] drm/i915: Stop tracking execlists retired requests Tvrtko Ursulin
@ 2016-04-08 14:57   ` Chris Wilson
  2016-04-11  9:10     ` Tvrtko Ursulin
  0 siblings, 1 reply; 20+ messages in thread
From: Chris Wilson @ 2016-04-08 14:57 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Tvrtko Ursulin, Intel-gfx

On Fri, Apr 08, 2016 at 02:54:58PM +0100, Tvrtko Ursulin wrote:
> @@ -615,11 +613,6 @@ static void execlists_context_queue(struct drm_i915_gem_request *request)
>  	struct drm_i915_gem_request *cursor;
>  	int num_elements = 0;
>  
> -	if (request->ctx != request->i915->kernel_context)
> -		intel_lr_context_pin(request->ctx, engine);
> -
> -	i915_gem_request_reference(request);
> -
>  	spin_lock_bh(&engine->execlist_lock);
>  
>  	list_for_each_entry(cursor, &engine->execlist_queue, execlist_link)
> @@ -636,11 +629,12 @@ static void execlists_context_queue(struct drm_i915_gem_request *request)
>  		if (request->ctx == tail_req->ctx) {
>  			WARN(tail_req->elsp_submitted != 0,
>  				"More than 2 already-submitted reqs queued\n");
> -			list_move_tail(&tail_req->execlist_link,
> -				       &engine->execlist_retired_req_list);
> +			list_del(&tail_req->execlist_link);
> +			i915_gem_request_unreference(tail_req);
>  		}
>  	}
>  
> +	i915_gem_request_reference(request);
>  	list_add_tail(&request->execlist_link, &engine->execlist_queue);

If you want to get truly radical, we do not need the ref on the request
until it is submitted to hardware. (As the request cannot be retired
until it has done so, it can leave the execlist_queue until we commit it
to hw, or perform the cancel).

Lgtm,
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [RFC 1/4] drm/i915: Move releasing of the GEM request from free to retire/cancel
  2016-04-08 13:54 ` [RFC 1/4] drm/i915: Move releasing of the GEM request from free to retire/cancel Tvrtko Ursulin
@ 2016-04-08 15:01   ` Chris Wilson
  2016-04-11  9:19     ` Tvrtko Ursulin
  0 siblings, 1 reply; 20+ messages in thread
From: Chris Wilson @ 2016-04-08 15:01 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-gfx

On Fri, Apr 08, 2016 at 02:54:55PM +0100, Tvrtko Ursulin wrote:
> From: Chris Wilson <chris@chris-wilson.co.uk>
> 
> If we move the release of the GEM request (i.e. decoupling it from the
> various lists used for client and context tracking) after it is complete
> (either by the GPU retiring the request, or by the caller cancelling the
> request), we can remove the requirement that the final unreference of
> the GEM request needs to be under the struct_mutex.
> 
> v2: Execlists, as always, is badly asymmetric and year-old patches still
> haven't landed to fix it up.
> 
> v3: Extracted, rebased and fixed for GuC. (Tvrtko Ursulin)

After you mentioned the unbalanced, the patches I reordered to fix that
are:

https://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=tasklet&id=83dcde26caa26f4113c3e441c3c96c504fd88e13
https://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=tasklet&id=9f386a21d3f28db763102b5c4f97a90bd0dcfd08
https://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=tasklet&id=9afd878e2c9f7825b99dc839c7b5deb553b62e32
https://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=tasklet&id=a842a2b0b7e90148966f35488209c969a9a9da54
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [RFC 0/4] Eliminating the execlist retired request queue
  2016-04-08 13:54 [RFC 0/4] Eliminating the execlist retired request queue Tvrtko Ursulin
                   ` (4 preceding siblings ...)
  2016-04-08 14:08 ` ✗ Fi.CI.BAT: failure for Eliminating the execlist retired request queue Patchwork
@ 2016-04-09  8:03 ` Chris Wilson
  2016-04-11  9:19   ` Tvrtko Ursulin
  5 siblings, 1 reply; 20+ messages in thread
From: Chris Wilson @ 2016-04-09  8:03 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-gfx

On Fri, Apr 08, 2016 at 02:54:54PM +0100, Tvrtko Ursulin wrote:
> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> 
> The retired request queue, coupled with the retired work handler, is a
> scalability concern on some workloads, of which one dramatic example is
> gem_close_race.
> 
> This series depends on the i915_gem_object_pin_map series by Chris Wilson.

My biggest concern with this is that this touches code that has a known
GPF and so imo cannot be tested until that *regression* has been fixed.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [RFC 2/4] drm/i915/guc: Keep the previous context pinned until the next one has been completed
  2016-04-08 14:37   ` Chris Wilson
@ 2016-04-11  8:58     ` Tvrtko Ursulin
  0 siblings, 0 replies; 20+ messages in thread
From: Tvrtko Ursulin @ 2016-04-11  8:58 UTC (permalink / raw)
  To: Chris Wilson, Intel-gfx, Tvrtko Ursulin, Alex Dai, Dave Gordon,
	Nick Hoath


On 08/04/16 15:37, Chris Wilson wrote:
> On Fri, Apr 08, 2016 at 02:54:56PM +0100, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko@ursulin.net>
>>
>> In GuC mode, submitting a request only places it into the
>> GuC queue, with the actual submission to hardware following at
>> some future point. This makes the per-engine last-context
>> tracking insufficient for closing the premature context
>> unpin race.
>>
>> Instead we need to make requests track and pin the previous
>> context on the same engine, and only unpin them when they
>> themselves are retired. This will ensure contexts remain pinned
>> in all cases until they are fully saved by the GPU.
>>
>> v2: Only track contexts.
>>
>> Signed-off-by: Tvrtko Ursulin <tvrtko@ursulin.net>
>> Cc: Alex Dai <yu.dai@intel.com>
>> Cc: Dave Gordon <david.s.gordon@intel.com>
>> Cc: Nick Hoath <nicholas.hoath@intel.com>
>> Cc: Chris Wilson <chris@chris-wilson.co.uk>
>> ---
>>   drivers/gpu/drm/i915/i915_drv.h  |  4 +++-
>>   drivers/gpu/drm/i915/i915_gem.c  |  5 +++++
>>   drivers/gpu/drm/i915/intel_lrc.c | 17 +++++++++++++++++
>>   3 files changed, 25 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/gpu/drm/i915/i915_drv.h b/drivers/gpu/drm/i915/i915_drv.h
>> index fa199b7e51b5..216b2297ae3e 100644
>> --- a/drivers/gpu/drm/i915/i915_drv.h
>> +++ b/drivers/gpu/drm/i915/i915_drv.h
>> @@ -2283,6 +2283,9 @@ struct drm_i915_gem_request {
>>   	struct intel_context *ctx;
>>   	struct intel_ringbuffer *ringbuf;
>>
>> +	/** Context preceding this one on the same engine. Can be NULL. */
>> +	struct intel_context *prev_ctx;
>> +
>>   	/** Batch buffer related to this request if any (used for
>>   	    error state dump only) */
>>   	struct drm_i915_gem_object *batch_obj;
>> @@ -2318,7 +2321,6 @@ struct drm_i915_gem_request {
>>
>>   	/** Execlists no. of times this request has been sent to the ELSP */
>>   	int elsp_submitted;
>> -
>>   };
>>
>>   struct drm_i915_gem_request * __must_check
>> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
>> index c1df696f9002..118911ce7231 100644
>> --- a/drivers/gpu/drm/i915/i915_gem.c
>> +++ b/drivers/gpu/drm/i915/i915_gem.c
>> @@ -1445,6 +1445,11 @@ static void i915_gem_request_retire(struct drm_i915_gem_request *request)
>>   	 */
>>   	request->ringbuf->last_retired_head = request->postfix;
>>
>> +	if (request->prev_ctx) {
>> +		intel_lr_context_unpin(request->prev_ctx, request->engine);
>> +		request->prev_ctx = NULL;
>> +	}
>> +
>>   	__i915_gem_request_release(request);
>>   }
>>
>> diff --git a/drivers/gpu/drm/i915/intel_lrc.c b/drivers/gpu/drm/i915/intel_lrc.c
>> index 31445aa3429b..3d08dac0a9b0 100644
>> --- a/drivers/gpu/drm/i915/intel_lrc.c
>> +++ b/drivers/gpu/drm/i915/intel_lrc.c
>> @@ -798,6 +798,23 @@ intel_logical_ring_advance_and_submit(struct drm_i915_gem_request *request)
>>   	if (intel_engine_stopped(engine))
>>   		return 0;
>>
>> +	/*
>> +	 * Pin the previous context on this engine to ensure it is not
>> +	 * prematurely unpinned in GuC mode.
>> +	 * Previous context will be unpinned when this request is retired,
>> +	 * ensuring the GPU has switched out from the previous context and into
>> +	 * a new context at that point.
>> +	 */
>> +	if (engine->last_context) {
>> +		if (engine->last_context != request->i915->kernel_context) {
>> +			intel_lr_context_pin(engine->last_context, engine);
>> +			request->prev_ctx = engine->last_context;
>> +		}
>> +	}
>> +
>> +	/*
>> +	 * Track and pin the last context submitted on an engine.
>> +	 */
>>   	if (engine->last_context != request->ctx) {
>>   		if (engine->last_context)
>>   			intel_lr_context_unpin(engine->last_context, engine);
>
> We can use pinned_context to reduce the code complexity, not make it
> worse.
>
> If you first apply the execlists default context fix, adding
> pinned_context is a reduction in code.

Okay I'll pull that patch in this series.

Regards,

Tvrtko



* Re: [RFC 3/4] drm/i915: Use new i915_gem_object_pin_map for LRC
  2016-04-08 14:40   ` Chris Wilson
@ 2016-04-11  9:05     ` Tvrtko Ursulin
  2016-04-11  9:18       ` Chris Wilson
  0 siblings, 1 reply; 20+ messages in thread
From: Tvrtko Ursulin @ 2016-04-11  9:05 UTC (permalink / raw)
  To: Chris Wilson, Intel-gfx


On 08/04/16 15:40, Chris Wilson wrote:
> On Fri, Apr 08, 2016 at 02:54:57PM +0100, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> We can use the new pin/lazy unpin API for more performance
>> in the execlist submission paths.
>
> The sticking point for me was the ringbuffer and Braswell having to
> re-ioremap it every context pin. I hadn't noticed the kmap of the
> context, but this does seem like a valid use for pin_map so I'm not
> complaining... (Just saying this is half the solution ;)

Pull "drm/i915: Move ioremap_wc tracking onto VMA" into the series as
well then?

> Oh, and you don't have pin_map for your ringbuffer either yet.

Confused, I did base this on top of "drm/i915: Refactor duplicate object
vmap functions", and the missing half is the ioremap paths, which you
commented on above. What other ringbuffer is missing?

Regards,

Tvrtko

* Re: [RFC 4/4] drm/i915: Stop tracking execlists retired requests
  2016-04-08 14:57   ` Chris Wilson
@ 2016-04-11  9:10     ` Tvrtko Ursulin
  2016-04-11  9:23       ` Chris Wilson
  2016-04-11  9:26       ` Chris Wilson
  0 siblings, 2 replies; 20+ messages in thread
From: Tvrtko Ursulin @ 2016-04-11  9:10 UTC (permalink / raw)
  To: Chris Wilson, Intel-gfx, Tvrtko Ursulin


On 08/04/16 15:57, Chris Wilson wrote:
> On Fri, Apr 08, 2016 at 02:54:58PM +0100, Tvrtko Ursulin wrote:
>> @@ -615,11 +613,6 @@ static void execlists_context_queue(struct drm_i915_gem_request *request)
>>   	struct drm_i915_gem_request *cursor;
>>   	int num_elements = 0;
>>
>> -	if (request->ctx != request->i915->kernel_context)
>> -		intel_lr_context_pin(request->ctx, engine);
>> -
>> -	i915_gem_request_reference(request);
>> -
>>   	spin_lock_bh(&engine->execlist_lock);
>>
>>   	list_for_each_entry(cursor, &engine->execlist_queue, execlist_link)
>> @@ -636,11 +629,12 @@ static void execlists_context_queue(struct drm_i915_gem_request *request)
>>   		if (request->ctx == tail_req->ctx) {
>>   			WARN(tail_req->elsp_submitted != 0,
>>   				"More than 2 already-submitted reqs queued\n");
>> -			list_move_tail(&tail_req->execlist_link,
>> -				       &engine->execlist_retired_req_list);
>> +			list_del(&tail_req->execlist_link);
>> +			i915_gem_request_unreference(tail_req);
>>   		}
>>   	}
>>
>> +	i915_gem_request_reference(request);
>>   	list_add_tail(&request->execlist_link, &engine->execlist_queue);
>
> If you want to get truly radical, we do not need the ref on the request
> until it is submitted to hardware. (As the request cannot be retired
> until it has done so, it can leave the execlist_queue until we commit it
> to hw, or perform the cancel).

Don't know. It is simple and nice that reference is tied to presence on 
execlist_queue.

More importantly, the patch as presented has a flaw in that it dereferences
req->ctx from execlists_check_remove_request where the context pin may
have disappeared already due to context-complete interrupts getting behind
when coalescing.

I will need to cache ctx_id in the request I think. It is fine to do
that since ctx id must be unique and stable for a request.

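Something like this, perhaps (ctx_hw_id is a hypothetical field name):

	/* At queue time, while req->ctx is certainly still pinned: */
	request->ctx_hw_id = intel_execlists_ctx_id(request->ctx, engine);

	/* In execlists_check_remove_request(), compare the cached copy
	 * instead of dereferencing head_req->ctx:
	 */
	if (WARN_ON(!head_req || head_req->ctx_hw_id != request_id))
		return 0;
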
Maybe I should even pull in your patch which makes ctx ids stable and
persistent, aligned with context existence and not pin. I think I've seen
something like that somewhere.

Regards,

Tvrtko

* Re: [RFC 3/4] drm/i915: Use new i915_gem_object_pin_map for LRC
  2016-04-11  9:05     ` Tvrtko Ursulin
@ 2016-04-11  9:18       ` Chris Wilson
  0 siblings, 0 replies; 20+ messages in thread
From: Chris Wilson @ 2016-04-11  9:18 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-gfx

On Mon, Apr 11, 2016 at 10:05:34AM +0100, Tvrtko Ursulin wrote:
> 
> On 08/04/16 15:40, Chris Wilson wrote:
> >On Fri, Apr 08, 2016 at 02:54:57PM +0100, Tvrtko Ursulin wrote:
> >>From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>
> >>We can use the new pin/lazy unpin API for more performance
> >>in the execlist submission paths.
> >
> >The sticking point for me was the ringbuffer and Braswell having to
> >re-ioremap it every context pin. I hadn't noticed the kmap of the
> >context, but this does seem like a valid use for pin_map so I'm not
> >complaining... (Just saying this is half the solution ;)
> 
> Pull "drm/i915: Move ioremap_wc tracking onto VMA" into the series
> as well then?

That depends on "io-mapping: Specify mapping size for io_mapping_map_wc"
and the vma->map_and_fenceable (though you can work around that) iirc.
 
> >Oh, and you don't have pin_map for your ringbuffer either yet.
> 
> Confused, I did base this on top of "drm/i915: Refactor duplicate
> object vmap functions", and the missing half is the ioremap paths,
> which you commented on above. What other ringbuffer is missing?

No, I was just thinking it was still pending because I had a fixup in my
tree: drm/i915: Treat ringbuffer writes as write to normal memory

That and a patch to use WC vmaps for Braswell as well (that's still in
preliminary form, but if you are interested
https://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=tasklet&id=3fa97e044f3b5a2c23e44ec659a8199f134d8fb5
it really helps with context creation stress tests - and not much else!
;)
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [RFC 1/4] drm/i915: Move releasing of the GEM request from free to retire/cancel
  2016-04-08 15:01   ` Chris Wilson
@ 2016-04-11  9:19     ` Tvrtko Ursulin
  0 siblings, 0 replies; 20+ messages in thread
From: Tvrtko Ursulin @ 2016-04-11  9:19 UTC (permalink / raw)
  To: Chris Wilson, Intel-gfx, Tvrtko Ursulin


On 08/04/16 16:01, Chris Wilson wrote:
> On Fri, Apr 08, 2016 at 02:54:55PM +0100, Tvrtko Ursulin wrote:
>> From: Chris Wilson <chris@chris-wilson.co.uk>
>>
>> If we move the release of the GEM request (i.e. decoupling it from the
>> various lists used for client and context tracking) after it is complete
>> (either by the GPU retiring the request, or by the caller cancelling the
>> request), we can remove the requirement that the final unreference of
>> the GEM request needs to be under the struct_mutex.
>>
>> v2: Execlists, as always, is badly asymmetric and year-old patches still
>> haven't landed to fix it up.
>>
>> v3: Extracted, rebased and fixed for GuC. (Tvrtko Ursulin)
>
> After you mentioned the unbalanced, the patches I reordered to fix that
> are:
>
> https://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=tasklet&id=83dcde26caa26f4113c3e441c3c96c504fd88e13
> https://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=tasklet&id=9f386a21d3f28db763102b5c4f97a90bd0dcfd08
> https://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=tasklet&id=9afd878e2c9f7825b99dc839c7b5deb553b62e32
> https://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=tasklet&id=a842a2b0b7e90148966f35488209c969a9a9da54

Want to send these four as standalone straight away for review then?

Regards,

Tvrtko


* Re: [RFC 0/4] Eliminating the execlist retired request queue
  2016-04-09  8:03 ` [RFC 0/4] " Chris Wilson
@ 2016-04-11  9:19   ` Tvrtko Ursulin
  2016-04-11 10:20     ` Chris Wilson
  0 siblings, 1 reply; 20+ messages in thread
From: Tvrtko Ursulin @ 2016-04-11  9:19 UTC (permalink / raw)
  To: Chris Wilson, Intel-gfx


On 09/04/16 09:03, Chris Wilson wrote:
> On Fri, Apr 08, 2016 at 02:54:54PM +0100, Tvrtko Ursulin wrote:
>> From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
>>
>> The retired request queue, coupled with the retired work handler, is a
>> scalability concern on some workloads, of which one dramatic example is
>> gem_close_race.
>>
>> This series depends on the i915_gem_object_pin_map series by Chris Wilson.
>
> My biggest concern with this is that this touches code that has a known
> GPF and so imo cannot be tested until that *regression* has been fixed.

That's the ringbuffer one or something else?

Regards,

Tvrtko


* Re: [RFC 4/4] drm/i915: Stop tracking execlists retired requests
  2016-04-11  9:10     ` Tvrtko Ursulin
@ 2016-04-11  9:23       ` Chris Wilson
  2016-04-11  9:26       ` Chris Wilson
  1 sibling, 0 replies; 20+ messages in thread
From: Chris Wilson @ 2016-04-11  9:23 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Tvrtko Ursulin, Intel-gfx

On Mon, Apr 11, 2016 at 10:10:56AM +0100, Tvrtko Ursulin wrote:
> 
> On 08/04/16 15:57, Chris Wilson wrote:
> >On Fri, Apr 08, 2016 at 02:54:58PM +0100, Tvrtko Ursulin wrote:
> >>@@ -615,11 +613,6 @@ static void execlists_context_queue(struct drm_i915_gem_request *request)
> >>  	struct drm_i915_gem_request *cursor;
> >>  	int num_elements = 0;
> >>
> >>-	if (request->ctx != request->i915->kernel_context)
> >>-		intel_lr_context_pin(request->ctx, engine);
> >>-
> >>-	i915_gem_request_reference(request);
> >>-
> >>  	spin_lock_bh(&engine->execlist_lock);
> >>
> >>  	list_for_each_entry(cursor, &engine->execlist_queue, execlist_link)
> >>@@ -636,11 +629,12 @@ static void execlists_context_queue(struct drm_i915_gem_request *request)
> >>  		if (request->ctx == tail_req->ctx) {
> >>  			WARN(tail_req->elsp_submitted != 0,
> >>  				"More than 2 already-submitted reqs queued\n");
> >>-			list_move_tail(&tail_req->execlist_link,
> >>-				       &engine->execlist_retired_req_list);
> >>+			list_del(&tail_req->execlist_link);
> >>+			i915_gem_request_unreference(tail_req);
> >>  		}
> >>  	}
> >>
> >>+	i915_gem_request_reference(request);
> >>  	list_add_tail(&request->execlist_link, &engine->execlist_queue);
> >
> >If you want to get truly radical, we do not need the ref on the request
> >until it is submitted to hardware. (As the request cannot be retired
> >until it has done so, it can leave the execlist_queue until we commit it
> >to hw, or perform the cancel).
> 
> Don't know. It is simple and nice that reference is tied to presence
> on execlist_queue.
> 
> More importantly, the patch as presented has a flaw in that it
> dereferences req->ctx from execlists_check_remove_request where the
> context pin may have disappeared already due to context-complete
> interrupts getting behind when coalescing.
> 
> I will need to cache ctx_id in the request I think. It is fine to do
> that since ctx id must be unique and stable for a request.
> 
> Maybe I should even pull in your patch which makes ctx ids stable
> and persistent, aligned with context existence and not pin. I think
> I've seen something like that somewhere.

Yes, OA, PASID and vGPU all depend upon having a stable ctx-id for the
lifetime of the context. Dave expressed a concern that the GuC may be
interpreting it as the LRCA, but as far as I can see, lrc->ring_lcra
and lrc->context_id are distinct and we never use the global context
identifier itself (the closest is the low 32bits of lrc_desc which do
not include our unique id).
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [RFC 4/4] drm/i915: Stop tracking execlists retired requests
  2016-04-11  9:10     ` Tvrtko Ursulin
  2016-04-11  9:23       ` Chris Wilson
@ 2016-04-11  9:26       ` Chris Wilson
  1 sibling, 0 replies; 20+ messages in thread
From: Chris Wilson @ 2016-04-11  9:26 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Tvrtko Ursulin, Intel-gfx

On Mon, Apr 11, 2016 at 10:10:56AM +0100, Tvrtko Ursulin wrote:
> 
> On 08/04/16 15:57, Chris Wilson wrote:
> >On Fri, Apr 08, 2016 at 02:54:58PM +0100, Tvrtko Ursulin wrote:
> >>@@ -615,11 +613,6 @@ static void execlists_context_queue(struct drm_i915_gem_request *request)
> >>  	struct drm_i915_gem_request *cursor;
> >>  	int num_elements = 0;
> >>
> >>-	if (request->ctx != request->i915->kernel_context)
> >>-		intel_lr_context_pin(request->ctx, engine);
> >>-
> >>-	i915_gem_request_reference(request);
> >>-
> >>  	spin_lock_bh(&engine->execlist_lock);
> >>
> >>  	list_for_each_entry(cursor, &engine->execlist_queue, execlist_link)
> >>@@ -636,11 +629,12 @@ static void execlists_context_queue(struct drm_i915_gem_request *request)
> >>  		if (request->ctx == tail_req->ctx) {
> >>  			WARN(tail_req->elsp_submitted != 0,
> >>  				"More than 2 already-submitted reqs queued\n");
> >>-			list_move_tail(&tail_req->execlist_link,
> >>-				       &engine->execlist_retired_req_list);
> >>+			list_del(&tail_req->execlist_link);
> >>+			i915_gem_request_unreference(tail_req);
> >>  		}
> >>  	}
> >>
> >>+	i915_gem_request_reference(request);
> >>  	list_add_tail(&request->execlist_link, &engine->execlist_queue);
> >
> >If you want to get truly radical, we do not need the ref on the request
> >until it is submitted to hardware. (As the request cannot be retired
> >until it has done so, it can leave the execlist_queue until we commit it
> >to hw, or perform the cancel).
> 
> Don't know. It is simple and nice that reference is tied to presence
> on execlist_queue.
> 
> More importantly, the patch as presented has a flaw in that it
> dereferences req->ctx from execlists_check_remove_request where the
> context pin may have disappeared already due to context-complete
> interrupts getting behind when coalescing.
> 
> I will need to cache ctx_id in the request I think. It is fine to do
> that since ctx id must be unique and stable for a request.

For comparison, tracking by requests rather than contexts:

https://cgit.freedesktop.org/~ickle/linux-2.6/commit/?h=tasklet&id=e23791ef88636aa6e3f850b61ab2c4e027af0e52
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre

* Re: [RFC 0/4] Eliminating the execlist retired request queue
  2016-04-11  9:19   ` Tvrtko Ursulin
@ 2016-04-11 10:20     ` Chris Wilson
  0 siblings, 0 replies; 20+ messages in thread
From: Chris Wilson @ 2016-04-11 10:20 UTC (permalink / raw)
  To: Tvrtko Ursulin; +Cc: Intel-gfx

On Mon, Apr 11, 2016 at 10:19:29AM +0100, Tvrtko Ursulin wrote:
> 
> On 09/04/16 09:03, Chris Wilson wrote:
> >On Fri, Apr 08, 2016 at 02:54:54PM +0100, Tvrtko Ursulin wrote:
> >>From: Tvrtko Ursulin <tvrtko.ursulin@intel.com>
> >>
> >>The retired request queue, coupled with the retired work handler, is a
> >>scalability concern on some workloads, of which one dramatic example is
> >>gem_close_race.
> >>
> >>This series depends on the i915_gem_object_pin_map series by Chris Wilson.
> >
> >My biggest concern with this is that this touches code that has a known
> >GPF and so imo cannot be tested until that *regression* has been fixed.
> 
> That's the ringbuffer one or something else?

There's a GPF when we try to requeue a stale request because our state
tracking is kaput.

https://bugs.freedesktop.org/show_bug.cgi?id=93907

Skylake was also misrendering under the same load, but that's likely to
be a different bug that I couldn't reproduce after fixing the above.
-Chris

-- 
Chris Wilson, Intel Open Source Technology Centre
